[
https://issues.apache.org/jira/browse/VFS-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gary D. Gregory resolved VFS-862.
---------------------------------
Resolution: Fixed
> CacheStrategy.ON_RESOLVE triggers refresh on internal VFS navigation, causing
> redundant FTP/SFTP operations
> -----------------------------------------------------------------------------------------------------------
>
> Key: VFS-862
> URL: https://issues.apache.org/jira/browse/VFS-862
> Project: Commons VFS
> Issue Type: Bug
> Reporter: Ilan Goldfeld
> Priority: Major
> Fix For: 2.11.0
>
>
> h2. Description
> h3. Steps to reproduce
> {code:java}
> DefaultFileSystemManager manager = new DefaultFileSystemManager();
> manager.addProvider("ftp", new FtpFileProvider());
> manager.setCacheStrategy(CacheStrategy.ON_RESOLVE);
> manager.init();
> {code}
> *Example 1 – pre-existing, before PR #758:*
> Checking existence of two sibling files should require 1 LIST on the parent,
> but the second file's internal {{getParent()}} re-resolves the parent via
> {{{}resolveFile(){}}}, triggering an ON_RESOLVE refresh that forces a
> redundant LIST.
> {code:java}
> FileObject file1 = manager.resolveFile("ftp://user:pass@host/dir/file1.txt",
> options);
> FileObject file2 = manager.resolveFile("ftp://user:pass@host/dir/file2.txt",
> options);
> file1.exists(); // LIST on parent to determine type
> file2.exists(); // redundant LIST — getParent() refreshes parent via
> ON_RESOLVE
> // Expected: 1 LIST
> // Actual: 2 LIST commands
> {code}
> *Example 2 – after PR #758, amplified to O(N):*
> Scanning a directory with N files now produces ~N LIST commands instead of 1.
> {code:java}
> FileObject dir = manager.resolveFile("ftp://user:pass@host/directory",
> options);
> dir.refresh();
> FileObject[] children = dir.findFiles(new FileDepthSelector(1, 1));
> // Expected: 1 LIST
> // Actual: ~N LIST commands — each child's getParent() refreshes the parent,
> // and the unconditional childMap = null in refresh() clears the
> // cached listing every time
> {code}
> h3. The issue
> {{CacheStrategy.ON_RESOLVE}} is designed to refresh a file when the user
> resolves it via the public API. However, internal VFS navigation methods also
> call {{{}fileSystem.resolveFile(){}}}, which triggers the same ON_RESOLVE
> refresh. This means that internal plumbing – not user actions – invalidates
> cached state on files the user never asked to refresh.
> Affected internal call sites:
> * {{AbstractFileObject.getParent()}} – resolves the parent directory
> * {{AbstractFileObject.getChildren()}} – resolves each child via
> {{resolveFiles()}}
> * {{AbstractFileSystem.getRoot()}} – resolves the root file object (called
> by {{getParent()}} for the root check)
> * {{FtpFileObject.getLinkDestination()}} – resolves symlink targets
> * {{SftpFileObject.doListChildrenResolved()}} – resolves children during
> SFTP listing
> For example, resolving two sibling files and checking their existence should
> require a single LIST on the parent directory. Instead, the second file's
> internal {{getParent()}} call re-resolves the parent via
> {{{}resolveFile(){}}}, triggering an ON_RESOLVE refresh that clears the
> parent's cached directory listing, forcing a redundant LIST.
> h3. History
> This was always a wasteful pattern, but it became a severe performance
> regression after PR #758 ("Fix refresh() not clearing cached state when file
> was never attached").
> PR #758 addressed a real correctness issue: provider-specific cached fields
> like {{FtpFileObject.childMap}} could be populated without going through
> {{attach()}} (e.g. via {{getChildFile()}} -> {{{}doGetChildren(){}}}). When
> {{refresh()}} was called on such a file, {{detach()}} skipped {{doDetach()}}
> because {{{}attached == false{}}}, silently preserving stale data. The fix
> added {{childMap = null}} to {{FtpFileObject.refresh()}} to ensure cached
> state is always cleared on refresh.
> However, this turned the unnecessary internal refreshes into an O(N)
> amplification: when scanning a directory with N files via
> {{{}findFiles(){}}}, each child's {{getType()}} call triggers
> {{{}getParent(){}}}, which re-resolves the parent via {{{}resolveFile(){}}}.
> With ON_RESOLVE, this refreshes the parent, and the new unconditional
> {{childMap = null}} clears the parent's cached directory listing for every
> child – forcing a new FTP LIST command each time. A directory with 50 files
> produces ~50 LIST commands instead of 1.
> Before PR #758, the parent's {{childMap}} survived these unnecessary
> refreshes because {{refresh()}} only cleared {{childMap}} via
> {{{}doDetach(){}}}, which was guarded by {{{}if (attached){}}}. After the
> first child's refresh set the parent to {{{}attached=false{}}}, subsequent
> refreshes left {{childMap}} intact. PR #758 removed this accidental
> protection.
> Even without PR #758, the pattern was wasteful. For example, resolving two
> sibling files and checking their existence would produce 2 LIST commands
> instead of 1, because the second file's internal {{getParent()}} call
> refreshes the parent (clearing its state via {{doDetach()}} when the parent
> is attached).
> h3. Fix
> The fix has two parts:
> *1. Internal navigation skips ON_RESOLVE.* A new {{resolveFileInternal()}}
> method resolves files from the cache without applying the ON_RESOLVE refresh
> policy. All internal navigation call sites use this instead of the public
> {{{}resolveFile(){}}}:
> * {{AbstractFileObject.getParent()}} – parent resolution
> * {{AbstractFileObject.resolveFile(FileName)}} – child resolution in
> {{{}resolveFiles(){}}}/{{{}getChild(){}}}
> * {{AbstractFileSystem.getRoot()}} – root file object lookup
> * {{FtpFileObject.getLinkDestination()}} – symlink target resolution
> * {{SftpFileObject.doListChildrenResolved()}} – SFTP child resolution
> A protected bridge method in {{AbstractFileObject}} makes
> {{resolveFileInternal()}} available to provider subclasses.
> *2. Fresh directory listings propagate metadata to cached children in-place.*
> When a directory listing is performed after a refresh, FTP and SFTP providers
> update cached child objects directly with the fresh metadata from the
> listing, preserving object identity:
> * {*}FTP{*}: {{FtpFileObject.getChildren()}} detects a fresh LIST (because
> {{childMap}} was null after refresh). After the LIST populates
> {{{}childMap{}}}, it iterates the returned children and sets each child's
> {{ftpFile}} to the matching entry from the fresh {{{}childMap{}}}, then
> clears the cached {{type}} via {{injectType(null)}} so it re-evaluates from
> the fresh metadata.
> * {*}SFTP{*}: {{SftpFileObject.doListChildrenResolved()}} already sets fresh
> {{attrs}} on each child via {{{}setStat(){}}}. Now it also clears the cached
> {{type}} via {{injectType(null)}} so it re-evaluates from the updated
> {{{}attrs{}}}.
> This approach was chosen over cache eviction (removing and recreating child
> objects) because eviction breaks object identity – existing VFS tests assert
> that resolving the same path returns the same FileObject instance. In-place
> propagation updates the same objects with fresh data.
> Other providers like SMB and local files don't need this – their
> {{doGetType()}} resolves directly from the server/filesystem on every call,
> so after {{detach()}} clears the cached {{{}type{}}}, children naturally get
> fresh data.
> Together, these changes establish a clear contract: *cached state is used
> until the user explicitly calls {{refresh()}} or resolves a file via the
> public API.* Internal navigation never triggers server operations. When a
> directory is refreshed, both the directory listing and its children are fresh.
> h3. Jackrabbit 1.x test fix
> The {{resolveFileInternal}} change in {{getParent()}} exposed a
> pre-existing bug in the Jackrabbit 1.x embedded test server
> ({{WebdavProviderTest.testFileCreate}} failed with "expected size 0, got
> 112").
> *Root cause:* Jackrabbit 1.x bundles Jetty 6.x, which does not drain
> unconsumed request bodies before sending error responses on persistent HTTP
> connections. When a PROPFIND with a 112-byte XML request body
> receives a 404, and the connection is reused for a subsequent PUT to the
> same URL, the server reads the stale PROPFIND bytes as the PUT body — storing
> the XML as file content instead of an empty file.
> Previously this was masked: {{getParent()}} used {{resolveFile()}} which
> triggered an ON_RESOLVE refresh on the parent directory, sending an extra
> PROPFIND request that happened to flush the stale bytes from
> the connection. With {{resolveFileInternal}}, the PUT follows the 404
> immediately on the same connection, exposing the server bug.
> This is a known class of issue in Jetty:
> * [jetty#651|https://github.com/jetty/jetty.project/issues/651]
> * [jetty#4117|https://github.com/jetty/jetty.project/issues/4117]
> * [jetty#6168|https://github.com/jetty/jetty.project/issues/6168]
> Jetty 9.4+ drains or closes connections with unconsumed content; Jetty 6.x
> does not.
> *Fix:* Add {{Connection: close}} to the Jackrabbit 1.x test {{HttpClient}}
> configuration, disabling persistent connection reuse for the embedded test
> server. The Jackrabbit 2 tests (newer Jetty + HttpClient
> 4.x) pass without any changes. No production VFS code is modified.
> Note: Jackrabbit 1.x tests are already skipped on Java 14+
> ({{<jdk>[14,)</jdk>}} in pom.xml) since Jackrabbit 1.6.5 (2011) cannot run on
> modern JVMs.
>
> h3. Tests
> * {{{}FtpGetChildrenListCommandTest{}}}: verifies that {{findFiles()}} on an
> FTP directory with 50 files issues exactly 1 LIST command with
> {{{}CacheStrategy.ON_RESOLVE{}}}.
> * {{{}SftpGetChildrenListCommandTest{}}}: verifies that refreshing an SFTP
> directory and calling {{findFiles()}} returns fresh children reflecting
> filesystem changes, proving in-place metadata propagation works for SFTP.
> h3. Compatibility note
> This changes the observable behavior of {{CacheStrategy.ON_RESOLVE}} for
> files reached through internal navigation. The following call sites no longer
> trigger ON_RESOLVE refresh:
> * {{AbstractFileObject.getParent()}} – resolving the parent directory
> * {{AbstractFileObject.getChildren()}} / {{getChild()}} – resolving children
> * {{AbstractFileSystem.getRoot()}} – resolving the root file object
> * {{FtpFileObject.getLinkDestination()}} – resolving symlink targets
> * {{SftpFileObject.doListChildrenResolved()}} – resolving children during
> SFTP listing
> Previously, these methods called {{{}fileSystem.resolveFile(){}}}, which
> triggered an ON_RESOLVE refresh on each returned file. After this fix,
> internally navigated files are served from the cache and only refreshed when
> the user explicitly calls {{refresh()}} or re-resolves them via the public
> API.
> When a directory is refreshed and its children are listed, the FTP and SFTP
> providers propagate fresh metadata from the listing to cached child objects
> in-place (updating {{{}ftpFile{}}}/{{{}attrs{}}} and clearing the cached
> {{{}type{}}}). Object identity is preserved – the same FileObject instances
> are returned with updated data.
> This behavior is consistent with {{CacheStrategy.MANUAL}} and aligns with the
> intended ON_RESOLVE contract: refresh on explicit user resolution, not on
> internal navigation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)