[ 
https://issues.apache.org/jira/browse/COMPRESS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092162#comment-18092162
 ] 

Piotr Karwasz commented on COMPRESS-727:
----------------------------------------

Hi [~tomasilluminati],

Thank you for the patch. Could you submit it as a PR?

See [https://github.com/apache/commons-compress/blob/master/CONTRIBUTING.md] 
for the details.

> Provide a symlink-resistant (and concurrency-safe) archive Extractor
> --------------------------------------------------------------------
>
>                 Key: COMPRESS-727
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-727
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Piotr Karwasz
>            Priority: Major
>         Attachments: COMPRESS-727.patch
>
>
> h2. Summary
> {{ArchiveEntry.resolveIn(Path)}} protects against classic Zip-slip by 
> resolving the entry name against the target directory, normalizing it, and 
> rejecting the result if it no longer {{startsWith}} the target. This check is 
> purely *lexical*: it manipulates path strings only. It does not look at the 
> file system, so it does not defend against *symlink* attacks, and the example 
> {{Expander}} that drives extraction does not create symlinks at all (so it is 
> both lossy for legitimate archives and silent about the risk).
> This issue proposes a first-class, security-first {{Extractor}} that safely 
> materializes a whole archive into a directory, including symlinks, and 
> resists both symlink-slip from the archive itself and concurrent symlink 
> races from third parties. The API is centered on 
> {{Extractor.newExtractor(Path)}}, which returns a {{SecureExtractor}} when 
> the platform supports {{java.nio.file.SecureDirectoryStream}}, or a 
> best-effort {{Extractor}} otherwise.
> h2. Background: what resolveIn does and does not cover
> {code:java}
> default Path resolveIn(final Path parentPath) throws IOException {
>     final Path outputFile = parentPath.resolve(getName()).normalize();
>     if (!outputFile.startsWith(parentPath)) {
>         throw new ArchiveException("Zip slip ...");
>     }
>     return outputFile;
> }
> {code}
> This stops {{../../etc/passwd}} style names. It does *not* stop:
> * A symlink *entry inside the archive* (for example {{link -> /etc}} or 
> {{link -> ../..}}) followed by a later entry written *through* that link 
> ({{link/evil}}). The lexical check on {{target/link/evil}} passes because the 
> string starts with {{target}}, but {{link}} is a symlink and the write 
> escapes. This is "symlink-slip", and it defeats the lexical guard even when 
> the target directory is fully trusted.
> * A symlink (or directory-to-symlink swap) planted *concurrently* by another 
> process between the {{resolveIn}} check and the actual 
> {{Files.newOutputStream}} call. The check and the write are not atomic with 
> respect to the intermediate path components, so this is a classic TOCTOU race.
> The metadata needed to handle links is already exposed:
> * {{TarArchiveEntry}}: {{isSymbolicLink()}}, {{isLink()}} (hard link), 
> {{getLinkName()}}, plus {{isBlockDevice()}} / {{isCharacterDevice()}} / 
> {{isFIFO()}} for special types.
> * {{ZipArchiveEntry}}: {{isUnixSymlink()}}, with 
> {{ZipFile.getUnixSymlink(entry)}} returning the link target.
> There is currently no use of {{SecureDirectoryStream}}, 
> {{LinkOption.NOFOLLOW_LINKS}}, or {{Files.createSymbolicLink}} anywhere in 
> main code, so this is greenfield.
> h2. Threat model
> # *Zip-slip* (path traversal via {{..}} or absolute names). Already handled 
> by {{resolveIn}}.
> # *Symlink-slip from the archive* (trusted target directory). The archive 
> plants a symlink and then writes through it, or supplies a symlink whose 
> target points outside the extraction root. Must be handled by an extractor 
> that controls symlink creation and never follows a link when materializing a 
> child path.
> # *Concurrent symlink race* (untrusted / shared target directory). A third 
> party with write access to the target tree swaps a path component for a 
> symlink during extraction. Defeating this requires file-descriptor-relative, 
> one-component-at-a-time resolution with no-follow semantics, that is, 
> {{SecureDirectoryStream}}.
> h2. Real-life examples of untrusted / shared target directories
> These are settings where threat 3 is realistic because another user or 
> process can write into the extraction tree while extraction runs:
> * *Shared CI / build agents*: multiple jobs extract into a common workspace 
> or temp directory on the same host; one job races another.
> * *Multi-user package-manager caches*: extracting into group- or 
> world-writable shared caches (Maven {{~/.m2}}, Gradle, pip, npm) on a shared 
> build box.
> * *Web upload processing*: an upload is extracted into a directory that other 
> worker processes or requests (possibly other tenants) can also write.
> * *Container image / layer extraction*: a layer's symlink redirects a later 
> layer's write. Container runtimes specifically defend this (for example 
> Docker/containerd use chroot-style joins, and Go's {{filepath-securejoin}} 
> exists for exactly this).
> * *Background daemons*: antivirus, indexers, or backup agents racing the 
> extractor over the same tree.
> Prior art that hardened against the same classes of bug:
> * The original Zip-slip disclosure (2018) prompted the lexical fix that 
> {{resolveIn}} implements.
> * Python's {{tarfile}} extraction filters (PEP 706, shipped in 3.12, 
> hardening the long-standing CVE-2007-4559) added {{data}} / {{tar}} / 
> {{fully_trusted}} filters that, among other things, reject symlinks and links 
> escaping the destination.
> * GNU tar performs "secure" extraction by removing an existing symlink before 
> writing through its name and by deferring symlink creation.
> * Container runtimes rely on {{filepath-securejoin}} (Go) to resolve each 
> component safely against symlink races.
> h2. Proposed API
> A supported {{Extractor}} type (promoted out of {{archivers.examples}} into, 
> for example, {{org.apache.commons.compress.archivers}}), with a 
> capability-detecting factory:
> {code:java}
> // Picks the strongest implementation the platform/filesystem allows.
> public static Extractor newExtractor(Path targetDirectory) throws IOException;
> {code}
> * On platforms where {{Files.newDirectoryStream(targetDirectory) instanceof 
> SecureDirectoryStream}} (Unix-like systems, see 
> {{sun.nio.fs.UnixSecureDirectoryStream}}), {{newExtractor}} returns a 
> *{{SecureExtractor}}* that is resistant to both symlink-slip and concurrent 
> symlink races.
> * Otherwise (notably Windows, or filesystems whose provider does not return a 
> secure stream) it returns a base {{Extractor}} that defends symlink-slip from 
> the archive on a best-effort basis but documents that it cannot fully close 
> the TOCTOU race.
> The {{Extractor}} mirrors the entry points of today's {{Expander}} 
> ({{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, {{SevenZFile}}) and adds 
> configuration:
> {code:java}
> Extractor extractor = Extractor.newExtractor(targetDir)
>         .setSymlinkPolicy(SymlinkPolicy.ALLOW_WITHIN_ROOT) // default REJECT
>         .setSpecialFilePolicy(SpecialFilePolicy.SKIP)      // devices, FIFOs
>         .setOverwrite(false);
> extractor.extract(archiveInputStream);
> {code}
> h3. SecureExtractor design
> Walk each entry's parent path *component by component* starting from an open 
> directory handle on the (validated) extraction root:
> * For each intermediate component, open the subdirectory via 
> {{SecureDirectoryStream.newDirectoryStream(name, 
> LinkOption.NOFOLLOW_LINKS)}}. If a component is a symlink, refuse (it cannot 
> be a legitimate parent under no-follow).
> * Create files relative to the innermost directory handle with no-follow / 
> create-new semantics ({{newByteChannel}} with {{NOFOLLOW_LINKS}} + 
> {{CREATE_NEW}}), so a concurrently planted symlink cannot redirect the write.
> * Create directories and symlinks relative to the directory handle, never via 
> an absolute resolved path.
> Because every step is relative to a file descriptor rather than a re-resolved 
> string path, a third party cannot win the race by swapping a parent component.
> h3. Symlink policy
> ||Policy||Behavior||
> |{{REJECT}} (default)|Fail if the archive contains any symlink entry.|
> |{{SKIP}}|Silently ignore symlink entries.|
> |{{ALLOW_WITHIN_ROOT}}|Create a symlink only if its target resolves to a 
> location inside the extraction root; reject links escaping the root. Never 
> follow the new link when writing later entries.|
> |{{ALLOW_ALL}}|Create symlinks verbatim (only for fully trusted archives).|
> h3. Entry-type handling
> ||Entry type||Default action||
> |Regular file|Extract (no-follow, create-new unless overwrite enabled).|
> |Directory|Create (no-follow).|
> |Symbolic link|Per {{SymlinkPolicy}} (default {{REJECT}}).|
> |Hard link (tar {{isLink()}})|Resolve link target within root, no-follow; 
> reject if it escapes.|
> |Device / FIFO / other special|Per {{SpecialFilePolicy}} (default {{SKIP}}; 
> never created by the secure path).|
> h2. Platform support and degradation
> * {{SecureDirectoryStream}} is available on Unix-like platforms via the 
> default file system provider; {{newExtractor}} detects it with an 
> {{instanceof}} check on a directory stream opened over the target.
> * On platforms without it, the base {{Extractor}} still validates each parent 
> component with {{LinkOption.NOFOLLOW_LINKS}} (via {{Files.readAttributes}}) 
> and refuses to traverse existing symlinks, which closes symlink-slip from the 
> archive but remains non-atomic against a concurrent attacker. This residual 
> limitation must be documented clearly.
> h2. Acceptance criteria
> * New {{Extractor}} (supported, not under {{examples}}) with 
> {{newExtractor(Path)}} returning {{SecureExtractor}} when 
> {{SecureDirectoryStream}} is available, else a best-effort {{Extractor}}.
> * Configurable {{SymlinkPolicy}} (default {{REJECT}}) and special-file 
> handling; safe hard-link resolution.
> * {{SecureExtractor}} performs descriptor-relative, component-by-component, 
> no-follow resolution and create-new writes.
> * Entry points covering {{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, and 
> {{SevenZFile}}, matching the existing {{Expander}} surface.
> * Tests: symlink-slip archive (link then write-through) is blocked under a 
> trusted target; an escaping symlink target is rejected; a concurrent 
> directory-to-symlink swap is defeated by {{SecureExtractor}} (and documented 
> as not fully defeated by the base extractor); legitimate within-root symlinks 
> extract correctly when allowed.
> * Javadoc plus a "safe extraction" section in the user guide, 
> cross-referencing {{ArchiveEntry.resolveIn}} and the platform caveats.
> * Consider deprecating / redirecting {{Expander}} toward {{Extractor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to