[ 
https://issues.apache.org/jira/browse/COMPRESS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092597#comment-18092597
 ] 

Marcono1234 commented on COMPRESS-727:
--------------------------------------

Implementing {{ALLOW_WITHIN_ROOT}} in a way which fully rejects any symlinks 
which escape the root might be pretty difficult or impossible. Not sure how 
other libraries are doing it, but it seems some of them just simply do not 
reject this at all, they just disallow writing when the path is or includes a 
symlink (similar to how it is proposed in the description above already).

The main problem is that {{'..'}} in a path seems to be resolved relative to 
any previous symlink components, and not on the plain path (as {{normalize()}} 
would do). So you could have two symlinks where one points to the other. In 
isolation neither of them escapes the root, but when combined (e.g. when after 
extraction reading from one of them), they escape the root. The symlinks 
entries in the archive might even be ordered in a way that the escaping entry 
is created first, pointing to the not yet existing second entry (which would 
pass validation of most extractors I assume), and afterwards the second symlink 
is created.

For example in the context of the above mentioned PR 
https://github.com/apache/commons-compress/pull/776, the following can escape 
the root:
{code:java}
private static String repeatPath(String element, int n) {
    return element + ("/" + element).repeat(n);
}

private static void createZip(Path zipPath) throws Exception {
    try (ZipArchiveOutputStream zipStream = new 
ZipArchiveOutputStream(zipPath)) {
        // Repeat `..` very often to eventually reach the root
        int nesting = 40;
        String nestedName = repeatPath("a", nesting);

        // First create a symlink whose target seems to still be within the 
ZIP: `a/a/a/../../../etc/passwd`
        // The Zip Slip check allows this
        var entry = new ZipArchiveEntry("malicious-entry");
        entry.setUnixMode(UnixStat.LINK_FLAG);
        zipStream.putArchiveEntry(entry);
        zipStream.write((nestedName + "/..".repeat(nesting) + 
"/etc/passwd").getBytes(StandardCharsets.UTF_8));
        zipStream.closeArchiveEntry();

        // Now actually create the ZIP entry `a/a/a` which is itself a symlink 
as well: `a/a/a` -> `../..`
        // The Zip Slip check allows this because *this* symlink on its own 
still points within the ZIP
        entry = new ZipArchiveEntry(nestedName);
        entry.setUnixMode(UnixStat.LINK_FLAG);
        zipStream.putArchiveEntry(entry);
        zipStream.write(repeatPath("..", nesting - 
1).getBytes(StandardCharsets.UTF_8));
        zipStream.closeArchiveEntry();

        // ! However, the combination of both symlinks now actually performs 
Zip Slip:
        // `malicious-entry` -> `(a/a/a)/../../../etc/passwd` -> [resolving the 
symlink for (a/a/a)] -> `(../..)/../../../etc/passwd` = `/etc/passwd`
    }
}

public static void main(String[] args) throws Exception {
    Path zipPath = Path.of("indirect-symlink.zip");

    createZip(zipPath);

    Path outputDir = Path.of("indirect-symlink");
    if (Files.exists(outputDir)) {
        System.err.println("Have to clean up output directory from previous 
run: " + outputDir);
        System.exit(1);
    }
    Files.createDirectory(outputDir);
    Extractor extractor = Extractor.newExtractor(outputDir);
    extractor.setSymlinkPolicy(SymlinkPolicy.ALLOW_WITHIN_ROOT);

    try (ZipFile zipFile = ZipFile.builder().setPath(zipPath).get()) {
        extractor.extract(zipFile);
    }
}
{code}

{code}
$ cat indirect-symlink/malicious-entry
root:x:0:0:root:/root:/bin/bash
...
{code}

----

To clarify, I don't want to discredit the thoughts and work put into this issue 
here and the PR. What is proposed here still seems to be way better than what 
many extraction libraries do. But maybe for {{ALLOW_WITHIN_ROOT}} it would have 
to be documented that this is best-effort and can probably not prevent all 
kinds of escapes.

> Provide a symlink-resistant (and concurrency-safe) archive Extractor
> --------------------------------------------------------------------
>
>                 Key: COMPRESS-727
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-727
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Piotr Karwasz
>            Priority: Major
>         Attachments: COMPRESS-727.patch
>
>
> h2. Summary
> {{ArchiveEntry.resolveIn(Path)}} protects against classic Zip-slip by 
> resolving the entry name against the target directory, normalizing it, and 
> rejecting the result if it no longer {{startsWith}} the target. This check is 
> purely *lexical*: it manipulates path strings only. It does not look at the 
> file system, so it does not defend against *symlink* attacks, and the example 
> {{Expander}} that drives extraction does not create symlinks at all (so it is 
> both lossy for legitimate archives and silent about the risk).
> This issue proposes a first-class, security-first {{Extractor}} that safely 
> materializes a whole archive into a directory, including symlinks, and 
> resists both symlink-slip from the archive itself and concurrent symlink 
> races from third parties. The API is centered on 
> {{Extractor.newExtractor(Path)}}, which returns a {{SecureExtractor}} when 
> the platform supports {{java.nio.file.SecureDirectoryStream}}, or a 
> best-effort {{Extractor}} otherwise.
> h2. Background: what resolveIn does and does not cover
> {code:java}
> default Path resolveIn(final Path parentPath) throws IOException {
>     final Path outputFile = parentPath.resolve(getName()).normalize();
>     if (!outputFile.startsWith(parentPath)) {
>         throw new ArchiveException("Zip slip ...");
>     }
>     return outputFile;
> }
> {code}
> This stops {{../../etc/passwd}} style names. It does *not* stop:
> * A symlink *entry inside the archive* (for example {{link -> /etc}} or 
> {{link -> ../..}}) followed by a later entry written *through* that link 
> ({{link/evil}}). The lexical check on {{target/link/evil}} passes because the 
> string starts with {{target}}, but {{link}} is a symlink and the write 
> escapes. This is "symlink-slip", and it defeats the lexical guard even when 
> the target directory is fully trusted.
> * A symlink (or directory-to-symlink swap) planted *concurrently* by another 
> process between the {{resolveIn}} check and the actual 
> {{Files.newOutputStream}} call. The check and the write are not atomic with 
> respect to the intermediate path components, so this is a classic TOCTOU race.
> The metadata needed to handle links is already exposed:
> * {{TarArchiveEntry}}: {{isSymbolicLink()}}, {{isLink()}} (hard link), 
> {{getLinkName()}}, plus {{isBlockDevice()}} / {{isCharacterDevice()}} / 
> {{isFIFO()}} for special types.
> * {{ZipArchiveEntry}}: {{isUnixSymlink()}}, with 
> {{ZipFile.getUnixSymlink(entry)}} returning the link target.
> There is currently no use of {{SecureDirectoryStream}}, 
> {{LinkOption.NOFOLLOW_LINKS}}, or {{Files.createSymbolicLink}} anywhere in 
> main code, so this is greenfield.
> h2. Threat model
> # *Zip-slip* (path traversal via {{..}} or absolute names). Already handled 
> by {{resolveIn}}.
> # *Symlink-slip from the archive* (trusted target directory). The archive 
> plants a symlink and then writes through it, or supplies a symlink whose 
> target points outside the extraction root. Must be handled by an extractor 
> that controls symlink creation and never follows a link when materializing a 
> child path.
> # *Concurrent symlink race* (untrusted / shared target directory). A third 
> party with write access to the target tree swaps a path component for a 
> symlink during extraction. Defeating this requires file-descriptor-relative, 
> one-component-at-a-time resolution with no-follow semantics, that is, 
> {{SecureDirectoryStream}}.
> h2. Real-life examples of untrusted / shared target directories
> These are settings where threat 3 is realistic because another user or 
> process can write into the extraction tree while extraction runs:
> * *Shared CI / build agents*: multiple jobs extract into a common workspace 
> or temp directory on the same host; one job races another.
> * *Multi-user package-manager caches*: extracting into group- or 
> world-writable shared caches (Maven {{~/.m2}}, Gradle, pip, npm) on a shared 
> build box.
> * *Web upload processing*: an upload is extracted into a directory that other 
> worker processes or requests (possibly other tenants) can also write.
> * *Container image / layer extraction*: a layer's symlink redirects a later 
> layer's write. Container runtimes specifically defend this (for example 
> Docker/containerd use chroot-style joins, and Go's {{filepath-securejoin}} 
> exists for exactly this).
> * *Background daemons*: antivirus, indexers, or backup agents racing the 
> extractor over the same tree.
> Prior art that hardened against the same classes of bug:
> * The original Zip-slip disclosure (2018) prompted the lexical fix that 
> {{resolveIn}} implements.
> * Python's {{tarfile}} extraction filters (PEP 706, shipped in 3.12, 
> hardening the long-standing CVE-2007-4559) added {{data}} / {{tar}} / 
> {{fully_trusted}} filters that, among other things, reject symlinks and links 
> escaping the destination.
> * GNU tar performs "secure" extraction by removing an existing symlink before 
> writing through its name and by deferring symlink creation.
> * Container runtimes rely on {{filepath-securejoin}} (Go) to resolve each 
> component safely against symlink races.
> h2. Proposed API
> A supported {{Extractor}} type (promoted out of {{archivers.examples}} into, 
> for example, {{org.apache.commons.compress.archivers}}), with a 
> capability-detecting factory:
> {code:java}
> // Picks the strongest implementation the platform/filesystem allows.
> public static Extractor newExtractor(Path targetDirectory) throws IOException;
> {code}
> * On platforms where {{Files.newDirectoryStream(targetDirectory) instanceof 
> SecureDirectoryStream}} (Unix-like systems, see 
> {{sun.nio.fs.UnixSecureDirectoryStream}}), {{newExtractor}} returns a 
> *{{SecureExtractor}}* that is resistant to both symlink-slip and concurrent 
> symlink races.
> * Otherwise (notably Windows, or filesystems whose provider does not return a 
> secure stream) it returns a base {{Extractor}} that defends symlink-slip from 
> the archive on a best-effort basis but documents that it cannot fully close 
> the TOCTOU race.
> The {{Extractor}} mirrors the entry points of today's {{Expander}} 
> ({{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, {{SevenZFile}}) and adds 
> configuration:
> {code:java}
> Extractor extractor = Extractor.newExtractor(targetDir)
>         .setSymlinkPolicy(SymlinkPolicy.ALLOW_WITHIN_ROOT) // default REJECT
>         .setSpecialFilePolicy(SpecialFilePolicy.SKIP)      // devices, FIFOs
>         .setOverwrite(false);
> extractor.extract(archiveInputStream);
> {code}
> h3. SecureExtractor design
> Walk each entry's parent path *component by component* starting from an open 
> directory handle on the (validated) extraction root:
> * For each intermediate component, open the subdirectory via 
> {{SecureDirectoryStream.newDirectoryStream(name, 
> LinkOption.NOFOLLOW_LINKS)}}. If a component is a symlink, refuse (it cannot 
> be a legitimate parent under no-follow).
> * Create files relative to the innermost directory handle with no-follow / 
> create-new semantics ({{newByteChannel}} with {{NOFOLLOW_LINKS}} + 
> {{CREATE_NEW}}), so a concurrently planted symlink cannot redirect the write.
> * Create directories and symlinks relative to the directory handle, never via 
> an absolute resolved path.
> Because every step is relative to a file descriptor rather than a re-resolved 
> string path, a third party cannot win the race by swapping a parent component.
> h3. Symlink policy
> ||Policy||Behavior||
> |{{REJECT}} (default)|Fail if the archive contains any symlink entry.|
> |{{SKIP}}|Silently ignore symlink entries.|
> |{{ALLOW_WITHIN_ROOT}}|Create a symlink only if its target resolves to a 
> location inside the extraction root; reject links escaping the root. Never 
> follow the new link when writing later entries.|
> |{{ALLOW_ALL}}|Create symlinks verbatim (only for fully trusted archives).|
> h3. Entry-type handling
> ||Entry type||Default action||
> |Regular file|Extract (no-follow, create-new unless overwrite enabled).|
> |Directory|Create (no-follow).|
> |Symbolic link|Per {{SymlinkPolicy}} (default {{REJECT}}).|
> |Hard link (tar {{isLink()}})|Resolve link target within root, no-follow; 
> reject if it escapes.|
> |Device / FIFO / other special|Per {{SpecialFilePolicy}} (default {{SKIP}}; 
> never created by the secure path).|
> h2. Platform support and degradation
> * {{SecureDirectoryStream}} is available on Unix-like platforms via the 
> default file system provider; {{newExtractor}} detects it with an 
> {{instanceof}} check on a directory stream opened over the target.
> * On platforms without it, the base {{Extractor}} still validates each parent 
> component with {{LinkOption.NOFOLLOW_LINKS}} (via {{Files.readAttributes}}) 
> and refuses to traverse existing symlinks, which closes symlink-slip from the 
> archive but remains non-atomic against a concurrent attacker. This residual 
> limitation must be documented clearly.
> h2. Acceptance criteria
> * New {{Extractor}} (supported, not under {{examples}}) with 
> {{newExtractor(Path)}} returning {{SecureExtractor}} when 
> {{SecureDirectoryStream}} is available, else a best-effort {{Extractor}}.
> * Configurable {{SymlinkPolicy}} (default {{REJECT}}) and special-file 
> handling; safe hard-link resolution.
> * {{SecureExtractor}} performs descriptor-relative, component-by-component, 
> no-follow resolution and create-new writes.
> * Entry points covering {{ArchiveInputStream}}, {{ZipFile}}, {{TarFile}}, and 
> {{SevenZFile}}, matching the existing {{Expander}} surface.
> * Tests: symlink-slip archive (link then write-through) is blocked under a 
> trusted target; an escaping symlink target is rejected; a concurrent 
> directory-to-symlink swap is defeated by {{SecureExtractor}} (and documented 
> as not fully defeated by the base extractor); legitimate within-root symlinks 
> extract correctly when allowed.
> * Javadoc plus a "safe extraction" section in the user guide, 
> cross-referencing {{ArchiveEntry.resolveIn}} and the platform caveats.
> * Consider deprecating / redirecting {{Expander}} toward {{Extractor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to