[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679526#comment-17679526 ]

ASF subversion and git services commented on JCLOUDS-1488:
----------------------------------------------------------

Commit e478dd5452d70a5ea2082337b05ad91f331f0eb6 in jclouds's branch refs/heads/master from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds.git;h=e478dd5452 ]

JCLOUDS-1371: JCLOUDS-1488: optimize fs prefix

This reduces the number of stat calls required when prefix is deep in
the filesystem hierarchy. Further optimizations to delimiter are
possible. References gaul/s3proxy#473.

> Filesystem list call with prefix is slow in large containers
>
>                 Key: JCLOUDS-1488
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1488
>             Project: jclouds
>          Issue Type: Bug
>          Components: jclouds-blobstore
>    Affects Versions: 2.1.1
>         Environment: Java version: java version "1.8.0_131"
>                      Operating system: Fedora 27 x86_64
>            Reporter: Lari Sinisalo
>            Assignee: Andrew Gaul
>            Priority: Major
>              Labels: filesystem
>             Fix For: 2.2.0, 2.1.2
>
>         Attachments: JCLOUDS1488.java
>
>
> When the filesystem blobstore is used, running the following code takes very
> long if there are a lot of files in the container:
> {code:java}
> ListContainerOptions options = new ListContainerOptions();
> options.prefix("test-container-subdirectory/");
> Set results = blobStore.list("test-container",options);
> {code}
> See the attached Java source file [^JCLOUDS1488.java] for the full code.
> On my system, running the attached Java code takes over 10 seconds to list a
> single file if there are 500,000 files in the container outside that prefix.
> Output from the attached code:
> {code:java}
> Number of blobs listed: 1
> First listed blob: test-container-subdirectory/file-to-list
> Time it took to list the blobs: 13256 ms
> {code}
> A more general version of this problem was reported previously in
> JCLOUDS-1371.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
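The commit message above describes starting the listing closer to the prefix so that fewer files need to be examined. The following is only a rough sketch of that idea, not the code from the commit: the class `PrefixWalkSketch` and its methods are hypothetical, and it assumes that blob keys map directly to '/'-separated paths under the container root. It walks only the subtree named by the directory part of the prefix and then filters keys by the full prefix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration; none of these names exist in jclouds. */
final class PrefixWalkSketch {
    private PrefixWalkSketch() {}

    /** Lists keys under containerRoot that start with prefix, walking only the relevant subtree. */
    static List<String> listKeys(Path containerRoot, String prefix) {
        Path root = containerRoot.toAbsolutePath().normalize();
        // The text before the last '/' names a directory; the rest is a partial name.
        int lastSlash = prefix.lastIndexOf('/');
        Path start = root.resolve(lastSlash < 0 ? "" : prefix.substring(0, lastSlash)).normalize();
        // Never leave the container, and return nothing if the directory is absent.
        if (!start.startsWith(root) || !Files.isDirectory(start)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(start)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(path -> toKey(root, path))
                    .filter(key -> key.startsWith(prefix))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Converts a file path to a '/'-separated key relative to the container root. */
    private static String toKey(Path root, Path file) {
        StringBuilder key = new StringBuilder();
        for (Path element : root.relativize(file)) {
            if (key.length() > 0) {
                key.append('/');
            }
            key.append(element.toString());
        }
        return key.toString();
    }

    /** Builds a small temporary container and lists one prefix; for demonstration only. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("test-container");
            Files.createDirectories(root.resolve("test-container-subdirectory"));
            Files.createFile(root.resolve("test-container-subdirectory").resolve("file-to-list"));
            for (int i = 0; i < 100; i++) {
                Files.createFile(root.resolve("other-" + i)); // files outside the prefix
            }
            return listKeys(root, "test-container-subdirectory/");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the 100 files outside the prefix are never visited, because the walk begins at `test-container-subdirectory` rather than at the container root.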
[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755717#comment-16755717 ]

ASF subversion and git services commented on JCLOUDS-1488:
----------------------------------------------------------

Commit 30b2ee9016a9f296a7e7ff5e219972f32db385dd in jclouds-labs's branch refs/heads/2.1.x from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds-labs.git;h=30b2ee9 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered in
LocalBlobStore. Now getBlobKeysInsideContainer filters via prefix which
can dramatically decrease the number of keys returned, especially for
the filesystem provider. Further optimizations are possible for
delimiter.
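The commit message above moves prefix filtering out of the caller and into the component that enumerates keys. As a minimal sketch of why that can help, assume an in-memory index that keeps keys sorted; `SortedKeyIndex` and its methods are hypothetical and are not the jclouds API. Because the index knows the prefix, it can seek to the first candidate and stop at the first non-match, where a caller-side filter would have to examine every key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical in-memory key index; not part of jclouds. */
final class SortedKeyIndex {
    private final NavigableSet<String> keys = new TreeSet<>();

    void add(String key) {
        keys.add(key);
    }

    /** Returns the keys that start with prefix, in sorted order. */
    List<String> keysWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet skips every key that sorts before the prefix.
        for (String key : keys.tailSet(prefix, true)) {
            // Keys sharing a prefix are contiguous in sorted order,
            // so the first non-match ends the scan.
            if (!key.startsWith(prefix)) {
                break;
            }
            matches.add(key);
        }
        return matches;
    }
}
```

The filesystem provider cannot seek in this way, but the same principle applies: the earlier the prefix is known, the less work is spent on keys that will be discarded.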
[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755715#comment-16755715 ]

ASF subversion and git services commented on JCLOUDS-1488:
----------------------------------------------------------

Commit aad98e6f9660fd7a4dd30fa72a2c16a41e1d8584 in jclouds-labs's branch refs/heads/master from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds-labs.git;h=aad98e6 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered in
LocalBlobStore. Now getBlobKeysInsideContainer filters via prefix which
can dramatically decrease the number of keys returned, especially for
the filesystem provider. Further optimizations are possible for
delimiter.
[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755566#comment-16755566 ]

ASF subversion and git services commented on JCLOUDS-1488:
----------------------------------------------------------

Commit 7bf9c474c656926203a9ac34a1ed27db35c8515d in jclouds's branch refs/heads/2.1.x from Andrew Gaul
[ https://gitbox.apache.org/repos/asf?p=jclouds.git;h=7bf9c47 ]

JCLOUDS-1371: JCLOUDS-1488: list optimize prefix

Previously getBlobKeysInsideContainer returned all keys and filtered in
LocalBlobStore. Now getBlobKeysInsideContainer filters via prefix which
can dramatically decrease the number of keys returned, especially for
the filesystem provider. Further optimizations are possible for
delimiter.
[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746537#comment-16746537 ]

Andrew Gaul commented on JCLOUDS-1488:
--------------------------------------

I agree with your diagnosis and this is a long-standing shortcoming of
the filesystem provider. Could you submit a pull request with your
proposed solution? See also
[JCLOUDS-1371|https://issues.apache.org/jira/browse/JCLOUDS-1371].
[jira] [Commented] (JCLOUDS-1488) Filesystem list call with prefix is slow in large containers
[ https://issues.apache.org/jira/browse/JCLOUDS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746126#comment-16746126 ]

Lari Sinisalo commented on JCLOUDS-1488:
----------------------------------------

In org.jclouds.blobstore.config.LocalBlobStore.list(String, ListContainerOptions), there is the following code:
{code}
      // Loading blobs from container
      Iterable<String> blobBelongingToContainer = null;
      try {
         blobBelongingToContainer = storageStrategy.getBlobKeysInsideContainer(containerName);
      } catch (IOException e) {
         logger.error(e, "An error occurred loading blobs contained into container %s", containerName);
         propagate(e);
      }
{code}
This getBlobKeysInsideContainer call lists the keys of all blobs inside the container. It takes only the container name as a parameter, so it always ignores the prefix in the ListContainerOptions.

The getBlobKeysInsideContainer implementation in FilesystemStorageStrategyImpl is as follows:
{code}
   /**
    * Returns all the blobs key inside a container
    *
    * @param container
    * @return
    * @throws IOException
    */
   @Override
   public Iterable<String> getBlobKeysInsideContainer(String container) throws IOException {
      filesystemContainerNameValidator.validate(container);
      // check if container exists
      // TODO maybe an error is more appropriate
      Set<String> blobNames = Sets.newHashSet();
      if (!containerExists(container)) {
         return blobNames;
      }

      File containerFile = openFolder(container);
      final int containerPathLength = containerFile.getAbsolutePath().length() + 1;
      populateBlobKeysInContainer(containerFile, blobNames, new Function<String, String>() {
         @Override
         public String apply(String string) {
            return denormalize(string.substring(containerPathLength));
         }
      });
      return blobNames;
   }
{code}
The openFolder call here opens the container root directory. It seems that if this call received a subdirectory path instead, the list call would be much more efficient.

I am not quite sure what the appropriate way to extract the subdirectory path from the prefix would be. This would need to be done in a way that does not allow path traversal outside the container root directory. Passing the necessary information to getBlobKeysInsideContainer would also require interface changes.
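One possible way to derive a subdirectory from the prefix while guarding against path traversal is sketched below. This is an illustration under stated assumptions, not the approach jclouds adopted: `PrefixDirectoryResolver` and `startDirectory` are hypothetical names, and the sketch assumes keys use '/' as the separator and map directly to paths under the container root. It resolves the directory part of the prefix against the root, normalizes the result, and rejects anything that no longer lies under the root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of jclouds. */
final class PrefixDirectoryResolver {
    private PrefixDirectoryResolver() {}

    /**
     * Returns the deepest directory under containerRoot that must contain
     * every blob whose key starts with prefix.
     *
     * @throws IllegalArgumentException if the prefix points outside the root
     */
    static Path startDirectory(Path containerRoot, String prefix) {
        Path root = containerRoot.toAbsolutePath().normalize();
        // Only the text before the last '/' names a directory;
        // the remainder is a partial file or directory name.
        int lastSlash = prefix.lastIndexOf('/');
        String directoryPart = lastSlash < 0 ? "" : prefix.substring(0, lastSlash);
        // normalize() collapses "." and ".." so the check below sees the real target.
        Path candidate = root.resolve(directoryPart).normalize();
        // Path.startsWith compares whole name elements, so "/data/c2" is not under "/data/c".
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("prefix escapes container root: " + prefix);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/test-container");
        System.out.println(startDirectory(root, "test-container-subdirectory/"));
        System.out.println(startDirectory(root, "file-pre"));
    }
}
```

A prefix such as `../other/` is rejected here instead of being silently clamped to the root, on the assumption that a failed request is easier to diagnose than an unexpectedly empty listing. The check is purely lexical and does not follow symbolic links, so a real implementation would need to decide how to treat them.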