[ https://issues.apache.org/jira/browse/JCLOUDS-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860666#comment-17860666 ]
Jacob Nguyen edited comment on JCLOUDS-1638 at 6/28/24 12:07 AM:
-----------------------------------------------------------------

Extra thing to note: newer versions of the S3 SDK now use encoding-type = url by default. Not sure whether JClouds should do the same thing.
https://github.com/aws/aws-sdk-java/issues/333#issuecomment-213096411


was (Author: JIRAUSER306001):
Extra thing to note: newer versions of the S3 SDK now use encoding-type = url by default. Not sure whether JClouds should do the same thing.
https://github.com/aws/aws-sdk-java/issues/460#issuecomment-240296956

> SAXParseException on S3 Listing
> -------------------------------
>
>                 Key: JCLOUDS-1638
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1638
>             Project: jclouds
>          Issue Type: Bug
>   Affects Versions: 2.5.0, 2.6.0
>            Reporter: Jacob Nguyen
>            Assignee: Andrew Gaul
>            Priority: Major
>
> {noformat}
> java.lang.RuntimeException: request: GET
> https://cloudsync-performance-tests.s3.amazonaws.com/?delimiter=/&prefix=some/&max-keys=1000
> HTTP/1.1; response: HTTP/1.1 200 OK; cause: java.lang.RuntimeException:
> request: GET
> https://cloudsync-performance-tests.s3.amazonaws.com/?delimiter=/&prefix=some/&max-keys=1000
> HTTP/1.1; error at 586:2 in document ; cause: org.xml.sax.SAXParseException;
> lineNumber: 2; columnNumber: 586; Character reference "&#x10" is an invalid
> XML character.
>       at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:174)
>       at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:146)
>       at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:86)
>       at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:52)
>       at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:91)
>       at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:74)
>       at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:45)
>       at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156)
>       at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
>       at jdk.proxy2/jdk.proxy2.$Proxy235.listBucket(Unknown Source)
>       at org.jclouds.s3.blobstore.S3BlobStore.list(S3BlobStore.java:177)
> {noformat}
> When there's a control character in the folder path in S3, we can't parse it
> from the response because it throws SAXParseException.
> Can there be an option that at least lets us forward the encoding-type param?
> https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestSyntax
> And url decode it for us so that listing can be possible? This bug currently
> doesn't allow us to list any children of a root folder if one of the children
> contains control characters.
> Here's an example XML response from S3 when listing objects from cURL:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test&#x10;/</Prefix></CommonPrefixes></ListBucketResult>
> {noformat}
> Child folder of 'some' contains
> {noformat}
> <Prefix>some/test&#x10;/</Prefix>
> {noformat}
> which can't be parsed.
> But with the urlParam &encoding-type=url :
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><EncodingType>url</EncodingType><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test%10/</Prefix></CommonPrefixes></ListBucketResult>
> {noformat}
> {noformat}
> <Prefix>some/test%10/</Prefix>
> {noformat}
> Can probably be parsed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
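
For illustration only, here is a minimal Java sketch of the behaviour described in the issue, using just the JDK's SAX parser and java.net.URLDecoder. It is not jclouds code: the class and method names (EncodingTypeSketch, prefixText, parses, decodeKey) are hypothetical, the XML is a trimmed-down version of the responses quoted above, and it assumes the unparseable response carried the control character as the character reference &#x10; (which the %10 in the URL-encoded response suggests).

```java
import java.io.StringReader;
import java.net.URLDecoder;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingTypeSketch {

    // Parses the XML with the JDK SAX parser and returns the concatenated
    // text of every <Prefix> element. Throws SAXParseException if the
    // document is not well-formed XML 1.0.
    static String prefixText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPrefix;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                inPrefix = "Prefix".equals(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inPrefix = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inPrefix) {
                    text.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    // Returns false when the SAX parser rejects the document.
    static boolean parses(String xml) throws Exception {
        try {
            prefixText(xml);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    // URL-decodes a key returned under encoding-type=url.
    static String decodeKey(String encoded) throws Exception {
        return URLDecoder.decode(encoded, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Assumed shape of the response without encoding-type=url.
        String raw = "<ListBucketResult><CommonPrefixes>"
                + "<Prefix>some/test&#x10;/</Prefix>"
                + "</CommonPrefixes></ListBucketResult>";
        // Trimmed shape of the response with encoding-type=url.
        String encoded = "<ListBucketResult><EncodingType>url</EncodingType><CommonPrefixes>"
                + "<Prefix>some/test%10/</Prefix>"
                + "</CommonPrefixes></ListBucketResult>";

        System.out.println("raw response parses: " + parses(raw));
        System.out.println("encoded response parses: " + parses(encoded));

        String key = decodeKey(prefixText(encoded));
        System.out.println("code point after 'some/test': " + (int) key.charAt(9));
    }
}
```

One caveat with this approach: URLDecoder follows form-encoding rules and turns '+' into a space, so any real implementation would need to check how S3 encodes spaces and literal '+' characters in keys before relying on it.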