[jira] [Created] (HADOOP-15221) Swift driver should not fail if JSONUtils reports UnrecognizedPropertyException
Chen He created HADOOP-15221: Summary: Swift driver should not fail if JSONUtils reports UnrecognizedPropertyException Key: HADOOP-15221 URL: https://issues.apache.org/jira/browse/HADOOP-15221 Project: Hadoop Common Issue Type: Improvement Components: fs/swift Reporter: Chen He Assignee: Chen He org.apache.hadoop.fs.swift.exceptions.SwiftJsonMarshallingException: org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field We know the system keeps evolving and new fields will be added. From a compatibility and robustness point of view, an extra field added to the JSON should be logged but should not lead to a failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
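For reference, a minimal sketch of how a Jackson 1.x mapper can be made tolerant of new fields; the wrapper below is illustrative, not the actual hadoop-openstack JSON utility, and note that FAIL_ON_UNKNOWN_PROPERTIES=false silently ignores extra fields (logging each one would need a DeserializationProblemHandler):
{code:java}
import java.io.IOException;

import org.codehaus.jackson.map.DeserializationConfig;
import org.codehaus.jackson.map.ObjectMapper;

// Illustrative wrapper only: shows the lenient-deserialization setting,
// not the real org.apache.hadoop.fs.swift JSON utility.
public class LenientJsonUtil {
  private static final ObjectMapper MAPPER = new ObjectMapper();
  static {
    // Do not throw UnrecognizedPropertyException when the Swift server
    // returns fields this client does not know about.
    MAPPER.configure(
        DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES, false);
  }

  public static <T> T toObject(String json, Class<T> clazz) throws IOException {
    return MAPPER.readValue(json, clazz);
  }
}
{code}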
[jira] [Updated] (HADOOP-14716) SwiftNativeFileSystem should not eat exceptions during rename
[ https://issues.apache.org/jira/browse/HADOOP-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-14716: - Attachment: HADOOP-14716-WIP.patch WIP patch. > SwiftNativeFileSystem should not eat exceptions during rename > -- > > Key: HADOOP-14716 > URL: https://issues.apache.org/jira/browse/HADOOP-14716 > Project: Hadoop Common > Issue Type: Bug > Components: tools >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Attachments: HADOOP-14716-WIP.patch > > > Currently, "rename" eats exceptions and returns "false" in > SwiftNativeFileSystem. It is not easy for users to find the root cause of a > rename failure. It should, at least, write out some logs instead of silently > eating these exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14716) SwiftNativeFileSystem should not eat exceptions during rename
[ https://issues.apache.org/jira/browse/HADOOP-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111387#comment-16111387 ] Chen He commented on HADOOP-14716: -- Thank you for the quick reply, [~steve_l]. IMHO, HADOOP-11452 is very helpful; in the meantime, I will come up with a patch. > SwiftNativeFileSystem should not eat exceptions during rename > -- > > Key: HADOOP-14716 > URL: https://issues.apache.org/jira/browse/HADOOP-14716 > Project: Hadoop Common > Issue Type: Bug > Components: tools >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Chen He >Assignee: Chen He >Priority: Minor > > > Currently, "rename" eats exceptions and returns "false" in > SwiftNativeFileSystem. It is not easy for users to find the root cause of a > rename failure. It should, at least, write out some logs instead of silently > eating these exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14716) SwiftNativeFileSystem should not eat exceptions during rename
Chen He created HADOOP-14716: Summary: SwiftNativeFileSystem should not eat exceptions during rename Key: HADOOP-14716 URL: https://issues.apache.org/jira/browse/HADOOP-14716 Project: Hadoop Common Issue Type: Bug Components: tools Affects Versions: 3.0.0-alpha3, 2.8.1 Reporter: Chen He Assignee: Chen He Priority: Minor Currently, "rename" eats exceptions and returns "false" in SwiftNativeFileSystem. It is not easy for users to find the root cause of a rename failure. It should, at least, write out some logs instead of silently eating these exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
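A hedged sketch of the direction such a patch could take: log the underlying failure before returning false, so the root cause is at least visible. The class and doRename() helper below are placeholders, not the actual SwiftNativeFileSystem code.
{code:java}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.Path;

// Illustrative only: demonstrates the log-then-return-false pattern.
class RenameLoggingSketch {
  private static final Log LOG = LogFactory.getLog(RenameLoggingSketch.class);

  public boolean rename(Path src, Path dst) throws IOException {
    try {
      doRename(src, dst);   // hypothetical helper performing the store-level rename
      return true;
    } catch (IOException e) {
      // Surface the root cause instead of silently swallowing it.
      LOG.warn("rename(" + src + ", " + dst + ") failed", e);
      return false;
    }
  }

  private void doRename(Path src, Path dst) throws IOException {
    // placeholder for the Swift COPY + DELETE sequence
  }
}
{code}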
[jira] [Created] (HADOOP-14641) hadoop-openstack driver reports input stream leaking
Chen He created HADOOP-14641: Summary: hadoop-openstack driver reports input stream leaking Key: HADOOP-14641 URL: https://issues.apache.org/jira/browse/HADOOP-14641 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.3 Reporter: Chen He [2017-07-07 14:51:07,052] ERROR Input stream is leaking handles by not being closed() properly: HttpInputStreamWithRelease working with https://url/logs released=false dataConsumed=false (org.apache.hadoop.fs.swift.snative.SwiftNativeInputStream:259) [2017-07-07 14:51:07,052] DEBUG Releasing connection to https://url/logs: finalize() (org.apache.hadoop.fs.swift.http.HttpInputStreamWithRelease:101) java.lang.Exception: stack at org.apache.hadoop.fs.swift.http.HttpInputStreamWithRelease.<init>(HttpInputStreamWithRelease.java:71) at org.apache.hadoop.fs.swift.http.SwiftRestClient$10.extractResult(SwiftRestClient.java:1523) at org.apache.hadoop.fs.swift.http.SwiftRestClient$10.extractResult(SwiftRestClient.java:1520) at org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1406) at org.apache.hadoop.fs.swift.http.SwiftRestClient.doGet(SwiftRestClient.java:1520) at org.apache.hadoop.fs.swift.http.SwiftRestClient.getData(SwiftRestClient.java:679) at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObject(SwiftNativeFileSystemStore.java:276) at org.apache.hadoop.fs.swift.snative.SwiftNativeInputStream.<init>(SwiftNativeInputStream.java:104) at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.open(SwiftNativeFileSystem.java:555) at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.open(SwiftNativeFileSystem.java:536) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) at com.oracle.kafka.connect.swift.SwiftStorage.exists(SwiftStorage.java:74) at io.confluent.connect.hdfs.DataWriter.createDir(DataWriter.java:371) at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:175) at com.oracle.kafka.connect.swift.SwiftSinkTask.start(SwiftSinkTask.java:78) at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:231) at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:145) at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139) at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
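The warning above is emitted when an input stream obtained from FileSystem.open() is left for the garbage collector instead of being closed. A caller-side sketch of the fix (the existence-check shape loosely mirrors the SwiftStorage.exists() frame in the trace, but the code is only an assumed illustration):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CloseStreamSketch {
  // try-with-resources guarantees close(), so HttpInputStreamWithRelease
  // never has to release the HTTP connection from finalize().
  public static boolean readable(FileSystem fs, Path path) {
    try (FSDataInputStream in = fs.open(path)) {
      in.read();
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
{code}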
[jira] [Commented] (HADOOP-12554) Swift client to read credentials from a credential provider
[ https://issues.apache.org/jira/browse/HADOOP-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637831#comment-15637831 ] Chen He commented on HADOOP-12554: -- I tested it against an OpenStack object store. It does not work: {quote} httpclient.HttpMethodDirector: Unable to respond to any of these challenges: {token=Token} {quote} Maybe the README is not clear? > Swift client to read credentials from a credential provider > --- > > Key: HADOOP-12554 > URL: https://issues.apache.org/jira/browse/HADOOP-12554 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: ramtin >Priority: Minor > Attachments: HADOOP-12554.001.patch, HADOOP-12554.002.patch > > > As HADOOP-12548 is going to do for s3, Swift should be reading credentials, > particularly passwords, from a credential provider. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
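For context, the credential-provider lookup the patch is meant to wire in is the standard Configuration.getPassword() call; a sketch with an assumed service name and provider path (both are placeholders, not values from the patch or the README):
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

public class SwiftPasswordLookupSketch {
  public static char[] lookupPassword() throws IOException {
    Configuration conf = new Configuration();
    // Assumed provider location; a real deployment would set this in core-site.xml.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs@namenode.example.com:8020/user/hadoop/swift.jceks");
    // Returns the value from the credential provider if the alias exists,
    // otherwise falls back to the plain configuration entry.
    return conf.getPassword("fs.swift.service.example.password");
  }
}
{code}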
[jira] [Resolved] (HADOOP-13570) Hadoop Swift driver should use new Apache httpclient
[ https://issues.apache.org/jira/browse/HADOOP-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He resolved HADOOP-13570. -- Resolution: Duplicate Dup to HADOOP-11614, close it. > Hadoop Swift driver should use new Apache httpclient > > > Key: HADOOP-13570 > URL: https://issues.apache.org/jira/browse/HADOOP-13570 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.3, 2.6.4 >Reporter: Chen He > > Current Hadoop openstack module is still using apache httpclient v1.x. It is > too old. We need to update it to a higher version to catch up in performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13570) Hadoop Swift driver should use new Apache httpclient
[ https://issues.apache.org/jira/browse/HADOOP-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456269#comment-15456269 ] Chen He commented on HADOOP-13570: -- Hi [~steve_l], thank you for pointing out the duplication. I will comment on 11614 and close this one. > Hadoop Swift driver should use new Apache httpclient > > > Key: HADOOP-13570 > URL: https://issues.apache.org/jira/browse/HADOOP-13570 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.3, 2.6.4 >Reporter: Chen He > > Current Hadoop openstack module is still using apache httpclient v1.x. It is > too old. We need to update it to a higher version to catch up in performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13570) Hadoop Swift driver should use new Apache httpclient
[ https://issues.apache.org/jira/browse/HADOOP-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13570: - Summary: Hadoop Swift driver should use new Apache httpclient (was: Hadoop swift Driver should use new Apache httpclient) > Hadoop Swift driver should use new Apache httpclient > > > Key: HADOOP-13570 > URL: https://issues.apache.org/jira/browse/HADOOP-13570 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.3, 2.6.4 >Reporter: Chen He > > Current Hadoop openstack module is still using apache httpclient v1.x. It is > too old. We need to update it to a higher version to catch up in performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13570) Hadoop swift Driver should use new Apache httpclient
Chen He created HADOOP-13570: Summary: Hadoop swift Driver should use new Apache httpclient Key: HADOOP-13570 URL: https://issues.apache.org/jira/browse/HADOOP-13570 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.6.4, 2.7.3 Reporter: Chen He Current Hadoop openstack module is still using apache httpclient v1.x. It is too old. We need to update it to a higher version to catch up the performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13570) Hadoop swift Driver should use new Apache httpclient
[ https://issues.apache.org/jira/browse/HADOOP-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13570: - Component/s: fs/swift > Hadoop swift Driver should use new Apache httpclient > > > Key: HADOOP-13570 > URL: https://issues.apache.org/jira/browse/HADOOP-13570 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.3, 2.6.4 >Reporter: Chen He > > Current Hadoop openstack module is still using apache httpclient v1.x. It is > too old. We need to update it to a higher version to catch up the performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13570) Hadoop swift Driver should use new Apache httpclient
[ https://issues.apache.org/jira/browse/HADOOP-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13570: - Description: Current Hadoop openstack module is still using apache httpclient v1.x. It is too old. We need to update it to a higher version to catch up in performance. (was: Current Hadoop openstack module is still using apache httpclient v1.x. It is too old. We need to update it to a higher version to catch up the performance.) > Hadoop swift Driver should use new Apache httpclient > > > Key: HADOOP-13570 > URL: https://issues.apache.org/jira/browse/HADOOP-13570 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.3, 2.6.4 >Reporter: Chen He > > Current Hadoop openstack module is still using apache httpclient v1.x. It is > too old. We need to update it to a higher version to catch up in performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435538#comment-15435538 ] Chen He edited comment on HADOOP-9565 at 8/24/16 7:46 PM: -- Hi [~steve_l], thank you for spending time on my question. The new version of FileOutputCommitter has algorithm 2 which does not have serial rename of all tasks in commitJob. Just find the parameter. It should resolve our problem. was (Author: airbots): Hi [~steve_l], thank you for spending time on my question. The new version of FileOutputCommitter has algorithm 2 which does not have serial rename of all task in commitJob. Just find the parameter. It should resolve our problem. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch > > > We can make the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface to add to them. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement at server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435538#comment-15435538 ] Chen He commented on HADOOP-9565: - Hi [~steve_l], thank you for spending time on my question. The new version of FileOutputCommitter has algorithm 2 which does not have serial rename of all task in commitJob. Just find the parameter. It should resolve our problem. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch > > > We can make the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface to add to them. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement at server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
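The parameter referred to above is the FileOutputCommitter algorithm version; with algorithm 2 each task renames its output directly into the final output directory, so commitJob no longer performs one rename per task. A minimal way to set it in job code (the property name is the standard MR2 key; whether it helps depends on the store's rename semantics):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class CommitterAlgorithmSketch {
  public static Configuration withFastCommitter() {
    Configuration conf = new Configuration();
    // Algorithm 2: task commit moves output straight to the destination,
    // avoiding the serial renames in commitJob.
    conf.setInt("mapreduce.fileoutputcommitter.algorithm.version", 2);
    return conf;
  }
}
{code}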
[jira] [Comment Edited] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428975#comment-15428975 ] Chen He edited comment on HADOOP-9565 at 8/19/16 10:52 PM: --- From our experiences, the main renaming overhead comes from "FileOutputCommitter.commitTask()", because it moves the files from the temp dir to the dest dir. Some frameworks may not care whether the final task files are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add a parameter such as "mapreduce.skip.task.commit" (default is false), so that once a task is done, the output just stays in "dst/_temporary/0/_temporary/"? Then, the next job or application just needs to take "dst/" as the input dir; it does not care whether it is deep or not. It avoids the atomic-write issue, provides compatibility, and avoids the rename overhead. If there is no objection, I am happy to create a JIRA to track that. was (Author: airbots): From our experiences, the main renaming overhead comes from "FileOutputCommitter.commitTask()", because it moves the files from the temp dir to the dest dir. Some frameworks may not care whether the final task files are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add a parameter such as "mapreduce.skip.task.commit" (default is false), so that once a task is done, the output just stays in "dst/_temporary/0/_temporary/"? Then, the next job or application just needs to take "dst/" as the input dir; it does not care whether it is deep or not. It avoids the atomic-write issue, provides compatibility, and avoids the rename overhead. If there is no objection, I will create a JIRA to track that. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch > > > We can make the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface to add to them. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement at server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428975#comment-15428975 ] Chen He commented on HADOOP-9565: - From our experiences, the main renaming overhead comes from "FileOutputCommitter.commitTask()", because it moves the files from the temp dir to the dest dir. Some frameworks may not care whether the final task files are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add a parameter such as "mapreduce.skip.task.commit" (default is false), so that once a task is done, the output just stays in "dst/_temporary/0/_temporary/"? Then, the next job or application just needs to take "dst/" as the input dir; it does not care whether it is deep or not. It avoids the atomic-write issue, provides compatibility, and avoids the rename overhead. If there is no objection, I will create a JIRA to track that. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch > > > We can make the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface to add to them. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement at server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11786) Fix Javadoc typos in org.apache.hadoop.fs.FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425479#comment-15425479 ] Chen He commented on HADOOP-11786: -- Thank you for the work, [~anu] and [~boky01] > Fix Javadoc typos in org.apache.hadoop.fs.FileSystem > > > Key: HADOOP-11786 > URL: https://issues.apache.org/jira/browse/HADOOP-11786 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Chen He >Assignee: Andras Bokor >Priority: Trivial > Labels: newbie++ > Attachments: HADOOP-11786.patch > > > /** > * Resets all statistics to 0. > * > * In order to reset, we add up all the thread-local statistics data, and > * set rootData to the negative of that. > * > * This may seem like a counterintuitive way to reset the statsitics. Why > * can't we just zero out all the thread-local data? Well, thread-local > * data can only be modified by the thread that owns it. If we tried to > * modify the thread-local data from this thread, our modification might > get > * interleaved with a read-modify-write operation done by the thread that > * owns the data. That would result in our update getting lost. > * > * The approach used here avoids this problem because it only ever reads > * (not writes) the thread-local data. Both reads and writes to rootData > * are done under the lock, so we're free to modify rootData from any > thread > * that holds the lock. > */ > etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13211) Swift driver should have a configurable retry feature when encountering 5xx errors
[ https://issues.apache.org/jira/browse/HADOOP-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13211: - Assignee: (was: Chen He) > Swift driver should have a configurable retry feature when encountering 5xx errors > - > > Key: HADOOP-13211 > URL: https://issues.apache.org/jira/browse/HADOOP-13211 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He > > In the current code, if the Swift driver meets an HTTP 5xx, it will throw an exception > and stop. As a driver, it would be more sophisticated if it could retry a configurable > number of times before reporting failure. There are two reasons that I can > imagine: > 1. if the server is really busy, it is possible that the server will drop > some requests to avoid a DDoS attack. > 2. If the server is accidentally unavailable for a short period of time and comes > back again, we may not need to fail the whole driver. Just recording the > exception and retrying may be more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13211) Swift driver should have a configurable retry feature when encountering 5xx errors
[ https://issues.apache.org/jira/browse/HADOOP-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387082#comment-15387082 ] Chen He commented on HADOOP-13211: -- Thank you for the update, [~ste...@apache.org]. If we want to avoid data loss on retry, how about a recursive retry? It looks a little bit ugly but won't fail. For the case where the object store server is dead, recursion is a nightmare. > Swift driver should have a configurable retry feature when encountering 5xx errors > - > > Key: HADOOP-13211 > URL: https://issues.apache.org/jira/browse/HADOOP-13211 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > > In the current code, if the Swift driver meets an HTTP 5xx, it will throw an exception > and stop. As a driver, it would be more sophisticated if it could retry a configurable > number of times before reporting failure. There are two reasons that I can > imagine: > 1. if the server is really busy, it is possible that the server will drop > some requests to avoid a DDoS attack. > 2. If the server is accidentally unavailable for a short period of time and comes > back again, we may not need to fail the whole driver. Just recording the > exception and retrying may be more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13211) Swift driver should have a configurable retry feature when encountering 5xx errors
[ https://issues.apache.org/jira/browse/HADOOP-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309263#comment-15309263 ] Chen He commented on HADOOP-13211: -- Thank you for the reply, [~ste...@apache.org]. IMHO, the hadoop openstack driver is a bridge between HDFS and the OpenStack object store. MR and other native Hadoop frameworks should be able to utilize the Hadoop IPC retry. With the increasing popularity of HDFS, other computing frameworks like Spark and in-memory storage systems like Tachyon are using the hadoop openstack driver. I am not sure whether the Hadoop IPC retry will trigger if Spark or other frameworks use the hadoop-openstack driver. Those frameworks have retry at the task level; however, it could be more costly to retry a task than to retry at the driver level. For the data loss, it is a really good catch. If the server keeps failing and providing 5xx, the upload will finally fail. The object store is not a file system and may not guarantee file-system-level integrity. I can't figure out a scenario where data loss is caused by retry. Could you provide a suggestion? > Swift driver should have a configurable retry feature when encountering 5xx errors > - > > Key: HADOOP-13211 > URL: https://issues.apache.org/jira/browse/HADOOP-13211 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > > In the current code, if the Swift driver meets an HTTP 5xx, it will throw an exception > and stop. As a driver, it would be more sophisticated if it could retry a configurable > number of times before reporting failure. There are two reasons that I can > imagine: > 1. if the server is really busy, it is possible that the server will drop > some requests to avoid a DDoS attack. > 2. If the server is accidentally unavailable for a short period of time and comes > back again, we may not need to fail the whole driver. Just recording the > exception and retrying may be more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13211) Swift driver should have a configurable retry feature when encountering 5xx errors
Chen He created HADOOP-13211: Summary: Swift driver should have a configurable retry feature when encountering 5xx errors Key: HADOOP-13211 URL: https://issues.apache.org/jira/browse/HADOOP-13211 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.2 Reporter: Chen He Assignee: Chen He In the current code, if the Swift driver meets an HTTP 5xx, it will throw an exception and stop. As a driver, it would be more sophisticated if it could retry a configurable number of times before reporting failure. There are two reasons that I can imagine: 1. if the server is really busy, it is possible that the server will drop some requests to avoid a DDoS attack. 2. If the server is accidentally unavailable for a short period of time and comes back again, we may not need to fail the whole driver. Just recording the exception and retrying may be more flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
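A hedged sketch of the configurable-retry idea: bounded attempts with a fixed sleep, retrying only on 5xx or I/O failures. The fs.swift.* keys and the HttpCall hook are invented for this illustration; they are not existing hadoop-openstack configuration options.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

public class RetryOn5xxSketch {
  // Hypothetical configuration keys, for illustration only.
  static final String RETRIES_KEY = "fs.swift.requests.retry.count";
  static final String SLEEP_KEY = "fs.swift.requests.retry.sleep.ms";

  /** Abstraction over a single HTTP request returning its status code. */
  interface HttpCall {
    int execute() throws IOException;
  }

  public static int executeWithRetry(Configuration conf, HttpCall call)
      throws IOException, InterruptedException {
    int retries = conf.getInt(RETRIES_KEY, 3);
    long sleepMs = conf.getLong(SLEEP_KEY, 1000L);
    IOException last = null;
    for (int attempt = 0; attempt <= retries; attempt++) {
      try {
        int status = call.execute();
        if (status < 500) {
          return status;          // success or a client error: no retry
        }
        last = new IOException("HTTP " + status + " from Swift server");
      } catch (IOException e) {
        last = e;                 // connection-level failure: also retry
      }
      Thread.sleep(sleepMs);      // simple fixed backoff between attempts
    }
    throw last;                   // report failure only after all retries
  }
}
{code}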
[jira] [Commented] (HADOOP-12057) swiftfs rename on partitioned file attempts to consolidate partitions
[ https://issues.apache.org/jira/browse/HADOOP-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263341#comment-15263341 ] Chen He commented on HADOOP-12057: -- It automatically skips "unit tests" if the auth-keys.xml is not configured. > swiftfs rename on partitioned file attempts to consolidate partitions > - > > Key: HADOOP-12057 > URL: https://issues.apache.org/jira/browse/HADOOP-12057 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Reporter: David Dobbins >Assignee: David Dobbins > Attachments: HADOOP-12057-006.patch, HADOOP-12057-008.patch, > HADOOP-12057.007.patch, HADOOP-12057.patch, HADOOP-12057.patch, > HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch > > > In the swift filesystem for openstack, a rename operation on a partitioned > file uses the swift COPY operation, which attempts to consolidate all of the > partitions into a single object. This causes the rename to fail when the > total size of all the partitions exceeds the maximum object size for swift. > Since partitioned files are primarily created to allow a file to exceed the > maximum object size, this bug makes writing to swift extremely unreliable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12057) swiftfs rename on partitioned file attempts to consolidate partitions
[ https://issues.apache.org/jira/browse/HADOOP-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263308#comment-15263308 ] Chen He commented on HADOOP-12057: -- I think the reason the patch got a -1 is that it fails a unit test. I agree it works in some cases. Chen He@Oracle from Samsung Mega > swiftfs rename on partitioned file attempts to consolidate partitions > - > > Key: HADOOP-12057 > URL: https://issues.apache.org/jira/browse/HADOOP-12057 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Reporter: David Dobbins >Assignee: David Dobbins > Attachments: HADOOP-12057-006.patch, HADOOP-12057-008.patch, > HADOOP-12057.007.patch, HADOOP-12057.patch, HADOOP-12057.patch, > HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch > > > In the swift filesystem for openstack, a rename operation on a partitioned > file uses the swift COPY operation, which attempts to consolidate all of the > partitions into a single object. This causes the rename to fail when the > total size of all the partitions exceeds the maximum object size for swift. > Since partitioned files are primarily created to allow a file to exceed the > maximum object size, this bug makes writing to swift extremely unreliable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12057) swiftfs rename on partitioned file attempts to consolidate partitions
[ https://issues.apache.org/jira/browse/HADOOP-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263265#comment-15263265 ] Chen He commented on HADOOP-12057: -- Which object store are you talking about? It is also a server-side configuration. If your server's config for the maximum size of an object is 10GB, you will not meet this problem. Try to get this info first. Or try to copy a 1TB file to the object store and see what happens. :) I just suspect that no server will support such a large single object. Please feel free to provide feedback. Thanks! > swiftfs rename on partitioned file attempts to consolidate partitions > - > > Key: HADOOP-12057 > URL: https://issues.apache.org/jira/browse/HADOOP-12057 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Reporter: David Dobbins >Assignee: David Dobbins > Attachments: HADOOP-12057-006.patch, HADOOP-12057-008.patch, > HADOOP-12057.007.patch, HADOOP-12057.patch, HADOOP-12057.patch, > HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch > > > In the swift filesystem for openstack, a rename operation on a partitioned > file uses the swift COPY operation, which attempts to consolidate all of the > partitions into a single object. This causes the rename to fail when the > total size of all the partitions exceeds the maximum object size for swift. > Since partitioned files are primarily created to allow a file to exceed the > maximum object size, this bug makes writing to swift extremely unreliable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13021) Hadoop swift driver unit test should use unique directory for each run
[ https://issues.apache.org/jira/browse/HADOOP-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256638#comment-15256638 ] Chen He commented on HADOOP-13021: -- Thank you for the quick reply, [~steve_l]. I agree, we need to clean up before and after tests. I will update the patch. > Hadoop swift driver unit test should use unique directory for each run > -- > > Key: HADOOP-13021 > URL: https://issues.apache.org/jira/browse/HADOOP-13021 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > Labels: unit-test > Attachments: HADOOP-13021.001.patch > > > Since all "unit tests" in the swift package are actually functional tests, they > require the server's information in the core-site.xml file. However, multiple > unit test runs on different machines using the same core-site.xml file will > result in some unit test failures. For example: > In TestSwiftFileSystemBasicOps.java > public void testMkDir() throws Throwable { > Path path = new Path("/test/MkDir"); > fs.mkdirs(path); > //success then -so try a recursive operation > fs.delete(path, true); > } > It is possible that machines A and B are running "mvn clean install" using the > same core-site.xml file. However, machine A runs testMkDir() first and deletes > the dir, but machine B has just tried to run fs.delete(path,true). It will report > a failure. This is just an example. There are many similar cases in the unit > test sets. I would propose we use a unique dir for each unit test run instead > of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-13021) Hadoop swift driver unit test should use unique directory for each run
[ https://issues.apache.org/jira/browse/HADOOP-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13021: - Attachment: HADOOP-13021.001.patch > Hadoop swift driver unit test should use unique directory for each run > -- > > Key: HADOOP-13021 > URL: https://issues.apache.org/jira/browse/HADOOP-13021 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > Labels: unit-test > Attachments: HADOOP-13021.001.patch > > > Since all "unit tests" in the swift package are actually functional tests, they > require the server's information in the core-site.xml file. However, multiple > unit test runs on different machines using the same core-site.xml file will > result in some unit test failures. For example: > In TestSwiftFileSystemBasicOps.java > public void testMkDir() throws Throwable { > Path path = new Path("/test/MkDir"); > fs.mkdirs(path); > //success then -so try a recursive operation > fs.delete(path, true); > } > It is possible that machines A and B are running "mvn clean install" using the > same core-site.xml file. However, machine A runs testMkDir() first and deletes > the dir, but machine B has just tried to run fs.delete(path,true). It will report > a failure. This is just an example. There are many similar cases in the unit > test sets. I would propose we use a unique dir for each unit test run instead > of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-13021) Hadoop swift driver unit test should use unique directory for each run
[ https://issues.apache.org/jira/browse/HADOOP-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254431#comment-15254431 ] Chen He commented on HADOOP-13021: -- Thank you for the reply, [~ste...@apache.org]. I agree with you. However, there could be corner cases such as JVM crashes or unit tests terminated by an outage. Even if we set a different value for each machine, for example, machine A has its own bucket, there may be leftover directories or files from a previous outage. The next unit test run is then inclined to report errors. I propose we use a timestamp for those hard-coded values. Combined with your suggestion, we can guarantee that every time the unit tests run on any machine, they are using different hard-coded values. Then, we may be a little bit safer than with the current solution. > Hadoop swift driver unit test should use unique directory for each run > -- > > Key: HADOOP-13021 > URL: https://issues.apache.org/jira/browse/HADOOP-13021 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > Labels: unit-test > > Since all "unit tests" in the swift package are actually functional tests, they > require the server's information in the core-site.xml file. However, multiple > unit test runs on different machines using the same core-site.xml file will > result in some unit test failures. For example: > In TestSwiftFileSystemBasicOps.java > public void testMkDir() throws Throwable { > Path path = new Path("/test/MkDir"); > fs.mkdirs(path); > //success then -so try a recursive operation > fs.delete(path, true); > } > It is possible that machines A and B are running "mvn clean install" using the > same core-site.xml file. However, machine A runs testMkDir() first and deletes > the dir, but machine B has just tried to run fs.delete(path,true). It will report > a failure. This is just an example. There are many similar cases in the unit > test sets. I would propose we use a unique dir for each unit test run instead > of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12291) Add support for nested groups in LdapGroupsMapping
[ https://issues.apache.org/jira/browse/HADOOP-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12291: - Fix Version/s: (was: 2.8.0) > Add support for nested groups in LdapGroupsMapping > -- > > Key: HADOOP-12291 > URL: https://issues.apache.org/jira/browse/HADOOP-12291 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 2.8.0 >Reporter: Gautam Gopalakrishnan >Assignee: Esther Kundin > Labels: features, patch > Attachments: HADOOP-12291.001.patch > > > When using {{LdapGroupsMapping}} with Hadoop, nested groups are not > supported. So for example if user {{jdoe}} is part of group A which is a > member of group B, the group mapping currently returns only group A. > Currently this facility is available with {{ShellBasedUnixGroupsMapping}} and > SSSD (or similar tools) but would be good to have this feature as part of > {{LdapGroupsMapping}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11786) Fix Javadoc typos in org.apache.hadoop.fs.FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249119#comment-15249119 ] Chen He commented on HADOOP-11786: -- Thank you for the patch, [~boky01]. I will review it this week. > Fix Javadoc typos in org.apache.hadoop.fs.FileSystem > > > Key: HADOOP-11786 > URL: https://issues.apache.org/jira/browse/HADOOP-11786 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Chen He >Assignee: Yanjun Wang >Priority: Trivial > Labels: newbie++ > Attachments: HADOOP-11786.patch > > > /** > * Resets all statistics to 0. > * > * In order to reset, we add up all the thread-local statistics data, and > * set rootData to the negative of that. > * > * This may seem like a counterintuitive way to reset the statsitics. Why > * can't we just zero out all the thread-local data? Well, thread-local > * data can only be modified by the thread that owns it. If we tried to > * modify the thread-local data from this thread, our modification might > get > * interleaved with a read-modify-write operation done by the thread that > * owns the data. That would result in our update getting lost. > * > * The approach used here avoids this problem because it only ever reads > * (not writes) the thread-local data. Both reads and writes to rootData > * are done under the lock, so we're free to modify rootData from any > thread > * that holds the lock. > */ > etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-13021) Hadoop swift driver unit test should use unique directory each run
Chen He created HADOOP-13021: Summary: Hadoop swift driver unit test should use unique directory each run Key: HADOOP-13021 URL: https://issues.apache.org/jira/browse/HADOOP-13021 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.7.2 Reporter: Chen He Assignee: Chen He Since all "unit tests" in the swift package are actually functional tests, they require the server's information in the core-site.xml file. However, multiple unit test runs on different machines using the same core-site.xml file will result in some unit test failures. For example: In TestSwiftFileSystemBasicOps.java public void testMkDir() throws Throwable { Path path = new Path("/test/MkDir"); fs.mkdirs(path); //success then -so try a recursive operation fs.delete(path, true); } It is possible that machines A and B are running "mvn clean install" using the same core-site.xml file. However, machine A runs testMkDir() first and deletes the dir, but machine B has just tried to run fs.delete(path,true). It will report a failure. This is just an example. There are many similar cases in the unit test sets. I would propose we use a unique dir for each unit test run instead of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
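A sketch of the unique-per-run directory idea from the description above, combining a timestamp with a random suffix so concurrent runs (and leftovers from crashed runs) cannot collide; the /test base path and helper name are illustrative:
{code:java}
import java.util.UUID;

import org.apache.hadoop.fs.Path;

public class UniqueTestDirSketch {
  // e.g. /test/1461000000000-3f2a9c1d/MkDir instead of the shared /test/MkDir
  public static Path uniqueTestPath(String testName) {
    String runId = System.currentTimeMillis() + "-"
        + UUID.randomUUID().toString().substring(0, 8);
    return new Path("/test/" + runId + "/" + testName);
  }
}
{code}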
[jira] [Updated] (HADOOP-13021) Hadoop swift driver unit test should use unique directory for each run
[ https://issues.apache.org/jira/browse/HADOOP-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13021: - Labels: unit-test (was: ) > Hadoop swift driver unit test should use unique directory for each run > -- > > Key: HADOOP-13021 > URL: https://issues.apache.org/jira/browse/HADOOP-13021 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > Labels: unit-test > > Since all "unit tests" in the swift package are actually functional tests, they > require the server's information in the core-site.xml file. However, multiple > unit test runs on different machines using the same core-site.xml file will > result in some unit test failures. For example: > In TestSwiftFileSystemBasicOps.java > public void testMkDir() throws Throwable { > Path path = new Path("/test/MkDir"); > fs.mkdirs(path); > //success then -so try a recursive operation > fs.delete(path, true); > } > It is possible that machines A and B are running "mvn clean install" using the > same core-site.xml file. However, machine A runs testMkDir() first and deletes > the dir, but machine B has just tried to run fs.delete(path,true). It will report > a failure. This is just an example. There are many similar cases in the unit > test sets. I would propose we use a unique dir for each unit test run instead > of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-13021) Hadoop swift driver unit test should use unique directory for each run
[ https://issues.apache.org/jira/browse/HADOOP-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-13021: - Summary: Hadoop swift driver unit test should use unique directory for each run (was: Hadoop swift driver unit test should use unique directory each run) > Hadoop swift driver unit test should use unique directory for each run > -- > > Key: HADOOP-13021 > URL: https://issues.apache.org/jira/browse/HADOOP-13021 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.7.2 >Reporter: Chen He >Assignee: Chen He > Labels: unit-test > > Since all "unit tests" in the swift package are actually functional tests, they > require the server's information in the core-site.xml file. However, multiple > unit test runs on different machines using the same core-site.xml file will > result in some unit test failures. For example: > In TestSwiftFileSystemBasicOps.java > public void testMkDir() throws Throwable { > Path path = new Path("/test/MkDir"); > fs.mkdirs(path); > //success then -so try a recursive operation > fs.delete(path, true); > } > It is possible that machines A and B are running "mvn clean install" using the > same core-site.xml file. However, machine A runs testMkDir() first and deletes > the dir, but machine B has just tried to run fs.delete(path,true). It will report > a failure. This is just an example. There are many similar cases in the unit > test sets. I would propose we use a unique dir for each unit test run instead > of using "Path path = new Path("/test/MkDir")" for all concurrent runs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12501) Enable SwiftNativeFileSystem to support ACLs
[ https://issues.apache.org/jira/browse/HADOOP-12501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12501: - Summary: Enable SwiftNativeFileSystem to support ACLs (was: Enable SwiftNativeFileSystem to preserve user, group, permission) > Enable SwiftNativeFileSystem to support ACLs > > > Key: HADOOP-12501 > URL: https://issues.apache.org/jira/browse/HADOOP-12501 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He >Assignee: Chen He > > Currently, if a user copies a file/dir from localFS or HDFS to the swift object store, > u/g/p will be gone. There should be a way to preserve u/g/p. It would provide a benefit > for a large number of files/dirs transferred between HDFS/localFS > and the Swift object store. We also need to be careful since Hadoop prevents > general users from changing u/g/p, especially if Kerberos is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12735) Fix typo in core-default.xml property
[ https://issues.apache.org/jira/browse/HADOOP-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12735: - Affects Version/s: (was: 2.8.0) 2.7.1 > Fix typo in core-default.xml property > - > > Key: HADOOP-12735 > URL: https://issues.apache.org/jira/browse/HADOOP-12735 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > > The property as defined in core-default.xml is > bq. hadoop.work.around.non.threadsafe.getpwuid > But in NativeIO.java (the only place I can see a similar reference), the > property is defined as: > bq. static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY = > "hadoop.workaround.non.threadsafe.getpwuid"; > Note the extra period (.) in the word "workaround". > Should the code be made to match the property or vice versa? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12735) Fix typo in core-default.xml property
[ https://issues.apache.org/jira/browse/HADOOP-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113864#comment-15113864 ] Chen He commented on HADOOP-12735: -- 2.8.0 is not released yet, change to 2.7.1 > Fix typo in core-default.xml property > - > > Key: HADOOP-12735 > URL: https://issues.apache.org/jira/browse/HADOOP-12735 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > > The property as defined in core-default.xml is > bq. hadoop.work.around.non.threadsafe.getpwuid > But in NativeIO.java (the only place I can see a similar reference), the > property is defined as: > bq. static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY = > "hadoop.workaround.non.threadsafe.getpwuid"; > Note the extra period (.) in the word "workaround". > Should the code be made to match the property or vice versa? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12623) Hadoop Swift driver should support more flexible container name than RFC952
[ https://issues.apache.org/jira/browse/HADOOP-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12623: - Summary: Hadoop Swift driver should support more flexible container name than RFC952 (was: Swift should support more flexible container name than RFC952) > Hadoop Swift driver should support more flexible container name than RFC952 > --- > > Key: HADOOP-12623 > URL: https://issues.apache.org/jira/browse/HADOOP-12623 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1, 2.6.2 >Reporter: Chen He > > Just a thought. > It would be great if the Hadoop swift driver could support more flexible container > names. The current Hadoop swift driver requires container names to follow RFC952. > It will report an error if the container name does not obey RFC952: > "Invalid swift hostname 'test.1.serviceName': hostname must in form > container.service" > However, users can use any other Swift object store drivers (cURL, cyberduck, > JOSS, swift python driver, etc.) to upload data to the object store, but the current > hadoop swift driver cannot recognize those containers whose names do not > follow RFC952. > I dug into the source code and figured out that it is because of this code in > RestClientBindings.java: > public static String extractContainerName(URI uri) throws > SwiftConfigurationException { > return extractContainerName(uri.getHost()); > } > and URI.java line 3143 gives "host = null". > We may need to find a better way to do the container name parsing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12623) Swift should support more flexible container name than RFC952
Chen He created HADOOP-12623: Summary: Swift should support more flexible container name than RFC952 Key: HADOOP-12623 URL: https://issues.apache.org/jira/browse/HADOOP-12623 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.6.2, 2.7.1 Reporter: Chen He Just a thought. It would be great if the Hadoop swift driver could support more flexible container names. The current Hadoop swift driver requires container names to follow RFC952. It will report an error if the container name does not obey RFC952: "Invalid swift hostname 'test.1.serviceName': hostname must in form container.service" However, users can use any other Swift object store drivers (cURL, cyberduck, JOSS, swift python driver, etc.) to upload data to the object store, but the current hadoop swift driver cannot recognize those containers whose names do not follow RFC952. I dug into the source code and figured out that it is because of this code in RestClientBindings.java: public static String extractContainerName(URI uri) throws SwiftConfigurationException { return extractContainerName(uri.getHost()); } and URI.java line 3143 gives "host = null". We may need to find a better way to do the container name parsing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
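One possible direction, sketched under the assumption that the raw authority is still available even when URI.getHost() rejects the name: split container.service on the last dot of getAuthority() instead of getHost(). This is only an illustration of the parsing idea, not the actual RestClientBindings fix.
{code:java}
import java.net.URI;

import org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException;

public class ContainerNameParsingSketch {
  // swift://test.1.serviceName/path -> container "test.1", service "serviceName"
  public static String extractContainerName(URI uri)
      throws SwiftConfigurationException {
    String authority = uri.getAuthority();   // usually non-null even when getHost() is null
    int lastDot = authority == null ? -1 : authority.lastIndexOf('.');
    if (lastDot <= 0) {
      throw new SwiftConfigurationException("Invalid swift hostname '" + authority
          + "': hostname must be in form container.service");
    }
    return authority.substring(0, lastDot);
  }
}
{code}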
[jira] [Updated] (HADOOP-12551) Introduce FileNotFoundException for open and getFileStatus API's in WASB
[ https://issues.apache.org/jira/browse/HADOOP-12551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12551: - Fix Version/s: (was: 2.8.0) > Introduce FileNotFoundException for open and getFileStatus API's in WASB > > > Key: HADOOP-12551 > URL: https://issues.apache.org/jira/browse/HADOOP-12551 > Project: Hadoop Common > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Dushyanth >Assignee: Dushyanth > > HADOOP-12533 introduced FileNotFoundException to the read and seek API for > WASB. The open and getFileStatus api currently throws FileNotFoundException > correctly when the file does not exists when the API is called but does not > throw the same exception if there is another thread/process deletes the file > during its execution. This Jira fixes that behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12551) Introduce FileNotFoundException for open and getFileStatus API's in WASB
[ https://issues.apache.org/jira/browse/HADOOP-12551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12551: - Affects Version/s: (was: 2.8.0) 2.7.1 > Introduce FileNotFoundException for open and getFileStatus API's in WASB > > > Key: HADOOP-12551 > URL: https://issues.apache.org/jira/browse/HADOOP-12551 > Project: Hadoop Common > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Dushyanth >Assignee: Dushyanth > > HADOOP-12533 introduced FileNotFoundException to the read and seek API for > WASB. The open and getFileStatus api currently throws FileNotFoundException > correctly when the file does not exists when the API is called but does not > throw the same exception if there is another thread/process deletes the file > during its execution. This Jira fixes that behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12501) Enable SwiftNativeFileSystem to preserve user, group, permission
[ https://issues.apache.org/jira/browse/HADOOP-12501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975120#comment-14975120 ] Chen He commented on HADOOP-12501: -- Thank you for the suggestion, [~steve_l] "though I think it's dangerous as people may think those permissions may actually apply. " Actually, I have another idea: enable the swift driver to do permission checks, so the blobstore looks more like a real filesystem. The idea of changing 'distcp' is a great solution. IMHO, it could be even more helpful if we found a way to let the '-p' option work for all filesystem implementations. > Enable SwiftNativeFileSystem to preserve user, group, permission > > > Key: HADOOP-12501 > URL: https://issues.apache.org/jira/browse/HADOOP-12501 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He >Assignee: Chen He > > Currently, if a user copies a file/dir from localFS or HDFS to the swift object store, > u/g/p will be gone. There should be a way to preserve u/g/p. It would provide a benefit > for a large number of files/dirs transferred between HDFS/localFS > and the Swift object store. We also need to be careful since Hadoop prevents > general users from changing u/g/p, especially if Kerberos is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12501) Enable SwiftNativeFileSystem to preserve user, group, permission
[ https://issues.apache.org/jira/browse/HADOOP-12501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969436#comment-14969436 ] Chen He commented on HADOOP-12501: -- Hi [~steve_l], thank you for the reply. The Swift server has its own access control mechanism in the backend. However, it may not satisfy the needs in some cases. For example: a storage provider sells a service to company A, and A has several types of users: admin, general user, etc. If the admin wants to back up or restore all files on HDFS to the Swift object store and uses 'distcp' to copy the files, all of the u/g/p data will be lost (it is also possible that the admin copies data blocks and metadata instead of using 'distcp', in which case there is no need to preserve u/g/p). My thought was to preserve the u/g/p somewhere in the metadata of each object; that would save the admin a lot of work recovering u/g/p in this case. > Enable SwiftNativeFileSystem to preserve user, group, permission > > > Key: HADOOP-12501 > URL: https://issues.apache.org/jira/browse/HADOOP-12501 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He >Assignee: Chen He > > Currently, if user copy file/dir from localFS or HDFS to swift object store, > u/g/p will be gone. There should be a way to preserve u/g/p. It will provide > benefit for a large number of files/dirs transferring between HDFS/localFS > and Swift object store. We also need to be careful since Hadoop prevent > general user from changing u/g/p especially if Kerberos is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
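As an illustration of that idea, a minimal sketch of recording u/g/p as per-object metadata through Swift's metadata POST; the X-Object-Meta-Hadoop-* key names are made up for this example and no such helper exists in the driver today.
{code}
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.fs.FileStatus;

public class SwiftUgpMetadata {
  // Sketch: record owner/group/permission as Swift object metadata via a POST,
  // using the hypothetical header names X-Object-Meta-Hadoop-*.
  public static void preserveUgp(URL objectUrl, String authToken, FileStatus status)
      throws Exception {
    HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
    conn.setRequestMethod("POST");                       // POST updates object metadata in Swift
    conn.setRequestProperty("X-Auth-Token", authToken);
    conn.setRequestProperty("X-Object-Meta-Hadoop-User", status.getOwner());
    conn.setRequestProperty("X-Object-Meta-Hadoop-Group", status.getGroup());
    conn.setRequestProperty("X-Object-Meta-Hadoop-Perm",
        Short.toString(status.getPermission().toShort()));
    int rc = conn.getResponseCode();
    if (rc / 100 != 2) {
      throw new java.io.IOException("Failed to set u/g/p metadata: HTTP " + rc);
    }
  }
}
{code}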
[jira] [Created] (HADOOP-12501) Enable SwiftNativeFileSystem to preserve user, group, permission
Chen He created HADOOP-12501: Summary: Enable SwiftNativeFileSystem to preserve user, group, permission Key: HADOOP-12501 URL: https://issues.apache.org/jira/browse/HADOOP-12501 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Currently, if a user copies a file/dir from the local FS or HDFS to the Swift object store, u/g/p will be gone. There should be a way to preserve u/g/p. It would benefit transfers of a large number of files/dirs between HDFS/localFS and the Swift object store. We also need to be careful, since Hadoop prevents general users from changing u/g/p, especially if Kerberos is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12461) Swift driver should have the ability to renew token if it expired
[ https://issues.apache.org/jira/browse/HADOOP-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12461: - Summary: Swift driver should have the ability to renew token if it expired (was: Swift driver should have the ability to renew token if server has timeout) > Swift driver should have the ability to renew token if it expired > - > > Key: HADOOP-12461 > URL: https://issues.apache.org/jira/browse/HADOOP-12461 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current swift driver will encounter authentication issue if swift server has > token timeout. It will be good if driver can automatically renew once it > expired. We met HTTP 401 error when transferring a 100gb file to swift object > store. Since the large file is chunked into 27 files, the server will ask > each chunk for token inspection. If server has timeout and 100GB file > transferring time is longer than this timeout, token will expire and the file > transferring will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12461) Swift driver should have the ability to renew token if server has timeout
[ https://issues.apache.org/jira/browse/HADOOP-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12461: - Description: Current swift driver will encounter authentication issue if swift server has token timeout. It will be good if driver can automatically renew once it expired. We met HTTP 401 error when transferring a 100gb file to swift object store. Since the large file is chunked into 27 files, the server will ask each chunk for token inspection. If server has timeout and 100GB file transferring time is longer than this timeout, token will expire and the file transferring will fail. (was: Current swift driver will encounter authentication issue if swift server has token timeout. It will be good if driver can automatically renew once it expired.) > Swift driver should have the ability to renew token if server has timeout > - > > Key: HADOOP-12461 > URL: https://issues.apache.org/jira/browse/HADOOP-12461 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current swift driver will encounter authentication issue if swift server has > token timeout. It will be good if driver can automatically renew once it > expired. We met HTTP 401 error when transferring a 100gb file to swift object > store. Since the large file is chunked into 27 files, the server will ask > each chunk for token inspection. If server has timeout and 100GB file > transferring time is longer than this timeout, token will expire and the file > transferring will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
Chen He created HADOOP-12471: Summary: Support Swift file (> 5GB) continuious uploading where there is a failure Key: HADOOP-12471 URL: https://issues.apache.org/jira/browse/HADOOP-12471 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He The current Swift FileSystem supports files larger than 5GB. A file will be chunked into pieces as large as 4.6GB (configurable). For example, if there is a 46GB file "foo" in swift, then the structure will look like: foo/01 foo/02 foo/03 ... foo/10 Users will not see those 0x files unless they explicitly ask for them. That means, if the user does: \> hadoop fs -ls swift://container.serviceProvidor/foo it only shows: dwr-r--r--4.6GBfoo However, in my test, if there is a failure during uploading of the foo file, the previously uploaded chunks will be left in the object store. It would be good to support continuing the upload based on the previous leftovers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949990#comment-14949990 ] Chen He commented on HADOOP-12471: -- First of all, I think we need a way to differentiate those failed leftover files from other file. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12471: - Description: Current Swift FileSystem supports file larger than 5GB. File will be chunked as large as 4.6GB (configurable). For example, if there is a 46GB file "foo" in swift, Then the structure will look like: foo/01 foo/02 foo/03 ... foo/10 User will not see those 0x files if they don't specify. That means, if user does: \> hadoop fs -ls swift://container.serviceProvidor/foo It only shows: dwr-r--r--4.6GBfoo However, in my test, if there is a failure, during uploading the foo file, the previous uploaded chunks will be left in the object store. It will be good to support continuous uploading based on previous leftover was: Current Swift FileSystem supports file larger than 5GB. File will be chunked as large as 4.6GB (configurable). For example, if there is a 46GB file "foo" in swift, Then the structure will look like: foo/01 foo/02 foo/03 ... foo/10 User will not see those 0x files if they don't specify. That means, if use do: \> hadoop fs -ls swift://container.serviceProvidor/foo It only shows: dwr-r--r--4.6GBfoo However, in my test, if there is a failure, during uploading the foo file, the previous uploaded chunks will be left in the object store. It will be good to support continuous uploading based on previous leftover > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--4.6GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12471: - Description: Current Swift FileSystem supports file larger than 5GB. File will be chunked as large as 4.6GB (configurable). For example, if there is a 46GB file "foo" in swift, Then the structure will look like: foo/01 foo/02 foo/03 ... foo/10 User will not see those 0x files if they don't specify. That means, if user does: \> hadoop fs -ls swift://container.serviceProvidor/foo It only shows: dwr-r--r--46GBfoo However, in my test, if there is a failure, during uploading the foo file, the previous uploaded chunks will be left in the object store. It will be good to support continuous uploading based on previous leftover was: Current Swift FileSystem supports file larger than 5GB. File will be chunked as large as 4.6GB (configurable). For example, if there is a 46GB file "foo" in swift, Then the structure will look like: foo/01 foo/02 foo/03 ... foo/10 User will not see those 0x files if they don't specify. That means, if user does: \> hadoop fs -ls swift://container.serviceProvidor/foo It only shows: dwr-r--r--4.6GBfoo However, in my test, if there is a failure, during uploading the foo file, the previous uploaded chunks will be left in the object store. It will be good to support continuous uploading based on previous leftover > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12461) Swift driver should have the ability to renew token if server has timeout
[ https://issues.apache.org/jira/browse/HADOOP-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949993#comment-14949993 ] Chen He commented on HADOOP-12461: -- I propose adding a new configuration parameter like "fs.swift.token.renew.interval" to let users configure the renewal interval. Then the Swift driver can renew the token accordingly. > Swift driver should have the ability to renew token if server has timeout > - > > Key: HADOOP-12461 > URL: https://issues.apache.org/jira/browse/HADOOP-12461 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current swift driver will encounter authentication issue if swift server has > token timeout. It will be good if driver can automatically renew once it > expired. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
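A minimal sketch of that proposal; the property name fs.swift.token.renew.interval is the one suggested above and does not exist yet, and authenticate() stands in for whatever re-authentication call the driver already performs.
{code}
import org.apache.hadoop.conf.Configuration;

public class TokenRenewalSketch {
  private final long renewIntervalMs;
  private long lastAuthTime;
  private String token;

  public TokenRenewalSketch(Configuration conf) {
    // proposed property; default to 30 minutes if unset
    this.renewIntervalMs = conf.getLong("fs.swift.token.renew.interval", 30 * 60 * 1000L);
  }

  // Call before every chunk upload: re-authenticate if the token is older
  // than the configured interval, instead of waiting for an HTTP 401.
  synchronized String freshToken() {
    long now = System.currentTimeMillis();
    if (token == null || now - lastAuthTime >= renewIntervalMs) {
      token = authenticate();          // placeholder for the driver's auth-service call
      lastAuthTime = now;
    }
    return token;
  }

  private String authenticate() {
    return "AUTH_tk_example";          // stand-in; the real driver would hit the auth service
  }
}
{code}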
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950604#comment-14950604 ] Chen He commented on HADOOP-12471: -- Thank you for the comment, [~ste...@apache.org]. I agree with you. There should be a persistent lock in the metadata. Another observation I made is that if one user is uploading the foo file and another user tries to delete it, the delete operation will succeed. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950885#comment-14950885 ] Chen He commented on HADOOP-12471: -- Hi [~arpitagarwal], I met this problem before; it is caused by the renaming process. We should remove the renaming process; otherwise, files larger than 5GB will not be successfully renamed. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12109) Distcp of file > 5GB to swift fails with HTTP 413 error
[ https://issues.apache.org/jira/browse/HADOOP-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12109: - Affects Version/s: 2.7.1 > Distcp of file > 5GB to swift fails with HTTP 413 error > --- > > Key: HADOOP-12109 > URL: https://issues.apache.org/jira/browse/HADOOP-12109 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.6.0, 2.7.1 >Reporter: Phil D'Amore > > Trying to use distcp to copy a file more than 5GB to swift fs results in a > stack like the following: > 15/06/01 20:58:57 ERROR util.RetriableCommand: Failure in Retriable command: > Copying hdfs://xxx:8020/path/to/random-5Gplus.dat to swift://xxx/5Gplus.dat > Invalid Response: Method COPY on > http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0 > failed, status code: 413, status line: HTTP/1.1 413 Request Entity Too Large > COPY > http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0 > => 413 : Request Entity Too LargeThe body of your request > was too large for this server. > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.buildException(SwiftRestClient.java:1502) > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1403) > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.copyObject(SwiftRestClient.java:923) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.copyObject(SwiftNativeFileSystemStore.java:765) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.rename(SwiftNativeFileSystemStore.java:617) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.rename(SwiftNativeFileSystem.java:577) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:220) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:137) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > It looks like the problem actually occurs in the rename operation which > happens after the copy. The rename is implemented as a copy/delete, and this > secondary copy looks like it's not done in a way that breaks up the file into > smaller chunks. > It looks like the following bug: > https://bugs.launchpad.net/sahara/+bug/1428941 > It does not look like the fix for this is incorporated into hadoop's swift > client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12109) Distcp of file > 5GB to swift fails with HTTP 413 error
[ https://issues.apache.org/jira/browse/HADOOP-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950891#comment-14950891 ] Chen He commented on HADOOP-12109: -- This is because current swift code do a copy and a rename process when upload a file (>5GB) to object store. We should avoid the rename process if not it will always complain HTTP 413. > Distcp of file > 5GB to swift fails with HTTP 413 error > --- > > Key: HADOOP-12109 > URL: https://issues.apache.org/jira/browse/HADOOP-12109 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Affects Versions: 2.6.0, 2.7.1 >Reporter: Phil D'Amore > > Trying to use distcp to copy a file more than 5GB to swift fs results in a > stack like the following: > 15/06/01 20:58:57 ERROR util.RetriableCommand: Failure in Retriable command: > Copying hdfs://xxx:8020/path/to/random-5Gplus.dat to swift://xxx/5Gplus.dat > Invalid Response: Method COPY on > http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0 > failed, status code: 413, status line: HTTP/1.1 413 Request Entity Too Large > COPY > http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0 > => 413 : Request Entity Too LargeThe body of your request > was too large for this server. > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.buildException(SwiftRestClient.java:1502) > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1403) > at > org.apache.hadoop.fs.swift.http.SwiftRestClient.copyObject(SwiftRestClient.java:923) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.copyObject(SwiftNativeFileSystemStore.java:765) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.rename(SwiftNativeFileSystemStore.java:617) > at > org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.rename(SwiftNativeFileSystem.java:577) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:220) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:137) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > It looks like the problem actually occurs in the rename operation which > happens after the copy. The rename is implemented as a copy/delete, and this > secondary copy looks like it's not done in a way that breaks up the file into > smaller chunks. 
> It looks like the following bug: > https://bugs.launchpad.net/sahara/+bug/1428941 > It does not look like the fix for this is incorporated into hadoop's swift > client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
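For context, one possible way around the 413 (a sketch only, not the committed fix): when the source is a DLO manifest, a rename does not need a server-side COPY of the concatenated data; it can PUT a new zero-byte manifest carrying the same X-Object-Manifest pointer and then DELETE the old manifest, so the segments themselves never move. The class name and URLs below are placeholders.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class DloManifestRename {
  // Sketch: "rename" a dynamic large object by re-pointing a new manifest at the
  // existing segments, then deleting the old manifest. No bulk COPY, so no 413.
  public static void renameManifest(URL src, URL dst, String authToken, String segmentPrefix)
      throws Exception {
    // 1. PUT a zero-byte manifest at the destination that points at the same segments.
    HttpURLConnection put = (HttpURLConnection) dst.openConnection();
    put.setRequestMethod("PUT");
    put.setDoOutput(true);
    put.setRequestProperty("X-Auth-Token", authToken);
    put.setRequestProperty("X-Object-Manifest", segmentPrefix); // e.g. "container/foo/"
    try (OutputStream os = put.getOutputStream()) { /* empty body */ }
    if (put.getResponseCode() / 100 != 2) {
      throw new java.io.IOException("manifest PUT failed: " + put.getResponseCode());
    }
    // 2. DELETE the old manifest only; the segment objects are untouched.
    HttpURLConnection del = (HttpURLConnection) src.openConnection();
    del.setRequestMethod("DELETE");
    del.setRequestProperty("X-Auth-Token", authToken);
    if (del.getResponseCode() / 100 != 2) {
      throw new java.io.IOException("manifest DELETE failed: " + del.getResponseCode());
    }
  }
}
{code}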
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950877#comment-14950877 ] Chen He commented on HADOOP-12471: -- I am not sure how Swift does the chunk operation in the beginning. However, the DLO flag will be added once all chunks are successfully uploaded. If there is a failure, the DLO flag is not created and the leftovers remain. I assume Swift does not know how many chunks there will be when a user uploads a large file. If that is the case, can we add, at the beginning, another header flag that identifies whether this large file succeeded or not? For example: X-Object-Succeed-Flag. In the beginning, this flag will be false (or any value that can be changed later); once all chunks are successfully uploaded, we change it to true. If there is any failure in the middle, this flag will remain false. Any request to a file whose flag is false will be prevented. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
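A minimal sketch of that proposal. Swift only allows custom object metadata under the X-Object-Meta- prefix, so the flag is shown here as the made-up header X-Object-Meta-Upload-Complete rather than a bare X-Object-Succeed-Flag.
{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class UploadCompleteFlag {
  // Sketch of the proposed flag: mark the manifest "incomplete" before the first
  // chunk goes up, flip it to "true" only after every chunk has been uploaded.
  static void setUploadComplete(URL manifestUrl, String authToken, boolean complete)
      throws Exception {
    HttpURLConnection conn = (HttpURLConnection) manifestUrl.openConnection();
    conn.setRequestMethod("POST");  // POST updates object metadata in Swift
    conn.setRequestProperty("X-Auth-Token", authToken);
    conn.setRequestProperty("X-Object-Meta-Upload-Complete", Boolean.toString(complete));
    if (conn.getResponseCode() / 100 != 2) {
      throw new java.io.IOException("metadata update failed: " + conn.getResponseCode());
    }
  }

  // Readers would HEAD the object and warn (or refuse) when the flag is still false,
  // which is the behaviour suggested in the follow-up comments.
  static boolean isUploadComplete(URL manifestUrl, String authToken) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) manifestUrl.openConnection();
    conn.setRequestMethod("HEAD");
    conn.setRequestProperty("X-Auth-Token", authToken);
    conn.getResponseCode();
    return Boolean.parseBoolean(conn.getHeaderField("X-Object-Meta-Upload-Complete"));
  }
}
{code}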
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950906#comment-14950906 ] Chen He commented on HADOOP-12471: -- If the server is offline, the user will get HTTP 50x errors, which indicate that it is not the driver's problem and is outside the driver's scope. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12471) Support Swift file (> 5GB) continuious uploading where there is a failure
[ https://issues.apache.org/jira/browse/HADOOP-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950904#comment-14950904 ] Chen He commented on HADOOP-12471: -- I mean, a request to this failed file will report a warning and suggest that the user delete it. > Support Swift file (> 5GB) continuious uploading where there is a failure > - > > Key: HADOOP-12471 > URL: https://issues.apache.org/jira/browse/HADOOP-12471 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/swift >Affects Versions: 2.7.1 >Reporter: Chen He > > Current Swift FileSystem supports file larger than 5GB. > File will be chunked as large as 4.6GB (configurable). For example, if there > is a 46GB file "foo" in swift, > Then the structure will look like: > foo/01 > foo/02 > foo/03 > ... > foo/10 > User will not see those 0x files if they don't specify. That means, if > user does: > \> hadoop fs -ls swift://container.serviceProvidor/foo > It only shows: > dwr-r--r--46GBfoo > However, in my test, if there is a failure, during uploading the foo file, > the previous uploaded chunks will be left in the object store. It will be > good to support continuous uploading based on previous leftover -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12461) Swift driver should have the ability to renew token if server has timeout
Chen He created HADOOP-12461: Summary: Swift driver should have the ability to renew token if server has timeout Key: HADOOP-12461 URL: https://issues.apache.org/jira/browse/HADOOP-12461 Project: Hadoop Common Issue Type: Improvement Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He The current Swift driver will encounter an authentication issue if the Swift server has a token timeout. It would be good if the driver could automatically renew the token once it has expired. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12057) swiftfs rename on partitioned file attempts to consolidate partitions
[ https://issues.apache.org/jira/browse/HADOOP-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706259#comment-14706259 ] Chen He commented on HADOOP-12057: -- Hi [~highlycaffeinated], add an auth-keys.xml file under the test resources directory of the hadoop-openstack module; it will automatically enable the OpenStack unit tests and give you more hints. swiftfs rename on partitioned file attempts to consolidate partitions - Key: HADOOP-12057 URL: https://issues.apache.org/jira/browse/HADOOP-12057 Project: Hadoop Common Issue Type: Bug Components: fs/swift Reporter: David Dobbins Assignee: David Dobbins Attachments: HADOOP-12057-006.patch, HADOOP-12057-008.patch, HADOOP-12057.007.patch, HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch, HADOOP-12057.patch In the swift filesystem for openstack, a rename operation on a partitioned file uses the swift COPY operation, which attempts to consolidate all of the partitions into a single object. This causes the rename to fail when the total size of all the partitions exceeds the maximum object size for swift. Since partitioned files are primarily created to allow a file to exceed the maximum object size, this bug makes writing to swift extremely unreliable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
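For anyone setting this up for the first time, a minimal auth-keys.xml sketch; the service name 'myswift', URL, and credentials are placeholders, and the property names follow the hadoop-openstack testing documentation, so double-check them against your auth version.
{code}
<configuration>
  <!-- container/service the tests run against -->
  <property>
    <name>test.fs.swift.name</name>
    <value>swift://test.myswift/</value>
  </property>
  <!-- service definition referenced by the ".myswift" suffix above -->
  <property>
    <name>fs.swift.service.myswift.auth.url</name>
    <value>http://swift.example.org:8080/auth/v1.0</value>
  </property>
  <property>
    <name>fs.swift.service.myswift.username</name>
    <value>tester</value>
  </property>
  <property>
    <name>fs.swift.service.myswift.password</name>
    <value>secret</value>
  </property>
</configuration>
{code}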
[jira] [Updated] (HADOOP-12343) Swift Driver should verify whether container name follows RFC952
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Summary: Swift Driver should verify whether container name follows RFC952 (was: Swift Driver should verify whether container and service name follows RFC952) Swift Driver should verify whether container name follows RFC952 Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12343) Swift Driver should verify whether container and service name follows RFC952
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Summary: Swift Driver should verify whether container and service name follows RFC952 (was: Error message of Swift driver should be more clear when there is mal-format of hostname or service) Swift Driver should verify whether container and service name follows RFC952 Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Priority: Minor Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12343) Swift Driver should verify whether container and service name follows RFC952
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Priority: Major (was: Minor) Swift Driver should verify whether container and service name follows RFC952 Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12343) Swift Driver should verify whether container and service name follows RFC952
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Issue Type: New Feature (was: Bug) Swift Driver should verify whether container and service name follows RFC952 Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Priority: Minor Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12343) Error message of Swift driver should be more clear when there is mal-format of hostname or service
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Summary: Error message of Swift driver should be more clear when there is mal-format of hostname or service (was: Error message of Swift driver should be more clear when there is mal-format of hostname and service) Error message of Swift driver should be more clear when there is mal-format of hostname or service -- Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Priority: Minor Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12343) Error message of Swift driver should be more clear when there is mal-format of hostname and service
Chen He created HADOOP-12343: Summary: Error message of Swift driver should be more clear when there is mal-format of hostname and service Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He The Swift driver reports "Invalid swift hostname 'null', hostname must in form container.service" if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be clearer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12343) Error message of Swift driver should be more clear when there is mal-format of hostname and service
[ https://issues.apache.org/jira/browse/HADOOP-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12343: - Priority: Minor (was: Major) Error message of Swift driver should be more clear when there is mal-format of hostname and service --- Key: HADOOP-12343 URL: https://issues.apache.org/jira/browse/HADOOP-12343 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.7.1 Reporter: Chen He Assignee: Chen He Priority: Minor Swift driver reports:Invalid swift hostname 'null', hostname must in form container.service if the container name does not follow RFC952. However, the container or service name is not 'null'. The error message should be more clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12291) Add support for nested groups in LdapGroupsMapping
[ https://issues.apache.org/jira/browse/HADOOP-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648746#comment-14648746 ] Chen He commented on HADOOP-12291: -- +1 for the idea. Add support for nested groups in LdapGroupsMapping -- Key: HADOOP-12291 URL: https://issues.apache.org/jira/browse/HADOOP-12291 Project: Hadoop Common Issue Type: Improvement Components: security Reporter: Gautam Gopalakrishnan When using {{LdapGroupsMapping}} with Hadoop, nested groups are not supported. So for example if user {{jdoe}} is part of group A which is a member of group B, the group mapping currently returns only group A. Currently this facility is available with {{ShellBasedUnixGroupsMapping}} and SSSD (or similar tools) but would be good to have this feature as part of {{LdapGroupsMapping}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11762) Enable swift distcp to secure HDFS
[ https://issues.apache.org/jira/browse/HADOOP-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635583#comment-14635583 ] Chen He commented on HADOOP-11762: -- Thanks, [~aw] Enable swift distcp to secure HDFS -- Key: HADOOP-11762 URL: https://issues.apache.org/jira/browse/HADOOP-11762 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1, 2.5.1, 2.6.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0 Attachments: HADOOP-11762.000.patch Even though we can use dfs -put or dfs -cp to move data between Swift and secured HDFS, it is impractical for moving a huge amount of data like 10TB or larger. The current Hadoop code will result in: java.lang.IllegalArgumentException: java.net.UnknownHostException: container.swiftdomain Since SwiftNativeFileSystem does not support the token feature right now, it would be reasonable to override the getCanonicalServiceName method like other filesystem implementations (S3FileSystem, S3AFileSystem) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
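The override being discussed is small; a sketch of what it could look like in SwiftNativeFileSystem, mirroring the S3-style filesystems mentioned above (illustrative only, not the attached patch).
{code}
// Fragment that would live inside SwiftNativeFileSystem: returning null tells the
// token machinery there is no delegation-token service for a swift:// URI, so a
// secure-cluster distcp does not try to resolve "container.service" as a host.
@Override
public String getCanonicalServiceName() {
  return null;   // no token service associated with a swift:// URI
}
{code}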
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635613#comment-14635613 ] Chen He commented on HADOOP-12038: -- Assigning back to [~ste...@apache.org], who is more experienced in writing unit tests for the Swift driver. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Steve Loughran Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12038: - Assignee: Steve Loughran (was: Chen He) SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Steve Loughran Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633747#comment-14633747 ] Chen He commented on HADOOP-12038: -- Apologies, I will work on it tonight. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
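The guard under discussion is small; a self-contained sketch of the exists-check is below. The class and method names are made up, and the real change would live in SwiftNativeOutputStream's temp-file cleanup.
{code}
import java.io.File;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class TempFileCleanup {
  private static final Log LOG = LogFactory.getLog(TempFileCleanup.class);

  // Sketch of the guard: only attempt the delete, and only warn, when the
  // temporary file actually exists.
  static void deleteIfPresent(File file) {
    if (file != null && file.exists() && !file.delete()) {
      LOG.warn("Could not delete " + file);
    }
  }
}
{code}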
[jira] [Commented] (HADOOP-10615) FileInputStream in JenkinsHash#main() is never closed
[ https://issues.apache.org/jira/browse/HADOOP-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628351#comment-14628351 ] Chen He commented on HADOOP-10615: -- Sure, thank you for the suggestion, [~ozawa]. FileInputStream in JenkinsHash#main() is never closed - Key: HADOOP-10615 URL: https://issues.apache.org/jira/browse/HADOOP-10615 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10615-2.patch, HADOOP-10615.patch {code} FileInputStream in = new FileInputStream(args[0]); {code} The above FileInputStream is not closed upon exit of main. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-10615) FileInputStream in JenkinsHash#main() is never closed
[ https://issues.apache.org/jira/browse/HADOOP-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-10615: - Attachment: HADOOP-10615.003.patch patch updated. FileInputStream in JenkinsHash#main() is never closed - Key: HADOOP-10615 URL: https://issues.apache.org/jira/browse/HADOOP-10615 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10615-2.patch, HADOOP-10615.003.patch, HADOOP-10615.patch {code} FileInputStream in = new FileInputStream(args[0]); {code} The above FileInputStream is not closed upon exit of main. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
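For reference, the usual shape of the fix, sketched with try-with-resources; argument handling and the actual hashing are trimmed, so this is not the patched JenkinsHash#main itself.
{code}
import java.io.FileInputStream;
import java.io.IOException;

public class JenkinsHashCloseSketch {
  public static void main(String[] args) throws IOException {
    // try-with-resources closes the stream on every exit path, including exceptions
    try (FileInputStream in = new FileInputStream(args[0])) {
      byte[] buf = new byte[512];
      long total = 0;
      int bytesRead;
      while ((bytesRead = in.read(buf)) > 0) {
        total += bytesRead;   // the real JenkinsHash#main hashes each block here
      }
      System.out.println(total + " bytes read");
    }
  }
}
{code}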
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602430#comment-14602430 ] Chen He commented on HADOOP-9565: - The ._COPYING_ mechanism actually has a problem. I created the bug HDFS-8673. Add a Blobstore interface to add to blobstore FileSystems - Key: HADOOP-9565 URL: https://issues.apache.org/jira/browse/HADOOP-9565 Project: Hadoop Common Issue Type: Improvement Components: fs, fs/s3, fs/swift Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch We can make the fact that some {{FileSystem}} implementations are really blobstores, with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to add to them. This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores implement at server-side copy operation as a substitute for rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12046) Avoid creating ._COPYING_ temporary file when copying file to Swift file system
[ https://issues.apache.org/jira/browse/HADOOP-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12046: - Attachment: Copy Large file to Swift using Hadoop Client.png Avoid creating ._COPYING_ temporary file when copying file to Swift file system - Key: HADOOP-12046 URL: https://issues.apache.org/jira/browse/HADOOP-12046 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Attachments: Copy Large file to Swift using Hadoop Client.png When copy file from HDFS or local to another file system implementation, in CommandWithDestination.java, it creates a temp file by adding suffix ._COPYING_. Once file is successfully copied, it will remove the suffix by rename(). try { PathData tempTarget = target.suffix(._COPYING_); targetFs.setWriteChecksum(writeChecksum); targetFs.writeStreamToFile(in, tempTarget, lazyPersist); targetFs.rename(tempTarget, target); } finally { targetFs.close(); // last ditch effort to ensure temp file is removed } It is not costly in HDFS. However, if copy to Swift file system, the rename process is to create a new file. It is not efficient if users copy a lot of files to swift file system. I did some tests, for a 1G file copying to swift, it will take 10% more time. We should only do the copy one time for Swift file system. Changes should be limited to the Swift driver level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12046) Avoid creating ._COPYING_ temporary file when copying file to Swift file system
[ https://issues.apache.org/jira/browse/HADOOP-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12046: - Attachment: (was: 屏幕快照 2015-06-21 下午10.38.34.png) Avoid creating ._COPYING_ temporary file when copying file to Swift file system - Key: HADOOP-12046 URL: https://issues.apache.org/jira/browse/HADOOP-12046 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He When copy file from HDFS or local to another file system implementation, in CommandWithDestination.java, it creates a temp file by adding suffix ._COPYING_. Once file is successfully copied, it will remove the suffix by rename(). try { PathData tempTarget = target.suffix(._COPYING_); targetFs.setWriteChecksum(writeChecksum); targetFs.writeStreamToFile(in, tempTarget, lazyPersist); targetFs.rename(tempTarget, target); } finally { targetFs.close(); // last ditch effort to ensure temp file is removed } It is not costly in HDFS. However, if copy to Swift file system, the rename process is to create a new file. It is not efficient if users copy a lot of files to swift file system. I did some tests, for a 1G file copying to swift, it will take 10% more time. We should only do the copy one time for Swift file system. Changes should be limited to the Swift driver level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12046) Avoid creating ._COPYING_ temporary file when copying file to Swift file system
[ https://issues.apache.org/jira/browse/HADOOP-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12046: - Attachment: 屏幕快照 2015-06-21 下午10.38.34.png Attaching the file copy process when a user tries to copy a file (larger than 5GB) from HDFS to Swift using the current Swift driver. Avoid creating ._COPYING_ temporary file when copying file to Swift file system - Key: HADOOP-12046 URL: https://issues.apache.org/jira/browse/HADOOP-12046 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He When copy file from HDFS or local to another file system implementation, in CommandWithDestination.java, it creates a temp file by adding suffix ._COPYING_. Once file is successfully copied, it will remove the suffix by rename(). try { PathData tempTarget = target.suffix(._COPYING_); targetFs.setWriteChecksum(writeChecksum); targetFs.writeStreamToFile(in, tempTarget, lazyPersist); targetFs.rename(tempTarget, target); } finally { targetFs.close(); // last ditch effort to ensure temp file is removed } It is not costly in HDFS. However, if copy to Swift file system, the rename process is to create a new file. It is not efficient if users copy a lot of files to swift file system. I did some tests, for a 1G file copying to swift, it will take 10% more time. We should only do the copy one time for Swift file system. Changes should be limited to the Swift driver level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584724#comment-14584724 ] Chen He commented on HADOOP-9565: - Thank you for the explanation, [~ste...@apache.org]. You are right, the ._COPYING_ is added by CLI (distcp refers to this also) and hardcoded there. IMHO, it may be more flexible if we can choose to not add ._COPYING_ by setting a parameter like OBJECTSTORE_NO_RENAME_IN_COPY. Add a Blobstore interface to add to blobstore FileSystems - Key: HADOOP-9565 URL: https://issues.apache.org/jira/browse/HADOOP-9565 Project: Hadoop Common Issue Type: Improvement Components: fs, fs/s3, fs/swift Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch We can make the fact that some {{FileSystem}} implementations are really blobstores, with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to add to them. This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores implement at server-side copy operation as a substitute for rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584226#comment-14584226 ] Chen He commented on HADOOP-9565: - Thank you for the contribution, [~ste...@apache.org]. I have a question about the copying process. Why do we have to add ._COPYING_ for Swift storage, which uses the filename to decide the location of file blocks? Another potential problem is the rename process. It may cause a YARN timeout (10 mins) if we use distcp to copy a large file. Add a Blobstore interface to add to blobstore FileSystems - Key: HADOOP-9565 URL: https://issues.apache.org/jira/browse/HADOOP-9565 Project: Hadoop Common Issue Type: Improvement Components: fs, fs/s3, fs/swift Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch We can make the fact that some {{FileSystem}} implementations are really blobstores, with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to add to them. This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores implement at server-side copy operation as a substitute for rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12086) Swift driver reports NPE if user try to create a dir without name
Chen He created HADOOP-12086: Summary: Swift driver reports NPE if user try to create a dir without name Key: HADOOP-12086 URL: https://issues.apache.org/jira/browse/HADOOP-12086 Project: Hadoop Common Issue Type: Bug Components: fs/swift Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He hadoop fs -mkdir swift://container.Provider/ -mkdir: Fatal internal error java.lang.NullPointerException at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.makeAbsolute(SwiftNativeFileSystem.java:691) at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:197) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) at org.apache.hadoop.fs.shell.Mkdir.processNonexistentPath(Mkdir.java:73) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:262) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190) at org.apache.hadoop.fs.shell.Command.run(Command.java:154) at org.apache.hadoop.fs.FsShell.run(FsShell.java:287) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:340) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
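A defensive check along these lines would turn the NPE into a clean error message; this is a sketch only, the helper name is invented, and the real root cause in makeAbsolute() may call for a different fix.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

class SwiftPathChecks {
  // Sketch of a guard that could run before makeAbsolute()/getFileStatus():
  // reject paths with no name component instead of falling through to an NPE.
  static void checkPathHasName(Path p) throws IOException {
    String raw = (p == null) ? null : p.toUri().getPath();
    if (raw == null || raw.isEmpty() || raw.equals("/")) {
      throw new IOException("Invalid swift path: a directory name is required, got " + p);
    }
  }
}
{code}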
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573375#comment-14573375 ] Chen He commented on HADOOP-12038: -- Thank you very much, Steve. I will come up with a patch. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12046) Avoid creating ._COPYING_ temporary file when copying file to Swift file system
[ https://issues.apache.org/jira/browse/HADOOP-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568155#comment-14568155 ] Chen He commented on HADOOP-12046: -- Thank you for the quick reply, [~steve_l]. I will read HADOOP-9565, it sounds interesting. Avoid creating ._COPYING_ temporary file when copying file to Swift file system - Key: HADOOP-12046 URL: https://issues.apache.org/jira/browse/HADOOP-12046 Project: Hadoop Common Issue Type: New Feature Components: fs/swift Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He When copy file from HDFS or local to another file system implementation, in CommandWithDestination.java, it creates a temp file by adding suffix ._COPYING_. Once file is successfully copied, it will remove the suffix by rename(). try { PathData tempTarget = target.suffix(._COPYING_); targetFs.setWriteChecksum(writeChecksum); targetFs.writeStreamToFile(in, tempTarget, lazyPersist); targetFs.rename(tempTarget, target); } finally { targetFs.close(); // last ditch effort to ensure temp file is removed } It is not costly in HDFS. However, if copy to Swift file system, the rename process is to create a new file. It is not efficient if users copy a lot of files to swift file system. I did some tests, for a 1G file copying to swift, it will take 10% more time. We should only do the copy one time for Swift file system. Changes should be limited to the Swift driver level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566906#comment-14566906 ] Chen He commented on HADOOP-12038: -- I didn't find any data loss when it reports this warning. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566900#comment-14566900 ] Chen He commented on HADOOP-12038: -- Thanks, [~steve_l]. Actually, the OpenStack community has another version of the Swift driver for Hadoop. It supports files larger than 5GB; what I did is add those functions to the hadoop-openstack module. I don't know why the Hadoop community does not have a similar solution. The error was reported during my test process. The OpenStack driver is called Sahara. It breaks a file (larger than 5GB) into configurable chunks (default 4.6GB), creates a manifest folder in the Swift file system, and points it to those chunks. However, since the Swift rename process creates a new file instead of changing the original file's name (because the Swift DHT uses the name to do the hash), it is inefficient for large file copying. I resolved this issue and will create a JIRA and post a patch later. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
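Very roughly, the chunk-plus-manifest layout described above looks like the sketch below (all names, paths, and defaults here are illustrative, not Sahara's actual code):
{code}
import java.util.ArrayList;
import java.util.List;

public class SegmentPlan {
  // Split an object of the given length into fixed-size segments and return
  // the segment names a manifest object would point at.
  static List<String> planSegments(String objectName, long length, long segmentSize) {
    List<String> segments = new ArrayList<>();
    int part = 0;
    for (long offset = 0; offset < length; offset += segmentSize) {
      segments.add(String.format("%s/%06d", objectName, part++)); // e.g. big.dat/000000
    }
    return segments;
  }

  public static void main(String[] args) {
    long tenGiB = 10L * 1024 * 1024 * 1024;
    long chunk = (long) (4.6 * 1024 * 1024 * 1024);  // the ~4.6GB default mentioned above
    System.out.println(planSegments("big.dat", tenGiB, chunk));
    // -> [big.dat/000000, big.dat/000001, big.dat/000002]
  }
}
{code}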
[jira] [Created] (HADOOP-12046) Avoid creating ._COPYING_ temporary file when copying file to Swift file system
Chen He created HADOOP-12046: Summary: Avoid creating ._COPYING_ temporary file when copying file to Swift file system Key: HADOOP-12046 URL: https://issues.apache.org/jira/browse/HADOOP-12046 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He When copying a file from HDFS or local to another file system implementation, CommandWithDestination.java creates a temp file by adding the suffix ._COPYING_. Once the file is successfully copied, it removes the suffix via rename(). try { PathData tempTarget = target.suffix("._COPYING_"); targetFs.setWriteChecksum(writeChecksum); targetFs.writeStreamToFile(in, tempTarget, lazyPersist); targetFs.rename(tempTarget, target); } finally { targetFs.close(); // last ditch effort to ensure temp file is removed } This is not costly in HDFS. However, when copying to the Swift file system, the rename is implemented by creating a new file, which is not efficient if users copy a lot of files to Swift. I did some tests: copying a 1G file to Swift takes about 10% more time. We should only do the copy once for the Swift file system. Changes should be limited to the Swift driver level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565178#comment-14565178 ] Chen He commented on HADOOP-12038: -- Hi [~steve_l], thank you for the comment. I should describe it more clearly. In the hadoop-openstack module, it says there are no unit tests but only functional tests, since they depend on a Swift server. If there is no Swift server, all those unit tests in the hadoop-openstack module will not be executed. I met this issue when copying a large file to the Swift server. It returned this warning because the tmp file had already been deleted. I will try to add a unit test following the same pattern as the previous unit tests. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10661) Ineffective user/passsword check in FTPFileSystem#initialize()
[ https://issues.apache.org/jira/browse/HADOOP-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562378#comment-14562378 ] Chen He commented on HADOOP-10661: -- Actually, if the user specifies user as "" and password as "", the current code: Preconditions.checkState(userPasswdInfo.length > 1, "Invalid username / password"); will report "Invalid username / password" because userPasswdInfo.length is 0. Ineffective user/passsword check in FTPFileSystem#initialize() -- Key: HADOOP-10661 URL: https://issues.apache.org/jira/browse/HADOOP-10661 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10661.patch, HADOOP-10661.patch Here is related code: {code} userAndPassword = (conf.get("fs.ftp.user." + host, null) + ":" + conf.get("fs.ftp.password." + host, null)); if (userAndPassword == null) { throw new IOException("Invalid user/passsword specified"); } {code} The intention seems to be checking that username / password should not be null. But due to the presence of colon, the above check is not effective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
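The behaviour described in this comment (and the next two) can be checked with a small standalone demo of how the concatenated string splits; this is not the FTPFileSystem code, just an illustration:
{code}
public class UserPasswdSplitDemo {
  static void show(String user, String passwd) {
    String userAndPassword = user + ":" + passwd;        // a null becomes the literal "null"
    String[] userPasswdInfo = userAndPassword.split(":");
    System.out.println("'" + userAndPassword + "' -> length " + userPasswdInfo.length);
  }

  public static void main(String[] args) {
    show("", "");       // ":"          -> length 0, checkState(length > 1) fails
    show("alice", "");  // "alice:"     -> length 1, checkState(length > 1) fails
    show(null, null);   // "null:null"  -> length 2, checkState(length > 1) passes
  }
}
{code}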
[jira] [Commented] (HADOOP-10661) Ineffective user/passsword check in FTPFileSystem#initialize()
[ https://issues.apache.org/jira/browse/HADOOP-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562379#comment-14562379 ] Chen He commented on HADOOP-10661: -- If we only provide user but passwd is "", it will also report invalid since userPasswdInfo.length is 1. Ineffective user/passsword check in FTPFileSystem#initialize() -- Key: HADOOP-10661 URL: https://issues.apache.org/jira/browse/HADOOP-10661 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10661.patch, HADOOP-10661.patch Here is related code: {code} userAndPassword = (conf.get("fs.ftp.user." + host, null) + ":" + conf.get("fs.ftp.password." + host, null)); if (userAndPassword == null) { throw new IOException("Invalid user/passsword specified"); } {code} The intention seems to be checking that username / password should not be null. But due to the presence of colon, the above check is not effective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10661) Ineffective user/passsword check in FTPFileSystem#initialize()
[ https://issues.apache.org/jira/browse/HADOOP-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562401#comment-14562401 ] Chen He commented on HADOOP-10661: -- Hi [~te...@apache.org], looks like the current Preconditions check can guarantee that neither user nor passwd is "". But null is allowed. Do we still need to fix this? Ineffective user/passsword check in FTPFileSystem#initialize() -- Key: HADOOP-10661 URL: https://issues.apache.org/jira/browse/HADOOP-10661 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10661.patch, HADOOP-10661.patch Here is related code: {code} userAndPassword = (conf.get("fs.ftp.user." + host, null) + ":" + conf.get("fs.ftp.password." + host, null)); if (userAndPassword == null) { throw new IOException("Invalid user/passsword specified"); } {code} The intention seems to be checking that username / password should not be null. But due to the presence of colon, the above check is not effective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12038: - Status: Patch Available (was: Open) SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12038: - Attachment: HADOOP-12038.000.patch SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
Chen He created HADOOP-12038: Summary: SwiftNativeOutputStream should check whether a file exists or not before deleting Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Reporter: Chen He Assignee: Chen He Priority: Minor 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-12038: - Affects Version/s: 2.7.0 SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12038) SwiftNativeOutputStream should check whether a file exists or not before deleting
[ https://issues.apache.org/jira/browse/HADOOP-12038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562246#comment-14562246 ] Chen He commented on HADOOP-12038: -- The change is too simple and may not need a unit test. SwiftNativeOutputStream should check whether a file exists or not before deleting - Key: HADOOP-12038 URL: https://issues.apache.org/jira/browse/HADOOP-12038 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Chen He Assignee: Chen He Priority: Minor Attachments: HADOOP-12038.000.patch 15/05/27 15:27:03 WARN snative.SwiftNativeOutputStream: Could not delete /tmp/hadoop-root/output-3695386887711395289.tmp It should check whether the file exists or not before deleting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10661) Ineffective user/passsword check in FTPFileSystem#initialize()
[ https://issues.apache.org/jira/browse/HADOOP-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561976#comment-14561976 ] Chen He commented on HADOOP-10661: -- Sure, I will do it tonight. Thank you for reminding me, [~ted_yu]. Ineffective user/passsword check in FTPFileSystem#initialize() -- Key: HADOOP-10661 URL: https://issues.apache.org/jira/browse/HADOOP-10661 Project: Hadoop Common Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Labels: BB2015-05-TBR Attachments: HADOOP-10661.patch, HADOOP-10661.patch Here is related code: {code} userAndPassword = (conf.get("fs.ftp.user." + host, null) + ":" + conf.get("fs.ftp.password." + host, null)); if (userAndPassword == null) { throw new IOException("Invalid user/passsword specified"); } {code} The intention seems to be checking that username / password should not be null. But due to the presence of colon, the above check is not effective. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11775) Fix Javadoc typos in hadoop-openstack module
[ https://issues.apache.org/jira/browse/HADOOP-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535230#comment-14535230 ] Chen He commented on HADOOP-11775: -- Could anyone review this ticket? Thanks! Fix Javadoc typos in hadoop-openstack module Key: HADOOP-11775 URL: https://issues.apache.org/jira/browse/HADOOP-11775 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Chen He Assignee: Yanjun Wang Priority: Trivial Attachments: HADOOP-11775.000.patch, HADOOP-11775.001.patch Some typos are listed below but not limited to: SwiftNativeFileSystemObject.java /** * Initalize the filesystem store -this creates the REST client binding. * * @param fsURI URI of the filesystem, which is used to map to the filesystem-specific * options in the configuration file * @param configuration configuration * @throws IOException on any failure. */ SwiftNativeFileSystem.java /** * Low level method to do a deep listing of all entries, not stopping * at the next directory entry. This is to let tests be confident that * recursive deletes c really are working. * @param path path to recurse down * @param newest ask for the newest data, potentially slower than not. * @return a potentially empty array of file status * @throws IOException any problem */ /** * Low-level operation to also set the block size for this operation * @param path the file name to open * @param bufferSize the size of the buffer to be used. * @param readBlockSize how big should the read blockk/buffer size be? * @return the input stream * @throws FileNotFoundException if the file is not found * @throws IOException any IO problem */ SwiftRestClient.java /** * Converts Swift path to URI to make request. * This is public for unit testing * * @param path path to object * @param endpointURI damain url e.g. http://domain.com * @return valid URI for object */ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11775) Fix Javadoc typos in hadoop-openstack module
[ https://issues.apache.org/jira/browse/HADOOP-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535908#comment-14535908 ] Chen He commented on HADOOP-11775: -- [~ajisakaa], would you mind taking a look at this ticket? Thank you. Fix Javadoc typos in hadoop-openstack module Key: HADOOP-11775 URL: https://issues.apache.org/jira/browse/HADOOP-11775 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Chen He Assignee: Yanjun Wang Priority: Trivial Labels: BB2015-05-RFC Attachments: HADOOP-11775.000.patch, HADOOP-11775.001.patch Some typos are listed below but not limited to: SwiftNativeFileSystemObject.java /** * Initalize the filesystem store -this creates the REST client binding. * * @param fsURI URI of the filesystem, which is used to map to the filesystem-specific * options in the configuration file * @param configuration configuration * @throws IOException on any failure. */ SwiftNativeFileSystem.java /** * Low level method to do a deep listing of all entries, not stopping * at the next directory entry. This is to let tests be confident that * recursive deletes c really are working. * @param path path to recurse down * @param newest ask for the newest data, potentially slower than not. * @return a potentially empty array of file status * @throws IOException any problem */ /** * Low-level operation to also set the block size for this operation * @param path the file name to open * @param bufferSize the size of the buffer to be used. * @param readBlockSize how big should the read blockk/buffer size be? * @return the input stream * @throws FileNotFoundException if the file is not found * @throws IOException any IO problem */ SwiftRestClient.java /** * Converts Swift path to URI to make request. * This is public for unit testing * * @param path path to object * @param endpointURI damain url e.g. http://domain.com * @return valid URI for object */ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11811) Fix typos in hadoop-project/pom.xml
Chen He created HADOOP-11811: Summary: Fix typos in hadoop-project/pom.xml Key: HADOOP-11811 URL: https://issues.apache.org/jira/browse/HADOOP-11811 Project: Hadoop Common Issue Type: Bug Reporter: Chen He Priority: Trivial <!-- These 2 versions are defined here becuase they are used --> <!-- JDIFF generation from embedded ant in the antrun plugin --> etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11786) Fix Javadoc typos in org.apache.hadoop.fs.FileSystem
Chen He created HADOOP-11786: Summary: Fix Javadoc typos in org.apache.hadoop.fs.FileSystem Key: HADOOP-11786 URL: https://issues.apache.org/jira/browse/HADOOP-11786 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Chen He Assignee: Yanjun Wang Priority: Trivial /** * Resets all statistics to 0. * * In order to reset, we add up all the thread-local statistics data, and * set rootData to the negative of that. * * This may seem like a counterintuitive way to reset the statsitics. Why * can't we just zero out all the thread-local data? Well, thread-local * data can only be modified by the thread that owns it. If we tried to * modify the thread-local data from this thread, our modification might get * interleaved with a read-modify-write operation done by the thread that * owns the data. That would result in our update getting lost. * * The approach used here avoids this problem because it only ever reads * (not writes) the thread-local data. Both reads and writes to rootData * are done under the lock, so we're free to modify rootData from any thread * that holds the lock. */ etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
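The quoted Javadoc explains the reset trick; as a toy sketch of the same idea (not the actual FileSystem.Statistics implementation, just an illustration), the reset records the negative of the per-thread sums instead of zeroing counters owned by other threads:
{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class StatsResetSketch {
  private final AtomicLong rootBytes = new AtomicLong();  // written only under the lock
  private final List<AtomicLong> threadLocalBytes;        // each counter owned by one thread

  StatsResetSketch(List<AtomicLong> perThreadCounters) {
    this.threadLocalBytes = perThreadCounters;
  }

  synchronized long total() {
    long sum = rootBytes.get();
    for (AtomicLong t : threadLocalBytes) {
      sum += t.get();                                      // reads only, never writes
    }
    return sum;
  }

  synchronized void reset() {
    long sum = 0;
    for (AtomicLong t : threadLocalBytes) {
      sum += t.get();                                      // never touch other threads' data
    }
    rootBytes.set(-sum);                                   // future totals now start from 0
  }
}
{code}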
[jira] [Updated] (HADOOP-11775) Fix Javadoc typos in hadoop-openstack module
[ https://issues.apache.org/jira/browse/HADOOP-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HADOOP-11775: - Summary: Fix Javadoc typos in hadoop-openstack module (was: Fix typos in Swift related Javadoc) Fix Javadoc typos in hadoop-openstack module Key: HADOOP-11775 URL: https://issues.apache.org/jira/browse/HADOOP-11775 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Chen He Assignee: Yanjun Wang Priority: Trivial Attachments: HADOOP-11775.000.patch Some typos are listed below but not limited to: SwiftNativeFileSystemObject.java /** * Initalize the filesystem store -this creates the REST client binding. * * @param fsURI URI of the filesystem, which is used to map to the filesystem-specific * options in the configuration file * @param configuration configuration * @throws IOException on any failure. */ SwiftNativeFileSystem.java /** * Low level method to do a deep listing of all entries, not stopping * at the next directory entry. This is to let tests be confident that * recursive deletes c really are working. * @param path path to recurse down * @param newest ask for the newest data, potentially slower than not. * @return a potentially empty array of file status * @throws IOException any problem */ /** * Low-level operation to also set the block size for this operation * @param path the file name to open * @param bufferSize the size of the buffer to be used. * @param readBlockSize how big should the read blockk/buffer size be? * @return the input stream * @throws FileNotFoundException if the file is not found * @throws IOException any IO problem */ SwiftRestClient.java /** * Converts Swift path to URI to make request. * This is public for unit testing * * @param path path to object * @param endpointURI damain url e.g. http://domain.com * @return valid URI for object */ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11775) Fix typos in Swift related Javadoc
[ https://issues.apache.org/jira/browse/HADOOP-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388928#comment-14388928 ] Chen He commented on HADOOP-11775: -- Thank you for the patch, [~kelwyj]. I think this JIRA also includes fixing typos in all Javadocs under the hadoop-openstack project. Such as: org.apache.hadoop.fs.swift.http.ExceptionDiags.java /** * Variant of Hadoop Netutils exception wrapping with URI awareness and * available in branch-1 too. */ org.apache.hadoop.fs.swift.http.HttpBodyContent.java /** * build a body response * @param inputStream input stream from the operatin * @param contentLength length of content; may be -1 for don't know */ etc. Looking forward to seeing your updated patch. Fix typos in Swift related Javadoc -- Key: HADOOP-11775 URL: https://issues.apache.org/jira/browse/HADOOP-11775 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Chen He Assignee: Yanjun Wang Priority: Trivial Attachments: HADOOP-11775.000.patch Some typos are listed below but not limited to: SwiftNativeFileSystemObject.java /** * Initalize the filesystem store -this creates the REST client binding. * * @param fsURI URI of the filesystem, which is used to map to the filesystem-specific * options in the configuration file * @param configuration configuration * @throws IOException on any failure. */ SwiftNativeFileSystem.java /** * Low level method to do a deep listing of all entries, not stopping * at the next directory entry. This is to let tests be confident that * recursive deletes c really are working. * @param path path to recurse down * @param newest ask for the newest data, potentially slower than not. * @return a potentially empty array of file status * @throws IOException any problem */ /** * Low-level operation to also set the block size for this operation * @param path the file name to open * @param bufferSize the size of the buffer to be used. * @param readBlockSize how big should the read blockk/buffer size be? * @return the input stream * @throws FileNotFoundException if the file is not found * @throws IOException any IO problem */ SwiftRestClient.java /** * Converts Swift path to URI to make request. * This is public for unit testing * * @param path path to object * @param endpointURI damain url e.g. http://domain.com * @return valid URI for object */ -- This message was sent by Atlassian JIRA (v6.3.4#6332)