[
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512247#comment-15512247
]
Chris Douglas commented on HDFS-7878:
-
bq. is this somehow going to change end user APIs so a path isn't enough to
refer to things?
No, I'll try to summarize.
This JIRA proposes an API for {{FileSystem}} that exposes HDFS open-by-inode.
Not every implementation can enforce HDFS semantics precisely, but many can
support an "open with verification" API that improves on the TOCTOU races
common to most applications. These would mostly be new APIs.
While v05 proposed a {{long}} as the fileId, implementations other than HDFS
use different metadata (mostly strings; matching the {{FileStatus}} metadata is
often possible). If the {{FileHandle}} were exposed as a type, then
implementations could (opaquely) embed metadata in {{FileStatus}} for
"consistent" operations. v06 added an {{InodeId}} type (since renamed
{{FileHandle}}, {{PathHandle}}, or {{BikeShed}}).
This metadata can be encoded as a new field in {{FileStatus}}, which would be
API compatible, but serialized instances would not work across major versions.
While this seems like a reasonable jump to make in 3.x, it could cause some
pain. This aspect is discussed in HDFS-6984.
In this JIRA, we're discussing user-facing APIs for consistently opening a
file, a use case [~sershe] needs in Hive and HDFS-9806 needs for correctness.
As [~cmccabe] pointed out, we also want to consider consistent handling of
directories and symlinks as we define this API.
In favor of augmenting {{FileStatus}}: it's simple, it's probably what most
users expect, and it's serializable. It would be a natural, unsurprising API
for create/rename/delete/listFileStatus. That said, it's also significantly
larger than an 8-byte fileId, but implementations can detect crossed streams
(i.e., requesting a fileId from the wrong {{FileSystem}}; {{ViewFS}} can use it
for demux).
In favor of {{open(FileHandle)}}, we could add
{{FileSystem#createHandle(FileStatus)}} that _may_ use an RPC to generate a
serializable instance. These could be the minimal, serializable metadata to
refer to that inode.
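To make the two options concrete, here is a toy, self-contained sketch of the
open-with-verification idea. This is not the actual Hadoop API: {{PathHandle}},
{{createHandle}}, and the mock classes below are illustrative assumptions only.
A handle pins the inode id captured at stat time, so a later open can detect
that the path was replaced (the TOCTOU race) instead of silently reading the
new file:

```java
import java.util.HashMap;
import java.util.Map;

public class PathHandleSketch {
    static class FileStatus {
        final String path; final long inodeId;
        FileStatus(String path, long inodeId) { this.path = path; this.inodeId = inodeId; }
    }
    // Minimal, serializable metadata referring to one inode on one FileSystem.
    static class PathHandle {
        final long inodeId; final String fsId;
        PathHandle(long inodeId, String fsId) { this.inodeId = inodeId; this.fsId = fsId; }
    }
    static class MockFileSystem {
        final String fsId; long nextInode = 1;
        final Map<String, FileStatus> byPath = new HashMap<>();
        final Map<Long, String> contentByInode = new HashMap<>();
        MockFileSystem(String fsId) { this.fsId = fsId; }
        void create(String path, String content) {
            long id = nextInode++;
            byPath.put(path, new FileStatus(path, id));
            contentByInode.put(id, content);
        }
        FileStatus getFileStatus(String path) { return byPath.get(path); }
        // In a real implementation this _may_ need an RPC; here it is local.
        PathHandle createHandle(FileStatus st) { return new PathHandle(st.inodeId, fsId); }
        // Open with verification: fail if the inode no longer backs any file.
        String open(PathHandle h) {
            if (!h.fsId.equals(fsId))   // detect "crossed streams" (wrong FileSystem)
                throw new IllegalArgumentException("handle from wrong FileSystem");
            String content = contentByInode.get(h.inodeId);
            if (content == null)
                throw new IllegalStateException("file was replaced or deleted");
            return content;
        }
        void overwrite(String path, String content) {
            FileStatus old = byPath.remove(path);
            if (old != null) contentByInode.remove(old.inodeId);
            create(path, content);
        }
    }
}
```

Opening by path after a concurrent overwrite would return the new contents;
opening by the stale handle fails loudly, which is the improvement the API
aims for.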
ping [~fabbri], [~eddyxu]
> API - expose an unique file identifier
> --
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Labels: BB2015-05-TBR
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID
> could be derived from block IDs, to account for appends...
> This is useful, e.g., as a per-file cache key, to make sure the cache stays
> correct when a file is overwritten.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512154#comment-15512154
]
Rakesh R commented on HDFS-10800:
-
Thanks [~umamaheswararao], +1 (non-binding), LGTM.
Pending Jenkins report.
> [SPS]: Daemon thread in Namenode to find blocks placed in other storage than
> what the policy specifies
> --
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch,
> HDFS-10800-HDFS-10285-05.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which scans the requested files' blocks to find any placed
> in a different storage in the DNs than the related policy specifies.
> The idea is:
> # When a user asks to satisfy the storage policy on some files/dirs, they are
> tracked in the NN; the StoragePolicySatisfier thread then picks files one by
> one and checks for blocks that might be placed in a different storage in the
> DN than the storage policy expects.
> # After checking all blocks, it constructs the data structures with the
> required information to move a block from one storage to another.
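The two steps above can be sketched as follows. This is a minimal, hypothetical
illustration: class and method names such as {{processNextFile}} and
{{BlockMovingInfo}}'s shape are assumptions, not the actual patch.

```java
import java.util.*;
import java.util.concurrent.*;

public class SpsSketch {
    enum StorageType { DISK, SSD, ARCHIVE }
    static class Block {
        final long id; final StorageType actual;
        Block(long id, StorageType actual) { this.id = id; this.actual = actual; }
    }
    // Step 2: the information required to move one block between storages.
    static class BlockMovingInfo {
        final long blockId; final StorageType from, to;
        BlockMovingInfo(long blockId, StorageType from, StorageType to) {
            this.blockId = blockId; this.from = from; this.to = to;
        }
    }
    // Step 1: files the user asked to satisfy are tracked (in the NN).
    final BlockingQueue<List<Block>> trackedFiles = new LinkedBlockingQueue<>();

    void satisfyStoragePolicy(List<Block> fileBlocks) { trackedFiles.add(fileBlocks); }

    // One iteration of the daemon thread: pick one file, compare each block's
    // actual storage with what the policy expects, and record the needed moves.
    List<BlockMovingInfo> processNextFile(StorageType expected) {
        List<Block> file = trackedFiles.poll();
        if (file == null) return Collections.emptyList();
        List<BlockMovingInfo> moves = new ArrayList<>();
        for (Block b : file)
            if (b.actual != expected)
                moves.add(new BlockMovingInfo(b.id, b.actual, expected));
        return moves;
    }
}
```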
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uma Maheswara Rao G updated HDFS-10800:
---
Attachment: HDFS-10800-HDFS-10285-05.patch
A minor update in the patch: I forgot to move blockMovingInfos inside the
computeAndAssign* API. Please review this patch.
> [SPS]: Daemon thread in Namenode to find blocks placed in other storage than
> what the policy specifies
> --
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch,
> HDFS-10800-HDFS-10285-05.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which scans the requested files' blocks to find any placed
> in a different storage in the DNs than the related policy specifies.
> The idea is:
> # When a user asks to satisfy the storage policy on some files/dirs, they are
> tracked in the NN; the StoragePolicySatisfier thread then picks files one by
> one and checks for blocks that might be placed in a different storage in the
> DN than the storage policy expects.
> # After checking all blocks, it constructs the data structures with the
> required information to move a block from one storage to another.
[
https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512098#comment-15512098
]
Yongjun Zhang commented on HDFS-10314:
--
Hi [~jingzhao],
Thanks again for your earlier feedback.
I'd like to share the details below about why I don't think your proposed
method is simpler. I hope this makes sense; please correct me if I'm wrong,
and elaborate where needed to help me understand better.
DistCp does two basic steps:
# based on the input, create the copyListing, which is a sequence file for
mapreduce; each entry contains the info needed to identify one source/target
pair plus file attribute info
# throw the sequence file to the mapreduce job
Step 2 is relatively stable these days; mostly we manipulate step 1 based on
the input.
"-diff s1 s2" replaced the original step 1 with a new step 1:
* 1.1 compute snapshot diff,
* 1.2 figure out the rename/delete operation's source and target, based on the
snapshot diff info
* 1.3 apply the rename/delete to the target path
* 1.4 figure out the add/modification operation's source and target, based on
the snapshot diff info
* 1.5 create copyListing based on step 1.4
*The tricky parts* are 1.2 and 1.4, and the order of applying the rename/delete
operations in step 1.3. With HDFS-7535 and HDFS-8828, a framework has been
implemented in DistCp that does the new step 1. What I did was to reuse the
framework.
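The new step 1 (1.1–1.5) can be sketched roughly as below. This assumes a
simplified snapshot-diff entry type; the real DistCp classes and ordering
logic (HDFS-7535/HDFS-8828) are more involved.

```java
import java.util.*;

public class DiffSyncSketch {
    enum Op { RENAME, DELETE, CREATE, MODIFY }
    static class DiffEntry {
        final Op op; final String source; final String target; // target is used by RENAME
        DiffEntry(Op op, String source, String target) {
            this.op = op; this.source = source; this.target = target;
        }
    }
    // Steps 1.2/1.3: apply renames and deletes to the target namespace
    // (this is one of the tricky parts: the order of operations matters).
    static void applyRenamesAndDeletes(List<DiffEntry> diff, Map<String, String> targetFs) {
        for (DiffEntry e : diff) {
            if (e.op == Op.RENAME) {
                String data = targetFs.remove(e.source);
                if (data != null) targetFs.put(e.target, data);
            } else if (e.op == Op.DELETE) {
                targetFs.remove(e.source);
            }
        }
    }
    // Steps 1.4/1.5: build the copyListing from the adds and modifications.
    static List<String> buildCopyListing(List<DiffEntry> diff) {
        List<String> listing = new ArrayList<>();
        for (DiffEntry e : diff)
            if (e.op == Op.CREATE || e.op == Op.MODIFY) listing.add(e.source);
        return listing;
    }
}
```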
Now the questions:
# With what you proposed, I don't see how the tricky parts I listed above are
simplified. Since you suggested not touching the existing DistCp
implementation, I thought you meant to rewrite the code that does the tricky
parts, which is not simpler.
# Which step in your proposal will generate the copyListing, step 3 or step 4?
** If it's step 3, how are we going to pass the result to distcp in step 4?
** If it's step 4, that means we need to calculate the snapshot diff again in
step 4 and redo the tricky manipulation there. That doesn't look simpler, and
it probably adds an extra access to the NN.
# Would you please share the specific problems you see with my implementation,
other than that you think your proposal would be simpler? I really hope you
can do that.
Thanks much.
> A new tool to sync current HDFS view to specified snapshot
> --
>
> Key: HDFS-10314
> URL: https://issues.apache.org/jira/browse/HDFS-10314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tools
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-10314.001.patch
>
>
> HDFS-9820 proposed adding -rdiff switch to distcp, as a reversed operation of
> -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps
> around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to unix/linux
> command "rsync". The "r" here means remote.
> The syntax that simulate -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync
> {code}
> This command ensures is newer than .
> I think, in the future, we can add another command to provide the
> functionality of the -diff switch of distcp.
> {code}
> sync
> {code}
> that ensures is older than .
> Thanks [~jingzhao].
[
https://issues.apache.org/jira/browse/HDFS-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula updated HDFS-10886:
Summary: Replace "fs.default.name" with "fs.defaultFS" in viewfs document
(was: Replace "fs.default.name" with "fs.defaultFS" in viewfs d)
> Replace "fs.default.name" with "fs.defaultFS" in viewfs document
>
>
> Key: HDFS-10886
> URL: https://issues.apache.org/jira/browse/HDFS-10886
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation, federation
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-10886.patch
>
>
> Since we have two sections, update the *New World – Federation and ViewFs*
> section, not *The Old World (Prior to Federation)* section.
> "fs.default.name" is deprecated; we should use "fs.defaultFS"
[
https://issues.apache.org/jira/browse/HDFS-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula updated HDFS-10886:
Summary: Replace "fs.default.name" with "fs.defaultFS" in viewfs d (was:
Replace "fs.default.name" with "fs.defaultFS" in viewfs)
> Replace "fs.default.name" with "fs.defaultFS" in viewfs d
> -
>
> Key: HDFS-10886
> URL: https://issues.apache.org/jira/browse/HDFS-10886
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation, federation
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-10886.patch
>
>
> Since we have two sections, update the *New World – Federation and ViewFs*
> section, not *The Old World (Prior to Federation)* section.
> "fs.default.name" is deprecated; we should use "fs.defaultFS"
[
https://issues.apache.org/jira/browse/HDFS-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula reassigned HDFS-10886:
---
Assignee: Brahma Reddy Battula
> Replace "fs.default.name" with "fs.defaultFS" in viewfs
> ---
>
> Key: HDFS-10886
> URL: https://issues.apache.org/jira/browse/HDFS-10886
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation, federation
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-10886.patch
>
>
> Since we have two sections, update the *New World – Federation and ViewFs*
> section, not *The Old World (Prior to Federation)* section.
> "fs.default.name" is deprecated; we should use "fs.defaultFS"
[
https://issues.apache.org/jira/browse/HDFS-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula updated HDFS-10886:
Status: Patch Available (was: Open)
> Replace "fs.default.name" with "fs.defaultFS" in viewfs
> ---
>
> Key: HDFS-10886
> URL: https://issues.apache.org/jira/browse/HDFS-10886
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation, federation
>Reporter: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-10886.patch
>
>
> Since we have two sections, update the *New World – Federation and ViewFs*
> section, not *The Old World (Prior to Federation)* section.
> "fs.default.name" is deprecated; we should use "fs.defaultFS"
[
https://issues.apache.org/jira/browse/HDFS-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brahma Reddy Battula updated HDFS-10886:
Attachment: HDFS-10886.patch
Uploaded the patch; kindly review.
> Replace "fs.default.name" with "fs.defaultFS" in viewfs
> ---
>
> Key: HDFS-10886
> URL: https://issues.apache.org/jira/browse/HDFS-10886
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation, federation
>Reporter: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-10886.patch
>
>
> Since we have two sections, update the *New World – Federation and ViewFs*
> section, not *The Old World (Prior to Federation)* section.
> "fs.default.name" is deprecated; we should use "fs.defaultFS"
Brahma Reddy Battula created HDFS-10886:
---
Summary: Replace "fs.default.name" with "fs.defaultFS" in viewfs
Key: HDFS-10886
URL: https://issues.apache.org/jira/browse/HDFS-10886
Project: Hadoop HDFS
Issue Type: Bug
Components: documentation, federation
Reporter: Brahma Reddy Battula
Priority: Minor
Since we have two sections, update the *New World – Federation and ViewFs*
section, not *The Old World (Prior to Federation)* section.
"fs.default.name" is deprecated; we should use "fs.defaultFS"
[
https://issues.apache.org/jira/browse/HDFS-10881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Kace updated HDFS-10881:
--
Attachment: HDFS-10881-HDFS-10467-003.patch
Updated patch incorporating your feedback. I addressed all of the checkstyle
issues, including adding package-info.java files to each package, except one
that is a false warning. I also updated the naming conventions to focus on
"records" being the data entities and to use "clazz" for class parameters.
> Federation State Store Driver API
> -
>
> Key: HDFS-10881
> URL: https://issues.apache.org/jira/browse/HDFS-10881
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Jason Kace
>Assignee: Jason Kace
> Attachments: HDFS-10881-HDFS-10467-001.patch,
> HDFS-10881-HDFS-10467-002.patch, HDFS-10881-HDFS-10467-003.patch
>
>
> The API interfaces and minimal classes required to support a state store data
> backend such as ZooKeeper or a file system.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511922#comment-15511922
]
Yuanbo Liu commented on HDFS-10883:
---
[~xyao] Thanks for your comments.
{quote}
Also, we need to update the document regarding nested encryption...
{quote}
How about adding a subsection under TransparentEncryption.md#Architecture
named "Operations with a nested encryption zone"?
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> Please see the attachment to know how to reproduce this issue.
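A minimal sketch of the consistent rule being discussed (per the comment that
"the second behavior is more consistent"): for any path inside an encryption
zone, the trash root is derived from the EZ root, so "/" and "/test" get the
same answer. The method and parameter names are illustrative, not the actual
DistributedFileSystem implementation.

```java
public class TrashRootSketch {
    // Returns the trash root for `path`, given the enclosing encryption-zone
    // root (or null if the path is not in an EZ) and the current user.
    static String getTrashRoot(String path, String ezRoot, String user) {
        boolean inEz = ezRoot != null
            && (path.equals(ezRoot)
                || path.startsWith(ezRoot.equals("/") ? "/" : ezRoot + "/"));
        if (inEz) {
            // Per-EZ trash: <EZ root>/.Trash/<user>, normalizing the "/" EZ root.
            String prefix = ezRoot.equals("/") ? "" : ezRoot;
            return prefix + "/.Trash/" + user;
        }
        return "/user/" + user + "/.Trash";  // non-EZ default home trash
    }
}
```

With this rule, getTrashRoot("/") and getTrashRoot("/test") agree when "/" is
the encryption zone, removing the inconsistency shown above.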
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kai Zheng updated HDFS-10800:
-
Description:
This JIRA is for implementing a daemon thread called StoragePolicySatisfier in
the namenode, which scans the requested files' blocks to find any placed in a
different storage in the DNs than the related policy specifies.
The idea is:
# When a user asks to satisfy the storage policy on some files/dirs, they are
tracked in the NN; the StoragePolicySatisfier thread then picks files one by
one and checks for blocks that might be placed in a different storage in the DN
than the storage policy expects.
# After checking all blocks, it constructs the data structures with the
required information to move a block from one storage to another.
was:
This JIRA is for implementing a daemon thread called StoragePolicySatisfier in
the namenode, which should scan the asked files' blocks which were placed in
wrong storages in DNs.
The idea is:
# When a user calls satisfy-storage-policy on some files/dirs, they should be
tracked in the NN, and then the StoragePolicyDaemon thread will pick files one
by one and check the blocks which might have been placed in a different storage
in the DN than what the NN is expecting.
# After checking all, it should also construct the data structures for
the required information to move a block from one storage to another.
> [SPS]: Daemon thread in Namenode to find blocks placed in other storage than
> what the policy specifies
> --
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which scans the requested files' blocks to find any placed
> in a different storage in the DNs than the related policy specifies.
> The idea is:
> # When a user asks to satisfy the storage policy on some files/dirs, they are
> tracked in the NN; the StoragePolicySatisfier thread then picks files one by
> one and checks for blocks that might be placed in a different storage in the
> DN than the storage policy expects.
> # After checking all blocks, it constructs the data structures with the
> required information to move a block from one storage to another.
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kai Zheng updated HDFS-10800:
-
Summary: [SPS]: Daemon thread in Namenode to find blocks placed in other
storage than what the policy specifies (was: [SPS]: Daemon thread in Namenode
to find the blocks placed in other storage than what the policy is expecting)
> [SPS]: Daemon thread in Namenode to find blocks placed in other storage than
> what the policy specifies
> --
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which should scan the asked files' blocks which were placed
> in wrong storages in DNs.
> The idea is:
> # When a user calls satisfy-storage-policy on some files/dirs, they should be
> tracked in the NN, and then the StoragePolicyDaemon thread will pick files
> one by one and check the blocks which might have been placed in a different
> storage in the DN than what the NN is expecting.
> # After checking all, it should also construct the data structures for
> the required information to move a block from one storage to another.
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kai Zheng updated HDFS-10800:
-
Summary: [SPS]: Daemon thread in Namenode to find the blocks placed in
other storage than what the policy is expecting (was: [SPS]: Storage Policy
Satisfier daemon thread in Namenode to find the blocks which were placed in
other storages than what NN is expecting.)
> [SPS]: Daemon thread in Namenode to find the blocks placed in other storage
> than what the policy is expecting
> -
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which should scan the asked files' blocks which were placed
> in wrong storages in DNs.
> The idea is:
> # When a user calls satisfy-storage-policy on some files/dirs, they should be
> tracked in the NN, and then the StoragePolicyDaemon thread will pick files
> one by one and check the blocks which might have been placed in a different
> storage in the DN than what the NN is expecting.
> # After checking all, it should also construct the data structures for
> the required information to move a block from one storage to another.
[
https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511831#comment-15511831
]
Subru Krishnan edited comment on HDFS-10467 at 9/22/16 2:00 AM:
Thanks [~jakace] and [~goiri] for the refactored patch. I made a quick pass in
the context of HADOOP-13378 and I think that we can represent the YARN
{{FederationStateStore}} using the generic {{StateStoreDriver}} you guys have
defined. Personally, I prefer the push mechanism we have in YARN-3671, as it's
much simpler than the pull mechanism proposed here, though I do agree both
achieve the same result.
A couple of comments based on my quick scan:
* We should add versioning to the generic {{StateStoreDriver}}. Refer to
[FederationStateStore|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/FederationStateStore.java].
* Use _jcache_ instead of writing custom key-caches based on
_ConcurrentHashMaps_. In fact, I feel we can refactor the (jcache-based) cache
in
[FederationStateStoreFacade|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/utils/FederationStateStoreFacade.java]
and use it across both efforts.
* We should reuse the *RecordFactory* and supporting infrastructure from YARN
as opposed to coming up with a parallel structure in HDFS.
* Lastly, I would suggest using [Curator|http://curator.apache.org/] for the
*ZooKeeper* implementation, as we have moved to it in YARN (YARN-4438 and
follow-up work).
was (Author: subru):
Thanks [~jakace] and [~goiri] for the refactored patch. I made a quick pass in
the context of HADOOP-13378 and I think that we can represent the YARN
{{FederationStateStore}} using the generic {{StateStoreDriver}} you guys have
defined. Personally, I prefer the push mechanism we have in YARN-3671, as it's
much simpler than the pull mechanism proposed here, though I do agree both
achieve the same result.
A couple of comments based on my quick scan:
* We should add versioning to the generic {{StateStoreDriver}}. Refer to
[FederationStateStore|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/FederationStateStore.java].
* Use _jcache_ instead of writing custom key-caches based on
_ConcurrentHashMaps_. In fact, I feel we can refactor the (jcache-based) cache
in
[FederationStateStoreFacade|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/utils/FederationStateStoreFacade.java]
and use it across both efforts.
* Lastly, I would suggest using [Curator|http://curator.apache.org/] for the
*ZooKeeper* implementation, as we have moved to it in YARN (YARN-4438 and
follow-up work).
> Router-based HDFS federation
>
>
> Key: HDFS-10467
> URL: https://issues.apache.org/jira/browse/HDFS-10467
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
>Affects Versions: 2.7.2
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch,
> HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.
[
https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511831#comment-15511831
]
Subru Krishnan commented on HDFS-10467:
---
Thanks [~jakace] and [~goiri] for the refactored patch. I made a quick pass in
the context of HADOOP-13378 and I think that we can represent the YARN
{{FederationStateStore}} using the generic {{StateStoreDriver}} you guys have
defined. Personally, I prefer the push mechanism we have in YARN-3671, as it's
much simpler than the pull mechanism proposed here, though I do agree both
achieve the same result.
A couple of comments based on my quick scan:
* We should add versioning to the generic {{StateStoreDriver}}. Refer to
[FederationStateStore|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/FederationStateStore.java].
* Use _jcache_ instead of writing custom key-caches based on
_ConcurrentHashMaps_. In fact, I feel we can refactor the (jcache-based) cache
in
[FederationStateStoreFacade|https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/utils/FederationStateStoreFacade.java]
and use it across both efforts.
* Lastly, I would suggest using [Curator|http://curator.apache.org/] for the
*ZooKeeper* implementation, as we have moved to it in YARN (YARN-4438 and
follow-up work).
> Router-based HDFS federation
>
>
> Key: HDFS-10467
> URL: https://issues.apache.org/jira/browse/HDFS-10467
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
>Affects Versions: 2.7.2
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch,
> HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511783#comment-15511783
]
Yuanbo Liu edited comment on HDFS-10883 at 9/22/16 1:37 AM:
[~andrew.wang] Sure, I can work on it if you don't mind.
{quote}
I think the second behavior is more consistent...
{quote}
[~jojochuang]'s comment reminds me that things are going to get complicated if
we take {{getTrashRoot("/")}} as a special case. I will change my description.
was (Author: yuanbo):
[~andrew.wang] Sure, I can work on it if you don't mind.
{quote}
I think the second behavior is more consistent...
{quote}
[~jojochuang]'s comment reminds me that things're gonna be complicated if we
take {{etTrashRoot("/")}} as a special case. I will change my description.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511783#comment-15511783
]
Yuanbo Liu commented on HDFS-10883:
---
[~andrew.wang] Sure, I can work on it if you don't mind.
{quote}
I think the second behavior is more consistent...
{quote}
[~jojochuang]'s comment reminds me that things are going to get complicated if
we take {{getTrashRoot("/")}} as a special case. I will change my description.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Description:
Let's say root path ("/") is the encryption zone, and there is a file called
"/test" in root path.
{code}
dfs.getTrashRoot(new Path("/"))
{code}
returns "/user/$USER/.Trash",
while
{code}
dfs.getTrashRoot(new Path("/test"))
{code}
returns "/.Trash/$USER".
Please see the attachment to know how to reproduce this issue.
was:
Let's say root path ("/") is the encryption zone, and there is a file called
"/test" in root path.
{code}
dfs.getTrashRoot(new Path("/"))
{code}
returns "/user/$USER/.Trash",
while
{code}
dfs.getTrashRoot(new Path("/test"))
{code}
returns "/.Trash/$USER".
The second behavior is not correct. Since root path is the encryption zone,
which means all files/directories in DFS are encrypted, it's more reasonable
to return "/user/$USER/.Trash" no matter what the path is.
Please see the attachment to know how to reproduce this issue.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HDFS-9063:
-
Attachment: test.002.patch
[~jingzhao],
Tried querying {{ContentSummary}} for the directory which has snapshots, and
I still don't see ContentSummary's {{SnapshotDirectoryCount}} reflecting any
snapshots taken. Is there another way of querying the ContentSummary that
will reflect the right snapshot directory count? Please clarify.
Output from attached test v002.
{noformat}
ContentSummary: /foo - Dir: 2, SnapDir: 0, SnapFile: 0
ContentSummary: /foo/bar - Dir: 1, SnapDir: 0, SnapFile: 0
Count /foo - none inf none inf
20 0 /foo
Created Snaphot: /foo/.snapshot/s1
Created Snaphot: /foo/.snapshot/s2
Count /foo - none inf none inf
20 0 /foo
ContentSummary: /foo - Dir: 2, SnapDir: 0, SnapFile: 0
ContentSummary: /foo/bar - Dir: 1, SnapDir: 0, SnapFile: 0
ContentSummary: /foo/.snapshot/s1 - Dir: 2, SnapDir: 0, SnapFile: 0
ContentSummary: /foo/.snapshot/s2 - Dir: 2, SnapDir: 0, SnapFile: 0
{noformat}
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Labels: incompatible
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch, test.002.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
> In the meanwhile, the snapshot diff list size is no longer included in the
> computation result. This can bring minor incompatibility but is consistent
> with the change in HDFS-7728.
[
https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511451#comment-15511451
]
Lei (Eddy) Xu commented on HDFS-9390:
-
Ping [~mingma] Is there any update on this issue?
> Block management for maintenance states
> ---
>
> Key: HDFS-9390
> URL: https://issues.apache.org/jira/browse/HDFS-9390
> Project: Hadoop HDFS
> Issue Type: Sub-task
>Reporter: Ming Ma
>
> When a node is transitioned to, stays in, or is transitioned out of the
> maintenance state, we need to make sure the blocks on that node are properly
> handled.
> * When nodes are put into maintenance, they will first go to
> ENTERING_MAINTENANCE, and blocks must be minimally replicated before the
> nodes are transitioned to IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. A maintenance
> replica will remain in the BlockMaps and is thus still considered valid from
> the block-replication point of view. In other words, putting a node into
> “maintenance” mode won’t trigger the BlockManager to replicate its blocks.
> * Do not invalidate replicas on nodes under maintenance. When a file's
> replication factor is reduced, the NN needs to invalidate some replicas; it
> should exclude nodes under maintenance in that handling.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operations.
> * Do not allocate any new blocks on nodes under maintenance.
> * Have the Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance from the DN cache.
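The admin-state transitions described above can be sketched as a small state machine. This is a toy model only, under the assumption that ENTERING_MAINTENANCE completes once replicas are minimally replicated; the class and method names ({{MaintenanceTracker}}, {{tryCompleteTransition}}) are hypothetical and are not the actual BlockManager/DatanodeAdminManager code:

```java
// Toy model of the maintenance-state transitions (hypothetical names; not
// the real HDFS implementation).
public class MaintenanceTracker {
    public enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

    private AdminState state = AdminState.NORMAL;

    /** Putting a node into maintenance first moves it to ENTERING_MAINTENANCE. */
    public void startMaintenance() {
        state = AdminState.ENTERING_MAINTENANCE;
    }

    /**
     * Transition to IN_MAINTENANCE only once blocks on the node are minimally
     * replicated elsewhere; returns true if the transition happened.
     */
    public boolean tryCompleteTransition(int liveReplicas, int minReplicas) {
        if (state == AdminState.ENTERING_MAINTENANCE && liveReplicas >= minReplicas) {
            state = AdminState.IN_MAINTENANCE;
            return true;
        }
        return false;
    }

    /** Maintenance replicas stay in the block map and remain valid. */
    public boolean isReplicaValid() {
        return true; // never invalidated or re-replicated while in maintenance
    }

    public AdminState getState() {
        return state;
    }
}
```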
[
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lei (Eddy) Xu updated HDFS-7877:
Assignee: Ming Ma
> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch,
> Supportmaintenancestatefordatanodes-2.pdf,
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature
> is mostly independent of the upgrade domain feature, it is better to track it
> under a separate jira. The design and a draft patch will be available soon.
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uma Maheswara Rao G updated HDFS-10800:
---
Attachment: HDFS-10800-HDFS-10285-04.patch
Thanks a lot, [~rakeshr], for the thoughtful suggestions. I have incorporated
all of them in this patch. Please review.
> [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks
> which were placed in other storages than what NN is expecting.
> ---
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch, HDFS-10800-HDFS-10285-04.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which should scan the requested files' blocks that were
> placed in the wrong storages on DNs.
> The idea is:
> # When a user calls satisfyStoragePolicy on some files/dirs, they should be
> tracked in the NN, and then the StoragePolicySatisfier thread will pick the
> files one by one and check for blocks that may have been placed in a
> different storage on the DN than what the NN expects.
> # After checking all of them, it should also construct the data structures
> with the information required to move a block from one storage to another.
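The scan-and-plan loop described above can be sketched as follows. This is a toy model under stated assumptions: the types ({{SatisfierSketch}}, {{BlockInfo}}, {{Move}}) and the single-expected-storage simplification are hypothetical; the real patch works against NN block metadata and per-policy storage requirements:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the StoragePolicySatisfier idea: drain tracked files, compare
// each block's actual storage with the expected one, and build a move plan.
public class SatisfierSketch {
    public enum StorageType { DISK, SSD, ARCHIVE }

    public static class BlockInfo {
        final String block;
        final StorageType actual;
        public BlockInfo(String block, StorageType actual) {
            this.block = block;
            this.actual = actual;
        }
    }

    public static class Move {
        public final String block;
        public final StorageType from, to;
        Move(String block, StorageType from, StorageType to) {
            this.block = block;
            this.from = from;
            this.to = to;
        }
    }

    /** Files the user asked to satisfy, tracked in the NN (toy queue). */
    private final Queue<List<BlockInfo>> tracked = new ArrayDeque<>();

    public void track(List<BlockInfo> fileBlocks) {
        tracked.add(fileBlocks);
    }

    /** One pass of the daemon: pick files one by one and plan block moves. */
    public List<Move> scanOnce(StorageType expected) {
        List<Move> plan = new ArrayList<>();
        while (!tracked.isEmpty()) {
            for (BlockInfo b : tracked.poll()) {
                if (b.actual != expected) {
                    plan.add(new Move(b.block, b.actual, expected));
                }
            }
        }
        return plan;
    }
}
```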
[
https://issues.apache.org/jira/browse/HDFS-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511295#comment-15511295
]
Inigo Goiri commented on HDFS-10882:
[~jakace], a few comments:
* Could you extend the comment for {{FederationProtocolBase}}?
* In {{FederationProtocolFactory}}, could we use record and clazz instead of
entity/entry and entityType? Could you also add javadoc for {{useProtobuf()}}?
* Do we need the constructor in {{FederationPBHelper}}?
* In general, the PBHelper is a little different from what HDFS has. However,
this matches the YARN approach; ideally, we should use the YARN
{{RecordFactory}}, but as that is not in commons, this is the closest approach.
* Not sure what to do about the Federation prefix in classes like
FederationPBHelper; it seems superfluous.
> Federation State Store Interface API
>
>
> Key: HDFS-10882
> URL: https://issues.apache.org/jira/browse/HDFS-10882
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Jason Kace
>Assignee: Jason Kace
> Attachments: HDFS-10882-HDFS-10467-001.patch,
> HDFS-10882-HDFS-10467-002.patch
>
>
> The minimal classes and interfaces required to create state store internal
> data APIs using protobuf serialization. This is a pre-requisite for higher
> level APIs such as the registration API and the mount table API.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511227#comment-15511227
]
Jing Zhao commented on HDFS-9063:
-
Agree. Added the incompatible label and also updated the release note. Thanks
for looking into this, [~manojg].
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Labels: incompatible
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
> In the meanwhile, the snapshot diff list size is no longer included in the
> computation result. This can bring minor incompatibility but is consistent
> with the change in HDFS-7728.
[
https://issues.apache.org/jira/browse/HDFS-10881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511225#comment-15511225
]
Inigo Goiri commented on HDFS-10881:
[~jakace], this is a good starting point for the rest of the patches.
A few comments:
* Can you fix the checkstyles? Basically, it's just adding the
package-info.java and adding periods to the javadocs.
* There are a bunch of javadoc warnings (e.g.,
{{StateStoreDriver#getIdentifier}}) and missing javadocs (e.g., {{BaseRecord}}).
* In {{FederationStateStoreService}} could you add the full comment of what the
State Store does (something like what you had in HDFS-10630 but without going
into particular implementation) and move the current class comment into the
internal TODO?
* Do we need the constructor in {{FederationStateStoreUtils}}?
* Across the code, you have entity and entry for the name of some variables;
can we make them just record? When we do entityType, it should be clazz to
follow the rest of the HDFS/YARN code.
* In {{StateStoreUnavailableException}}, can we make it {{serialVersionUID =
1L;}}?
* In the {{QueryResult}} constructor, don't break the line.
* In {{QueryResult}}, you have a typo in {{driverTimstamp}}.
> Federation State Store Driver API
> -
>
> Key: HDFS-10881
> URL: https://issues.apache.org/jira/browse/HDFS-10881
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Jason Kace
>Assignee: Jason Kace
> Attachments: HDFS-10881-HDFS-10467-001.patch,
> HDFS-10881-HDFS-10467-002.patch
>
>
> The API interfaces and minimal classes required to support a state store data
> backend such as ZooKeeper or a file system.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-9063:
Release Note:
This jira made the following changes:
1. Fix a bug to exclude newly-created files from quota usage calculation for a
snapshot path.
2. The number of snapshots is no longer counted as part of the directory count
in the getContentSummary result.
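The count change in the release note can be illustrated with a toy model. The class and method names ({{CountSketch}}, {{oldDirCount}}, etc.) are hypothetical and only mimic the before/after accounting; the real logic lives in the namenode's content summary computation:

```java
// Toy illustration of the HDFS-9063 count change: snapshots are surfaced in
// their own counter instead of being folded into the directory count.
public class CountSketch {
    /** Pre-HDFS-9063: the snapshot diff list size inflated the dir count. */
    public static long oldDirCount(long realDirs, long snapshots) {
        return realDirs + snapshots;
    }

    /** Post-HDFS-9063: the directory count excludes snapshots... */
    public static long newDirCount(long realDirs) {
        return realDirs;
    }

    /** ...which are reported separately. */
    public static long snapshotCount(long snapshots) {
        return snapshots;
    }
}
```

This also explains the incompatibility discussed in the comments: a caller comparing the directory count before and after taking snapshots sees different totals across the two versions.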
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Labels: incompatible
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
> In the meanwhile, the snapshot diff list size is no longer included in the
> computation result. This can bring minor incompatibility but is consistent
> with the change in HDFS-7728.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-9063:
Description:
The current getContentSummary implementation does not take into account the
snapshot path, thus if we have the following ops:
1. create dirs /foo/bar
2. take snapshot s1 on /foo
3. create a 1 byte file /foo/bar/baz
then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
which is incorrect since the 1 byte file is not included in snapshot s1.
In the meanwhile, the snapshot diff list size is no longer included in the
computation result. This can bring minor incompatibility but is consistent with
the change in HDFS-7728.
was:
The current getContentSummary implementation does not take into account the
snapshot path, thus if we have the following ops:
1. create dirs /foo/bar
2. take snapshot s1 on /foo
3. create a 1 byte file /foo/bar/baz
then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
which is incorrect since the 1 byte file is not included in snapshot s1.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Labels: incompatible
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
> In the meanwhile, the snapshot diff list size is no longer included in the
> computation result. This can bring minor incompatibility but is consistent
> with the change in HDFS-7728.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-9063:
Labels: incompatible (was: )
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Labels: incompatible
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511199#comment-15511199
]
Manoj Govindassamy commented on HDFS-9063:
--
Thanks for looking into this. That's right, {{ContentSummary}} does have good
info about snapshots, and I do feel it's better to keep dirCount and snapCount
separate. However, the behavior of the {{Count}} command has changed, and it
would be good to mark this jira as incompatible.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Mackrory updated HDFS-10877:
-
Attachment: HDFS-10877.002.patch
> Make RemoteEditLogManifest.committedTxnId optional in Protocol Buffers
> --
>
> Key: HDFS-10877
> URL: https://issues.apache.org/jira/browse/HDFS-10877
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: qjm
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HDFS-10877.001.patch, HDFS-10877.002.patch
>
>
> HDFS-10519 introduced a new field in the RemoteEditLogManifest message. It
> can be made optional to improve wire-compatibility with previous versions.
[
https://issues.apache.org/jira/browse/HDFS-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511159#comment-15511159
]
Masatake Iwasaki commented on HDFS-9333:
Thanks for reviewing, [~andrew.wang].
> Some tests using MiniDFSCluster errored complaining port in use
> ---
>
> Key: HDFS-9333
> URL: https://issues.apache.org/jira/browse/HDFS-9333
> Project: Hadoop HDFS
> Issue Type: Test
> Components: test
>Reporter: Kai Zheng
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-9333.001.patch, HDFS-9333.002.patch,
> HDFS-9333.003.patch
>
>
> Ref. the following:
> {noformat}
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.483 sec
> <<< FAILURE! - in
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
> testRead(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped)
> Time elapsed: 11.021 sec <<< ERROR!
> java.net.BindException: Port in use: localhost:49333
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
> at
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:884)
> at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:826)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:821)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:675)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:883)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:862)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1555)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2015)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1996)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.doTestRead(TestBlockTokenWithDFS.java:539)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testRead(TestBlockTokenWithDFSStriped.java:62)
> {noformat}
> Another one:
> {noformat}
> Tests run: 5, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 9.859 sec <<<
> FAILURE! - in org.apache.hadoop.hdfs.tools.TestDFSZKFailoverController
> testFailoverAndBackOnNNShutdown(org.apache.hadoop.hdfs.tools.TestDFSZKFailoverController)
> Time elapsed: 0.41 sec <<< ERROR!
> java.net.BindException: Problem binding to [localhost:10021]
> java.net.BindException: Address already in use; For more details see:
> http://wiki.apache.org/hadoop/BindException
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at org.apache.hadoop.ipc.Server.bind(Server.java:469)
> at org.apache.hadoop.ipc.Server$Listener.(Server.java:695)
> at org.apache.hadoop.ipc.Server.(Server.java:2464)
> at org.apache.hadoop.ipc.RPC$Server.(RPC.java:945)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:399)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:742)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:680)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:883)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:862)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1555)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1245)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1014)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:889)
> at
>
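Bind failures like those above usually come from tests hard-coding a port that another (or a previous) test still holds. One common remedy, sketched here with plain {{java.net}} (this is a general technique, not the actual fix applied in this jira), is to let the OS pick a free ephemeral port by binding to port 0:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Ask the kernel for a currently-free ephemeral port by binding to port 0,
// instead of hard-coding a port such as 49333 or 10021.
public class FreePort {
    public static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note there is still a small race between closing the probe socket and the server binding it, which is why restart-style tests often also retry on {{BindException}}.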
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511051#comment-15511051
]
Jing Zhao commented on HDFS-9063:
-
Ah, I see what you mean. Yes, we have a minor incompatibility here; please see
my first comment:
{quote}
In the meanwhile, the snapshot diff list size is no longer included in the
computation result. This can bring minor incompatibility but is consistent with
the change in HDFS-7728.
{quote}
I.e., since we already have a number in the content summary to indicate the
total number of snapshots, the number of snapshots is no longer added into the
directory number. To me the old behavior is more like a bug, but maybe we can
still mark this jira as incompatible.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511006#comment-15511006
]
Andrew Wang commented on HDFS-10883:
I think the second behavior is more consistent, because of the case Weichiu
mentioned with nested EZs. That is, {{getTrashRoot("/")}} should return
{{/.Trash/$USER}}. I don't know why there is a special case.
I tried making this change (removing the isRoot special case), and it looks
like we missed a test case asserting this behavior in TestEncryptionZones and
TestNestedEncryptionZones.
[~yuanbo] do you want to work on this? Else I can provide a patch.
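The inconsistency under discussion can be sketched as a toy model of the suspected special case. Everything here is hypothetical ({{TrashRootSketch}}, {{ezRootFor}}), not the actual DistributedFileSystem code, and it assumes "/" is itself an encryption zone, as in this report:

```java
// Toy model of getTrashRoot with and without the isRoot special case.
public class TrashRootSketch {
    /** Reported (inconsistent) behavior: "/" is special-cased. */
    public static String withSpecialCase(String path, String user) {
        if (path.equals("/")) {
            return "/user/" + user + "/.Trash";    // home-directory trash
        }
        return ezRootFor(path) + ".Trash/" + user; // in-zone trash
    }

    /** Proposed behavior: no special case; every path in the EZ is consistent. */
    public static String withoutSpecialCase(String path, String user) {
        return ezRootFor(path) + ".Trash/" + user;
    }

    /** Hypothetical helper: here the whole namespace is one EZ rooted at "/". */
    private static String ezRootFor(String path) {
        return "/";
    }
}
```

With the special case, {{getTrashRoot("/")}} and {{getTrashRoot("/test")}} disagree even though both paths live in the same encryption zone; without it, both return the in-zone trash root.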
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510852#comment-15510852
]
Manoj Govindassamy commented on HDFS-9063:
--
The attached test fails on the latest trunk 3.0.0-alpha2. I didn't see the test
pass after reverting the HDFS-8986 fix, but it passes after reverting the
HDFS-9063 fix. Let me know if my understanding is wrong.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HDFS-9063:
-
Attachment: test.001.patch
[~jingzhao],
Attached a test case which shows the change in behavior after the HDFS-9063
fix. The test does the following:
1. Count dirs and files in a directory /foo
2. Allow and take a snapshot of the directory /foo
3. Count dirs and files of the same directory /foo for which the snapshot was
taken
4. Verify that new_dir_count == 1 + old_dir_count, because the count should now
include the {{.snapshot}} directory. This verification fails after the
HDFS-9063 fix.
Let me know your thoughts on the test and the expected behavior.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch, test.001.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Kace updated HDFS-10882:
--
Attachment: HDFS-10882-HDFS-10467-002.patch
Updating the patch:
1) Fixed checkstyle (most), findbugs, and javadoc errors; updated a few javadoc
comments.
2) This patch contains no unit tests and is not fully functional on its own. The
code is used in a fully functional state, with unit tests, in HDFS-10630.
> Federation State Store Interface API
>
>
> Key: HDFS-10882
> URL: https://issues.apache.org/jira/browse/HDFS-10882
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Jason Kace
>Assignee: Jason Kace
> Attachments: HDFS-10882-HDFS-10467-001.patch,
> HDFS-10882-HDFS-10467-002.patch
>
>
> The minimal classes and interfaces required to create state store internal
> data APIs using protobuf serialization. This is a pre-requisite for higher
> level APIs such as the registration API and the mount table API.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510734#comment-15510734
]
Xiaoyu Yao commented on HDFS-10883:
---
[~yuanbo], thanks for reporting the issue and posting the patches.
bq. The second behavior is not correct. Since root path is the encryption zone,
which means all files/directories in DFS are encrypted, it's more reasonable
to return "/user/$USER/.Trash" no matter what the path is.
Taking a look at the history of changes to the getTrashRoot API, the special
case where the encryption zone is created on "/" was added by HDFS-9244 to
support nested encryption zones. Before that, the API always kept the trash of
an encryption zone under the ".Trash" directory of the encryption zone's
root. [~zhz]/[~andrew.wang], can you provide more context on why this special
case is needed?
Also, we need to update the documentation regarding nested encryption zone
support and its impact on rename and Trash support.
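For reference, the resolution rule being debated could be sketched like this (a standalone toy model with made-up helper names and rules, not the actual DistributedFileSystem.getTrashRoot code):

```java
// Toy sketch of a consistent trash-root rule for encryption zones (EZ).
// Not the real DistributedFileSystem.getTrashRoot implementation.
public class TrashRootDemo {
    // ezRoot: the encryption zone containing 'path', or null if none.
    static String trashRoot(String path, String ezRoot, String user) {
        if (ezRoot == null || ezRoot.equals("/")) {
            // No EZ, or the EZ is the root: use the per-user home trash,
            // regardless of which path inside "/" is asked about.
            return "/user/" + user + "/.Trash";
        }
        // Otherwise keep the trash inside the zone so renames stay in-zone.
        return ezRoot + "/.Trash/" + user;
    }

    public static void main(String[] args) {
        // With "/" as the EZ, both calls should agree (the reported bug is
        // that the second currently returns "/.Trash/$USER").
        String a = trashRoot("/", "/", "alice");
        String b = trashRoot("/test", "/", "alice");
        if (!a.equals(b)) throw new AssertionError();
        System.out.println(a);
    }
}
```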
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Kace updated HDFS-10881:
--
Attachment: HDFS-10881-HDFS-10467-002.patch
Updating the patch:
1) Findbugs and checkstyle fixes.
2) There are no unit tests associated with this patch. The code isn't fully
functional as is; it is a base template for later patches. The patch in
HDFS-10630 (built on top of this one) includes basic functionality unit tests
and is fully functional.
> Federation State Store Driver API
> -
>
> Key: HDFS-10881
> URL: https://issues.apache.org/jira/browse/HDFS-10881
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Jason Kace
>Assignee: Jason Kace
> Attachments: HDFS-10881-HDFS-10467-001.patch,
> HDFS-10881-HDFS-10467-002.patch
>
>
> The API interfaces and minimal classes required to support a state store data
> backend such as ZooKeeper or a file system.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510683#comment-15510683
]
Jing Zhao edited comment on HDFS-9063 at 9/21/16 6:00 PM:
--
[~manojg], the patch fixed a bug and did not change the semantics of the {{du}}
command. Before the patch, files that were created outside of a snapshot could
be included when calculating the count/usage of the snapshot. The patch fixed
this bug, and there is such an example in the description.
For your case, maybe HDFS-8986 is the one related? Which Hadoop version are you
using?
was (Author: jingzhao):
[~manojg], the patch fixed a bug and did not change the semantic of {{du}}
command. Before the patch, files that were created outside of a snapshot could
be included when calculating the count/usage of the snapshot. The patch fixed
this bug and there is such an example int the description.
For your case, maybe HDFS-8986 is the one related? Which Hadoop version are you
using?
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510683#comment-15510683
]
Jing Zhao commented on HDFS-9063:
-
[~manojg], the patch fixed a bug and did not change the semantics of the {{du}}
command. Before the patch, files that were created outside of a snapshot could
be included when calculating the count/usage of the snapshot. The patch fixed
this bug, and there is such an example in the description.
For your case, maybe HDFS-8986 is the one related? Which Hadoop version are you
using?
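The fixed semantics can be illustrated with a toy model: a snapshot records which files existed when it was taken, and a summary over the snapshot path should only count those (illustrative code only, not the namenode implementation):

```java
import java.util.*;

// Toy model: du over /foo/.snapshot/s1 must exclude files created after s1.
public class SnapshotDuDemo {
    public static void main(String[] args) {
        // live tree: file path -> length in bytes
        Map<String, Long> live = new LinkedHashMap<>();
        // 1. create dirs /foo/bar (no files yet)
        // 2. take snapshot s1 on /foo: remember the set of files at this point
        Set<String> s1 = new HashSet<>(live.keySet());
        // 3. create a 1-byte file afterwards
        live.put("/foo/bar/baz", 1L);

        long duFoo = live.values().stream().mapToLong(Long::longValue).sum();
        // Buggy summary: walks the current tree even under /foo/.snapshot/s1
        long duS1Buggy = duFoo;
        // Fixed summary: only files captured in the snapshot
        long duS1Fixed = live.entrySet().stream()
                .filter(e -> s1.contains(e.getKey()))
                .mapToLong(Map.Entry::getValue).sum();

        if (duS1Buggy != 1 || duS1Fixed != 0) throw new AssertionError();
        System.out.println("du /foo=" + duFoo + " du s1(fixed)=" + duS1Fixed);
    }
}
```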
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-10687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Inigo Goiri updated HDFS-10687:
---
Summary: Federation Membership State Store internal API (was: Federation
Membership State Store internal APIs)
> Federation Membership State Store internal API
> --
>
> Key: HDFS-10687
> URL: https://issues.apache.org/jira/browse/HDFS-10687
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs
>Reporter: Inigo Goiri
>Assignee: Jason Kace
> Attachments: HDFS-10467-HDFS-10687-001.patch
>
>
> The Federation Membership State encapsulates the information about the
> Namenodes of each sub-cluster that are participating in Federation. The
> information includes RPC and Web addresses. This information is stored in
> the State Store and later used by the Router to find data in the federation.
[
https://issues.apache.org/jira/browse/HDFS-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510568#comment-15510568
]
Manoj Govindassamy commented on HDFS-9063:
--
Hi [~jingzhao],
I am seeing a difference in behavior in the {{Count}} command before and after
this fix. The {{Count}} command used to account for the {{.snapshot}} directory,
but after this fix, _numDirs_ from the Count command is less than expected, as
it no longer includes the {{.snapshot}} directory. Is the change in behavior
intended? Please clarify.
> Correctly handle snapshot path for getContentSummary
>
>
> Key: HDFS-9063
> URL: https://issues.apache.org/jira/browse/HDFS-9063
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-9063.000.patch
>
>
> The current getContentSummary implementation does not take into account the
> snapshot path, thus if we have the following ops:
> 1. create dirs /foo/bar
> 2. take snapshot s1 on /foo
> 3. create a 1 byte file /foo/bar/baz
> then "du /foo" and "du /foo/.snapshot/s1" can report same results for "bar",
> which is incorrect since the 1 byte file is not included in snapshot s1.
[
https://issues.apache.org/jira/browse/HDFS-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510535#comment-15510535
]
Akira Ajisaka commented on HDFS-10866:
--
Thanks Konstantin for the update and thanks Zhe Zhang for the review.
Rethinking this, I'm +1 for the v1 patch. After re-importing the project, the
v1 patch worked in my environment. Hi [~zhz], would you verify the v1 patch?
Sorry for the back and forth.
> Fix Eclipse Java 8 compile errors related to generic parameters.
>
>
> Key: HDFS-10866
> URL: https://issues.apache.org/jira/browse/HDFS-10866
> Project: Hadoop HDFS
> Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: HDFS-10866.01.patch, HDFS-10866.02.patch, IntelliJ.png,
> Screen Shot 2016-09-19 at 1.50.22 PM.png
>
>
> Compilation with Java 8 in Eclipse returns errors, which are related to the
> use of generics. This does not affect command-line maven builds and is
> confirmed to be a [bug in
> Eclipse|https://bugs.eclipse.org/bugs/show_bug.cgi?id=497905#c1]. The fix is
> scheduled only for the next Eclipse release, so all of us using Eclipse now
> will see this error unless we fix it in the Hadoop code, which makes sense to
> me as it appears as a warning in any case.
[
https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510391#comment-15510391
]
Yongjun Zhang edited comment on HDFS-10314 at 9/21/16 4:11 PM:
---
Hi [~jingzhao],
For clarity, and as a recap, here is a comparison table between -diff and the
proposed -rdiff, which shows the symmetry:
||Comparison||-diff s1 s2 ||-rdiff s2 s1 ||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality| Given the target's current state is s1, make the target's
current state the same as the newer snapshot s2 | Given the target's current
state is s2, make the target's current state the same as the older snapshot s1 |
|Requirements| # the source and the target need to be different paths
# both the source and the target have snapshot s1 with exactly the same content
# the source has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s1
# the target doesn't have snapshot s2 | # the source and the target can be the
same or different paths
# both the source and the target have snapshot s1 with exactly the same content
# the target has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s2
# the source may or may not have snapshot s2 |
|Steps|# calculate the snapshot diff from s1 to s2 at the source
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s2 of the source to the target | #
calculate the snapshot diff from s2 to s1 at the target
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s1 of the source to the target |
The original thinking was to add -rdiff to distcp (solution A), but because of
the concern of confusing semantics, it was suggested to introduce a new command
here (solution B).
Thanks.
was (Author: yzhangal):
Hi [~jingzhao],
For clarity, and as a recap, here is a comparison table between -diff and the
proposed -rdiff, which shows the symmetry:
||Comparison||-diff s1 s2 ||-rdiff s2 s1 ||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality| Given the target's current state is s1, make the target's
current state the same as the newer snapshot s2 | Given the target's current
state is s2, make the target's current state the same as the older snapshot s1 |
|Requirements| # the source and the target need to be different paths
# both the source and the target have snapshot s1 with exactly the same content
# the source has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s1
# the target doesn't have snapshot s2 | # the source and the target can be the
same or different paths
# both the source and the target have snapshot s1 with exactly the same content
# the target has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s2
# the source may or may not have snapshot s2 |
|Steps|# calculate the snapshot diff from s1 to s2 at the source
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s1 of the source to the target | #
calculate the snapshot diff from s2 to s1 at the target
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s1 of the source to the target |
The original thinking was to add -rdiff to distcp (solution A), but because of
the concern of confusing semantics, it was suggested to introduce a new command
here (solution B).
Thanks.
> A new tool to sync current HDFS view to specified snapshot
> --
>
> Key: HDFS-10314
> URL: https://issues.apache.org/jira/browse/HDFS-10314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tools
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-10314.001.patch
>
>
> HDFS-9820 proposed adding -rdiff switch to distcp, as a reversed operation of
> -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps
> around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to unix/linux
> command "rsync". The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync
> {code}
> This command ensure is newer than .
> I think, in the future, we can add another command to provide the
> functionality of the -diff switch of distcp.
> {code}
> sync
> {code}
> that ensures is older than .
> Thanks [~jingzhao].
[
https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510391#comment-15510391
]
Yongjun Zhang commented on HDFS-10314:
--
Hi [~jingzhao],
For clarity, and as a recap, here is a comparison table between -diff and the
proposed -rdiff, which shows the symmetry:
||Comparison||-diff s1 s2 ||-rdiff s2 s1 ||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality| Given the target's current state is s1, make the target's
current state the same as the newer snapshot s2 | Given the target's current
state is s2, make the target's current state the same as the older snapshot s1 |
|Requirements| # the source and the target need to be different paths
# both the source and the target have snapshot s1 with exactly the same content
# the source has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s1
# the target doesn't have snapshot s2 | # the source and the target can be the
same or different paths
# both the source and the target have snapshot s1 with exactly the same content
# the target has snapshot s2
# s2 is newer than s1
# the target's current state is the same as s2
# the source may or may not have snapshot s2 |
|Steps|# calculate the snapshot diff from s1 to s2 at the source
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s1 of the source to the target | #
calculate the snapshot diff from s2 to s1 at the target
# apply the rename/delete part of the diff on the target
# copy the modified part of the diff from s1 of the source to the target |
The original thinking was to add -rdiff to distcp (solution A), but because of
the concern of confusing semantics, it was suggested to introduce a new command
here (solution B).
Thanks.
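The "Steps" row for -rdiff can be mimicked on a toy state model (maps from path to content; a sketch of the algorithm described above, not distcp code, and all names are illustrative):

```java
import java.util.*;

// Toy model of applying a reverse snapshot diff (-rdiff): given that the
// target's current state equals snapshot s2, restore the older snapshot s1.
public class RdiffDemo {
    public static void main(String[] args) {
        Map<String, String> s1 = new HashMap<>();
        s1.put("/a", "v1");
        s1.put("/b", "old");
        Map<String, String> s2 = new HashMap<>(s1);
        s2.put("/b", "new");            // /b was modified after s1
        s2.put("/c", "x");              // /c was created after s1
        Map<String, String> target = new HashMap<>(s2); // current state == s2

        // Step 1: the snapshot diff from s2 back to s1 is computed on the target.
        // Step 2: apply the delete part: remove paths in s2 but not in s1.
        for (String p : s2.keySet()) {
            if (!s1.containsKey(p)) target.remove(p);
        }
        // Step 3: copy the modified part from s1 onto the target.
        for (Map.Entry<String, String> e : s1.entrySet()) {
            if (!e.getValue().equals(target.get(e.getKey()))) {
                target.put(e.getKey(), e.getValue());
            }
        }

        if (!target.equals(s1)) throw new AssertionError();
        System.out.println("restored s1: " + target);
    }
}
```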
> A new tool to sync current HDFS view to specified snapshot
> --
>
> Key: HDFS-10314
> URL: https://issues.apache.org/jira/browse/HDFS-10314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tools
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-10314.001.patch
>
>
> HDFS-9820 proposed adding -rdiff switch to distcp, as a reversed operation of
> -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps
> around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to unix/linux
> command "rsync". The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync
> {code}
> This command ensure is newer than .
> I think, in the future, we can add another command to provide the
> functionality of the -diff switch of distcp.
> {code}
> sync
> {code}
> that ensures is older than .
> Thanks [~jingzhao].
[
https://issues.apache.org/jira/browse/HDFS-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510235#comment-15510235
]
James Clampffer commented on HDFS-10874:
Thanks for the review, Bob. I'll commit/resolve momentarily.
bq. Perhaps as another task, we should ensure that the tools and examples build
using just the public headers.
I agree. As part of that, it'd be good to make sure that headers from other
directories aren't implicitly added to the include paths when building the
tools and examples, so the build errors out if a tool is written to use them.
How about we generalize HDFS-10787 to cover that work?
> libhdfs++: Public API headers should not depend on internal implementation
> --
>
> Key: HDFS-10874
> URL: https://issues.apache.org/jira/browse/HDFS-10874
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10874.HDFS-8707.000.patch
>
>
> Public headers need to do some combination of the following: stop including
> parts of the implementation, forward declare bits of the implementation where
> absolutely needed, or pull the implementation into include/hdfspp if it's
> inseparable.
> Example:
> If you want to use the C++ API and only stick include/hdfspp in the include
> path you'll get an error when you include include/hdfspp/options.h because
> that goes and includes common/uri.h.
> Related to the work described in HDFS-10787.
[
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510055#comment-15510055
]
Steve Loughran commented on HDFS-7878:
--
proposal wise, I'm somewhat confused. is this somehow going to change end user
APIs so a path isn't enough to refer to things?
> API - expose an unique file identifier
> --
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Labels: BB2015-05-TBR
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509990#comment-15509990
]
Wei-Chiu Chuang commented on HDFS-10883:
The other corner case is /user as an encryption zone.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509886#comment-15509886
]
Bob Hansen commented on HDFS-10874:
---
+1
> libhdfs++: Public API headers should not depend on internal implementation
> --
>
> Key: HDFS-10874
> URL: https://issues.apache.org/jira/browse/HDFS-10874
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10874.HDFS-8707.000.patch
>
>
> Public headers need to do some combination of the following: stop including
> parts of the implementation, forward declare bits of the implementation where
> absolutely needed, or pull the implementation into include/hdfspp if it's
> inseparable.
> Example:
> If you want to use the C++ API and only stick include/hdfspp in the include
> path you'll get an error when you include include/hdfspp/options.h because
> that goes and includes common/uri.h.
> Related to the work described in HDFS-10787.
[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509520#comment-15509520
]
Ewan Higgs commented on HDFS-10285:
---
Hi,
Quick question: does the SPS handle erasure-coded files?
E.g., the design document says the following:
{quote}
When a user renames a file with an inherited storage policy (rather than a
file-level storage policy), the renamed file should satisfy the destination's
applicable inherited storage policy when the user calls satisfyStoragePolicy on
that file.
{quote}
So, given {{satisfyStoragePolicy("/")}}, how might it handle {{hdfs dfs -mv
/replicated/some-file /erasure-coded/}}, where EC blocks have specific BlockIDs
that are <1? Would it treat this as a copy + delete?
> Storage Policy Satisfier in Namenode
>
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These
> policies can be set on a directory/file to specify the user's preference for
> where to store the physical blocks. When the user sets the storage policy
> before writing data, the blocks can take advantage of the storage policy
> preference and the physical blocks are stored accordingly.
> If the user sets the storage policy after writing and completing the file,
> then the blocks will have been written with the default storage policy
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed-system
> scenarios (e.g., HBase) it would be difficult to collect all the files and
> run the tool, as different nodes can write files separately and files can
> have different paths.
> Another scenario: when a user renames a file from a directory with an
> effective storage policy (inherited from the parent directory) to a directory
> with a different effective storage policy, the rename does not copy the
> inherited storage policy from the source, so the policy of the destination
> file/dir's parent takes effect. The rename operation is just a metadata
> change in the Namenode; the physical blocks still remain with the source
> storage policy.
> So, tracking all such file names from distributed nodes (e.g., region
> servers) and running the Mover tool could be difficult for admins. The
> proposal here is to provide an API in the Namenode itself to trigger storage
> policy satisfaction. A daemon thread inside the Namenode would track such
> calls and send movement commands to the DNs.
> Will post the detailed design thoughts document soon.
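The NN-side flow described above (an API call tracked by a daemon thread that turns requests into movement commands) could be sketched as follows; all names are illustrative assumptions, not the actual patch:

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal sketch of the proposed flow: satisfyStoragePolicy(path) enqueues
// work, and a daemon thread drains the queue and emits movement commands.
public class SpsSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> issuedCommands =
            Collections.synchronizedList(new ArrayList<String>());

    // The API exposed by the Namenode: just records the path to process.
    void satisfyStoragePolicy(String path) { pending.add(path); }

    // The daemon: for each tracked path, issue a (fake) movement command.
    Thread startDaemon() {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String path = pending.take();
                    issuedCommands.add("MOVE_BLOCKS " + path);
                }
            } catch (InterruptedException ignored) { }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        SpsSketch sps = new SpsSketch();
        Thread daemon = sps.startDaemon();
        sps.satisfyStoragePolicy("/hbase/wals");
        sps.satisfyStoragePolicy("/user/archive");
        Thread.sleep(200); // let the daemon drain the queue
        daemon.interrupt();
        if (sps.issuedCommands.size() != 2) throw new AssertionError();
        System.out.println(sps.issuedCommands);
    }
}
```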
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Status: Patch Available (was: Open)
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Comment: was deleted
(was: [~cheersyang])
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509325#comment-15509325
]
Yuanbo Liu commented on HDFS-10883:
---
[~cheersyang]
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Attachment: HDFS-10883.001.patch
upload v1 patch for this issue.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt, HDFS-10883.001.patch
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since root path is the encryption zone,
> which means all files/directories in DFS are encrypted, it's more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei Zhou updated HDFS-10885:
Summary: [SPS]: Mover tool should not be allowed to run when Storage Policy
Satisfier is on (was: Mover tool should not be allowed to run when Storage
Policy Satisfier is on)
> [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier
> is on
> --
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>
> These two should not be allowed to run at the same time, to avoid conflicts
> between them.
[
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509302#comment-15509302
]
Wei Zhou commented on HDFS-10885:
-
Yes, you are right. Thanks for the reminder!
> Mover tool should not be allowed to run when Storage Policy Satisfier is on
> ---
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>
> These two should not be allowed to run at the same time, to avoid conflicts
> between them.
[
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509286#comment-15509286
]
Yuanbo Liu commented on HDFS-10885:
---
[~zhouwei] Just a small suggestion: could you add "[SPS]:" to the summary to
keep it consistent? :)
> Mover tool should not be allowed to run when Storage Policy Satisfier is on
> ---
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>
> These two should not be allowed to run at the same time, to avoid conflicts
> between them.
Wei Zhou created HDFS-10885:
---
Summary: Mover tool should not be allowed to run when Storage
Policy Satisfier is on
Key: HDFS-10885
URL: https://issues.apache.org/jira/browse/HDFS-10885
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Wei Zhou
Assignee: Wei Zhou
These two should not be allowed to run at the same time, to avoid conflicts
between them.
Rakesh R created HDFS-10884:
---
Summary: [SPS]: Add block movement tracker to track the completion
of block movement future tasks at DN
Key: HDFS-10884
URL: https://issues.apache.org/jira/browse/HDFS-10884
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Presently, the
[StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
function acts as a blocking call. The idea of this JIRA is to implement a
mechanism to track these movements asynchronously, so that new movements can
be accepted while previous ones are still being processed.
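One way to make the tracking non-blocking is a {{CompletionService}} over the movement thread pool: submissions return immediately, and a separate tracker consumes completions in whatever order they finish. The sketch below uses only the JDK; the task shape and names are hypothetical, not the actual StoragePolicySatisfyWorker API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockMovementTrackerSketch {
    // Hypothetical stand-in for a DN-side block movement task.
    static Callable<String> moveTask(String blockId) {
        return () -> "moved:" + blockId;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // CompletionService lets a tracker consume results as they finish,
        // instead of blocking on each Future in submission order.
        CompletionService<String> tracker = new ExecutorCompletionService<>(pool);
        String[] blocks = {"blk_1", "blk_2", "blk_3"};
        for (String b : blocks) {
            tracker.submit(moveTask(b)); // returns immediately; no blocking wait
        }
        // Tracker side: drain completions asynchronously (order may vary).
        for (int i = 0; i < blocks.length; i++) {
            System.out.println(tracker.take().get());
        }
        pool.shutdown();
    }
}
```

In the real worker, the tracker loop would presumably run in its own thread and report movement results back to the coordinator instead of printing them.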
[
https://issues.apache.org/jira/browse/HDFS-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509035#comment-15509035
]
Rakesh R commented on HDFS-10800:
-
Thanks [~umamaheswararao], the latest patch looks good. A few more suggestions
to improve it. Sorry, I failed to spot these in my previous review; would you
mind incorporating them?
# We could mark the new classes {{@InterfaceAudience.Private}}.
# Can we find a better name for the {{StorageMovementNeeded}} class? The
current name suggests it is moving the storage itself. How about
BlockMovementNeeded or BlockStorageMovementNeeded?
# It would be good to call {{sps.stop();}} at the beginning of the
{{BlockManager#stop()}} function. This would avoid unwanted exceptions in the
future if someone adds code that depends on already-stopped services.
# IMHO, {{#assignStorageMovementTasksToDN()}} can be moved inside
{{#buildStorageMismatchedBlocks(long blockCollectionID)}}; that way we could
avoid passing a few arguments. Also, the if condition below is not required,
since the assign function duplicates the check, right?
{code}
if (blockInfoToMoveStorages.size() > 0) {
  assignStorageMovementTasksToDN(coordinatorNode, blockCollectionID,
      blockInfoToMoveStorages);
}
{code}
# Also, can you make the following variables {{final}}? This will avoid an NPE
if someone mistakenly reassigns them in the future.
{code}
private StoragePolicySatisfier sps;
private StorageMovementNeeded needStorageMovement = new StorageMovementNeeded();
{code}
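The shutdown-ordering point in item 3 can be sketched with the JDK alone (class names here are hypothetical, not the actual BlockManager code): a worker thread that depends on a service must be stopped before that service, or the worker may hit exceptions against an already-stopped dependency.

```java
// Sketch (hypothetical names): stop the dependent worker first, then the
// services it uses, mirroring the suggestion to call sps.stop() at the
// beginning of BlockManager#stop().
public class StopOrderingSketch {
    static class Worker {
        private volatile boolean running = true;
        void stop() { running = false; }
        boolean isRunning() { return running; }
    }

    static class Manager {
        private final Worker sps = new Worker();
        private boolean serviceUp = true;

        void stop() {
            sps.stop();        // first: the worker that depends on the service
            serviceUp = false; // then: the service itself
        }
    }

    public static void main(String[] args) {
        Manager m = new Manager();
        m.stop();
        System.out.println(m.sps.isRunning()); // prints false
        System.out.println(m.serviceUp);       // prints false
    }
}
```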
> [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks
> which were placed in other storages than what NN is expecting.
> ---
>
> Key: HDFS-10800
> URL: https://issues.apache.org/jira/browse/HDFS-10800
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10800-HDFS-10285-00.patch,
> HDFS-10800-HDFS-10285-01.patch, HDFS-10800-HDFS-10285-02.patch,
> HDFS-10800-HDFS-10285-03.patch
>
>
> This JIRA is for implementing a daemon thread called StoragePolicySatisfier
> in the namenode, which should scan the requested files for blocks that were
> placed in the wrong storages on DNs.
> The idea is:
> # When a user calls satisfy-storage-policy on some files/dirs, they are
> tracked in the NN; the StoragePolicySatisfier daemon thread then picks the
> files one by one and checks for blocks placed in a different storage on the
> DN than what the NN expects.
> # After checking, it should also construct the data structures with the
> information required to move a block from one storage to another.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Description:
Let's say root path ("/") is the encryption zone, and there is a file called
"/test" in root path.
{code}
dfs.getTrashRoot(new Path("/"))
{code}
returns "/user/$USER/.Trash",
while
{code}
dfs.getTrashRoot(new Path("/test"))
{code}
returns "/.Trash/$USER".
The second behavior is not correct. Since the root path is the encryption
zone, all files/directories in DFS are encrypted, so it is more reasonable
to return "/user/$USER/.Trash" no matter what the path is.
Please see the attachment to know how to reproduce this issue.
was:
Let's say root path ("/") is the encryption zone, and there is a file called
"/test" in root path.
{code}
dfs.getTrashRoot(new Path("/"))
{code}
returns "/user/$USER/.Trash",
while
{code}
dfs.getTrashRoot(new Path("/test"))
{code}
returns "/.Trash/$USER".
The second behavior is not correct. Since the root path is the encryption
zone, all files/directories in DFS are encrypted, so it is more reasonable
to return "/user/$USER/.Trash" no matter what the path is.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since the root path is the encryption
> zone, all files/directories in DFS are encrypted, so it is more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
> Please see the attachment to know how to reproduce this issue.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508865#comment-15508865
]
Yuanbo Liu commented on HDFS-10883:
---
[~xiaochen] After discussing with you in HDFS-10756, I tested the behavior of
getTrashRoot in DFS and found this issue. I've uploaded a test case file to
this JIRA and hope to get your thoughts.
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since the root path is the encryption
> zone, all files/directories in DFS are encrypted, so it is more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.
[
https://issues.apache.org/jira/browse/HDFS-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanbo Liu updated HDFS-10883:
--
Attachment: HDFS-10883-test-case.txt
> `getTrashRoot`'s behavior is not consistent in DFS after enabling EZ.
> -
>
> Key: HDFS-10883
> URL: https://issues.apache.org/jira/browse/HDFS-10883
> Project: Hadoop HDFS
> Issue Type: Bug
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
> Attachments: HDFS-10883-test-case.txt
>
>
> Let's say root path ("/") is the encryption zone, and there is a file called
> "/test" in root path.
> {code}
> dfs.getTrashRoot(new Path("/"))
> {code}
> returns "/user/$USER/.Trash",
> while
> {code}
> dfs.getTrashRoot(new Path("/test"))
> {code}
> returns "/.Trash/$USER".
> The second behavior is not correct. Since the root path is the encryption
> zone, all files/directories in DFS are encrypted, so it is more reasonable
> to return "/user/$USER/.Trash" no matter what the path is.