[jira] [Commented] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-26 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220215#comment-16220215
 ] 

Ewan Higgs commented on HDFS-11902:
---

+1. The new patch looks good. The only difference is that the code is moved, as 
[~virajith] said.

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch, 
> HDFS-11902-HDFS-9806.010.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12713) [READ] Refactor FileRegion and BlockAliasMap to separate out HDFS metadata and PROVIDED storage metadata

2017-10-26 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12713:
-

Assignee: Ewan Higgs

> [READ] Refactor FileRegion and BlockAliasMap to separate out HDFS metadata 
> and PROVIDED storage metadata
> 
>
> Key: HDFS-12713
> URL: https://issues.apache.org/jira/browse/HDFS-12713
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Ewan Higgs
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2017-10-26 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-9806:


Assignee: Ewan Higgs

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9806-design.001.pdf, HDFS-9806-design.002.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2017-10-26 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-9806:


Assignee: (was: Ewan Higgs)

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
> Attachments: HDFS-9806-design.001.pdf, HDFS-9806-design.002.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12594) SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC response limit

2017-10-24 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216518#comment-16216518
 ] 

Ewan Higgs commented on HDFS-12594:
---

Some minor things on a first pass:

{code}
+  if (getLastIndex() != -1) {
+    setLastIndex(-1);
+  }
{code}
Why not just set it?
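That is, the guard seems redundant; assuming {{setLastIndex}} has no side 
effects beyond the assignment, an unconditional

{code}
setLastIndex(-1);
{code}

behaves identically.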

I think the basic design is a good approach, but it would be nicer to 
restructure it to acknowledge that we're building a cursor/iterator here. So 
the report request/response, which is currently as follows: 

{code}
message GetSnapshotDiffReportListingRequestProto {
  required string snapshotRoot = 1;
  required string fromSnapshot = 2;
  required string toSnapshot = 3;
  required string startPath = 4;
  required int32 index = 5 [default = -1];
}
// ...

message SnapshotDiffReportListingProto {
  // full path of the directory where snapshots were taken
  repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
  repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
  repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
  required bytes startPath = 4;
  required int32 index = 5 [default = -1];
  required bool isFromEarlier = 6;
}
{code}

... could be: 

{code}

message SnapshotDiffReportCursorProto {
  required string startPath = 4;
  required int32 index = 5 [default = -1];
}

message GetSnapshotDiffReportListingRequestProto {
  required string snapshotRoot = 1;
  required string fromSnapshot = 2;
  required string toSnapshot = 3;
  optional SnapshotDiffReportCursorProto cursor = 4;
}

// ...

message SnapshotDiffReportListingProto {
  // full path of the directory where snapshots were taken
  repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
  repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
  repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
  required bool isFromEarlier = 4;
  optional SnapshotDiffReportCursorProto cursor = 5;
}
{code}

Making a request with no cursor starts at the beginning.
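
For illustration, a client could then page through the diff with a loop along 
these lines (a sketch only; the builder methods follow from the proto above, 
but {{rpcProxy}} and {{process}} are stand-ins, not code from the patch):

{code}
SnapshotDiffReportListingProto page;
SnapshotDiffReportCursorProto cursor = null; // no cursor: start at the beginning
do {
  GetSnapshotDiffReportListingRequestProto.Builder req =
      GetSnapshotDiffReportListingRequestProto.newBuilder()
          .setSnapshotRoot(snapshotRoot)
          .setFromSnapshot(fromSnapshot)
          .setToSnapshot(toSnapshot);
  if (cursor != null) {
    req.setCursor(cursor); // resume where the previous response left off
  }
  page = rpcProxy.getSnapshotDiffReportListing(req.build());
  process(page); // consume this batch of modified/created/deleted entries
  // the server omits the cursor once the report is exhausted
  cursor = page.hasCursor() ? page.getCursor() : null;
} while (cursor != null);
{code}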

> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC 
> response limit
> ---
>
> Key: HDFS-12594
> URL: https://issues.apache.org/jira/browse/HDFS-12594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
> Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch, 
> HDFS-12594.003.patch, SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than 
> the configuration value of ipc.maximum.response.length which is by default 
> 128 MB. 
> Worst case, with all rename ops in snapshots, each with source and target 
> names equal to MAX_PATH_LEN (8k characters), this would result in at most 
> 8192 renames (128 MB / 16 KB per rename).
>  
> SnapshotDiff is currently used by distcp to optimize copy operations and in 
> case of the diff report exceeding the limit, it fails with the below 
> exception:
> Test set: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
>   Time elapsed: 111.906 sec  <<< ERROR!
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host 
> is: "localhost":59808;
> Attached is the proposal for the changes required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-21 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214114#comment-16214114
 ] 

Ewan Higgs commented on HDFS-12665:
---

[~virajith], thanks for the quick review.

{quote}Can you please add javadocs for all the new classes added?{quote}
Sure.

{quote}Is there a reason to refactor FileRegion and introduce 
ProvidedStorageLocation? Also, I think the name ProvidedStorageLocation is 
confusing given there is also a StorageLocation, which is something very 
different. May be rename to ProvidedLocation.{quote}
The AliasMap is a mapping between a block and a location in an external storage 
system, so we need to break the FileRegion in two: the key and the value. The 
Block is the key in this mapping, and {{FileRegion}} would be a good name for 
the value being stored, but it's already taken by the entire KV entry itself.
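
To make the split concrete, the shape is roughly the following (field names are 
the ones from HDFS-12665's description; the class itself is an illustrative 
sketch, not the patch):

{code}
import java.net.URI;

/**
 * Sketch: the value half of the old FileRegion. The alias map is then
 * conceptually a mapping Block -> ProvidedStorageLocation, and FileRegion
 * survives as the complete KV entry (a Block plus its location).
 */
public class ProvidedStorageLocation {
  private final URI path;     // file in the external store holding the data
  private final long offset;  // offset of the block's data within that file
  private final long length;  // number of bytes belonging to the block
  private final byte[] nonce; // opaque version token for the remote data

  public ProvidedStorageLocation(URI path, long offset, long length,
      byte[] nonce) {
    this.path = path;
    this.offset = offset;
    this.length = length;
    this.nonce = nonce;
  }
}
{code}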

{quote}The new AliasMap class has a confusing name. It is supposed to be an 
implementation of the AliasMapProtocol but the name is a prefix of the 
latter.{quote}
{{AliasMapProtocol}} is an interface (hence the {{Protocol}} suffix), so the 
concrete version doesn't have it. It should be clearer once there are javadoc 
comments attached to it.

{quote}Renaming LevelDBAliasMapClient to something along the lines 
InMemoryLevelDBAliasMap will make it a more descriptive name for the class. In 
general, adding a similar prefix to AliasMapProtocol, LevelDBAliasMapServer 
will improve the readability of the code.{quote}Agree. This will also help 
differentiate between the {{LevelDBFileRegionFormat}} (HDFS-12591).

{quote}Can we move LevelDBAliasMapClient to the 
org.apache.hadoop.hdfs.server.common.BlockAliasMapImpl package. {quote}
Sure. This means all the classes used by fs2img will be in the same package 
(unless they need dependencies like using DynamoDB, AzureTable, etc).

{quote}ITAliasMap only contains unit tests. I believe the convention is to 
start the name of the class with Test.{quote}I think this was part of how the 
code evolved. e.g. code using MiniDFSCluster was here but moved back to the 
unit tests since the HDFS project has a different sense of what differentiates 
unit and integration tests.

{quote}Why was the block pool id removed from FileRegion? It was used as a 
check in the DN so that only blocks belonging to the correct block pool id were 
reported to the NN.{quote}In an early version we refactored it to use 
{{ExtendedBlock}} as the key but were advised that it should remain {{Block}}. 
AIUI, the AliasMap is unique to a NN so there is no ambiguity.

{quote}Why rename getVolumeMap to fetchVolumeMap in 
ProvidedBlockPoolSlice?{quote} The return type is {{void}} so naming this 
{{getVolumeMap}} is misleading.
{quote}In startAliasMapServerIfNecessary, I think the aliasmap should be 
started only if provided is configured. i.e., check if 
DFSConfigKeys.DFS_NAMENODE_PROVIDED_ENABLED is set to true.{quote}That makes 
sense. If an administrator is running with 
{{DFSConfigKeys.DFS_NAMENODE_PROVIDED_ENABLED}} set to false but 
{{DFSConfigKeys.DFS_USE_ALIASMAP}} set to true, it's pretty lame. Should we 
throw or just log a warning about the misconfiguration?
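
Something along these lines, perhaps (a sketch of the guard only; the method 
name comes from the review comment, while {{LOG}} and {{aliasMapServer}} are 
stand-ins):

{code}
private void startAliasMapServerIfNecessary(Configuration conf)
    throws IOException {
  boolean useAliasMap =
      conf.getBoolean(DFSConfigKeys.DFS_USE_ALIASMAP, false);
  boolean providedEnabled =
      conf.getBoolean(DFSConfigKeys.DFS_NAMENODE_PROVIDED_ENABLED, false);
  if (useAliasMap && !providedEnabled) {
    // misconfiguration: alias map requested without provided storage
    LOG.warn("Ignoring " + DFSConfigKeys.DFS_USE_ALIASMAP + " because "
        + DFSConfigKeys.DFS_NAMENODE_PROVIDED_ENABLED + " is false");
    return;
  }
  if (useAliasMap) {
    aliasMapServer.start(); // the in-memory leveldb-backed alias map service
  }
}
{code}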

{quote}Some of the changes have led to lines crossing the 80 character limit. 
Can you please fix them?{quote}Sure. It seems to be convention to ignore that 
in {{DFSConfigKeys.java}} but I'll take a look at fixing this up elsewhere.

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch, 
> HDFS-12665-HDFS-9806.002.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-20 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212698#comment-16212698
 ] 

Ewan Higgs commented on HDFS-12591:
---

This now depends on HDFS-12665 since it uses the same code for reading/writing 
to LevelDB as that ticket.

> [READ] Implement LevelDBFileRegionFormat
> 
>
> Key: HDFS-12591
> URL: https://issues.apache.org/jira/browse/HDFS-12591
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12591-HDFS-9806.001.patch, 
> HDFS-12591-HDFS-9806.002.patch, HDFS-12591-HDFS-9806.003.patch
>
>
> The existing work for HDFS-9806 uses an implementation of the {{FileRegion}} 
> read from a csv file. This is good for testability and diagnostic purposes, 
> but it is not very efficient for larger systems.
> There should be a version that is similar to the {{TextFileRegionFormat}} 
> that instead uses LevelDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-20 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12591:
--
Attachment: HDFS-12591-HDFS-9806.003.patch

Attaching a patch rebased on HDFS-12665 so the two LevelDB alias maps 
(file-based and NN in-memory) use the same code for reading and writing data.

> [READ] Implement LevelDBFileRegionFormat
> 
>
> Key: HDFS-12591
> URL: https://issues.apache.org/jira/browse/HDFS-12591
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12591-HDFS-9806.001.patch, 
> HDFS-12591-HDFS-9806.002.patch, HDFS-12591-HDFS-9806.003.patch
>
>
> The existing work for HDFS-9806 uses an implementation of the {{FileRegion}} 
> read from a csv file. This is good for testability and diagnostic purposes, 
> but it is not very efficient for larger systems.
> There should be a version that is similar to the {{TextFileRegionFormat}} 
> that instead uses LevelDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-20 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Attachment: HDFS-12665-HDFS-9806.002.patch

Attaching an updated version using {{LevelDB}} (instead of {{LevelDb}}) and 
making the key/value serde functions public static so they can be used by 
HDFS-12591.

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch, 
> HDFS-12665-HDFS-9806.002.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9810) Allow support for more than one block replica per datanode

2017-10-20 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212300#comment-16212300
 ] 

Ewan Higgs commented on HDFS-9810:
--

This should be fixed by HDFS-12685.

> Allow support for more than one block replica per datanode
> --
>
> Key: HDFS-9810
> URL: https://issues.apache.org/jira/browse/HDFS-9810
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Virajith Jalaparti
>
> Datanodes report and store only one replica of each block. It should be 
> possible to store multiple replicas among its different configured storage 
> types, particularly to support non-durable media and remote storage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Parent Issue: HDFS-9806  (was: HDFS-12090)

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-9808) Combine READ_ONLY_SHARED DatanodeStorages with the same ID

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs resolved HDFS-9808.
--
Resolution: Won't Fix

This was part of HDFS-11190.

> Combine READ_ONLY_SHARED DatanodeStorages with the same ID
> --
>
> Key: HDFS-9808
> URL: https://issues.apache.org/jira/browse/HDFS-9808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chris Douglas
>
> In HDFS-5318, each datanode that can reach a (read only) block reports itself 
> as a valid location for the block. While accurate, this increases (redundant) 
> block report traffic and- without partitioning on the backend- may return an 
> overwhelming number of replica locations for each block.
> Instead, a DN could report only that the shared storage is reachable. The 
> contents of the storage could be reported separately/synthetically to the 
> block manager, which can collapse all instances into a single storage. A 
> subset of locations- closest to the client, etc.- can be returned, rather 
> than all possible locations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Status: Patch Available  (was: Open)

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Attachment: HDFS-12665-HDFS-9806.001.patch

Attaching work from WDC implementing the In Memory AliasMap.

This work is rebased on top of HDFS-11902.

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12665:
-

Assignee: Ewan Higgs

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12685:
--
Description: 
I left a Datanode running overnight and found this in the logs in the morning:

{code}
2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.io.File.<init>(File.java:421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)
        at ...
{code}
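
The proximate cause is visible in the last frames: {{FsVolumeSpi$ScanInfo}} 
builds a {{java.io.File}} from the block's URI, and that constructor accepts 
only {{file}} scheme URIs, while a PROVIDED volume reports remote URIs. A 
minimal standalone illustration (not project code):

{code}
import java.io.File;
import java.net.URI;

public class UriSchemeDemo {
  public static void main(String[] args) {
    // fine: absolute URI with the "file" scheme
    File ok = new File(URI.create("file:///data/blk_1"));
    // throws java.lang.IllegalArgumentException: URI scheme is not "file";
    // a PROVIDED volume would hand back e.g. an s3a:// or http:// URI
    File boom = new File(URI.create("s3a://bucket/data/blk_1"));
  }
}
{code}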

[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12685:
--
Summary: [READ] FsVolumeImpl exception when scanning Provided storage 
volume  (was: FsVolumeImpl exception when scanning Provided storage volume)

> [READ] FsVolumeImpl exception when scanning Provided storage volume
> ---
>
> Key: HDFS-12685
> URL: https://issues.apache.org/jira/browse/HDFS-12685
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>
> I left a Datanode running overnight and found this in the logs in the morning:
> {code}
> 2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
> java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
>         at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
>         at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
>         at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
>         at java.io.File.<init>(File.java:421)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)
>         at ...
> {code}

[jira] [Created] (HDFS-12685) FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12685:
-

 Summary: FsVolumeImpl exception when scanning Provided storage 
volume
 Key: HDFS-12685
 URL: https://issues.apache.org/jira/browse/HDFS-12685
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


I left a Datanode running overnight and found this in the logs in the morning:

{code}
2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.io.File.<init>(File.java:421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)
        at ...
{code}

[jira] [Commented] (HDFS-12605) [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase

2017-10-18 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209894#comment-16209894
 ] 

Ewan Higgs commented on HDFS-12605:
---

I tested this with a NN and DN on a local system and it worked. I had 
previously hit the exception locally when starting DNs, and applying the patch 
fixed it when I bounced a DN.

+1

> [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails 
> after rebase
> -
>
> Key: HDFS-12605
> URL: https://issues.apache.org/jira/browse/HDFS-12605
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-12605-HDFS-9806.001.patch
>
>
> {{TestNameNodeProvidedImplementation#testProvidedDatanodeFailures}} fails 
> after rebase with the following error:
> {code}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.decStorageTypeCount(DFSTopologyNodeImpl.java:127)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:318)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:336)
>   at 
> org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:222)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:712)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:755)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil.noticeDeadDatanode(BlockManagerTestUtil.java:213)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeProvidedImplementation.testProvidedDatanodeFailures(TestNameNodeProvidedImplementation.java:471)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-12605) [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase

2017-10-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12605:
--
Comment: was deleted

(was: +1

I'm running into this on the branch and this should fix the issue.)

> [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails 
> after rebase
> -
>
> Key: HDFS-12605
> URL: https://issues.apache.org/jira/browse/HDFS-12605
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-12605-HDFS-9806.001.patch
>
>
> {{TestNameNodeProvidedImplementation#testProvidedDatanodeFailures}} fails 
> after rebase with the following error:
> {code}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.decStorageTypeCount(DFSTopologyNodeImpl.java:127)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:318)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:336)
>   at 
> org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:222)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:712)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:755)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil.noticeDeadDatanode(BlockManagerTestUtil.java:213)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeProvidedImplementation.testProvidedDatanodeFailures(TestNameNodeProvidedImplementation.java:471)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12605) [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase

2017-10-18 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209864#comment-16209864
 ] 

Ewan Higgs commented on HDFS-12605:
---

+1

I'm running into this on the branch and this should fix the issue.

> [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails 
> after rebase
> -
>
> Key: HDFS-12605
> URL: https://issues.apache.org/jira/browse/HDFS-12605
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-12605-HDFS-9806.001.patch
>
>
> {{TestNameNodeProvidedImplementation#testProvidedDatanodeFailures}} fails 
> after rebase with the following error:
> {code}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.decStorageTypeCount(DFSTopologyNodeImpl.java:127)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:318)
>   at 
> org.apache.hadoop.hdfs.net.DFSTopologyNodeImpl.remove(DFSTopologyNodeImpl.java:336)
>   at 
> org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:222)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:712)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:755)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerTestUtil.noticeDeadDatanode(BlockManagerTestUtil.java:213)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeProvidedImplementation.testProvidedDatanodeFailures(TestNameNodeProvidedImplementation.java:471)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-10-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12090:
--
Attachment: HDFS-12090-Functional-Specification.002.pdf

Attaching an updated version of the Functional Specification with some cleanups 
by [~virajith].

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-Functional-Specification.001.pdf, 
> HDFS-12090-Functional-Specification.002.pdf, HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-10-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12090:
--
Comment: was deleted

(was: Attaching the Functional Specification with a description of the expected 
command line and results. This should be an entry point for people new to the 
project.)

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-Functional-Specification.001.pdf, 
> HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-10-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12090:
--
Attachment: HDFS-12090-Functional-Specification.001.pdf

Attaching functional specification. This should be a good entry point for 
anyone trying to understand what we're trying to do in this project.

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-Functional-Specification.001.pdf, 
> HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-10-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12090:
--
Attachment: HDFS-12090 Functional Specification.pdf

Attaching the Functional Specification with a description of the expected 
command line and results. This should be an entry point for people new to the 
project.

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-10-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12090:
--
Attachment: (was: HDFS-12090 Functional Specification.pdf)

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12666) Provided Storage Mount Manager (PSMM) mount

2017-10-16 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12666:
-

 Summary: Provided Storage Mount Manager (PSMM) mount
 Key: HDFS-12666
 URL: https://issues.apache.org/jira/browse/HDFS-12666
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


Implement the Provided Storage Mount Manager. This is a service (thread) in the 
Namenode that manages backup mounts and unmounts, snapshotting, and monitoring 
of backup progress.

On mount, the mount manager writes XATTR information at the top level of the 
mount to do the appropriate bookkeeping. This is done to maintain state in case 
the Namenode falls over.
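
For example, the bookkeeping might look roughly like this (a sketch using the 
public {{FileSystem}} xattr API; the xattr name and payload are assumptions for 
illustration):

{code}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MountBookkeepingSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path mountRoot = new Path("/backup/mount1");
    // on mount: record the mount metadata on the subtree root so the state
    // survives a Namenode restart or failover
    fs.setXAttr(mountRoot, "user.provided.mount",
        "remote=s3a://bucket/backup".getBytes(StandardCharsets.UTF_8));
    // on restart: recover managed mounts by reading the xattr back
    byte[] state = fs.getXAttr(mountRoot, "user.provided.mount");
  }
}
{code}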



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-16 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12665:
-

 Summary: [AliasMap] Create a version of the AliasMap that runs in 
memory in the Namenode (leveldb)
 Key: HDFS-12665
 URL: https://issues.apache.org/jira/browse/HDFS-12665
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


The design of Provided Storage requires the use of an AliasMap to manage the 
mapping between blocks of files on the local HDFS and ranges of files on a 
remote storage system. To reduce load from the Namenode, this can be done using 
a pluggable external service (e.g. AzureTable, Cassandra, Ratis). However, to 
aid adoption and ease of deployment, we propose an in-memory version.

This AliasMap will be a wrapper around LevelDB (already a dependency from the 
Timeline Service) and use protobuf for the key (blockpool, blockid, and 
genstamp) and the value (url, offset, length, nonce). The in memory service 
will also have a configurable port on which it will listen for updates from 
Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).
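
As a rough illustration of the storage layer (a sketch against the 
org.iq80.leveldb/leveldbjni API that Hadoop already ships for the Timeline 
Service; the string key/value layouts below stand in for the protobuf 
serialization and are assumptions):

{code}
import static org.fusesource.leveldbjni.JniDBFactory.factory;

import java.io.File;
import java.nio.charset.StandardCharsets;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class AliasMapStoreSketch {
  public static void main(String[] args) throws Exception {
    Options options = new Options().createIfMissing(true);
    try (DB db = factory.open(new File("/tmp/aliasmap"), options)) {
      // key: (blockpool, blockid, genstamp); value: (url, offset, length, nonce)
      byte[] key = "bp-1/blk_1073741825/1001"
          .getBytes(StandardCharsets.UTF_8);
      byte[] value = "s3a://bucket/f1,0,134217728,nonce0"
          .getBytes(StandardCharsets.UTF_8);
      db.put(key, value);
      byte[] stored = db.get(key); // null if the block has no alias entry
    }
  }
}
{code}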



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-16 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11902:
--
Status: Open  (was: Patch Available)

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-16 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11902:
--
Status: Patch Available  (was: Open)

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-16 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205819#comment-16205819
 ] 

Ewan Higgs commented on HDFS-11902:
---

I removed patch 10 as it included errors and was only intended to be a basic 
style improvement. In hindsight it wasn't much of an improvement.

Virajith's patch 9 LGTM.

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-16 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11902:
--
Attachment: (was: HDFS-11902-HDFS-9806.010.patch)

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-10-13 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11902:
--
Attachment: HDFS-11902-HDFS-9806.010.patch

Patch fixing the findbugs issue and making the {{ImageWriter.Options}} methods 
use the {{setBlocks}} style.

> [READ] Merge BlockFormatProvider and FileRegionProvider.
> 
>
> Key: HDFS-11902
> URL: https://issues.apache.org/jira/browse/HDFS-11902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11902-HDFS-9806.001.patch, 
> HDFS-11902-HDFS-9806.002.patch, HDFS-11902-HDFS-9806.003.patch, 
> HDFS-11902-HDFS-9806.004.patch, HDFS-11902-HDFS-9806.005.patch, 
> HDFS-11902-HDFS-9806.006.patch, HDFS-11902-HDFS-9806.007.patch, 
> HDFS-11902-HDFS-9806.008.patch, HDFS-11902-HDFS-9806.009.patch, 
> HDFS-11902-HDFS-9806.010.patch
>
>
> Currently {{BlockFormatProvider}} and {{TextFileRegionProvider}} perform 
> almost the same function on the Namenode and Datanode respectively. This JIRA 
> is to merge them into one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-12 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12591:
-

Assignee: Ewan Higgs

> [READ] Implement LevelDBFileRegionFormat
> 
>
> Key: HDFS-12591
> URL: https://issues.apache.org/jira/browse/HDFS-12591
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12591-HDFS-9806.001.patch, 
> HDFS-12591-HDFS-9806.002.patch
>
>
> The existing work for HDFS-9806 uses an implementation where {{FileRegion}}s 
> are read from a CSV file. This is good for testability and diagnostic 
> purposes, but it is not very efficient for larger systems.
> There should be a version, similar to the {{TextFileRegionFormat}}, that 
> instead uses LevelDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12478) [WRITE] Command line tools for managing Provided Storage Backup mounts

2017-10-11 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12478:
--
Status: Patch Available  (was: Open)

> [WRITE] Command line tools for managing Provided Storage Backup mounts
> --
>
> Key: HDFS-12478
> URL: https://issues.apache.org/jira/browse/HDFS-12478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12478-HDFS-9806.001.patch
>
>
> This is a task for implementing the command line interface for attaching a 
> PROVIDED storage backup system (see HDFS-9806, HDFS-12090).
> # The administrator should be able to mount a PROVIDED storage volume from 
> the command line. 
> {code}hdfs attach -create [-name <name>] <HDFS path> <provided storage path (external)>{code}
> # Whitelist of users who are able to manage mounts (create, attach, detach).
> # Be able to interrogate the status of the attached storage (last time a 
> snapshot was taken, files being backed up).
> # The administrator should be able to remove an attached PROVIDED storage 
> volume from the command line. This simply means that the synchronization 
> process no longer runs. If the administrator has configured their setup to no 
> longer keep local copies of the data, the blocks in the subtree are no longer 
> accessible, since the external file store system is no longer reachable 
> through HDFS.
> {code}hdfs attach -remove <HDFS path> [-force | -flush]{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12478) [WRITE] Command line tools for managing Provided Storage Backup mounts

2017-10-11 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12478:
--
Attachment: HDFS-12478-HDFS-9806.001.patch

Attaching a patch that implements the command line interface.

This follows [~aw]'s advice that {{hdfs attach -remove}} isn't ideal; instead, 
the command line uses {{hdfs syncservice -backupOnly}}.

> [WRITE] Command line tools for managing Provided Storage Backup mounts
> --
>
> Key: HDFS-12478
> URL: https://issues.apache.org/jira/browse/HDFS-12478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12478-HDFS-9806.001.patch
>
>
> This is a task for implementing the command line interface for attaching a 
> PROVIDED storage backup system (see HDFS-9806, HDFS-12090).
> # The administrator should be able to mount a PROVIDED storage volume from 
> the command line. 
> {code}hdfs attach -create [-name <name>] <HDFS path> <provided storage path (external)>{code}
> # Whitelist of users who are able to manage mounts (create, attach, detach).
> # Be able to interrogate the status of the attached storage (last time a 
> snapshot was taken, files being backed up).
> # The administrator should be able to remove an attached PROVIDED storage 
> volume from the command line. This simply means that the synchronization 
> process no longer runs. If the administrator has configured their setup to no 
> longer keep local copies of the data, the blocks in the subtree are no longer 
> accessible, since the external file store system is no longer reachable 
> through HDFS.
> {code}hdfs attach -remove <HDFS path> [-force | -flush]{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11045) TestDirectoryScanner#testThrottling fails: Throttle is too permissive

2017-10-10 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198703#comment-16198703
 ] 

Ewan Higgs commented on HDFS-11045:
---

[~dan...@cloudera.com], can you rebase this?

{quote} Karthik Kambatla just pointed out an alternative I should have thought 
of a long time ago: don't actually run the scanner. It looks like I should be 
able to easily take the ReportCompiler out of context and test it on its 
own.{quote}
Indeed, if you're trying to test the throttling sleep calculation, then there 
should be a function that receives some times and computes a sleep value. Then 
you pump it with all sorts of values and verify that it calculates the correct 
amount of time to sleep.

If you want to test the throttling mechanism then I suggest wrapping 
{{Thread.sleep}} so it can be mocked and then make sure it was called with the 
appropriate values.
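
A minimal sketch of both suggestions, assuming hypothetical names
({{computeSleepMs}}, {{Sleeper}}) rather than the actual DirectoryScanner code:

{code}
/** Sketch of a testable throttle: pure sleep calculation + mockable sleep. */
public class ThrottleSketch {
  /** Mockable wrapper around Thread.sleep. */
  interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  /**
   * Pure function: given how long we ran and the run budget per period,
   * return how long to sleep. Testable by pumping in values and asserting
   * on the result; no timers involved.
   */
  static long computeSleepMs(long ranMs, long runBudgetMs, long periodMs) {
    if (ranMs < runBudgetMs) {
      return 0; // under budget: no throttling needed
    }
    // Clamp for overruns longer than one period (the >1s case in HDFS-11707).
    return Math.max(0, periodMs - (ranMs % periodMs));
  }

  static void throttle(long ranMs, long runBudgetMs, long periodMs,
      Sleeper sleeper) throws InterruptedException {
    long sleepMs = computeSleepMs(ranMs, runBudgetMs, periodMs);
    if (sleepMs > 0) {
      sleeper.sleep(sleepMs); // verify the argument via a mocked Sleeper
    }
  }
}
{code}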

> TestDirectoryScanner#testThrottling fails: Throttle is too permissive
> -
>
> Key: HDFS-11045
> URL: https://issues.apache.org/jira/browse/HDFS-11045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: Daniel Templeton
>Priority: Minor
>  Labels: flaky-test
> Attachments: HDFS-11045.001.patch, HDFS-11045.002.patch, 
> HDFS-11045.003.patch, HDFS-11045.004.patch, HDFS-11045.005.patch, 
> HDFS-11045.006.patch, HDFS-11045.007.patch, HDFS-11045.008.patch, 
> HDFS-11045.009.patch
>
>
>   TestDirectoryScanner.testThrottling:709 Throttle is too permissive
> https://builds.apache.org/job/PreCommit-HDFS-Build/17259/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11707) TestDirectoryScanner#testThrottling fails on OSX

2017-10-10 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198685#comment-16198685
 ] 

Ewan Higgs commented on HDFS-11707:
---

This should probably be closed as duplicate of HDFS-11045

> TestDirectoryScanner#testThrottling fails on OSX
> 
>
> Key: HDFS-11707
> URL: https://issues.apache.org/jira/browse/HDFS-11707
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Erik Krogen
>Priority: Minor
>
> In branch-2 and trunk, {{TestDirectoryScanner#testThrottling}} consistently 
> fails on OS X (I'm running 10.11 specifically) with:
> {code}
> java.lang.AssertionError: Throttle is too permissive
> {code}
> It seems to work alright on Unix systems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11707) TestDirectoryScanner#testThrottling fails on OSX

2017-10-10 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198672#comment-16198672
 ] 

Ewan Higgs commented on HDFS-11707:
---

I've seen this error pop up. One theory is that if the throttle timer elapses 
for more than a second, the function doesn't handle it at all.

Also, in HDFS-8873, [~cmccabe] asked for timing-based tests to be removed. 
These ratio checks are definitely timing-based tests.

> TestDirectoryScanner#testThrottling fails on OSX
> 
>
> Key: HDFS-11707
> URL: https://issues.apache.org/jira/browse/HDFS-11707
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Erik Krogen
>Priority: Minor
>
> In branch-2 and trunk, {{TestDirectoryScanner#testThrottling}} consistently 
> fails on OS X (I'm running 10.11 specifically) with:
> {code}
> java.lang.AssertionError: Throttle is too permissive
> {code}
> It seems to work alright on Unix systems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11707) TestDirectoryScanner#testThrottling fails on OSX

2017-10-10 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198672#comment-16198672
 ] 

Ewan Higgs edited comment on HDFS-11707 at 10/10/17 1:33 PM:
-

I've seen this error pop up. One theory is that if the throttle timer elapses 
for more than a second, the function doesn't handle it at all, afaict.

Also, in HDFS-8873, [~cmccabe] asked for timing-based tests to be removed. 
These ratio checks are definitely timing-based tests.


was (Author: ehiggs):
I've seen this error pop up. One theory is that if the throttle timer elapses 
for more than a second, the function doesn't handle it at all.

Also, in HDFS-8873, [~cmccabe] asked for timing-based tests to be removed. 
These ratio checks are definitely timing-based tests.

> TestDirectoryScanner#testThrottling fails on OSX
> 
>
> Key: HDFS-11707
> URL: https://issues.apache.org/jira/browse/HDFS-11707
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Erik Krogen
>Priority: Minor
>
> In branch-2 and trunk, {{TestDirectoryScanner#testThrottling}} consistently 
> fails on OS X (I'm running 10.11 specifically) with:
> {code}
> java.lang.AssertionError: Throttle is too permissive
> {code}
> It seems to work alright on Unix systems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-04 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12591:
--
Attachment: HDFS-12591-HDFS-9806.001.patch

Attaching work I've previously done for this. It needs to be rebased onto the 
HDFS-9806 branch.

> [READ] Implement LevelDBFileRegionFormat
> 
>
> Key: HDFS-12591
> URL: https://issues.apache.org/jira/browse/HDFS-12591
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12591-HDFS-9806.001.patch
>
>
> The existing work for HDFS-9806 uses an implementation where {{FileRegion}}s 
> are read from a CSV file. This is good for testability and diagnostic 
> purposes, but it is not very efficient for larger systems.
> There should be a version, similar to the {{TextFileRegionFormat}}, that 
> instead uses LevelDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-04 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12591:
--
Status: Patch Available  (was: Open)

> [READ] Implement LevelDBFileRegionFormat
> 
>
> Key: HDFS-12591
> URL: https://issues.apache.org/jira/browse/HDFS-12591
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12591-HDFS-9806.001.patch
>
>
> The existing work for HDFS-9806 uses an implementation where {{FileRegion}}s 
> are read from a CSV file. This is good for testability and diagnostic 
> purposes, but it is not very efficient for larger systems.
> There should be a version, similar to the {{TextFileRegionFormat}}, that 
> instead uses LevelDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12591) [READ] Implement LevelDBFileRegionFormat

2017-10-04 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12591:
-

 Summary: [READ] Implement LevelDBFileRegionFormat
 Key: HDFS-12591
 URL: https://issues.apache.org/jira/browse/HDFS-12591
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Ewan Higgs
Priority: Minor


The existing work for HDFS-9806 uses an implementation where {{FileRegion}}s 
are read from a CSV file. This is good for testability and diagnostic purposes, 
but it is not very efficient for larger systems.

There should be a version, similar to the {{TextFileRegionFormat}}, that 
instead uses LevelDB.
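
As a rough sketch of the idea, using the leveldbjni bindings that already ship
with Hadoop; the key/value layout and class name below are assumptions, not
the eventual implementation:

{code}
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

/** Hypothetical LevelDB-backed store of serialized FileRegions. */
public class LevelDBRegionStoreSketch implements AutoCloseable {
  private final DB db;

  public LevelDBRegionStoreSketch(File dir) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    db = factory.open(dir, options);
  }

  /** Key the region by block id; the value is an opaque serialized region. */
  public void put(long blockId, byte[] serializedRegion) {
    db.put(key(blockId), serializedRegion);
  }

  /** Point lookup by block id; returns null if absent. */
  public byte[] get(long blockId) {
    return db.get(key(blockId));
  }

  private static byte[] key(long blockId) {
    return ByteBuffer.allocate(Long.BYTES).putLong(blockId).array();
  }

  @Override
  public void close() throws IOException {
    db.close();
  }
}
{code}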



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12589) [DISCUSS] Provided Storage BlockAlias Refactoring

2017-10-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191054#comment-16191054
 ] 

Ewan Higgs commented on HDFS-12589:
---

Some discussion happened off JIRA, but we'd much prefer these discussions to be 
in the open and tracked:

From [~ehiggs]:
{quote}
Regarding the BlockAlias, we suggested that we get rid of it since the 
interface is insufficient to work with and it’s not clear how it should be used 
to e.g. dispatch writes to the correct ProvidedVolumeImpl. We proposed to 
replace this with having new styles of retrieving data use their own URI scheme 
(e.g. myformat://). Also, if there are other requirements, it could potentially 
be held in an extra byte[] in the FileRegion to hold extra information that a 
custom ProvidedVolumeImpl could use.
{quote}

From [~chris.douglas]:
{quote}
There’s a long tradition of stuffing metadata into URIs, so I won’t argue that 
this restricts possible implementations. As we discussed during the call, if 
there are a set of possible providers, an alias doesn’t contain enough 
information to dispatch among them. Since Hadoop already has a mechanism, 
kludgy as it may be, for looking up different FileSystems based on Path/URIs, 
we could use the existing scheme/authority/principal cache instead of layering 
another layer of indirection on top of it.
 
I’ll outline my reservations. The existing object store “FileSystem” 
implementations already manage some impedance mismatches, translating 
hierarchical operations into those stores. Moreover, the layers people are 
adding to HDFS in HBase, Hive/LLAP, etc. are working around the namesystem, 
mostly treating HDFS as if it were a (not particularly good) object store. If 
we make everything into a FileRegion, we’re baking in the FileSystem coupling 
between the HDFS block layer and the provided store. We’re baking in its 
versatility- which is likely sufficient- but also its disadvantages.
 
For example, there are no good batch APIs to FileSystem. There are no 
reasonable async APIs, and the ones being built have no consistency guarantees. 
We’ve been trying to introduce an API providing the most basic consistency 
guarantee, and that’s taken a year of negotiation and prototyping.
 
Thomas/Ewan, you guys are more familiar with the limitations of S3Guard than I 
am. If those won’t materially affect the implementation of future provided 
stores (or those invariants are useful to their implementation) then I won’t 
insist on an abstraction that only gets in the way of implementation. -C
{quote}
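
To make the scheme-based dispatch concrete, a minimal sketch that leans on
Hadoop's existing FileSystem lookup; this is an illustration of the proposal,
not an agreed design:

{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SchemeDispatchSketch {
  /**
   * FileSystem.get consults the existing scheme/authority cache, so
   * s3a://bucket/... and myformat://... resolve to different
   * implementations without another layer of indirection.
   */
  public static FileSystem resolve(URI blockUri, Configuration conf)
      throws IOException {
    return FileSystem.get(blockUri, conf);
  }
}
{code}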

> [DISCUSS] Provided Storage BlockAlias Refactoring
> -
>
> Key: HDFS-12589
> URL: https://issues.apache.org/jira/browse/HDFS-12589
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Priority: Minor
>
> A BlockAlias is an interface used by the Datanode to determine where to 
> retrieve data from. It currently has a single implementation, {{FileRegion}}, 
> which contains the Block, the BlockPoolID, the Provided URL for the 
> FileRegion (i.e., the block), and the length and offset of the FileRegion in 
> the remote storage.
> The BlockAlias currently has a single method, {{getBlock}}. This is not 
> particularly useful since we can't ask it meaningful questions like 'how do 
> we retrieve the data from the external storage system?' or 'is the version 
> of the block in the external storage system up to date?'. Either we can do 
> away with the BlockAlias altogether and work with FileRegion, or the 
> BlockAlias needs to be made more robust.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12589) [DISCUSS] Provided Storage BlockAlias Refactoring

2017-10-04 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12589:
-

 Summary: [DISCUSS] Provided Storage BlockAlias Refactoring
 Key: HDFS-12589
 URL: https://issues.apache.org/jira/browse/HDFS-12589
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Ewan Higgs
Priority: Minor


A BlockAlias is an interface used by the Datanode to determine where to 
retrieve data from. It currently has a single implementation, {{FileRegion}}, 
which contains the Block, the BlockPoolID, the Provided URL for the FileRegion 
(i.e., the block), and the length and offset of the FileRegion in the remote 
storage.

The BlockAlias currently has a single method, {{getBlock}}. This is not 
particularly useful since we can't ask it meaningful questions like 'how do we 
retrieve the data from the external storage system?' or 'is the version of the 
block in the external storage system up to date?'. Either we can do away with 
the BlockAlias altogether and work with FileRegion, or the BlockAlias needs to 
be made more robust.
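
For illustration, one possible shape of a more robust alias; the methods below
are hypothetical, meant only to show the kind of questions the interface could
answer:

{code}
import java.net.URI;

/** Hypothetical richer alias; not an agreed HDFS interface. */
public interface BlockAliasSketch {
  /**
   * URI of the data in the external store; its scheme could be used to
   * dispatch to the right provided-storage implementation.
   */
  URI getUri();

  long getOffset();
  long getLength();

  /**
   * Opaque bytes a custom ProvidedVolumeImpl can interpret, e.g. an
   * etag/nonce to answer "is the external copy still current?".
   */
  byte[] getProviderMetadata();
}
{code}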



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12478) [WRITE] Command line tools for managing Provided Storage Backup mounts

2017-09-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12478:
--
Issue Type: Sub-task  (was: Task)
Parent: HDFS-12090

> [WRITE] Command line tools for managing Provided Storage Backup mounts
> --
>
> Key: HDFS-12478
> URL: https://issues.apache.org/jira/browse/HDFS-12478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
>
> This is a task for implementing the command line interface for attaching a 
> PROVIDED storage backup system (see HDFS-9806, HDFS-12090).
> # The administrator should be able to mount a PROVIDED storage volume from 
> the command line. 
> {code}hdfs attach -create [-name <name>] <HDFS path> <provided storage path (external)>{code}
> # Whitelist of users who are able to manage mounts (create, attach, detach).
> # Be able to interrogate the status of the attached storage (last time a 
> snapshot was taken, files being backed up).
> # The administrator should be able to remove an attached PROVIDED storage 
> volume from the command line. This simply means that the synchronization 
> process no longer runs. If the administrator has configured their setup to no 
> longer keep local copies of the data, the blocks in the subtree are no longer 
> accessible, since the external file store system is no longer reachable 
> through HDFS.
> {code}hdfs attach -remove <HDFS path> [-force | -flush]{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12478) [WRITE] Command line tools for managing Provided Storage Backup mounts

2017-09-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12478:
-

Assignee: Ewan Higgs

> [WRITE] Command line tools for managing Provided Storage Backup mounts
> --
>
> Key: HDFS-12478
> URL: https://issues.apache.org/jira/browse/HDFS-12478
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
>
> This is a task for implementing the command line interface for attaching a 
> PROVIDED storage backup system (see HDFS-9806, HDFS-12090).
> # The administrator should be able to mount a PROVIDED storage volume from 
> the command line. 
> {code}hdfs attach -create [-name <name>] <HDFS path> <provided storage path (external)>{code}
> # Whitelist of users who are able to manage mounts (create, attach, detach).
> # Be able to interrogate the status of the attached storage (last time a 
> snapshot was taken, files being backed up).
> # The administrator should be able to remove an attached PROVIDED storage 
> volume from the command line. This simply means that the synchronization 
> process no longer runs. If the administrator has configured their setup to no 
> longer keep local copies of the data, the blocks in the subtree are no longer 
> accessible, since the external file store system is no longer reachable 
> through HDFS.
> {code}hdfs attach -remove <HDFS path> [-force | -flush]{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-09-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12090:
-

Assignee: (was: Ewan Higgs)

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12090) Handling writes from HDFS to Provided storages

2017-09-18 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12090:
-

Assignee: Ewan Higgs

> Handling writes from HDFS to Provided storages
> --
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Virajith Jalaparti
>Assignee: Ewan Higgs
> Attachments: HDFS-12090-design.001.pdf
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in 
> external storage systems accessible through HDFS. However, HDFS-9806 is 
> limited to data being read through HDFS. This JIRA will deal with how data 
> can be written to such {{PROVIDED}} storages from HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12478) [WRITE] Command line tools for managing Provided Storage Backup mounts

2017-09-18 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12478:
-

 Summary: [WRITE] Command line tools for managing Provided Storage 
Backup mounts
 Key: HDFS-12478
 URL: https://issues.apache.org/jira/browse/HDFS-12478
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Ewan Higgs
Priority: Minor


This is a task for implementing the command line interface for attaching a 
PROVIDED storage backup system (see HDFS-9806, HDFS-12090).

# The administrator should be able to mount a PROVIDED storage volume from the 
command line. 
{code}hdfs attach -create [-name <name>] <HDFS path> <provided storage path (external)>{code}
# Whitelist of users who are able to manage mounts (create, attach, detach).
# Be able to interrogate the status of the attached storage (last time a 
snapshot was taken, files being backed up).
# The administrator should be able to remove an attached PROVIDED storage 
volume from the command line. This simply means that the synchronization 
process no longer runs. If the administrator has configured their setup to no 
longer keep local copies of the data, the blocks in the subtree are no longer 
accessible, since the external file store system is no longer reachable through 
HDFS.
{code}hdfs attach -remove <HDFS path> [-force | -flush]{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-24 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139971#comment-16139971
 ] 

Ewan Higgs commented on HDFS-12344:
---

The javac issue is because I introduced a new call to a deprecated function. 
Of course, this is the purpose of the patch.

> LocatedFileStatus regression: no longer accepting null FSPermission
> ---
>
> Key: HDFS-12344
> URL: https://issues.apache.org/jira/browse/HDFS-12344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12344.001.patch, HDFS-12344.002.patch
>
>
> SPARK-21817 was opened to fix an NPE in Spark where it calls 
> {{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
> HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
> actually a regression introduced by HDFS-6984.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-24 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12344:
--
Attachment: HDFS-12344.002.patch

Attaching updated patch based on [~steve_l]'s comments.


> LocatedFileStatus regression: no longer accepting null FSPermission
> ---
>
> Key: HDFS-12344
> URL: https://issues.apache.org/jira/browse/HDFS-12344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12344.001.patch, HDFS-12344.002.patch
>
>
> SPARK-21817 was opened to fix an NPE in Spark where it calls 
> {{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
> HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
> actually a regression introduced by HDFS-6984.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-24 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12344:
--
Status: Patch Available  (was: Open)

> LocatedFileStatus regression: no longer accepting null FSPermission
> ---
>
> Key: HDFS-12344
> URL: https://issues.apache.org/jira/browse/HDFS-12344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12344.001.patch
>
>
> SPARK-21817 was opened to fix an NPE in Spark where it calls 
> {{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
> HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
> actually a regression introduced by HDFS-6984.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-24 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12344:
--
Attachment: HDFS-12344.001.patch

Attaching a failing test and the fix. [~steve_l] and/or [~chris.douglas] can 
review.

> LocatedFileStatus regression: no longer accepting null FSPermission
> ---
>
> Key: HDFS-12344
> URL: https://issues.apache.org/jira/browse/HDFS-12344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
> Attachments: HDFS-12344.001.patch
>
>
> SPARK-21817 was opened to fix an NPE in Spark where it calls 
> {{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
> HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
> actually a regression introduced by HDFS-6984.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-23 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12344:
-

Assignee: Ewan Higgs

> LocatedFileStatus regression: no longer accepting null FSPermission
> ---
>
> Key: HDFS-12344
> URL: https://issues.apache.org/jira/browse/HDFS-12344
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Minor
>
> SPARK-21817 was opened to fix an NPE in Spark where it calls 
> {{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
> HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
> actually a regression introduced by HDFS-6984.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12344) LocatedFileStatus regression: no longer accepting null FSPermission

2017-08-23 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12344:
-

 Summary: LocatedFileStatus regression: no longer accepting null 
FSPermission
 Key: HDFS-12344
 URL: https://issues.apache.org/jira/browse/HDFS-12344
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ewan Higgs
Priority: Minor


SPARK-21817 was opened to fix an NPE in Spark where it calls 
{{LocatedFileStatus}} with a null {{FsPermission}}. This breaks in current 
HEAD. However, {{LocatedFileStatus}} is a stable/evolving API, so this is 
actually a regression introduced by HDFS-6984.
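
A minimal sketch of the trigger, using the public {{LocatedFileStatus}}
constructor; the surrounding class is hypothetical and mirrors the kind of
call Spark makes in SPARK-21817:

{code}
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;

public class NullPermissionSketch {
  /** Passing a null FsPermission was accepted before HDFS-6984. */
  public static LocatedFileStatus statusWithNullPermission(Path p) {
    return new LocatedFileStatus(0L, false, 1, 128L << 20, 0L, 0L,
        null /* permission */, null /* owner */, null /* group */,
        null /* symlink */, p, new BlockLocation[0]);
  }
}
{code}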



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-08-23 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138036#comment-16138036
 ] 

Ewan Higgs commented on HDFS-11639:
---

{quote}Hi Ewan Higgs, As this change is required only for writes, we can move 
this to HDFS-12090. Are you OK with that?
{quote}

Sure. It needs a rebase as well, I see.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch, 
> HDFS-11639-HDFS-9806.004.patch, HDFS-11639-HDFS-9806.005.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from, i.e., URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.
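
As a sketch of the shape this takes on the wire, an opaque serialized alias
carried next to a located block; the field names are assumptions, not the real
protobuf definitions:

{code}
/** Hypothetical holder mirroring an optional 'blockAlias' bytes field. */
class LocatedBlockWithAliasSketch {
  private final byte[] blockAlias; // serialized alias; null for local blocks

  LocatedBlockWithAliasSketch(byte[] blockAlias) {
    this.blockAlias = blockAlias == null ? null : blockAlias.clone();
  }

  /** Non-null only for PROVIDED blocks, per the description above. */
  byte[] getBlockAlias() {
    return blockAlias == null ? null : blockAlias.clone();
  }
}
{code}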



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-08-23 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Parent Issue: HDFS-12090  (was: HDFS-9806)

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch, 
> HDFS-11639-HDFS-9806.004.patch, HDFS-11639-HDFS-9806.005.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from, i.e., URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11828) [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for PROVIDED blocks.

2017-08-23 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138040#comment-16138040
 ] 

Ewan Higgs commented on HDFS-11828:
---

{quote} Ewan Higgs, Similar to HDFS-11639, we can make this a sub-task of 
HDFS-12090?{quote}
I've moved it now.

> [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for 
> PROVIDED blocks.
> 
>
> Key: HDFS-11828
> URL: https://issues.apache.org/jira/browse/HDFS-11828
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>
> From HDFS-11639:
> {quote}[~virajith]
> Looking over this patch, one thing that occurred to me is whether it makes 
> sense to unify FileRegionProvider with BlockProvider. They both have very 
> close functionality.
> I like the use of BlockProvider#resolve(). If we unify FileRegionProvider 
> with BlockProvider, then resolve can return null if the block map is 
> accessible from the Datanodes also. If it is accessible only from the 
> Namenode, then a non-null value can be propagated to the Datanode.
> One of the motivations for adding the BlockAlias to the client protocol was 
> to have the blocks map only on the Namenode. In this scenario, the ReplicaMap 
> in FsDatasetImpl will not have any replicas a priori. Thus, one way to 
> ensure that the FsDatasetImpl interface continues to function as today is to 
> create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream when 
> BlockAlias is not null.
> {quote}
> {quote}[~ehiggs]
> With the pending refactoring of the FsDatasetImpl which won't have replicas a 
> priori, I wonder if it makes sense for the Datanode to have a 
> FileRegionProvider or BlockProvider at all. They are given the appropriate 
> block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
> overlooking what's still being provided.{quote}
> {quote}[~virajith]
> I was trying to reconcile the existing design (FsDatasetImpl knows about 
> provided blocks apriori) with the new design where FsDatasetImpl will not 
> know about these before but just constructs them on-the-fly using the 
> BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve() allows 
> us to have both designs exist in parallel. I was wondering if we should still 
> retain the earlier given the latter design.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11828) [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for PROVIDED blocks.

2017-08-23 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11828:
--
Parent Issue: HDFS-12090  (was: HDFS-9806)

> [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for 
> PROVIDED blocks.
> 
>
> Key: HDFS-11828
> URL: https://issues.apache.org/jira/browse/HDFS-11828
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>
> From HDFS-11639:
> {quote}[~virajith]
> Looking over this patch, one thing that occurred to me is whether it makes 
> sense to unify FileRegionProvider with BlockProvider. They both have very 
> close functionality.
> I like the use of BlockProvider#resolve(). If we unify FileRegionProvider 
> with BlockProvider, then resolve can return null if the block map is 
> accessible from the Datanodes also. If it is accessible only from the 
> Namenode, then a non-null value can be propagated to the Datanode.
> One of the motivations for adding the BlockAlias to the client protocol was 
> to have the blocks map only on the Namenode. In this scenario, the ReplicaMap 
> in FsDatasetImpl will not have any replicas a priori. Thus, one way to 
> ensure that the FsDatasetImpl interface continues to function as today is to 
> create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream when 
> BlockAlias is not null.
> {quote}
> {quote}[~ehiggs]
> With the pending refactoring of the FsDatasetImpl which won't have replicas a 
> priori, I wonder if it makes sense for the Datanode to have a 
> FileRegionProvider or BlockProvider at all. They are given the appropriate 
> block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
> overlooking what's still being provided.{quote}
> {quote}[~virajith]
> I was trying to reconcile the existing design (FsDatasetImpl knows about 
> provided blocks apriori) with the new design where FsDatasetImpl will not 
> know about these before but just constructs them on-the-fly using the 
> BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve() allows 
> us to have both designs exist in parallel. I was wondering if we should still 
> retain the earlier given the latter design.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12289) [READ] HDFS-12091 breaks the tests for provided block reads

2017-08-11 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124220#comment-16124220
 ] 

Ewan Higgs commented on HDFS-12289:
---

LGTM

> [READ] HDFS-12091 breaks the tests for provided block reads
> ---
>
> Key: HDFS-12289
> URL: https://issues.apache.org/jira/browse/HDFS-12289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12289-HDFS-9806.001.patch
>
>
> In the tests within {{TestNameNodeProvidedImplementation}}, the files that 
> are supposed to belong to a provided volume are not located under the Storage 
> directory assigned to the volume in {{MiniDFSCluster}}. With HDFS-12091, this 
> is no longer correct, and thus the tests break. This JIRA is to fix the tests 
> under {{TestNameNodeProvidedImplementation}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12091) [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to the correct external storage

2017-08-02 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1687#comment-1687
 ] 

Ewan Higgs commented on HDFS-12091:
---

Thanks for adding the test.

I tested the first patch, and now we have the test, so LGTM.

> [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to 
> the correct external storage
> --
>
> Key: HDFS-12091
> URL: https://issues.apache.org/jira/browse/HDFS-12091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-12091-HDFS-9806.001.patch, 
> HDFS-12091-HDFS-9806.002.patch
>
>
> A {{ProvidedVolumeImpl}} can only serve blocks that "belong" to it, i.e., for 
> blocks served from a {{ProvidedVolumeImpl}}, the {{baseURI}} of the 
> {{ProvidedVolumeImpl}} should be a prefix of the URI of the blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12091) [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to the correct external storage

2017-07-27 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103235#comment-16103235
 ] 

Ewan Higgs commented on HDFS-12091:
---

I tested it on a simple 1NN 1DN setup and it worked.

Could you add some tests for the basic algorithm of 
ProvidedVolumeImpl.containsURI, with example URLs? It can be made static as 
well, for ease of testing.

Nits: the indentation in ProvidedVolumeImpl is messed up. Somehow checkstyle 
didn't pick it up.
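
Along the lines requested, a static prefix check with no volume state; this is
a hypothetical helper, not the actual ProvidedVolumeImpl.containsURI
implementation:

{code}
import java.net.URI;

public final class ContainsUriSketch {
  private ContainsUriSketch() {}

  /**
   * True if blockURI lives under baseURI: same scheme and authority,
   * and the base path is a path-component prefix of the block path.
   */
  public static boolean containsURI(URI baseURI, URI blockURI) {
    if (baseURI == null || blockURI == null
        || baseURI.getPath() == null || blockURI.getPath() == null) {
      return false;
    }
    if (!equalsOrBothNull(baseURI.getScheme(), blockURI.getScheme())
        || !equalsOrBothNull(baseURI.getAuthority(),
            blockURI.getAuthority())) {
      return false;
    }
    String base = baseURI.getPath().endsWith("/")
        ? baseURI.getPath() : baseURI.getPath() + "/";
    return blockURI.getPath().startsWith(base);
  }

  private static boolean equalsOrBothNull(String a, String b) {
    return a == null ? b == null : a.equals(b);
  }
}
{code}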

> [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to 
> the correct external storage
> --
>
> Key: HDFS-12091
> URL: https://issues.apache.org/jira/browse/HDFS-12091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-12091-HDFS-9806.001.patch
>
>
> A {{ProvidedVolumeImpl}} can only serve blocks that "belong" to it, i.e., for 
> blocks served from a {{ProvidedVolumeImpl}}, the {{baseURI}} of the 
> {{ProvidedVolumeImpl}} should be a prefix of the URI of the blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12093) [READ] Share remoteFS between ProvidedReplica instances.

2017-07-27 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103078#comment-16103078
 ] 

Ewan Higgs edited comment on HDFS-12093 at 7/27/17 12:09 PM:
-

Tested this on a simple 1 NN 1 DN shared machine and it was able to start the 
DN much faster. So that aspect is fixed.

LGTM.

I did run into an exception but I think it's fixed by HDFS-12091.


was (Author: ehiggs):
Tested this on a simple 1 NN 1 DN shared machine and it was able to start the 
DN much faster. So that aspect is fixed.

One issue I did run into, however, is an exception in the FsVolumeSpi. I'm not 
sure if it's related:

{code}
2017-07-27 13:18:45,599 INFO impl.FsVolumeImpl: Adding ScanInfo for blkid 1073741825
2017-07-27 13:18:45,600 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e89a096e-ba2c-4e85-bf2b-5321e8f93852
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
    at java.io.File.<init>(File.java:421)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:151)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl.compileReport(ProvidedVolumeImpl.java:482)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:618)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:581)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
{code}

This is coming from the following:
{code}
public ScanInfo(long blockId, File blockFile, File metaFile,
    FsVolumeSpi vol, FileRegion fileRegion, long length) {
  this.blockId = blockId;
  String condensedVolPath =
      (vol == null || vol.getBaseURI() == null) ? null :
        getCondensedPath(new File(vol.getBaseURI()).getAbsolutePath());
        // <-- vol.getBaseURI() will return my volume's scheme (s3a).
  this.blockSuffix = blockFile == null ? null :
      getSuffix(blockFile, condensedVolPath);
  this.blockLength = length;
  if (metaFile == null) {
    this.metaSuffix = null;
  } else if (blockFile == null) {
    this.metaSuffix = getSuffix(metaFile, condensedVolPath);
  } else {
    this.metaSuffix = getSuffix(metaFile,
        condensedVolPath + blockSuffix);
  }
  this.volume = vol;
  this.fileRegion = fileRegion;
}
{code}

Not sure if this is related or needs to be fixed under this ticket.

[jira] [Commented] (HDFS-12093) [READ] Share remoteFS between ProvidedReplica instances.

2017-07-27 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103078#comment-16103078
 ] 

Ewan Higgs commented on HDFS-12093:
---

Tested this on a simple 1 NN 1 DN shared machine and it was able to start the 
DN much faster. So that aspect is fixed.

One issue I did run into, however, is an exception in the FsVolumeSpi. I'm not 
sure if it's related:

{code}
2017-07-27 13:18:45,599 INFO impl.FsVolumeImpl: Adding ScanInfo for blkid 1073741825
2017-07-27 13:18:45,600 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e89a096e-ba2c-4e85-bf2b-5321e8f93852
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
    at java.io.File.<init>(File.java:421)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:151)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl.compileReport(ProvidedVolumeImpl.java:482)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:618)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.call(DirectoryScanner.java:581)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
{code}

This is coming from the following:
{code}
public ScanInfo(long blockId, File blockFile, File metaFile,
    FsVolumeSpi vol, FileRegion fileRegion, long length) {
  this.blockId = blockId;
  String condensedVolPath =
      (vol == null || vol.getBaseURI() == null) ? null :
        getCondensedPath(new File(vol.getBaseURI()).getAbsolutePath());
        // <-- vol.getBaseURI() will return my volume's scheme (s3a).
  this.blockSuffix = blockFile == null ? null :
      getSuffix(blockFile, condensedVolPath);
  this.blockLength = length;
  if (metaFile == null) {
    this.metaSuffix = null;
  } else if (blockFile == null) {
    this.metaSuffix = getSuffix(metaFile, condensedVolPath);
  } else {
    this.metaSuffix = getSuffix(metaFile,
        condensedVolPath + blockSuffix);
  }
  this.volume = vol;
  this.fileRegion = fileRegion;
}
{code}

Not sure if this is related or needs to be fixed under this ticket.

> [READ] Share remoteFS between ProvidedReplica instances.
> 
>
> Key: HDFS-12093
> URL: https://issues.apache.org/jira/browse/HDFS-12093

[jira] [Commented] (HDFS-12151) Hadoop 2 clients cannot writeBlock to Hadoop 3 DataNodes

2017-07-26 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101347#comment-16101347
 ] 

Ewan Higgs commented on HDFS-12151:
---

{quote}Should we use {{nst > 0}} rather than {{targetStorageTypes.length > 0}} 
(amended) here for clarity?{quote}
Yes.

{quote}
Should the {{targetStorageTypes.length > 0}} check really be {{nsi > 0}}? We 
could elide it then since it's already captured in the outside if.
{quote}
This does look redundant, since {{targetStorageIds.length}} will be either 0 
or equal to {{targetStorageTypes.length}}.

{quote}
Finally, I don't understand why we need to add the targeted ID/type for 
checkAccess. Each DN only needs to validate itself, yea? BTSM#checkAccess 
indicates this in its javadoc, but it looks like we run through ourselves and 
the targets each time:
{quote}
That seems like a good simplification. I think I had assumed the BTI and the 
requested types being checked should be the same (String vs. String, uint64 
vs. uint64), but I don't see a reason why they have to be. [~chris.douglas], 
what do you think?

> Hadoop 2 clients cannot writeBlock to Hadoop 3 DataNodes
> 
>
> Key: HDFS-12151
> URL: https://issues.apache.org/jira/browse/HDFS-12151
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha4
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HDFS-12151.001.patch
>
>
> Trying to write to a Hadoop 3 DataNode with a Hadoop 2 client currently 
> fails. On the client side it looks like this:
> {code}
> 17/07/14 13:31:58 INFO hdfs.DFSClient: Exception in 
> createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449){code}
> But on the DataNode side there's an ArrayIndexOutOfBoundsException because there 
> aren't any targetStorageIds:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:815)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
> at java.lang.Thread.run(Thread.java:745){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12151) Hadoop 2 clients cannot writeBlock to Hadoop 3 DataNodes

2017-07-26 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101320#comment-16101320
 ] 

Ewan Higgs commented on HDFS-12151:
---

+1. 

Don't know how I missed that in the fix for HDFS-11956. Thanks for providing 
this patch [~sean_impala_9b93].

> Hadoop 2 clients cannot writeBlock to Hadoop 3 DataNodes
> 
>
> Key: HDFS-12151
> URL: https://issues.apache.org/jira/browse/HDFS-12151
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha4
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HDFS-12151.001.patch
>
>
> Trying to write to a Hadoop 3 DataNode with a Hadoop 2 client currently 
> fails. On the client side it looks like this:
> {code}
> 17/07/14 13:31:58 INFO hdfs.DFSClient: Exception in 
> createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449){code}
> But on the DataNode side there's an ArrayIndexOutOfBoundsException because there 
> aren't any targetStorageIds:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:815)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
> at java.lang.Thread.run(Thread.java:745){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12093) [READ] Share remoteFS between ProvidedReplica instances.

2017-07-06 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12093:
-

 Summary: [READ] Share remoteFS between ProvidedReplica instances.
 Key: HDFS-12093
 URL: https://issues.apache.org/jira/browse/HDFS-12093
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


When a Datanode comes online using Provided storage, it fills the 
{{ReplicaMap}} with the known replicas. With Provided Storage, this includes 
{{ProvidedReplica}} instances. Each of these objects constructs, in its 
constructor, a FileSystem using the Service Provider. This can result in 
contacting the remote file system and checking that the credentials are correct 
and that the data is there. For large systems this is a prohibitively expensive 
operation to perform per replica.

Instead, the {{ProvidedVolumeImpl}} should own the reference to the 
{{remoteFS}} and should share it with the {{ProvidedReplica}} objects on their 
creation.
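
A minimal sketch of the intended shape (class name and constructor signature 
here are illustrative, not the actual patch):

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch: the volume creates the remote FileSystem once and shares it.
class ProvidedVolumeSketch {
  private final FileSystem remoteFS; // one instance per volume, not per replica

  ProvidedVolumeSketch(URI baseURI, Configuration conf) throws IOException {
    // May contact the remote store and validate credentials -- done once here.
    this.remoteFS = FileSystem.get(baseURI, conf);
  }

  // Each replica is handed the shared reference instead of building its own.
  FileSystem remoteFS() {
    return remoteFS;
  }
}
{code}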



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-26 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11956:
--
Attachment: HDFS-11956.004.patch

Attaching a version of the patch that doesn't use a config switch. 

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch, HDFS-11956.002.patch, 
> HDFS-11956.003.patch, HDFS-11956.004.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-22 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060089#comment-16060089
 ] 

Ewan Higgs commented on HDFS-11956:
---

Hi Andrew

{quote}
IIUC, we know the storage type even for an old client since it passes it in the 
writeBlock request. Can an old client correctly pass along an unknown 
StorageType (e.g. PROVIDED)? 
{quote}
I think you understood correctly. I don't think an old client will be able to 
deserialise a PROVIDED StorageType from the protobuf, so it will fail to pass 
along that StorageType (though I have not yet done the cross-version testing 
with Hadoop 2.6). I think this is the same as would be the case any time a new 
StorageType is introduced (e.g. if we hypothetically added 
{{StorageType.NVME}}, {{StorageType.SMR}}, etc.). Maybe the forward 
compatibility of StorageTypes is another JIRA I should raise?

{quote}
If so, then I see how this works; essentially, only require storageIDs when 
writing to provided storage.
{quote}
Yes.

{quote}
For 3.0.0-alpha4 I can also revert HDFS-9807 while we figure out this JIRA. We 
did this internally to unblock testing.
{quote}
I'm traveling today so I won't be able to furnish a patch just yet. What's your 
time frame for tagging alpha4?

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch, HDFS-11956.002.patch, 
> HDFS-11956.003.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-22 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059663#comment-16059663
 ] 

Ewan Higgs commented on HDFS-11956:
---

Hi,
Another idea is to just ignore the BlockTokenIdentifier if the storageId list 
in the request is empty. The current intention of the storageId in the message 
is just a suggestion for the datanode in most cases; but in the case of 
provided storage (HDFS-9806) it will be the storageId of the provided storage 
system. If the storageId list is empty then it will just fail the write to the 
provided storage since it won't know where/how to write it.
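
A sketch of that idea (hypothetical helper, not the committed fix):

{code}
final class StorageIdCompat { // hypothetical helper class
  /**
   * If the request carries no storageIds, treat it as a legacy (pre-3.0)
   * client and skip the storageId part of the token check. A write to
   * PROVIDED storage would still fail later, since it needs the storageId.
   */
  static boolean shouldCheckStorageIds(String[] requestedStorageIds) {
    return requestedStorageIds != null && requestedStorageIds.length > 0;
  }
}
{code}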

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch, HDFS-11956.002.patch, 
> HDFS-11956.003.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11125) [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement command

2017-06-20 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056712#comment-16056712
 ] 

Ewan Higgs commented on HDFS-11125:
---

The Provided Storage system is looking to use the SPS for asynchronously 
writing files to a remote system ({{hdfs://}}, {{s3a://}}, {{wasb://}}). The 
way the SPS currently works is file by file which suits the Provided Storage 
system very well since it works on a file by file basis as well. Moving to use 
smaller batches unlinked to the actual files would break current plans 
discussed offline with [~umamaheswararao].

This could be worked through by offering the interface as 
{{BlockStorageMovementCommandSchedulerSpi}} (or something less of a mouthful) 
and implementing it one way for normal SPS and another way for Provided Storage.
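
For illustration, such an SPI could look roughly like this (interface name and 
method shape are hypothetical; {{BlockMovingInfo}} comes from the HDFS-11068 
patches, so its import is elided):

{code}
import java.util.Collection;
import java.util.List;

// Hypothetical SPI: the batching strategy becomes pluggable, so normal SPS
// can batch freely while Provided Storage keeps all blocks of a file together.
interface BlockStorageMovementScheduler {
  /** Split the moves tracked under one trackId into heartbeat-sized commands. */
  List<Collection<BlockMovingInfo>> schedule(long trackId,
      Collection<BlockMovingInfo> moves);
}
{code}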

> [SPS]: Use smaller batches of BlockMovingInfo into the block storage movement 
> command
> -
>
> Key: HDFS-11125
> URL: https://issues.apache.org/jira/browse/HDFS-11125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
>
> This is a follow-up task of HDFS-11068, which sends all the blocks under a 
> trackID over a single heartbeat response (DNA_BLOCK_STORAGE_MOVEMENT command). 
> If there are many blocks under a given trackID (for example, a file contains 
> many blocks), then those requests go across the network and come with a lot of 
> overhead. In this jira, we will discuss and implement a mechanism to limit 
> the list of items to smaller batches within a trackID.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-15 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11956:
--
Attachment: HDFS-11956.003.patch

Attaching patch with value for hdfs-default.xml and some checkstyle fixes.

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch, HDFS-11956.002.patch, 
> HDFS-11956.003.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-14 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11956:
--
Attachment: HDFS-11956.002.patch

Attaching an updated patch with a unit test. In the test, the {{strictSM}} 
{{BlockTokenSecretManager}} fails when the passed storageIds are wrong, but 
{{permissiveSM}} allows them. {{strictSM}} corresponds to having the config 
value enabled, while {{permissiveSM}} corresponds to it being disabled for 
legacy clients.
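
Roughly, the test exercises this shape (the {{checkAccess}} arguments are 
abbreviated here; the actual signature in the patch takes more parameters):

{code}
// Abbreviated sketch of the new unit test's core assertion.
String[] wrongStorageIds = { "DS-does-not-match" };
try {
  strictSM.checkAccess(token, block, AccessMode.WRITE, wrongStorageIds);
  fail("strict manager should reject mismatched storageIds");
} catch (InvalidToken expected) {
  // config enabled: storageIds are verified, so this is the expected path
}
// permissive manager (config disabled) ignores storageIds for legacy clients
permissiveSM.checkAccess(token, block, AccessMode.WRITE, wrongStorageIds);
{code}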

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch, HDFS-11956.002.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-14 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11956:
--
Assignee: Ewan Higgs  (was: Chris Douglas)
Release Note: Introduce dfs.block.access.token.storageid.enable which will 
be false by default. When it's turned on, the 
BlockTokenSecretManager.checkAccess will consider the storage ID when verifying 
the request. This allows for backwards compatibility all the way back to 2.6.x.
  Status: Patch Available  (was: Open)

Introduce dfs.block.access.token.storageid.enable which will be false by 
default. When it's turned on, the BlockTokenSecretManager.checkAccess will 
consider the storage ID when verifying the request. This allows for backwards 
compatibility all the way back to 2.6.x.

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Ewan Higgs
>Priority: Blocker
> Attachments: HDFS-11956.001.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-14 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11956:
--
Attachment: HDFS-11956.001.patch

Attaching a patch that introduces {{dfs.block.access.token.storageid.enable}} 
which will be false by default. When it's turned on, the 
{{BlockTokenSecretManager.checkAccess}} will consider the storage ID when 
verifying the request.

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Chris Douglas
>Priority: Blocker
> Attachments: HDFS-11956.001.patch
>
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11956) Fix BlockToken compatibility with Hadoop 2.x clients

2017-06-14 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049350#comment-16049350
 ] 

Ewan Higgs commented on HDFS-11956:
---

I took a look and see that this fails when writing blocks. e.g.:

{code}
hadoop-2.6.5/bin/hdfs dfs -copyFromLocal hello.txt /
{code}

This comes from the fact that the {{BlockTokenIdentifier}} has the StorageID in 
there; but the StorageID is an optional field in the request which is new in 
3.0, so old clients don't pass it in. Defaulting to 'null' and allowing this 
would of course defeat the purpose of the BlockTokenIdentifier, so I think this 
should be fixed with a bitflag (e.g. 
{{dfs.block.access.token.storageid.enable}}) which defaults to false and makes 
the {{BlockTokenSecretManager}} only use the storage id in the {{checkAccess}} 
call if it's enabled. This will allow old clients to work; but it won't allow the 
system to take advantage of new features enabled by using the storage id in the 
write calls.
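
A condensed sketch of the proposed guard (a fragment: the surrounding 
{{checkAccess}} logic is elided, {{useStorageIds}} would be read from the 
Configuration at construction time, and {{java.util.Arrays}} is assumed):

{code}
// Only consult storageIds when dfs.block.access.token.storageid.enable=true.
if (useStorageIds) {
  for (String requested : requestedStorageIds) {
    if (!Arrays.asList(id.getStorageIds()).contains(requested)) {
      throw new InvalidToken("Block token with StorageIDs "
          + Arrays.toString(id.getStorageIds())
          + " not valid for access with StorageIDs "
          + Arrays.toString(requestedStorageIds));
    }
  }
}
{code}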

> Fix BlockToken compatibility with Hadoop 2.x clients
> 
>
> Key: HDFS-11956
> URL: https://issues.apache.org/jira/browse/HDFS-11956
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Chris Douglas
>Priority: Blocker
>
> Seems like HDFS-9807 broke backwards compatibility with Hadoop 2.x clients. 
> When talking to a 3.0.0-alpha4 DN with security on:
> {noformat}
> 2017-06-06 23:27:22,568 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block token verification failed: op=WRITE_BLOCK, 
> remoteAddress=/172.28.208.200:53900, message=Block token with StorageIDs 
> [DS-c0f24154-a39b-4941-93cd-5b8323067ba2] not valid for access with 
> StorageIDs []
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11640) [READ] Datanodes should use a unique identifier when reading from external stores

2017-05-23 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020979#comment-16020979
 ] 

Ewan Higgs commented on HDFS-11640:
---

I ran into compilation errors when trying to build this. I suppose I should 
wait until HDFS-6984 and HDFS-7878 are merged before reviewing.

> [READ] Datanodes should use a unique identifier when reading from external 
> stores
> -
>
> Key: HDFS-11640
> URL: https://issues.apache.org/jira/browse/HDFS-11640
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-11640-HDFS-9806.001.patch
>
>
> Use a unique identifier when reading from external stores to ensure that 
> datanodes read the correct (version of) file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-23 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Attachment: HDFS-11639-HDFS-9806.005.patch

Attaching a patch that removes the {{BlockAlias}} from the {{readBlocks}} 
operation. The {{BlockAlias}} is only required in the {{writeBlocks}} and 
{{transferBlocks}} calls.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch, 
> HDFS-11639-HDFS-9806.004.patch, HDFS-11639-HDFS-9806.005.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-18 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015479#comment-16015479
 ] 

Ewan Higgs commented on HDFS-11639:
---

{quote}Any particular reason for changing BlockProvider to implement 
Iterable<BlockAlias> instead of Iterable<Block>?{quote}
Yes: since the purpose is to put the {{BlockAlias}} into the client protocol, the 
{{ProvidedStorageMap}} needs to get more than just the {{Block}}. This was done 
by changing the {{BlockProvider}} to return a {{BlockAlias}} instead of the 
{{Block}}. 

{quote}Was blockId intentionally left out of FileRegionProto even though 
FileRegion contains it?{quote}
Yes, this was done for two reasons:

1. The blockid is already in the message. Having it in two locations is a bug 
vector and more wasteful than it needs to be.
2. The FileRegion is really the value in the key value store. The blockid is 
the key. I was going to investigate whether the blockid could be pulled out and 
mapping a {{FileRegion}} to a blockid would be done by association rather than 
embedding the value in the structure, but it's very low priority and well 
beyond the scope of this PR.

If you think the blockid should be in the {{FileRegionProto}} so it maps 
exactly onto the {{FileRegion}} as it exists today, I'm fine with putting it in.
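
In other words, the model under discussion is (illustrative fragment only):

{code}
// FileRegion is the value; the block ID is the key of the alias map, so
// duplicating the ID inside FileRegion buys nothing.
Map<Long, FileRegion> aliasMap = new HashMap<>();
FileRegion region = aliasMap.get(blockId); // blockId recovered by association
{code}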

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch, 
> HDFS-11639-HDFS-9806.004.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Status: Patch Available  (was: Open)

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-17 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Attachment: HDFS-11639-HDFS-9806.003.patch

Attaching an updated patch that addresses some of [~virajith]'s comments:

{quote}
- In ProvidedBlocksBuilder#newLocatedBlock, the fileRegion should be resolved 
  only if the block has PROVIDED locations (i.e., hasProvidedLocations is 
  true). When dfs.namenode.provided.enabled is set to true, all LocatedBlocks 
  are created in this method, and for non-provided blocks no resolution of the 
  BlockAlias is needed.
- PBHelperClient#convertLocatedBlockProto() and 
  PBHelperClient#convertLocatedBlock() should be modified to decode/encode the 
  BlockAlias bytes.
- How about decoding the blockAlias bytes in DataXceiver#readBlock using a new 
  DataTransferProtoUtil#blockAliasFromProto(byte[] blockAlias) method instead 
  of using BlockAlias#builder()? The former is in line with the way the other 
  protobufs are decoded in DataXceiver. Further, if in the future a different 
  BlockAlias is used, the current implementation of using the 
  FileRegion#Builder in DataXceiver#readBlock will be hard to extend (it will 
  end up being: try FileRegion#Builder, if null try BlockAliasXX#Builder, and 
  so on).
- Similar to passing the BlockAlias from DataXceiver#readBlock to BlockSender, 
  it should be passed along from DataXceiver#writeBlock to BlockReceiver. 
  However, we would not need it till we have writes implemented.
- DataStreamer#blockAlias will never be non-null. I think it should be 
  initialized in DFSOutputStream.
{quote}

This patch also adds the BlockAlias to the transferBlocks message, but it isn't 
wired in yet since that is a write path; e.g. {{DNA_TRANSFER}} has not yet been 
updated.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11828) Refactor FsDatasetImpl as the BlockAlias is in the wire protocol for PROVIDED blocks.

2017-05-16 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-11828:
-

 Summary: Refactor FsDatasetImpl as the BlockAlias is in the wire 
protocol for PROVIDED blocks.
 Key: HDFS-11828
 URL: https://issues.apache.org/jira/browse/HDFS-11828
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs
Assignee: Ewan Higgs


From HDFS-11639:

{quote}[~virajith]
Looking over this patch, one thing that occurred to me is if it makes sense to 
unify FileRegionProvider with BlockProvider? They both have very close 
functionality.

I like the use of BlockProvider#resolve(). If we unify FileRegionProvider with 
BlockProvider, then resolve can return null if the block map is accessible from 
the Datanodes also. If it is accessible only from the Namenode, then a non-null 
value can be propagated to the Datanode.
One of the motivations for adding the BlockAlias to the client protocol was to 
have the blocks map only on the Namenode. In this scenario, the ReplicaMap in 
FsDatasetImpl of will not have any replicas apriori. Thus, one way to ensure 
that the FsDatasetImpl interface continues to function as today is to create a 
FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream when BlockAlias 
is not null.
{quote}

{quote}[~ehiggs]
With the pending refactoring of the FsDatasetImpl which won't have replicas a 
priori, I wonder if it makes sense for the Datanode to have a 
FileRegionProvider or BlockProvider at all. They are given the appropriate 
block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
overlooking what's still being provided.{quote}

{quote}[~virajith]
I was trying to reconcile the existing design (FsDatasetImpl knows about 
provided blocks apriori) with the new design where FsDatasetImpl will not know 
about these before but just constructs them on-the-fly using the BlockAlias 
from readBlock or writeBlock. Using BlockProvider#resolve() allows us to have 
both designs exist in parallel. I was wondering if we should still retain the 
earlier given the latter design.
{quote}
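
A rough sketch of the on-the-fly construction being discussed (method shape 
simplified; {{newFinalizedProvidedReplica}} is a hypothetical helper):

{code}
// Inside FsDatasetImpl: when no replica is known a priori, build a
// FinalizedProvidedReplica from the BlockAlias carried by the request.
ReplicaInfo getReplica(ExtendedBlock b, BlockAlias alias) {
  ReplicaInfo replica = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
  if (replica == null && alias instanceof FileRegion) {
    // Hypothetical helper: materialize the provided replica on the fly.
    replica = newFinalizedProvidedReplica((FileRegion) alias);
  }
  return replica;
}
{code}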



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-15 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010483#comment-16010483
 ] 

Ewan Higgs edited comment on HDFS-11639 at 5/15/17 1:16 PM:


{quote}
Btw, I rebased the HDFS-9806 branch on the most recent version of trunk (hash 
83dd14aa84ad697ad32c51007ac31ad39feb4288).{quote}
Thanks!
{quote}In DataTransfer#run(), the blockAlias should be null unless it is for a 
provided block. I think this will entail adding BlockAlias to transferBlock() 
and also to BlockCommand for the DatanodeProtocol.DNA_TRANSFER command. 
However, this will only be relevant for writing provided blocks (and in 
particular recovery).{quote}
If this entails a protocol change, I think it makes the most sense to do it at 
this point so all the protocol changes happen up front in one change if we need 
this to get in for 3.0. 
Does it make sense to have the BlockAlias in transferBlock? If we know the 
targetStorageTypes and targetStorageIDs then we can know that nothing needs to 
be transferred. Or is this an issue if we want to transfer from PROVIDED to 
DISK?

{quote}Looking over this patch, one thing that occurred to me is if it makes 
sense to unify FileRegionProvider with BlockProvider? They both have very close 
functionality.{quote}
I think this makes a lot of sense.
{quote}I like the use of BlockProvider#resolve(). If we unify 
FileRegionProvider with BlockProvider, then resolve can return null if the 
block map is accessible from the Datanodes also. If it is accessible only from 
the Namenode, then a non-null value can be propagated to the Datanode.{quote}
With the pending refactoring of the FsDatasetImpl which won't have replicas a 
priori, I wonder if it makes sense for the Datanode to have a 
FileRegionProvider or BlockProvider at all. They are given the appropriate 
block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
overlooking what's still being provided.


was (Author: ehiggs):
{quote}
Btw, I rebased the HDFS-9806 branch on the most recent version of trunk (hash 
83dd14aa84ad697ad32c51007ac31ad39feb4288).{quote}
Thanks!
{quote}In DataTransfer#run(), the blockAlias should be null unless it is for a 
provided block. I think this will entail adding BlockAlias to transferBlock() 
and also to BlockCommand for the DatanodeProtocol.DNA_TRANSFER command. 
However, this will only be relevant for writing provided blocks (and in 
particular recovery).{quote}
If this entails a protocol change, I think it makes the most sense to do it at 
this point so all the protocol changes happen up front in one change if we need 
this to get in for 3.0. 
Does it make sense to have the BlockAlias in transferBlock? If we know the 
targetStorageTypes and targetStorageIDs then we can know that nothing needs to 
be transferred. Or is this an issue if we want to transfer from PROVIDED to 
DISK?

{quote}Looking over this patch, one thing that occurred to me is if it makes 
sense to unify FileRegionProvider with BlockProvider? They both have very close 
functionality.{quote}
I think this makes a lot of sense.
{code}I like the use of BlockProvider#resolve(). If we unify FileRegionProvider 
with BlockProvider, then resolve can return null if the block map is accessible 
from the Datanodes also. If it is accessible only from the Namenode, then a 
non-null value can be propagated to the Datanode.{code}
With the pending refactoring of the FsDatasetImpl which won't have replicas a 
priori, I wonder if it makes sense for the Datanode to have a 
FileRegionProvider or BlockProvider at all. They are given the appropriate 
block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
overlooking what's still being provided.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-15 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010483#comment-16010483
 ] 

Ewan Higgs commented on HDFS-11639:
---

{quote}
Btw, I rebased the HDFS-9806 branch on the most recent version of trunk (hash 
83dd14aa84ad697ad32c51007ac31ad39feb4288).{quote}
Thanks!
{quote}In DataTransfer#run(), the blockAlias should be null unless it is for a 
provided block. I think this will entail adding BlockAlias to transferBlock() 
and also to BlockCommand for the DatanodeProtocol.DNA_TRANSFER command. 
However, this will only be relevant for writing provided blocks (and in 
particular recovery).{quote}
If this entails a protocol change, I think it makes the most sense to do it at 
this point so all the protocol changes happen up front in one change if we need 
this to get in for 3.0. 
Does it make sense to have the BlockAlias in transferBlock? If we know the 
targetStorageTypes and targetStorageIDs then we can know that nothing needs to 
be transferred. Or is this an issue if we want to transfer from PROVIDED to 
DISK?

{quote}Looking over this patch, one thing that occurred to me is if it makes 
sense to unify FileRegionProvider with BlockProvider? They both have very close 
functionality.{quote}
I think this makes a lot of sense.
{code}I like the use of BlockProvider#resolve(). If we unify FileRegionProvider 
with BlockProvider, then resolve can return null if the block map is accessible 
from the Datanodes also. If it is accessible only from the Namenode, then a 
non-null value can be propagated to the Datanode.{code}
With the pending refactoring of the FsDatasetImpl which won't have replicas a 
priori, I wonder if it makes sense for the Datanode to have a 
FileRegionProvider or BlockProvider at all. They are given the appropriate 
block ID and block alias in the readBlock or writeBlock message. Maybe I'm 
overlooking what's still being provided.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-11 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006131#comment-16006131
 ] 

Ewan Higgs commented on HDFS-11639:
---

I forgot to mention that the current patch requires 
6e4c5539c50cdea1f4d307c4f4273dc42af5601c and 
6fcaf3097156aa83e83def29006198fb4a632163 to be cherry-picked into the branch.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-10 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Attachment: HDFS-11639-HDFS-9806.002.patch

Attaching a minor fix where a RuntimeException escaped and broke tests.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-10 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004653#comment-16004653
 ] 

Ewan Higgs edited comment on HDFS-11639 at 5/10/17 1:14 PM:


Attached a patch that encodes the {{BlockAlias}} into the read and write 
protocol. This also adds the {{BlockAlias}} to the {{FileRegion}}.

This work is not yet complete as we need to connect the {{BlockSender}} to the 
{{FsDatasetImpl}} and/or {{ProvidedVolumeImpl}}, {{ReplicaMap}}, etc.


was (Author: ehiggs):
Attached a patch that encodes the {{BlockAlias}} into the read and write 
protocol. This also adds the {{BlockAlias}} to the {{FileRegion]}.

This work is not yet complete as we need to connect the {{BlockSender}} to the 
{{FsDatasetImpl}} and/or {{ProvidedVolumeImpl}}, {{ReplicaMap}}, etc.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol

2017-05-10 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11639:
--
Attachment: HDFS-11639-HDFS-9806.001.patch

Attached a patch that encodes the {{BlockAlias}} into the read and write 
protocol. This also adds the {{BlockAlias}} to the {{FileRegion}}.

This work is not yet complete as we need to connect the {{BlockSender}} to the 
{{FsDatasetImpl}} and/or {{ProvidedVolumeImpl}}, {{ReplicaMap}}, etc.

> [READ] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-11639-HDFS-9806.001.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9807) Add an optional StorageID to writes

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997509#comment-15997509
 ] 

Ewan Higgs commented on HDFS-9807:
--

{quote}Posted a new patch (v010) that reverts the unrelated changes to 
Host2NodesMap and BlockPlacementPolicyDefault as Chris Douglas pointed out. 
Ewan Higgs, are you ok with this? The changes to those classes in the earlier 
patches don't seem needed. 
{quote}
Yes, that's fine. The reason {{Host2NodesMap}} was made public was that I had 
originally written the test you added using {{BlockPlacementPolicy}}, which 
requires {{Host2NodesMap}} in its {{initialize}} function. I like your solution 
of using {{BlockPlacementPolicyDefault}} better.

[~linyiqun], [~virajith]'s explanation is spot on. Most notably: 
{quote}
The goal of this JIRA was to only provide the plumbing needed to propagate the 
storageID to the VolumeChoosingPolicy and not to implement a new 
VolumeChoosingPolicy. The actual policies to use can be determined separately.
{quote}
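
The plumbing in question, roughly (a sketch; parameter shape is illustrative, 
and {{FsVolumeSpi}} is the DN volume abstraction, import elided):

{code}
import java.io.IOException;
import java.util.List;

// The storageId hint computed by the NN-side BlockPlacementPolicy now travels
// with the write and reaches the DN-side volume choice, so volumes are no
// longer treated as fungible within a storage type.
public interface VolumeChoosingPolicy<V extends FsVolumeSpi> {
  V chooseVolume(List<V> volumes, long replicaSize, String storageId)
      throws IOException;
}
{code}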

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch, HDFS-9807.007.patch, HDFS-9807.008.patch, 
> HDFS-9807.009.patch, HDFS-9807.010.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11703) [READ] Tests for ProvidedStorageMap

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996651#comment-15996651
 ] 

Ewan Higgs commented on HDFS-11703:
---

+1. 

> [READ] Tests for ProvidedStorageMap
> ---
>
> Key: HDFS-11703
> URL: https://issues.apache.org/jira/browse/HDFS-11703
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-11703-HDFS-9806.001.patch, 
> HDFS-11703-HDFS-9806.002.patch
>
>
> Add tests for the {{ProvidedStorageMap}} in the namenode



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-11673) [READ] Handle failures of Datanodes with PROVIDED storage

2017-05-04 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-11673:
--
Comment: was deleted

(was: I reproduced the patch problem that jenkins had:

{code}
$ patch -p1 <../patches/HDFS-11673-HDFS-9806.001.patch 
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockProvider.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ProvidedStorageMap.java
Hunk #4 succeeded at 282 (offset -3 lines).
Hunk #5 succeeded at 379 (offset -3 lines).
Hunk #6 succeeded at 407 (offset -3 lines).
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java
patching file 
hadoop-tools/hadoop-fs2img/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeProvidedImplementation.java
Hunk #2 FAILED at 35.
Hunk #3 succeeded at 350 with fuzz 2 (offset -36 lines).
1 out of 3 hunks FAILED -- saving rejects to file 
hadoop-tools/hadoop-fs2img/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeProvidedImplementation.java.rej
{code})

> [READ] Handle failures of Datanodes with PROVIDED storage
> -
>
> Key: HDFS-11673
> URL: https://issues.apache.org/jira/browse/HDFS-11673
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-11673-HDFS-9806.001.patch
>
>
> Blocks on {{PROVIDED}} storage should become unavailable if and only if all 
> Datanodes that are configured with {{PROVIDED}} storage become unavailable. 
> Even if one Datanode with {{PROVIDED}} storage is available, all blocks on 
> the {{PROVIDED}} storage should be accessible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11673) [READ] Handle failures of Datanodes with PROVIDED storage

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996596#comment-15996596
 ] 

Ewan Higgs commented on HDFS-11673:
---

I reproduced the patch problem that jenkins had:

{code}
$ patch -p1 <../patches/HDFS-11673-HDFS-9806.001.patch 
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockProvider.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ProvidedStorageMap.java
Hunk #4 succeeded at 282 (offset -3 lines).
Hunk #5 succeeded at 379 (offset -3 lines).
Hunk #6 succeeded at 407 (offset -3 lines).
patching file 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java
patching file 
hadoop-tools/hadoop-fs2img/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeProvidedImplementation.java
Hunk #2 FAILED at 35.
Hunk #3 succeeded at 350 with fuzz 2 (offset -36 lines).
1 out of 3 hunks FAILED -- saving rejects to file 
hadoop-tools/hadoop-fs2img/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeProvidedImplementation.java.rej
{code}

> [READ] Handle failures of Datanodes with PROVIDED storage
> -
>
> Key: HDFS-11673
> URL: https://issues.apache.org/jira/browse/HDFS-11673
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-11673-HDFS-9806.001.patch
>
>
> Blocks on {{PROVIDED}} storage should become unavailable if and only if all 
> Datanodes that are configured with {{PROVIDED}} storage become unavailable. 
> Even if one Datanode with {{PROVIDED}} storage is available, all blocks on 
> the {{PROVIDED}} storage should be accessible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11663) [READ] Fix NullPointerException in ProvidedBlocksBuilder

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996592#comment-15996592
 ] 

Ewan Higgs commented on HDFS-11663:
---

+1. Uncontroversial bug fix.

> [READ] Fix NullPointerException in ProvidedBlocksBuilder
> 
>
> Key: HDFS-11663
> URL: https://issues.apache.org/jira/browse/HDFS-11663
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-11663-HDFS-9806.001.patch, 
> HDFS-11663-HDFS-9806.002.patch, HDFS-11663-HDFS-9806.003.patch
>
>
> When there are no Datanodes with PROVIDED storage, 
> {{ProvidedBlocksBuilder#build}} leads to a {{NullPointerException}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11653) [READ] ProvidedReplica should return an InputStream that is bounded by its length

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996410#comment-15996410
 ] 

Ewan Higgs commented on HDFS-11653:
---

This is a straightforward fix with a unit test. +1

> [READ] ProvidedReplica should return an InputStream that is bounded by its 
> length
> -
>
> Key: HDFS-11653
> URL: https://issues.apache.org/jira/browse/HDFS-11653
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-11653-HDFS-9806.001.patch, 
> HDFS-11653-HDFS-9806.002.patch
>
>
> {{ProvidedReplica#getDataInputStream}} should return an InputStream that is 
> bounded by {{ProvidedReplica#getBlockDataLength()}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-9807) Add an optional StorageID to writes

2017-05-04 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-9807:
-
Comment: was deleted

(was: {quote}The changes to Host2NodesMap and BlockPlacementPolicyDefault are 
unrelated?{quote}

The change to {{Host2NodesMap}} were from when I tried to add a custom 
{{BlockPlacementPolicy}} based on the top level abstract class. This requires 
overloading {{initialize}} which uses {{Host2NodesMap}} as an argument: 
{code}
  protected abstract void initialize(Configuration conf, FSClusterStats stats,
                                     NetworkTopology clusterMap,
                                     Host2NodesMap host2datanodeMap);
{code}

[~virajith] chose the superior solution of merely extending 
{{BlockPlacementPolicyDefault}} which already implements {{initialize}}. FWIW, 
if {{BlockPlacementPolicy}} is public then all types used in the interface 
should probably also be public.

Re: {{BlockPlacementPolicyDefault}} changes, yes those are unrelated whitespace 
changes.)

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch, HDFS-9807.007.patch, HDFS-9807.008.patch, 
> HDFS-9807.009.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9807) Add an optional StorageID to writes

2017-05-04 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996319#comment-15996319
 ] 

Ewan Higgs commented on HDFS-9807:
--

{quote}The changes to Host2NodesMap and BlockPlacementPolicyDefault are 
unrelated?{quote}

The change to {{Host2NodesMap}} were from when I tried to add a custom 
{{BlockPlacementPolicy}} based on the top level abstract class. This requires 
overloading {{initialize}} which uses {{Host2NodesMap}} as an argument: 
{code}
  protected abstract void initialize(Configuration conf, FSClusterStats stats,
                                     NetworkTopology clusterMap,
                                     Host2NodesMap host2datanodeMap);
{code}

[~virajith] chose the superior solution of merely extending 
{{BlockPlacementPolicyDefault}} which already implements {{initialize}}. FWIW, 
if {{BlockPlacementPolicy}} is public then all types used in the interface 
should probably also be public.

Re: {{BlockPlacementPolicyDefault}} changes, yes those are unrelated whitespace 
changes.

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch, HDFS-9807.007.patch, HDFS-9807.008.patch, 
> HDFS-9807.009.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9807) Add an optional StorageID to writes

2017-05-02 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992494#comment-15992494
 ] 

Ewan Higgs commented on HDFS-9807:
--

Thanks [~virajith]! This is good stuff. 

Regarding your solution of using a {{BlockPlacementPolicy}} with a static 
variable to record the block: I had considered such an approach, but was 
concerned that it would break when tests run in parallel across multiple test 
cases. As it stands, we only have the one test, so it should be fine; but if 
someone adds another test using the same {{BlockPlacementPolicy}}, there is a 
risk of introducing flakiness into the testing infrastructure.
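
To make the concern concrete, here is a sketch of the pattern being discussed; 
the class and field names are hypothetical:
{code}
import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault;

// Test-only policy illustrating the hazard: the static field is shared by
// every test in the JVM, so two tests that install this policy and run in
// parallel can overwrite each other's recorded value.
public class RecordingPlacementPolicy extends BlockPlacementPolicyDefault {

  /** JVM-wide mutable state: fine for a single test, racy for several. */
  static volatile String lastChosenStorageId;

  // A chooseTarget override (omitted here) would call this after picking a
  // target storage; the test then reads the field back to assert on it.
  static void record(String storageId) {
    lastChosenStorageId = storageId;
  }
}
{code}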

As for your previous question about why I removed {{final}} from arguments in 
a function: checkstyle complained that I had added redundant {{final}} decls to 
function arguments, and I figured that if someone had decided to turn that 
warning on in checkstyle, then I should fix it where I was touching the code.
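
The diff in question is trivial; something like the following, where the method 
and parameter names are only illustrative:
{code}
import org.apache.hadoop.conf.Configuration;

class Before {
  // Flagged: the 'final' modifiers on the parameters are redundant under
  // the checkstyle rule in question.
  void addVolume(final Configuration conf, final String storageId) { }
}

class After {
  // Same semantics, no warning.
  void addVolume(Configuration conf, String storageId) { }
}
{code}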

For other people watching along: the findbugs report is spurious; as far as I 
can tell, the reported issues are not touched or introduced by this patch. The 
same goes for the unit test failures.

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch, HDFS-9807.007.patch, HDFS-9807.008.patch, 
> HDFS-9807.009.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9807) Add an optional StorageID to writes

2017-04-28 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-9807:
-
Attachment: HDFS-9807.007.patch

Adding a patch that fixes checkstyle issues and adds a nose-to-tail test using 
{{MiniDFSCluster}}.

Note: the test just checks that the {{storageId}} passed into the 
{{VolumeChoosingPolicy}} is part of the volume list. It doesn't verify that it 
is the same id the NN sent as part of the request. I wasn't sure of the best 
way to get the value from the {{BlockPlacementPolicy}} into the 
{{VolumeChoosingPolicy}} for comparison. If you think that check is required 
and have a good idea of how to connect the two, let me know what you think.
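
For concreteness, a sketch of the membership check the test performs; the 
delegating class is hypothetical and assumes the storage-id-aware 
{{chooseVolume}} overload that this patch introduces:
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi;
import org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy;
import org.apache.hadoop.hdfs.server.datanode.fsdataset.VolumeChoosingPolicy;

import static org.junit.Assert.assertTrue;

// Delegating policy: asserts that the storageId handed to the DN is the id
// of one of the candidate volumes, then defers the actual choice. Membership
// is all it can check; it never sees which storage the NN's
// BlockPlacementPolicy originally selected.
class StorageIdCheckingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {

  private final VolumeChoosingPolicy<V> delegate =
      new RoundRobinVolumeChoosingPolicy<>();

  @Override
  public V chooseVolume(List<V> volumes, long replicaSize, String storageId)
      throws IOException {
    boolean known = volumes.stream()
        .anyMatch(v -> storageId.equals(v.getStorageID()));
    assertTrue("storageId " + storageId + " not among candidate volumes",
        known);
    return delegate.chooseVolume(volumes, replicaSize, storageId);
  }
}
{code}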

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch, HDFS-9807.007.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9807) Add an optional StorageID to writes

2017-04-27 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-9807:
-
Attachment: HDFS-9807.006.patch

Attaching a patch rebased now that HDFS-6708 has been merged.

> Add an optional StorageID to writes
> ---
>
> Key: HDFS-9807
> URL: https://issues.apache.org/jira/browse/HDFS-9807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Chris Douglas
>Assignee: Ewan Higgs
> Attachments: HDFS-9807.001.patch, HDFS-9807.002.patch, 
> HDFS-9807.003.patch, HDFS-9807.004.patch, HDFS-9807.005.patch, 
> HDFS-9807.006.patch
>
>
> The {{BlockPlacementPolicy}} considers specific storages, but when the 
> replica is written the DN {{VolumeChoosingPolicy}} is unaware of any 
> preference or constraints from other policies affecting placement. This 
> limits heterogeneity to the declared storage types, which are treated as 
> fungible within the target DN. It should be possible to influence or 
> constrain the DN policy to select a particular storage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


