[ 
https://issues.apache.org/jira/browse/HBASE-25318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated HBASE-25318:
-------------------------------------
    Description: 
Currently, IntegrationTestImportTsv generates HFiles under the working 
directory of the HDFS user executing the tool before bulk loading them into HBase.

Assuming you encrypt the HBase root directory within HDFS (using HDFS 
Transparent Encryption), you can bulkload HFiles only if they sit in the same 
encryption zone in HDFS as the HBase root directory itself.
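
The constraint above can be illustrated with a small sketch. This is not HBase or HDFS code: it models encryption zones as plain path prefixes, and the class name, method names and the /hbase zone root are assumptions made only for illustration. On a real cluster the NameNode decides this, and the zones can be listed with "hdfs crypto -listZones".

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of HBase or HDFS.
// Encryption zones are modelled as plain path prefixes.
class EncryptionZoneCheck {

    // Returns the longest zone root containing the path,
    // or empty if the path lies outside every zone.
    static Optional<String> zoneOf(String path, List<String> zoneRoots) {
        String best = null;
        for (String root : zoneRoots) {
            String prefix = root.endsWith("/") ? root : root + "/";
            boolean inside = path.equals(root) || path.startsWith(prefix);
            if (inside && (best == null || root.length() > best.length())) {
                best = root;
            }
        }
        return Optional.ofNullable(best);
    }

    // A bulk load that moves files can only succeed when the source
    // and the destination resolve to the same zone (or to no zone).
    static boolean sameZone(String a, String b, List<String> zoneRoots) {
        return zoneOf(a, zoneRoots).equals(zoneOf(b, zoneRoots));
    }

    public static void main(String[] args) {
        // Assumed zone root; the real one depends on the cluster setup.
        List<String> zones = List.of("/hbase");
        System.out.println(sameZone("/user/hbase/test-data/hfiles", "/hbase/data", zones));
        System.out.println(sameZone("/hbase/testdata/hfiles", "/hbase/data", zones));
    }
}
```

With the assumed zone root, the first call models the failing case described in this ticket and the second models HFiles generated inside the HBase encryption zone.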

When IntegrationTestImportTsv is executed against a real distributed cluster 
and the working directory of the current user (e.g. /user/hbase) is not in the 
same encryption zone as the HBase root directory (e.g. /hbase/data), the bulk 
load fails with an exception:

 
{code:java}
ERROR org.apache.hadoop.hbase.regionserver.HRegion: There was a partial failure due to IO when attempting to load d : hdfs://mycluster/user/hbase/test-data/22d8460d-04cc-e032-88ca-2cc20a7dd01c/IntegrationTestImportTsv/hfiles/d/74655e3f8da142cb94bc31b64f0475cc
org.apache.hadoop.ipc.RemoteException(java.io.IOException): /user/hbase/test-data/22d8460d-04cc-e032-88ca-2cc20a7dd01c/IntegrationTestImportTsv/hfiles/d/74655e3f8da142cb94bc31b64f0475cc can't be moved into an encryption zone.
{code}
 

This ticket makes the location where IntegrationTestImportTsv generates the 
HFiles configurable. From now on, the integration test can be run on clusters 
with HDFS Transparent Encryption enabled, for example:
{code:java}
./bin/hbase org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv -D IntegrationTestImportTsv.generatedHFileFolder=/<my hbase encryption zone path>/testdata
{code}
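
A minimal sketch of how such a property could be resolved, with a fallback to the previous default location. Only the property name IntegrationTestImportTsv.generatedHFileFolder comes from this ticket. The class, the method and the use of a plain Map in place of the Hadoop Configuration object are assumptions for illustration, and the actual patch may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual patch.
// A plain Map stands in for the Hadoop Configuration object.
class HFileFolderResolver {

    // Property name taken from the ticket description.
    static final String GENERATED_HFILE_FOLDER_KEY =
        "IntegrationTestImportTsv.generatedHFileFolder";

    // Returns the configured folder, or the default when the
    // property is missing or blank.
    static String resolve(Map<String, String> conf, String defaultDir) {
        String configured = conf.get(GENERATED_HFILE_FOLDER_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return defaultDir;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // Illustrative paths only; real values depend on the cluster.
        String defaultDir = "/user/hbase/test-data";
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(conf, defaultDir));

        conf.put(GENERATED_HFILE_FOLDER_KEY, "/hbase/testdata");
        System.out.println(resolve(conf, defaultDir));
    }
}
```

Falling back to the default keeps the behaviour unchanged for clusters without encryption zones, so the property only needs to be set where the working directory and the HBase root directory are in different zones.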

  was:
IntegrationTestImportTsv is generating HFiles under the working directory of 
the current hdfs user executing the tool, before bulkloading it into HBase.

Assuming you encrypt the HBase root directory within HDFS (using HDFS 
Transparent Encryption), you can bulkload HFiles only if they sit in the same 
encryption zone in HDFS as the HBase root directory itself.

When IntegrationTestImportTsv is executed against a real distributed cluster 
and the working directory of the current user (e.g. /user/hbase) is not in the 
same encryption zone as the HBase root directory (e.g. /hbase/data) then you 
will get an exception:

 
{code:java}
2020-11-21 01:06:28,963 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
There was a partial failure due to IO when attempting to load d : 
hdfs://mycluster/user/hbase/test-data/22d8460d-04cc-e032-88ca-2cc20a7dd01c/IntegrationTestImportTsv/hfiles/d/74655e3f8da142cb94bc31b64f0475cc
org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
/user/hbase/test-data/22d8460d-04cc-e032-88ca-2cc20a7dd01c/IntegrationTestImportTsv/hfiles/d/74655e3f8da142cb94bc31b64f0475cc
 can't be moved into an encryption zone.

{code}
We should make it configurable where the IntegrationTestImportTsv generates the 
HFiles.


> Configure where IntegrationTestImportTsv generates HFiles
> ---------------------------------------------------------
>
>                 Key: HBASE-25318
>                 URL: https://issues.apache.org/jira/browse/HBASE-25318
>             Project: HBase
>          Issue Type: Improvement
>          Components: integration tests
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
