[jira] [Comment Edited] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode

Dave Marion (JIRA) Fri, 24 May 2013 07:14:26 -0700

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666314#comment-13666314
 ]


Dave Marion edited comment on ACCUMULO-118 at 5/24/13 2:12 PM:
---------------------------------------------------------------

Personally, I am not a fan of the hash idea. I would rather see a mapping of 
namespace prefix to NameNode (NN) in the configuration (ns1 = hdfs://host:port, 
ns2 = hdfs://host:port). I'm thinking ahead to table file load balancing across 
namespaces and to backups (see my comment from 3/Apr/12). For example, if you 
quiesced the database and performed a backup, you could later change the 
namespace mapping so that ns1 and ns2 point to the same hdfs://host:port if 
for some reason you lost the second HDFS instance (it crashed, you wanted to 
remove it, etc.). 
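
A minimal sketch of the prefix mapping idea, written in Python for brevity; it 
is not Accumulo code. The NAMESPACES dict, the resolve helper, the host names, 
and the ports are all hypothetical. It assumes file references are stored as 
<prefix>/<path> and resolved against the mapping at read time, so repointing a 
prefix does not require rewriting the stored references. It also assumes the 
files of the lost instance have already been restored onto the surviving one.

```python
# Hypothetical mapping of namespace prefix to NameNode URI, as it might
# appear in configuration. Hosts and ports are placeholders.
NAMESPACES = {
    "ns1": "hdfs://nn1.example.com:8020",
    "ns2": "hdfs://nn2.example.com:8020",
}


def resolve(file_ref, namespaces):
    """Turn a reference such as 'ns2/tables/1/F0001.rf' into a full HDFS URI."""
    prefix, sep, path = file_ref.partition("/")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown namespace prefix in {file_ref!r}")
    return f"{namespaces[prefix]}/{path}"


ref = "ns2/tables/1/F0001.rf"
print(resolve(ref, NAMESPACES))

# Assumed recovery step: the second HDFS instance is gone and its files were
# restored from backup onto the first, so ns2 is repointed at the same URI.
remapped = dict(NAMESPACES, ns2=NAMESPACES["ns1"])
print(resolve(ref, remapped))
```

The stored reference stays "ns2/..." in both cases; only the configuration 
changes, which is the property a hash of the path would not give you.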

This could also allow for an upgrade of Hadoop while Accumulo is still running. 
Think about the scenario where ns1 is on racks 1&2 and ns2 is on racks 3&4 of a 
cluster, and the files of table T are spread across ns1 and ns2. You could 
change the configuration of the table file load balancer (new feature) so that 
it puts new files on ns2. You then recompact the table, which rewrites its 
files, so all of them now land on ns2. When that is done for all tables (and 
walogs), you can shut down ns1 and upgrade it to a new version of Hadoop.
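
The drain-and-upgrade sequence can be sketched as follows, again in Python and 
again hypothetical: choose_namespace stands in for the proposed table file load 
balancer, compact models a full compaction as replacing a table's files with a 
single new file, and walogs are left out. None of these names exist in 
Accumulo.

```python
def namespace_of(file_ref):
    """Return the namespace prefix of a reference like 'ns1/tables/T/F1.rf'."""
    return file_ref.partition("/")[0]


def choose_namespace(candidates, draining):
    """Stand-in for the load balancer: skip namespaces that are being drained."""
    allowed = [ns for ns in candidates if ns not in draining]
    if not allowed:
        raise ValueError("no namespace available for new files")
    return allowed[0]


def compact(table_id, candidates, draining):
    """Model a full compaction: the table ends up with one new file,
    placed in whichever namespace the balancer picks."""
    target = choose_namespace(candidates, draining)
    return [f"{target}/tables/{table_id}/compacted.rf"]


def can_shut_down(ns, tables):
    """True once no table file lives in the given namespace."""
    return all(namespace_of(ref) != ns
               for refs in tables.values() for ref in refs)


candidates = ["ns1", "ns2"]
draining = {"ns1"}  # configuration change: no new files on ns1
tables = {"T": ["ns1/tables/T/F0001.rf", "ns2/tables/T/F0002.rf"]}

before = can_shut_down("ns1", tables)
tables = {tid: compact(tid, candidates, draining) for tid in tables}
after = can_shut_down("ns1", tables)
print(before, after)
```

A real check would also have to cover walogs and any files still referenced by 
in-flight operations before ns1 is taken down.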
                
> accumulo could work across HDFS instances, which would help it to scale past 
> a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to 
> access the files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to 
> break up the namespace.
> We may need a pluggable strategy to determine namespace for new files.
