Github user keith-turner commented on a diff in the pull request: https://github.com/apache/accumulo/pull/235#discussion_r107684054 --- Diff: INSTALL.md --- @@ -36,62 +36,49 @@ to manage Accumulo: These scripts will be used in the remaining instructions to configure and run Accumulo. For convenience, consider adding `accumulo-X.Y.Z/bin/` to your shell's path. -## Configuring +## Configuring Accumulo -Accumulo has some optional native code that improves its performance and -stability. Before configuring Accumulo, attempt to build this native code -with the following command. +Accumulo requires running [Zookeeper][3] and [HDFS][4] instances which should be set up +before configuring Accumulo. - accumulo-util build-native +The primary configuration files for Accumulo are `accumulo-env.sh` and `accumulo-site.xml` +which are located in the `conf/` directory. -If the command fails, its OK to continue with setup and resolve the issue later. +Follow the steps below to configure `accumulo-site.xml`: -Accumulo is configured by the files `accumulo-site.xml` and `accumulo-env.sh` in the `conf/` -directory. You can either edit these files for your environment or run the command below which will -overwrite them with files configured for your environment. +1. Run `accumulo-util build-native` to build native code. If this command fails, disable + native maps by setting `tserver.memory.maps.native.enabled` to `false`. - accumulo-util create-config +2. Set `instance.volumes` to HDFS location where Accumulo will store data. If your namenode + is running at 192.168.1.9:9000 and you want to store data in `/accumulo` in HDFS, then set + `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`. -The script will ask you questions about your set up. Below are some suggestions: +3. Set `instance.zookeeper.host` to the location of your Zookeepers -* When the script asks about memory-map type, choose Native if the build native script - was successful. Otherwise, choose Java. -* The script will prompt for memory usage. Please note that the footprints are - only for the Accumulo system processes, so ample space should be left for other - processes like Hadoop, Zookeeper, and the Accumulo client code. If Accumulo - worker processes are swapped out and unresponsive, they may be killed. +4. (Optional) Change `instance.secret` (which is used by Accumulo processes to communicate) + from the default. This value should match on all servers. -While `accumulo-util create-config` creates `accumulo-env.sh` and `accumulo-site.xml` files -targeted for your environment, these files still require a few more edits before starting Accumulo. +Follow the steps below to configure `accumulo-env.sh`: -### Secret +1. Set `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to help Accumulo locate Hadoop and Zookeeper jars + and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop + or Zookeeper, see the `Vendor-specific configuration` documentation in the Administration section + of the Accumulo user manual as you may need to change how your `CLASSPATH` is built. If Accumulo + has problems later on finding jars, run `accumulo classpath -d` to print Accumulo's classpath. -Accumulo coordination and worker processes can only communicate with each other -if they share the same secret key. To change the secret key set -`instance.secret` in `accumulo-site.xml`. Changing this secret key from -the default is highly recommended. +2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB is allocated to + JVM and 256MB is allocated for native maps). Native maps are allocated memory equal to 33% of + the tserver JVM heap. The table below can be used if you would like to change tsever memory + usage in the `JAVA_OPTS` section of `accumulo-env.sh`: -### Dependencies + | Native? | 512MB | 1GB | 2GB | 3GB | + |---------|-------------------|-------------------|---------------------|---------------| + | Yes | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g | + | No | -Xmx512m -Xms512m | -Xmx1g -Xms1g | -Xmx2g -Xms2g | -Xmx3g -Xms3g | -Accumulo requires running [Zookeeper][3] and [HDFS][4] instances. Also, the -Accumulo binary distribution does not include jars for Zookeeper and Hadoop. -When configuring Accumulo the following information about these dependencies -must be provided. - - * **Location of Zookeepers** : Provide this by setting `instance.zookeeper.host` - in `accumulo-site.xml`. - * **Where to store data** : Provide this by setting `instance.volumes` in - `accumulo-site.xml`. If your namenode is running at 192.168.1.9:9000 - and you want to store data in `/accumulo` in HDFS, then set - `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`. - * **Location of Zookeeper and Hadoop jars** : Setting `ZOOKEEPER_HOME` and - `HADOOP_PREFIX` in `accumulo-env.sh` will help Accumulo find these jars - when using the default setting for `general.classpaths` in accumulo-site.xml. - -If Accumulo has problems later on finding jars, then run `bin/accumulo -classpath` to print out info about where Accumulo is finding jars. If the -settings mentioned above are correct, then inspect `general.classpaths` in -`accumulo-site.xml`. +3. (Optional) The Accumulo master is configured by default to use 512MB while the garbage collector --- End diff -- Can you word this differently so that the exact memory amounts are not specified here? If those defaults are changed in the env file, then the person making the change would need to know to make the changes here. I think its better to not duplicate the info here.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---