Hi, I guess you've already studied the Bulk Loading Data section in the wiki [0], but let's go through it one more time anyway.
1. Did you load the data into your HDFS instance? I see hdfs://$RDF_DATA in your command, but do you actually declare this environment variable?

2. You need to load the rya.mapreduce jar to HDFS too. You can find the shaded jar at mapreduce/target/rya.mapreduce-<version>-shaded.jar.

3. Please check that you use the correct prefixes for the options. For example, the option -Dcb.instance=accumulo must be incorrect; the wiki says to use -Dac.instance=accumulo instead. Your second stack trace complains about exactly that: the tool looks for [ac.instance], which you never set. A corrected invocation is sketched just below, and there is a short note on your REST endpoint question at the very bottom of this mail, after the quoted text.
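Putting those points together, the bulk load would look roughly like this. This is only a sketch: the HDFS directory /rdf and the file name data.nt are placeholders, and I am using the ...mr.tools.RdfFileInputTool class name that you already found in your 4.0.0-incubating-SNAPSHOT jar (the wiki still shows the older package without ".tools"):

  # put the N-Triples file into HDFS and point RDF_DATA at it (paths are examples)
  hdfs dfs -mkdir -p /rdf
  hdfs dfs -put /path/to/data.nt /rdf/data.nt
  export RDF_DATA=/rdf/data.nt

  # run the bulk loader with the ac.* prefixes instead of cb.*
  hadoop jar /usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar \
      org.apache.rya.accumulo.mr.tools.RdfFileInputTool \
      -Dac.zk=localhost:2181 -Dac.instance=accumulo \
      -Dac.username=root -Dac.pwd=111111 \
      -Drdf.tablePrefix=triplestore_ -Drdf.format=N-Triples \
      hdfs://$RDF_DATA

With RDF_DATA=/rdf/data.nt the last argument expands to hdfs:///rdf/data.nt, i.e. a path on your default HDFS filesystem; if that does not resolve, spell out the full namenode address instead.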
[0]: https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/loaddata.md#bulk-loading-data

Maxim

On Mon, Nov 26, 2018 at 6:24 AM 徐炜淇 <xuwe...@tju.edu.cn> wrote:

> Dear sir or madam,
> I really don't want to bother you because these questions seem "stupid",
> but they have troubled me for a month and I cannot do the next step, so I
> have to send you this very rude mail! Please forgive me, I urgently need
> your help now.
> My problem is that I can not load data. Here are the details:
>
> 1. Bulk load data
> When I execute this instruction:
> hadoop jar usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar org.apache.rya.accumulo.mr.RdfFileInputTool -Dac.zk=localhost:2181 -Dac.instance=accumulo -Dac.username=root -Dac.pwd=111111 -Drdf.tablePrefix=triplestore_ -Drdf.format=N-Triples hdfs://$RDF_DATA
> it always fails with "Exception in thread "main" java.lang.ClassNotFoundException: org.apache.rya.accumulo.mr.RdfFileInputTool", as shown in the picture.
>
> I found the location of "RdfFileInputTool", so I added "tools." before "RdfFileInputTool" and the instruction becomes:
> hadoop jar /usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar org.apache.rya.accumulo.mr.tools.RdfFileInputTool -Drdf.tablePrefix=triplestore_ -Dcb.username=root -Dcb.pwd=111111 -Dcb.instance=accumulo -Dcb.zk=localhost:2181 -Drdf.format=N-Triples hdfs://$RDF_DATA
> But the errors are:
> java.lang.NullPointerException: Accumulo instance name [ac.instance] not set.
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>         at org.apache.rya.accumulo.mr.AbstractAccumuloMRTool.init(AbstractAccumuloMRTool.java:133)
>         at org.apache.rya.accumulo.mr.tools.RdfFileInputTool.run(RdfFileInputTool.java:63)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.rya.accumulo.mr.tools.RdfFileInputTool.main(RdfFileInputTool.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
> 2. Web REST endpoint
> The quick start on the website told me to use Java code to load data through the REST endpoint, but my question is: I copied the code into Eclipse and put it as a jar on HDFS, but I can not load the data. The code is as in the picture; I tried to run it in Eclipse but it went wrong, so I exported it as a jar and put the jar on HDFS, but I still can't use it.
>
> 3. The accumulo-site.xml
> .....
> <property>
>     <name>instance.volumes</name>
>     <value>hdfs://192.168.122.1:8020/accumulo</value>
>     <description>comma separated list of URIs for volumes. example: hdfs://localhost:9000/accumulo</description>
> </property>
>
> <property>
>     <name>instance.zookeeper.host</name>
>     <value>192.168.122.1:2181</value>
>     <description>comma separated list of zookeeper servers</description>
> </property>
>
> <property>
>     <name>instance.secret</name>
>     <value>PASS1234</value>
>     <description>A secret unique to a given instance that all servers must know in order to communicate with one another.
>         Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.
>     </description>
> </property>
> .....
>
> 4. The accumulo-env.sh
> #! /usr/bin/env bash
>
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements. See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License. You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> ###
> ### Configure these environment variables to point to your local installations.
> ###
> ### The functional tests require conditional values, so keep this style:
> ###
> ### test -z "$JAVA_HOME" && export JAVA_HOME=/usr/lib/jvm/java
> ###
> ###
> ### Note that the -Xmx -Xms settings below require substantial free memory:
> ### you may want to use smaller values, especially when running everything
> ### on a single machine.
> ### && export HADOOP_PREFIX=/path/to/hadoop
> ###
> if [[ -z $HADOOP_HOME ]] ; then
>    test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/opt/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4
> else
>    HADOOP_PREFIX="$HADOOP_HOME"
>    unset HADOOP_HOME
> fi
>
> ### && export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
> # hadoop-2.0:
> test -z "$HADOOP_CONF_DIR" && export HADOOP_CONF_DIR="/usr/local/hadoop-2.7.4/etc/hadoop"
>
> test -z "$JAVA_HOME" && export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
> test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/home/v7/RyaInstall/zookeeper-3.4.10
> test -z "$ACCUMULO_LOG_DIR" && export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs
> if [[ -f ${ACCUMULO_CONF_DIR}/accumulo.policy ]]
> then
>    POLICY="-Djava.security.manager -Djava.security.policy=${ACCUMULO_CONF_DIR}/accumulo.policy"
> fi
> test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx384m -Xms384m "
> test -z "$ACCUMULO_MASTER_OPTS" && export ACCUMULO_MASTER_OPTS="${POLICY} -Xmx128m -Xms128m"
> test -z "$ACCUMULO_MONITOR_OPTS" && export ACCUMULO_MONITOR_OPTS="${POLICY} -Xmx64m -Xms64m"
> test -z "$ACCUMULO_GC_OPTS" && export ACCUMULO_GC_OPTS="-Xmx64m -Xms64m"
> test -z "$ACCUMULO_SHELL_OPTS" && export ACCUMULO_SHELL_OPTS="-Xmx128m -Xms64m"
> test -z "$ACCUMULO_GENERAL_OPTS" && export ACCUMULO_GENERAL_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -Djava.net.preferIPv4Stack=true -XX:+CMSClassUnloadingEnabled"
> test -z "$ACCUMULO_OTHER_OPTS" && export ACCUMULO_OTHER_OPTS="-Xmx128m -Xms64m"
> test -z "${ACCUMULO_PID_DIR}" && export ACCUMULO_PID_DIR="${ACCUMULO_HOME}/run"
> # what do when the JVM runs out of heap memory
> export ACCUMULO_KILL_CMD='kill -9 %p'
>
> ### Optionally look for hadoop and accumulo native libraries for your
> ### platform in additional directories. (Use DYLD_LIBRARY_PATH on Mac OS X.)
> ### May not be necessary for Hadoop 2.x or using an RPM that installs to
> ### the correct system library directory.
> # export LD_LIBRARY_PATH=${HADOOP_PREFIX}/lib/native/${PLATFORM}:${LD_LIBRARY_PATH}
>
> # Should the monitor bind to all network interfaces -- default: false
> export ACCUMULO_MONITOR_BIND_ALL="true"
>
> # Should process be automatically restarted
> # export ACCUMULO_WATCHER="true"
>
> # What settings should we use for the watcher, if enabled
> export UNEXPECTED_TIMESPAN="3600"
> export UNEXPECTED_RETRIES="2"
>
> export OOM_TIMESPAN="3600"
> export OOM_RETRIES="5"
>
> export ZKLOCK_TIMESPAN="600"
> export ZKLOCK_RETRIES="5"
>
> # The number of .out and .err files per process to retain
> # export ACCUMULO_NUM_OUT_FILES=5
>
> export NUM_TSERVERS=1
>
> ### Example for configuring multiple tservers per host. Note that the ACCUMULO_NUMACTL_OPTIONS
> ### environment variable is used when NUM_TSERVERS is 1 to preserve backwards compatibility.
> ### If NUM_TSERVERS is greater than 2, then the TSERVER_NUMA_OPTIONS array is used if defined.
> ### If TSERVER_NUMA_OPTIONS is declared but not the correct size, then the service will not start.
> ###
> ### export NUM_TSERVERS=2
> ### declare -a TSERVER_NUMA_OPTIONS
> ### TSERVER_NUMA_OPTIONS[1]="--cpunodebind 0"
> ### TSERVER_NUMA_OPTIONS[2]="--cpunodebind 1"
>
> 5. My classpath
> .....
> export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
> export HADOOP_HOME=/usr/local/hadoop-2.7.4
> export ZOOKEEPER_HOME=/home/v7/RyaInstall/zookeeper-3.4.10
> export MAVEN_HOME=/usr/share/maven
> export ACCUMULO_HOME=/home/v7/RyaInstall/accumulo-1.9.2
> export ENVIROMENT_PROPERTIES=/usr/local
> export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$MAVEN_HOME/bin:$ENVIROMENT_PROPERTIES
> .....
>
> 6. environment.properties
> instance.name=accumulo
> instance.zk=localhost:2181
> instance.username=root
> instance.password=111111
> rya.tableprefix=triplestore_
> rya.displayqueryplan=true
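P.S. About your second question (the web REST endpoint): the quick start code is just a small HTTP client that POSTs RDF to the web application, so copying the compiled jar into HDFS will not run it. Run it directly (from Eclipse, or with java on the command line) on a machine that can reach the Tomcat where the web.rya WAR is deployed. You can also test the endpoint without any Java at all, for example with curl. This is only a sketch and assumes the WAR is deployed as web.rya on localhost:8080; adjust host, port, path and format to your deployment:

  curl --data-binary @/path/to/data.nt \
       -H "Content-Type: text/plain" \
       "http://localhost:8080/web.rya/loadrdf?format=N-Triples"

If that loads the data, the same URL is the one the quick start's Java code needs to open.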