Hi,

I guess you've already studied the Bulk Loading Data section in the wiki [0],
but let's go through it one more time anyway.

1. Did you upload the data to your HDFS instance? I see hdfs://$RDF_DATA in
your command, but do you actually declare that environment variable?
2. You need to upload the rya.mapreduce jar to HDFS as well. You can find the
shaded jar in mapreduce/target/rya.mapreduce-<version>-shaded.jar.
3. Please check that you use the correct prefixes for the options. For
example, the option -Dcb.instance=accumulo is incorrect; the wiki says to use
-Dac.instance=accumulo instead (see the sketch after this list).
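
Putting the three points together, a corrected run could look roughly like the
sketch below. The HDFS directory /user/rya/input and the local file
/path/to/your/triples.nt are only example names, and the instance name,
credentials and ZooKeeper address are just the ones from your own mail, so
adjust everything to your setup:

# put the N-Triples data into HDFS and remember where it lives
hadoop fs -mkdir -p /user/rya/input
hadoop fs -put /path/to/your/triples.nt /user/rya/input/
export RDF_DATA=/user/rya/input/triples.nt

# copy the shaded MapReduce jar to HDFS as well (point 2)
hadoop fs -put mapreduce/target/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar /user/rya/

# run the loader with the tools package name and the ac.* option prefix;
# depending on your HDFS setup you may have to spell out the full URI,
# e.g. hdfs://<namenode>:<port>/user/rya/input/triples.nt
hadoop jar /usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar \
    org.apache.rya.accumulo.mr.tools.RdfFileInputTool \
    -Dac.zk=localhost:2181 -Dac.instance=accumulo \
    -Dac.username=root -Dac.pwd=111111 \
    -Drdf.tablePrefix=triplestore_ -Drdf.format=N-Triples \
    hdfs://$RDF_DATA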

[0]:
https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/loaddata.md#bulk-loading-data

Maxim

On Mon, Nov 26, 2018 at 6:24 AM 徐炜淇 <xuwe...@tju.edu.cn> wrote:

> Dear sir or madam,
> I really don't want to bother you because these questions may seem
> "stupid," but they have been troubling me for a month and I cannot move on
> to the next step. Please forgive me for this intrusive mail; I urgently
> need your help now.
> My question is that I cannot load data. The details are below:
>
>
> 1. Bulk load data
> The command I executed is:
> hadoop jar /usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar
> org.apache.rya.accumulo.mr.RdfFileInputTool -Dac.zk=localhost:2181
> -Dac.instance=accumulo -Dac.username=root -Dac.pwd=111111
> -Drdf.tablePrefix=triplestore_ -Drdf.format=N-Triples hdfs://$RDF_DATA
>
> It always fails with "Exception in thread "main"
> java.lang.ClassNotFoundException:
> org.apache.rya.accumulo.mr.RdfFileInputTool",
> as shown in the picture.
>
> I found the location of "RdfFileInputTool", so I added "tools." before
> "RdfFileInputTool", and the command became:
>  hadoop jar /usr/local/rya.mapreduce-4.0.0-incubating-SNAPSHOT-shaded.jar
> org.apache.rya.accumulo.mr.tools.RdfFileInputTool
> -Drdf.tablePrefix=triplestore_ -Dcb.username=root -Dcb.pwd=111111
> -Dcb.instance=accumulo -Dcb.zk=localhost:2181 -Drdf.format=N-Triples
> hdfs://$RDF_DATA
> But the error is:
> java.lang.NullPointerException: Accumulo instance name [ac.instance] not set.
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>     at org.apache.rya.accumulo.mr.AbstractAccumuloMRTool.init(AbstractAccumuloMRTool.java:133)
>     at org.apache.rya.accumulo.mr.tools.RdfFileInputTool.run(RdfFileInputTool.java:63)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.rya.accumulo.mr.tools.RdfFileInputTool.main(RdfFileInputTool.java:55)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2. Web REST endpoint
>
> The quick start on the website told me to use the Java code to load data
> through the REST endpoint. I copied that code into Eclipse, but it failed
> when I ran it there, so I exported it as a jar and put the jar on HDFS, and
> I still cannot load the data. The code is shown in the picture:
>
> 3. The accumulo-site.xml
>
>
> .....
>  <property>
>     <name>instance.volumes</name>
>     <value>hdfs://192.168.122.1:8020/accumulo</value>
>     <description>comma separated list of URIs for volumes. example:
> hdfs://localhost:9000/accumulo</description>
>   </property>
>
>   <property>
>     <name>instance.zookeeper.host</name>
>     <value>192.168.122.1:2181</value>
>     <description>comma separated list of zookeeper servers</description>
>   </property>
>
>   <property>
>     <name>instance.secret</name>
>     <value>PASS1234</value>
>     <description>A secret unique to a given instance that all servers must
> know in order to communicate with one another.
>       Change it before initialization. To
>       change it later use ./bin/accumulo
> org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new
> [newpasswd],
>       and then update this file.
>     </description>
>   </property>
> .....
>
> 4. The accumulo-env.sh
>
>
> #! /usr/bin/env bash
>
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> ###
> ### Configure these environment variables to point to your local installations.
> ###
> ### The functional tests require conditional values, so keep this style:
> ###
> ### test -z "$JAVA_HOME" && export JAVA_HOME=/usr/lib/jvm/java
> ###
> ###
> ### Note that the -Xmx -Xms settings below require substantial free memory:
> ### you may want to use smaller values, especially when running everything
> ### on a single machine.
> ### && export HADOOP_PREFIX=/path/to/hadoop
> ###
> if [[ -z $HADOOP_HOME ]] ; then
>    test -z "$HADOOP_PREFIX"      && export HADOOP_PREFIX=/opt/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4
> else
>    HADOOP_PREFIX="$HADOOP_HOME"
>    unset HADOOP_HOME
> fi
>
> ###&& export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
> # hadoop-2.0:
> test -z "$HADOOP_CONF_DIR"       && export HADOOP_CONF_DIR="/usr/local/hadoop-2.7.4/etc/hadoop"
>
> test -z "$JAVA_HOME"             && export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
> test -z "$ZOOKEEPER_HOME"        && export ZOOKEEPER_HOME=/home/v7/RyaInstall/zookeeper-3.4.10
> test -z "$ACCUMULO_LOG_DIR"      && export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs
> if [[ -f ${ACCUMULO_CONF_DIR}/accumulo.policy ]]
> then
>    POLICY="-Djava.security.manager -Djava.security.policy=${ACCUMULO_CONF_DIR}/accumulo.policy"
> fi
> test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx384m -Xms384m "
> test -z "$ACCUMULO_MASTER_OPTS"  && export ACCUMULO_MASTER_OPTS="${POLICY} -Xmx128m -Xms128m"
> test -z "$ACCUMULO_MONITOR_OPTS" && export ACCUMULO_MONITOR_OPTS="${POLICY} -Xmx64m -Xms64m"
> test -z "$ACCUMULO_GC_OPTS"      && export ACCUMULO_GC_OPTS="-Xmx64m -Xms64m"
> test -z "$ACCUMULO_SHELL_OPTS"   && export ACCUMULO_SHELL_OPTS="-Xmx128m -Xms64m"
> test -z "$ACCUMULO_GENERAL_OPTS" && export ACCUMULO_GENERAL_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -Djava.net.preferIPv4Stack=true -XX:+CMSClassUnloadingEnabled"
> test -z "$ACCUMULO_OTHER_OPTS"   && export ACCUMULO_OTHER_OPTS="-Xmx128m -Xms64m"
> test -z "${ACCUMULO_PID_DIR}"    && export ACCUMULO_PID_DIR="${ACCUMULO_HOME}/run"
> # what do when the JVM runs out of heap memory
> export ACCUMULO_KILL_CMD='kill -9 %p'
>
> ### Optionally look for hadoop and accumulo native libraries for your
> ### platform in additional directories. (Use DYLD_LIBRARY_PATH on Mac OS X.)
> ### May not be necessary for Hadoop 2.x or using an RPM that installs to
> ### the correct system library directory.
> # export LD_LIBRARY_PATH=${HADOOP_PREFIX}/lib/native/${PLATFORM}:${LD_LIBRARY_PATH}
>
> # Should the monitor bind to all network interfaces -- default: false
> export ACCUMULO_MONITOR_BIND_ALL="true"
>
> # Should process be automatically restarted
> # export ACCUMULO_WATCHER="true"
>
> # What settings should we use for the watcher, if enabled
> export UNEXPECTED_TIMESPAN="3600"
> export UNEXPECTED_RETRIES="2"
>
> export OOM_TIMESPAN="3600"
> export OOM_RETRIES="5"
>
> export ZKLOCK_TIMESPAN="600"
> export ZKLOCK_RETRIES="5"
>
> # The number of .out and .err files per process to retain
> # export ACCUMULO_NUM_OUT_FILES=5
>
> export NUM_TSERVERS=1
>
> ### Example for configuring multiple tservers per host. Note that the ACCUMULO_NUMACTL_OPTIONS
> ### environment variable is used when NUM_TSERVERS is 1 to preserve backwards compatibility.
> ### If NUM_TSERVERS is greater than 2, then the TSERVER_NUMA_OPTIONS array is used if defined.
> ### If TSERVER_NUMA_OPTIONS is declared but not the correct size, then the service will not start.
> ###
> ### export NUM_TSERVERS=2
> ### declare -a TSERVER_NUMA_OPTIONS
> ### TSERVER_NUMA_OPTIONS[1]="--cpunodebind 0"
> ### TSERVER_NUMA_OPTIONS[2]="--cpunodebind 1"
>
>
> 5. My classpath
> .....
> export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
> export HADOOP_HOME=/usr/local/hadoop-2.7.4
> export ZOOKEEPER_HOME=/home/v7/RyaInstall/zookeeper-3.4.10
> export MAVEN_HOME=/usr/share/maven
> export ACCUMULO_HOME=/home/v7/RyaInstall/accumulo-1.9.2
> export ENVIROMENT_PROPERTIES=/usr/local
> export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$MAVEN_HOME/bin:$ENVIROMENT_PROPERTIES
> .....
>
>
> 6. environment.properties
> instance.name=accumulo
> instance.zk=localhost:2181
> instance.username=root
> instance.password=111111
> rya.tableprefix=triplestore_
> rya.displayqueryplan=true