Hi David,

Thank you for your response!
> Rya uses an Accumulo client. The Accumulo client is dependent on Hadoop,
> particularly HDFS and zookeeper.

Yes, I know that, but I wasn't sure whether the client really does anything
with Hadoop, since the SPARQL endpoint executes queries even without it. So
from your response, I guess the answer is: no. Although Rya depends on Hadoop
libraries, it doesn't actually require running MapReduce or any other part of
Hadoop except HDFS (or an alternative), which is used by Accumulo.

> BTW, I think we would all love to hear more about your use of Rya on
> Google Cloud.

I think it makes sense to move this discussion to another thread. But for
now, I can refer you to the discussion on the Accumulo mailing list [1].

[1]: https://lists.apache.org/thread.html/f9628e2b39bfefbe62984f72fa2d30df6f1908fe57895a85d11f7c2a@%3Cuser.accumulo.apache.org%3E

Regards,
Maxim

On Fri, Jun 29, 2018 at 9:52 PM David Lotts <dlo...@gmail.com> wrote:

> Rya uses the hadoop-common library to manage the configuration properties.
> This class is a descendant of org.apache.hadoop.conf.Configuration:
>
> https://github.com/apache/incubator-rya/blob/master/dao/accumulo.rya/src/main/java/org/apache/rya/accumulo/AccumuloRdfConfiguration.java
>
> When Rya calls (AccumuloRdfConfiguration.java:323)
>
>     org.apache.hadoop.conf.Configuration.getInstances(...)
>
> it checks for HADOOP_HOME and throws your error.
>
> You might be able to turn that off using the Configuration(boolean
> loadDefaults) constructor
> (http://hadoop.apache.org/docs/r3.0.1/api/org/apache/hadoop/conf/Configuration.html#Configuration-boolean-)
> as described here:
>
> http://hadoop.apache.org/docs/r3.0.1/api/org/apache/hadoop/conf/Configuration.html
>
> Quote:
>
> Unless explicitly turned off, Hadoop by default specifies two resources,
> loaded in-order from the classpath:
>
> 1. core-default.xml
>    (http://hadoop.apache.org/docs/r3.0.1/hadoop-project-dist/hadoop-common/core-default.xml):
>    Read-only defaults for hadoop.
> 2.
> core-site.xml: Site-specific configuration for a given hadoop
>    installation.
>
> BTW, I think we would all love to hear more about your use of Rya on
> Google Cloud.
>
> david.
>
> On Fri, Jun 29, 2018 at 1:00 PM David Lotts <dlo...@gmail.com> wrote:
>
> > Hi Maxim,
> >
> > Rya uses an Accumulo client. The Accumulo client is dependent on Hadoop,
> > particularly HDFS and zookeeper. The environment variable HADOOP_HOME,
> > or the corresponding property, is required to find the path to the
> > locally installed hadoop runtime files. Details can be found in the
> > Accumulo manual where it describes running client code:
> >
> > https://accumulo.apache.org/1.7/accumulo_user_manual.html#_writing_accumulo_clients
> >
> > david.
> >
> > On Wed, Jun 20, 2018 at 9:16 AM Maxim Kolchin <kolchin...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'm running Apache Accumulo on Google VMs (aka Compute Engine) and use
> >> Google Cloud Storage as a replacement for HDFS. Hadoop (its MapReduce
> >> part) is only used to run the import job to load some RDF into
> >> Accumulo, so I shut it down when the job finishes.
> >>
> >> When I try to start up Rya, I see the following exception:
> >> java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set. The
> >> full log is at [1].
> >>
> >> Does it mean that Rya requires direct access to Hadoop? I guess Rya is
> >> looking for a Hadoop cluster in Zookeeper, and when it can't find it
> >> there, it looks for HADOOP_HOME.
> >>
> >> [1]: https://gist.github.com/KMax/687293ce666754ce8eed11c369a0db05
> >>
> >> Thank you in advance!
> >> Maxim Kolchin
> >>
> >> E-mail: kolchin...@gmail.com
> >> Tel.: +7 (911) 199-55-73
> >> Homepage: http://kolchinmax.ru
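P.S. For anyone else hitting this IOException, here is a minimal, untested
sketch of the two workarounds discussed in this thread. The path
"/opt/hadoop" and the class name are placeholders of my own, and I haven't
verified either approach against Rya itself; the Configuration(false) call
is kept in a comment so the sketch compiles without hadoop-common on the
classpath.

```java
// Sketch of the two workarounds discussed above (not verified against Rya).
public class HadoopHomeWorkaround {
    public static void main(String[] args) {
        // Workaround 1: the error message itself says "HADOOP_HOME or
        // hadoop.home.dir are not set", so the system property can be set
        // programmatically instead of exporting the environment variable.
        // It must be set before any Hadoop class is loaded, since the
        // check happens in static initialization. "/opt/hadoop" is a
        // placeholder path (assumption).
        System.setProperty("hadoop.home.dir", "/opt/hadoop");

        // Workaround 2 (David's suggestion): skip loading
        // core-default.xml / core-site.xml entirely by constructing the
        // configuration with loadDefaults = false, e.g.:
        //
        //   org.apache.hadoop.conf.Configuration conf =
        //       new org.apache.hadoop.conf.Configuration(false);
        //
        // Whether Rya accepts such a configuration is untested.

        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```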