Hi Rya dev community,

We are very interested in Apache Rya and would like to contribute to the
project, starting with some proposals for development.

But first we would like to introduce ourselves. Semantic Web Company is the
leading provider of graph-based metadata, search and analytic solutions.
The company is the vendor of PoolParty Semantic Suite (
https://www.poolparty.biz/), one of the most renowned semantic software
platforms on the global market. PoolParty supports enterprise needs in
information management, metadata management, cognitive computing, data
analytics and content excellence.

PoolParty consists of different components that all integrate triple stores
for data storage. We use the rdf4j API (currently 2.2.4) and integrate with
stores from different vendors. Regarding data management, the different
components have different requirements. Some components do a lot of rather
small reads and writes, while others bulk store large data sets or use
complex SPARQL queries for searching.

The Rya store definitely looks promising for us. We would like to integrate
it into our components and also contribute in the process. To do so we came
up with some issues that we would have to solve. These might also be
interesting for you so we want to share our list here for discussion. We
already did some tests regarding these issues.


Dependency versions:
--------------------

rdf4j: we are currently using 2.2.4. We think the switch from Sesame to
rdf4j is very important for integrators.

mongodb: 3.6.3 seems to work with the current implementation.

Accumulo: 1.8.1

   - Note that the Accumulo upgrade also required two code changes that
     should be checked by an Accumulo expert: one in this file [1], the
     other in this file [2]. This is just for information now; we would
     of course submit pull requests or patches.
   - Also note that the Accumulo upgrade requires a libthrift upgrade
     (when running Rya with the current libthrift dependency inside an
     Alpine Docker image, we ran into a segmentation fault).

Hadoop: 2.9.0


Integration:
------------

Programmatically:

To integrate with most of our components, we will use the rdf4j library. As
already noted above we are currently using version 2.2.4.

For some of our components atomic actions are mandatory. The current
version of MongoDB does not seem to support this (only on document level).
However, the upcoming version 4 will provide ACID transactions, which we
think could be beneficial to all who need some data consistency guarantees
for their use cases. Accumulo seems to have some sort of transactional
behaviour, but we do not know how this works in combination with rdf4j.
Maybe someone could answer this?

SPARQL:

We noted some issues regarding standards compliance of the SPARQL endpoint:

- default URL path (like ...my.domain.org/sparql)
- return of official mimetypes (e.g. application/sparql-results+xml instead
of text/xml)
- content negotiation via HTTP headers
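
To make the content-negotiation point a bit more concrete, here is a minimal
sketch (plain Python, standard library only) of how an endpoint could choose
a result format from the Accept header. It is only an illustration under our
own assumptions, not how Rya's web endpoint is implemented: the function
names, the list of supported types and its order are hypothetical, and the
matching is a simplified reading of the HTTP rules (media-type parameters
other than q are ignored).

```python
# Sketch only: server-side selection of a SPARQL result format from the
# HTTP Accept header. All names here are hypothetical and are not taken
# from the Rya code base.

# Official SPARQL 1.1 result media types; the order is an assumed server
# preference used to break ties.
SUPPORTED = [
    "application/sparql-results+xml",
    "application/sparql-results+json",
    "text/csv",
    "text/tab-separated-values",
]


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_range, q))
    return entries


def matches(media_range, media_type):
    """True if a range such as */* or text/* covers media_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def specificity(media_range):
    """Rank ranges: exact type (2) beats type/* (1) beats */* (0)."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def negotiate(accept_header, supported=SUPPORTED):
    """Pick the supported type with the highest q, or None if none fits."""
    if not accept_header or not accept_header.strip():
        return supported[0]  # no preference stated: use the server default
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for candidate in supported:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, candidate)]
        if not hits:
            continue
        q = max(hits)[1]  # q of the most specific matching range
        if q > best_q:  # strict comparison keeps the server order on ties
            best, best_q = candidate, q
    return best
```

With this sketch, a client sending "Accept: application/sparql-results+json"
would get JSON, while a request that only accepts text/html yields None,
which a server would typically answer with 406 Not Acceptable.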


Deployment and Testing:
-----------------------

We can deploy and test all components within our default system
environment, where our system operations team will create the required
nodes and all components will be installed as defined in Rya's installation
guide either on one single node or on multiple nodes. The nodes would be
dedicated VMs. So we can help not only with functional tests, but also
with performance and scalability tests in different setups.

We are also deploying Rya within an orchestrated cluster based on
DC/OS (https://dcos.io/), which means:

- Docker images with the components are managed by Mesosphere Marathon (
https://mesosphere.com/blog/marathon-production-ready-containers/) deployed
as an Apache Mesos framework (http://mesos.apache.org/).
- getting the services involved (Accumulo/MongoDB + Rya) certified as a
DC/OS service (https://universe.dcos.io/#/packages) would surely help to
spread the word about Rya.


We hope this sounds interesting to you and we would like to get your
feedback. If you have any questions, please ask.

Kind regards / Beste Grüße,

Robert David



*Robert David*
CTO
Semantic Web Company GmbH

EU: +43-14021235
US: (415) 800-3776
https://www.poolparty.biz
https://www.semantic-web.com
