[jira] [Commented] (JENA-624) Develop a new in-memory RDF Dataset implementation

ASF GitHub Bot (JIRA) Tue, 10 Nov 2015 06:37:31 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998667#comment-14998667
 ]


ASF GitHub Bot commented on JENA-624:
-------------------------------------

Github user ajs6f commented on a diff in the pull request:

    https://github.com/apache/jena/pull/94#discussion_r44412440
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/mem/QuadTable.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.sparql.core.mem;
    +
    +import static org.apache.jena.graph.Node.ANY;
    +
    +import java.util.stream.Stream;
    +
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.sparql.core.Quad;
    +
    +/**
    + * A simplex or multiplex table of {@link Quad}s. Implementations may wish to override {@link #listGraphNodes()} with a
    + * more efficient implementation.
    + *
    + */
    +public interface QuadTable extends TupleTable<Quad> {
    +
    +   /**
    +    * Search the table using a pattern of slots. {@link Node#ANY} or <code>null</code> will work as a wildcard.
    +    *
    +    * @param g the graph node of the pattern
    +    * @param s the subject node of the pattern
    +    * @param p the predicate node of the pattern
    +    * @param o the object node of the pattern
    +    * @return a {@link Stream} of matched quads
    +    */
    +   Stream<Quad> find(Node g, Node s, Node p, Node o);
    +
    +   /**
    +    * Discover the graphs named in the table
    +    *
    +    * @return a {@link Stream} of graph names used in this table
    +    */
    +   default Stream<Node> listGraphNodes() {
    +           return find(ANY, ANY, ANY, ANY).map(Quad::getGraph).distinct();
    +   }
    --- End diff --
    
    Cool. I agree that the "forked impl" is a bit odd, but I did it for what I 
think are real gains in concision and clarity _within_ those types. The 
important point is that `QuadTableForm` is an `enum`, and it's the six values 
_within_ `QuadTableForm` that actually implement `QuadTable`. Then `HexTable` 
binds all those forms into a structure that acts as _one_ `QuadTable` by 
selecting the most efficient form(s) for any given operation. I'll add some 
comments to explain that relationship more fully and clearly.
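
The relationship described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the pull request: every type name (`SketchQuad`, `SketchQuadTable`, `SketchForm`, `SketchHexTable`) is hypothetical, nodes are plain strings, only `null` is a wildcard, the six slot orderings are illustrative, and each form indexes only on its leading slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical stand-in for a quad: slots are held in g, s, p, o order. */
final class SketchQuad {
    final String[] slots;
    SketchQuad(String g, String s, String p, String o) { slots = new String[] {g, s, p, o}; }
}

/** Simplified stand-in for the table interface; null is the only wildcard here. */
interface SketchQuadTable {
    void add(SketchQuad q);
    Stream<SketchQuad> find(String g, String s, String p, String o);
}

/**
 * Each enum constant is itself a table, indexed by the first slot of its ordering.
 * Assumption: storage lives in the constant, so it is shared JVM-wide.
 */
enum SketchForm implements SketchQuadTable {
    GSPO(0, 1, 2, 3), GOPS(0, 3, 2, 1), SPOG(1, 2, 3, 0),
    OSGP(3, 1, 0, 2), PGSO(2, 0, 1, 3), OPSG(3, 2, 1, 0);

    private final int[] order; // positions into (g, s, p, o), most significant first
    private final Map<String, List<SketchQuad>> index = new HashMap<>();

    SketchForm(int... order) { this.order = order; }

    /** Number of leading slots of this ordering that the pattern binds. */
    int boundPrefix(String... pattern) {
        int n = 0;
        while (n < order.length && pattern[order[n]] != null) n++;
        return n;
    }

    @Override public void add(SketchQuad q) {
        index.computeIfAbsent(q.slots[order[0]], k -> new ArrayList<>()).add(q);
    }

    @Override public Stream<SketchQuad> find(String g, String s, String p, String o) {
        String[] pattern = {g, s, p, o};
        String key = pattern[order[0]];
        // Use the index when the leading slot is bound, otherwise scan everything.
        Stream<SketchQuad> candidates = key != null
                ? index.getOrDefault(key, Collections.emptyList()).stream()
                : index.values().stream().flatMap(List::stream);
        return candidates.filter(q -> {
            for (int i = 0; i < 4; i++)
                if (pattern[i] != null && !pattern[i].equals(q.slots[i])) return false;
            return true;
        });
    }
}

/** Acts as one table: writes go to every form, reads go to the best-fitting form. */
final class SketchHexTable implements SketchQuadTable {
    /** Pick the form whose ordering has the longest bound prefix for this pattern. */
    SketchForm choose(String g, String s, String p, String o) {
        return Arrays.stream(SketchForm.values())
                .max(Comparator.comparingInt((SketchForm f) -> f.boundPrefix(g, s, p, o)))
                .get();
    }

    @Override public void add(SketchQuad q) {
        for (SketchForm f : SketchForm.values()) f.add(q);
    }

    @Override public Stream<SketchQuad> find(String g, String s, String p, String o) {
        return choose(g, s, p, o).find(g, s, p, o);
    }

    public static void main(String[] args) {
        SketchHexTable table = new SketchHexTable();
        table.add(new SketchQuad("g1", "s1", "p1", "o1"));
        table.add(new SketchQuad("g2", "s1", "p2", "o2"));
        System.out.println(table.choose(null, null, "p1", null)); // PGSO
        System.out.println(table.find(null, "s1", null, null).count()); // 2
    }
}
```

The point of the shape is that callers only ever see one table, while the choice of ordering stays an internal detail that can change per operation.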


> Develop a new in-memory RDF Dataset implementation
> --------------------------------------------------
>
>                 Key: JENA-624
>                 URL: https://issues.apache.org/jira/browse/JENA-624
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: A. Soroka
>              Labels: gsoc, gsoc2015, java, linked_data, rdf
>
> The current (Jan 2014) Jena in-memory dataset uses a general purpose 
> container that works for any storage technology for graphs together with 
> in-memory graphs.  
> This project would develop a new implementation design specifically for RDF 
> datasets (triples and quads) and efficient SPARQL execution, for example, 
> using multi-core parallel operations and/or multi-version concurrent 
> datastructures to maximise true parallel operation.
> This is a system project suitable for someone interested in database 
> implementation, datastructure design and implementation, operating systems or 
> distributed systems.
> Note that TDB can operate in-memory using a simulated disk with 
> copy-in/copy-out semantics for disk-level operations.  It is for faithful 
> testing of TDB infrastructure and is not designed for performance, general 
> in-memory use or use at scale.  While lessons may be learnt from that system, 
> TDB in-memory is not the answer here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
