Re: Storage structure question

2016-11-25 Thread Puja Valiyil
You both should take a look at the entity-centric index: it uses the column
family and column qualifier to store triple information outside of the row key.
We have looked at alternate storage schemes, but they have largely been abandoned
due to a desire not to disrupt existing users (who may have tools that take
advantage of the current storage scheme and bypass the DAO).
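To make the idea concrete, here is a minimal Java sketch of an entity-centric layout. It assumes a hypothetical encoding (row = subject, column family = predicate, column qualifier = object) that is not necessarily the format of Rya's actual entity-centric index. A `TreeMap` with a key comparator stands in for a sorted Accumulo table, and `EntityCentricSketch`, `put`, and `describe` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EntityCentricSketch {
    // Hypothetical cell coordinates: row = subject, family = predicate, qualifier = object.
    static final class Cell {
        final String row, family, qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Sort like Accumulo keys: by row, then column family, then column qualifier.
    private static final Comparator<Cell> KEY_ORDER = Comparator.comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier);

    private final TreeMap<Cell, String> table = new TreeMap<>(KEY_ORDER);

    void put(String subject, String predicate, String object) {
        table.put(new Cell(subject, predicate, object), "");
    }

    // All statements about one entity share a row, so one contiguous scan returns them.
    List<String> describe(String subject) {
        List<String> out = new ArrayList<>();
        // An empty family/qualifier sorts before every real cell in this row.
        for (Map.Entry<Cell, String> e : table.tailMap(new Cell(subject, "", ""), true).entrySet()) {
            Cell c = e.getKey();
            if (!c.row.equals(subject)) {
                break; // reached the next entity's row
            }
            out.add(c.family + " " + c.qualifier);
        }
        return out;
    }

    public static void main(String[] args) {
        EntityCentricSketch t = new EntityCentricSketch();
        t.put("urn:alice", "urn:knows", "urn:bob");
        t.put("urn:alice", "urn:age", "42");
        t.put("urn:bob", "urn:knows", "urn:carol");
        System.out.println(t.describe("urn:alice"));
    }
}
```

Because every statement about one subject shares a row, describing an entity is a single contiguous scan. The flip side, if this layout were used, is that a subject with a very large number of statements produces a very wide row, and Accumulo does not split a single row across tablets.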



Re: Storage structure question

2016-11-24 Thread pranav.puri

Hi,

Since triples are stored entirely in the row ID in Accumulo, can I use the
column qualifier or value to store additional parameters for the triples?
That way I could implement an iterator that filters the triples based on
these parameters and then run SPARQL queries on the filtered triples.
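A minimal, self-contained Java sketch of the idea in this question, assuming a hypothetical numeric "confidence" parameter stored in the value while the triple itself stays in the row ID. A `TreeMap` stands in for the table and `scan` plays the role of a server-side filter. A real Accumulo version would more likely subclass `org.apache.accumulo.core.iterators.Filter` and override `accept(Key, Value)`, with the parameter serialized as bytes. Whether Rya's SPARQL evaluation would honor such an iterator is not addressed here; `ParameterFilterSketch`, `put`, and `scan` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoublePredicate;

public class ParameterFilterSketch {
    // Assumed delimiter between triple parts in the row ID (a NUL character).
    private static final char DELIM = '\u0000';

    // Row ID holds the whole triple; the value holds a hypothetical "confidence" parameter.
    private final TreeMap<String, Double> table = new TreeMap<>();

    void put(String subject, String predicate, String object, double confidence) {
        table.put(subject + DELIM + predicate + DELIM + object, confidence);
    }

    // Stand-in for a server-side filter: walk the sorted entries and keep those whose
    // value passes the test, returning the triples with spaces in place of delimiters.
    List<String> scan(DoublePredicate accept) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : table.entrySet()) {
            if (accept.test(e.getValue())) {
                out.add(e.getKey().replace(DELIM, ' '));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ParameterFilterSketch t = new ParameterFilterSketch();
        t.put("urn:alice", "urn:knows", "urn:bob", 0.9);
        t.put("urn:alice", "urn:knows", "urn:carol", 0.4);
        // Keep only triples whose confidence is at least 0.5.
        System.out.println(t.scan(c -> c >= 0.5));
    }
}
```

Keeping the parameter out of the row ID means existing range scans over the triple are unaffected; the filter only decides which of the scanned entries are returned.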



Re: Storage structure question

2016-11-22 Thread Aaron D. Mihalik
Good questions.  Putting all of the information in the Row ID seems like a
common pattern for composite indices in Accumulo, but I went back to the
original Rya Paper [1] to pull out the reasoning:

"""
All the data for the triple resides in the Accumulo Row ID. This offers
several benefits: 1) by using a direct string representation, we can do
direct range scans on the literals; 2) the format is very easy to serialize
and deserialize, which provides for faster query and ingest; 3) since no
information needs to be stored in the Column Family, Qualifier, or Value
fields of the Accumulo tables, the storage requirements for the triples are
significantly reduced.
"""
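The layout the paper describes can be sketched in a few lines of Java. This sketch assumes a NUL delimiter between the subject, predicate, and object and plain strings for each part; Rya's real encoding (type markers, byte layout, table names) may differ. A `TreeSet` stands in for the sorted table, which is why a lookup on a leading part reduces to a range (prefix) scan, and serialization is just concatenation and splitting. `RowIdTripleSketch`, `toRow`, `fromRow`, and `prefixScan` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RowIdTripleSketch {
    // Assumed delimiter between triple parts (a NUL character); Rya's real encoding may differ.
    static final String DELIM = "\u0000";

    // Serialize: the whole triple becomes a single row ID string.
    static String toRow(String subject, String predicate, String object) {
        return subject + DELIM + predicate + DELIM + object;
    }

    // Deserialize: split the row ID back into its three parts.
    static String[] fromRow(String row) {
        return row.split(DELIM, 3);
    }

    // Prefix scan over the sorted rows. The exclusive end key is the prefix with its
    // last character incremented, which sorts after every row that starts with the prefix
    // (assumes the last character is not the maximum char value).
    static List<String> prefixScan(TreeSet<String> table, String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        String end = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return new ArrayList<>(table.subSet(prefix, true, end, false));
    }

    public static void main(String[] args) {
        TreeSet<String> spo = new TreeSet<>();
        spo.add(toRow("urn:alice", "urn:knows", "urn:bob"));
        spo.add(toRow("urn:alice", "urn:age", "42"));
        spo.add(toRow("urn:bob", "urn:knows", "urn:carol"));

        // All triples with subject urn:alice form one contiguous range of rows.
        for (String row : prefixScan(spo, "urn:alice" + DELIM)) {
            System.out.println(String.join(" ", fromRow(row)));
        }
    }
}
```

Writing each triple again with the parts rotated (the Rya paper uses SPO, POS, and OSP orderings, each in its own table) turns lookups on the other bound positions into prefix scans as well.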

As for overriding the storage mechanism/pattern, you'd probably have to
write your own TripleRowResolver [2].  There's a slide deck here [3] that
shows the different layers of Rya, and that might be a good place to start.
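On the configurability question, one way to picture a pluggable storage scheme is a small strategy interface chosen by a configuration flag. Everything in this sketch is hypothetical: `TripleLayout`, `RowIdOnly`, `SplitColumns`, `forConfig`, and the "split" flag are made up for illustration and are not Rya's `TripleRowResolver` API or configuration keys. It only shows how the row-ID-only layout and the row/family/qualifier split proposed in this thread could sit behind one interface.

```java
import java.util.Arrays;

public class LayoutSketch {
    // Hypothetical cell: row ID, column family, column qualifier.
    static final class Cell {
        final String row, family, qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Hypothetical pluggable layout; Rya's real extension point (TripleRowResolver) differs.
    interface TripleLayout {
        Cell encode(String subject, String predicate, String object);
        String[] decode(Cell cell);
    }

    // Everything in the row ID, empty family and qualifier (the approach the paper describes).
    static final class RowIdOnly implements TripleLayout {
        private static final String DELIM = "\u0000";

        public Cell encode(String subject, String predicate, String object) {
            return new Cell(subject + DELIM + predicate + DELIM + object, "", "");
        }

        public String[] decode(Cell cell) {
            return cell.row.split(DELIM, 3);
        }
    }

    // Subject in the row ID, predicate in the family, object in the qualifier.
    static final class SplitColumns implements TripleLayout {
        public Cell encode(String subject, String predicate, String object) {
            return new Cell(subject, predicate, object);
        }

        public String[] decode(Cell cell) {
            return new String[] {cell.row, cell.family, cell.qualifier};
        }
    }

    // A made-up configuration flag picks the layout; anything but "split" uses the row ID only.
    static TripleLayout forConfig(String layoutFlag) {
        return "split".equals(layoutFlag) ? new SplitColumns() : new RowIdOnly();
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"rowid", "split"}) {
            TripleLayout layout = forConfig(flag);
            Cell cell = layout.encode("urn:alice", "urn:knows", "urn:bob");
            System.out.println(flag + " -> " + Arrays.toString(layout.decode(cell)));
        }
    }
}
```

Both layouts round-trip the same triple; the difference is where the parts land. The split layout puts every triple about a subject into one row, which is the same wide-row trade-off an entity-centric layout has.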

What sort of storage scheme are you considering?

--Aaron

[1] https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
[2] https://github.com/apache/incubator-rya/blob/master/common/rya.api/src/main/java/org/apache/rya/api/resolver/triple/TripleRowResolver.java
[3] https://cwiki.apache.org/confluence/display/RYA/Rya+Office+Hours (see
"Running Through Rya Examples")



Storage structure question

2016-11-21 Thread Greg Clark
As I understand, Rya stores triples entirely in the row id in Accumulo.
Why?

Are entries stored this way to avoid overloading rows?

Is the storage configurable; that is, could I set a flag to allow triples
to be stored with one part of the triple in the row id, one part in the
column family, and the remaining one in the column qualifier?  Or perhaps
set a configuration to use a different storage approach?