Re: Storage structure question
You should both take a look at the entity-centric index: it uses the column family and column qualifier to store triple information outside of the row key.

We have looked at alternate storage schemes, but they have been largely abandoned due to a desire not to disrupt existing users, who may have tools that take advantage of the current storage scheme and bypass the DAO.

> On Nov 25, 2016, at 2:37 AM, pranav.puri wrote:
>
> Hi,
>
> Since triples are stored entirely in the row ID in Accumulo, can I use the
> column qualifier or value to store different parameters for the triples? That
> way I could implement an iterator to filter the triples based on these
> parameters and then run SPARQL queries on the filtered triples.
Re: Storage structure question
Hi,

Since triples are stored entirely in the row ID in Accumulo, can I use the column qualifier or value to store different parameters for the triples? That way I could implement an iterator to filter the triples based on these parameters and then run SPARQL queries on the filtered triples.

On Tuesday 22 November 2016 07:36 PM, Aaron D. Mihalik wrote:
> Good questions. Putting all of the information in the Row ID seems like a
> common pattern for composite indices in Accumulo, but I went back to the
> original Rya Paper [1] to pull out the reasoning:
>
> """
> All the data for the triple resides in the Accumulo Row ID. This offers
> several benefits: 1) by using a direct string representation, we can do
> direct range scans on the literals; 2) the format is very easy to serialize
> and deserialize, which provides for faster query and ingest; 3) since no
> information needs to be stored in the Column Family, Qualifier, or Value
> fields of the Accumulo tables, the storage requirements for the triples are
> significantly reduced.
> """
>
> As for overriding the storage mechanism/pattern, you'd probably have to
> write your own TripleRowResolver [2]. There is a slide deck here [3] that
> shows the different layers of Rya, and that might be a good place to start.
>
> What sort of storage scheme are you considering?
>
> --Aaron
>
> [1] https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
> [2] https://github.com/apache/incubator-rya/blob/master/common/rya.api/src/main/java/org/apache/rya/api/resolver/triple/TripleRowResolver.java
> [3] https://cwiki.apache.org/confluence/display/RYA/Rya+Office+Hours (see
> "Running Through Rya Examples")
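To make the filtering idea in this message concrete, here is a minimal sketch in plain Java, with no Accumulo dependency. Everything in it is an assumption for illustration: the `Entry` class, the `minConfidence` predicate, and the notion of a numeric "confidence" parameter held in the column qualifier are hypothetical and are not part of Rya. In a real deployment, comparable logic would probably live in a server-side Accumulo iterator rather than in client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a table whose entries carry a triple in the row ID
// and one extra parameter in the column qualifier. Not Rya's actual layout.
final class QualifierFilterSketch {

    // One table entry: the row ID plus an illustrative qualifier value.
    static final class Entry {
        final String row;
        final String qualifier;

        Entry(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Accept entries whose qualifier parses as a number >= threshold.
    // Entries with a non-numeric qualifier are rejected.
    static Predicate<Entry> minConfidence(double threshold) {
        return e -> {
            try {
                return Double.parseDouble(e.qualifier) >= threshold;
            } catch (NumberFormatException ex) {
                return false;
            }
        };
    }

    // Return the row IDs of the entries that pass the filter, in table order.
    static List<String> scan(List<Entry> table, Predicate<Entry> accept) {
        List<String> rows = new ArrayList<>();
        for (Entry e : table) {
            if (accept.test(e)) {
                rows.add(e.row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Entry> table = new ArrayList<>();
        table.add(new Entry("alice|knows|bob", "0.9"));
        table.add(new Entry("alice|likes|tea", "0.2"));
        for (String row : scan(table, minConfidence(0.5))) {
            System.out.println(row);
        }
    }
}
```

Only the rows that survive the filter would then be handed to the query layer. Note that putting parameters in the qualifier departs from the row-ID-only scheme, so tools that read the tables directly would need to be aware of it.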
Re: Storage structure question
Good questions. Putting all of the information in the Row ID seems like a common pattern for composite indices in Accumulo, but I went back to the original Rya Paper [1] to pull out the reasoning:

"""
All the data for the triple resides in the Accumulo Row ID. This offers several benefits: 1) by using a direct string representation, we can do direct range scans on the literals; 2) the format is very easy to serialize and deserialize, which provides for faster query and ingest; 3) since no information needs to be stored in the Column Family, Qualifier, or Value fields of the Accumulo tables, the storage requirements for the triples are significantly reduced.
"""

As for overriding the storage mechanism/pattern, you'd probably have to write your own TripleRowResolver [2]. There is a slide deck here [3] that shows the different layers of Rya, and that might be a good place to start.

What sort of storage scheme are you considering?

--Aaron

[1] https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
[2] https://github.com/apache/incubator-rya/blob/master/common/rya.api/src/main/java/org/apache/rya/api/resolver/triple/TripleRowResolver.java
[3] https://cwiki.apache.org/confluence/display/RYA/Rya+Office+Hours (see "Running Through Rya Examples")

On Mon, Nov 21, 2016 at 11:28 PM Greg Clark wrote:
> As I understand, Rya stores triples entirely in the row id in Accumulo.
> Why?
>
> Are entries stored this way to avoid overloading rows?
>
> Is the storage configurable; that is, could I set a flag to allow triples
> to be stored with one part of the triple in the row id, one part in the
> column family, and the remaining one in the column qualifier? Or perhaps
> set a configuration to use a different storage approach?
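The range-scan benefit quoted from the paper can be illustrated with a small sketch in plain Java. This is not Rya's TripleRowResolver: the class and method names are hypothetical, the NUL delimiter is an assumption chosen for the example, and a `TreeSet` stands in for Accumulo's sorted row IDs. It only shows why packing subject, predicate, and object into one sorted key lets a prefix lookup be answered by a contiguous scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration of a composite row ID; not Rya's actual encoding.
final class RowIdIndexSketch {

    // Assumed delimiter for this example; it sorts before any printable character.
    static final char DELIM = '\u0000';

    // Pack a triple into a single subject-predicate-object row ID.
    static String spoRow(String subject, String predicate, String object) {
        return subject + DELIM + predicate + DELIM + object;
    }

    // Split a row ID back into its three parts.
    static String[] parse(String row) {
        return row.split(Pattern.quote(String.valueOf(DELIM)), -1);
    }

    // Collect every row that starts with the prefix. Because the set is sorted,
    // all matches are contiguous, so the loop stops at the first non-match.
    static List<String> prefixScan(TreeSet<String> table, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String row : table.tailSet(prefix, true)) {
            if (!row.startsWith(prefix)) {
                break;
            }
            hits.add(row);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        table.add(spoRow("alice", "knows", "bob"));
        table.add(spoRow("alice", "likes", "tea"));
        table.add(spoRow("bob", "knows", "alice"));

        // Find every triple whose subject is "alice".
        for (String row : prefixScan(table, "alice" + DELIM)) {
            System.out.println(String.join(" ", parse(row)));
        }
    }
}
```

In this sketch nothing is stored besides the key itself, which mirrors the paper's third point: the column family, qualifier, and value are left free, at the cost of needing a different key ordering for lookups that do not start from the subject.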
Storage structure question
As I understand, Rya stores triples entirely in the row id in Accumulo. Why?

Are entries stored this way to avoid overloading rows?

Is the storage configurable; that is, could I set a flag to allow triples to be stored with one part of the triple in the row id, one part in the column family, and the remaining one in the column qualifier? Or perhaps set a configuration to use a different storage approach?