Re: Using SPARQL against HBase

Edward J. Yoon Mon, 05 Apr 2010 02:51:36 -0700

Well, the structure should fit the purpose, but I don't know what
you are trying to do (e.g., a SPARQL adapter? Large-scale RDF
processing and storage?).


On Mon, Apr 5, 2010 at 3:14 PM, Amandeep Khurana <[email protected]> wrote:
> Edward,
>
> I think for now we'll start with modeling how to store triples such that we
> can run real-time SPARQL queries on them, and then later look at the Pregel
> model and how we can leverage that for bulk processing. The Bigtable data
> model doesn't lend itself directly to storing triples such that fast querying
> is possible. Do you have any idea how Google stores linked data in
> Bigtable? We can build on it from there.
>
> -ak
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sun, Apr 4, 2010 at 10:50 PM, Edward J. Yoon <[email protected]> wrote:
>
>> Hi, I'm a proposer/sponsor of the heart project.
>>
>> I have no doubt that RDF can be stored in HBase, because Google also
>> stores linked data in their Bigtable.
>>
>> However, if you want to focus on large-scale (distributed) processing,
>> I would recommend reading about Google's Pregel project (Google's graph
>> computing framework), because SPARQL is basically a graph query
>> language for RDF graph data.
>>
>> On Fri, Apr 2, 2010 at 7:09 AM, Jürgen Jakobitsch <[email protected]>
>> wrote:
>> > hi again,
>> >
>> > i'm definitely interested.
>> >
>> > you have probably heard of the heart project, but there's hardly anything
>> > going on, so i think it's well worth the effort.
>> >
>> > for your discussion days i'd recommend taking a look at the openrdf sail api
>> >
>> > @http://www.openrdf.org/doc/sesame2/system/
>> >
>> > the point is that everything you need is already there, like a query engine
>> > and the like.
>> > to make it clear: for beginning a quad store it's close to perfect, because it
>> > actually comes down to implementing the getStatements method as accurately as
>> > possible.
>> >
>> > the query engine does the same: it parses the sparql query and uses the
>> > getStatements method.
>> >
>> > now this method simply has five arguments:
>> >
>> > subject, predicate, object, includeInferred and contexts, where subject,
>> > predicate and object can be null, includeInferred can be ignored to begin
>> > with, and contexts can also be null for a start, or an array of uris.
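
A minimal sketch of the five-argument lookup described above, written in Python rather than against the actual Java sail api. The names `Quad`, `get_statements` and `STORE` are made up for illustration, the store is a plain in-memory list rather than HBase, and it assumes that a null (None) argument acts as a wildcard and that a null contexts argument means "any context":

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Sequence


class Quad(NamedTuple):
    """A triple plus the context (named graph) it belongs to."""
    subject: str
    predicate: str
    obj: str
    context: Optional[str] = None


def get_statements(
    quads: Iterable[Quad],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    include_inferred: bool = False,  # accepted but ignored in this sketch
    contexts: Optional[Sequence[str]] = None,
) -> Iterator[Quad]:
    """Yield every quad that matches the pattern; None matches anything."""
    for quad in quads:
        if subject is not None and quad.subject != subject:
            continue
        if predicate is not None and quad.predicate != predicate:
            continue
        if obj is not None and quad.obj != obj:
            continue
        if contexts is not None and quad.context not in contexts:
            continue
        yield quad


# Hypothetical sample data, only here to exercise the function.
STORE = [
    Quad("ex:alice", "foaf:knows", "ex:bob", "ex:graph1"),
    Quad("ex:alice", "foaf:name", '"Alice"@en', "ex:graph1"),
    Quad("ex:bob", "foaf:knows", "ex:carol", "ex:graph2"),
]

if __name__ == "__main__":
    for quad in get_statements(STORE, predicate="foaf:knows"):
        print(quad)
```

A real store would not scan every quad; the point is only that one pattern-matching method is the whole contract the query engine relies on.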
>> >
>> > also note that the sail api is quite commonly used (virtuoso, openrdf sesame,
>> > neo4j, bigdata, even oracle has an old version, we'll be having one
>> > implementation for talis and 4store in the coming weeks, and of course my
>> > quadstore "tuqs")
>> >
>> > if you find a way to retrieve the triples (quads) from hbase, i could
>> > implement a sail store in a day - et voila ...
>> >
>> > anyway, it would be nice if you kept me informed .. i'd really like to
>> > contribute...
>> >
>> > wkr www.turnguard.com
>> >
>> >
>> > ----- Original Message -----
>> > From: "Amandeep Khurana" <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, April 1, 2010 11:45:00 PM
>> > Subject: Re: Using SPARQL against HBase
>> >
>> > Andrew and I just had a chat about exploring how we can leverage HBase for a
>> > scalable RDF store, and we'll be looking at it in more detail over the next
>> > few days. Are any of you interested in helping out? We are going to be
>> > looking at what is required to build a triple store + query engine on
>> > HBase and how HBase can be used as is or remodeled to fit the problem.
>> > Depending on what we find out, we'll decide on taking the project further
>> > and committing effort towards it.
>> >
>> > -Amandeep
>> >
>> >
>> > Amandeep Khurana
>> > Computer Science Graduate Student
>> > University of California, Santa Cruz
>> >
>> >
>> > On Thu, Apr 1, 2010 at 1:12 PM, Jürgen Jakobitsch <[email protected]> wrote:
>> >
>> >> hi,
>> >>
>> >> this sounds very interesting to me, i'm currently fiddling
>> >> around with a suitable row and column setup for triples.
>> >>
>> >> i'm about to implement openrdf's sail api for hbase (i just did
>> >> a lucene quad store implementation which is superfast and scales
>> >> to a couple of hundred million triples (http://turnguard.com/tuqs)),
>> >> but i'm in my first days of hbase encounters, so my experience
>> >> in row and column design is limited.
>> >>
>> >> from my point of view the problem is to store, really efficiently,
>> >> not only the triples themselves but also the contexts (named graphs)
>> >> and the languages of literals.
>> >>
>> >> by the way : i just did a small tablemanager (in beta) that lets
>> >> you create htables -> from <- rdf (see
>> >> http://sourceforge.net/projects/hbasetablemgr/)
>> >>
>> >> i'd be really happy to contribute on the rdf and sparql side,
>> >> but could certainly use some help on the hbase table design side.
>> >>
>> >> wkr www.turnguard.com/turnguard
>> >>
>> >>
>> >>
>> >> ----- Original Message -----
>> >> From: "Raffi Basmajian" <[email protected]>
>> >> To: [email protected], [email protected]
>> >> Sent: Thursday, April 1, 2010 9:45:59 PM
>> >> Subject: RE: Using SPARQL against HBase
>> >>
>> >>
>> >> This is an interesting article from a few guys over at BBN/Raytheon.
>> >> Storing triples in flat files, they used a custom algorithm, detailed in
>> >> the article, to iterate over the WHERE clause of a SPARQL query and reduce
>> >> the map into the desired result.
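
The article's algorithm is not reproduced in this thread, so the following is only a naive in-memory sketch of the general idea of walking a WHERE clause one triple pattern at a time and narrowing the candidate variable bindings at each step. It is not the BBN method and involves no MapReduce; the names `match_pattern` and `evaluate_where`, the `?var` convention and the sample data are all assumptions made for illustration:

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are treated as variables."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_where(patterns: Sequence[Triple], triples: Sequence[Triple]) -> List[Binding]:
    """Join the patterns left to right with a nested-loop scan over the triples."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical sample data: who knows someone who knows someone?
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "age", "42"),
]
PATTERNS = [("?x", "knows", "?y"), ("?y", "knows", "?z")]

if __name__ == "__main__":
    print(evaluate_where(PATTERNS, TRIPLES))
```

Each pass over the triples corresponds to one pattern of the WHERE clause; a distributed version would have to replace the inner scan with something that does not touch every triple.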
>> >>
>> >> This is very similar to what I need to do; the only difference being
>> >> that our data is stored in Hbase tables, not as triples in flat files.
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Amandeep Khurana [mailto:[email protected]]
>> >> Sent: Wednesday, March 31, 2010 3:30 PM
>> >> To: [email protected]; [email protected]
>> >> Subject: Re: Using SPARQL against HBase
>> >>
>> >> Why do you need to build an in-memory graph that you would then
>> >> read from and write to? You could store the graph in HBase directly. As pointed
>> >> out, HBase might not be best suited to SPARQL queries, but it's not
>> >> impossible to do. Using the triples, you can form a graph that can be
>> >> represented in HBase as an adjacency list. I've stored graphs with
>> >> 16-17M nodes, which was equivalent to about 600M triples of data. And this
>> >> was on a small cluster, and it could certainly scale well beyond 16M graph
>> >> nodes.
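
A rough sketch of what "triples as an adjacency list" could look like, using nested Python dicts as a stand-in for HBase rows and columns. The schema used in the message above is not described, so the layout here (subject as row key, predicate as column, list of objects as value) and the names `to_adjacency_rows` and `neighbors` are assumptions, and no HBase client calls are involved:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Rows = Dict[str, Dict[str, List[str]]]


def to_adjacency_rows(triples: Iterable[Triple]) -> Rows:
    """Group triples by subject: row key -> {predicate column -> [objects]}."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    # Convert to plain dicts so missing keys are not created on lookup.
    return {subject: dict(columns) for subject, columns in rows.items()}


def neighbors(rows: Rows, node: str) -> List[str]:
    """All nodes reachable from `node` over one edge, whatever the predicate."""
    columns = rows.get(node, {})
    return sorted({obj for objects in columns.values() for obj in objects})


# Hypothetical sample data.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:worksWith", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]

if __name__ == "__main__":
    rows = to_adjacency_rows(TRIPLES)
    print(neighbors(rows, "ex:alice"))
```

With this layout, everything known about one node sits in a single row, so fetching a node's outgoing edges is one row lookup; lookups by predicate or object alone would need additional tables or indexes.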
>> >>
>> >> In case you are interested in working on SPARQL over HBase, we could
>> >> collaborate on it...
>> >>
>> >> -ak
>> >>
>> >>
>> >> Amandeep Khurana
>> >> Computer Science Graduate Student
>> >> University of California, Santa Cruz
>> >>
>> >>
>> >> On Wed, Mar 31, 2010 at 11:56 AM, Andrew Purtell <[email protected]> wrote:
>> >>
>> >> > Hi Raffi,
>> >> >
>> >> > To read up on fundamentals I suggest Google's BigTable paper:
>> >> > http://labs.google.com/papers/bigtable.html
>> >> >
>> >> > Detail on how HBase implements the BigTable architecture within the
>> >> > Hadoop ecosystem can be found here:
>> >> >
>> >> >  http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
>> >> >
>> >> >  http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> >> >
>> >> >  http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>> >> >
>> >> > Hope that helps,
>> >> >
>> >> >   - Andy
>> >> >
>> >> > > From: Basmajian, Raffi <[email protected]>
>> >> > > Subject: RE: Using SPARQL against HBase
>> >> > > To: [email protected], [email protected]
>> >> > > Date: Wednesday, March 31, 2010, 11:42 AM
>> >> > >
>> >> > > If Hbase can't respond to SPARQL-like queries, then what type of
>> >> > > query language can it respond to? In a traditional RDBMS one would
>> >> > > use SQL; so what is the counterpart query language with Hbase?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> punkt. netServices
>> >> ______________________________
>> >> Jürgen Jakobitsch
>> >> Codeography
>> >>
>> >> Lerchenfelder Gürtel 43 Top 5/2
>> >> A - 1160 Wien
>> >> Tel.: 01 / 897 41 22 - 29
>> >> Fax: 01 / 897 41 22 - 22
>> >>
>> >> netServices http://www.punkt.at
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> [email protected]
>> http://blog.udanax.org
>>
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
[email protected]
http://blog.udanax.org
