Well, the structure should fit the purpose, but I don't know what you are trying to do (e.g., a SPARQL adapter? Large-scale RDF processing and storage?).

On Mon, Apr 5, 2010 at 3:14 PM, Amandeep Khurana <[email protected]> wrote:
> Edward,
>
> I think for now we'll start with modeling how to store triples such that we
> can run real time SPARQL queries on them and then later look at the Pregel
> model and how we can leverage that for bulk processing. The Bigtable data
> model doesn't lend itself directly to storing triples such that fast querying
> is possible. Do you have any idea how Google stores linked data in
> Bigtable? We can build on it from there.
>
> -ak
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sun, Apr 4, 2010 at 10:50 PM, Edward J. Yoon <[email protected]> wrote:
>
>> Hi, I'm a proposer/sponsor of the heart project.
>>
>> I have no doubt that RDF can be stored in HBase because Google also
>> stores linked data in their Bigtable.
>>
>> However, if you want to focus on large-scale (distributed) processing,
>> I would recommend reading about Google's Pregel project (Google's graph
>> computing framework), because SPARQL is basically a graph query
>> language for RDF graph data.
>>
>> On Fri, Apr 2, 2010 at 7:09 AM, Jürgen Jakobitsch <[email protected]> wrote:
>> > hi again,
>> >
>> > i'm definitely interested.
>> >
>> > you probably heard of the heart project, but there's hardly anything
>> > going on, so i think it's well worth the effort.
>> >
>> > for your discussion days i'd recommend taking a look at openrdf sail api
>> >
>> > @http://www.openrdf.org/doc/sesame2/system/
>> >
>> > the point is that there is already everything you need like query engine
>> > and the like..
>> > to make it clear: for beginning a quad store it's close to perfect because
>> > it actually comes down to implementing the getStatements method as
>> > accurately as possible.
>> >
>> > the query engine does the same by parsing the sparql query and using the
>> > getStatements method.
>> >
>> > now this method simply has five arguments :
>> >
>> > subject, predicate, object, includeinferred and contexts, where subject,
>> > predicate, object can be null, includeinferred can be ignored for starting
>> > and contexts can also be null for a starter or an array of uris.
>> >
>> > also note that the sail api is quite commonly used (virtuoso,
>> > openrdfsesame, neo4j, bigdata, even oracle has an old version,
>> > we'll be having one implementation for talis and 4store in the coming
>> > weeks and of course my quadstore "tuqs")
>> >
>> > if you find the way to retrieve the triples (quads) from hbase i could
>> > implement a sail store in a day - et voila ...
>> >
>> > anyways it would be nice if you keep me informed .. i'd really like to
>> > contribute...
>> >
>> > wkr www.turnguard.com
>> >
>> >
>> > ----- Original Message -----
>> > From: "Amandeep Khurana" <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, April 1, 2010 11:45:00 PM
>> > Subject: Re: Using SPARQL against HBase
>> >
>> > Andrew and I just had a chat about exploring how we can leverage HBase
>> > for a scalable RDF store and we'll be looking at it in more detail over
>> > the next few days. Is anyone of you interested in helping out? We are
>> > going to be looking at what all is required to build a triple store +
>> > query engine on HBase and how HBase can be used as is or remodeled to fit
>> > the problem. Depending on what we find out, we'll decide on taking the
>> > project further and committing efforts towards it.
>> >
>> > -Amandeep
>> >
>> >
>> > Amandeep Khurana
>> > Computer Science Graduate Student
>> > University of California, Santa Cruz
>> >
>> >
>> > On Thu, Apr 1, 2010 at 1:12 PM, Jürgen Jakobitsch <[email protected]> wrote:
>> >
>> >> hi,
>> >>
>> >> this sounds very interesting to me, i'm currently fiddling
>> >> around with a suitable row and column setup for triples.
>> >>
>> >> i'm about to implement openrdf's sail api for hbase (i just did
>> >> a lucene quad store implementation which is superfast and scales
>> >> to a couple of hundreds of millions of triples
>> >> (http://turnguard.com/tuqs))
>> >> but i'm in my first days of hbase encounters, so my experience
>> >> in row column design is limited.
>> >>
>> >> from my point of view the problem is to really efficiently store,
>> >> besides the triples themselves, the contexts (named graphs) and
>> >> languages of literals.
>> >>
>> >> by the way : i just did a small tablemanager (in beta) that lets
>> >> you create htables -> from <- rdf (see
>> >> http://sourceforge.net/projects/hbasetablemgr/)
>> >>
>> >> i'd be really happy to contribute on the rdf and sparql side,
>> >> but certainly could need some help on the hbase table design side.
>> >>
>> >> wkr www.turnguard.com/turnguard
>> >>
>> >>
>> >>
>> >> ----- Original Message -----
>> >> From: "Raffi Basmajian" <[email protected]>
>> >> To: [email protected], [email protected]
>> >> Sent: Thursday, April 1, 2010 9:45:59 PM
>> >> Subject: RE: Using SPARQL against HBase
>> >>
>> >>
>> >> This is an interesting article from a few guys over at BBN/Raytheon. By
>> >> storing triples in flat files they used a custom algorithm, detailed in
>> >> the article, to iterate the WHERE clause from a SPARQL query and reduce
>> >> the map into the desired result.
>> >>
>> >> This is very similar to what I need to do; the only difference being
>> >> that our data is stored in Hbase tables, not as triples in flat files.
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Amandeep Khurana [mailto:[email protected]]
>> >> Sent: Wednesday, March 31, 2010 3:30 PM
>> >> To: [email protected]; [email protected]
>> >> Subject: Re: Using SPARQL against HBase
>> >>
>> >> Why do you need to build an in-memory graph which you would want to
>> >> read/write to? You could store the graph in HBase directly.
>> >> As pointed out, HBase might not be the best suited for SPARQL queries,
>> >> but it's not impossible to do. Using the triples, you can form a graph
>> >> that can be represented in HBase as an adjacency list. I've stored
>> >> graphs with 16-17M nodes which was data equivalent to about 600M
>> >> triples. And this was on a small cluster and could certainly scale way
>> >> more than 16M graph nodes.
>> >>
>> >> In case you are interested in working on SPARQL over HBase, we could
>> >> collaborate on it...
>> >>
>> >> -ak
>> >>
>> >>
>> >> Amandeep Khurana
>> >> Computer Science Graduate Student
>> >> University of California, Santa Cruz
>> >>
>> >>
>> >> On Wed, Mar 31, 2010 at 11:56 AM, Andrew Purtell <[email protected]> wrote:
>> >>
>> >> > Hi Raffi,
>> >> >
>> >> > To read up on fundamentals I suggest Google's BigTable paper:
>> >> > http://labs.google.com/papers/bigtable.html
>> >> >
>> >> > Detail on how HBase implements the BigTable architecture within the
>> >> > Hadoop ecosystem can be found here:
>> >> >
>> >> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
>> >> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> >> > http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>> >> >
>> >> > Hope that helps,
>> >> >
>> >> > - Andy
>> >> >
>> >> > > From: Basmajian, Raffi <[email protected]>
>> >> > > Subject: RE: Using SPARQL against HBase
>> >> > > To: [email protected], [email protected]
>> >> > > Date: Wednesday, March 31, 2010, 11:42 AM
>> >> > >
>> >> > > If Hbase can't respond to SPARQL-like queries, then what type of
>> >> > > query language can it respond to? In a traditional RDBMS database
>> >> > > one would use SQL; so what is the counterpart query language with
>> >> > > Hbase?
>> >> >
>> >> >
>> >>
>> >> ------------------------------------------------------------------------------
>> >> This e-mail transmission may contain information that is proprietary,
>> >> privileged and/or confidential and is intended exclusively for the
>> >> person(s) to whom it is addressed. Any use, copying, retention or
>> >> disclosure by any person other than the intended recipient or the
>> >> intended recipient's designees is strictly prohibited. If you are not
>> >> the intended recipient or their designee, please notify the sender
>> >> immediately by return e-mail and delete all copies. OppenheimerFunds
>> >> may, at its sole discretion, monitor, review, retain and/or disclose
>> >> the content of all email communications.
>> >> ==============================================================================
>> >>
>> >>
>> >> --
>> >> punkt. netServices
>> >> ______________________________
>> >> Jürgen Jakobitsch
>> >> Codeography
>> >>
>> >> Lerchenfelder Gürtel 43 Top 5/2
>> >> A - 1160 Wien
>> >> Tel.: 01 / 897 41 22 - 29
>> >> Fax: 01 / 897 41 22 - 22
>> >>
>> >> netServices http://www.punkt.at
>> >>
>> >
>> > --
>> > punkt. netServices
>> > ______________________________
>> > Jürgen Jakobitsch
>> > Codeography
>> >
>> > Lerchenfelder Gürtel 43 Top 5/2
>> > A - 1160 Wien
>> > Tel.: 01 / 897 41 22 - 29
>> > Fax: 01 / 897 41 22 - 22
>> >
>> > netServices http://www.punkt.at
>> >
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> [email protected]
>> http://blog.udanax.org
>

--
Best Regards, Edward J. Yoon @ NHN, corp.
[email protected]
http://blog.udanax.org
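
For reference, here is a minimal sketch of the getStatements-style lookup discussed in the quoted thread: subject, predicate and object may each be null (a wildcard), includeInferred is ignored, and contexts is null or a list of graph URIs. It is an in-memory Python toy, not the Sesame SAIL API and not HBase client code. The names `Quad`, `ToyQuadStore` and `get_statements` are hypothetical, and keying rows by subject is only an assumption loosely modeled on the adjacency-list idea mentioned above. Treating a null or empty contexts argument as "match every graph" is likewise an assumption of this sketch.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Sequence, Set


class Quad(NamedTuple):
    """One statement: a triple plus an optional named-graph (context) URI."""
    subject: str
    predicate: str
    obj: str
    context: Optional[str] = None


class ToyQuadStore:
    """In-memory stand-in for a table keyed by subject (hypothetical layout).

    Each "row" holds all statements leaving one subject node, roughly like
    an adjacency list. A real HBase design would need more index orderings
    to answer predicate-only or object-only patterns without a full scan.
    """

    def __init__(self, quads: Iterable[Quad] = ()) -> None:
        self._rows: Dict[str, Set[Quad]] = {}
        for quad in quads:
            self.add(quad)

    def add(self, quad: Quad) -> None:
        self._rows.setdefault(quad.subject, set()).add(quad)

    def get_statements(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
        include_inferred: bool = False,  # accepted but ignored, as suggested
        contexts: Optional[Sequence[str]] = None,
    ) -> Iterator[Quad]:
        # A bound subject is a single row lookup; otherwise scan every row.
        if subject is not None:
            rows = [self._rows.get(subject, set())]
        else:
            rows = list(self._rows.values())
        # Assumption: None or an empty list means "any context".
        wanted = set(contexts) if contexts else None
        for row in rows:
            for quad in row:
                if predicate is not None and quad.predicate != predicate:
                    continue
                if obj is not None and quad.obj != obj:
                    continue
                if wanted is not None and quad.context not in wanted:
                    continue
                yield quad


# Usage with made-up example data.
store = ToyQuadStore([
    Quad("ex:alice", "ex:knows", "ex:bob", "ex:graph1"),
    Quad("ex:alice", "ex:name", "Alice"),
    Quad("ex:bob", "ex:knows", "ex:alice", "ex:graph2"),
])
for quad in sorted(store.get_statements(predicate="ex:knows")):
    print(quad)
```

The subject-bound case is the only cheap one here; every other pattern falls back to a scan, which is exactly the table-design question the thread leaves open.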
