Re: ported lucandra: lucene index on HBase

TuX RaceR Mon, 19 Apr 2010 01:06:52 -0700

Hi Thomas,

Thanks for sharing your code for lucehbase.
The schema you used seems to be the same as the one used in Lucandra:


-------------------
* Document IDs are currently random and autogenerated.

* Term keys and document keys are encoded as follows (using a random binary delimiter)


     Term Key                     col name         value
     "index_name/field/term" => { documentId , position vector }

     Document Key
     "index_name/documentId" => { fieldName , value }
--------------------

I have two questions:

1) For a given term key, the number of columns can potentially get very large. Have you tried another schema where the document ID is put in the key, i.e.:

     Term Key                           col name   value

     "index_name/field/term/docid" => { info , position vector }

That way you get trivial paging in the case where a lot of documents contain the term.
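
To make the paging idea concrete, here is a minimal sketch in plain Python, not the HBase client API. A sorted list stands in for HBase's lexicographically ordered row keys, and the names SortedTable, term_row_key and page_doc_ids are made up for illustration. It uses "/" as a readable delimiter instead of the binary one and assumes doc IDs never contain it.

```python
import bisect

DELIM = "/"  # assumption: readable stand-in for the binary delimiter


def term_row_key(index, field, term, doc_id):
    """Build a row key of the form index_name/field/term/docid."""
    return DELIM.join([index, field, term, doc_id])


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> row value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start_key, prefix, limit):
        """Return up to `limit` rows with key >= start_key that share `prefix`."""
        out = []
        i = bisect.bisect_left(self._keys, start_key)
        while i < len(self._keys) and len(out) < limit:
            key = self._keys[i]
            if not key.startswith(prefix):
                break  # left the contiguous range of rows for this term
            out.append((key, self._rows[key]))
            i += 1
        return out


def page_doc_ids(table, index, field, term, page_size, after_key=None):
    """Return one page of doc IDs plus the key to resume after (or None)."""
    prefix = DELIM.join([index, field, term]) + DELIM
    # "\x00" appended to the last key gives the smallest key strictly after it.
    start = prefix if after_key is None else after_key + "\x00"
    rows = table.scan(start, prefix, page_size)
    doc_ids = [key[len(prefix):] for key, _ in rows]
    next_after = rows[-1][0] if len(rows) == page_size else None
    return doc_ids, next_after
```

Each page is one short scan that starts right after the last key of the previous page, so the cost of a page does not depend on how many documents contain the term.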

2) Once you have the list of doc IDs, fetching the document details (i.e. the pairs { fieldName , value }) will trigger a lot of random-access queries to HBase. (In 1, by contrast, the alternative schema "index_name/field/term/docid" lets you open a scanner, and the schema "index_name/field/term" just fetches one row.) I am wondering how you can get fast answers that way. If you have only a few fields, would it be a good idea to also store the values in the index? Only the alternative schema "index_name/field/term/docid" allows this.
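
A rough sketch of what I mean in 2), again in plain Python with a toy sorted table rather than real HBase calls. CountingTable, index_document and the two search functions are hypothetical names, "/" stands in for the binary delimiter, and the "stored" entry in each index row is my assumption about how the field values could be kept next to the position vector.

```python
import bisect


class CountingTable:
    """Toy sorted key-value table that counts random gets and scans."""

    def __init__(self):
        self._keys = []
        self._rows = {}
        self.gets = 0
        self.scans = 0

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        self.gets += 1  # one random-access lookup
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        self.scans += 1  # one sequential scan over a contiguous key range
        out = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


def index_document(table, index, doc_id, fields, stored_fields=()):
    """Write the document row and one index row per (field, term, docid)."""
    table.put(f"{index}/{doc_id}", dict(fields))
    stored = {name: fields[name] for name in stored_fields}
    for field, text in fields.items():
        positions = {}
        for pos, term in enumerate(text.lower().split()):
            positions.setdefault(term, []).append(pos)
        for term, pos_list in positions.items():
            key = f"{index}/{field}/{term}/{doc_id}"
            # Copy the stored values into every index row for this document.
            table.put(key, {"positions": pos_list, "stored": dict(stored)})


def search_with_gets(table, index, field, term):
    """One scan for the doc IDs, then one random get per matching document."""
    prefix = f"{index}/{field}/{term}/"
    hits = table.scan_prefix(prefix)
    return [table.get(f"{index}/{key[len(prefix):]}") for key, _ in hits]


def search_from_index(table, index, field, term):
    """One scan only: the stored values come straight from the index rows."""
    prefix = f"{index}/{field}/{term}/"
    return [row["stored"] for _, row in table.scan_prefix(prefix)]
```

The price is that the stored values are duplicated into every term row of a document, so this only looks attractive when the fields are few and small.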


Thanks
TuX



Thomas Koch wrote:

Hi,

Lucandra stores a lucene index on cassandra:
http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend
As the author of lucandra writes: "I’m sure something similar could be built on hbase."
So here it is:
http://github.com/thkoch2001/lucehbase
This is only a first prototype which has not been tested on anything real yet. But if you're interested, please join me to get it production ready!
I propose to keep this thread on hbase-user and java-dev only.
Would it make sense to aim for this project to become an hbase contrib? Or a lucene contrib?
Best regards,

Thomas Koch, http://www.koch.ro
