[jira] Updated: (HADOOP-1913) [HBase] Build a Lucene index on an HBase table

Ning Li (JIRA) Tue, 18 Sep 2007 07:20:18 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ning Li updated HADOOP-1913:
----------------------------

    Attachment: build_table_index.take2.again.patch

> Pardon me Ning for being a bit thick but I do not see an example of per 
> column config. in BuildTableIndex.  I see parsing of command line and passing 
> of a list of column names to IdentityTableMap but not an example of 
> per-column config. as a property value of an hbase config.  Do you mean the 
> XML in TestTableIndex?  If so, it's not clear how you do config. for columns 
> 2, 3, etc.  Perhaps you could provide an example here in the issue.

You are right. I meant the example in TestTableIndex. Here is an example with 
multiple columns:

  <configuration>
    <column>
      <property><name>hbase.column.name</name><value>column1</value></property>
      <property><name>hbase.column.store</name><value>true</value></property>
      <property><name>hbase.column.index</name><value>true</value></property>
      <property><name>hbase.column.tokenize</name><value>false</value></property>
      <property><name>hbase.column.boost</name><value>3</value></property>
      <property><name>hbase.column.omit.norms</name><value>false</value></property>
    </column>
    <column>
      <property><name>hbase.column.name</name><value>column2</value></property>
      <property><name>hbase.column.store</name><value>false</value></property>
      <property><name>hbase.column.index</name><value>true</value></property>
      <property><name>hbase.column.tokenize</name><value>true</value></property>
    </column>
    <property><name>hbase.index.rowkey.name</name><value>KEY</value></property>
    <property><name>hbase.index.max.buffered.docs</name><value>500</value></property>
    <property><name>hbase.index.max.field.length</name><value>10000</value></property>
    <property><name>hbase.index.merge.factor</name><value>10</value></property>
    <property><name>hbase.index.use.compound.file</name><value>true</value></property>
    <property><name>hbase.index.optimize</name><value>true</value></property>
  </configuration>
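
To make the layout above concrete, here is a minimal sketch of how such a file could be read: each <column> element groups the properties of one column, and the <property> elements directly under <configuration> are index-wide settings. This is not the IndexConf class from the patch. The class name IndexConfSketch and its methods are hypothetical, and the sketch uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only; not part of the patch.
class IndexConfSketch {

    // Parses the XML text and returns its root <configuration> element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    // Collects the <property> elements that are direct children of parent
    // into a name -> value map, preserving document order.
    static Map<String, String> readProperties(Element parent) {
        Map<String, String> props = new LinkedHashMap<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && "property".equals(n.getNodeName())) {
                Element p = (Element) n;
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
        }
        return props;
    }

    // Returns column name -> that column's settings, one entry per <column>.
    static Map<String, Map<String, String>> readColumns(String xml) {
        Map<String, Map<String, String>> columns = new LinkedHashMap<>();
        NodeList nodes = parseRoot(xml).getElementsByTagName("column");
        for (int i = 0; i < nodes.getLength(); i++) {
            Map<String, String> props = readProperties((Element) nodes.item(i));
            columns.put(props.get("hbase.column.name"), props);
        }
        return columns;
    }

    // Returns the index-wide settings (properties outside any <column>).
    static Map<String, String> readIndexSettings(String xml) {
        return readProperties(parseRoot(xml));
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<column>"
            + "<property><name>hbase.column.name</name><value>column1</value></property>"
            + "<property><name>hbase.column.boost</name><value>3</value></property>"
            + "</column>"
            + "<column>"
            + "<property><name>hbase.column.name</name><value>column2</value></property>"
            + "<property><name>hbase.column.tokenize</name><value>true</value></property>"
            + "</column>"
            + "<property><name>hbase.index.merge.factor</name><value>10</value></property>"
            + "</configuration>";
        System.out.println("columns: " + readColumns(xml));
        System.out.println("index settings: " + readIndexSettings(xml));
    }
}
```

Configuring columns 2, 3, and so on is then just a matter of adding further <column> blocks, each carrying its own hbase.column.name.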

> Take2 seems to be mangled:

:( I just tried and it works for me. I rerolled it anyway and here it is.

> [HBase] Build a Lucene index on an HBase table
> ----------------------------------------------
>
>                 Key: HADOOP-1913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1913
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Ning Li
>            Priority: Minor
>         Attachments: build_table_index.patch, 
> build_table_index.take2.again.patch, build_table_index.take2.patch
>
>
> This patch provides a Reducer class and other related classes which help to 
> build a Lucene index on an HBase table. The index build part is similar to 
> that of Nutch.
>   - Each row is modeled as a Lucene document: row key is indexed in its 
> untokenized form, column name-value pairs are Lucene field name-value pairs.
>   - IndexConf is used to configure various Lucene parameters, specify whether 
> to optimize an index and which columns to index and/or store, in tokenized or 
> untokenized form, etc.
>   - The number of reduce tasks decides the number of indexes (partitions). 
> The indexes are stored in the output path of the job configuration.
>   - The index build process is done in the reduce phase. Users can use the 
> map phase to join rows from different tables or to pre-parse/analyze column 
> content, etc.
>   - A JUnit test is added to test the build of an index on an HBase table 
> with an identity mapper. It also serves as an example of how to use the new 
> classes.
>   - BuildTableIndex is provided to help build an index on an HBase table. 
> It should be moved to an examples package if HBase decides to have one.
> This patch requires the inclusion of the Lucene library.
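
As a rough illustration of the point above that the number of reduce tasks decides the number of indexes, the sketch below groups row keys into one bucket per reduce task. It assumes the job keeps Hadoop's default hash partitioning, whose formula is (hashCode & Integer.MAX_VALUE) % numReduceTasks; the description does not say which partitioner the patch uses. The class IndexPartitionSketch is hypothetical, and it uses plain String keys, so the actual bucket for a given row will differ from what Hadoop's own key types produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the patch.
class IndexPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner, applied to a String key.
    static int partitionFor(String rowKey, int numReduceTasks) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups row keys into one list per reduce task, i.e. one list per index.
    static List<List<String>> assign(List<String> rowKeys, int numReduceTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            partitions.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4");
        List<List<String>> partitions = assign(rows, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("index partition " + i + ": " + partitions.get(i));
        }
    }
}
```

With three reduce tasks there are three buckets, hence three separate indexes in the output path; every row lands in exactly one of them.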

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
