[ 
http://jira.dspace.org/jira/browse/DS-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=10417#action_10417
 ] 

Mark Diggory commented on DS-208:
---------------------------------

Stuart, you are correct to a degree...

I would be cautious about storing the full text. Unless you're planning on 
presenting fragments of it in context, as Google does, it is going to create a 
very large index. Indexing will also become very memory intensive if you are 
pulling the full text into strings; we will begin to risk out-of-memory errors 
if the indexing process is not streamed using readers (I did the rewrite to 
optimize this when I first started at MIT).
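
To illustrate the memory point only, here is a minimal sketch in plain Java (no 
Lucene involved) of consuming text through a Reader with a small fixed buffer 
instead of materializing it as one String. The class and method names are 
hypothetical, and counting whitespace-separated tokens merely stands in for 
whatever the indexer actually does with the stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration: not DSpace or Lucene code.
public class StreamingTokenCount {

    /**
     * Counts whitespace-separated tokens while holding only a 4 KB buffer
     * in memory, regardless of how large the underlying text is.
     */
    static long countTokens(Reader reader) throws IOException {
        char[] buffer = new char[4096];
        long tokens = 0;
        boolean inToken = false; // carries token state across buffer refills
        int read;
        while ((read = reader.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (Character.isWhitespace(buffer[i])) {
                    inToken = false;
                } else if (!inToken) {
                    inToken = true;
                    tokens++;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a reader over extracted bitstream text.
        try (Reader reader = new StringReader("full text extracted from a bitstream")) {
            System.out.println(countTokens(reader));
        }
    }
}
```

In a real indexer the reader would wrap the extracted-text stream, so peak 
memory stays bounded by the buffer size rather than by the document size.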

You might experiment with calling the value setter after constructing the 
Field with some default string, like:

Field field = new Field(name, "junk", Field.Store.YES,
                        Field.Index.TOKENIZED);
field.setValue(reader);  // reader: a java.io.Reader over the full text

You might then be able to get away with a stored, tokenized field whose value 
is read from a reader (but I don't know how much more efficient that may be).

http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/document/Field.html#setValue(java.io.Reader)


P.S. How are you highlighting when the presented values for the search results 
are pulled from the Item metadata directly? (That is a loaded question; I'm 
hoping your answer is that you no longer use the item metadata directly and 
instead render the Lucene record directly, with hit highlighting present?!) ;-)





> Make the fulltext indexes configurable
> --------------------------------------
>
>                 Key: DS-208
>                 URL: http://jira.dspace.org/jira/browse/DS-208
>             Project: DSpace 1.x
>          Issue Type: Improvement
>          Components: DSpace API
>    Affects Versions: 1.5.2
>            Reporter: Andrea Bollini
>            Assignee: Andrea Bollini
>         Attachments: DS-208-configure-fulltext-indexes.patch
>
>
> This patch allows the user to configure the index name where the extracted 
> text from the bitstream (fulltext) will be stored.
> More than one index is allowed, and if the configuration is missing the 
> "default" index name is used for backward compatibility.
> A documentation update is included; please take a look if possible.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
