How to retrieve tokens?

2012-02-23 Thread Thiago
Hi everybody,

My name is Thiago and I'm new to Apache Solr and NoSQL databases. At the
moment I'm using Solr for document indexing. My question is: is there any
way to retrieve the tokens instead of the original data?

For example:
I have a field using the fieldtype text_general from the original
schema.xml. If I index a document with the string "All you need is love"
in this field, the tokens I get are: all, you, need, love. When I search
this index, I want to get back the tokens (all, you, need, love) instead
of the original string.

I searched the web and this forum, but I only found people saying to use
the TermVectorComponent. Is there an easier way to do it? From what I've
seen, the TermVectorComponent is harder to use and consumes more memory.

Thanks, everybody.

Thiago


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-retrieve-tokens-tp3770007p3770007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to retrieve tokens?

2012-02-23 Thread Erick Erickson
Essentially, you're talking about reconstructing the field from the
tokens, and that's pretty difficult in general, and lossy. For instance,
if you use stemming and "running" gets stemmed to "run", you
get back just "run" from the index. Is that acceptable?
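
To make the loss concrete, here is a toy sketch in Python. It is not Solr's
actual analysis chain: the stop list and the stem table are made-up
assumptions, and analyze() is a hypothetical helper that only mimics what a
lowercase + stop + stem chain does to the text.

```python
import re

# Assumption: a tiny stop list. Solr reads its real one from stopwords.txt.
STOPWORDS = {"is"}

# Assumption: a hand-written stem table standing in for a real stemmer.
STEMS = {"running": "run", "runs": "run"}


def analyze(text):
    """Lowercase, split on non-alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [STEMS.get(t, t) for t in tokens if t not in STOPWORDS]


print(analyze("All you need is love"))  # ['all', 'you', 'need', 'love']

# Two different inputs produce the same tokens, so the original text
# cannot be rebuilt from the tokens alone.
print(analyze("running shoes"))  # ['run', 'shoes']
print(analyze("run shoes"))      # ['run', 'shoes']
```

Case, punctuation, stop words and the original word forms are all gone once
the text has been analyzed, which is why the tokens are not a substitute for
the stored value.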

Otherwise, you've got to go into the low levels of Lucene to get this
info, and reassembling it is a lengthy process. I suspect you'd find
the performance unacceptable.

Why do you want to do this? This may be an XY problem.
http://people.apache.org/~hossman/#xyproblem

Best
Erick
