Re: Store input text after analyzers and token filters

2010-03-15 Thread JCodina

For Solr 1.4 it is basically the same, but IndexSchema
(org.apache.solr.schema.IndexSchema) needs to be updated to include the method
getFieldTypeByName(String fieldTypeName), which is already in Solr 1.5:

  /**
   * Given the name of a {@link org.apache.solr.schema.FieldType} (not to be
   * confused with {@link #getFieldType(String)}, which takes in the name of a
   * field), return the {@link org.apache.solr.schema.FieldType}.
   *
   * @param fieldTypeName The name of the {@link org.apache.solr.schema.FieldType}
   * @return The {@link org.apache.solr.schema.FieldType} or null.
   */
  public FieldType getFieldTypeByName(String fieldTypeName) {
    return fieldTypes.get(fieldTypeName);
  }

Then the AnalyzedField is a bit different, but it is basically a copy of
TextField as it is in 1.4:

http://old.nabble.com/file/p27902273/AnalyzedField.java AnalyzedField.java 
-- 
View this message in context: 
http://old.nabble.com/Store-input-text-after-analyzers-and-token-filters-tp27792550p27902273.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Store input text after analyzers and token filters

2010-03-09 Thread JCodina

Otis,
I've been thinking about it and trying to figure out the different solutions:
- Build a bridge between Solr and clustering.
- Solve it before or during indexing.

The second option is of course better for performance, but how to do it?

I think a good option may be to create a new type derived from the
FieldType class, like SortableIntField, which has the toInternal(String val)
function. The problem then is how to include the result of another field
type's analysis in the toInternal function.

So there would be a new type, usable on copy fields, that takes the analysis
of the source field and injects it. It takes as a parameter the field whose
analysis it uses.

So, how can I get the result of the analysis of a given text by a given
field using internal functions?
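
A minimal sketch of the idea described above, in plain Java so that it runs without Solr on the classpath. Everything here is a hypothetical stand-in: `Chain` models an analysis chain (raw text in, tokens out), `getChainByTypeName` models a lookup of a field type by name, and `toInternal` models the conversion hook of a custom field type. None of these are Solr or Lucene classes, and the toy chain is only an assumption about what an analyzer might do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of "store the analyzed text"; not a Solr class. */
final class AnalyzedFieldSketch {

    /** Hypothetical stand-in for an analysis chain: raw text in, tokens out. */
    interface Chain extends Function<String, List<String>> {}

    /** Stand-in for the schema's map from field type name to field type. */
    private final Map<String, Chain> chainsByTypeName = new HashMap<>();

    void register(String typeName, Chain chain) {
        chainsByTypeName.put(typeName, chain);
    }

    /** Returns the chain registered under the name, or null if there is none. */
    Chain getChainByTypeName(String typeName) {
        return chainsByTypeName.get(typeName);
    }

    /**
     * Models toInternal: analyze the raw value with the chain of the named
     * source type and return the tokens joined by single spaces, which is
     * the string that would be stored instead of the raw input.
     */
    String toInternal(String sourceTypeName, String rawValue) {
        Chain chain = getChainByTypeName(sourceTypeName);
        if (chain == null) {
            throw new IllegalArgumentException("Unknown field type: " + sourceTypeName);
        }
        return String.join(" ", chain.apply(rawValue));
    }

    /** Toy chain: split on non-letter/non-digit runs, lower-case, drop a tiny stop list. */
    static List<String> lowercaseNoStopwords(String text) {
        List<String> stopWords = Arrays.asList("the", "a", "of");
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            String lower = piece.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !stopWords.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }
}
```

With the toy chain registered under the name "text", calling toInternal("text", "The Quick, brown fox of the Woods!") yields the string "quick brown fox woods". In a real field type the chain would be the analyzer of the source field type rather than this hand-written function.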





Otis Gospodnetic wrote:
 
 Hi Joan,
 
 You could use the FieldAnalysisRequestHandler:
 http://www.search-lucene.com/?q=FieldAnalysisRequestHandler
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 



Store input text after analyzers and token filters

2010-03-05 Thread JCodina


In a stored field, the content stored is the raw input text.
But when the analyzers perform some cleaning or an interesting transformation
of the text, it could be useful to store the text as it is after the
tokenizer/filter chain.
Is there a way to do this, so that I can get back the text of the document
after it has been processed?

thanks
Joan



Re: Store input text after analyzers and token filters

2010-03-05 Thread Ahmet Arslan
 In a stored field, the content stored is the raw input text.
 But when the analyzers perform some cleaning or an interesting transformation
 of the text, it could be useful to store the text as it is after the
 tokenizer/filter chain.
 Is there a way to do this, so that I can get back the text of the document
 after it has been processed?

You can get term vectors [1] of analyzed text.

You can also see the analyzed text in solr/admin/analysis.jsp if you paste in
sample text data.

[1] http://wiki.apache.org/solr/TermVectorComponent 


  


Re: Store input text after analyzers and token filters

2010-03-05 Thread JCodina

Thanks,
It can be useful as a workaround,
but I get a vector, not a result that I can use wherever I could have used the
stored text.
I'm thinking of clustering.
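
As a rough illustration of how far the term-vector workaround can go: if the term vector includes positions, the analyzed token sequence can be rebuilt from it. The sketch below assumes the vector has already been read into a map from term to its positions; the class name and the map shape are hypothetical and not part of the TermVectorComponent API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Rebuilds analyzed text from a term -> positions map (hypothetical input shape). */
final class TermVectorToText {

    /**
     * Orders terms by position and joins them with single spaces.
     * Gaps in the positions (for example, removed stop words) are skipped.
     * If several terms share a position (for example, synonyms), only the
     * first one encountered while iterating the map is kept.
     */
    static String rebuild(Map<String, List<Integer>> positionsByTerm) {
        TreeMap<Integer, String> termByPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : positionsByTerm.entrySet()) {
            for (int position : entry.getValue()) {
                termByPosition.putIfAbsent(position, entry.getKey());
            }
        }
        return String.join(" ", termByPosition.values());
    }
}
```

This only recovers what the vector holds, so it depends on positions having been stored at index time, and it costs extra work per document compared with storing the analyzed text directly.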


Ahmet Arslan wrote:
 
 In a stored field, the content stored is the raw input text.
 But when the analyzers perform some cleaning or an interesting transformation
 of the text, it could be useful to store the text as it is after the
 tokenizer/filter chain.
 Is there a way to do this, so that I can get back the text of the document
 after it has been processed?
 
 You can get term vectors [1] of analyzed text.
 
 Also you can see analyzed text in solr/admin/analysis.jsp if you copy and
 paste sample text data.
 
 [1] http://wiki.apache.org/solr/TermVectorComponent 
 
 
   
 
 




Re: Store input text after analyzers and token filters

2010-03-05 Thread Otis Gospodnetic
Hi Joan,

You could use the FieldAnalysisRequestHandler: 
http://www.search-lucene.com/?q=FieldAnalysisRequestHandler

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message -----
 From: JCodina joan.cod...@barcelonamedia.org
 To: solr-user@lucene.apache.org
 Sent: Fri, March 5, 2010 6:01:37 AM
 Subject: Store input text after analyzers and token filters
 
 
 
 In a stored field, the content stored is the raw input text.
 But when the analyzers perform some cleaning or an interesting transformation
 of the text, it could be useful to store the text as it is after the
 tokenizer/filter chain.
 Is there a way to do this, so that I can get back the text of the document
 after it has been processed?
 
 thanks
 Joan