[jira] Updated: (SOLR-651) A SearchComponent for fetching TF-IDF values

2008-09-24 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-651:
-

Attachment: SOLR-651.patch

Here's a start at making this support distributed search.  It still needs testing.  I'm 
not sure I'm doing the distributed part right, but there isn't much documentation 
on it yet, so I'm going by what I see in the other components.  I'm especially 
unsure whether I understand the stages correctly.

Also, it would be handy if there were a better way of testing the distributed 
code.  So far, I call the component's distributedProcess method directly, but it 
would also be nice to have a harness that does what TestDistributedSearch does 
(i.e. set up a couple of Jetty instances and actually run them).
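One thing any distributed version has to settle is where the IDF comes from: each shard only knows its own document counts, so an IDF computed per shard is skewed. Below is a minimal sketch of one way to handle it, assuming each shard can report its document count and per-term document frequencies in an earlier stage so the coordinating node can sum them before computing IDF. ShardStats and GlobalIdf are hypothetical names (not Solr classes), log(N / df) is an illustrative IDF choice, and this is not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-shard statistics a shard would report back to the coordinating node. */
class ShardStats {
    final int numDocs;                   // number of documents in this shard
    final Map<String, Integer> docFreqs; // term -> number of docs in this shard containing it

    ShardStats(int numDocs, Map<String, Integer> docFreqs) {
        this.numDocs = numDocs;
        this.docFreqs = docFreqs;
    }
}

/** Hypothetical helper: merges shard statistics into collection-wide IDF values. */
class GlobalIdf {
    /**
     * Sums document counts and document frequencies across all shards, then computes
     * idf = log(totalDocs / totalDf) for each term over the whole collection.
     */
    static Map<String, Double> merge(List<ShardStats> shards) {
        int totalDocs = 0;
        Map<String, Integer> totalDf = new HashMap<>();
        for (ShardStats shard : shards) {
            totalDocs += shard.numDocs;
            for (Map.Entry<String, Integer> e : shard.docFreqs.entrySet()) {
                totalDf.merge(e.getKey(), e.getValue(), Integer::sum); // add this shard's df
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : totalDf.entrySet()) {
            idf.put(e.getKey(), Math.log((double) totalDocs / e.getValue()));
        }
        return idf;
    }
}
```

For example, a term that appears in every document of one 50-doc shard but in none of another 50-doc shard gets log(100 / 50) = log(2) from the merged totals, where the first shard alone would have given it log(1) = 0. The cost is an extra round of communication to gather the statistics before the TF-IDF values can be finalized.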
  

 A SearchComponent for fetching TF-IDF values
 

 Key: SOLR-651
 URL: https://issues.apache.org/jira/browse/SOLR-651
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-651.patch, SOLR-651.patch, SOLR-651.patch


 A SearchComponent that can return the TF-IDF vector for any given document in the 
 Solr index.
 Query: a document number, or a query identifying a document
 Response: a map of term vs. TF-IDF value for every term in the selected
 document
 Why?
 Most machine learning algorithms work on a TF-IDF representation of
 documents, so adding a request handler providing the TF-IDF representation
 will pave the way for incorporating learning paradigms into the Solr framework.
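To make "a map of term vs. TF-IDF value" concrete, here is a minimal sketch for a single document. It assumes the document's raw term frequencies (as a term vector would provide) and each term's document frequency are already available, and it uses the common tf * log(N / df) weighting; that weighting and the TfIdfVector name are assumptions for illustration, not the formula or API of the attached patch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that turns one document's term counts into a TF-IDF vector. */
class TfIdfVector {
    /**
     * @param termFreqs raw frequency of each term in the document (e.g. from its term vector)
     * @param docFreqs  number of documents in the index containing each term
     * @param numDocs   total number of documents in the index
     * @return term -> tf * log(numDocs / df), sorted by term
     */
    static Map<String, Double> tfIdf(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        Map<String, Double> vector = new TreeMap<>(); // sorted by term for readable output
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            String term = e.getKey();
            int tf = e.getValue();
            // A term present in this document has df >= 1; guard against missing stats anyway.
            int df = Math.max(1, docFreqs.getOrDefault(term, 1));
            double idf = Math.log((double) numDocs / df);
            vector.put(term, tf * idf);
        }
        return vector;
    }
}
```

With 100 documents in the index, a term such as "the" that occurs in all of them gets idf = log(1) = 0, so it drops out of the vector no matter how often it appears, which is exactly the property the learning algorithms mentioned above rely on.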

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-651) A SearchComponent for fetching TF-IDF values

2008-09-10 Thread Grant Ingersoll (JIRA)


Grant Ingersoll updated SOLR-651:
-

Attachment: SOLR-651.patch

Addresses Noble's thoughts.

 A SearchComponent for fetching TF-IDF values
 

 Key: SOLR-651
 URL: https://issues.apache.org/jira/browse/SOLR-651
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-651.patch, SOLR-651.patch





[jira] Updated: (SOLR-651) A SearchComponent for fetching TF-IDF values

2008-09-04 Thread Grant Ingersoll (JIRA)


Grant Ingersoll updated SOLR-651:
-

Attachment: SOLR-651.patch

Here's a first crack at this.  It still needs more unit tests to exercise the 
various combinations of options, but I think it is a reasonable first pass at 
the idea.

Questions to be answered and things still to do:
1. How do people like the output format?  It's broken down by doc, then field, 
then term, then term information.  See the unit tests for some samples.
2. It would be good to have a more efficient lookup for IDF.  At a minimum, a 
cache of IDF values would be useful, but its memory use would need to be 
bounded.  Lucene may do some caching under the hood, so that should be 
investigated further.
3. It relies on the query component doing its thing.  That is, you send in a 
query, start and rows, and this component just loops over the doc list and 
fetches the term information for each document.  I could see a case for doing 
things separately, but that seems like duplication.  People using this can just 
send explicit queries designed for this component.
4. It probably needs some error handling for documents that don't have term 
vectors, but I haven't tested that yet.
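Items 1 and 2 can be sketched together: per-term information is nested doc, then field, then term, then term info, and IDF values live in an LRU cache with a fixed maximum size, which is one way to keep the cache's memory bounded. This is a minimal sketch under stated assumptions: docFreqLookup is a hypothetical stand-in for whatever index lookup supplies document frequencies, log(N / df) is an illustrative IDF, and TermInfoSketch and IdfCache are hypothetical names, not classes from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical builder for doc -> field -> term -> info output, with a bounded IDF cache. */
class TermInfoSketch {

    /** LRU map capped at maxEntries so the cache's memory stays bounded. */
    static final class IdfCache extends LinkedHashMap<String, Double> {
        private final int maxEntries;

        IdfCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true: iteration goes least recently used first
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
            return size() > maxEntries; // evict the least recently used entry once over the cap
        }
    }

    private final IdfCache cache;
    private final ToIntFunction<String> docFreqLookup; // hypothetical: "field:term" -> docFreq
    private final int numDocs;

    TermInfoSketch(int cacheSize, ToIntFunction<String> docFreqLookup, int numDocs) {
        this.cache = new IdfCache(cacheSize);
        this.docFreqLookup = docFreqLookup;
        this.numDocs = numDocs;
    }

    /** Returns the cached IDF for field:term, computing and caching it on a miss. */
    double idf(String field, String term) {
        String key = field + ":" + term;
        Double cached = cache.get(key); // in access order, get() also marks the entry as recently used
        if (cached != null) {
            return cached;
        }
        int df = Math.max(1, docFreqLookup.applyAsInt(key));
        double value = Math.log((double) numDocs / df);
        cache.put(key, value);
        return value;
    }

    /** Adds one term's info under out[docId][field][term]. */
    void addTerm(Map<String, Map<String, Map<String, Map<String, Number>>>> out,
                 String docId, String field, String term, int tf) {
        double idfValue = idf(field, term);
        Map<String, Number> info = new LinkedHashMap<>();
        info.put("tf", tf);
        info.put("idf", idfValue);
        info.put("tf-idf", tf * idfValue);
        out.computeIfAbsent(docId, d -> new LinkedHashMap<>())
           .computeIfAbsent(field, f -> new LinkedHashMap<>())
           .put(term, info);
    }

    int cacheSize() {
        return cache.size();
    }
}
```

LinkedHashMap is not thread-safe, so a cache shared across concurrent requests would need to be wrapped (e.g. with Collections.synchronizedMap) or replaced by a concurrent cache. Whether a separate cache is worth it at all depends on the Lucene-side caching question raised in item 2.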



 A SearchComponent for fetching TF-IDF values
 

 Key: SOLR-651
 URL: https://issues.apache.org/jira/browse/SOLR-651
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-651.patch





[jira] Updated: (SOLR-651) A SearchComponent for fetching TF-IDF values

2008-08-06 Thread Grant Ingersoll (JIRA)


Grant Ingersoll updated SOLR-651:
-

Fix Version/s: 1.4

 A SearchComponent for fetching TF-IDF values
 

 Key: SOLR-651
 URL: https://issues.apache.org/jira/browse/SOLR-651
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

