[jira] Updated: (SOLR-1316) Create autosuggest component

2009-12-15 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1316:


Attachment: suggest.patch

Updated patch:

 * removed the broken RadixTree,
 * changed the Suggester and Lookup APIs so that they no longer join the 
tokens; instead they use whatever tokens the analyzer produces. For now, 
results are merged into a single SpellingResult.
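
For readers following along, the per-token behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name PerTokenSuggestSketch, the TreeSet-backed dictionary, and the map-shaped result are all hypothetical and are not the Suggester, Lookup, or SpellingResult classes from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: look up each analyzed token separately, then merge. */
final class PerTokenSuggestSketch {

    /** Returns up to max dictionary entries that start with the prefix, in sorted order. */
    static List<String> lookup(TreeSet<String> dictionary, String prefix, int max) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix) starts at the first entry >= prefix; stop once entries stop matching.
        for (String entry : dictionary.tailSet(prefix)) {
            if (!entry.startsWith(prefix) || out.size() >= max) {
                break;
            }
            out.add(entry);
        }
        return out;
    }

    /** Merges the suggestions for every token into one result, keyed by token (assumed shape). */
    static Map<String, List<String>> suggest(TreeSet<String> dictionary, List<String> tokens, int max) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (String token : tokens) {
            merged.put(token, lookup(dictionary, token, max));
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = new TreeSet<>(Arrays.asList("solar", "solr", "solve", "search", "seat"));
        // The tokens stand in for whatever the analyzer would emit for the query.
        System.out.println(suggest(dict, Arrays.asList("sol", "sea"), 10));
    }
}
```

The point of the sketch is only that each token gets its own lookup and the results are collected in one container, rather than the tokens being joined into a single lookup key.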

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1316) Create autosuggest component

2009-11-12 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1316:


Attachment: suggest.patch

Updated patch that includes the new TST sources. Tests on a 100k-word 
dictionary yield very similar results for the TST and Jaspell implementations: 
the initial build time is around 600 ms, and the lookup time is then around 
4-7 ms for prefixes that yield more than 100 results.
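
As background for the numbers above, here is a minimal ternary search tree with a prefix lookup. It is a generic textbook-style sketch, not the TST or Jaspell code from the patch; the class name TstSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal ternary search tree sketch; not the implementation attached to this issue. */
final class TstSketch {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the key, larger char
        boolean isWord;    // true if a key ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) return;
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, key, i);
        else if (c > node.ch) node.hi = insert(node.hi, key, i);
        else if (i < key.length() - 1) node.eq = insert(node.eq, key, i + 1);
        else node.isWord = true;
        return node;
    }

    /** Returns up to max keys starting with the prefix, in sorted order. */
    List<String> prefixLookup(String prefix, int max) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) return out;
        Node node = root;
        int i = 0;
        // Walk down to the node that matches the last character of the prefix.
        while (node != null) {
            char c = prefix.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i == prefix.length() - 1) break;
            else { node = node.eq; i++; }
        }
        if (node == null) return out;
        if (node.isWord && out.size() < max) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out, max);
        return out;
    }

    /** In-order traversal of the subtree, appending completed keys to out. */
    private void collect(Node node, StringBuilder sb, List<String> out, int max) {
        if (node == null || out.size() >= max) return;
        collect(node.lo, sb, out, max);
        sb.append(node.ch);
        if (node.isWord && out.size() < max) out.add(sb.toString());
        collect(node.eq, sb, out, max);
        sb.setLength(sb.length() - 1);
        collect(node.hi, sb, out, max);
    }

    public static void main(String[] args) {
        TstSketch tst = new TstSketch();
        for (String w : new String[] {"solr", "solar", "solve", "search", "so"}) {
            tst.insert(w);
        }
        System.out.println(tst.prefixLookup("sol", 10));
    }
}
```

Build time corresponds to the repeated insert calls and lookup time to prefixLookup; the sketch makes no attempt to reproduce the timings reported above.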

To test it, put this in your solrconfig.xml:

{code:xml}
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
      <str name="field">text</str>
      <str name="sourceLocation">american-english</str>
    </lst>
  </searchComponent>

...
{code}

And then use e.g. the following parameters:

{noformat}
spellcheck=true&spellcheck.build=true&spellcheck.dictionary=suggest&\
spellcheck.extendedResults=true&spellcheck.count=100&q=test
{noformat}
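
If it helps, the request can also be assembled programmatically. The sketch below only builds the URL string from the parameters listed above; the base URL http://localhost:8983/solr/select and the class name SuggestRequestSketch are assumptions for illustration, so adjust them to wherever your handler with the spellcheck component is mapped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that joins the spellcheck parameters into a query string. */
final class SuggestRequestSketch {

    /** Encodes each key/value pair and joins them with '&' (needs Java 10+ for this encode overload). */
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    /** The parameters from the message above, in the same order. */
    static Map<String, String> exampleParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("spellcheck", "true");
        params.put("spellcheck.build", "true");
        params.put("spellcheck.dictionary", "suggest");
        params.put("spellcheck.extendedResults", "true");
        params.put("spellcheck.count", "100");
        params.put("q", "test");
        return params;
    }

    public static void main(String[] args) {
        // Assumed base URL; the real path depends on your solrconfig.xml.
        System.out.println(buildUrl("http://localhost:8983/solr/select", exampleParams()));
    }
}
```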

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-20 Thread Ankul Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankul Garg updated SOLR-1316:
-

Attachment: TST.zip

Modified the code to return a list of suggested keys. Andrzej, kindly update 
the same in your patch.

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-20 Thread Ankul Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankul Garg updated SOLR-1316:
-

Attachment: (was: TernarySearchTree.tar.gz)

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-20 Thread Ankul Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankul Garg updated SOLR-1316:
-

Attachment: TST.zip

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-20 Thread Ankul Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankul Garg updated SOLR-1316:
-

Attachment: (was: TST.zip)

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-19 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1316:


Attachment: suggest.patch

This is very much a work in progress, posted to get a review before proceeding.

Highlights of this patch:

* created a set of interfaces in o.a.s.spelling.suggest to hide implementation 
details of various autocomplete mechanisms.

* imported sources of RadixTree, Jaspell TST and Ankul's TST. Wrapped each 
implementation so that it works with the same interface. (Ankul: I couldn't 
figure out how to actually retrieve suggested keys from your TST?)

* extended HighFrequencyDictionary to return TermFreqIterator, which gives not 
only words but also their frequencies. Implemented a similar iterator for 
file-based term-freq lists.
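
To make the shape of such an abstraction concrete, here is a rough sketch of a common lookup interface fed by an iterator of terms with frequencies. All names here (TermFreqSketch, TermFreq, Lookup, ListLookup, fromLines) and the tab-separated "term, then frequency" line format are assumptions for illustration; they are not the actual interfaces in o.a.s.spelling.suggest.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a lookup interface built from (term, frequency) pairs. */
final class TermFreqSketch {

    /** A term paired with its frequency. */
    static final class TermFreq {
        final String term;
        final float freq;
        TermFreq(String term, float freq) { this.term = term; this.freq = freq; }
    }

    /** Common interface that hides which data structure does the completion. */
    interface Lookup {
        void build(Iterator<TermFreq> terms);
        List<String> lookup(String prefix, int max);
    }

    /** Parses lines of "term" or "term<TAB>freq" (assumed format); a missing frequency means 1. */
    static Iterator<TermFreq> fromLines(List<String> lines) {
        List<TermFreq> parsed = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            int tab = line.indexOf('\t');
            if (tab < 0) {
                parsed.add(new TermFreq(line.trim(), 1f));
            } else {
                parsed.add(new TermFreq(line.substring(0, tab).trim(),
                        Float.parseFloat(line.substring(tab + 1).trim())));
            }
        }
        return parsed.iterator();
    }

    /** Naive list-backed implementation: linear scan, highest frequency first. */
    static final class ListLookup implements Lookup {
        private final List<TermFreq> entries = new ArrayList<>();

        @Override
        public void build(Iterator<TermFreq> terms) {
            entries.clear();
            while (terms.hasNext()) entries.add(terms.next());
        }

        @Override
        public List<String> lookup(String prefix, int max) {
            List<TermFreq> matches = new ArrayList<>();
            for (TermFreq tf : entries) {
                if (tf.term.startsWith(prefix)) matches.add(tf);
            }
            matches.sort((a, b) -> Float.compare(b.freq, a.freq));
            List<String> out = new ArrayList<>();
            for (TermFreq tf : matches) {
                if (out.size() >= max) break;
                out.add(tf.term);
            }
            return out;
        }
    }
}
```

A tree-based implementation would plug in behind the same two methods, which is the property the wrappers mentioned above rely on.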

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h




[jira] Updated: (SOLR-1316) Create autosuggest component

2009-09-13 Thread Ankul Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankul Garg updated SOLR-1316:
-

Attachment: TernarySearchTree.tar.gz

Hi Jason,
My TST implementation is attached. The archive also contains four benchmark 
results: TST1.txt, TST2.txt, etc.

The four datasets were as follows. All words are real-life words extracted 
from a DBpedia dump, and every dataset consists of single words, two-word 
phrases and three-word phrases:
1. The first dataset contains 100,000 tokens.
2. The second dataset contains 500,000 tokens.
3. The third dataset contains 1,000,000 tokens.
4. The fourth dataset contains 5,000,000 tokens.

These were the environment details while benchmarking:
Platform: Linux
java version 1.6.0_16
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
RAM: 16 GiB
Java heap size: default

Is there any other way to balance the tree? Also, what's your progress?
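
On the balancing question, one general technique (not taken from the attached code) is to sort the keys up front and insert them in median-first order, so that the middle key of every range becomes the root of its subtree. The sketch below only computes that insertion order; the class name BalancedOrderSketch is hypothetical, and the resulting list would then be fed to whatever insert method the tree provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: reorder sorted keys so that inserting them yields a balanced tree. */
final class BalancedOrderSketch {

    /** Returns the keys in median-first order; the input must already be sorted. */
    static List<String> medianFirst(List<String> sortedKeys) {
        List<String> out = new ArrayList<>(sortedKeys.size());
        addMedianFirst(sortedKeys, 0, sortedKeys.size() - 1, out);
        return out;
    }

    /** Emits the middle key of [lo, hi], then recurses into the left and right halves. */
    private static void addMedianFirst(List<String> keys, int lo, int hi, List<String> out) {
        if (lo > hi) return;
        int mid = lo + (hi - lo) / 2;
        out.add(keys.get(mid));
        addMedianFirst(keys, lo, mid - 1, out);
        addMedianFirst(keys, mid + 1, hi, out);
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(medianFirst(sorted));
    }
}
```

This needs the whole key set in memory and sorted before the build, which is a trade-off against inserting keys as they stream in.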

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: TernarySearchTree.tar.gz

   Original Estimate: 96h
  Remaining Estimate: 96h
