RE: SmartChineseAnalyzer and stopwords.txt
Hello,

Has anyone used SmartChineseAnalyzer to index and search Chinese content? I would like to discuss a few things.

Best Regards,
Sylvain

From: Delbosc, Sylvain [mailto:sylvain.delb...@capgemini.com]
Sent: Thursday, January 5, 2012 14:02
To: solr-user@lucene.apache.org
Cc: Delance, Quentin
Subject: SmartChineseAnalyzer and stopwords.txt

Hello,

I would like to know how to use stopwords with SmartChineseAnalyzer. According to http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html this seems to be possible, but I have not managed to make it work.

At present I define my analyzer as follows, but the stopwords.txt file located in the same directory as schema.xml does not seem to be taken into account.

    <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>

Has anybody managed to make this work?

NB: I am using Solr 1.4 with several cores.

Best Regards,
Sylvain DELBOSC / Capgemini Sud / Toulouse
Application Architect Senior / TIC - ADC
Re: Indexing Failed.Rolled back all changes Issue
On Fri, Jan 6, 2012 at 12:28 PM, Rajdeep Alapati <rajdeep.alap...@benefitfocus.com> wrote:
> Hi, I am new to Solr. I have been digging into the data import request handler for the past few days, and now I am doing a PoC after downloading the Solr server.
[...]

The dataimport.properties file should be created by DIH on completion of indexing. This should be a problem only if that file is not writable by the user that the web interface runs as. In any case, it should only generate a logged error, and the only consequence should be that future delta-imports are hampered.

There are probably other errors in your Solr log file. Please share those errors and your data-config.xml with us.

Regards,
Gora
Implementing complex token matching algo using solr
Hi All,

Problem Description

I'm trying to implement a custom algorithm to match user-provided free-text input (a company name such as Ford Motor) against a reference data source consisting of 1.4 million company names. The algorithm executes the following steps:

Step 1) Performs an Exact Match, followed by a Begins Match and finally a Contains Match of the user-provided search input. Results from this step are also sorted in the same order.

Step 2) Performs a token-by-token match of the search input against the reference company name. Every token is matched in the following order: Exact, Begins, Contains, Levenshtein Distance (0.2) and Refined Soundex. E.g. if the user input is Foord Motur Holding and it is being matched against The Ford Motor Holdings Company, then the first token Foord will match Ford based on the Soundex match, the second token Motur will match Motor based on the edit distance algorithm, and the last token Holding will match Holdings via the Begins match.

Scoring: Every token match is first scored on a scale that rates the matching technique, with Exact match being the best and Soundex being the worst. The overall score is calculated, on a scale of 0-100%, as a weighted average of the individual token-match scores. Weights are assigned based on the index order of the token, i.e. the first token has the highest weight and the last token has the lowest.

My Partial Solution

I have implemented a simple schema in Solr to store the reference company names: a string field (called companyName), a simple text field (called companyText) copied from the string field, and another text field (called companySoundex), also copied from the string field, which uses PhoneticFilterFactory for Refined Soundex based matching.

I have been able to replicate step 1) in a single Solr query. For step 2) I plan to fire 3 parallel queries at the Solr server: the first performing a simple text search on the companyText field, the second performing a fuzzy match using the ~ operator on the companyText field, and the third performing a Soundex match on the companySoundex field.
I plan to somehow combine the results from these 3 parallel queries to get the desired final result.

Questions:

1) Is there a better way to replicate Step 2) of the original algorithm?

2) Even if I go with my three-parallel-queries approach, how do I get the same sort order as in the original algorithm? I guess the main problem is how to compare the Solr scores from these 3 entirely different queries when combining the results.

Thanks for reading this long question. Any help/pointers would be greatly appreciated.

--
Thanks
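
For illustration only, here is a minimal Python sketch of the Step 2 scoring described above, applied client-side to the union of candidates returned by the three queries, so that raw Solr scores from different queries never have to be compared. Several things in it are assumptions and not taken from the post: the per-technique scores in TECHNIQUE_SCORES, the linear position weights, normalising the edit distance by the shorter token's length with a "<= 0.2" cut-off, and the use of classic Soundex as a stand-in for Refined Soundex. The function names are hypothetical.

```python
from typing import Iterable, List, Optional

# Hypothetical per-technique scores: exact is best, phonetic is worst.
TECHNIQUE_SCORES = {"exact": 1.0, "begins": 0.8, "contains": 0.6,
                    "edit": 0.4, "phonetic": 0.2}

_SOUNDEX_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
    "4": "l", "5": "mn", "6": "r"}.items() for c in letters}


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def soundex(word: str) -> str:
    """Classic Soundex (a stand-in here, NOT Refined Soundex)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = []
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            encoded.append(code)
        if c not in "hw":  # h and w do not separate equal codes
            prev = code
    return (letters[0].upper() + "".join(encoded) + "000")[:4]


def match_token(query: str, ref: str) -> Optional[str]:
    """Return the first technique that matches, tried in the stated order."""
    q, r = query.lower(), ref.lower()
    if not q or not r:
        return None
    if q == r:
        return "exact"
    if r.startswith(q):
        return "begins"
    if q in r:
        return "contains"
    # Assumed normalisation: distance relative to the shorter token.
    if levenshtein(q, r) / min(len(q), len(r)) <= 0.2:
        return "edit"
    if soundex(q) == soundex(r):
        return "phonetic"
    return None


def score(query: str, reference: str) -> float:
    """Weighted average (0-100) of the best match found for each query token."""
    q_tokens, r_tokens = query.split(), reference.split()
    if not q_tokens:
        return 0.0
    n = len(q_tokens)
    weights = [n - i for i in range(n)]  # first token weighs most (assumed linear)
    total = 0.0
    for weight, qt in zip(weights, q_tokens):
        best = 0.0
        for rt in r_tokens:
            technique = match_token(qt, rt)
            if technique is not None:
                best = max(best, TECHNIQUE_SCORES[technique])
        total += weight * best
    return 100.0 * total / sum(weights)


def rerank(query: str, result_lists: Iterable[Iterable[str]]) -> List[str]:
    """Merge candidates from several queries and sort by the common score."""
    candidates = {name for results in result_lists for name in results}
    return sorted(candidates, key=lambda name: (-score(query, name), name))
```

With these assumed settings, match_token classifies Foord/Ford as phonetic, Motur/Motor as edit and Holding/Holdings as begins, which agrees with the example in the post. A different normalisation of the edit distance would change the first two. Used this way, the three Solr queries only serve to collect candidates; the final ordering comes from score(), so each query would need to return enough rows for the true best matches to be among the candidates.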
question
Hi Everybody,

How can we read the e-mail content of a PST file, i.e. the local storage of Microsoft Outlook? These are not e-mails held on the Exchange server.

Thanks,
Regards,
Shu (Steve) Chen
Tel: 425-818-0568
Fax: 425-641-8908
Cell: 425-785-9971