RE: SmartChineseAnalyzer and stopwords.txt
Hello, Has anyone used SmartChineseAnalyzer to index and search Chinese content? I would like to discuss a few things. Best Regards, Sylvain
From: Delbosc, Sylvain [mailto:sylvain.delb...@capgemini.com] Sent: Thursday, January 5, 2012 14:02 To: solr-user@lucene.apache.org Cc: Delance, Quentin Subject: SmartChineseAnalyzer and stopwords.txt
Hello, I would like to know how to use stopwords with SmartChineseAnalyzer. According to http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html it seems to be possible, but I cannot get it to work. Presently I am defining my analyzer like this but the stopwords.txt file located in the same directory as schema.xml does not seem to be taken into account. Has anybody managed to make this work? NB: I am using Solr 1.4 with several cores. Best Regards, Sylvain DELBOSC / Capgemini Sud / Toulouse
SmartChineseAnalyzer and stopwords.txt
Hello, I would like to know how to use stopwords with SmartChineseAnalyzer. According to http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html it seems to be possible, but I cannot get it to work. Presently I am defining my analyzer like this but the stopwords.txt file located in the same directory as schema.xml does not seem to be taken into account. Has anybody managed to make this work? NB: I am using Solr 1.4 with several cores. Best Regards, Sylvain DELBOSC / Capgemini Sud / Toulouse, Application Architect Senior / TIC - ADC
Re: SmartChineseAnalyzer
: Subject: SmartChineseAnalyzer
: References:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.
-Hoss
SmartChineseAnalyzer
Hi all, I checked the documentation of SmartChineseAnalyzer, and it looks like it is for Simplified Chinese only. Has anyone tried to include Traditional Chinese characters as well? Since the analyzer is based on a dictionary from ICTCLAS1.0, my first thought is that I might get it to work by simply converting the whole dictionary to Traditional Chinese. By the way, I checked the ICTCLAS official website, and it seems the newest version of the Java library supports GB2312, GBK, UTF-8, and BIG5. Can I expect BIG5 support to appear on the roadmap for SmartChineseAnalyzer later? Any hints would be much appreciated. Regards, Wayne
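The dictionary-conversion idea above can be illustrated with a minimal sketch. It assumes the dictionary entries are available as plain strings and that a character-level Simplified-to-Traditional table exists; the class name `DictConvertSketch`, the method `toTraditional`, and the four-entry table are all hypothetical and are not part of Lucene or ICTCLAS. How SmartChineseAnalyzer actually stores and loads its dictionary is not covered here.

```java
import java.util.HashMap;
import java.util.Map;

public class DictConvertSketch {
    // Hypothetical, deliberately tiny Simplified -> Traditional table.
    // A real table would need thousands of entries.
    private static final Map<Character, Character> S2T = new HashMap<>();
    static {
        S2T.put('\u6C49', '\u6F22'); // 汉 -> 漢
        S2T.put('\u8BED', '\u8A9E'); // 语 -> 語
        S2T.put('\u7B80', '\u7C21'); // 简 -> 簡
        S2T.put('\u4F53', '\u9AD4'); // 体 -> 體
    }

    // Converts one dictionary entry character by character.
    // Characters missing from the table are copied unchanged.
    static String toTraditional(String word) {
        StringBuilder out = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Character mapped = S2T.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the converted forms of two sample entries.
        System.out.println(toTraditional("\u6C49\u8BED"));
        System.out.println(toTraditional("\u7B80\u4F53"));
    }
}
```

One caveat with a one-to-one table like this: some Simplified characters correspond to more than one Traditional character (发 can be 發 or 髮, for example), so a purely character-level conversion of the dictionary may produce wrong entries for such words.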
RE: Error while indexing using SmartChineseAnalyzer
Thanks for the reply Shalin. Posted the stack trace on the Jira issue SOLR-1336.
-Kumar
-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, September 01, 2009 4:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Error while indexing using SmartChineseAnalyzer
On Tue, Sep 1, 2009 at 4:37 PM, Jana, Kumar Raja wrote:
> Hi,
>
> I tried using the patch provided for Solr-1336 JIRA issue for
> integrating Lucene's SmartChineseAnalyzer with Solr and tried testing it
> out but I faced the AbstractMethodError during indexing as well as
> Searching (stack trace below).
>
Questions on patches are best asked on the issue. Please post the stack trace to SOLR-1336.
--
Regards, Shalin Shekhar Mangar.
Re: Error while indexing using SmartChineseAnalyzer
On Tue, Sep 1, 2009 at 4:37 PM, Jana, Kumar Raja wrote:
> Hi,
>
> I tried using the patch provided for Solr-1336 JIRA issue for
> integrating Lucene's SmartChineseAnalyzer with Solr and tried testing it
> out but I faced the AbstractMethodError during indexing as well as
> Searching (stack trace below).
>
Questions on patches are best asked on the issue. Please post the stack trace to SOLR-1336.
--
Regards, Shalin Shekhar Mangar.
Error while indexing using SmartChineseAnalyzer
Hi, I tried the patch provided for the SOLR-1336 JIRA issue, which integrates Lucene's SmartChineseAnalyzer with Solr. When testing it, I got an AbstractMethodError during indexing as well as searching (stack trace below). Something seems to go wrong during tokenization of the content. Can someone please tell me what I am doing wrong here?
The stack trace:
SEVERE: java.lang.AbstractMethodError
    at org.apache.solr.analysis.TokenizerChain.tokenStream(TokenizerChain.java:64)
    at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(IndexSchema.java:360)
    at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java:44)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:123)
    at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:762)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:745)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2199)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2171)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:218)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
Thanks, Kumar