Re: Re: any analyzer will keep punctuation?
I think Ahmet is right. A WhiteSpace tokenizer will split the document into tokens, and a custom filter can then delete whichever punctuation you want to remove. Implementing a custom filter is not very difficult.

380382...@qq.com

From: Yonghui Zhao
Sent: 2017-03-08 12:22
To: Ahmet Arslan
Cc: java-user@lucene.apache.org
Subject: Re: any analyzer will keep punctuation?

Hi Ahmet,

Thanks for your reply, but I didn't quite get your idea. I want an analyzer like the standard analyzer but with customized punctuation handling. I think one way is to build an analyzer around a customized tokenizer like StandardTokenizer. In my tokenizer I would have to rewrite StandardTokenizerImpl, which seems a little complicated. I don't understand how "a customised word delimiter filter factory" works with the tokenizer.

2017-03-06 22:26 GMT+08:00 Ahmet Arslan:
> Hi Zhao,
>
> WhiteSpace tokeniser followed by a customised word delimiter filter factory would be a solution.
> Please see the types attribute of the word delimiter filter for customising characters.
>
> ahmet
>
> On Monday, March 6, 2017 12:22 PM, Yonghui Zhao wrote:
> Yes, the whitespace analyzer will keep punctuation, but it only breaks words on spaces.
>
> I didn't explain my requirement clearly.
>
> I want an analyzer like the standard analyzer that can keep some configured punctuation.
>
> 2017-03-06 18:03 GMT+08:00 Ahmet Arslan:
> > Hi,
> >
> > Whitespace analyser/tokenizer for example.
> >
> > Ahmet
> >
> > On Monday, March 6, 2017 10:21 AM, Yonghui Zhao wrote:
> > The Lucene standard analyzer will remove almost all punctuation.
> > In some cases we want to keep some punctuation; for example, in music search some singer names and album names can be punctuation.
> >
> > Is there any analyzer that lets us customize which punctuation is removed?
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
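For what it's worth, the two-stage idea in the reply at the top of this message (split on whitespace first, then strip unwanted punctuation in a filter step) can be sketched in plain Java. This is only an illustration of the logic and does not use the Lucene API: the class name `KeepPunctuationSketch`, its method names, and the `keep` set are all made up for the example. In Lucene the first stage would correspond to the whitespace tokenizer and the second to a custom token filter. Unlike the standard analyzer, the sketch does not lowercase anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free sketch of "whitespace tokenizer + punctuation filter".
class KeepPunctuationSketch {

    // Stage 1: split on whitespace only, so punctuation stays attached to tokens.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: keep letters, digits, and any character listed in `keep`;
    // drop everything else, and drop tokens that end up empty.
    // Limitation: works per char, so supplementary (surrogate-pair)
    // characters are treated as punctuation and removed.
    static List<String> filterPunctuation(List<String> tokens, Set<Character> keep) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                if (Character.isLetterOrDigit(c) || keep.contains(c)) {
                    sb.append(c);
                }
            }
            if (sb.length() > 0) {
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Both stages chained, in the order an analyzer would run them.
    static List<String> analyze(String text, Set<Character> keep) {
        return filterPunctuation(whitespaceTokenize(text), keep);
    }

    public static void main(String[] args) {
        // '!' is configured as punctuation to keep; '-', '(' and ')' are removed.
        System.out.println(analyze("P!nk - Try (live)", Set.of('!')));
        // prints [P!nk, Try, live]
    }
}
```

Note that this removes the unwanted characters in place rather than splitting tokens on them, so "AC/DC" with an empty keep set becomes "ACDC", not two tokens. A word delimiter filter, as Ahmet suggested, behaves differently in that respect.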
Re: [ANNOUNCE] Apache Solr 6.4.2 released
FYI, the http://lucene.apache.org/solr/mirrors-solr-latest-redir.html link redirects to http://www.apache.org/dyn/closer.lua/lucene/solr/6.4.1 and not http://www.apache.org/dyn/closer.lua/lucene/solr/6.4.2

On 8 March 2017 at 01:00, Ishan Chattopadhyaya wrote:
> 7 March 2017, Apache Solr 6.4.2 available
>
> Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search and analytics, rich document parsing, geospatial search, extensive REST APIs as well as parallel SQL. Solr is enterprise grade, secure and highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.
>
> Solr 6.4.2 is available for immediate download at:
>
> - http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
> Please read CHANGES.txt for a full list of new features and changes:
>
> - https://lucene.apache.org/solr/6_4_2/changes/Changes.html
>
> Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:
>
> - Serious performance degradation in Solr 6.4 due to the metrics collection. IndexWriter metrics collection turned off by default, directory level metrics collection completely removed (until a better design is found)
> - Transaction log replay can hit a NullPointerException due to new Metrics code
> - NullPointerException in CloudSolrClient when reading stale alias
> - UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and TermRangeQuery for multi-byte text
>
> Further details of changes are available in the change log at: http://lucene.apache.org/solr/6_4_2/changes/Changes.html
>
> Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access.
Re: any analyzer will keep punctuation?
Hi Ahmet,

Thanks for your reply, but I didn't quite get your idea. I want an analyzer like the standard analyzer but with customized punctuation handling. I think one way is to build an analyzer around a customized tokenizer like StandardTokenizer. In my tokenizer I would have to rewrite StandardTokenizerImpl, which seems a little complicated. I don't understand how "a customised word delimiter filter factory" works with the tokenizer.

2017-03-06 22:26 GMT+08:00 Ahmet Arslan:
> Hi Zhao,
>
> WhiteSpace tokeniser followed by a customised word delimiter filter factory would be a solution.
> Please see the types attribute of the word delimiter filter for customising characters.
>
> ahmet
>
> On Monday, March 6, 2017 12:22 PM, Yonghui Zhao wrote:
> Yes, the whitespace analyzer will keep punctuation, but it only breaks words on spaces.
>
> I didn't explain my requirement clearly.
>
> I want an analyzer like the standard analyzer that can keep some configured punctuation.
>
> 2017-03-06 18:03 GMT+08:00 Ahmet Arslan:
> > Hi,
> >
> > Whitespace analyser/tokenizer for example.
> >
> > Ahmet
> >
> > On Monday, March 6, 2017 10:21 AM, Yonghui Zhao wrote:
> > The Lucene standard analyzer will remove almost all punctuation.
> > In some cases we want to keep some punctuation; for example, in music search some singer names and album names can be punctuation.
> >
> > Is there any analyzer that lets us customize which punctuation is removed?
[ANNOUNCE] Apache Solr 6.4.2 released
7 March 2017, Apache Solr 6.4.2 available

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search and analytics, rich document parsing, geospatial search, extensive REST APIs as well as parallel SQL. Solr is enterprise grade, secure and highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

Solr 6.4.2 is available for immediate download at:

- http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

- https://lucene.apache.org/solr/6_4_2/changes/Changes.html

Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:

- Serious performance degradation in Solr 6.4 due to the metrics collection. IndexWriter metrics collection turned off by default, directory level metrics collection completely removed (until a better design is found)
- Transaction log replay can hit a NullPointerException due to new Metrics code
- NullPointerException in CloudSolrClient when reading stale alias
- UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and TermRangeQuery for multi-byte text

Further details of changes are available in the change log at: http://lucene.apache.org/solr/6_4_2/changes/Changes.html

Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access.
[ANNOUNCE] Apache Lucene 6.4.2 released
7 March 2017, Apache Lucene™ 6.4.2 available

The Lucene PMC is pleased to announce the release of Apache Lucene 6.4.2.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains 1 bug fix since the 6.4.1 release:

- CommonGramsQueryFilter was producing a disconnected token graph, messing up phrase queries during query parsing

The release is available for immediate download at:

- http://www.apache.org/dyn/closer.lua/lucene/java/6.4.2

Please read CHANGES.txt for a full list of new features and changes:

- https://lucene.apache.org/core/6_4_2/changes/Changes.html

Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.