Re: Re: any analyzer will keep punctuation?

2017-03-07 Thread 380382...@qq.com
I think Ahmet is right. The whitespace tokenizer will split the document into
tokens, and a custom filter can then delete whatever punctuation you want to
remove. Implementing a custom filter is not very difficult.
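
To make the idea concrete, here is a rough sketch of the two steps in plain Java: split on whitespace, then drop every punctuation character that is not in a configured keep set. This is only an illustration and not Lucene code; the class name KeepPunctuationSketch, the tokenize method and the keep set are made up for the example, and a real solution would put the same per-character logic inside a Lucene token filter placed after the whitespace tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class KeepPunctuationSketch {

    /**
     * Splits text on whitespace, then removes every character that is neither
     * a letter/digit nor listed in keep. Tokens are lowercased, and tokens
     * that end up empty are dropped. Works per char, so characters outside
     * the Basic Multilingual Plane are not handled.
     */
    static List<String> tokenize(String text, Set<Character> keep) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                if (Character.isLetterOrDigit(c) || keep.contains(c)) {
                    sb.append(Character.toLowerCase(c));
                }
            }
            if (sb.length() > 0) {
                tokens.add(sb.toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation we want to survive analysis (an assumption for the demo).
        Set<Character> keep = new HashSet<>(Arrays.asList('!', '+'));
        System.out.println(tokenize("P!nk - Try, (live)", keep)); // [p!nk, try, live]
        System.out.println(tokenize("Ed Sheeran +", keep));       // [ed, sheeran, +]
    }
}
```

With '!' and '+' in the keep set, a name such as "P!nk" or an album titled "+" stays searchable, while commas, brackets and stray dashes are still removed.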



380382...@qq.com
 
From: Yonghui Zhao
Sent: 2017-03-08 12:22
To: Ahmet Arslan
Cc: java-user@lucene.apache.org
Subject: Re: any analyzer will keep punctuation?
Hi Ahmet,
 
Thanks for your reply, but I didn't quite get your idea.
I want an analyzer that behaves like the standard analyzer but lets me
customize which punctuation is kept.
I think one way is to build an analyzer around a custom tokenizer modeled on
StandardTokenizer.
For that tokenizer I would have to rewrite StandardTokenizerImpl, which seems a
little complicated.
I don't understand how "a customised word delimiter filter factory" works
with the tokenizer.
 
 
2017-03-06 22:26 GMT+08:00 Ahmet Arslan :
 
> Hi Zhao,
>
> WhiteSpace tokeniser followed by a customised word delimiter filter
> factory would be a solution.
> Please see the types attribute of the word delimiter filter for customising
> characters.
>
> ahmet
>
>
>
> On Monday, March 6, 2017 12:22 PM, Yonghui Zhao 
> wrote:
> Yes, the whitespace analyzer will keep punctuation, but it only breaks words
> on whitespace.
>
>
> I didn’t explain my requirement clearly.
>
> I want an analyzer like the standard analyzer that can keep some configured
> punctuation.
>
>
> 2017-03-06 18:03 GMT+08:00 Ahmet Arslan :
>
> > Hi,
> >
> > Whitespace analyser/tokenizer for example.
> >
> > Ahmet
> >
> >
> >
> > On Monday, March 6, 2017 10:21 AM, Yonghui Zhao 
> > wrote:
> > The Lucene standard analyzer removes almost all punctuation.
> > In some cases we want to keep some punctuation; for example, in music
> > search, some singer names and album names consist of punctuation.
> >
> > Is there any analyzer that lets us customize which punctuation is removed?
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>


Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-07 Thread Sahil Agarwal
FYI, the http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
link redirects to http://www.apache.org/dyn/closer.lua/lucene/solr/6.4.1
and not http://www.apache.org/dyn/closer.lua/lucene/solr/6.4.2

On 8 March 2017 at 01:00, Ishan Chattopadhyaya 
wrote:

> 7 March 2017, Apache Solr 6.4.2 available
>
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the Apache Lucene project. Its major features include powerful full-text
> search, hit highlighting, faceted search and analytics, rich document
> parsing, geospatial search, extensive REST APIs as well as parallel SQL.
> Solr is enterprise grade, secure and highly scalable, providing fault
> tolerant distributed search and indexing, and powers the search and
> navigation features of many of the world's largest internet sites.
>
> Solr 6.4.2 is available for immediate download at:
>
> - http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
> Please read CHANGES.txt for a full list of new features and changes:
>
> - https://lucene.apache.org/solr/6_4_2/changes/Changes.html
>
> Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:
>
> - Serious performance degradation in Solr 6.4 due to the metrics
>   collection. IndexWriter metrics collection is turned off by default, and
>   directory-level metrics collection is completely removed (until a better
>   design is found)
> - Transaction log replay can hit a NullPointerException due to the new
>   Metrics code
> - NullPointerException in CloudSolrClient when reading a stale alias
> - UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
>   TermRangeQuery for multi-byte text
>
> Further details of changes are available in the change log at:
> http://lucene.apache.org/solr/6_4_2/changes/Changes.html
>
> Please report any feedback to the mailing lists (
> http://lucene.apache.org/solr/discussion.html)
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.
>


Re: any analyzer will keep punctuation?

2017-03-07 Thread Yonghui Zhao
Hi Ahmet,

Thanks for your reply, but I didn't quite get your idea.
I want an analyzer that behaves like the standard analyzer but lets me
customize which punctuation is kept.
I think one way is to build an analyzer around a custom tokenizer modeled on
StandardTokenizer.
For that tokenizer I would have to rewrite StandardTokenizerImpl, which seems a
little complicated.
I don't understand how "a customised word delimiter filter factory" works
with the tokenizer.


2017-03-06 22:26 GMT+08:00 Ahmet Arslan :

> Hi Zhao,
>
> WhiteSpace tokeniser followed by a customised word delimiter filter
> factory would be a solution.
> Please see the types attribute of the word delimiter filter for customising
> characters.
>
> ahmet
>
>
>
> On Monday, March 6, 2017 12:22 PM, Yonghui Zhao 
> wrote:
> Yes, the whitespace analyzer will keep punctuation, but it only breaks words
> on whitespace.
>
>
> I didn’t explain my requirement clearly.
>
> I want an analyzer like the standard analyzer that can keep some configured
> punctuation.
>
>
> 2017-03-06 18:03 GMT+08:00 Ahmet Arslan :
>
> > Hi,
> >
> > Whitespace analyser/tokenizer for example.
> >
> > Ahmet
> >
> >
> >
> > On Monday, March 6, 2017 10:21 AM, Yonghui Zhao 
> > wrote:
> > The Lucene standard analyzer removes almost all punctuation.
> > In some cases we want to keep some punctuation; for example, in music
> > search, some singer names and album names consist of punctuation.
> >
> > Is there any analyzer that lets us customize which punctuation is removed?
> >
> >
> >
>


[ANNOUNCE] Apache Solr 6.4.2 released

2017-03-07 Thread Ishan Chattopadhyaya
7 March 2017, Apache Solr 6.4.2 available

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.4.2 is available for immediate download at:

   - http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

   - https://lucene.apache.org/solr/6_4_2/changes/Changes.html

Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:

   - Serious performance degradation in Solr 6.4 due to the metrics
     collection. IndexWriter metrics collection is turned off by default, and
     directory-level metrics collection is completely removed (until a better
     design is found)
   - Transaction log replay can hit a NullPointerException due to the new
     Metrics code
   - NullPointerException in CloudSolrClient when reading a stale alias
   - UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
     TermRangeQuery for multi-byte text

Further details of changes are available in the change log at:
http://lucene.apache.org/solr/6_4_2/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)
Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.


[ANNOUNCE] Apache Lucene 6.4.2 released

2017-03-07 Thread Ishan Chattopadhyaya
7 March 2017, Apache Lucene™ 6.4.2 available

The Lucene PMC is pleased to announce the release of Apache Lucene 6.4.2.

Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires full-text search, especially cross-platform.

This release contains 1 bug fix since the 6.4.1 release:

   - CommonGramsQueryFilter was producing a disconnected token graph, messing
     up phrase queries during query parsing

The release is available for immediate download at:

   - http://www.apache.org/dyn/closer.lua/lucene/java/6.4.2

Please read CHANGES.txt for a full list of new features and changes:

   - https://lucene.apache.org/core/6_4_2/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also goes for Maven access.