[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037727#comment-17037727 ]

Xin-Chun Zhang commented on LUCENE-9136:

Hi, [~jtibshirani], thanks for your suggestions!

??"I wonder if this clustering-based approach could fit more closely in the current search framework. In the current prototype, we keep all the cluster information on-heap. We could instead try storing each cluster as its own 'term' with a postings list. The kNN query would then be modelled as an 'OR' over these terms."??

In the previous implementation ([https://github.com/irvingzhang/lucene-solr/commit/eb5f79ea7a705595821f73f80a0c5752061869b2]), the cluster information is divided into two parts, meta (.ifi) and data (.ifd), as shown in the following figure. Each cluster is stored with its postings list in the data file (.ifd) and is not kept on-heap. A major concern with this implementation is the read performance for cluster data, since kNN search reads it very frequently. I will test and check the performance.

!image-2020-02-16-15-05-02-451.png!

??"Because of this concern, it could be nice to include benchmarks for index time (in addition to QPS)..."??

Many thanks! I will check the links you mentioned and consider optimizing the clustering cost. In addition, more benchmarks will be added soon.

> Introduce IVFFlat to Lucene for ANN similarity search
>
> Key: LUCENE-9136
> URL: https://issues.apache.org/jira/browse/LUCENE-9136
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Xin-Chun Zhang
> Priority: Major
> Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png, image-2020-02-16-15-05-02-451.png
>
> Representation learning (RL) has been an established discipline in the machine learning space for decades, but it has drawn tremendous attention lately with the emergence of deep learning. The central problem of RL is to determine an optimal representation of the input data. Once the data is embedded as high-dimensional vectors, a vector retrieval (VR) method is applied to search for the relevant items.
> With the rapid development of RL over the past few years, the technique has been used extensively in industry, from online advertising to computer vision and speech recognition. There exist many open source implementations of VR algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various choices for potential users. However, the aforementioned implementations are all written in C++, and there is no plan to support a Java interface, which makes them hard to integrate into Java projects and hard to use for those who are not familiar with C/C++ [[https://github.com/facebookresearch/faiss/issues/105]].
> The algorithms for vector retrieval can be roughly classified into four categories,
> # Tree-based algorithms, such as KD-tree;
> # Hashing methods, such as LSH (Locality-Sensitive Hashing);
> # Quantization-based algorithms, such as IVFFlat;
> # Graph-based algorithms, such as HNSW, SSG, NSG;
> where IVFFlat and HNSW are the most popular ones among all the VR algorithms.
> IVFFlat is better for high-precision applications such as face recognition, while HNSW performs better in general scenarios including recommendation and personalized advertisement. *The recall ratio of IVFFlat can be gradually increased by adjusting the query parameter (nprobe), while it's hard for HNSW to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio.
> Recently, the implementation of HNSW (Hierarchical Navigable Small World, LUCENE-9004) for Lucene has made great progress. The issue has drawn the attention of those who are interested in Lucene or hope to use HNSW with Solr/Lucene.
> As an alternative for solving ANN similarity search problems, IVFFlat is also very popular with many users and supporters. Compared with HNSW, IVFFlat has a smaller index size but requires k-means clustering, while HNSW is faster at query time (no training required) but requires extra storage for saving graphs [indexing 1M vectors|[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]].
> Another advantage is that IVFFlat can be faster and more accurate when GPU parallel computing is enabled (currently not supported in Java). Both algorithms have their merits and demerits. Since HNSW is now under development, it may be better to provide both implementations (HNSW && IVFFlat) for potential users who are faced with very different scenarios and want more choices.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
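The IVFFlat scheme discussed in this message (clusters from k-means, one postings list per cluster, a query that probes the nprobe nearest clusters) can be illustrated with a small in-memory sketch. This is not the code from the linked commit and not a Lucene API: the class name IvfFlatSketch, its methods, and the toy data are all hypothetical, and the centroids are assumed to have been trained already rather than computed by k-means here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

/** Illustrative IVFFlat sketch (hypothetical names; not the LUCENE-9136 patch). */
final class IvfFlatSketch {
  final float[][] centroids;          // one centroid per cluster, assumed pre-trained
  final float[][] vectors;            // raw ("flat") vectors, indexed by id
  final List<List<Integer>> postings; // per cluster: ids of the vectors assigned to it

  IvfFlatSketch(float[][] centroids, float[][] vectors) {
    this.centroids = centroids;
    this.vectors = vectors;
    this.postings = new ArrayList<>();
    for (int c = 0; c < centroids.length; c++) {
      postings.add(new ArrayList<>());
    }
    // Index time: each vector goes into the postings list of its nearest centroid.
    for (int id = 0; id < vectors.length; id++) {
      postings.get(nearest(centroids, vectors[id], 1)[0]).add(id);
    }
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Indices of the n rows of points closest to query, nearest first. */
  static int[] nearest(float[][] points, float[] query, int n) {
    return IntStream.range(0, points.length)
        .boxed()
        .sorted(Comparator.comparingDouble(i -> squaredDistance(points[i], query)))
        .limit(n)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  /** Top-k ids among the vectors stored in the nprobe clusters nearest to the query. */
  int[] search(float[] query, int k, int nprobe) {
    List<Integer> candidates = new ArrayList<>();
    for (int c : nearest(centroids, query, nprobe)) {
      candidates.addAll(postings.get(c)); // the "OR" over the selected cluster "terms"
    }
    return candidates.stream()
        .sorted(Comparator.comparingDouble(id -> squaredDistance(vectors[id], query)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  public static void main(String[] args) {
    float[][] centroids = {{0f, 0f}, {10f, 0f}};
    float[][] vectors = {{1f, 0f}, {4.9f, 0f}, {5.6f, 0f}, {9f, 0f}};
    IvfFlatSketch index = new IvfFlatSketch(centroids, vectors);
    float[] query = {5.2f, 0f};
    // Probing one cluster misses the true nearest neighbor (id 1), which sits
    // just across the cluster boundary.
    System.out.println(Arrays.toString(index.search(query, 1, 1))); // [2]
    // Probing every cluster degenerates to an exhaustive scan, so recall is 100%.
    System.out.println(Arrays.toString(index.search(query, 1, 2))); // [1]
  }
}
```

In this toy data, raising nprobe from 1 to 2 recovers the neighbor that the single-cluster probe missed, which is the recall/latency trade-off the description attributes to nprobe. An on-disk variant would read the postings of the selected clusters from the data file instead of from heap lists, which is where the read-performance concern above comes in.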
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: Attachment: image-2020-02-16-15-05-02-451.png
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: Attachment: (was: image-2020-02-16-14-36-54-478.png)
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: Attachment: image-2020-02-16-14-36-54-478.png
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037713#comment-17037713 ]

ASF subversion and git services commented on LUCENE-9220:

Commit 8ced733fc3a2b7db61b6d96e5399ae2a2918d3ba in lucene-solr's branch refs/heads/jira/LUCENE-9220 from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8ced733 ]

LUCENE-9220: teach gradle the new snowball-generated boilerplate, too

> Upgrade Snowball version to 2.0
>
> Key: LUCENE-9220
> URL: https://issues.apache.org/jira/browse/LUCENE-9220
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Nguyen Minh Gia Huy
> Priority: Major
> Attachments: snowball_53739a805cfa6c.patch, snowball_53739a805cfa6c.patch, snowball_53739a805cfa6c.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When working with Snowball-based stemmers, I realized that Lucene is currently [using a pre-compiled version of Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html] that seems to be from 12 years ago: https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b
> Snowball released v2.0 in 10/2019 with many improvements, new supported languages (Arabic, Indonesian…) and new features (stringdef notation for Unicode codepoints…). Details of the changes can be found here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think these changes to Snowball could have a positive impact on Lucene.
> I wonder when Lucene should upgrade Snowball to the latest version (v2.0).
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037711#comment-17037711 ] Robert Muir commented on LUCENE-9220: stemmers, stopwords, and tests are currently generated by the script. Since it requires cloning 3 other repos (snowball, snowball-data, snowball-website) I will aim to move the script's logic to be executed under gradle next. It will not be portable or be in the groovy language.
[jira] [Updated] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9220: Attachment: snowball_53739a805cfa6c.patch
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037709#comment-17037709 ] ASF subversion and git services commented on LUCENE-9220: Commit 03aebecf98acab31c608dbcdbf8b5c038c3c02f7 in lucene-solr's branch refs/heads/jira/LUCENE-9220 from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=03aebec ] LUCENE-9220: regenerate all snowball stopfiles
[jira] [Updated] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9220: Attachment: snowball_53739a805cfa6c.patch
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037700#comment-17037700 ] ASF subversion and git services commented on LUCENE-9220: Commit d9a285c857e632a32f7762c49d2ab8363ae8c876 in lucene-solr's branch refs/heads/jira/LUCENE-9220 from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d9a285c ] LUCENE-9220: automate generation of (bsd license-only, sampled) test data
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037687#comment-17037687 ] Robert Muir commented on LUCENE-9220: I'm working my way thru automating the process (test data right now). I would like a better situation, e.g. maybe we have commit hash + patch + regen script in our repo tied into gradle regenerate task. It may just be limited to only work on linux or similar, we have to patch sources, invoke makefile, compile c code, etc.
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037681#comment-17037681 ]

Uwe Schindler commented on LUCENE-8987:

It looks like this does not work in markdown files:

{code}
./content/pages/solr/resources.md:* [Latest Release](/solr/{{ LUCENE_LATEST_RELEASE | replace(".", "_") }}/index.html)
{code}

> Move Lucene web site from svn to git
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
> Issue Type: Task
> Components: general/website
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Attachments: lucene-site-repo.png
>
> INFRA just enabled [a new way of configuring website build|https://s.apache.org/asfyaml] from a git branch, [see dev list email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E]. It allows for automatic builds of both the staging and production sites, much like the old CMS. We can choose to auto publish the html content of an {{output/}} folder, or to have a bot build the site using [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for [http://lucene.apache.org|http://lucene.apache.org/] by creating a new git repo {{lucene-site}}, copying over the site from svn, seeing if it can be "Pelicanized" easily, and then testing staging. Benefits are that more people will be able to edit the web site and we can take PRs from the public (with GitHub preview of pages).
> Non-goals:
> * Create a new web site or a new graphic design
> * Change from Markdown to Asciidoc
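For reference, the template expression quoted in the comment above is meant to turn a dotted release number into the underscore form used in the site's documentation paths. The sketch below mirrors that substitution in plain Java purely to show the expected result. The class and method names are hypothetical, and the value 8.4.1 is an assumption taken from the `core/8_4_1/` URL that appears elsewhere in this archive, not from the site configuration.

```java
/** Illustrative only: mirrors the intent of {{ LUCENE_LATEST_RELEASE | replace(".", "_") }}. */
final class ReleasePathSketch {

  /** Builds the "Latest Release" link target for a dotted version string. */
  static String latestReleaseLink(String latestRelease) {
    // String.replace does a literal (non-regex) replacement of every "." with "_".
    return "/solr/" + latestRelease.replace(".", "_") + "/index.html";
  }

  public static void main(String[] args) {
    // "8.4.1" is an assumed example value.
    System.out.println(latestReleaseLink("8.4.1")); // prints /solr/8_4_1/index.html
  }
}
```

If the rendered page shows the raw `{{ ... }}` text or an empty path segment instead of a link of this shape, the variable is not being substituted in that markdown file, which matches the symptom reported above.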
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037679#comment-17037679 ] Uwe Schindler commented on LUCENE-8987: Small issue: https://lucene.apache.org/solr/resources.html#documentation The link to the Solr Javadocs is missing the version number!
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037678#comment-17037678 ] Uwe Schindler commented on LUCENE-8987: --- The new site is now live: - https://lucene.apache.org/ (we should maybe just fix certificate warnings because of unencrypted content still there) - The Subversion Git CMS tree was cleaned up (added file "MOVED_TO_GIT") - The Subversion folder with the Javadocs and Refguide was kept alive: https://svn.apache.org/repos/infra/websites/production/lucene/content To publish Javadocs and Refguide nothing has changed, only extpath.txt is gone. Just commit to subversion as you did before.
[jira] [Resolved] (LUCENE-9034) Officially publish the new site
[ https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-9034. --- Resolution: Fixed The new site is now live: - https://lucene.apache.org/ (we should maybe just fix certificate warnings because of unencrypted content still there) - The Subversion Git CMS tree was cleaned up (added file "MOVED_TO_GIT") - The Subversion folder with the Javadocs and Refguide was kept alive: https://svn.apache.org/repos/infra/websites/production/lucene/content To publish Javadocs and Refguide nothing has changed, only extpath.txt is gone. Just commit to subversion as you did before. > Officially publish the new site > --- > > Key: LUCENE-9034 > URL: https://issues.apache.org/jira/browse/LUCENE-9034 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/website >Reporter: Jan Høydahl >Assignee: Uwe Schindler >Priority: Major > > Publishing the web site means creating a publish branch and adding the right > magic instructions to {{.asf.yml}} etc. This will then publish the new site > and disable old CMS. > Before we do that we should > # Make sure all docs and release tools are updated for new site publishing > instructions > # Create a PR with latest changes in old CMS site since the export. This > will be the changes done during 8.3.0 release and possibly some news entries > related to security issues etc. > After publishing we should ask INFRA to make old site svn read-only (and > perhaps do a commit that replaces svn content with a README.txt), so it is > obvious for everyone that we have migrated.
[GitHub] [lucene-solr] rmuir opened a new pull request #1262: LUCENE-9220: regenerate all stemmers from snowball 2.0
rmuir opened a new pull request #1262: LUCENE-9220: regenerate all stemmers from snowball 2.0 URL: https://github.com/apache/lucene-solr/pull/1262 Instead of patching them after-the-fact (both manually and automatically over the years) we patch the generator. This is easier to maintain than patches/changes against generated code. See LUCENE-9220 for more information. There is a remaining nocommit, test data. Also need to hook in and test the new languages that are added here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037669#comment-17037669 ] ASF subversion and git services commented on LUCENE-9220: - Commit fc229b170197e37ffcbdb330e7657939979a7def in lucene-solr's branch refs/heads/jira/LUCENE-9220 from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fc229b1 ] LUCENE-9220 regenerate all stemmers from snowball 2.0 Instead of patching them after-the-fact (both manually and automatically over the years) we patch the generator. This is easier to maintain than patches/changes against generated code. See LUCENE-9220 for more information. There is a remaining nocommit, test data. Also need to hook in and test the new languages that are added here. > Upgrade Snowball version to 2.0 > --- > > Key: LUCENE-9220 > URL: https://issues.apache.org/jira/browse/LUCENE-9220 > Project: Lucene - Core > Issue Type: Wish >Reporter: Nguyen Minh Gia Huy >Priority: Major > Attachments: snowball_53739a805cfa6c.patch > > > When working with Snowball-based stemmers, I realized that Lucene is > currently [using a pre-compiled version of > Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html], > which seems to be from 12 years ago: > https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b > Snowball released v2.0 in 10/2019 with many improvements, new > supported languages (Arabic, Indonesian…) and new features (stringdef > notation for Unicode codepoints…). Details of the changes can be found > here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think > these changes to Snowball could have a promising positive impact on Lucene. > I wonder when Lucene should upgrade Snowball to the latest version (v2.0). 
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037666#comment-17037666 ] Robert Muir commented on LUCENE-9220: - I've uploaded a patch of that same commit against 53739a805cfa6c of snowball. This way we can refer to it in documentation for now from this issue. I will make a lucene PR for the lucene-side changes here.
[jira] [Updated] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9220: Attachment: snowball_53739a805cfa6c.patch
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037665#comment-17037665 ] Robert Muir commented on LUCENE-9220: - Of course tests won't pass! Otherwise this thing gets massively slower because it won't have our fixes to unnecessary reflection, string creation, etc. Also Lucene has Armenian and Estonian, neither of which is currently in the snowball repo (one is on the website though, and the other has a PR). So we have to generate and enable stemmers for those languages. We also support stemmers that are disabled by default (KP, german2, lovins), so we have to enable and generate those too. Finally there is the mixed tab/space indentation, the lack of license headers, and the lack of javadocs; it all adds up to make it quite the pain in the ass. I think instead of patching *generated* code we should patch snowball itself and try to send the fixes to them upstream. It seems reasonable they would want consistent whitespace, docs, licensing, better performance, etc. For example, currently methodhandle patching will fail because the generated structure has changed, but it's a one-liner to fix this in their C-code generator, and easier to maintain that way, even as a patch. 
I have made such changes here: https://github.com/rmuir/snowball/commit/2e1433394ef02ee248127c8e3485d9cbc395d577
[GitHub] [lucene-solr] janhoy commented on issue #365: [SOLR-12243] span query generalization + query parser tests
janhoy commented on issue #365: [SOLR-12243] span query generalization + query parser tests URL: https://github.com/apache/lucene-solr/pull/365#issuecomment-586652193 Merged manually
[GitHub] [lucene-solr] janhoy closed pull request #365: [SOLR-12243] span query generalization + query parser tests
janhoy closed pull request #365: [SOLR-12243] span query generalization + query parser tests URL: https://github.com/apache/lucene-solr/pull/365
[GitHub] [lucene-solr] janhoy commented on issue #639: Solve the problem of highlighting Chinese inaccurately.
janhoy commented on issue #639: Solve the problem of highlighting Chinese inaccurately. URL: https://github.com/apache/lucene-solr/pull/639#issuecomment-586652109 Thanks for contributing. You may want to discuss your problem on the mailing list and confirm it is a bug, and then invite developers to take a look at your PR.
[GitHub] [lucene-solr] janhoy commented on issue #807: Remove solr.jetty.https.port when SSL is not used
janhoy commented on issue #807: Remove solr.jetty.https.port when SSL is not used URL: https://github.com/apache/lucene-solr/pull/807#issuecomment-586651472 This looks like a bug. Please create a corresponding JIRA issue and read the checklist above. There should be a line in solr/CHANGES.txt as well for this change.
[jira] [Commented] (LUCENE-9034) Officially publish the new site
[ https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037664#comment-17037664 ] Uwe Schindler commented on LUCENE-9034: --- Opened: INFRA-19859
[GitHub] [lucene-solr] janhoy commented on issue #908: Change the file format of README files from README.txt to README.md a…
janhoy commented on issue #908: Change the file format of README files from README.txt to README.md a… URL: https://github.com/apache/lucene-solr/pull/908#issuecomment-586651025 @pinkeshsharma Have you seen my review feedback? Please make adjustments so we can merge this.
[GitHub] [lucene-solr] janhoy closed pull request #873: Rename README.txt to README.md
janhoy closed pull request #873: Rename README.txt to README.md URL: https://github.com/apache/lucene-solr/pull/873
[GitHub] [lucene-solr] janhoy commented on issue #873: Rename README.txt to README.md
janhoy commented on issue #873: Rename README.txt to README.md URL: https://github.com/apache/lucene-solr/pull/873#issuecomment-586650863 This is a duplicate of #908 which also aims to change formatting. Closing
[GitHub] [lucene-solr] janhoy closed pull request #988: Update README.md
janhoy closed pull request #988: Update README.md URL: https://github.com/apache/lucene-solr/pull/988
[GitHub] [lucene-solr] janhoy commented on issue #988: Update README.md
janhoy commented on issue #988: Update README.md URL: https://github.com/apache/lucene-solr/pull/988#issuecomment-586650685 Closing this. Please contribute instead to the cwiki page already linked to.
[jira] [Commented] (LUCENE-9034) Officially publish the new site
[ https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037658#comment-17037658 ] Uwe Schindler commented on LUCENE-9034: --- OK, sorry for delay. I will contact infra soon. Uwe
[GitHub] [lucene-solr] janhoy commented on issue #1005: LUCENE-9042: Refactor TopGroups.merge tests
janhoy commented on issue #1005: LUCENE-9042: Refactor TopGroups.merge tests URL: https://github.com/apache/lucene-solr/pull/1005#issuecomment-586645396 Linking with LUCENE-9042
[GitHub] [lucene-solr] janhoy merged pull request #1090: Update README.txt for analysis-extras
janhoy merged pull request #1090: Update README.txt for analysis-extras URL: https://github.com/apache/lucene-solr/pull/1090
[GitHub] [lucene-solr] janhoy commented on issue #1090: Update README.txt for analysis-extras
janhoy commented on issue #1090: Update README.txt for analysis-extras URL: https://github.com/apache/lucene-solr/pull/1090#issuecomment-586645053 Thanks for the contribution
[GitHub] [lucene-solr] janhoy commented on issue #1143: HdfsDirectory support createTempOutput
janhoy commented on issue #1143: HdfsDirectory support createTempOutput URL: https://github.com/apache/lucene-solr/pull/1143#issuecomment-586644825 @kaynewu Have you opened a JIRA issue as well for this?
[GitHub] [lucene-solr] janhoy commented on issue #1036: SOLR-13967: Update query.css to make query form sticky on the top while scrolling page
janhoy commented on issue #1036: SOLR-13967: Update query.css to make query form sticky on the top while scrolling page URL: https://github.com/apache/lucene-solr/pull/1036#issuecomment-586644660 @shuson Why is this PR still open? The JIRA issue is closed as implemented but I cannot see that anything has been committed?
[GitHub] [lucene-solr] janhoy closed pull request #932: SOLR-13829: Stop converting continuous numbers to Longs in Streaming Expressions
janhoy closed pull request #932: SOLR-13829: Stop converting continuous numbers to Longs in Streaming Expressions URL: https://github.com/apache/lucene-solr/pull/932
[GitHub] [lucene-solr] janhoy closed pull request #529: LUCENE-8617: Use SimpleFSDirectory on non-default FS
janhoy closed pull request #529: LUCENE-8617: Use SimpleFSDirectory on non-default FS URL: https://github.com/apache/lucene-solr/pull/529
[GitHub] [lucene-solr] janhoy closed pull request #500: LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestR…
janhoy closed pull request #500: LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestR… URL: https://github.com/apache/lucene-solr/pull/500
[GitHub] [lucene-solr] janhoy closed pull request #1016: SOLR-13662: Test fix & Reference guide for package manager
janhoy closed pull request #1016: SOLR-13662: Test fix & Reference guide for package manager URL: https://github.com/apache/lucene-solr/pull/1016
[GitHub] [lucene-solr] gus-asf commented on issue #976: SOLR-13749: Implement support for joining across collections with multiple shards
gus-asf commented on issue #976: SOLR-13749: Implement support for joining across collections with multiple shards URL: https://github.com/apache/lucene-solr/pull/976#issuecomment-586640936 merged manually
[GitHub] [lucene-solr] gus-asf closed pull request #976: SOLR-13749: Implement support for joining across collections with multiple shards
gus-asf closed pull request #976: SOLR-13749: Implement support for joining across collections with multiple shards URL: https://github.com/apache/lucene-solr/pull/976
[jira] [Commented] (SOLR-13971) Velocity custom template RCE vulnerability
[ https://issues.apache.org/jira/browse/SOLR-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037589#comment-17037589 ] Jan Høydahl commented on SOLR-13971: Also add 7.7.3 as fixVersion in this JIRA? > Velocity custom template RCE vulnerability > -- > > Key: SOLR-13971 > URL: https://issues.apache.org/jira/browse/SOLR-13971 > Project: Solr > Issue Type: Bug >Affects Versions: 5.0, 5.5.5, 6.0, 6.6.5, 7.0, 7.7, 8.0, 8.3 >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.4 > > Attachments: SOLR-13971.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We need to disable this. There is a zero day attack in the wild. 41 stars on > this github project: > # https://github.com/jas502n/solr_rce > # https://gist.github.com/s00py/a1ba36a3689fa13759ff910e179fc133 > We need to disable this in a way that cannot be re-enabled using the Config > API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss closed pull request #543: LUCENE-8474: final cleanups and removal of RAMDirectory
dweiss closed pull request #543: LUCENE-8474: final cleanups and removal of RAMDirectory URL: https://github.com/apache/lucene-solr/pull/543
[GitHub] [lucene-solr] dweiss closed pull request #829: SOLR-13452: Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
dweiss closed pull request #829: SOLR-13452: Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle. URL: https://github.com/apache/lucene-solr/pull/829
[GitHub] [lucene-solr] dweiss closed pull request #536: LUCENE-8643: Decrease test complexity in the default case. Exclude simple text codec.
dweiss closed pull request #536: LUCENE-8643: Decrease test complexity in the default case. Exclude simple text codec. URL: https://github.com/apache/lucene-solr/pull/536
[GitHub] [lucene-solr] dweiss closed pull request #533: LUCENE-8636: TestPointQueries and long execution times
dweiss closed pull request #533: LUCENE-8636: TestPointQueries and long execution times URL: https://github.com/apache/lucene-solr/pull/533
[GitHub] [lucene-solr] janhoy closed pull request #124: fix small issue in solr shell script
janhoy closed pull request #124: fix small issue in solr shell script URL: https://github.com/apache/lucene-solr/pull/124
[jira] [Commented] (SOLR-14258) DocList (DocSlice) should not implement DocSet
[ https://issues.apache.org/jira/browse/SOLR-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037523#comment-17037523 ] David Smiley commented on SOLR-14258: I’m okay with master only, but I’d like your opinion: which change carries the most compatibility risk here? > DocList (DocSlice) should not implement DocSet > -- > > Key: SOLR-14258 > URL: https://issues.apache.org/jira/browse/SOLR-14258 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > DocList is an internal interface used to hold the documents we'll ultimately > return from search. It has one implementation -- DocSlice. It implements > DocSet but I think that was a mistake. Basically nowhere does Solr depend > on the fact that a DocList is a DocSet today, and keeping it this way > complicates maintenance on DocSet's abstraction.
[jira] [Commented] (SOLR-14258) DocList (DocSlice) should not implement DocSet
[ https://issues.apache.org/jira/browse/SOLR-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037499#comment-17037499 ] Mikhail Khludnev commented on SOLR-14258: I'm OK with the PR. Are you targeting master only? I worry about all the customer plugins built for 8.x.