[jira] [Updated] (LUCENE-9609) When there are more than 16 terms, the highlight query does not return
[ https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangFeiCheng updated LUCENE-9609:
-
Description:
I noticed that when there are too many terms, highlighting of the query is restricted. I know that in TermInSetQuery, when there are fewer terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency:
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
When the query has more than 16 terms, the createWeight method in TermInSetQuery is used instead:
{code:java}
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
  return new ConstantScoreWeight(this, boost) {
    @Override
    public void extractTerms(Set<Term> terms) {
      // no-op
      // This query is for abuse cases when the number of terms is too high to
      // run efficiently as a BooleanQuery. So likewise we hide its terms in
      // order to protect highlighters
    }
    ..
  };
}
{code}
I want to ask: why does the comment say "we hide its terms in order to protect highlighters"? Why does this threshold protect highlighting, and how is such "protecting highlighters" implemented?
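The visibility decision being asked about can be sketched outside Lucene. The following is a minimal, hypothetical model (the class and method names are invented for illustration, not Lucene's API): at or below the 16-term threshold the query rewrites to a BooleanQuery whose clauses expose their terms, so a highlighter can extract them; above the threshold extractTerms is a no-op, so a highlighter sees no terms and highlights nothing.

```java
import java.util.*;

// Hypothetical model of TermInSetQuery's rewrite decision (not the real Lucene API).
public class RewriteThresholdSketch {
    static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

    // Terms a highlighter could extract from the (modeled) rewritten query.
    static Set<String> extractableTerms(List<String> queryTerms) {
        if (queryTerms.size() <= BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD) {
            // Modeled BooleanQuery rewrite: every term is visible to highlighters.
            return new HashSet<>(queryTerms);
        }
        // Modeled no-op extractTerms(): the terms are hidden.
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 16; i++) terms.add("term" + i);
        System.out.println(extractableTerms(terms).size()); // 16

        terms.add("term16"); // now 17 terms, over the threshold
        System.out.println(extractableTerms(terms).size()); // 0
    }
}
```

Highlighters generally work from the terms a query exposes, which is consistent with the observed symptom: past 16 terms nothing is exposed, so nothing is highlighted.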
> When there are more than 16 terms, the highlight query does not return
> --
>
> Key: LUCENE-9609
> URL: https://issues.apache.org/jira/browse/LUCENE-9609
> Project: Lucene - Core
> Issue Type: Wish
> Components: core/search
> Affects Versions: 7.7.3
> Reporter: WangFeiCheng
> Priority: Minor
[jira] [Created] (LUCENE-9609) When there are more than 16 terms, the highlight query does not return
WangFeiCheng created LUCENE-9609:
Summary: When there are more than 16 terms, the highlight query does not return
Key: LUCENE-9609
URL: https://issues.apache.org/jira/browse/LUCENE-9609
Project: Lucene - Core
Issue Type: Wish
Components: core/search
Affects Versions: 7.7.3
Reporter: WangFeiCheng
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15000) Solr-based enterprise-level, one-stop search center product with high performance, high reliability and high scalability
[ https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bai sui updated SOLR-15000:
---
Description:
h2. Summary
I have developed an enterprise application based on Solr, named TIS. With TIS you can quickly build an enterprise search service. TIS includes three components:
- offline index building platform: the data is exported from an ER database (MySQL, SQL Server, and so on) through a full table scan, and the wide table is then constructed by a local MR tool, or built directly by Spark
- incremental real-time channel: changes are transmitted to Kafka, real-time stream computation is carried out by Flink, and the results are submitted to the search engine, ensuring that the data in the search engine and the database stay consistent in near real time
- search engine: currently based on Solr 8
TIS integrates these components seamlessly and brings users a one-stop, out-of-the-box experience.
h2. My question
I want to contribute my code back to the community, but TIS focuses on Enterprise Application Search, just as Elasticsearch focuses on visual analysis of time series data. Because Solr is a general search product, *I don't think TIS can be merged directly into Solr. Is it possible for TIS to become a new incubation project under Apache?*
h2. TIS main Features
- The schema and solrconfig storage are separated from ZK and stored in MySQL. Version management is provided, so users can roll back to a historical version of the configuration. !add-collection-step-2-expert.png|width=500! !add-collection-step-2.png|width=500! Schema editing can be switched between a visual mode and an advanced expert mode
- Define wide table rules based on the selected data tables
- An offline index building component is provided. Outside the collection, the data is built into Lucene segment files; the segment files are then copied back to the local disk where the Solr core is located, and the Solr core is reloaded so the new index takes effect

> Solr-based enterprise-level, one-stop search center product with high performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
> Issue Type: Wish
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Admin UI
> Reporter: bai sui
> Priority: Minor
> Attachments: add-collection-step-2-expert.png, add-collection-step-2.png
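The incremental real-time channel described above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not TIS code: the event list stands in for the Kafka topic, the replay step stands in for the Flink job, and all names are invented for illustration. Replaying the ordered change stream onto the index makes it converge to the database state.

```java
import java.util.*;

// Hypothetical sketch of the incremental channel: database row changes are
// emitted as {docId, value} events and replayed onto the search index, so the
// index converges to the database state in near real time.
public class IncrementalChannelSketch {

    // Replay an ordered stream of {docId, value} change events onto an index map.
    static Map<String, String> replay(List<String[]> changes) {
        Map<String, String> searchIndex = new LinkedHashMap<>();
        for (String[] change : changes) {
            searchIndex.put(change[0], change[1]); // last write wins, like an upsert
        }
        return searchIndex;
    }

    public static void main(String[] args) {
        Map<String, String> database = new LinkedHashMap<>();
        List<String[]> channel = new ArrayList<>(); // stands in for the Kafka topic

        String[][] changes = {{"doc1", "hello"}, {"doc2", "world"}, {"doc1", "hello v2"}};
        for (String[] c : changes) {
            database.put(c[0], c[1]); // write to the ER database
            channel.add(c);           // emit the change event
        }

        Map<String, String> searchIndex = replay(channel);
        System.out.println(database.equals(searchIndex)); // true: converged
    }
}
```

The key design point this models is ordering: as long as events are applied in the order they were produced, the index ends up consistent with the database without re-running the full offline build.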
Use TIS can quickly build enterprise search service for you. TIS includes three components: - offline index building platform The data is exported from ER database( mysql, sqlserver and so on) through full table scanning, and then the wide table is constructed by local MR tool, or the wide table is constructed directly by spark - incremental real-time channel It is transmitted to Kafka , and real-time stream calculation is carried out by Flink and submitted to search engine to ensure that the data in search engine and database are consistent in near real time - search engine currently,based on Solr8 TIS integrate these components seamlessly and bring users one-stop, out of the box experience. h2. My question I want to feed back my code to the community, but TIS focuses on Enterprise Application Search, just as elasitc search focuses on visual analysis of time series data. Because Solr is a general search product, **I don't think TIS can be merged directly into Solr. Is it possible for TIS to be a new incubation project under Apache?** h2. TIS main Features - The schema and solrconfig storage are separated from ZK and stored in MySQL. The version management function is provided. Users can roll back to the historical version of the configuration. - Define wide table rules based on the selected data table - The offline index building component is provided. Outside the collection, the data is built into Lucene segment file. Then, the segment file is returned to the local disk where solrcore is located. The new index of reload solrcore takes effect > Solr based enterprise level, one-stop search center products with high > performance, high reliability and high scalability > - > > Key: SOLR-15000 > URL: https://issues.apache.org/jira/browse/SOLR-15000 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. 
Issues are Public) > Components: Admin UI >Reporter: bai sui >Priority: Minor > Attachments: add-collection-step-2-expert.png, > add-collection-step-2.png > > > h2. Summary > I have developed an enterprise application based on Solr,named TIS . Use TIS > can quickly build enterprise search service for you. TIS includes three > components: > - offline index building platform > The data is exported from ER database( mysql, sqlserver and so on) through > full table scanning, and then the wide table is constructed by local MR tool, > or the wide table is constructed directly by spark > - incremental real-time channel > It is transmitted to Kafka , and real-time stream calculation is carried out > by Flink and submitted to search engine to ensure that the data in search > engine and database are consistent in near real time > - search engine >
[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability
[ https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bai sui updated SOLR-15000: --- Attachment: add-collection-step-2.png > Solr based enterprise level, one-stop search center products with high > performance, high reliability and high scalability > - > > Key: SOLR-15000 > URL: https://issues.apache.org/jira/browse/SOLR-15000 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: bai sui >Priority: Minor > Attachments: add-collection-step-2-expert.png, > add-collection-step-2.png > > > h2. Summary > I have developed an enterprise application based on Solr,named TIS . Use TIS > can quickly build enterprise search service for you. TIS includes three > components: > - offline index building platform > The data is exported from ER database( mysql, sqlserver and so on) through > full table scanning, and then the wide table is constructed by local MR tool, > or the wide table is constructed directly by spark > - incremental real-time channel > It is transmitted to Kafka , and real-time stream calculation is carried out > by Flink and submitted to search engine to ensure that the data in search > engine and database are consistent in near real time > - search engine > currently,based on Solr8 > TIS integrate these components seamlessly and bring users one-stop, out of > the box experience. > h2. My question > I want to feed back my code to the community, but TIS focuses on Enterprise > Application Search, just as elasitc search focuses on visual analysis of time > series data. Because Solr is a general search product, **I don't think TIS > can be merged directly into Solr. Is it possible for TIS to be a new > incubation project under Apache?** > h2. TIS main Features > - The schema and solrconfig storage are separated from ZK and stored in > MySQL. The version management function is provided. 
Users can roll back to > the historical version of the configuration. > - Define wide table rules based on the selected data table > - The offline index building component is provided. Outside the collection, > the data is built into Lucene segment file. Then, the segment file is > returned to the local disk where solrcore is located. The new index of reload > solrcore takes effect -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability
[ https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bai sui updated SOLR-15000: --- Attachment: add-collection-step-2-expert.png > Solr based enterprise level, one-stop search center products with high > performance, high reliability and high scalability > - > > Key: SOLR-15000 > URL: https://issues.apache.org/jira/browse/SOLR-15000 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: bai sui >Priority: Minor > Attachments: add-collection-step-2-expert.png, > add-collection-step-2.png > > > h2. Summary > I have developed an enterprise application based on Solr,named TIS . Use TIS > can quickly build enterprise search service for you. TIS includes three > components: > - offline index building platform > The data is exported from ER database( mysql, sqlserver and so on) through > full table scanning, and then the wide table is constructed by local MR tool, > or the wide table is constructed directly by spark > - incremental real-time channel > It is transmitted to Kafka , and real-time stream calculation is carried out > by Flink and submitted to search engine to ensure that the data in search > engine and database are consistent in near real time > - search engine > currently,based on Solr8 > TIS integrate these components seamlessly and bring users one-stop, out of > the box experience. > h2. My question > I want to feed back my code to the community, but TIS focuses on Enterprise > Application Search, just as elasitc search focuses on visual analysis of time > series data. Because Solr is a general search product, **I don't think TIS > can be merged directly into Solr. Is it possible for TIS to be a new > incubation project under Apache?** > h2. TIS main Features > - The schema and solrconfig storage are separated from ZK and stored in > MySQL. The version management function is provided. 
Users can roll back to > the historical version of the configuration. > - Define wide table rules based on the selected data table > - The offline index building component is provided. Outside the collection, > the data is built into Lucene segment file. Then, the segment file is > returned to the local disk where solrcore is located. The new index of reload > solrcore takes effect
[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability
[ https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bai sui updated SOLR-15000: --- Description: h2. Summary I have developed an enterprise application based on Solr,named TIS . Use TIS can quickly build enterprise search service for you. TIS includes three components: - offline index building platform The data is exported from ER database( mysql, sqlserver and so on) through full table scanning, and then the wide table is constructed by local MR tool, or the wide table is constructed directly by spark - incremental real-time channel It is transmitted to Kafka , and real-time stream calculation is carried out by Flink and submitted to search engine to ensure that the data in search engine and database are consistent in near real time - search engine currently,based on Solr8 TIS integrate these components seamlessly and bring users one-stop, out of the box experience. h2. My question I want to feed back my code to the community, but TIS focuses on Enterprise Application Search, just as elasitc search focuses on visual analysis of time series data. Because Solr is a general search product, **I don't think TIS can be merged directly into Solr. Is it possible for TIS to be a new incubation project under Apache?** h2. TIS main Features - The schema and solrconfig storage are separated from ZK and stored in MySQL. The version management function is provided. Users can roll back to the historical version of the configuration. - Define wide table rules based on the selected data table - The offline index building component is provided. Outside the collection, the data is built into Lucene segment file. Then, the segment file is returned to the local disk where solrcore is located. The new index of reload solrcore takes effect was: ## Summary I have developed an enterprise application based on Solr,named TIS . Use TIS can quickly build enterprise search service for you. 
TIS includes three components: - offline index building platform The data is exported from ER database( mysql, sqlserver and so on) through full table scanning, and then the wide table is constructed by local MR tool, or the wide table is constructed directly by spark - incremental real-time channel It is transmitted to Kafka , and real-time stream calculation is carried out by Flink and submitted to search engine to ensure that the data in search engine and database are consistent in near real time - search engine currently,based on Solr8 TIS integrate these components seamlessly and bring users one-stop, out of the box experience. ## My question I want to feed back my code to the community, but TIS focuses on Enterprise Application Search, just as elasitc search focuses on visual analysis of time series data. Because Solr is a general search product, **I don't think TIS can be merged directly into Solr. Is it possible for TIS to be a new incubation project under Apache?** ## TIS main Features - The schema and solrconfig storage are separated from ZK and stored in MySQL. The version management function is provided. Users can roll back to the historical version of the configuration. - Define wide table rules based on the selected data table - The offline index building component is provided. Outside the collection, the data is built into Lucene segment file. Then, the segment file is returned to the local disk where solrcore is located. The new index of reload solrcore takes effect > Solr based enterprise level, one-stop search center products with high > performance, high reliability and high scalability > - > > Key: SOLR-15000 > URL: https://issues.apache.org/jira/browse/SOLR-15000 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: bai sui >Priority: Minor > > h2. Summary > I have developed an enterprise application based on Solr,named TIS . 
Use TIS > can quickly build enterprise search service for you. TIS includes three > components: > - offline index building platform > The data is exported from ER database( mysql, sqlserver and so on) through > full table scanning, and then the wide table is constructed by local MR tool, > or the wide table is constructed directly by spark > - incremental real-time channel > It is transmitted to Kafka , and real-time stream calculation is carried out > by Flink and submitted to search engine to ensure that the data in search > engine and database are consistent in near real time > - search engine > currently,based on Solr8 > TIS integrate these components seamlessly and bring users one-stop, out of > the box experience. > h2. My question >
[jira] [Created] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability
bai sui created SOLR-15000: -- Summary: Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability Key: SOLR-15000 URL: https://issues.apache.org/jira/browse/SOLR-15000 Project: Solr Issue Type: Wish Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Reporter: bai sui ## Summary I have developed an enterprise application based on Solr,named TIS . Use TIS can quickly build enterprise search service for you. TIS includes three components: - offline index building platform The data is exported from ER database( mysql, sqlserver and so on) through full table scanning, and then the wide table is constructed by local MR tool, or the wide table is constructed directly by spark - incremental real-time channel It is transmitted to Kafka , and real-time stream calculation is carried out by Flink and submitted to search engine to ensure that the data in search engine and database are consistent in near real time - search engine currently,based on Solr8 TIS integrate these components seamlessly and bring users one-stop, out of the box experience. ## My question I want to feed back my code to the community, but TIS focuses on Enterprise Application Search, just as elasitc search focuses on visual analysis of time series data. Because Solr is a general search product, **I don't think TIS can be merged directly into Solr. Is it possible for TIS to be a new incubation project under Apache?** ## TIS main Features - The schema and solrconfig storage are separated from ZK and stored in MySQL. The version management function is provided. Users can roll back to the historical version of the configuration. - Define wide table rules based on the selected data table - The offline index building component is provided. Outside the collection, the data is built into Lucene segment file. Then, the segment file is returned to the local disk where solrcore is located. 
The new index takes effect after the SolrCore is reloaded.
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r523355315
## File path: gradle/native/disable-native.gradle
@@ -17,20 +17,65 @@
 // This is the master switch to disable all tasks that compile
 // native (cpp) code.
-def buildNative = propertyOrDefault("build.native", true).toBoolean()
+rootProject.ext {
+  buildNative = propertyOrDefault("build.native", true).toBoolean()
+}
+
+// Explicitly list all projects that should be configured for native extensions.
+// We could scan for projects with a the cpp-library plugin but this is faster.
+def nativeProjects = allprojects.findAll {it.path in [
+  ":lucene:misc:native"
+]}
+
+def javaProjectsWithNativeDeps = allprojects.findAll {it.path in [
+  ":lucene:misc"
+]}
+
+// Set up defaults for projects with native dependencies.
+configure(javaProjectsWithNativeDeps, {
+  configurations {
Review comment: This configuration block seems very auto-magical to me in that it somehow gets linked to the `:lucene:misc:native/build` folder (no reference to `nativeProjects` here), and the `copyNativeDeps` task below copies only the needed library artifact without all the nested folder structure (nor does it seem to copy any other random file I created in the `:lucene:misc:native/build` folder to test things out). Is this some convention triggered by the two attribute settings below? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-727123057 > No worries. I've pushed the code the way I think it should work - please take a look, let me know if you don't understand something. I tested on Windows and Linux, runs fine. The 'tests.native' flag is set automatically depending on 'build.native' but is there separately just in case somebody wished to manually enable those tests from IDE level. Wow, these look pretty advanced! Pretty sure I couldn't have come up with them myself (and I do have a question that I can't seem to find the answer to readily online). I also like that these tests can be run from a dev's local environment as well (and they run fine on my mac) compared to the @Nightly annotation approach. Thanks Dawid!
[GitHub] [lucene-solr] dxl360 commented on a change in pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
dxl360 commented on a change in pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080#discussion_r523290307
## File path: lucene/core/src/test/org/apache/lucene/index/TestCustomTermFreq.java
@@ -458,4 +458,50 @@ public void testFieldInvertState() throws Exception {
 IOUtils.close(w, dir);
 }
+
+ // LUCENE-8947: Indexing fails with "too many tokens for field" when using custom term frequencies
Review comment: Yeah, that is a more precise description. I just added it to the comment.
[jira] [Commented] (SOLR-14928) Remove Overseer ClusterStateUpdater
[ https://issues.apache.org/jira/browse/SOLR-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231876#comment-17231876 ] Ilan Ginzburg commented on SOLR-14928: -- A new [commit|https://github.com/murblanc/lucene-solr/commit/4d96bcdbe0278c53a5c9283fe62af49715522e81] allows running comparison tests (by toggling {{StateChangeRecorder.USE_DISTRIBUTED_STATE_CHANGE}}) to see the time it takes to run the create collection command (excluding actual replica creation on the nodes where they should go!). Initial (JMeter based) comparison between state update directly to ZK and via Overseer seems promising (but the collection creation work done by {{CreateCollectionCmd}} is the low hanging fruit expected to be faster, so no major win here, but at least no fatal blow to this effort). Note the Collection API is broken by this commit (so don't try deleting collections or anything with it). > Remove Overseer ClusterStateUpdater > --- > > Key: SOLR-14928 > URL: https://issues.apache.org/jira/browse/SOLR-14928 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Ilan Ginzburg >Assignee: Ilan Ginzburg >Priority: Major > Labels: cluster, collection-api, overseer > > Remove the Overseer {{ClusterStateUpdater}} thread and associated Zookeeper > queue at {{<_chroot_>/overseer/queue}}. > Change cluster state updates so that each (Collection API) command execution > does the update directly in Zookeeper using optimistic locking (Compare and > Swap on the {{state.json}} Zookeeper files). > Following this change cluster state updates would still be happening only > from the Overseer node (that's where Collection API commands are executing), > but the code will be ready for distribution once such commands can be > executed by any node (other work done in the context of parent task > SOLR-14927). 
> See the [Cluster State > Updater|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/edit#heading=h.ymtfm3p518c] > section in the Removing Overseer doc.
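The direct-to-ZK update described above relies on optimistic locking: read the current state together with its version, compute the new state, and write it back only if the version has not moved, retrying on conflict. Below is a minimal, self-contained sketch of that pattern using an in-memory versioned value as a stand-in for a ZooKeeper `state.json` znode; the class and method names are illustrative, not Solr's actual code. In real ZooKeeper, the version check would be `ZooKeeper.setData(path, data, expectedVersion)`, which fails with `BadVersionException` when a concurrent writer got there first.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of Compare-and-Swap cluster-state updates. An AtomicReference CAS
// stands in for ZooKeeper's "write succeeds only if version unchanged" check.
class CasStateUpdater {

  /** A (version, data) pair, analogous to a znode's Stat.version plus content. */
  record Versioned(int version, String data) {}

  private final AtomicReference<Versioned> state =
      new AtomicReference<>(new Versioned(0, "{}"));

  /** Read-modify-write with optimistic locking: retry until no concurrent writer interfered. */
  Versioned update(UnaryOperator<String> mutation) {
    while (true) {
      Versioned current = state.get();
      Versioned proposed = new Versioned(current.version() + 1, mutation.apply(current.data()));
      // The CAS plays the role of the expected-version check against ZK.
      if (state.compareAndSet(current, proposed)) {
        return proposed;
      }
      // Another updater won the race: loop re-reads the fresh state and retries.
    }
  }

  Versioned read() {
    return state.get();
  }
}
```

The retry loop is what makes per-command updates safe without the single Overseer thread serializing them: a conflicting write simply costs one extra read-modify-write round.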
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
mikemccand commented on a change in pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080#discussion_r523259786
## File path: lucene/core/src/test/org/apache/lucene/index/TestCustomTermFreq.java
@@ -458,4 +458,50 @@ public void testFieldInvertState() throws Exception {
 IOUtils.close(w, dir);
 }
+
+ // LUCENE-8947: Indexing fails with "too many tokens for field" when using custom term frequencies
Review comment: Maybe add `when using large enough custom term frequencies to overflow int on accumulation`?
[jira] [Commented] (LUCENE-8947) Indexing fails with "too many tokens for field" when using custom term frequencies
[ https://issues.apache.org/jira/browse/LUCENE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231851#comment-17231851 ] Michael McCandless commented on LUCENE-8947: Thanks [~dxl360], I'll look! > Indexing fails with "too many tokens for field" when using custom term > frequencies > -- > > Key: LUCENE-8947 > URL: https://issues.apache.org/jira/browse/LUCENE-8947 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.5 >Reporter: Michael McCandless >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We are using custom term frequencies (LUCENE-7854) to index per-token scoring > signals, however for one document that had many tokens and those tokens had > fairly large (~998,000) scoring signals, we hit this exception: > {noformat} > 2019-08-05T21:32:37,048 [ERROR] (LuceneIndexing-3-thread-3) > com.amazon.lucene.index.IndexGCRDocument: Failed to index doc: > java.lang.IllegalArgumentException: too many tokens for field "foobar" > at > org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:825) > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430) > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > {noformat} > This is happening in this code in {{DefaultIndexingChain.java}}: > {noformat} > try { > invertState.length = Math.addExact(invertState.length, > invertState.termFreqAttribute.getTermFrequency()); > } catch (ArithmeticException ae) { > throw new IllegalArgumentException("too many tokens for field \"" + > field.name() + "\""); > }{noformat} 
> Where Lucene is accumulating the total length (number of tokens) for the > field. But total length doesn't really make sense if you are using custom > term frequencies to hold arbitrary scoring signals? Or, maybe it does make > sense, if user is using this as simple boosting, but maybe we should allow > this length to be a {{long}}?
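The overflow described in this issue is easy to reproduce in isolation. The following is a hedged, stand-alone sketch (not Lucene's actual indexing chain) showing how accumulating per-token frequencies of roughly 998,000 with `Math.addExact` into an `int` blows up after about 2,152 tokens, while a `long` accumulator, as the issue suggests, does not:

```java
// Stand-alone demonstration of the int overflow hit in
// DefaultIndexingChain when accumulating custom term frequencies.
class TermFreqOverflow {

  /** Sum term frequencies the way the int-based invertState.length accumulator does. */
  static int accumulateInt(int[] termFreqs) {
    int length = 0;
    for (int tf : termFreqs) {
      // Math.addExact throws ArithmeticException the moment the int overflows,
      // which is what surfaces as "too many tokens for field".
      length = Math.addExact(length, tf);
    }
    return length;
  }

  /** The same accumulation widened to long, as proposed in the issue. */
  static long accumulateLong(int[] termFreqs) {
    long length = 0;
    for (int tf : termFreqs) {
      length = Math.addExact(length, (long) tf);
    }
    return length;
  }
}
```

With 2,152 tokens at frequency 998,000 the running total reaches 2,147,696,000, just past `Integer.MAX_VALUE` (2,147,483,647), so the `int` version throws while the `long` version returns the exact total.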
[jira] [Commented] (LUCENE-8947) Indexing fails with "too many tokens for field" when using custom term frequencies
[ https://issues.apache.org/jira/browse/LUCENE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231847#comment-17231847 ] Duan Li commented on LUCENE-8947: - I open a PR to fix this issue https://github.com/apache/lucene-solr/pull/2080. > Indexing fails with "too many tokens for field" when using custom term > frequencies > -- > > Key: LUCENE-8947 > URL: https://issues.apache.org/jira/browse/LUCENE-8947 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.5 >Reporter: Michael McCandless >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We are using custom term frequencies (LUCENE-7854) to index per-token scoring > signals, however for one document that had many tokens and those tokens had > fairly large (~998,000) scoring signals, we hit this exception: > {noformat} > 2019-08-05T21:32:37,048 [ERROR] (LuceneIndexing-3-thread-3) > com.amazon.lucene.index.IndexGCRDocument: Failed to index doc: > java.lang.IllegalArgumentException: too many tokens for field "foobar" > at > org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:825) > at > org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430) > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297) > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450) > at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291) > at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264) > {noformat} > This is happening in this code in {{DefaultIndexingChain.java}}: > {noformat} > try { > invertState.length = Math.addExact(invertState.length, > invertState.termFreqAttribute.getTermFrequency()); > } catch (ArithmeticException ae) { > throw new IllegalArgumentException("too many tokens for field 
\"" + > field.name() + "\""); > }{noformat} > Where Lucene is accumulating the total length (number of tokens) for the > field. But total length doesn't really make sense if you are using custom > term frequencies to hold arbitrary scoring signals? Or, maybe it does make > sense, if user is using this as simple boosting, but maybe we should allow > this length to be a {{long}}?
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead
muse-dev[bot] commented on a change in pull request #2010: URL: https://github.com/apache/lucene-solr/pull/2010#discussion_r523248650
## File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
@@ -344,13 +344,8 @@ final private void doReplicateOnlyRecovery(SolrCore core) throws InterruptedExce
 // though
 try {
 CloudDescriptor cloudDesc = this.coreDescriptor.getCloudDescriptor();
-ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(
-cloudDesc.getCollectionName(), cloudDesc.getShardId());
-final String leaderBaseUrl = leaderprops.getStr(ZkStateReader.BASE_URL_PROP);
-final String leaderCoreName = leaderprops.getStr(ZkStateReader.CORE_NAME_PROP);
-
-String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderBaseUrl, leaderCoreName);
-
+ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
+String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderprops);
 String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
 boolean isLeader = leaderUrl.equals(ourUrl);
 // TODO: We can probably delete most of this code if we say this
Review comment: *NULL_DEREFERENCE:* object `leaderUrl` last assigned on line 348 could be null and is dereferenced at line 351.
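For context on the analyzer finding: if the URL lookup can return null, the later `leaderUrl.equals(ourUrl)` call throws a bare `NullPointerException`. One defensive pattern, sketched below with a hypothetical `isLeader` helper (not Solr's actual API), is to fail fast with a descriptive message instead:

```java
import java.util.Objects;

// Hypothetical sketch of guarding a possibly-null URL before comparison.
class NullSafeCompare {

  /** Compare leader and local URLs, failing fast if the leader URL could not be resolved. */
  static boolean isLeader(String leaderUrl, String ourUrl) {
    // A clear message here beats an anonymous NPE deep inside recovery code.
    Objects.requireNonNull(leaderUrl, "leader URL could not be resolved");
    return leaderUrl.equals(ourUrl);
  }
}
```

Alternatively, `Objects.equals(leaderUrl, ourUrl)` tolerates null, but then a missing leader URL silently compares unequal rather than surfacing the underlying problem.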
[GitHub] [lucene-solr] dxl360 opened a new pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
dxl360 opened a new pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080 # Description Lucene accumulates the total length when indexing the field. But when we use custom term frequencies to hold arbitrary scoring signals, Lucene will run into an integer overflow error during accumulation if the scoring signals and the number of tokens are too large. This PR aims to fix this issue https://issues.apache.org/jira/browse/LUCENE-8947 # Solution Skip the field length accumulation when norms are disabled. # Tests The test tries to index a field with an extremely large custom term frequency - Successfully index the field that omits norms - Expect to trigger the indexing error when indexing the same field with norms enabled # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
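The overflow described above is easy to reproduce in isolation. The sketch below is illustrative only (it is not Lucene's accumulation code): summing large custom term frequencies into an `int` wraps past `Integer.MAX_VALUE`, while a `long` accumulator does not.

```java
// Illustrative sketch of the overflow discussed in LUCENE-8947; this is
// not Lucene's actual field-length accumulation code.
public class FieldLengthOverflow {

    // int accumulator: wraps around (becomes negative) past Integer.MAX_VALUE
    static int sumInt(int[] termFreqs) {
        int total = 0;
        for (int f : termFreqs) total += f;
        return total;
    }

    // long accumulator: holds the true total for any realistic input
    static long sumLong(int[] termFreqs) {
        long total = 0;
        for (int f : termFreqs) total += f;
        return total;
    }

    public static void main(String[] args) {
        int[] freqs = { Integer.MAX_VALUE, 10 };
        System.out.println(sumInt(freqs));  // negative: overflow wrapped
        System.out.println(sumLong(freqs)); // 2147483657
    }
}
```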
[jira] [Resolved] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.
[ https://issues.apache.org/jira/browse/SOLR-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14997. --- Resolution: Invalid Oh, bother. It turns out that the port name _is_ shown _if_ you have more than one JVM running; I didn't think to check that first. So never mind and sorry for the noise. > Admin UI shows only the host name (not even port) in the graph view. > > > Key: SOLR-14997 > URL: https://issues.apache.org/jira/browse/SOLR-14997 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Erick Erickson >Priority: Major > Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png > > > Didn't check 8x. > The graph view just shows "localhost (N)" for each replica, see attached > screenshot. It should at least show the port. > Showing the port is important I think, when I have multiple JVMs on the same > machine, seeing all the replicas in a particular JVM have a problem at a > glance is very useful. > I don't have any strong feelings about showing the full replica name. > What do people think about showing the full node name? People often put > important information in the node name relative to their organization, it'd > also help people understand what goes into, say, the createNodeSet. So the > equivalent to my screenshot would be "localhost:8981_solr"
[jira] [Updated] (SOLR-14998) any Collections Handler actions should be logged at debug level
[ https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nazerke Seidan updated SOLR-14998: -- Summary: any Collections Handler actions should be logged at debug level (was: CLUSTERSTATUS info level logging is redundant in CollectionsHandler ) > any Collections Handler actions should be logged at debug level > --- > > Key: SOLR-14998 > URL: https://issues.apache.org/jira/browse/SOLR-14998 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster > status is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO > level should be set to DEBUG to avoid a lot of noise.
[jira] [Commented] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler
[ https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231708#comment-17231708 ] Nazerke Seidan commented on SOLR-14998: --- {{logs from CollectionsHandler: Invoked Collection Action :clusterstatus with params action=CLUSTERSTATUS&wt=javabin&version=2 and sendToOCPQueue=true}} {{logs from HttpSolrCall: [admin] webapp=null path=/admin/collections params=\{action=CLUSTERSTATUS&wt=javabin&version=2} status=0 QTime=1}} {{From the logs I see only action=CLUSTERSTATUS, but it should be any Collections handler action}} > CLUSTERSTATUS info level logging is redundant in CollectionsHandler > - > > Key: SOLR-14998 > URL: https://issues.apache.org/jira/browse/SOLR-14998 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster > status is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO > level should be set to DEBUG to avoid a lot of noise.
[GitHub] [lucene-solr] thelabdude commented on a change in pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead
thelabdude commented on a change in pull request #2010: URL: https://github.com/apache/lucene-solr/pull/2010#discussion_r523117430 ## File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java ## @@ -1401,8 +1420,7 @@ public ZkCoreNodeProps getLeaderProps(final String collection, byte[] data = zkClient.getData( ZkStateReader.getShardLeadersPath(collection, slice), null, null, true); -ZkCoreNodeProps leaderProps = new ZkCoreNodeProps( -ZkNodeProps.load(data)); +ZkCoreNodeProps leaderProps = new ZkCoreNodeProps(ZkNodeProps.load(data)); Review comment: that'll teach me to fix weird whitespacing!
[GitHub] [lucene-solr] alessandrobenedetti commented on pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#issuecomment-726896901 Hi @cpoerschke I just pushed all your commits, I was happy with all of them :) If there's no other observation I will proceed with committing next week (it will be my first direct commit, so I'll need to take a look at the official guidelines and the target for merging). I assume we merge to master squashing and then cherry-pick the commit to some other branches? Do you think we need to target a major release? Or could we add it in the upcoming minors? Have a nice weekend and thank you for your help, it has been greatly appreciated!
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523103150 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -271,17 +340,24 @@ public void transform(SolrDocument doc, int docid) private void implTransform(SolrDocument doc, int docid, Float score) throws IOException { - Object fv = featureLogger.getFeatureVector(docid, scoringQuery, searcher); - if (fv == null) { // FV for this document was not in the cache -fv = featureLogger.makeFeatureVector( -LTRRescorer.extractFeaturesInfo( -modelWeight, -docid, -(docsWereNotReranked ? score : null), -leafContexts)); + LTRScoringQuery rerankingQuery = rerankingQueries[0]; + LTRScoringQuery.ModelWeight rerankingModelWeight = modelWeights[0]; + if (rerankingQueries.length > 1 && ((LTRInterleavingScoringQuery)rerankingQueries[1]).getPickedInterleavingDocIds().contains(docid)) { Review comment: Perfectly splendid! I agree!
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523102857 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); +} + +private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null 
&& featureStoreName.equals(model.getFeatureStoreName()); +} - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features +/** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ +private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { +// if store is set in the transformer we should overwrite the logger +final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } +final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); +transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); +loggingModel = new LoggingModel(loggingModelName, +transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); +} - try { -modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); +/** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If 
no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523102600 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); +} + +private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null 
&& featureStoreName.equals(model.getFeatureStoreName()); +} - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features +/** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ +private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { +// if store is set in the transformer we should overwrite the logger +final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } +final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); +transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); +loggingModel = new LoggingModel(loggingModelName, +transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); +} - try { -modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); +/** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If 
no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that
[jira] [Created] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.
Houston Putman created SOLR-14999: - Summary: Add built-in option to advertise Solr with a different port than Jetty listens on. Key: SOLR-14999 URL: https://issues.apache.org/jira/browse/SOLR-14999 Project: Solr Issue Type: Improvement Reporter: Houston Putman Assignee: Houston Putman Currently the default settings in {{solr.xml}} allow the specification of one port, {{jetty.port}} which the bin/solr script provides from the {{SOLR_PORT}} environment variable. This port is used twice. Jetty uses it to listen for requests, and the clusterState uses the port to advertise the address of the Solr Node. In cloud environments, it's sometimes crucial to be able to listen on one port and advertise yourself as listening on another. This is because there is a proxy that listens on the advertised port, and forwards the request to the server which is listening to the jetty port. Solr already supports having a separate Jetty port and Live Nodes port (examples provided in the dev-list discussion linked below). I suggest that we add this to the default solr config so that users can use the default solr.xml in cloud configurations, and the solr/bin script will enable easy use of this feature. There has been [discussion on this exact problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E] on the dev list already.
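The split described above already has a hook in {{solr.xml}}: {{hostPort}} controls the port advertised in the cluster state, independently of the port Jetty binds. A sketch of wiring it to its own variable — note the {{solr.port.advertise}} property name here is hypothetical, chosen only to illustrate the idea:

```xml
<!-- Sketch only: hostPort is a standard solr.xml setting; the
     solr.port.advertise system property name is hypothetical.
     Jetty would still bind the port given by jetty.port/SOLR_PORT. -->
<solr>
  <solrcloud>
    <!-- Port advertised in live_nodes / clusterState -->
    <int name="hostPort">${solr.port.advertise:8983}</int>
  </solrcloud>
</solr>
```

With a proxy listening on the advertised port, requests routed to the node's address reach the proxy, which forwards them to the port Jetty actually binds.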
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523074662 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -208,55 +216,116 @@ public void setContext(ResultContext context) { if (threadManager != null) { threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor()); } - - // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); + LTRScoringQuery[] rerankingQueriesFromContext = SolrQueryRequestContextUtils.getScoringQueries(req); + docsWereNotReranked = (rerankingQueriesFromContext == null || rerankingQueriesFromContext.length == 0); + String transformerFeatureStore = SolrQueryRequestContextUtils.getFvStoreName(req); + Map transformerExternalFeatureInfo = LTRQParserPlugin.extractEFIParams(localparams); -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); + initLoggingModel(transformerFeatureStore); + setupRerankingQueriesForLogging(rerankingQueriesFromContext, transformerFeatureStore, transformerExternalFeatureInfo); + setupRerankingWeightsForLogging(context); +} + +private boolean isModelMatchingFeatureStore(String featureStoreName, LTRScoringModel model) { + return model != null 
&& featureStoreName.equals(model.getFeatureStoreName()); +} - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features +/** + * The loggingModel is an empty model that is just used to extract the features + * and log them + * @param transformerFeatureStore the explicit transformer feature store + */ +private void initLoggingModel(String transformerFeatureStore) { + if (transformerFeatureStore == null || !isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) { +// if store is set in the transformer we should overwrite the logger +final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } +final FeatureStore store = fr.getFeatureStore(transformerFeatureStore); +transformerFeatureStore = store.getName(); // if featureStoreName was null before this gets actual name - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); +loggingModel = new LoggingModel(loggingModelName, +transformerFeatureStore, store.getFeatures()); } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); +} - try { -modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 1f); - } catch (final IOException e) { -throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e); +/** + * When preparing the reranking queries for logging features various scenarios apply: + * + * No Reranking + * There is the need of a logger model from the default feature store/ the explicit feature store passed + * to extract the feature vector + * + * Re Ranking + * 1) If 
no explicit feature store is passed, the models for each reranking query can be safely re-used + * the feature vector can be fetched from the feature vector cache. + * 2) If an explicit feature store is passed, and no reranking query uses a model from that featureStore, + * There is the need of a logger model to extract the feature vector + * 3) If an explicit feature store is passed, and there is a reranking query that uses a model from that
[jira] [Commented] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.
[ https://issues.apache.org/jira/browse/SOLR-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231616#comment-17231616 ] Cassandra Targett commented on SOLR-14997: -- I'm using 8.7 right now and it shows the port, so its lack is either master or whatever other branch you're using. The full node name might be interesting in another place, but I think for some implementations the graph will get even harder to read if it appears here (because it will be really long). The "Nodes" screen already shows the host and node name, which seems sufficient to me. I'm not sure anyone would really make a connection between the thing that appears on the Graph view and the createNodeSet parameter just by showing it on that screen if they aren't already making the connection from the Nodes screen. > Admin UI shows only the host name (not even port) in the graph view. > > > Key: SOLR-14997 > URL: https://issues.apache.org/jira/browse/SOLR-14997 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Erick Erickson >Priority: Major > Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png > > > Didn't check 8x. > The graph view just shows "localhost (N)" for each replica, see attached > screenshot. It should at least show the port. > Showing the port is important I think, when I have multiple JVMs on the same > machine, seeing all the replicas in a particular JVM have a problem at a > glance is very useful. > I don't have any strong feelings about showing the full replica name. > What do people think about showing the full node name? People often put > important information in the node name relative to their organization, it'd > also help people understand what goes into, say, the createNodeSet. So the > equivalent to my screenshot would be "localhost:8981_solr"
[jira] [Commented] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler
[ https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231618#comment-17231618 ] David Smiley commented on SOLR-14998: - So it's super-clear, can you comment an example of what both logs look like? Is this specific to CLUSTERSTATUS (you put it in the title) or is it *any* Collections handler actions? > CLUSTERSTATUS info level logging is redundant in CollectionsHandler > - > > Key: SOLR-14998 > URL: https://issues.apache.org/jira/browse/SOLR-14998 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster > status is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO > level should be set to DEBUG to avoid a lot of noise.
[jira] [Created] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler
Nazerke Seidan created SOLR-14998: - Summary: CLUSTERSTATUS info level logging is redundant in CollectionsHandler Key: SOLR-14998 URL: https://issues.apache.org/jira/browse/SOLR-14998 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Nazerke Seidan CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster status is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO level should be set to DEBUG to avoid a lot of noise.
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2047: LUCENE-9592: Use doubles in VectorUtil to maintain precision.
msokolov commented on a change in pull request #2047: URL: https://github.com/apache/lucene-solr/pull/2047#discussion_r523001646 ## File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java ## @@ -25,47 +25,22 @@ private VectorUtil() { } - public static float dotProduct(float[] a, float[] b) { -float res = 0f; -/* - * If length of vector is larger than 8, we use unrolled dot product to accelerate the - * calculation. - */ -int i; -for (i = 0; i < a.length % 8; i++) { - res += b[i] * a[i]; -} -if (a.length < 8) { - return res; -} -float s0 = 0f; -float s1 = 0f; -float s2 = 0f; -float s3 = 0f; -float s4 = 0f; -float s5 = 0f; -float s6 = 0f; -float s7 = 0f; -for (; i + 7 < a.length; i += 8) { - s0 += b[i] * a[i]; - s1 += b[i + 1] * a[i + 1]; - s2 += b[i + 2] * a[i + 2]; - s3 += b[i + 3] * a[i + 3]; - s4 += b[i + 4] * a[i + 4]; - s5 += b[i + 5] * a[i + 5]; - s6 += b[i + 6] * a[i + 6]; - s7 += b[i + 7] * a[i + 7]; + public static double dotProduct(float[] a, float[] b) { Review comment: As an alternative, we could also consider changing the test to have a larger epsilon. I found that with the current 1e-5, I got one failure in 100 runs. Changing to 1e-4, I ran 1000 iterations with no failures. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
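The precision concern behind this change (and the epsilon discussion above) is easy to see in isolation. The sketch below is illustrative only, not the Lucene implementation: a dot product accumulated in `float` drifts away from the `double`-accumulated value as the vectors get long, which is the gap a looser test epsilon (1e-4 vs 1e-5) would have to absorb.

```java
// Illustrative sketch: float vs double accumulation of a dot product.
// Not Lucene's VectorUtil code.
public class DotPrecision {

    // Accumulate in float: each addition rounds to float precision,
    // and the rounding error compounds as the running sum grows.
    static float dotFloat(float[] a, float[] b) {
        float res = 0f;
        for (int i = 0; i < a.length; i++) res += a[i] * b[i];
        return res;
    }

    // Accumulate in double: products of floats are exact in double,
    // so only the (much smaller) double rounding remains.
    static double dotDouble(float[] a, float[] b) {
        double res = 0;
        for (int i = 0; i < a.length; i++) res += (double) a[i] * b[i];
        return res;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        float[] a = new float[n];
        float[] b = new float[n];
        java.util.Arrays.fill(a, 0.1f);
        java.util.Arrays.fill(b, 0.1f);
        // The gap between the two accumulations is nonzero for long vectors
        System.out.println(Math.abs(dotFloat(a, b) - dotDouble(a, b)));
    }
}
```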
[jira] [Resolved] (LUCENE-9582) Rename VectorValues.ScoreFunction to SearchStrategy
[ https://issues.apache.org/jira/browse/LUCENE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-9582. - Resolution: Fixed > Rename VectorValues.ScoreFunction to SearchStrategy > > > Key: LUCENE-9582 > URL: https://issues.apache.org/jira/browse/LUCENE-9582 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This is an issue to apply some of the feedback from LUCENE-9322 that came > after it was pushed; we want to: > 1. rename VectorValues.ScoreFunction -> SearchStrategy (and all of the > references to that terminology), and make it a simple enum with no > implementation > 2. rename the strategies to indicate the ANN implementation that backs them, > so we can represent more than one such implementation/algorithm. > 3. Move scoring implementation to a utility class > I'll open a separate issue for exploring how to hide the > VectorValues.RandomAccess API, which is probably specific to HNSW > FYI [~jtibshirani]
[jira] [Commented] (SOLR-12741) Nuke rule based replica placement strategy in Lucene/Solr 8.0
[ https://issues.apache.org/jira/browse/SOLR-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231524#comment-17231524 ] Erick Erickson commented on SOLR-12741: --- [~shalin] and maybe [~noble] Any updates here? > Nuke rule based replica placement strategy in Lucene/Solr 8.0 > - > > Key: SOLR-12741 > URL: https://issues.apache.org/jira/browse/SOLR-12741 > Project: Solr > Issue Type: Task > Components: AutoScaling, SolrCloud >Reporter: Shalin Shekhar Mangar >Priority: Blocker > > Once SOLR-12740 is done, we should nuke all code related to rule based > replica placement strategy in Solr 8.0.
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231500#comment-17231500 ] Uwe Schindler commented on LUCENE-9608: --- Yes. On ASF and Policeman Jenkins. Unfortunately the branch is quite old and does not even have a fully functional Gradle build. It should really be merged soon. Uwe > Reproduce with line is missing quotes around JVM args > - > > Key: LUCENE-9608 > URL: https://issues.apache.org/jira/browse/LUCENE-9608 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Priority: Major > > When we have an exciting test failure, our {{test-framework}} prints a nice > {{Reproduce with:}} output, e.g. from a failure this AM: > {noformat} > Reproduce with: gradlew :lucene:test-framework:test --tests > "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 > -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops > -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 > -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat} > But, this is missing quotes around this part: > {noformat} > -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat} > it should really be this: > {noformat} > -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat} > Probably this is a simple fix? >
[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?
[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231499#comment-17231499 ] Michael Sokolov commented on LUCENE-9583: - bq. Perhaps we could revisit this issue once the first ANN implementation is completed? [~jtibshirani] that makes sense. We can leave this open, even though the attached PR was pushed. I just pushed LUCENE-9004 as well, implementing NSW graph indexing, so that should give us a more concrete basis for comparison. I have been testing performance (recall/latency) using a KnnGraphTester class that is part of that. However one challenge is coming up with a test dataset we can share. I have been using some proprietary embeddings, getting good results, and just started looking into testing with GloVe, and got not-so-good results there. I am concerned that GloVe may have some strong clustering and require us to implement the diversity heuristic from the HNSW paper. > How should we expose VectorValues.RandomAccess? > --- > > Key: LUCENE-9583 > URL: https://issues.apache.org/jira/browse/LUCENE-9583 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} > sub-interface. [~jtibshirani] pointed out this is not needed by some > vector-indexing strategies which can operate solely using a forward-iterator > (it is needed by HNSW), and so in the interest of simplifying the public API > we should not expose this internal detail (which by the way surfaces internal > ordinals that are somewhat uninteresting outside the random access API). > I looked into how to move this inside the HNSW-specific code and remembered > that we do also currently make use of the RA API when merging vector fields > over sorted indexes. 
Without it, we would need to load all vectors into RAM > while flushing/merging, as we currently do in > {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost > for the simpler API. > Another thing I noticed while reviewing this is that I moved the KNN > {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} > to {{VectorValues.RandomAccess}}. This I think we could move back, and > handle the HNSW requirements for search elsewhere. I wonder if that would > alleviate the major concern here?
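To make the trade-off concrete, a toy sketch (invented names, not Lucene's actual VectorValues API) of the two access patterns under discussion: a forward-only iterator suffices for strategies that stream vectors in one pass, while HNSW-style graph construction needs lookups by arbitrary ordinal.

```java
// Sketch only: forward-only vs. random-access views over per-document vectors.
public class VectorAccessDemo {

    // Forward-only view: enough for consumers that read vectors in one pass.
    interface ForwardVectors {
        boolean next();          // advance to the next vector; false when exhausted
        float[] vectorValue();   // vector at the current position
    }

    // Random-access view: required when graph construction revisits arbitrary
    // nodes, and exposes the internal dense ordinals mentioned above.
    interface RandomAccessVectors {
        int size();
        float[] vectorValue(int ord);
    }

    // Trivial heap-backed implementation of both views over the same data; a
    // real codec would read from the index rather than an in-memory array.
    static class ArrayVectors implements ForwardVectors, RandomAccessVectors {
        private final float[][] vectors;
        private int cursor = -1;
        ArrayVectors(float[][] vectors) { this.vectors = vectors; }
        public boolean next() { return ++cursor < vectors.length; }
        public float[] vectorValue() { return vectors[cursor]; }
        public int size() { return vectors.length; }
        public float[] vectorValue(int ord) { return vectors[ord]; }
    }
}
```

Hiding the random-access view inside the HNSW code would leave only the forward view public; the cost noted above is that anything still needing random access (such as merging sorted indexes) must then buffer vectors in RAM.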
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231498#comment-17231498 ] Dawid Weiss commented on LUCENE-9608: - All cloud2refimpl builds are from that branch, I believe.
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231494#comment-17231494 ] Michael McCandless commented on LUCENE-9608: Ahh, thanks for digging into this so quickly [~dweiss]! {quote}The build you mentioned, Mike, comes from Mark's Solr branch - I think this patch has not been applied there, I don't know. This works on master just fine. {quote} I had not realized it was Mark's Solr branch – thanks for the explanation.
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231490#comment-17231490 ] ASF subversion and git services commented on LUCENE-9004: - Commit b36b4af22bb76dc42b466b818b417bcbc0deb006 in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b36b4af ] LUCENE-9004: KNN vector search using NSW graphs (#2022)
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231491#comment-17231491 ] ASF subversion and git services commented on LUCENE-9004: - Commit 03c1910bff2f94d7a733a9688aa15d3282718040 in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=03c1910 ] LUCENE-9004: CHANGES.txt entry
[jira] [Resolved] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-9004. - Fix Version/s: master (9.0) Assignee: Michael Sokolov Resolution: Fixed > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Assignee: Michael Sokolov >Priority: Major > Fix For: master (9.0) > > Attachments: hnsw_layered_graph.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. 
The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to look up through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is very natural at search time - we can traverse each > segments' graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging. While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments. Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch when merging. The > process is going to be limited, at least initially, to graphs that can fit > in RAM since we require random access to the entire graph while constructing > it: In order to add links bidirectionally we must continually update existing > documents. 
> I think we want to express this API to users as a single joint > {{KnnGraphField}} abstraction that joins together the vectors and the graph > as a single joint field type. Mostly it just looks like a vector-valued > field, but has this graph attached to it. > I'll push a branch with my POC and would love to hear comments. It has many > nocommits, basic design is not really set, there is no Query implementation > and no integration with IndexSearcher, but it does work by some measure using > a standalone test class. I've tested with uniform random vectors and on my > laptop indexed 10K documents in around 10 seconds and searched them at 95% > recall (compared with exact nearest-neighbor baseline) at around 250 QPS. I > haven't made any attempt to use multithreaded search for this, but it is > amenable to per-segment concurrency. > [1] >
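The graph traversal the description relies on can be illustrated as a greedy descent (a toy sketch with invented names, not the committed NSW code, which maintains a candidate queue and fanout rather than a single best node):

```java
// Sketch only: greedy nearest-neighbor descent over an adjacency-list graph.
public class GreedySearchDemo {

    // Squared Euclidean distance between two vectors of equal dimension.
    static double squaredDist(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Walk from an entry node toward the query: at each step move to the
    // neighbor closest to the query, stopping when no neighbor improves.
    public static int search(float[][] vectors, int[][] neighbors, int entry, float[] query) {
        int current = entry;
        double best = squaredDist(vectors[current], query);
        while (true) {
            int next = -1;
            for (int n : neighbors[current]) {
                double d = squaredDist(vectors[n], query);
                if (d < best) {
                    best = d;
                    next = n;
                }
            }
            if (next == -1) {
                return current;  // local minimum: the approximate nearest neighbor
            }
            current = next;
        }
    }
}
```

Indexing reuses the same search: find the K nearest nodes for each new document and link them bidirectionally, which is what gives the emerging graph its navigable property.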
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231484#comment-17231484 ] ASF subversion and git services commented on LUCENE-9004: - Commit b36b4af22bb76dc42b466b818b417bcbc0deb006 in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b36b4af ] LUCENE-9004: KNN vector search using NSW graphs (#2022)
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231481#comment-17231481 ] Dawid Weiss commented on LUCENE-9608: - The build you mentioned, Mike, comes from Mark's Solr branch - I think this patch has not been applied there, I don't know. This works on master just fine.
[jira] [Resolved] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9608. - Resolution: Works for Me This works on master.
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231479#comment-17231479 ] Michael Sokolov commented on LUCENE-9004: - I pushed the attached CR, and will close this issue. There are lots of followups needed for things like: improving (reducing) heap usage during graph construction, adding a Query implementation, exposing index hyperparameters, benchmarks, testing on public datasets, implementing a diversity heuristic for neighbor selection during graph construction, making the graph hierarchical, exploring more efficient search across multiple per-segment graphs, etc. I will open issues for the most immediate things that are clearly needed. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. 
First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, i.e. both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to look up through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is very natural at search time - we can traverse each > segment's graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging. While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments.
Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch when merging. The > process is going to be limited, at least initially, to graphs that can fit > in RAM since we require random access to the entire graph while constructing > it: In order to add links bidirectionally we must continually update existing > documents. > I think we want to express this API to users as a single joint > {{KnnGraphField}} abstraction that joins together the vectors and the graph > as a single joint field type. Mostly it just looks like a vector-valued > field, but has this graph attached to it. > I'll push a branch with my POC and would love to hear comments. It has many > nocommits, basic design is not really set, there is no Query implementation > and no integration with IndexSearcher, but it does work by some measure using
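The greedy traversal described above ("navigate the graph toward the query until no neighbor is closer") can be sketched in plain Java. This is a toy, single-layer version with made-up data structures, not the proposed Lucene implementation:

```java
import java.util.*;

// Toy single-layer nearest-neighbor graph search illustrating the greedy
// traversal idea. All names and data here are hypothetical, not Lucene APIs.
public class GreedyKnn {
    // Squared Euclidean distance between two vectors.
    static float dist(float[] a, float[] b) {
        float s = 0;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Start from an entry node and repeatedly hop to the closest neighbor,
    // stopping when no neighbor improves on the current node (a local minimum,
    // hence "approximate" nearest neighbor).
    static int search(float[][] vectors, int[][] neighbors, int entry, float[] query) {
        int cur = entry;
        float curDist = dist(vectors[cur], query);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int n : neighbors[cur]) {
                float d = dist(vectors[n], query);
                if (d < curDist) { curDist = d; cur = n; improved = true; }
            }
        }
        return cur;
    }

    public static void main(String[] args) {
        // Four 1-d "vectors" on a line, chained 0-1-2-3.
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}};
        // Query 2.9 should walk 0 -> 1 -> 2 -> 3 and stop at node 3.
        System.out.println(search(vectors, neighbors, 0, new float[]{2.9f}));
    }
}
```

The HNSW insight is that inserting each new node by running exactly this search, then linking the new node to the K nearest nodes found, yields a graph on which the search stays accurate.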
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231472#comment-17231472 ] Michael McCandless commented on LUCENE-9378: Thanks [~jpountz]! > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Major > Fix For: 8.8 > > Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, > hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, > hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, > snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a ~30% reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, which can have a substantial query > time cost as we saw during our upgrade. [~mikemccand] suggested one possible > approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and pick the format they want. > The idea is similar to Lucene50StoredFieldsFormat which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values > query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
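The mode-based opt-in proposed above can be illustrated with a minimal, Lucene-independent sketch. The class and enum names here are hypothetical (the real change would live in the doc values format itself); the point is the space/speed trade-off a mode selects:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical two-mode encoder mirroring the Lucene50StoredFieldsFormat
// BEST_SPEED / BEST_COMPRESSION pattern. Not Lucene code.
public class BinaryValuesSketch {
    enum Mode { UNCOMPRESSED, COMPRESSED }

    static byte[] encode(byte[] value, Mode mode) throws IOException {
        if (mode == Mode.UNCOMPRESSED) {
            return value; // fastest reads: bytes stored as-is, no decode cost
        }
        // COMPRESSED trades query-time CPU for smaller storage.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos =
                 new DeflaterOutputStream(out, new Deflater(Deflater.BEST_COMPRESSION))) {
            dos.write(value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] value = new byte[4096]; // highly compressible payload (all zeros)
        int plain = encode(value, Mode.UNCOMPRESSED).length;
        int packed = encode(value, Mode.COMPRESSED).length;
        // Compression shrinks storage, but every read must pay to inflate it,
        // which is the query-time cost the issue reports.
        System.out.println(plain > packed);
    }
}
```

A custom Codec would then simply return the format constructed with the desired mode for the fields in question.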
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231469#comment-17231469 ] Dawid Weiss commented on LUCENE-9608: - Linux: {code} ERROR: The following test(s) have failed: - org.apache.lucene.util.TestPleaseFail.testFail (:lucene:test-framework) Test output: /home/dweiss/work/lucene-solr/lucene/test-framework/build/test-results/test/outputs/OUTPUT-org.apache.lucene.util.TestPleaseFail.txt Reproduce with: gradlew :lucene:test-framework:test --tests "org.apache.lucene.util.TestPleaseFail.testFail" -Ptests.jvms=4 "-Ptests.jvmargs=-XX:+UseSerialGC -Dplease.fail=true" -Ptests.seed=34EFD9D3E6994AEF -Ptests.file.encoding=ISO-8859-1 {code}
[GitHub] [lucene-solr] msokolov merged pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
msokolov merged pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231467#comment-17231467 ] Dawid Weiss commented on LUCENE-9608: - I've just added a class that can be hand-triggered to cause an error and it works for me just fine: {code} gradlew :lucene:test-framework:test --tests "TestPleaseFail" "-Ptests.jvmargs=-XX:+UseSerialGC -Dplease.fail=true" {code} results in (note the quotes): {code} ERROR: The following test(s) have failed: - org.apache.lucene.util.TestPleaseFail.testFail (:lucene:test-framework) Test output: C:\Work\apache\lucene.master\lucene\test-framework\build\test-results\test\outputs\OUTPUT-org.apache.lucene.util.TestPleaseFail.txt Reproduce with: gradlew :lucene:test-framework:test --tests "org.apache.lucene.util.TestPleaseFail.testFail" -Ptests.jvms=12 "-Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC -Dplease.fail=true" -Ptests.seed=9845E6C55FCBCABD -Ptests.file.encoding=UTF-8 {code}
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231464#comment-17231464 ] ASF subversion and git services commented on LUCENE-9608: - Commit 80a0154d572596d1e2a8af41c828d08332bb77f0 in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=80a0154 ] LUCENE-9608: add a hand-triggered test error class.
[jira] [Resolved] (SOLR-14976) New collection is not getting created.
[ https://issues.apache.org/jira/browse/SOLR-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-14976. Resolution: Invalid Hi Prince, Please bring this up on the mailing list. JIRA is typically reserved for known concrete issues. "Support portal" questions and requests for help are better on the solr-user mailing list, where there are a lot more eyes. > New collection is not getting created. > -- > > Key: SOLR-14976 > URL: https://issues.apache.org/jira/browse/SOLR-14976 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.7.3 >Reporter: Prince >Priority: Major > Attachments: Solr_Zk_Logs.txt > > > Hi Team, > We aren't able to create a new collection, either from the Solr admin UI or from the > CLI. Once Solr is restarted, we are able to proceed with creating collections, > and then the same scenario repeats. > Zookeeper and Solr logs are attached to the case.
[jira] [Commented] (LUCENE-9598) Improve the summary of Jenkins emails on failure
[ https://issues.apache.org/jira/browse/LUCENE-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231458#comment-17231458 ] Uwe Schindler commented on LUCENE-9598: --- Hi, due to a problem with the builds on Policeman Jenkins (possibly too little heap space for the Jenkins slaves), I removed the horrible regex on the build log. Many builds on external nodes were "hanging" while sending mails. I found no OOMs, but something was fishy. The mails now contain only JVM settings and failed tests, no build-log snippets anymore. The output from Ant and Gradle is often huge (especially on failed Solr tests, sometimes hundreds of megabytes of log output), and the regexes seem to never finish or run out of memory. I did not change ASF Jenkins. > Improve the summary of Jenkins emails on failure > > > Key: LUCENE-9598 > URL: https://issues.apache.org/jira/browse/LUCENE-9598 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Priority: Minor > > Where are the patterns that drive what's extracted from the console logs sent > to builds mailing list? I think these could be improved to include more > context starting after "FAILURE" - then you know which task failed exactly, > not just that the build failed. > {code} > FAILURE: Build failed with an exception. > * What went wrong: > Execution failed for task ':solr:solr-ref-guide:checkLocalJavadocLinksSite'. > > Process 'command '/usr/local/asfpackages/java/jdk-11.0.6/bin/java'' > > finished with non-zero exit value 255 > * Try: > Run with --stacktrace option to get the stack trace. Run with --info or > --debug option to get more log output. Run with --scan to get full insights. > * Get more help at https://help.gradle.org > Deprecated Gradle features were used in this build, making it incompatible > with Gradle 7.0. > Use '--warning-mode all' to show the individual deprecation warnings.
> See > https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings > BUILD FAILED in 1h 6m 1s > 852 actionable tasks: 852 executed > Build step 'Invoke Gradle script' changed build result to FAILURE > Build step 'Invoke Gradle script' marked build as failure > Archiving artifacts > Recording test results > Email was triggered for: Failure - Any > Sending email for trigger: Failure - Any > [Email-ext] Notification email body length: 446 > Sending email to: bui...@lucene.apache.org > Finished: FAILURE > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
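The failure mode Uwe describes above, a regex that effectively never terminates on a huge build log, is the classic catastrophic-backtracking pattern. A bounded sketch (toy pattern and input, not the actual Jenkins regex):

```java
import java.util.regex.Pattern;

// Bounded illustration of catastrophic backtracking: a nested quantifier
// forces the engine to try exponentially many ways to partition the input
// before concluding there is no match.
public class BacktrackDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(a+)+b"); // nested quantifier
        String almostMatches = "aaaaaaaaaaaaaaaaaaaa!"; // 20 'a's, then no 'b'
        // Each additional 'a' roughly doubles the work; scaled up to a
        // multi-megabyte log, a pattern like this can run "forever" or
        // exhaust memory, which matches the observed hangs.
        System.out.println(p.matcher(almostMatches).matches());
    }
}
```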
[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
[ https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231452#comment-17231452 ] Dawid Weiss commented on LUCENE-9608: - This has been fixed already in LUCENE-9549?
[jira] [Commented] (LUCENE-9607) TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure
[ https://issues.apache.org/jira/browse/LUCENE-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231444#comment-17231444 ] Michael McCandless commented on LUCENE-9607: OK I opened LUCENE-9608 to add missing quotes to {{Reproduce with:}} line.
[jira] [Created] (LUCENE-9608) Reproduce with line is missing quotes around JVM args
Michael McCandless created LUCENE-9608: -- Summary: Reproduce with line is missing quotes around JVM args Key: LUCENE-9608 URL: https://issues.apache.org/jira/browse/LUCENE-9608 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless When we have an exciting test failure, our {{test-framework}} prints a nice {{Reproduce with:}} output, e.g. from a failure this AM: {noformat} Reproduce with: gradlew :lucene:test-framework:test --tests "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat} But, this is missing quotes around this part: {noformat} -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat} it should really be this: {noformat} -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat} Probably this is a simple fix? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9607) TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure
Michael McCandless created LUCENE-9607: -- Summary: TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure Key: LUCENE-9607 URL: https://issues.apache.org/jira/browse/LUCENE-9607 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless CI builds have been failing with this: {noformat} FAILED: org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes Error Message: java.lang.AssertionError Stack Trace: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([43D1E1D1DB325AD7:3E13F00D7ACC8E7E]:0) at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.checkEncodingCalled(TestUniformSplitPostingFormat.java:63) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1000) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826) at java.base/java.lang.Thread.run(Thread.java:832) Reproduce with: gradlew :lucene:codecs:test --tests "org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat" -Ptests.jvms=6 -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat} But it does not seem to repro for me on one try. Also disturbing is the missing quotes around the {{-Ptests.jvmargs=..}} which then
[jira] [Created] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.
Erick Erickson created SOLR-14997: - Summary: Admin UI shows only the host name (not even port) in the graph view. Key: SOLR-14997 URL: https://issues.apache.org/jira/browse/SOLR-14997 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (9.0) Reporter: Erick Erickson Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png Didn't check 8x. The graph view just shows "localhost (N)" for each replica, see attached screenshot. It should at least show the port. Showing the port is important, I think: when I have multiple JVMs on the same machine, being able to see at a glance that all the replicas in a particular JVM have a problem is very useful. I don't have any strong feelings about showing the full replica name. What do people think about showing the full node name? People often put important information in the node name relative to their organization, and it'd also help people understand what goes into, say, the createNodeSet. So the equivalent to my screenshot would be "localhost:8981_solr"
[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231428#comment-17231428 ] Tomoko Uchida commented on LUCENE-9499: --- {quote}Maybe we should limit applying this task only to test-framework module? {quote} I fixed render-javadoc.gradle as suggested. > Clean up package name conflicts between modules (split packages) > > > Key: LUCENE-9499 > URL: https://issues.apache.org/jira/browse/LUCENE-9499 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-9499-javadoc.patch > > Time Spent: 20m > Remaining Estimate: 0h > > We have lots of package name conflicts (shared package names) between modules > in the source tree. It is not only annoying for devs/users but also indeed > bad practice since Java 9 (according to my understanding), and we already > have some problems with Javadocs due to these split packages as some of us > would know. Also split packages make migrating to the Java 9 module system > impossible. > This is the placeholder to fix all package name conflicts in Lucene. > See the dev list thread for more background. > > [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E] > Modules that need to be fixed / cleaned up: > - analyzers-common (LUCENE-9317) > - analyzers-icu (LUCENE-9558) > - backward-codecs (LUCENE-9318) > - sandbox (LUCENE-9319) > - misc (LUCENE-9600) > - (test-framework: this can be excluded for the moment) > Also lucene-core will be heavily affected (some classes have to be moved into > {{core}}, or the visibility of some classes and methods in {{core}} has to be > relaxed). > Probably most work would be done in a parallel manner, but conflicts can > happen. If someone wants to help out, please open an issue before working and > share your thoughts with me and others. > I set "Fix version" to 9.0 - meaning once we make a commit on here, this will > be a blocker for release 9.0.0. (I don't think the changes should be > delivered across two major releases; all changes have to be out at once in a > major release.) If there are any objections or concerns, please leave > comments. For now I have no idea about the total volume of changes or > technical obstacles that have to be handled.
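For background on why split packages block Java 9 module-system migration: two modules may not contain the same package. A minimal illustration with hypothetical module descriptors (the module and package names below are made up for the example; they are not Lucene's):

```java
// module-info.java of a hypothetical module "demo.core"
module demo.core {
    exports com.example.util;
}

// module-info.java of a hypothetical module "demo.misc"
// Declaring classes in the same package here makes the two modules
// unresolvable together: javac and the runtime reject a package that
// is readable from more than one module on the module path.
module demo.misc {
    exports com.example.util;
}
```

This is why each shared package has to end up wholly inside one module before the migration is possible.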
[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231426#comment-17231426 ]

ASF subversion and git services commented on LUCENE-9499:
---------------------------------------------------------

Commit 8bac4e7f748592b7f86fb651a35498854a25abb8 in lucene-solr's branch refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8bac4e7 ]

LUCENE-9499: javadoc split package workaroud should be applied only to test-framework.

> Clean up package name conflicts between modules (split packages)
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9499
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9499
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (9.0)
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Major
>             Fix For: master (9.0)
>
>         Attachments: LUCENE-9499-javadoc.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules in the source tree. This is not only annoying for devs/users but also bad practice since Java 9 (according to my understanding), and we already have some problems with Javadocs due to these split packages, as some of us know. Split packages also make migrating to the Java 9 module system impossible.
> This is the placeholder issue for fixing all package name conflicts in Lucene. See the dev list thread for more background:
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
> - analyzers-common (LUCENE-9317)
> - analyzers-icu (LUCENE-9558)
> - backward-codecs (LUCENE-9318)
> - sandbox (LUCENE-9319)
> - misc (LUCENE-9600)
> - (test-framework: this can be excluded for the moment)
> Also, lucene-core will be heavily affected (some classes have to be moved into {{core}}, or the visibility of some classes and methods in {{core}} has to be relaxed).
> Probably most of the work can be done in parallel, but conflicts can happen. If someone wants to help out, please open an issue before starting work and share your thoughts with me and others.
> I set "Fix version" to 9.0, which means that once we make a commit here, this will be a blocker for release 9.0.0. (I don't think the changes should be delivered across two major releases; all changes have to be out at once in a major release.) If there are any objections or concerns, please leave comments. For now I have no idea about the total volume of changes or the technical obstacles that have to be handled.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
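The description above notes that split packages make migrating to the Java 9 module system impossible. For readers unfamiliar with why: the JPMS requires each package to belong to exactly one named module. A minimal sketch of the conflict, using hypothetical module and package names (these are not actual Lucene modules), would be two module descriptors like:

```java
// module-info.java of a hypothetical module "org.example.core"
module org.example.core {
  exports org.example.util; // this module owns org.example.util
}

// module-info.java of a second hypothetical module "org.example.misc"
// that also contains classes in org.example.util. This is a "split
// package": the module system rejects it at compile/launch time with an
// error along the lines of "package org.example.util is read from both
// org.example.core and org.example.misc".
module org.example.misc {
  exports org.example.util; // conflict with org.example.core
}
```

Moving the offending classes so that each package lives in a single module (as the sub-issues listed above do) is the only fix the module system allows.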
[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-9499:
----------------------------------
    Attachment: LUCENE-9499-javadoc.patch
[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-9499:
----------------------------------
    Attachment: (was: LUCENE-9499-javadoc)
[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida updated LUCENE-9499:
----------------------------------
    Attachment: LUCENE-9499-javadoc
[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231402#comment-17231402 ]

Uwe Schindler commented on LUCENE-9499:
---------------------------------------

bq. I left it as it was, since we still have split packages in test-framework (can we remove these lines now?)

I thought about this the minute after I sent the mail. Maybe we should limit applying this task only to test-framework module?
[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)
[ https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231393#comment-17231393 ]

Tomoko Uchida commented on LUCENE-9499:
---------------------------------------

Thank you [~uschindler] for fixing the link.

{quote}And this may be removed, as we have no split packages anymore: [https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=gradle/documentation/render-javadoc.gradle;h=bbd1b5e603a0c9f513c836452a18b9ce9caa83e7;hb=426a9c2#l277]
{quote}

I left it as it was, since we still have split packages in test-framework (can we remove these lines now?)
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726675884

No worries. I've pushed the code the way I think it should work - please take a look and let me know if you don't understand something. I tested on Windows and Linux; it runs fine. The 'tests.native' flag is set automatically depending on 'build.native', but it exists separately just in case somebody wishes to manually enable those tests from the IDE level.

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726643548

> Thanks Zach. I looked at what you did with tests and I think this can be done in a cleaner way that always works. I'll show you how, give me some time.

Sounds good! Looking forward to it.

> Also, update your IDE's formatter settings to the convention used throughout the code (you used 4-space indentation). May be helpful in other patches too.

Oops, sorry. Just updated it to use 2-space indentation.
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726599397

Thanks Zach. I looked at what you did with tests and I think this can be done in a cleaner way that always works. I'll show you how, give me some time.

Also, update your IDE's formatter settings to the convention used throughout the code (you used 4-space indentation). That may be helpful in other patches too. I'll get back to you.
[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode
[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231249#comment-17231249 ]

Simon Willnauer commented on LUCENE-9508:
-----------------------------------------

Oops, sorry! I meant [~shamirwasia] - thanks for the heads-up.

> DocumentsWriter doesn't check for BlockedFlushes in stall mode
> --------------------------------------------------------------
>
>                 Key: LUCENE-9508
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9508
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 8.5.1
>            Reporter: Sorabh Hamirwasia
>            Priority: Major
>              Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory usage of a single Lucene IndexWriter went up to ~23GB. Lucene has a concept of stalling for the case where the memory used by an index breaches 2x the ramBuffer limit (10% of the JVM heap, in this case ~3GB), so ideally memory usage should not go above that limit. I looked into the heap dump and found that when the fullFlush thread enters *markForFullFlush*, it tries to take the lock on the ThreadStates of all the DWPT threads sequentially. If the lock on one of the ThreadStates is held, the fullFlush thread will block indefinitely. This is what happened in my case, where one of the DWPT threads was stuck in the indexing process. Because of this, the fullFlush thread was unable to populate the flush queue even though the stall mode was detected. New indexing requests arriving on indexing threads then simply slept for a second and continued with indexing: in *preUpdate()*, the code detects the stalled case and checks whether there are any pending flushes (based on the flush queue); if there are none, it sleeps and continues.
> Questions:
> 1) Should *preUpdate* look at the blocked-flushes information as well, instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates? A single blocked writing thread can block the full flush here.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
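To make the gap described in question 1 concrete, here is a minimal, self-contained Java sketch. It is not Lucene's actual DocumentsWriter code - the class, method names, and queue representation are hypothetical - but it contrasts a pending-work check that consults only the flush queue with one that also consults blocked flushes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the stall check discussed above. A String like
// "dwpt-1" stands in for a flush that is blocked behind a stuck ThreadState.
public class StallCheckSketch {

  // Mirrors the reported behavior: only the flush queue is consulted, so a
  // blocked flush is invisible and the stalled indexing thread carries on.
  static boolean hasPendingFlushes(Deque<String> flushQueue) {
    return !flushQueue.isEmpty();
  }

  // What question 1 suggests: also consult the blocked flushes, so stalled
  // indexing threads see that flush work is pending and keep backing off.
  static boolean hasPendingOrBlockedFlushes(Deque<String> flushQueue,
                                            Deque<String> blockedFlushes) {
    return !flushQueue.isEmpty() || !blockedFlushes.isEmpty();
  }

  public static void main(String[] args) {
    Deque<String> flushQueue = new ArrayDeque<>();
    Deque<String> blockedFlushes = new ArrayDeque<>();
    blockedFlushes.add("dwpt-1"); // a flush blocked behind a stuck DWPT thread

    // The queue-only check misses the blocked flush; the combined one does not.
    System.out.println(hasPendingFlushes(flushQueue));                          // false
    System.out.println(hasPendingOrBlockedFlushes(flushQueue, blockedFlushes)); // true
  }
}
```

Under this reading, the scenario in the report is the first case: the flush queue stays empty because fullFlush is blocked, so the queue-only check lets indexing threads keep allocating past the stall limit.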