[jira] [Updated] (LUCENE-9609) When the term of more than 16, highlight the query does not return

2020-11-13 Thread WangFeiCheng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangFeiCheng updated LUCENE-9609:
-
Description: 
I noticed that highlighting stops working once a query contains too many terms.

I know that in TermInSetQuery, when there are few enough terms, the query is rewritten into a BooleanQuery (up to BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 terms) to improve query efficiency:
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
When the query contains more than 16 terms, the createWeight method in TermInSetQuery is used instead:
{code:java}
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
  return new ConstantScoreWeight(this, boost) {

    @Override
    public void extractTerms(Set<Term> terms) {
      // no-op
      // This query is for abuse cases when the number of terms is too high to
      // run efficiently as a BooleanQuery. So likewise we hide its terms in
      // order to protect highlighters
    }

    // ...
  };
}
{code}
I want to ask: why does the comment say "we hide its terms in order to protect highlighters"?

How does hiding the terms protect highlighters, and how is this "protection" actually implemented?
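To make the question concrete, here is a minimal sketch (my own illustration, not Lucene or highlighter source code; it assumes the Lucene 7.x API, where Weight.extractTerms still exists) of a highlighter-style consumer that gathers terms through Weight.extractTerms. For a TermInSetQuery above the threshold the set stays empty, so such a consumer has nothing to highlight:
{code:java}
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Weight;

// Hypothetical helper class, not part of Lucene.
public class ExtractTermsDemo {
  /** Collect the terms a highlighter-style consumer would see for this query. */
  static Set<Term> visibleTerms(IndexReader reader, Query query) throws Exception {
    IndexSearcher searcher = new IndexSearcher(reader);
    // Below the threshold, rewrite() turns TermInSetQuery into a BooleanQuery of
    // TermQuerys whose Weight exposes every term; above it, the query keeps the
    // no-op extractTerms shown above.
    Query rewritten = searcher.rewrite(query);
    Weight weight = searcher.createWeight(rewritten, false, 1f);
    Set<Term> terms = new HashSet<>();
    weight.extractTerms(terms); // stays empty for a TermInSetQuery with more than 16 terms
    return terms;
  }
}
{code}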

 

 

  was:
I noticed that highlighting stops working once a query contains too many terms.

I know that in TermInSetQuery, when there are few enough terms, the query is rewritten into a BooleanQuery (up to BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 terms) to improve query efficiency:
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
When the query contains more than 16 terms, the createWeight method in TermInSetQuery is used instead:
{code:java}
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
  return new ConstantScoreWeight(this, boost) {

    @Override
    public void extractTerms(Set<Term> terms) {
      // no-op
      // This query is for abuse cases when the number of terms is too high to
      // run efficiently as a BooleanQuery. So likewise we hide its terms in
      // order to protect highlighters
    }

    // ...
  };
}
{code}
I want to ask: why does the comment say "we hide its terms in order to protect highlighters"?

How does hiding the terms protect highlighters, and how is this "protection" actually implemented?

 

 


> When the term of more than 16, highlight the query does not return
> --
>
> Key: LUCENE-9609
> URL: https://issues.apache.org/jira/browse/LUCENE-9609
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/search
>Affects Versions: 7.7.3
>Reporter: WangFeiCheng
>Priority: Minor
>
> I noticed that when there are too many terms, the highlighted query is 
> restricted
> I know that in TermInSetQuery, when there are fewer terms, 
> BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 will be used to improve query 
> efficiency
> {code:java}
> static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;
> public Query rewrite(IndexReader reader) throws IOException {
> final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, 
> BooleanQuery.getMaxClauseCount());
> if (termData.size() <= threshold) {
>   BooleanQuery.Builder bq = new BooleanQuery.Builder();
>   TermIterator iterator = termData.iterator();
>   for (BytesRef term = iterator.next(); term != null; term = 
> iterator.next()) {
> bq.add(new TermQuery(new Term(iterator.field(), 
> BytesRef.deepCopyOf(term))), Occur.SHOULD);
>   }
>   return new ConstantScoreQuery(bq.build());
> }
> return super.rewrite(reader);
>   

[jira] [Updated] (LUCENE-9609) When the term of more than 16, highlight the query does not return

2020-11-13 Thread WangFeiCheng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangFeiCheng updated LUCENE-9609:
-
Description: 
I noticed that highlighting stops working once a query contains too many terms.

I know that in TermInSetQuery, when there are few enough terms, the query is rewritten into a BooleanQuery (up to BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 terms) to improve query efficiency:
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
When the query contains more than 16 terms, the createWeight method in TermInSetQuery is used instead:
{code:java}
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost) throws IOException {
  return new ConstantScoreWeight(this, boost) {

    @Override
    public void extractTerms(Set<Term> terms) {
      // no-op
      // This query is for abuse cases when the number of terms is too high to
      // run efficiently as a BooleanQuery. So likewise we hide its terms in
      // order to protect highlighters
    }

    // ...
  };
}
{code}
I want to ask: why does the comment say "we hide its terms in order to protect highlighters"?

How does hiding the terms protect highlighters, and how is this "protection" actually implemented?

 

 

  was:
I noticed that when there are too many terms, the highlighted query is 
restricted

I know that in TermInSetQuery, when there are few enough terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
However, when the query contains more than 16 terms, the extractTerms method of TermInSetQuery is used

 
{code:java}
@Override
public void extractTerms(Set<Term> terms) {
  // no-op
  // This query is for abuse cases when the number of terms is too high to
  // run efficiently as a BooleanQuery. So likewise we hide its terms in
  // order to protect highlighters
}
{code}
I want to ask: why does the comment say "So likewise we hide its terms in order to protect highlighters"?

Why does this threshold protect highlighting, and how is this "protection" implemented?

 

 


> When the term of more than 16, highlight the query does not return
> --
>
> Key: LUCENE-9609
> URL: https://issues.apache.org/jira/browse/LUCENE-9609
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/search
>Affects Versions: 7.7.3
>Reporter: WangFeiCheng
>Priority: Minor
>
> I noticed that when there are too many terms, the highlighted query is 
> restricted
> I know that in TermInSetQuery, when there are fewer terms, 
> BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 will be used to improve query 
> efficiency
> {code:java}
> static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;
> public Query rewrite(IndexReader reader) throws IOException {
> final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, 
> BooleanQuery.getMaxClauseCount());
> if (termData.size() <= threshold) {
>   BooleanQuery.Builder bq = new BooleanQuery.Builder();
>   TermIterator iterator = termData.iterator();
>   for (BytesRef term = iterator.next(); term != null; term = 
> iterator.next()) {
> bq.add(new TermQuery(new Term(iterator.field(), 
> BytesRef.deepCopyOf(term))), Occur.SHOULD);
>   }
>   return new ConstantScoreQuery(bq.build());
> }
> return super.rewrite(reader);
>   }
> {code}
>  When the term of the query statement exceeds 16, the createWeight method in 
> TermInSetQuery will be used
> {code:java}
> public Weight createWeight(IndexSearcher searcher, boolean needsScores, float 
> boost) throws IOException {
> return new ConstantScoreWeight(this, boost) {
>   @Override
>   public void extractTerms(Set terms) {
> // no-op
> // This query is for abuse cases when the number of terms is too high 
> to
> // run efficiently as a BooleanQuery. So likewise we hide its terms in
> // order to protect 

[jira] [Updated] (LUCENE-9609) When the term of more than 16, highlight the query does not return

2020-11-13 Thread WangFeiCheng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangFeiCheng updated LUCENE-9609:
-
Description: 
I noticed that when there are too many terms, the highlighted query is 
restricted

I know that in TermInSetQuery, when there are few enough terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
However, when the query contains more than 16 terms, the extractTerms method of TermInSetQuery is used

 
{code:java}
@Override
public void extractTerms(Set<Term> terms) {
  // no-op
  // This query is for abuse cases when the number of terms is too high to
  // run efficiently as a BooleanQuery. So likewise we hide its terms in
  // order to protect highlighters
}
{code}
I want to ask: why does the comment say "So likewise we hide its terms in order to protect highlighters"?

Why does this threshold protect highlighting, and how is this "protection" implemented?

 

 

> When the term of more than 16, highlight the query does not return
> --
>
> Key: LUCENE-9609
> URL: https://issues.apache.org/jira/browse/LUCENE-9609
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/search
>Affects Versions: 7.7.3
>Reporter: WangFeiCheng
>Priority: Minor
>
> I noticed that when there are too many terms, the highlighted query is 
> restricted
> I know that in TermInSetQuery, when there are few enough terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
> {code:java}
> static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;
> public Query rewrite(IndexReader reader) throws IOException {
>   final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
>   if (termData.size() <= threshold) {
>     BooleanQuery.Builder bq = new BooleanQuery.Builder();
>     TermIterator iterator = termData.iterator();
>     for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
>       bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
>     }
>     return new ConstantScoreQuery(bq.build());
>   }
>   return super.rewrite(reader);
> }
> {code}
> However, when the query contains more than 16 terms, the extractTerms method of TermInSetQuery is used
>  
> {code:java}
> @Override
> public void extractTerms(Set<Term> terms) {
>   // no-op
>   // This query is for abuse cases when the number of terms is too high to
>   // run efficiently as a BooleanQuery. So likewise we hide its terms in
>   // order to protect highlighters
> }
> {code}
> I want to ask: why does the comment say "So likewise we hide its terms in order to protect highlighters"?
> Why does this threshold protect highlighting, and how is this "protection" implemented?
>  
>  






[jira] [Updated] (LUCENE-9609) When the term of more than 16, highlight the query does not return

2020-11-13 Thread WangFeiCheng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangFeiCheng updated LUCENE-9609:
-
Description: (was: I noticed that when there are too many terms, the highlighted query is restricted

I know that in TermInSetQuery, when there are few enough terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
However, when the query contains more than 16 terms, the extractTerms method of TermInSetQuery is used

 
{code:java}
@Override
public void extractTerms(Set<Term> terms) {
  // no-op
  // This query is for abuse cases when the number of terms is too high to
  // run efficiently as a BooleanQuery. So likewise we hide its terms in
  // order to protect highlighters
}
{code}
I want to ask: why does the comment say "So likewise we hide its terms in order to protect highlighters"?

Why does this threshold protect highlighting, and how is this "protection" implemented?

 

 )

> When the term of more than 16, highlight the query does not return
> --
>
> Key: LUCENE-9609
> URL: https://issues.apache.org/jira/browse/LUCENE-9609
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/search
>Affects Versions: 7.7.3
>Reporter: WangFeiCheng
>Priority: Minor
>







[jira] [Created] (LUCENE-9609) When the term of more than 16, highlight the query does not return

2020-11-13 Thread WangFeiCheng (Jira)
WangFeiCheng created LUCENE-9609:


 Summary: When the term of more than 16, highlight the query does 
not return
 Key: LUCENE-9609
 URL: https://issues.apache.org/jira/browse/LUCENE-9609
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search
Affects Versions: 7.7.3
Reporter: WangFeiCheng


I noticed that when there are too many terms, the highlighted query is restricted

I know that in TermInSetQuery, when there are few enough terms, BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))), Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
However, when the query contains more than 16 terms, the extractTerms method of TermInSetQuery is used

 
{code:java}
@Override
public void extractTerms(Set<Term> terms) {
  // no-op
  // This query is for abuse cases when the number of terms is too high to
  // run efficiently as a BooleanQuery. So likewise we hide its terms in
  // order to protect highlighters
}
{code}
I want to ask: why does the comment say "So likewise we hide its terms in order to protect highlighters"?

Why does this threshold protect highlighting, and how is this "protection" implemented?

 

 






[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2. Summary

I have developed an enterprise application based on Solr, named TIS. With TIS you can quickly build an enterprise search service. TIS includes three components:
 - offline index building platform
 The data is exported from a relational database (MySQL, SQL Server, and so on) through full table scans, and the wide table is then constructed either by a local MR tool or directly by Spark
 - incremental real-time channel
 Changes are transmitted to Kafka, processed as a real-time stream by Flink, and submitted to the search engine, so that the data in the search engine and the database stay consistent in near real time
 - search engine
 currently based on Solr 8

TIS integrates these components seamlessly and gives users a one-stop, out-of-the-box experience.
h2. My question

I want to contribute my code back to the community, but TIS focuses on enterprise application search, just as Elasticsearch focuses on visual analysis of time-series data. Because Solr is a general-purpose search product, *I don't think TIS can be merged directly into Solr. Is it possible for TIS to become a new incubation project under Apache?*
h2. TIS main Features
 - The schema and solrconfig are stored in MySQL instead of ZK, with version management; users can roll back to a historical version of the configuration.

  !add-collection-step-2-expert.png|width=500!
  !add-collection-step-2.png|width=500!

   Schema editing can be switched between a visual editing mode and an advanced expert mode

 - Define wide table rules based on the selected data tables
 - An offline index building component is provided. The data is built into Lucene segment files outside the collection, the segment files are then copied back to the local disk where the SolrCore resides, and the new index takes effect after the SolrCore is reloaded (see the sketch below)
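Below is a greatly simplified sketch of that last step only. It is not TIS code: the core name, Solr URL, and paths are hypothetical, and a real deployment would swap the index directory more carefully. It copies the offline-built segment files into the core's index directory and then reloads the core via SolrJ so the new index becomes searchable.
{code:java}
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadAfterOfflineBuild {
  public static void main(String[] args) throws Exception {
    // Hypothetical locations: where the offline build wrote its segments,
    // and the index directory of the target SolrCore.
    Path builtIndex = Paths.get("/tmp/offline-build/index");
    Path coreIndex = Paths.get("/var/solr/data/demo_core/data/index");

    // Copy the freshly built segment files onto the core's local disk.
    try (DirectoryStream<Path> files = Files.newDirectoryStream(builtIndex)) {
      for (Path file : files) {
        Files.copy(file, coreIndex.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
      }
    }

    // Reload the SolrCore so the new segments are opened and become searchable.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      CoreAdminRequest.reloadCore("demo_core", client);
    }
  }
}
{code}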

  was:
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.

!add-collection-step-2-expert.png|width=500!
!add-collection-step-2.png|width=500!

   Schema editing mode can be switched between visual editing mode or advanced 
expert mode

 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect


> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr,named TIS . Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
>  - offline index building platform
>  The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> 

[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.

!add-collection-step-2-expert.png|width=500!
!add-collection-step-2.png|width=500!

   Schema editing mode can be switched between visual editing mode or advanced 
expert mode

 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect

  was:
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.

!add-collection-step-2-expert.png|width=500!
!add-collection-step-2.png|width=500!

Schema editing mode can be switched between visual editing mode or advanced 
expert mode

 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect


> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr,named TIS . Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
>  - offline index building platform
>  The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide 

[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.

!add-collection-step-2-expert.png|width=500!
!add-collection-step-2.png|width=500!

Schema editing mode can be switched between visual editing mode or advanced 
expert mode

 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect

  was:
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.


 !add-collection-step-2-expert.png|width=500!


 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect


> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr,named TIS . Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
>  - offline index building platform
>  The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
>  - incremental real-time channel
>  It is transmitted to Kafka , and real-time stream 

[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.


 !add-collection-step-2-expert.png|width=500!


 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect

  was:
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
 !add-collection-step-2-expert.png|width=500!
 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect


> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr,named TIS . Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
>  - offline index building platform
>  The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
>  - incremental real-time channel
>  It is transmitted to Kafka , and real-time stream calculation is carried out 
> by Flink and submitted to search engine to ensure that the data in search 
> engine and database are 

[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2. Summary

I have developed an enterprise application based on Solr,named TIS . Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:
 - offline index building platform
 The data is exported from ER database( mysql, sqlserver and so on) through 
full table scanning, and then the wide table is constructed by local MR tool, 
or the wide table is constructed directly by spark
 - incremental real-time channel
 It is transmitted to Kafka , and real-time stream calculation is carried out 
by Flink and submitted to search engine to ensure that the data in search 
engine and database are consistent in near real time
 - search engine
 currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.
h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**
h2. TIS main Features
 - The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
 !add-collection-step-2-expert.png|width=500!
 - Define wide table rules based on the selected data table
 - The offline index building component is provided. Outside the collection, 
the data is built into Lucene segment file. Then, the segment file is returned 
to the local disk where solrcore is located. The new index of reload solrcore 
takes effect

  was:
h2.  Summary

I have developed an enterprise application based on Solr,named TIS .  Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:

- offline index building platform
The data is exported from ER database( mysql, sqlserver and so on) through full 
table scanning, and then the wide table is constructed by local MR tool, or the 
wide table is constructed directly by spark
- incremental real-time channel
It is transmitted to Kafka , and real-time stream calculation is carried out by 
Flink and submitted to search engine to ensure that the data in search engine 
and database are consistent in near real time
- search engine
currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.

h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**

h2.  TIS main Features 

- The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
- Define wide table rules based on the selected data table 
- The offline index building component is provided. Outside the collection, the 
data is built into Lucene segment file. Then, the segment file is returned to 
the local disk where solrcore is located. The new index of reload solrcore 
takes effect





> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr,named TIS . Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
>  - offline index building platform
>  The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
>  - incremental real-time channel
>  It is transmitted to Kafka , and real-time stream calculation is carried out 
> by Flink and submitted to search engine to ensure that the data in search 
> engine and database are consistent in near real time
>  - search engine
>  

[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Attachment: add-collection-step-2.png

> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2.  Summary
> I have developed an enterprise application based on Solr,named TIS .  Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
> - offline index building platform
> The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
> - incremental real-time channel
> It is transmitted to Kafka , and real-time stream calculation is carried out 
> by Flink and submitted to search engine to ensure that the data in search 
> engine and database are consistent in near real time
> - search engine
> currently,based on Solr8
> TIS integrate these components seamlessly and bring users one-stop, out of 
> the box experience.
> h2. My question
> I want to feed back my code to the community, but TIS focuses on Enterprise 
> Application Search, just as elasitc search focuses on visual analysis of time 
> series data. Because Solr is a general search product, **I don't think TIS 
> can be merged directly into Solr. Is it possible for TIS to be a new 
> incubation project under Apache?**
> h2.  TIS main Features 
> - The schema and solrconfig storage are separated from ZK and stored in 
> MySQL. The version management function is provided. Users can roll back to 
> the historical version of the configuration.
> - Define wide table rules based on the selected data table 
> - The offline index building component is provided. Outside the collection, 
> the data is built into Lucene segment file. Then, the segment file is 
> returned to the local disk where solrcore is located. The new index of reload 
> solrcore takes effect






[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Attachment: add-collection-step-2-expert.png

> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2.  Summary
> I have developed an enterprise application based on Solr,named TIS .  Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
> - offline index building platform
> The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
> - incremental real-time channel
> It is transmitted to Kafka , and real-time stream calculation is carried out 
> by Flink and submitted to search engine to ensure that the data in search 
> engine and database are consistent in near real time
> - search engine
> currently,based on Solr8
> TIS integrate these components seamlessly and bring users one-stop, out of 
> the box experience.
> h2. My question
> I want to feed back my code to the community, but TIS focuses on Enterprise 
> Application Search, just as elasitc search focuses on visual analysis of time 
> series data. Because Solr is a general search product, **I don't think TIS 
> can be merged directly into Solr. Is it possible for TIS to be a new 
> incubation project under Apache?**
> h2.  TIS main Features 
> - The schema and solrconfig storage are separated from ZK and stored in 
> MySQL. The version management function is provided. Users can roll back to 
> the historical version of the configuration.
> - Define wide table rules based on the selected data table 
> - The offline index building component is provided. Outside the collection, 
> the data is built into Lucene segment file. Then, the segment file is 
> returned to the local disk where solrcore is located. The new index of reload 
> solrcore takes effect






[jira] [Updated] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated SOLR-15000:
---
Description: 
h2.  Summary

I have developed an enterprise application based on Solr,named TIS .  Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:

- offline index building platform
The data is exported from ER database( mysql, sqlserver and so on) through full 
table scanning, and then the wide table is constructed by local MR tool, or the 
wide table is constructed directly by spark
- incremental real-time channel
It is transmitted to Kafka , and real-time stream calculation is carried out by 
Flink and submitted to search engine to ensure that the data in search engine 
and database are consistent in near real time
- search engine
currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.

h2. My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**

h2.  TIS main Features 

- The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
- Define wide table rules based on the selected data table 
- The offline index building component is provided. Outside the collection, the 
data is built into Lucene segment file. Then, the segment file is returned to 
the local disk where solrcore is located. The new index of reload solrcore 
takes effect




  was:
## Summary

I have developed an enterprise application based on Solr,named TIS .  Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:

- offline index building platform
The data is exported from ER database( mysql, sqlserver and so on) through full 
table scanning, and then the wide table is constructed by local MR tool, or the 
wide table is constructed directly by spark
- incremental real-time channel
It is transmitted to Kafka , and real-time stream calculation is carried out by 
Flink and submitted to search engine to ensure that the data in search engine 
and database are consistent in near real time
- search engine
currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.

## My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**

## TIS main Features 

- The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
- Define wide table rules based on the selected data table 
- The offline index building component is provided. Outside the collection, the 
data is built into Lucene segment file. Then, the segment file is returned to 
the local disk where solrcore is located. The new index of reload solrcore 
takes effect





> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
>
> h2.  Summary
> I have developed an enterprise application based on Solr,named TIS .  Use TIS 
> can quickly build enterprise search service for you. TIS includes three 
> components:
> - offline index building platform
> The data is exported from ER database( mysql, sqlserver and so on) through 
> full table scanning, and then the wide table is constructed by local MR tool, 
> or the wide table is constructed directly by spark
> - incremental real-time channel
> It is transmitted to Kafka , and real-time stream calculation is carried out 
> by Flink and submitted to search engine to ensure that the data in search 
> engine and database are consistent in near real time
> - search engine
> currently,based on Solr8
> TIS integrate these components seamlessly and bring users one-stop, out of 
> the box experience.
> h2. My question
> 

[jira] [Created] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-13 Thread bai sui (Jira)
bai sui created SOLR-15000:
--

 Summary: Solr based enterprise level, one-stop search center 
products with high performance, high reliability and high scalability
 Key: SOLR-15000
 URL: https://issues.apache.org/jira/browse/SOLR-15000
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Admin UI
Reporter: bai sui


## Summary

I have developed an enterprise application based on Solr,named TIS .  Use TIS 
can quickly build enterprise search service for you. TIS includes three 
components:

- offline index building platform
The data is exported from ER database( mysql, sqlserver and so on) through full 
table scanning, and then the wide table is constructed by local MR tool, or the 
wide table is constructed directly by spark
- incremental real-time channel
It is transmitted to Kafka , and real-time stream calculation is carried out by 
Flink and submitted to search engine to ensure that the data in search engine 
and database are consistent in near real time
- search engine
currently,based on Solr8

TIS integrate these components seamlessly and bring users one-stop, out of the 
box experience.

## My question

I want to feed back my code to the community, but TIS focuses on Enterprise 
Application Search, just as elasitc search focuses on visual analysis of time 
series data. Because Solr is a general search product, **I don't think TIS can 
be merged directly into Solr. Is it possible for TIS to be a new incubation 
project under Apache?**

## TIS main Features 

- The schema and solrconfig storage are separated from ZK and stored in MySQL. 
The version management function is provided. Users can roll back to the 
historical version of the configuration.
- Define wide table rules based on the selected data table 
- The offline index building component is provided. Outside the collection, the 
data is built into Lucene segment file. Then, the segment file is returned to 
the local disk where solrcore is located. The new index of reload solrcore 
takes effect









[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-13 Thread GitBox


zacharymorn commented on a change in pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r523355315



##
File path: gradle/native/disable-native.gradle
##
@@ -17,20 +17,65 @@
 
 // This is the master switch to disable all tasks that compile
 // native (cpp) code.
-def buildNative = propertyOrDefault("build.native", true).toBoolean()
+rootProject.ext {
+  buildNative = propertyOrDefault("build.native", true).toBoolean()
+}
+
+// Explicitly list all projects that should be configured for native 
extensions.
+// We could scan for projects with a the cpp-library plugin but this is faster.
+def nativeProjects = allprojects.findAll {it.path in [
+":lucene:misc:native"
+]}
+
+def javaProjectsWithNativeDeps = allprojects.findAll {it.path in [
+":lucene:misc"
+]}
+
+// Set up defaults for projects with native dependencies.
+configure(javaProjectsWithNativeDeps, {
+  configurations {

Review comment:
   This configuration block seems very auto-magical to me in that it somehow gets linked to the `:lucene:misc:native/build` folder (there is no reference to `nativeProjects` here), and the `copyNativeDeps` task below copies only the needed library artifact without all of the nested folder structure (nor does it seem to copy any other random file I created in the `:lucene:misc:native/build` folder to test things out). Is this some convention triggered by the two attribute settings below?








[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-13 Thread GitBox


zacharymorn commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-727123057


   > No worries. I've pushed the code the way I think it should work - please 
take a look, let me know if you don't understand something. I tested on Windows 
and Linux, runs fine. The 'tests.native' flag is set automatically depending on 
'build.native' but is there separately just in case somebody wished to manually 
enable those tests from IDE level.
   
   Wow, these look pretty advanced! I'm pretty sure I couldn't have come up with them myself (and I do have a question that I can't seem to find the answer to readily online). I also like that these tests can be run from a dev's local environment (and they run fine on my mac), compared to the @Nightly annotation approach. Thanks Dawid!






[GitHub] [lucene-solr] dxl360 commented on a change in pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-13 Thread GitBox


dxl360 commented on a change in pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#discussion_r523290307



##
File path: lucene/core/src/test/org/apache/lucene/index/TestCustomTermFreq.java
##
@@ -458,4 +458,50 @@ public void testFieldInvertState() throws Exception {
 
 IOUtils.close(w, dir);
   }
+
+  // LUCENE-8947: Indexing fails with "too many tokens for field" when using 
custom term frequencies

Review comment:
   Yeah that is a more precise description. I just added it to the comment.








[jira] [Commented] (SOLR-14928) Remove Overseer ClusterStateUpdater

2020-11-13 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231876#comment-17231876
 ] 

Ilan Ginzburg commented on SOLR-14928:
--

A new 
[commit|https://github.com/murblanc/lucene-solr/commit/4d96bcdbe0278c53a5c9283fe62af49715522e81]
 allows running comparison tests (by toggling 
{{StateChangeRecorder.USE_DISTRIBUTED_STATE_CHANGE}}) to see the time it takes 
to run the create collection command (excluding actual replica creation on the 
nodes where they should go!).

An initial (JMeter-based) comparison between updating state directly in ZK and 
going via the Overseer seems promising. The collection creation work done by 
{{CreateCollectionCmd}} is the low-hanging fruit expected to be faster, so this 
is no major win, but at least no fatal blow to the effort.

Note the Collection API is broken by this commit (so don't try deleting 
collections or anything with it).
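
For illustration, a minimal sketch of the optimistic-locking (Compare and Swap) 
idea on a {{state.json}} znode, using a plain ZooKeeper client. Class and method 
names are hypothetical; this is not the code in the branch linked above.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical helper illustrating Compare-and-Swap updates of a state.json znode.
public class StateJsonCasUpdater {

  private final ZooKeeper zk;

  public StateJsonCasUpdater(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Read state.json, apply an update, and write it back only if nobody changed it in between. */
  public void updateStateJson(String path, UnaryOperator<String> update)
      throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      byte[] current = zk.getData(path, false, stat); // read current bytes and znode version
      String updated = update.apply(new String(current, StandardCharsets.UTF_8));
      try {
        // Conditional write: fails if the znode version changed since the read above.
        zk.setData(path, updated.getBytes(StandardCharsets.UTF_8), stat.getVersion());
        return;
      } catch (KeeperException.BadVersionException e) {
        // Another node won the race; re-read and retry.
      }
    }
  }
}
{code}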

> Remove Overseer ClusterStateUpdater
> ---
>
> Key: SOLR-14928
> URL: https://issues.apache.org/jira/browse/SOLR-14928
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Ilan Ginzburg
>Assignee: Ilan Ginzburg
>Priority: Major
>  Labels: cluster, collection-api, overseer
>
> Remove the Overseer {{ClusterStateUpdater}} thread and associated Zookeeper 
> queue at {{<_chroot_>/overseer/queue}}.
> Change cluster state updates so that each (Collection API) command execution 
> does the update directly in Zookeeper using optimistic locking (Compare and 
> Swap on the {{state.json}} Zookeeper files).
> Following this change cluster state updates would still be happening only 
> from the Overseer node (that's where Collection API commands are executing), 
> but the code will be ready for distribution once such commands can be 
> executed by any node (other work done in the context of parent task 
> SOLR-14927).
> See the [Cluster State 
> Updater|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/edit#heading=h.ymtfm3p518c]
>  section in the Removing Overseer doc.






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-13 Thread GitBox


mikemccand commented on a change in pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#discussion_r523259786



##
File path: lucene/core/src/test/org/apache/lucene/index/TestCustomTermFreq.java
##
@@ -458,4 +458,50 @@ public void testFieldInvertState() throws Exception {
 
 IOUtils.close(w, dir);
   }
+
+  // LUCENE-8947: Indexing fails with "too many tokens for field" when using 
custom term frequencies

Review comment:
   Maybe add `when using large enough custom term frequencies to overflow 
int on accumulation`?








[jira] [Commented] (LUCENE-8947) Indexing fails with "too many tokens for field" when using custom term frequencies

2020-11-13 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231851#comment-17231851
 ] 

Michael McCandless commented on LUCENE-8947:


Thanks [~dxl360], I'll look!

> Indexing fails with "too many tokens for field" when using custom term 
> frequencies
> --
>
> Key: LUCENE-8947
> URL: https://issues.apache.org/jira/browse/LUCENE-8947
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.5
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are using custom term frequencies (LUCENE-7854) to index per-token scoring 
> signals; however, for one document that had many tokens and those tokens had 
> fairly large (~998,000) scoring signals, we hit this exception:
> {noformat}
> 2019-08-05T21:32:37,048 [ERROR] (LuceneIndexing-3-thread-3) 
> com.amazon.lucene.index.IndexGCRDocument: Failed to index doc: 
> java.lang.IllegalArgumentException: too many tokens for field "foobar"
> at 
> org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:825)
> at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
> at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
> at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
> at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
> {noformat}
> This is happening in this code in {{DefaultIndexingChain.java}}:
> {noformat}
>   try {
> invertState.length = Math.addExact(invertState.length, 
> invertState.termFreqAttribute.getTermFrequency());
>   } catch (ArithmeticException ae) {
> throw new IllegalArgumentException("too many tokens for field \"" + 
> field.name() + "\"");
>   }{noformat}
> This is where Lucene accumulates the total length (number of tokens) for the 
> field.  But total length doesn't really make sense if you are using custom 
> term frequencies to hold arbitrary scoring signals?  Or, maybe it does make 
> sense, if the user is using this as simple boosting, but maybe we should allow 
> this length to be a {{long}}?






[jira] [Commented] (LUCENE-8947) Indexing fails with "too many tokens for field" when using custom term frequencies

2020-11-13 Thread Duan Li (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231847#comment-17231847
 ] 

Duan Li commented on LUCENE-8947:
-

I open a PR to fix this issue https://github.com/apache/lucene-solr/pull/2080.

> Indexing fails with "too many tokens for field" when using custom term 
> frequencies
> --
>
> Key: LUCENE-8947
> URL: https://issues.apache.org/jira/browse/LUCENE-8947
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.5
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are using custom term frequencies (LUCENE-7854) to index per-token scoring 
> signals; however, for one document that had many tokens and those tokens had 
> fairly large (~998,000) scoring signals, we hit this exception:
> {noformat}
> 2019-08-05T21:32:37,048 [ERROR] (LuceneIndexing-3-thread-3) 
> com.amazon.lucene.index.IndexGCRDocument: Failed to index doc: 
> java.lang.IllegalArgumentException: too many tokens for field "foobar"
> at 
> org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:825)
> at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
> at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
> at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
> at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
> {noformat}
> This is happening in this code in {{DefaultIndexingChain.java}}:
> {noformat}
>   try {
> invertState.length = Math.addExact(invertState.length, 
> invertState.termFreqAttribute.getTermFrequency());
>   } catch (ArithmeticException ae) {
> throw new IllegalArgumentException("too many tokens for field \"" + 
> field.name() + "\"");
>   }{noformat}
> This is where Lucene accumulates the total length (number of tokens) for the 
> field.  But total length doesn't really make sense if you are using custom 
> term frequencies to hold arbitrary scoring signals?  Or, maybe it does make 
> sense, if the user is using this as simple boosting, but maybe we should allow 
> this length to be a {{long}}?






[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead

2020-11-13 Thread GitBox


muse-dev[bot] commented on a change in pull request #2010:
URL: https://github.com/apache/lucene-solr/pull/2010#discussion_r523248650



##
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##
@@ -344,13 +344,8 @@ final private void doReplicateOnlyRecovery(SolrCore core) 
throws InterruptedExce

 // though
   try {
 CloudDescriptor cloudDesc = this.coreDescriptor.getCloudDescriptor();
-ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(
-cloudDesc.getCollectionName(), cloudDesc.getShardId());
-final String leaderBaseUrl = 
leaderprops.getStr(ZkStateReader.BASE_URL_PROP);
-final String leaderCoreName = 
leaderprops.getStr(ZkStateReader.CORE_NAME_PROP);
-
-String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderBaseUrl, 
leaderCoreName);
-
+ZkNodeProps leaderprops = 
zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), 
cloudDesc.getShardId());
+String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderprops);
 String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
 
 boolean isLeader = leaderUrl.equals(ourUrl); // TODO: We can probably 
delete most of this code if we say this

Review comment:
   *NULL_DEREFERENCE:*  object `leaderUrl` last assigned on line 348 could 
be null and is dereferenced at line 351.
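
   One hedged way to address this kind of warning (illustrative only; not 
necessarily how the PR will resolve it) would be to fail fast when the leader 
URL cannot be resolved. A fragment in the style of the diff above, with the 
surrounding method context elided:
{code:java}
// Hypothetical guard; the actual RecoveryStrategy code may handle this differently.
ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderprops);
if (leaderUrl == null) {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
      "Could not determine leader URL for shard " + cloudDesc.getShardId());
}
String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
boolean isLeader = leaderUrl.equals(ourUrl);
{code}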








[GitHub] [lucene-solr] dxl360 opened a new pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-13 Thread GitBox


dxl360 opened a new pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080


   
   
   
   # Description
   
   Lucene accumulates the total field length while indexing a field. But when we 
use custom term frequencies to hold arbitrary scoring signals, Lucene can hit an 
integer overflow during that accumulation if the scoring signals and the number 
of tokens are large enough. This PR aims to fix 
https://issues.apache.org/jira/browse/LUCENE-8947
   
   # Solution
   
   Skip the field length accumulation when norms are disabled.
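
   A rough sketch of the idea, based on the accumulation code quoted in the Jira 
issue. The helper below is illustrative; the actual change lives inside 
{{DefaultIndexingChain}} and the exact names may differ.
{code:java}
// Sketch only: accumulate the field length used for norms, but skip the
// accumulation entirely when norms are omitted, so that huge custom term
// frequencies cannot overflow the int sum.
static int accumulateLength(int currentLength, int termFreq, boolean omitNorms, String fieldName) {
  if (omitNorms) {
    return currentLength; // norms are disabled, so the length is never used
  }
  try {
    return Math.addExact(currentLength, termFreq);
  } catch (ArithmeticException ae) {
    throw new IllegalArgumentException("too many tokens for field \"" + fieldName + "\"");
  }
}
{code}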
   
   # Tests
   
   The test tries to index a field with an extremely large custom term frequency:
   - It successfully indexes the field when norms are omitted
   - It expects the indexing error to be triggered when the same field is 
indexed with norms enabled
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Resolved] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.

2020-11-13 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14997.
---
Resolution: Invalid

Oh, bother.

It turns out that the port _is_ shown _if_ you have more than one JVM running; 
I didn't think to check that first.  So never mind, and sorry for the noise.

> Admin UI shows only the host name (not even port) in the graph view.
> 
>
> Key: SOLR-14997
> URL: https://issues.apache.org/jira/browse/SOLR-14997
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Erick Erickson
>Priority: Major
> Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png
>
>
> Didn't check 8x.
> The graph view just shows "localhost (N)" for each replica, see attached 
> screenshot. It should at least show the port.
> Showing the port is important, I think: when I have multiple JVMs on the same 
> machine, being able to see at a glance that all the replicas in a particular 
> JVM have a problem is very useful.
> I don't have any strong feelings about showing the full replica name.
> What do people think about showing the full node name? People often put 
> information that is important to their organization in the node name, and it'd 
> also help people understand what goes into, say, the createNodeSet. So the 
> equivalent in my screenshot would be "localhost:8981_solr"






[jira] [Updated] (SOLR-14998) any Collections Handler actions should be logged at debug level

2020-11-13 Thread Nazerke Seidan (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazerke Seidan updated SOLR-14998:
--
Summary: any Collections Handler actions should be logged at debug level  
(was: CLUSTERSTATUS  info level logging is redundant in CollectionsHandler )

> any Collections Handler actions should be logged at debug level
> ---
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level, but the cluster 
> status request is already logged in HttpSolrCall at INFO. The CollectionsHandler 
> log level should be lowered to DEBUG to avoid a lot of noise.






[jira] [Commented] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler

2020-11-13 Thread Nazerke Seidan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231708#comment-17231708
 ] 

Nazerke Seidan commented on SOLR-14998:
---

{{logs from CollectionsHandler: Invoked Collection Action :clusterstatus with 
params action=CLUSTERSTATUS&wt=javabin&version=2 and sendToOCPQueue=true}}

{{logs from HttpSolrCall: [admin] webapp=null path=/admin/collections 
params=\{action=CLUSTERSTATUS&wt=javabin&version=2} status=0 QTime=1}}

 

From the logs I see only action=CLUSTERSTATUS, but the change should apply to 
any Collections Handler action.

> CLUSTERSTATUS  info level logging is redundant in CollectionsHandler 
> -
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level, but the cluster 
> status request is already logged in HttpSolrCall at INFO. The CollectionsHandler 
> log level should be lowered to DEBUG to avoid a lot of noise.






[GitHub] [lucene-solr] thelabdude commented on a change in pull request #2010: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead

2020-11-13 Thread GitBox


thelabdude commented on a change in pull request #2010:
URL: https://github.com/apache/lucene-solr/pull/2010#discussion_r523117430



##
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##
@@ -1401,8 +1420,7 @@ public ZkCoreNodeProps getLeaderProps(final String 
collection,
 byte[] data = zkClient.getData(
 ZkStateReader.getShardLeadersPath(collection, slice), null, null,
 true);
-ZkCoreNodeProps leaderProps = new ZkCoreNodeProps(
-ZkNodeProps.load(data));
+ZkCoreNodeProps leaderProps = new 
ZkCoreNodeProps(ZkNodeProps.load(data));

Review comment:
   that'll teach me to fix weird whitespacing!








[GitHub] [lucene-solr] alessandrobenedetti commented on pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-13 Thread GitBox


alessandrobenedetti commented on pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#issuecomment-726896901


   Hi @cpoerschke, I just pushed all your commits; I was happy with all of them 
:) 
   If there are no other observations I would proceed with committing next week 
(it will be my first direct commit, so I'll need to take a look at the official 
guidelines and the target branches for merging).
   I assume we merge to master with a squash and then cherry-pick the commit to 
the other branches?
   Do you think we need to target a major release, or could we add it in an 
upcoming minor?
   Have a nice weekend, and thank you for your help; it has been greatly 
appreciated!






[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-13 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523103150



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -271,17 +340,24 @@ public void transform(SolrDocument doc, int docid)
 
 private void implTransform(SolrDocument doc, int docid, Float score)
 throws IOException {
-  Object fv = featureLogger.getFeatureVector(docid, scoringQuery, 
searcher);
-  if (fv == null) { // FV for this document was not in the cache
-fv = featureLogger.makeFeatureVector(
-LTRRescorer.extractFeaturesInfo(
-modelWeight,
-docid,
-(docsWereNotReranked ? score : null),
-leafContexts));
+  LTRScoringQuery rerankingQuery = rerankingQueries[0];
+  LTRScoringQuery.ModelWeight rerankingModelWeight = modelWeights[0];
+  if (rerankingQueries.length > 1 && 
((LTRInterleavingScoringQuery)rerankingQueries[1]).getPickedInterleavingDocIds().contains(docid))
 {

Review comment:
   Perfectly splendid! I agree!








[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-13 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523102857



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.length == 0);
+  String transformerFeatureStore = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  Map transformerExternalFeatureInfo = 
LTRQParserPlugin.extractEFIParams(localparams);
 
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
+  initLoggingModel(transformerFeatureStore);
+  setupRerankingQueriesForLogging(rerankingQueriesFromContext, 
transformerFeatureStore, transformerExternalFeatureInfo);
+  setupRerankingWeightsForLogging(context);
+}
+
+private boolean isModelMatchingFeatureStore(String featureStoreName, 
LTRScoringModel model) {
+  return model != null && 
featureStoreName.equals(model.getFeatureStoreName());
+}
 
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
+/**
+ * The loggingModel is an empty model that is just used to extract the 
features
+ * and log them
+ * @param transformerFeatureStore the explicit transformer feature store
+ */
+private void initLoggingModel(String transformerFeatureStore) {
+  if (transformerFeatureStore == null || 
!isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) {
+// if store is set in the transformer we should overwrite the logger
+final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
 
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+final FeatureStore store = fr.getFeatureStore(transformerFeatureStore);
+transformerFeatureStore = store.getName(); // if featureStoreName was 
null before this gets actual name
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+loggingModel = new LoggingModel(loggingModelName,
+transformerFeatureStore, store.getFeatures());
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+}
 
-  try {
-modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 
1f);
-  } catch (final IOException e) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, 
e.getMessage(), e);
+/**
+ * When preparing the reranking queries for logging features various 
scenarios apply:
+ * 
+ * No Reranking 
+ * There is the need of a logger model from the default feature store/ the 
explicit feature store passed
+ * to extract the feature vector
+ * 
+ * Re Ranking
+ * 1) If no explicit feature store is passed, the models for each 
reranking query can be safely re-used
+ * the feature vector can be fetched from the feature vector cache.
+ * 2) If an explicit feature store is passed, and no reranking query uses 
a model from that featureStore,
+ * There is the need of a logger model to extract the feature vector
+ * 3) If an explicit feature store is passed, and there is a reranking 
query that uses a model from that 

[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-13 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523102600



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.length == 0);
+  String transformerFeatureStore = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  Map transformerExternalFeatureInfo = 
LTRQParserPlugin.extractEFIParams(localparams);
 
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
+  initLoggingModel(transformerFeatureStore);
+  setupRerankingQueriesForLogging(rerankingQueriesFromContext, 
transformerFeatureStore, transformerExternalFeatureInfo);
+  setupRerankingWeightsForLogging(context);
+}
+
+private boolean isModelMatchingFeatureStore(String featureStoreName, 
LTRScoringModel model) {
+  return model != null && 
featureStoreName.equals(model.getFeatureStoreName());
+}
 
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
+/**
+ * The loggingModel is an empty model that is just used to extract the 
features
+ * and log them
+ * @param transformerFeatureStore the explicit transformer feature store
+ */
+private void initLoggingModel(String transformerFeatureStore) {
+  if (transformerFeatureStore == null || 
!isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) {
+// if store is set in the transformer we should overwrite the logger
+final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
 
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+final FeatureStore store = fr.getFeatureStore(transformerFeatureStore);
+transformerFeatureStore = store.getName(); // if featureStoreName was 
null before this gets actual name
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+loggingModel = new LoggingModel(loggingModelName,
+transformerFeatureStore, store.getFeatures());
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+}
 
-  try {
-modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 
1f);
-  } catch (final IOException e) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, 
e.getMessage(), e);
+/**
+ * When preparing the reranking queries for logging features various 
scenarios apply:
+ * 
+ * No Reranking 
+ * There is the need of a logger model from the default feature store/ the 
explicit feature store passed
+ * to extract the feature vector
+ * 
+ * Re Ranking
+ * 1) If no explicit feature store is passed, the models for each 
reranking query can be safely re-used
+ * the feature vector can be fetched from the feature vector cache.
+ * 2) If an explicit feature store is passed, and no reranking query uses 
a model from that featureStore,
+ * There is the need of a logger model to extract the feature vector
+ * 3) If an explicit feature store is passed, and there is a reranking 
query that uses a model from that 

[jira] [Created] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.

2020-11-13 Thread Houston Putman (Jira)
Houston Putman created SOLR-14999:
-

 Summary: Add built-in option to advertise Solr with a different 
port than Jetty listens on.
 Key: SOLR-14999
 URL: https://issues.apache.org/jira/browse/SOLR-14999
 Project: Solr
  Issue Type: Improvement
Reporter: Houston Putman
Assignee: Houston Putman


Currently the default settings in {{solr.xml}} allow the specification of one 
port, {{jetty.port}}, which the bin/solr script provides from the {{SOLR_PORT}} 
environment variable. This port is used twice: Jetty uses it to listen for 
requests, and the clusterState uses it to advertise the address of the Solr 
node.

In cloud environments, it's sometimes crucial to be able to listen on one port 
and advertise yourself as listening on another. This is because there is a 
proxy that listens on the advertised port and forwards requests to the server, 
which is listening on the Jetty port.

Solr already supports having a separate Jetty port and Live Nodes port 
(examples are provided in the dev-list discussion linked below). I suggest we 
add this to the default Solr config so that users can use the default solr.xml 
in cloud configurations, and the bin/solr script will make this feature easy 
to use.

There has been [discussion on this exact 
problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
 on the dev list already.






[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-13 Thread GitBox


alessandrobenedetti commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r523074662



##
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##
@@ -208,55 +216,116 @@ public void setContext(ResultContext context) {
   if (threadManager != null) {
 
threadManager.setExecutor(context.getRequest().getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
   }
-  
-  // Setup LTRScoringQuery
-  scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-  docsWereNotReranked = (scoringQuery == null);
-  String featureStoreName = 
SolrQueryRequestContextUtils.getFvStoreName(req);
-  if (docsWereNotReranked || (featureStoreName != null && 
(!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName()
 {
-// if store is set in the transformer we should overwrite the logger
 
-final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
+  LTRScoringQuery[] rerankingQueriesFromContext = 
SolrQueryRequestContextUtils.getScoringQueries(req);
+  docsWereNotReranked = (rerankingQueriesFromContext == null || 
rerankingQueriesFromContext.length == 0);
+  String transformerFeatureStore = 
SolrQueryRequestContextUtils.getFvStoreName(req);
+  Map transformerExternalFeatureInfo = 
LTRQParserPlugin.extractEFIParams(localparams);
 
-final FeatureStore store = fr.getFeatureStore(featureStoreName);
-featureStoreName = store.getName(); // if featureStoreName was null 
before this gets actual name
-
-try {
-  final LoggingModel lm = new LoggingModel(loggingModelName,
-  featureStoreName, store.getFeatures());
+  initLoggingModel(transformerFeatureStore);
+  setupRerankingQueriesForLogging(rerankingQueriesFromContext, 
transformerFeatureStore, transformerExternalFeatureInfo);
+  setupRerankingWeightsForLogging(context);
+}
+
+private boolean isModelMatchingFeatureStore(String featureStoreName, 
LTRScoringModel model) {
+  return model != null && 
featureStoreName.equals(model.getFeatureStoreName());
+}
 
-  scoringQuery = new LTRScoringQuery(lm,
-  LTRQParserPlugin.extractEFIParams(localparams),
-  true,
-  threadManager); // request feature weights to be created for all 
features
+/**
+ * The loggingModel is an empty model that is just used to extract the 
features
+ * and log them
+ * @param transformerFeatureStore the explicit transformer feature store
+ */
+private void initLoggingModel(String transformerFeatureStore) {
+  if (transformerFeatureStore == null || 
!isModelMatchingFeatureStore(transformerFeatureStore, loggingModel)) {
+// if store is set in the transformer we should overwrite the logger
+final ManagedFeatureStore fr = 
ManagedFeatureStore.getManagedFeatureStore(req.getCore());
 
-}catch (final Exception e) {
-  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-  "retrieving the feature store "+featureStoreName, e);
-}
-  }
+final FeatureStore store = fr.getFeatureStore(transformerFeatureStore);
+transformerFeatureStore = store.getName(); // if featureStoreName was 
null before this gets actual name
 
-  if (scoringQuery.getOriginalQuery() == null) {
-scoringQuery.setOriginalQuery(context.getQuery());
+loggingModel = new LoggingModel(loggingModelName,
+transformerFeatureStore, store.getFeatures());
   }
-  if (scoringQuery.getFeatureLogger() == null){
-scoringQuery.setFeatureLogger( 
SolrQueryRequestContextUtils.getFeatureLogger(req) );
-  }
-  scoringQuery.setRequest(req);
-
-  featureLogger = scoringQuery.getFeatureLogger();
+}
 
-  try {
-modelWeight = scoringQuery.createWeight(searcher, ScoreMode.COMPLETE, 
1f);
-  } catch (final IOException e) {
-throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, 
e.getMessage(), e);
+/**
+ * When preparing the reranking queries for logging features various 
scenarios apply:
+ * 
+ * No Reranking 
+ * There is the need of a logger model from the default feature store/ the 
explicit feature store passed
+ * to extract the feature vector
+ * 
+ * Re Ranking
+ * 1) If no explicit feature store is passed, the models for each 
reranking query can be safely re-used
+ * the feature vector can be fetched from the feature vector cache.
+ * 2) If an explicit feature store is passed, and no reranking query uses 
a model from that featureStore,
+ * There is the need of a logger model to extract the feature vector
+ * 3) If an explicit feature store is passed, and there is a reranking 
query that uses a model from that 

[jira] [Commented] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.

2020-11-13 Thread Cassandra Targett (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231616#comment-17231616
 ] 

Cassandra Targett commented on SOLR-14997:
--

I'm using 8.7 right now and it shows the port, so its absence is specific to 
master or whatever other branch you're using.

The full node name might be interesting in another place, but I think for some 
implementations the graph will get even harder to read if it appears here 
(because it will be really long). The "Nodes" screen already shows the host and 
node name, which seems sufficient to me. I'm not sure anyone would really make 
the connection between what appears on the Graph view and the createNodeSet 
parameter just by showing it on that screen, if they aren't already making the 
connection from the Nodes screen.

> Admin UI shows only the host name (not even port) in the graph view.
> 
>
> Key: SOLR-14997
> URL: https://issues.apache.org/jira/browse/SOLR-14997
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Erick Erickson
>Priority: Major
> Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png
>
>
> Didn't check 8x.
> The graph view just shows "localhost (N)" for each replica, see attached 
> screenshot. It should at least show the port.
> Showing the port is important, I think: when I have multiple JVMs on the same 
> machine, being able to see at a glance that all the replicas in a particular 
> JVM have a problem is very useful.
> I don't have any strong feelings about showing the full replica name.
> What do people think about showing the full node name? People often put 
> information that is important to their organization in the node name, and it'd 
> also help people understand what goes into, say, the createNodeSet. So the 
> equivalent in my screenshot would be "localhost:8981_solr"






[jira] [Commented] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler

2020-11-13 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231618#comment-17231618
 ] 

David Smiley commented on SOLR-14998:
-

So it's super clear, can you comment with an example of what both logs look 
like?  Is this specific to CLUSTERSTATUS (you put it in the title), or does it 
apply to *any* Collections Handler action?

> CLUSTERSTATUS  info level logging is redundant in CollectionsHandler 
> -
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level, but the cluster 
> status request is already logged in HttpSolrCall at INFO. The CollectionsHandler 
> log level should be lowered to DEBUG to avoid a lot of noise.






[jira] [Created] (SOLR-14998) CLUSTERSTATUS info level logging is redundant in CollectionsHandler

2020-11-13 Thread Nazerke Seidan (Jira)
Nazerke Seidan created SOLR-14998:
-

 Summary: CLUSTERSTATUS  info level logging is redundant in 
CollectionsHandler 
 Key: SOLR-14998
 URL: https://issues.apache.org/jira/browse/SOLR-14998
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Nazerke Seidan


CLUSTERSTATUS is logged in CollectionsHandler at INFO level, but the cluster 
status request is already logged in HttpSolrCall at INFO. The CollectionsHandler 
log level should be lowered to DEBUG to avoid a lot of noise.
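
A minimal sketch of what the proposed change amounts to, with hypothetical 
method and parameter names (the exact statement in CollectionsHandler may look 
different):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CollectionActionLogging {
  private static final Logger log = LoggerFactory.getLogger(CollectionActionLogging.class);

  // Sketch only: the handler-side message is demoted from INFO to DEBUG because
  // HttpSolrCall already records the same request at INFO level.
  static void logInvokedAction(String actionName, String paramString) {
    log.debug("Invoked Collection Action: {} with params {}", actionName, paramString);
  }
}
{code}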






[GitHub] [lucene-solr] msokolov commented on a change in pull request #2047: LUCENE-9592: Use doubles in VectorUtil to maintain precision.

2020-11-13 Thread GitBox


msokolov commented on a change in pull request #2047:
URL: https://github.com/apache/lucene-solr/pull/2047#discussion_r523001646



##
File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java
##
@@ -25,47 +25,22 @@
   private VectorUtil() {
   }
 
-  public static float dotProduct(float[] a, float[] b) {
-float res = 0f;
-/*
- * If length of vector is larger than 8, we use unrolled dot product to 
accelerate the
- * calculation.
- */
-int i;
-for (i = 0; i < a.length % 8; i++) {
-  res += b[i] * a[i];
-}
-if (a.length < 8) {
-  return res;
-}
-float s0 = 0f;
-float s1 = 0f;
-float s2 = 0f;
-float s3 = 0f;
-float s4 = 0f;
-float s5 = 0f;
-float s6 = 0f;
-float s7 = 0f;
-for (; i + 7 < a.length; i += 8) {
-  s0 += b[i] * a[i];
-  s1 += b[i + 1] * a[i + 1];
-  s2 += b[i + 2] * a[i + 2];
-  s3 += b[i + 3] * a[i + 3];
-  s4 += b[i + 4] * a[i + 4];
-  s5 += b[i + 5] * a[i + 5];
-  s6 += b[i + 6] * a[i + 6];
-  s7 += b[i + 7] * a[i + 7];
+  public static double dotProduct(float[] a, float[] b) {

Review comment:
   As an alternative, we could also consider changing the test to have a 
larger epsilon. I found that with the current 1e-5, I got one failure in 100 
runs. Changing to 1e-4, I ran 1000 iterations with no failures.
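
   For illustration, a small standalone experiment (hypothetical class, not part 
of the patch) showing how far single-precision accumulation can drift from a 
double-precision reference; this drift is what drives the choice of test 
tolerance:
{code:java}
import java.util.Random;

public class DotProductDriftDemo {
  public static void main(String[] args) {
    Random random = new Random(0);
    float[] a = new float[256];
    float[] b = new float[256];
    for (int i = 0; i < a.length; i++) {
      a[i] = random.nextFloat();
      b[i] = random.nextFloat();
    }
    double reference = 0;       // accumulated in double precision
    float singlePrecision = 0f; // accumulated in float, as in the pre-patch code
    for (int i = 0; i < a.length; i++) {
      reference += (double) a[i] * b[i];
      singlePrecision += a[i] * b[i];
    }
    System.out.println("double  = " + reference);
    System.out.println("float   = " + singlePrecision);
    System.out.println("|error| = " + Math.abs(reference - singlePrecision));
  }
}
{code}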








[jira] [Resolved] (LUCENE-9582) Rename VectorValues.ScoreFunction to SearchStrategy

2020-11-13 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-9582.
-
Resolution: Fixed

> Rename VectorValues.ScoreFunction  to SearchStrategy
> 
>
> Key: LUCENE-9582
> URL: https://issues.apache.org/jira/browse/LUCENE-9582
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is an issue to apply some of the feedback from LUCENE-9322 that came 
> after it was pushed; we want to:
> 1. rename VectorValues.ScoreFunction -> SearchStrategy (and all of the 
> references to that terminology), and make it a simple enum with no 
> implementation
> 2. rename the strategies to indicate the ANN implementation that backs them, 
> so we can represent more than one such implementation/algorithm.
> 3. Move scoring implementation to a utility class
> I'll open a separate issue for exploring how to hide the 
> VectorValues.RandomAccess  API, which is probably specific to HNSW
> FYI [~jtibshirani] 






[jira] [Commented] (SOLR-12741) Nuke rule based replica placement strategy in Lucene/Solr 8.0

2020-11-13 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231524#comment-17231524
 ] 

Erick Erickson commented on SOLR-12741:
---

[~shalin] and maybe [~noble] Any updates here?

> Nuke rule based replica placement strategy in Lucene/Solr 8.0
> -
>
> Key: SOLR-12741
> URL: https://issues.apache.org/jira/browse/SOLR-12741
> Project: Solr
>  Issue Type: Task
>  Components: AutoScaling, SolrCloud
>Reporter: Shalin Shekhar Mangar
>Priority: Blocker
>
> Once SOLR-12740 is done, we should nuke all code related to rule based 
> replica placement strategy in Solr 8.0.






[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231500#comment-17231500
 ] 

Uwe Schindler commented on LUCENE-9608:
---

Yes. On ASF and Policeman Jenkins.

Unfortunately the branch is quite old and does not even have a fully functional 
Gradle build. It should really be merged soon.

Uwe

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  






[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

2020-11-13 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231499#comment-17231499
 ] 

Michael Sokolov commented on LUCENE-9583:
-

bq. Perhaps we could revisit this issue once the first ANN implementation is 
completed?

[~jtibshirani] that makes sense.  We can leave this open, even though the 
attached PR was pushed. I just pushed LUCENE-9004 as well, implementing NSW 
graph indexing, so that should give us a more concrete basis for comparison. I 
have been testing performance (recall/latency) using a KnnGraphTester class 
that is part of that. However, one challenge is coming up with a test dataset 
we can share. I have been using some proprietary embeddings, getting good 
results, and just started looking into testing with GloVe, and got not-so-good 
results there.  I am concerned that GloVe may have some strong clustering and 
may require us to implement the diversity heuristic from the HNSW paper.

> How should we expose VectorValues.RandomAccess?
> ---
>
> Key: LUCENE-9583
> URL: https://issues.apache.org/jira/browse/LUCENE-9583
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we do also currently make use of the RA API when merging vector fields 
> over sorted indexes. Without it, we would need to load all vectors into RAM  
> while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
>  to {{VectorValues.RandomAccess}}. This I think we could move back, and 
> handle the HNSW requirements for search elsewhere. I wonder if that would 
> alleviate the major concern here? 






[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231498#comment-17231498
 ] 

Dawid Weiss commented on LUCENE-9608:
-

All cloud2refimpl builds are from that branch, I believe.

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  






[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231494#comment-17231494
 ] 

Michael McCandless commented on LUCENE-9608:


Ahh, thanks for digging into this so quickly [~dweiss]! 
{quote}The build you mentioned, Mike, comes from Mark's Solr branch - I think 
this patch has not been applied there, I don't know. This works on master just 
fine.
{quote}
I had not realized it was Mark's Solr branch – thanks for the explanation.

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  






[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-11-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231490#comment-17231490
 ] 

ASF subversion and git services commented on LUCENE-9004:
-

Commit b36b4af22bb76dc42b466b818b417bcbc0deb006 in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b36b4af ]

LUCENE-9004: KNN vector search using NSW graphs (#2022)



> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact 

[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-11-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231491#comment-17231491
 ] 

ASF subversion and git services commented on LUCENE-9004:
-

Commit 03c1910bff2f94d7a733a9688aa15d3282718040 in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=03c1910 ]

LUCENE-9004: CHANGES.txt entry


> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact nearest-neighbor baseline) 
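
For readers unfamiliar with the traversal described above, here is a minimal, illustrative sketch of a greedy walk over a nearest-neighbor graph. It is not the committed Lucene implementation; all names and data structures are made up for the example, and real HNSW search additionally keeps a candidate queue and descends through layers.
{code:java}
// Minimal illustrative sketch of the greedy graph walk - not the Lucene code.
import java.util.Map;

class GreedyGraphSearch {
  /** Walks from an entry node to a local minimum of distance to the query. */
  static int search(float[] query, int entryNode,
                    Map<Integer, int[]> neighbors, Map<Integer, float[]> vectors) {
    int current = entryNode;
    float best = squaredDistance(query, vectors.get(current));
    boolean improved = true;
    while (improved) {
      improved = false;
      for (int candidate : neighbors.get(current)) {
        float d = squaredDistance(query, vectors.get(candidate));
        if (d < best) {          // move to the closest improving neighbor
          best = d;
          current = candidate;
          improved = true;
        }
      }
    }
    return current;              // approximate nearest neighbor
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }
}
{code}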

[jira] [Resolved] (LUCENE-9004) Approximate nearest vector search

2020-11-13 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-9004.
-
Fix Version/s: master (9.0)
 Assignee: Michael Sokolov
   Resolution: Fixed

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact nearest-neighbor baseline) at around 250 QPS. I 
> haven't made any attempt to use multithreaded search for this, but it is 
> amenable to per-segment concurrency.
> [1] 
> 
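
As an illustration of the "light wrapper around {{BinaryDocValues}}" idea in the description, a hedged sketch of decoding a per-document {{BytesRef}} into a {{float[]}} might look like the following. The byte layout and byte order are assumptions for the example, not the committed format.
{code:java}
// Rough sketch only - NOT the committed vector format. Assumes each per-document
// BytesRef stores the vector's float32 components back to back.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.apache.lucene.util.BytesRef;

class VectorDecoding {
  static float[] decode(BytesRef bytes, int dimension) {
    float[] vector = new float[dimension];
    ByteBuffer buffer = ByteBuffer.wrap(bytes.bytes, bytes.offset, bytes.length)
        .order(ByteOrder.LITTLE_ENDIAN); // byte order is an assumption here
    for (int i = 0; i < dimension; i++) {
      vector[i] = buffer.getFloat();
    }
    return vector;
  }
}
{code}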

[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-11-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231484#comment-17231484
 ] 

ASF subversion and git services commented on LUCENE-9004:
-

Commit b36b4af22bb76dc42b466b818b417bcbc0deb006 in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b36b4af ]

LUCENE-9004: KNN vector search using NSW graphs (#2022)



> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact 

[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231481#comment-17231481
 ] 

Dawid Weiss commented on LUCENE-9608:
-

The build you mentioned, Mike, comes from Mark's Solr branch - I don't think this 
patch has been applied there. This works on master just fine.

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9608.
-
Resolution: Works for Me

This works on master.

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-11-13 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231479#comment-17231479
 ] 

Michael Sokolov commented on LUCENE-9004:
-

I pushed the attached CR, and will close this issue. There are lots of 
followups needed for things like: improving (reducing) heap usage during graph 
construction, adding a Query implementation, exposing index hyperparameters, 
benchmarks, testing on public datasets, implementing a diversity heuristic for 
neighbor selection during graph construction, making the graph hierarchical, 
exploring more efficient search across multiple per-segment graphs, etc. I will 
open issues for the most immediate things that are clearly needed.

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 

[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-13 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231472#comment-17231472
 ] 

Michael McCandless commented on LUCENE-9378:


Thanks [~jpountz]!

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Fix For: 8.8
>
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]
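
The opt-in idea suggested above could look roughly like the sketch below: a custom codec that delegates to the then-current default codec but swaps in a doc-values format built with the proposed UNCOMPRESSED mode. The {{Mode}} enum and its constructor are the proposal under discussion here, not an existing API, and using {{Lucene87Codec}} as the delegate is an assumption about the 8.x default at the time.
{code:java}
// Hedged sketch of the proposed opt-out - not an existing API.
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;

public class UncompressedDocValuesCodec extends FilterCodec {
  // Constructor taking a Mode is the *proposed* API from this issue.
  private final DocValuesFormat docValues =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    // Delegate everything else to the default codec (assumed Lucene87Codec here).
    super("UncompressedDocValuesCodec", new Lucene87Codec());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return docValues;
  }
}
{code}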



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231469#comment-17231469
 ] 

Dawid Weiss commented on LUCENE-9608:
-

Linux:
{code}
ERROR: The following test(s) have failed:
  - org.apache.lucene.util.TestPleaseFail.testFail (:lucene:test-framework)
Test output: 
/home/dweiss/work/lucene-solr/lucene/test-framework/build/test-results/test/outputs/OUTPUT-org.apache.lucene.util.TestPleaseFail.txt
Reproduce with: gradlew :lucene:test-framework:test --tests 
"org.apache.lucene.util.TestPleaseFail.testFail" -Ptests.jvms=4 
"-Ptests.jvmargs=-XX:+UseSerialGC -Dplease.fail=true" 
-Ptests.seed=34EFD9D3E6994AEF -Ptests.file.encoding=ISO-8859-1
{code}

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov merged pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-13 Thread GitBox


msokolov merged pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231467#comment-17231467
 ] 

Dawid Weiss commented on LUCENE-9608:
-

I've just added a class that can be hand-triggered to cause an error and it 
works for me just fine:
{code}
gradlew :lucene:test-framework:test --tests "TestPleaseFail" 
"-Ptests.jvmargs=-XX:+UseSerialGC -Dplease.fail=true"
{code}

results in (note the quotes):
{code}
ERROR: The following test(s) have failed:
  - org.apache.lucene.util.TestPleaseFail.testFail (:lucene:test-framework)
Test output: 
C:\Work\apache\lucene.master\lucene\test-framework\build\test-results\test\outputs\OUTPUT-org.apache.lucene.util.TestPleaseFail.txt
Reproduce with: gradlew :lucene:test-framework:test --tests 
"org.apache.lucene.util.TestPleaseFail.testFail" -Ptests.jvms=12 
"-Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC -Dplease.fail=true" 
-Ptests.seed=9845E6C55FCBCABD -Ptests.file.encoding=UTF-8
{code}
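
For reference, a hand-triggered failure class of the kind mentioned above might look roughly like this; it is a guess at the shape, not the committed test. It only fails when -Dplease.fail=true reaches the forked test JVM, which is exactly what exercises the -Ptests.jvmargs quoting.
{code:java}
// Illustrative guess at the shape of such a class - the real one may differ.
import org.apache.lucene.util.LuceneTestCase;

public class TestPleaseFail extends LuceneTestCase {
  public void testFail() {
    if (Boolean.getBoolean("please.fail")) {
      fail("Failing on request (-Dplease.fail=true was passed to the test JVM).");
    }
  }
}
{code}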

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231464#comment-17231464
 ] 

ASF subversion and git services commented on LUCENE-9608:
-

Commit 80a0154d572596d1e2a8af41c828d08332bb77f0 in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=80a0154 ]

LUCENE-9608: add a hand-triggered test error class.


> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14976) New collection is not getting created.

2020-11-13 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-14976.

Resolution: Invalid

Hi Prince,

Please bring this up on the mailing list. JIRA is typically reserved for known, 
concrete issues. "Support portal" questions and requests for help are better 
suited to the solr-user mailing list, where there are a lot more eyes.

> New collection is not getting created.
> --
>
> Key: SOLR-14976
> URL: https://issues.apache.org/jira/browse/SOLR-14976
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 7.7.3
>Reporter: Prince
>Priority: Major
> Attachments: Solr_Zk_Logs.txt
>
>
> Hi Team,
> We aren't able to create a new collection, either from the Solr admin UI or from 
> the CLI. Once Solr is restarted, we are able to create collections again, and then 
> the same scenario repeats.
> Attached zookeeper and Solr logs to the case.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9598) Improve the summary of Jenkins emails on failure

2020-11-13 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231458#comment-17231458
 ] 

Uwe Schindler commented on LUCENE-9598:
---

Hi,
due to a problem with the builds on Policeman Jenkins (possibly too little heap space 
for the Jenkins slaves), I removed the horrible regex on the build log. Many builds 
on external nodes were "hanging" while sending mails. I found no OOMs, but 
something was fishy.

The mails now only contain JVM settings and failed tests, with no build-log 
snippets anymore. The output from Ant and Gradle is often huge (especially 
on failed Solr tests, with sometimes hundreds of megabytes of log file output), 
and the regexes seem to either never finish or run out of memory.

I did not change ASF Jenkins.

> Improve the summary of Jenkins emails on failure
> 
>
> Key: LUCENE-9598
> URL: https://issues.apache.org/jira/browse/LUCENE-9598
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Priority: Minor
>
> Where are the patterns that drive what's extracted from the console logs sent 
> to the builds mailing list? I think these could be improved to include more 
> context starting after "FAILURE" - then you know which task failed exactly, 
> not just that the build failed. 
> {code}
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':solr:solr-ref-guide:checkLocalJavadocLinksSite'.
> > Process 'command '/usr/local/asfpackages/java/jdk-11.0.6/bin/java'' 
> > finished with non-zero exit value 255
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output. Run with --scan to get full insights.
> * Get more help at https://help.gradle.org
> Deprecated Gradle features were used in this build, making it incompatible 
> with Gradle 7.0.
> Use '--warning-mode all' to show the individual deprecation warnings.
> See 
> https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings
> BUILD FAILED in 1h 6m 1s
> 852 actionable tasks: 852 executed
> Build step 'Invoke Gradle script' changed build result to FAILURE
> Build step 'Invoke Gradle script' marked build as failure
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
> [Email-ext] Notification email body length: 446
> Sending email to: bui...@lucene.apache.org
> Finished: FAILURE
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231452#comment-17231452
 ] 

Dawid Weiss commented on LUCENE-9608:
-

This has been fixed already in LUCENE-9549?

> Reproduce with line is missing quotes around JVM args
> -
>
> Key: LUCENE-9608
> URL: https://issues.apache.org/jira/browse/LUCENE-9608
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> When we have an exciting test failure, our {{test-framework}} prints a nice 
> {{Reproduce with:}} output, e.g. from a failure this AM:
> {noformat}
> Reproduce with: gradlew :lucene:test-framework:test --tests 
> "org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
> -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
> -XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
> -Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
> But, this is missing quotes around this part:
> {noformat}
> -Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
> it should really be this:
> {noformat}
> -Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
> Probably this is a simple fix?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9607) TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure

2020-11-13 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231444#comment-17231444
 ] 

Michael McCandless commented on LUCENE-9607:


OK I opened LUCENE-9608 to add missing quotes to {{Reproduce with:}} line.

> TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure
> --
>
> Key: LUCENE-9607
> URL: https://issues.apache.org/jira/browse/LUCENE-9607
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> CI builds have been failing with this:
> {noformat}
> FAILED:  
> org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([43D1E1D1DB325AD7:3E13F00D7ACC8E7E]:0)
>         at org.junit.Assert.fail(Assert.java:86)
>         at org.junit.Assert.assertTrue(Assert.java:41)
>         at org.junit.Assert.assertTrue(Assert.java:52)
>         at 
> org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.checkEncodingCalled(TestUniformSplitPostingFormat.java:63)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1000)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
>         at 

[jira] [Created] (LUCENE-9608) Reproduce with line is missing quotes around JVM args

2020-11-13 Thread Michael McCandless (Jira)
Michael McCandless created LUCENE-9608:
--

 Summary: Reproduce with line is missing quotes around JVM args
 Key: LUCENE-9608
 URL: https://issues.apache.org/jira/browse/LUCENE-9608
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless


When we have an exciting test failure, our {{test-framework}} prints a nice 
{{Reproduce with:}} output, e.g. from a failure this AM:
{noformat}
Reproduce with: gradlew :lucene:test-framework:test --tests 
"org.apache.lucene.util.TestSysoutsLimits" -Ptests.jvms=6 
-Ptests.haltonfailure=false -Ptests.jvmargs=-XX:+UseCompressedOops 
-XX:+UseSerialGC -Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 
-Ptests.badapples=false -Ptests.file.encoding=US-ASCII {noformat}
But, this is missing quotes around this part:
{noformat}
-Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC {noformat}
it should really be this:
{noformat}
-Ptests.jvmargs="-XX:+UseCompressedOops -XX:+UseSerialGC"{noformat}
Probably this is a simple fix?
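
A minimal illustrative sketch of the quoting rule the fix needs is shown below. This is not the actual test-framework code; the helper name and shape are made up, and it only demonstrates wrapping any -P property whose value contains whitespace in double quotes.
{code:java}
// Illustrative sketch only - not the actual test-framework code.
class ReproduceLineQuoting {
  static String gradleProperty(String name, String value) {
    String arg = "-P" + name + "=" + value;
    // Quote the whole token if the value contains whitespace (e.g. several JVM args).
    if (value.chars().anyMatch(Character::isWhitespace)) {
      return "\"" + arg + "\"";
    }
    return arg;
  }

  public static void main(String[] args) {
    // Prints: "-Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC"
    System.out.println(gradleProperty("tests.jvmargs",
        "-XX:+UseCompressedOops -XX:+UseSerialGC"));
  }
}
{code}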

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9607) TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure

2020-11-13 Thread Michael McCandless (Jira)
Michael McCandless created LUCENE-9607:
--

 Summary: 
TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes test failure
 Key: LUCENE-9607
 URL: https://issues.apache.org/jira/browse/LUCENE-9607
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless


CI builds have been failing with this:
{noformat}


FAILED:  
org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.testCheckIntegrityReadsAllBytes

Error Message:
java.lang.AssertionError

Stack Trace:
java.lang.AssertionError
        at 
__randomizedtesting.SeedInfo.seed([43D1E1D1DB325AD7:3E13F00D7ACC8E7E]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at 
org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat.checkEncodingCalled(TestUniformSplitPostingFormat.java:63)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1000)
        at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
        at java.base/java.lang.Thread.run(Thread.java:832)


Reproduce with: gradlew :lucene:codecs:test --tests 
"org.apache.lucene.codecs.uniformsplit.TestUniformSplitPostingFormat" 
-Ptests.jvms=6 -Ptests.haltonfailure=false 
-Ptests.jvmargs=-XX:+UseCompressedOops -XX:+UseSerialGC 
-Ptests.seed=43D1E1D1DB325AD7 -Ptests.multiplier=3 -Ptests.badapples=false 
-Ptests.file.encoding=US-ASCII {noformat}
But it does not seem to repro for me on one try.

Also disturbing are the missing quotes around the {{-Ptests.jvmargs=..}}, which 
then 

[jira] [Created] (SOLR-14997) Admin UI shows only the host name (not even port) in the graph view.

2020-11-13 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14997:
-

 Summary: Admin UI shows only the host name (not even port) in the 
graph view.
 Key: SOLR-14997
 URL: https://issues.apache.org/jira/browse/SOLR-14997
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: master (9.0)
Reporter: Erick Erickson
 Attachments: Screen Shot 2020-11-13 at 7.51.28 AM.png

Didn't check 8x.

The graph view just shows "localhost (N)" for each replica; see the attached 
screenshot. It should at least show the port.

Showing the port is important, I think: when I have multiple JVMs on the same 
machine, being able to see at a glance that all the replicas in a particular 
JVM have a problem is very useful.

I don't have any strong feelings about showing the full replica name.

What do people think about showing the full node name? People often put 
information that matters to their organization in the node name, and it would 
also help people understand what goes into, say, the createNodeSet. So the 
equivalent in my screenshot would be "localhost:8981_solr"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231428#comment-17231428
 ] 

Tomoko Uchida commented on LUCENE-9499:
---

{quote}Maybe we should limit applying this task only to test-framework module?
{quote}
I fixed render-javadoc.gradle as suggested.

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9499-javadoc.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules 
> in the source tree. It is not only annoying for devs/users but also indeed 
> bad practice since Java 9 (according to my understanding), and we already 
> have some problems with Javadocs due to these split packages as some of us 
> would know. Also split packages make migrating to the Java 9 module system 
> impossible.
> This is the placeholder to fix all package name conflicts in Lucene.
> See the dev list thread for more background. 
>  
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
>  - analyzers-common (LUCENE-9317)
>  - analyzers-icu (LUCENE-9558)
>  - backward-codecs (LUCENE-9318)
>  - sandbox (LUCENE-9319)
>  - misc (LUCENE-9600)
>  - (test-framework: this can be excluded for the moment)
> Also lucene-core will be heavily affected (some classes have to be moved into 
> {{core}}, or some classes' and methods' in {{core}} visibility have to be 
> relaxed).
> Probably most work would be done in a parallel manner, but conflicts can 
> happen. If someone wants to help out, please open an issue before working and 
> share your thoughts with me and others.
> I set "Fix version" to 9.0 - means once we make a commit on here, this will 
> be a blocker for release 9.0.0. (I don't think the changes should be 
> delivered across two major releases; all changes have to be out at once in a 
> major release.) If there are any objections or concerns, please leave 
> comments. For now I have no idea about the total volume of changes or 
> technical obstacles that have to be handled.
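
As a side illustration of the Java module system point above, a pair of hypothetical module descriptors shows why split packages block modularization; these are not real Lucene module declarations, just an example of the rule that two modules may not contain and export the same package.
{code:java}
// Hypothetical illustration of the split-package problem - NOT real Lucene modules.

// module-info.java in lucene-core:
module lucene.core {
  exports org.apache.lucene.analysis;
}

// module-info.java in lucene-analyzers-common:
module lucene.analysis.common {
  exports org.apache.lucene.analysis; // resolution fails: package is split across modules
}
{code}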



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231426#comment-17231426
 ] 

ASF subversion and git services commented on LUCENE-9499:
-

Commit 8bac4e7f748592b7f86fb651a35498854a25abb8 in lucene-solr's branch 
refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8bac4e7 ]

LUCENE-9499: javadoc split package workaround should be applied only to 
test-framework.


> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9499-javadoc.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have lots of package name conflicts (shared package names) between modules 
> in the source tree. It is not only annoying for devs/users but also indeed 
> bad practice since Java 9 (according to my understanding), and we already 
> have some problems with Javadocs due to these split packages as some of us 
> would know. Also split packages make migrating to the Java 9 module system 
> impossible.
> This is the placeholder to fix all package name conflicts in Lucene.
> See the dev list thread for more background. 
>  
> [https://lists.apache.org/thread.html/r6496963e89a5e0615e53206429b6843cc5d3e923a2045cc7b7a1eb03%40%3Cdev.lucene.apache.org%3E]
> Modules that need to be fixed / cleaned up:
>  - analyzers-common (LUCENE-9317)
>  - analyzers-icu (LUCENE-9558)
>  - backward-codecs (LUCENE-9318)
>  - sandbox (LUCENE-9319)
>  - misc (LUCENE-9600)
>  - (test-framework: this can be excluded for the moment)
> Also lucene-core will be heavily affected (some classes have to be moved into 
> {{core}}, or some classes' and methods' in {{core}} visibility have to be 
> relaxed).
> Probably most work would be done in a parallel manner, but conflicts can 
> happen. If someone want to help out, please open an issue before working and 
> share your thoughts with me and others.
> I set "Fix version" to 9.0 - means once we make a commit on here, this will 
> be a blocker for release 9.0.0. (I don't think the changes should be 
> delivered across two major releases; all changes have to be out at once in a 
> major release.) If there are any objections or concerns, please leave 
> comments. For now I have no idea about the total volume of changes or 
> technical obstacles that have to be handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated LUCENE-9499:
--
Attachment: LUCENE-9499-javadoc.patch

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9499-javadoc.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated LUCENE-9499:
--
Attachment: (was: LUCENE-9499-javadoc)

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9499-javadoc.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated LUCENE-9499:
--
Attachment: LUCENE-9499-javadoc

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9499-javadoc.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231402#comment-17231402
 ] 

Uwe Schindler commented on LUCENE-9499:
---

bq. I left it as it was, since we still have split packages in test-framework 
(can we remove these lines now?)

I thought about this the minute after I sent the mail. Maybe we should limit 
applying this task to the test-framework module only?

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9499) Clean up package name conflicts between modules (split packages)

2020-11-13 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231393#comment-17231393
 ] 

Tomoko Uchida commented on LUCENE-9499:
---

Thank you [~uschindler] for fixing the link.
{quote}And this may be removed, as we have no split packages anymore: 
[https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=gradle/documentation/render-javadoc.gradle;h=bbd1b5e603a0c9f513c836452a18b9ce9caa83e7;hb=426a9c2#l277]
{quote}
I left it as it was, since we still have split packages in test-framework (can 
we remove these lines now?)

> Clean up package name conflicts between modules (split packages)
> 
>
> Key: LUCENE-9499
> URL: https://issues.apache.org/jira/browse/LUCENE-9499
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-13 Thread GitBox


dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726675884


   No worries. I've pushed the code the way I think it should work - please 
take a look and let me know if anything is unclear. I tested on Windows and 
Linux; it runs fine. The 'tests.native' flag is set automatically depending on 
'build.native', but it exists separately in case somebody wishes to enable 
those tests manually from the IDE level.
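   The exact wiring of 'build.native' and 'tests.native' lives in the Gradle 
build, but as a rough sketch of what honoring such a flag from a test could 
look like (the class name and the JUnit 4 style assumption below are 
illustrative assumptions, not code from this PR), a test can simply skip 
itself unless the corresponding system property is set:
{code:java}
import org.junit.Assume;
import org.junit.Test;

public class TestNativeCode {

  @Test
  public void testNativeCall() {
    // Skip unless -Dtests.native=true was passed, e.g. from an IDE run configuration.
    Assume.assumeTrue("native tests are disabled", Boolean.getBoolean("tests.native"));
    // ... exercise the native code here ...
  }
}
{code}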



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-13 Thread GitBox


zacharymorn commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726643548


   > Thanks Zach. I looked at what you did with tests and I think this can be 
done in a cleaner way that always works. I'll show you how; give me some time.
   
   Sounds good! Looking forward to it.
   
   > Also, update your IDE's formatter settings to the convention used 
throughout the code (you used 4 spaces indentation). This may be helpful in 
other patches too.
   
   Oops, sorry. I've just updated it to use 2-space indentation. 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-13 Thread GitBox


dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-726599397


   Thanks Zach. I looked at what you did with tests and I think this can be 
done in a cleaner way that always works. I'll show you how; give me some time.
   
   Also, update your IDE's formatter settings to the convention used throughout 
the code (you used 4 spaces indentation). This may be helpful in other patches 
too.
   
   I'll get back to you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-13 Thread Simon Willnauer (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231249#comment-17231249
 ] 

Simon Willnauer commented on LUCENE-9508:
-

Oops, sorry! I meant [~shamirwasia]. Thanks for the heads-up.

> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> 
>
> Key: LUCENE-9508
> URL: https://issues.apache.org/jira/browse/LUCENE-9508
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.5.1
>Reporter: Sorabh Hamirwasia
>Priority: Major
>  Labels: IndexWriter
>
> Hi,
> I was investigating an issue where the memory usage of a single Lucene 
> IndexWriter went up to ~23GB. Lucene has a concept of stalling when the 
> memory used by an index breaches 2 x the ramBuffer limit (10% of the JVM 
> heap, in this case ~3GB), so ideally memory usage should not go above that 
> limit. I looked into the heap dump and found that when the fullFlush thread 
> enters the *markForFullFlush* method, it tries to take the lock on the 
> ThreadStates of all the DWPT threads sequentially. If the lock on one of the 
> ThreadStates is already held, it blocks indefinitely. This is what happened 
> in my case, where one of the DWPT threads was stuck in the indexing process. 
> Because of this, the fullFlush thread was unable to populate the flush queue 
> even though stall mode had been detected. As a result, a new indexing request 
> arriving on an indexing thread slept for a second and then continued with 
> indexing. The **preUpdate()** method checks for the stalled case and looks 
> for pending flushes (based on the flush queue); if there are none, it sleeps 
> and continues. 
> Questions: 
> 1) Should **preUpdate** look at the blocked-flushes information as well, 
> instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on the 
> ThreadStates? A single blocked writing thread can block the full flush here.
>  
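To illustrate the behavior described above (this is a self-contained sketch; the class, field, and method names are invented for the illustration and are not the actual DocumentsWriter internals), the indexing path only consults the flush queue while stalled, so a blocked full flush leaves that queue empty and indexing carries on after a short sleep:
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Invented names, for illustration only - not the real Lucene classes.
final class StallBehaviorSketch {

  private volatile boolean stalled;                        // set when memory exceeds 2 x ramBuffer
  private final Queue<Runnable> flushQueue = new ConcurrentLinkedQueue<>();

  void preUpdate() throws InterruptedException {
    if (stalled && flushQueue.isEmpty()) {
      // Question 1 above: should information about *blocked* flushes also be
      // consulted here, instead of only the (possibly empty) flush queue?
      Thread.sleep(1000);                                  // sleep for a second ...
    }
    // ... and then continue with indexing regardless, which lets memory keep growing.
  }
}
{code}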



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org