[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search

2020-02-08 Thread Xin-Chun Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin-Chun Zhang updated LUCENE-9136:
---
Description: 
Representation learning (RL) has been an established discipline in the machine 
learning space for decades, but it has drawn tremendous attention lately with 
the emergence of deep learning. The central problem of RL is to determine an 
optimal representation of the input data. After embedding the data into 
high-dimensional vectors, vector retrieval (VR) methods are then applied to 
search for the relevant items.

With the rapid development of RL over the past few years, the technique has 
been used extensively in industry, from online advertising to computer vision 
and speech recognition. There exist many open-source implementations of VR 
algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various 
choices for potential users. However, the aforementioned implementations are 
all written in C++ with no plan to support a Java interface, making them hard 
to integrate into Java projects and inaccessible to those who are not familiar 
with C/C++ [https://github.com/facebookresearch/faiss/issues/105]. 

The algorithms for vector retrieval can be roughly classified into four 
categories:
 # Tree-based algorithms, such as KD-tree;
 # Hashing methods, such as LSH (Locality-Sensitive Hashing);
 # Product quantization based algorithms, such as IVFFlat;
 # Graph-based algorithms, such as HNSW, SSG, NSG;

where IVFFlat and HNSW are the most popular ones among all the VR algorithms.

Recently, the implementation of HNSW (Hierarchical Navigable Small World, 
LUCENE-9004) for Lucene has made great progress. The issue has drawn the 
attention of those who are interested in Lucene or hope to use HNSW with 
Solr/Lucene. 

As an alternative for solving ANN similarity search problems, IVFFlat is also 
very popular with many users and supporters. Compared with HNSW, IVFFlat has a 
smaller index size but requires k-means clustering, while HNSW is faster at 
query time (no training required) but requires extra storage for saving graphs 
[indexing 1M 
vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. 
*The recall ratio of IVFFlat can be gradually increased by adjusting the query 
parameter (nprobe), while it is hard for HNSW to improve its accuracy.* In 
theory, IVFFlat could achieve a 100% recall ratio. Another advantage is that 
IVFFlat can be faster and more accurate when GPU parallel computing is enabled 
(currently not supported in Java). Both algorithms have their merits and 
demerits. Since HNSW is now under development, it may be better to provide both 
implementations (HNSW and IVFFlat) for potential users who face very different 
scenarios and want more choices.
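To make the nprobe trade-off concrete, below is a minimal, hypothetical Java 
sketch of IVFFlat-style search (not Lucene or FAISS API; all names are invented 
for illustration): vectors are bucketed under their nearest k-means centroid at 
index time, and at query time only the nprobe nearest buckets are scanned 
exhaustively, so raising nprobe trades latency for recall.

{code:java}
import java.util.*;

class IvfFlatSketch {
  final float[][] centroids;        // k cluster centers from k-means training
  final List<List<float[]>> lists;  // inverted list of raw vectors per centroid

  IvfFlatSketch(float[][] centroids, List<List<float[]>> lists) {
    this.centroids = centroids;
    this.lists = lists;
  }

  static float l2(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
  }

  /** Scan the nprobe nearest clusters; probing every cluster gives exact search. */
  float[] search(float[] query, int nprobe) {
    Integer[] order = new Integer[centroids.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(i -> l2(query, centroids[i])));
    float best = Float.MAX_VALUE;
    float[] bestVec = null;
    for (int p = 0; p < Math.min(nprobe, order.length); p++) {
      for (float[] v : lists.get(order[p])) { // brute-force within each probed list
        float d = l2(query, v);
        if (d < best) { best = d; bestVec = v; }
      }
    }
    return bestVec;
  }
}
{code}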


[jira] [Commented] (LUCENE-9211) Adding compression to BinaryDocValues storage

2020-02-08 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033125#comment-17033125
 ] 

David Smiley commented on LUCENE-9211:
--

This seems cool for some use-cases but I worry about the overhead for others.  
I think I have a benchmark module ".alg" file for SerializedDVStrategy in 
spatial-extras.  I should try it out on your PR.

I wish it was easier for us to let users toggle the choice of DocValuesFormat 
only for one type but not for others.  DocValuesFormat is really a format of 
formats, which is inflexible.  [~juan.duran], a colleague of mine, has been 
diving into this topic lately and I hope he shares it here (new issue of 
course).

> Adding compression to BinaryDocValues storage
> -
>
> Key: LUCENE-9211
> URL: https://issues.apache.org/jira/browse/LUCENE-9211
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
>  Labels: pull-request-available
>
> While SortedSetDocValues can be used today to store identical values in a 
> compact form, this is not effective for data with many unique values.
> The proposal is that BinaryDocValues should be stored in LZ4-compressed 
> blocks, which can dramatically reduce disk storage costs in many cases: 
> blocks of a number of documents are stored as a single compressed blob, 
> along with metadata that records the offsets where the original document 
> values can be found in the uncompressed content.
> There's a trade-off here between efficient compression (more docs-per-block = 
> better compression) and fast retrieval times (fewer docs-per-block = faster 
> read access for single values). A fixed block size of 32 docs seems like it 
> would be a reasonable compromise for most scenarios.
> A PR is up for review here [https://github.com/apache/lucene-solr/pull/1234]
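For illustration, here is a hedged sketch of the proposed layout (all names 
hypothetical; java.util.zip's Deflater stands in for LZ4, which the JDK does 
not ship): a block of document values is packed into one blob with per-document 
offsets kept as metadata, and a single value is read back by decompressing its 
block and slicing at the recorded offset.

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BlockedBinaryDVSketch {
  static final int BLOCK_DOCS = 32; // docs per block: compression vs. read-speed trade-off

  // Pack the block's values into one blob, record where each doc starts,
  // then compress the whole blob. offsets must have length values.length + 1.
  static byte[] compressBlock(byte[][] values, int[] offsets) {
    ByteArrayOutputStream blob = new ByteArrayOutputStream();
    for (int i = 0; i < values.length; i++) {
      offsets[i] = blob.size();             // metadata: start of doc i in the blob
      blob.write(values[i], 0, values[i].length);
    }
    offsets[values.length] = blob.size();   // end sentinel
    Deflater deflater = new Deflater();
    deflater.setInput(blob.toByteArray());
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[512];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    return out.toByteArray();
  }

  // Reading one value costs a whole-block decompression (cached in practice).
  static byte[] readValue(byte[] compressed, int[] offsets, int doc) throws Exception {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] blob = new byte[offsets[offsets.length - 1]];
    int off = 0;
    while (!inflater.finished() && off < blob.length) {
      off += inflater.inflate(blob, off, blob.length - off);
    }
    return Arrays.copyOfRange(blob, offsets[doc], offsets[doc + 1]);
  }
}
{code}

The BLOCK_DOCS constant is where the trade-off described above lives: more docs 
per block compresses better, but each single-value read decompresses more data.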






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376754130
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java
 ##
 @@ -124,6 +126,15 @@ public boolean hasSidePath(int state) {
 .toArray(Term[]::new);
   }
 
+  /**
+   * Returns the list of terms that start at the provided state
+   */
+  public QueryBuilder.TermAndBoost[] getTermsAndBoosts(String field, int state) {
 
 Review comment:
   Given that this class, GraphTokenStreamFiniteStrings, deals with 
`List` (something I did not know when I made a previous comment), and also 
that TermAndBoost is an inner class of QueryBuilder, I think it's better to 
put this back into QueryBuilder.  I still think `List` is weird and 
heavyweight, but you didn't add it.





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753387
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/DelimitedBoostTokenFilter.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.boost;
+
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.search.BoostAttribute;
+
+import java.io.IOException;
+
+
+/**
+ * Characters before the delimiter are the "token", those after are the boost.
+ * 
+ * For example, if the delimiter is '|', then for the string "foo|0.7", foo is 
the token
+ * and 0.7 is the boost.
+ * 
+ * Note: make sure your Tokenizer doesn't split on the delimiter, or this won't work.
+ */
+public final class DelimitedBoostTokenFilter extends TokenFilter {
+  private final char delimiter;
+  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
+  private final BoostAttribute boostAtt = addAttribute(BoostAttribute.class);
+
+  public DelimitedBoostTokenFilter(TokenStream input, char delimiter) {
+    super(input);
+    this.delimiter = delimiter;
+  }
+
+  @Override
+  public boolean incrementToken() throws IOException {
+    if (input.incrementToken()) {
+      final char[] buffer = termAtt.buffer();
+      final int length = termAtt.length();
+      for (int i = 0; i < length; i++) {
+        if (buffer[i] == delimiter) {
+          float boost = Float.parseFloat(new String(buffer, i + 1, length - (i + 1)));
+          boostAtt.setBoost(boost);
+          termAtt.setLength(i);
+          return true;
+        }
+      }
+      return true;
+    } else return false;
 
 Review comment:
   I know this is a minor matter of taste, but please put brackets on the false 
side of the else, with the code on its own line.  This is for consistency with 
our de facto code style in the project.
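   For context, a hedged usage sketch of the filter under review (assuming the 
PR's org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter; a 
WhitespaceTokenizer keeps "foo|0.7" as a single token so the filter can split 
off the boost):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter; // class under review
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.search.BoostAttribute;

public class DelimitedBoostDemo {
  public static void main(String[] args) throws Exception {
    // Whitespace tokenization keeps "foo|0.7" as one token for the filter to split.
    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("foo|0.7 bar"));
    TokenStream ts = new DelimitedBoostTokenFilter(tokenizer, '|');
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    BoostAttribute boost = ts.addAttribute(BoostAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // Expected: "foo" with boost 0.7, then "bar" with the default boost 1.0
      System.out.println(term + " boost=" + boost.getBoost());
    }
    ts.end();
    ts.close();
  }
}
```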





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-08 Thread GitBox
dsmiley commented on a change in pull request #357: [SOLR-12238] Synonym 
Queries boost
URL: https://github.com/apache/lucene-solr/pull/357#discussion_r376753645
 
 

 ##
 File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/boost/package-info.java
 ##
 @@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides various convenience classes for creating boosts on Tokens.
+ */
+package org.apache.lucene.analysis.boost;
 
 Review comment:
   While I can see why you chose a new "boost" sub-package (the payload-based 
filter from which you drew inspiration is in a "payload" sub-package), I lean 
towards the "miscellaneous" package.  Note that 
DelimitedTermFrequencyTokenFilter is in "miscellaneous" too.  WDYT @romseygeek 
?  Or maybe we need a new "delimited" sub-package for all of these to go in; I dunno.





[jira] [Resolved] (SOLR-14149) Remove non-changes from CHANGES.txt

2020-02-08 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-14149.
-
Fix Version/s: 8.5
   Resolution: Fixed

> Remove non-changes from CHANGES.txt
> ---
>
> Key: SOLR-14149
> URL: https://issues.apache.org/jira/browse/SOLR-14149
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Our CHANGES.txt should just list our changes / release notes.  Nothing else.
> * no Introduction
> * no "Getting Started"
> * no "Versions of Major Components" with each release.
> We have a website, a reference guide, and a README.md that are all more 
> suitable places.  Let's not maintain it here as well; let's keep this file 
> focused on its namesake.  We can/should *link* to that information from 
> CHANGES.txt.  For example, linking to 
> https://lucene.apache.org/solr/guide/8_4/solr-upgrade-notes.html is highly 
> appropriate as it's a more user-friendly, editorialized version of CHANGES.txt.






[jira] [Commented] (SOLR-14149) Remove non-changes from CHANGES.txt

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033118#comment-17033118
 ] 

ASF subversion and git services commented on SOLR-14149:


Commit e6701680f4ecb051dc932cd1b809abe4b84eb6b5 in lucene-solr's branch 
refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e670168 ]

SOLR-14149: CHANGES.txt Remove off-topic stuff
* No Introduction (to Solr) header.  Point at solr-upgrade-notes.adoc instead
* No Getting Started header
* No Versions of Major Components header
* No "Upgrade Notes" for subsequent releases.  See solr-upgrade-notes.adoc
Closes #1202

(cherry picked from commit 46c09456140fca0244ef726bd18a3ca7c5c7d131)


> Remove non-changes from CHANGES.txt
> ---
>
> Key: SOLR-14149
> URL: https://issues.apache.org/jira/browse/SOLR-14149
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (SOLR-14149) Remove non-changes from CHANGES.txt

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033117#comment-17033117
 ] 

ASF subversion and git services commented on SOLR-14149:


Commit 46c09456140fca0244ef726bd18a3ca7c5c7d131 in lucene-solr's branch 
refs/heads/master from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=46c0945 ]

SOLR-14149: CHANGES.txt Remove off-topic stuff
* No Introduction (to Solr) header.  Point at solr-upgrade-notes.adoc instead
* No Getting Started header
* No Versions of Major Components header
* No "Upgrade Notes" for subsequent releases.  See solr-upgrade-notes.adoc
Closes #1202


> Remove non-changes from CHANGES.txt
> ---
>
> Key: SOLR-14149
> URL: https://issues.apache.org/jira/browse/SOLR-14149
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[GitHub] [lucene-solr] dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic stuff

2020-02-08 Thread GitBox
dsmiley closed pull request #1202: SOLR-14149: CHANGES.txt Remove off-topic 
stuff
URL: https://github.com/apache/lucene-solr/pull/1202
 
 
   





[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-02-08 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033086#comment-17033086
 ] 

Julie Tibshirani commented on LUCENE-9004:
--

[~erickerickson] the benchmark force merges the index to one segment to ensure 
we're doing an 'apples to apples' comparison between the Lucene data structure 
and the FAISS implementation. It was taking > 8 hours to complete on the GloVe 
dataset, and it seems that the time is spent merging the kNN graph. If you have 
suggestions on the benchmark set-up, maybe we could discuss on the PR so I 
don't add a lot of noise to the issue?

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However, 
> graph-per-segment is very natural at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration with IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K 
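As a hedged illustration of the insertion idea quoted above (all names are 
hypothetical, a brute-force scan stands in for the greedy graph search that 
real HNSW performs, and real HNSW additionally maintains multiple layers):

{code:java}
import java.util.*;

class HnswBuildSketch {
  final List<float[]> vectors = new ArrayList<>();
  final List<Set<Integer>> neighbors = new ArrayList<>(); // adjacency lists
  final int maxConn; // K: links created per inserted node

  HnswBuildSketch(int maxConn) { this.maxConn = maxConn; }

  static float l2(float[] a, float[] b) {
    float d = 0;
    for (int i = 0; i < a.length; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
  }

  /** Insert: find the K nearest nodes in the current graph, then link both ways. */
  void add(float[] v) {
    int id = vectors.size();
    vectors.add(v);
    neighbors.add(new HashSet<>());
    // Brute-force nearest-neighbor lookup stands in for the graph search.
    List<Integer> nearest = new ArrayList<>();
    for (int i = 0; i < id; i++) nearest.add(i);
    nearest.sort(Comparator.comparingDouble(i -> l2(v, vectors.get(i))));
    for (int i = 0; i < Math.min(maxConn, nearest.size()); i++) {
      int n = nearest.get(i);
      neighbors.get(id).add(n); // undirected: link both ways so the emerging
      neighbors.get(n).add(id); // graph keeps the navigable property
    }
  }
}
{code}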

[jira] [Comment Edited] (SOLR-14249) Krb5HttpClientBuilder should not buffer requests

2020-02-08 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032723#comment-17032723
 ] 

Kevin Risden edited comment on SOLR-14249 at 2/8/20 8:41 PM:
-

So I haven't personally looked at Krb5HttpClientBuilder recently, other than 
the completely unrelated SOLR-13726. Part of the reason that a lot of clients 
buffer is due to how Kerberos SPNEGO authentication works.

There are typically two parts:
* a request without authentication, to which the server returns a 401 with a 
negotiate response
* a request with authentication in response to the negotiate, which the server 
can verify

If you don't put any optimizations in place, every request becomes two. A lot 
of times a cookie is used here to limit the number of HTTP requests.

The reason the 401 and second request is an issue is if the request is a 
non-repeatable one - like a POST body. The client ends up sending the body, 
gets a 401, then goes "oh crap, I need to send the body again" and can't - 
because it's non-repeatable.

So a lot of times the super simple workaround is to buffer the request - do 
the 401 check dance and then proceed. This is a way to make a non-repeatable 
request semi-repeatable.

This buffering has issues though, as you found: the buffer should be limited 
in size, which then limits the usefulness of this technique.

There are a few alternatives to buffering:
* Authenticate upfront with, say, an OPTIONS request - which will get the 
cookie. The next request, say a POST, won't have any issue and won't do the 
401 dance.
* "Preemptively" do SPNEGO authorization if you know the SPN needed and create 
the right authorization header - this also skips the 401, and the server can 
check the header.
* Use the "Expect: 100-continue" header, which asks the server if it can 
handle the request before the body is sent, and only then sends the body. This 
actually holds the data from being sent in the first place if possible (a 
minimal sketch follows below).
** Curl automatically activates "Expect: 100-continue" under a few conditions - 
https://gms.tf/when-curl-sends-100-continue.html
** Apache HttpClient does NOT do any special handling of "Expect: 100-continue" 
- but if you explicitly set the "Expect: 100-continue" header, Apache 
HttpClient will work with it.
** Not sure if Jetty HttpClient does anything with "Expect: 100-continue" - I 
personally haven't looked into it (yet).

So long story short - yes, buffering is a problem.
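For the "Expect: 100-continue" alternative, a minimal sketch with Apache 
HttpClient 4.x (the endpoint URL is hypothetical; HttpClient's documented 
switch is RequestConfig#setExpectContinueEnabled rather than setting the 
header by hand):

{code:java}
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.InputStreamEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ExpectContinueDemo {
  public static void main(String[] args) throws Exception {
    // Ask the server to approve the request line + headers before the body goes
    // out, so the SPNEGO 401 arrives before the non-repeatable body is sent.
    RequestConfig config = RequestConfig.custom()
        .setExpectContinueEnabled(true)
        .build();
    try (CloseableHttpClient client = HttpClients.custom()
        .setDefaultRequestConfig(config)
        .build()) {
      HttpPost post = new HttpPost("http://localhost:8983/solr/update"); // hypothetical endpoint
      post.setEntity(new InputStreamEntity(System.in)); // a non-repeatable entity
      client.execute(post).close();
    }
  }
}
{code}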



> Krb5HttpClientBuilder should not buffer requests 
> -
>
> Key: SOLR-14249
> URL: https://issues.apache.org/jira/browse/SOLR-14249
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Authentication, 

[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-02-08 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033021#comment-17033021
 ] 

Erick Erickson commented on LUCENE-9004:


[~jtibshirani] What kinds of trouble did you have with forceMerge(1)? It can 
certainly take a long time to complete, but I've rarely seen other problems 
with it. Assuming you're using the default TieredMergePolicy that is...

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 
> and no integration iwth IndexSearcher, but it does work by some measure using 
> a standalone test class. I've tested with uniform random vectors and on my 
> laptop indexed 10K documents in around 10 seconds and searched them at 95% 
> recall (compared with exact nearest-neighbor baseline) at around 250 QPS. I 
> haven't made any attempt to use multithreaded search for this, but it is 

[jira] [Commented] (SOLR-14249) Krb5HttpClientBuilder should not buffer requests

2020-02-08 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033019#comment-17033019
 ] 

Kevin Risden commented on SOLR-14249:
-

This is semi-related to SOLR-14250, where "Expect: 100-continue" doesn't look 
like it works?

> Krb5HttpClientBuilder should not buffer requests 
> -
>
> Key: SOLR-14249
> URL: https://issues.apache.org/jira/browse/SOLR-14249
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Authentication, SolrJ
>Affects Versions: 7.4, master (9.0), 8.4.1
>Reporter: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-14249-reproduction.patch
>
>
> When SolrJ clients enable Kerberos authentication, a request interceptor is 
> set up which wraps the actual HttpEntity in a BufferedHttpEntity.  This 
> BufferedHttpEntity, well, buffers the request body in a {{byte[]}} so it can 
> be repeated if needed.  This works fine for small requests, but when requests 
> get large, storing the entire request in memory causes contention or 
> OutOfMemoryErrors.
> The easiest way for this to manifest is to use ConcurrentUpdateSolrClient, 
> which opens a connection to Solr and streams documents out in an ever 
> increasing request entity until the doc queue held by the client is emptied.
> I ran into this while troubleshooting a DIH run that would reproducibly load 
> a few hundred thousand documents before progress stalled out.  Solr never 
> crashed and the DIH thread was still alive, but the 
> ConcurrentUpdateSolrClient used by DIH had its "Runner" thread disappear 
> around the time of the stall and an OOM like the one below could be seen in 
> solr-8983-console.log:
> {code}
> WARNING: Uncaught exception in thread: 
> Thread[concurrentUpdateScheduler-28-thread-1,5,TGRP-TestKerberosClientBuffering]
> java.lang.OutOfMemoryError: Java heap space
>   at __randomizedtesting.SeedInfo.seed([371A00FBA76D31DF]:0)
>   at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
>   at 
> java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
>   at 
> java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
>   at 
> java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
>   at 
> org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:213)
>   at 
> org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:94)
>   at 
> org.apache.solr.common.util.ByteUtils.writeUTF16toUTF8(ByteUtils.java:145)
>   at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:848)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:932)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:328)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:616)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:764)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:705)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:367)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:223)
>   at 
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:330)
>   at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:228)
>   at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:155)
>   at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:91)
>   at 
> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
>   at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner$1.writeTo(ConcurrentUpdateSolrClient.java:264)
>   at org.apache.http.entity.EntityTemplate.writeTo(EntityTemplate.java:73)
>   at 
> org.apache.http.entity.BufferedHttpEntity.<init>(BufferedHttpEntity.java:62)
>   at 
> org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder.lambda$new$3(Krb5HttpClientBuilder.java:155)
>   at 
> org.apache.solr.client.solrj.impl.Krb5HttpClientBuilder$$Lambda$459/0x000800623840.process(Unknown
>  Source)
>   at 
> 

[GitHub] [lucene-solr] risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP integration (Hrishikesh Gadre, Kevin Risden)

2020-02-08 Thread GitBox
risdenk commented on issue #591: SOLR-9840: Add a unit test for LDAP 
integration (Hrishikesh Gadre, Kevin Risden)
URL: https://github.com/apache/lucene-solr/pull/591#issuecomment-583764671
 
 
   working on rebasing to latest master to make sure still valid.





[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search

2020-02-08 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032966#comment-17032966
 ] 

Julie Tibshirani edited comment on LUCENE-9004 at 2/8/20 6:24 PM:
--

As another option for running benchmarks I wanted to give more information on 
the [ann-benchmarks repo|https://github.com/erikbern/ann-benchmarks]. It’s a 
shared set of benchmarks developed by the kNN search community, and contains a 
set of pretty realistic datasets, as well as connectors to existing kNN 
libraries like FAISS. 

I pushed a branch that hooks up the Lucene HNSW prototype to ann-benchmarks: 
[https://github.com/jtibshirani/ann-benchmarks/pull/1]. It’s nice to have 
everything in one place, as we can compare prototypes against reference 
implementations from FAISS to check that the recalls match. [This 
comment|https://github.com/jtibshirani/ann-benchmarks/pull/1#issuecomment-583760337]
 contains results of running both the HNSW prototype and FAISS’s implementation 
against a small test dataset. It looks like the prototype gives ~5% lower 
recall for the same parameter values, which suggests there’s room for small 
fixes/improvements in terms of the algorithm. (I might have misunderstood the 
default parameter values though - any corrections are welcome!)

Some more background:
 * That test uses a small test dataset because I had trouble getting 
`forceMerge(1)` to complete on a large dataset. But there are some more 
realistic datasets like `glove-100-angular` (a set of 1.2 million GloVe word 
vectors), and `deep-image-96-angular` (a set of 10M 'deep descriptors' of 
images from a CNN).
 * By default, ann-benchmarks retrieves k=10 nearest neighbors, and reports 
recall as the number of results that overlap with the true k nearest neighbors. 
There is an adjustable small ‘fudge factor’ epsilon, so that a result is still 
counted as correct if it is within a small distance of the true kth nearest 
neighbor.
 * Since ann-benchmarks is a Python library, the branch uses py4j to convert to 
and from Java. py4j could add non-trivial overhead, so this benchmarking 
strategy is probably not best for measuring raw QPS. But it can be useful to 
(1) examine recall numbers, and (2) compare different Lucene kNN approaches 
against each other.

Feel free to ping me or comment on the PR if you spot issues or have trouble 
getting it to work.
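For reference, a small hedged Java sketch of the recall definition above 
(names hypothetical; ann-benchmarks itself computes this in Python, and the 
epsilon is the ‘fudge factor’ mentioned earlier):

{code:java}
class RecallSketch {
  // approxDists: distances of the k returned results;
  // trueDists: exact k-NN distances, sorted ascending.
  // A result still counts as a hit if it is within (1 + eps) of the
  // true k-th nearest neighbor's distance.
  static double recallAtK(float[] approxDists, float[] trueDists, int k, float eps) {
    float threshold = trueDists[k - 1] * (1 + eps);
    int hits = 0;
    for (int i = 0; i < k && i < approxDists.length; i++) {
      if (approxDists[i] <= threshold) hits++;
    }
    return (double) hits / k;
  }
}
{code}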



> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: 

[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2020-02-08 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032966#comment-17032966
 ] 

Julie Tibshirani commented on LUCENE-9004:
--


> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to 

[jira] [Updated] (SOLR-14038) Admin UI display for "state.json" should be in a scrollable region

2020-02-08 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14038:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Admin UI display for "state.json" should be in a scrollable region
> -
>
> Key: SOLR-14038
> URL: https://issues.apache.org/jira/browse/SOLR-14038
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Reporter: Erick Erickson
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-14038.patch, Screen Shot 2019-12-09 at 3.19.53 
> PM.png
>
>
> Probably a result of some of the recent changes to the admin UI. See attached 
> screenshot






[jira] [Commented] (SOLR-14038) Admin UI display for "state.json" should be in a scrollable region

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032962#comment-17032962
 ] 

ASF subversion and git services commented on SOLR-14038:


Commit 056e2cc5dad261ebc09c18b29cc6a9c5693e6afc in lucene-solr's branch 
refs/heads/branch_8x from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=056e2cc ]

SOLR-14038: Admin UI display for "state.json" should be in a scrollable region

Signed-off-by: Kevin Risden 


> Admin UI display for "state.json" should be in a scrollable region
> -
>
> Key: SOLR-14038
> URL: https://issues.apache.org/jira/browse/SOLR-14038
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Reporter: Erick Erickson
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-14038.patch, Screen Shot 2019-12-09 at 3.19.53 
> PM.png
>
>
> Probably a result of some of the recent changes to the admin UI. See attached 
> screenshot






[jira] [Commented] (SOLR-14038) Admin UI display for "state.json" should be in a scrollable region

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032960#comment-17032960
 ] 

ASF subversion and git services commented on SOLR-14038:


Commit 3885a81aa4c31c542d17d2f50b222a77bb92f245 in lucene-solr's branch 
refs/heads/master from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3885a81 ]

SOLR-14038: Admin UI display for "state.json" should be in a scrollable region

Signed-off-by: Kevin Risden 


> Admin UI display for "state.json" should be in a scrollable region
> -
>
> Key: SOLR-14038
> URL: https://issues.apache.org/jira/browse/SOLR-14038
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Reporter: Erick Erickson
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-14038.patch, Screen Shot 2019-12-09 at 3.19.53 
> PM.png
>
>
> Probably a result of some of the recent changes to the admin UI. See attached 
> screenshot






[jira] [Commented] (SOLR-14209) Upgrade JQuery to 3.4.1

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032958#comment-17032958
 ] 

ASF subversion and git services commented on SOLR-14209:


Commit 8df7f379a411227e58486d8c0455a8a9dd18d8c7 in lucene-solr's branch 
refs/heads/branch_8x from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8df7f37 ]

SOLR-14209: Upgrade JQuery to 3.4.1

* JQuery 2.1.3 to 3.4.1
* jstree 1.0-rc1 to v3.3.8

Closes #1209

Signed-off-by: Kevin Risden 


> Upgrade JQuery to 3.4.1
> ---
>
> Key: SOLR-14209
> URL: https://issues.apache.org/jira/browse/SOLR-14209
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI, contrib - Velocity
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: Screen Shot 2020-01-23 at 3.17.07 PM.png, Screen Shot 
> 2020-01-23 at 3.28.47 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently JQuery is on 2.1.3. It would be good to upgrade to the latest 
> version if possible.






[jira] [Updated] (SOLR-14209) Upgrade JQuery to 3.4.1

2020-02-08 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14209:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade JQuery to 3.4.1
> ---
>
> Key: SOLR-14209
> URL: https://issues.apache.org/jira/browse/SOLR-14209
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI, contrib - Velocity
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: Screen Shot 2020-01-23 at 3.17.07 PM.png, Screen Shot 
> 2020-01-23 at 3.28.47 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently JQuery is on 2.1.3. It would be good to upgrade to the latest 
> version if possible.






[jira] [Commented] (SOLR-14209) Upgrade JQuery to 3.4.1

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032957#comment-17032957
 ] 

ASF subversion and git services commented on SOLR-14209:


Commit c4a8a77d23b502b11885fb331d325a563b9706da in lucene-solr's branch 
refs/heads/master from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c4a8a77 ]

SOLR-14209: Upgrade JQuery to 3.4.1

* JQuery 2.1.3 to 3.4.1
* jstree 1.0-rc1 to v3.3.8

Closes #1209

Signed-off-by: Kevin Risden 


> Upgrade JQuery to 3.4.1
> ---
>
> Key: SOLR-14209
> URL: https://issues.apache.org/jira/browse/SOLR-14209
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI, contrib - Velocity
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.5
>
> Attachments: Screen Shot 2020-01-23 at 3.17.07 PM.png, Screen Shot 
> 2020-01-23 at 3.28.47 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently JQuery is on 2.1.3. It would be good to upgrade to the latest 
> version if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1

2020-02-08 Thread GitBox
risdenk closed pull request #1209: SOLR-14209: Upgrade JQuery to 3.4.1
URL: https://github.com/apache/lucene-solr/pull/1209
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14247) IndexSizeTriggerMixedBoundsTest does a lot of sleeping

2020-02-08 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032929#comment-17032929
 ] 

Erick Erickson commented on SOLR-14247:
---

I'll beast this over the weekend.

 

> IndexSizeTriggerMixedBoundsTest does a lot of sleeping
> --
>
> Key: SOLR-14247
> URL: https://issues.apache.org/jira/browse/SOLR-14247
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I run tests locally, the slowest reported test is always 
> IndexSizeTriggerMixedBoundsTest, coming in at around 2 minutes.
> I took a look at the code and discovered that at least 80s of that is all 
> sleeps!
> There might need to be more synchronization and ordering added back in, but 
> when I removed all of the sleeps the test still passed locally for me, so I'm 
> not too sure what the point was or why we were slowing the system down so 
> much.
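
A minimal sketch of the usual replacement for such blind sleeps (hypothetical; 
the helper name and poll interval are illustrative, not from the test):

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: instead of sleeping for a fixed time, poll for the
// condition the sleep was implicitly waiting on, and fail fast on timeout.
final class TestWaitUtil {
  static void waitFor(BooleanSupplier condition, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() > deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(50); // short poll instead of a long blind sleep
    }
  }
}
{code}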



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583753281
 
 
   Here is the forbiddenapis example of how to set up a task per sourceSet: 
https://github.com/policeman-tools/forbidden-apis/blob/master/src/main/resources/de/thetaphi/forbiddenapis/gradle/plugin-init.groovy#L42


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler edited a comment on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler edited a comment on issue #1242: LUCENE-9201: Port 
documentation-lint task to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044
 
 
   The task should just be defined for each sourceSet. Then tests and compile 
work automatically. Gradle will automatically add two tasks (one for each 
sourceSet): ecjLintMain and ecjLintTest (if you use ecjLint as the base name). 
To set this up, ask Gradle for the current sourceSets and generate a task with 
an automatic name based on the sourceSet name. The classpath is provided 
gratis.
   
   See e.g. Gradle internal tasks or the forbiddenapis source code for how 
those tasks should be declared. The approach seen here is not in line with the 
model behind Gradle (you define tasks per sourceSet, so it's extensible, e.g. 
if we add new sourceSets when building multi-release jars for some modules).
   
   A sourceSet, by the way, also carries a source/target and/or release 
version.
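
   A minimal sketch of that pattern against the Gradle public API, written as 
a Java plugin (the "ecj" configuration, main class, and argument wiring are 
assumptions for illustration, not Lucene's actual setup):

{code}
import java.util.stream.Collectors;

import org.gradle.api.Plugin;
import org.gradle.api.Project;
import org.gradle.api.tasks.JavaExec;
import org.gradle.api.tasks.SourceSetContainer;

// Sketch only: register one lint task per sourceSet, so names like
// ecjLintMain and ecjLintTest fall out of the sourceSet names automatically.
public class EcjLintPlugin implements Plugin<Project> {
  @Override
  public void apply(Project project) {
    var sourceSets = project.getExtensions().getByType(SourceSetContainer.class);
    sourceSets.all(sourceSet -> {
      String name = sourceSet.getName();
      String taskName = "ecjLint" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
      project.getTasks().register(taskName, JavaExec.class, task -> {
        // Assumed: an "ecj" configuration holding the ECJ compiler jar.
        task.setClasspath(project.getConfigurations().getByName("ecj"));
        task.getMainClass().set("org.eclipse.jdt.internal.compiler.batch.Main");
        // The sourceSet hands over its compile classpath and sources "gratis".
        task.args("-classpath", sourceSet.getCompileClasspath().getAsPath());
        task.args(sourceSet.getAllJava().getSrcDirs().stream()
            .map(Object::toString).collect(Collectors.toList()));
      });
    });
  }
}
{code}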


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build

2020-02-08 Thread GitBox
uschindler commented on issue #1242: LUCENE-9201: Port documentation-lint task 
to Gradle build
URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-583751044
 
 
   The task should just be defined for each sourceSet. Then tests and compile 
work automatically. Gradle will automatically add two tasks (one for each 
sourceSet): ecjLintMain and ecjLintTest (if you use ecjLint as the base name). 
To set this up, ask Gradle for the current sourceSets and generate a task with 
an automatic name based on the sourceSet name. The classpath is provided 
gratis.
   
   See e.g. Gradle internal tasks or the forbiddenapis source code for how 
those tasks should be declared. The approach seen here is not in line with the 
model behind Gradle (you define tasks per sourceSet, so it's extensible).
   
   A sourceSet, by the way, also carries a source/target and/or release 
version.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9214) enable documentation-lint on EA versions (14 and 15)

2020-02-08 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032914#comment-17032914
 ] 

Robert Muir commented on LUCENE-9214:
-

Well, 15 already breaks the python script, as they have changed the HTML 
formatting again.
We don't want to be adapting the python script to EA versions; that's more 
motivation to fix LUCENE-9215. Then the issues go away, as we no longer rely 
on any HTML output implementation detail.

> enable documentation-lint on EA versions (14 and 15)
> 
>
> Key: LUCENE-9214
> URL: https://issues.apache.org/jira/browse/LUCENE-9214
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9214.patch
>
>
> I think we should add them to the supported list. Why not detect/report 
> issues we find? We can always disable one if there is a particular problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9214) enable documentation-lint on EA versions (14 and 15)

2020-02-08 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9214:

Attachment: LUCENE-9214.patch

> enable documentation-lint on EA versions (14 and 15)
> 
>
> Key: LUCENE-9214
> URL: https://issues.apache.org/jira/browse/LUCENE-9214
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9214.patch
>
>
> I think we should add them to the supported list. Why not detect/report 
> issues we find? We can always disable one if there is a particular problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9215) replace checkJavaDocs.py with doclet

2020-02-08 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032912#comment-17032912
 ] 

Robert Muir commented on LUCENE-9215:
-

Attached my simple prototype. It already finds problems that the python tool 
currently misses, so unfortunately there will be some level of pain to make it 
work.

> replace checkJavaDocs.py with doclet
> 
>
> Key: LUCENE-9215
> URL: https://issues.apache.org/jira/browse/LUCENE-9215
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9215_prototype.patch
>
>
> The current checker runs regular expressions against HTML, and it breaks 
> when newer Java versions change the HTML output. This is not particularly 
> fun to fix: see LUCENE-9213.
> Java releases often now, and when I compared the generated HTML of a simple 
> class across 11, 12 and 13, it surprised me how much changes. So I think we 
> want to avoid parsing their HTML.
> Javadoc's {{Xdoclint}} feature has a "missing checker", but it is 
> black/white: either everything is fully documented or it's not. And while 
> you can enable/disable doclint checks per-package, this also seems 
> black/white (either all checks or no checks at all).
> On the other hand, the python checker is able to check per-package at 
> different granularities (package, class, method). That makes it possible to 
> iteratively improve the situation.
> With the doclet API we could implement checks in pure Java, for example to 
> match the checkJavaDocs.py logic:
> {code}
>   private void checkComment(Element element) {
>     var tree = docTrees.getDocCommentTree(element);
>     if (tree == null) {
>       error(element, "javadocs are missing");
>     } else {
>       var normalized = tree.getFirstSentence().get(0).toString()
>                            .replace('\u00A0', ' ')
>                            .trim()
>                            .toLowerCase(Locale.ROOT);
>       if (normalized.isEmpty()) {
>         error(element, "blank javadoc comment");
>       } else if (normalized.startsWith("licensed to the apache software foundation") ||
>                  normalized.startsWith("copyright 2004 the apache software foundation")) {
>         error(element, "comment is really a license");
>       }
>     }
>   }
> {code}
> If there are problems then they just appear as errors from the output of 
> {{javadoc}} like usual:
> {noformat}
> javadoc: error - org.apache.lucene.nodoc (package): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNearQuery.java:190:
>  error - SpanNearWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanContainingQuery.java:54:
>  error - SpanContainingWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanWithinQuery.java:55:
>  error - SpanWithinWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanTermQuery.java:94:
>  error - SpanTermWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNotQuery.java:109:
>  error - SpanNotWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanOrQuery.java:139:
>  error - SpanOrWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanPositionCheckQuery.java:77:
>  error - SpanPositionCheckWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:61:
>  error - Collectors (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:89:
>  error - LeafCollectors (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:353:
>  error - PagedBytesDataOutput (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:285:
>  error - PagedBytesDataInput (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/EmptyDoc.java:22:
>  error - EmptyDoc (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/LicenseDoc.java:36:
>  error - LicenseDoc (class): comment is really a license
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/NoDoc.java:19:
>  error - NoDoc (class): javadocs are missing
> FAILURE: Build failed with an exception.
> * 

[jira] [Created] (LUCENE-9215) replace checkJavaDocs.py with doclet

2020-02-08 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9215:
---

 Summary: replace checkJavaDocs.py with doclet
 Key: LUCENE-9215
 URL: https://issues.apache.org/jira/browse/LUCENE-9215
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-9215_prototype.patch

The current checker runs regular expressions against HTML, and it breaks when 
newer Java versions change the HTML output. This is not particularly fun to 
fix: see LUCENE-9213.

Java releases often now, and when I compared the generated HTML of a simple 
class across 11, 12 and 13, it surprised me how much changes. So I think we 
want to avoid parsing their HTML.

Javadoc's {{Xdoclint}} feature has a "missing checker", but it is black/white: 
either everything is fully documented or it's not. And while you can 
enable/disable doclint checks per-package, this also seems black/white (either 
all checks or no checks at all).

On the other hand, the python checker is able to check per-package at 
different granularities (package, class, method). That makes it possible to 
iteratively improve the situation.

With the doclet API we could implement checks in pure Java, for example to 
match the checkJavaDocs.py logic:

{code}
  private void checkComment(Element element) {
    var tree = docTrees.getDocCommentTree(element);
    if (tree == null) {
      error(element, "javadocs are missing");
    } else {
      var normalized = tree.getFirstSentence().get(0).toString()
                           .replace('\u00A0', ' ')
                           .trim()
                           .toLowerCase(Locale.ROOT);
      if (normalized.isEmpty()) {
        error(element, "blank javadoc comment");
      } else if (normalized.startsWith("licensed to the apache software foundation") ||
                 normalized.startsWith("copyright 2004 the apache software foundation")) {
        error(element, "comment is really a license");
      }
    }
  }
{code}

If there are problems then they just appear as errors from the output of 
{{javadoc}} like usual:
{noformat}
javadoc: error - org.apache.lucene.nodoc (package): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNearQuery.java:190:
 error - SpanNearWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanContainingQuery.java:54:
 error - SpanContainingWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanWithinQuery.java:55:
 error - SpanWithinWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanTermQuery.java:94:
 error - SpanTermWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNotQuery.java:109:
 error - SpanNotWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanOrQuery.java:139:
 error - SpanOrWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanPositionCheckQuery.java:77:
 error - SpanPositionCheckWeight (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:61:
 error - Collectors (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:89:
 error - LeafCollectors (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:353:
 error - PagedBytesDataOutput (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:285:
 error - PagedBytesDataInput (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/EmptyDoc.java:22:
 error - EmptyDoc (class): javadocs are missing
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/LicenseDoc.java:36:
 error - LicenseDoc (class): comment is really a license
/home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/NoDoc.java:19:
 error - NoDoc (class): javadocs are missing

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':lucene:core:javadoc'.
> Javadoc generation failed. Generated Javadoc options file (useful for 
> troubleshooting): 
> '/home/rmuir/workspace/lucene-solr/lucene/core/build/tmp/javadoc/javadoc.options'
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9215) replace checkJavaDocs.py with doclet

2020-02-08 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9215:

Attachment: LUCENE-9215_prototype.patch

> replace checkJavaDocs.py with doclet
> 
>
> Key: LUCENE-9215
> URL: https://issues.apache.org/jira/browse/LUCENE-9215
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9215_prototype.patch
>
>
> The current checker runs regular expressions against HTML, and it breaks 
> when newer Java versions change the HTML output. This is not particularly 
> fun to fix: see LUCENE-9213.
> Java releases often now, and when I compared the generated HTML of a simple 
> class across 11, 12 and 13, it surprised me how much changes. So I think we 
> want to avoid parsing their HTML.
> Javadoc's {{Xdoclint}} feature has a "missing checker", but it is 
> black/white: either everything is fully documented or it's not. And while 
> you can enable/disable doclint checks per-package, this also seems 
> black/white (either all checks or no checks at all).
> On the other hand, the python checker is able to check per-package at 
> different granularities (package, class, method). That makes it possible to 
> iteratively improve the situation.
> With the doclet API we could implement checks in pure Java, for example to 
> match the checkJavaDocs.py logic:
> {code}
>   private void checkComment(Element element) {
>     var tree = docTrees.getDocCommentTree(element);
>     if (tree == null) {
>       error(element, "javadocs are missing");
>     } else {
>       var normalized = tree.getFirstSentence().get(0).toString()
>                            .replace('\u00A0', ' ')
>                            .trim()
>                            .toLowerCase(Locale.ROOT);
>       if (normalized.isEmpty()) {
>         error(element, "blank javadoc comment");
>       } else if (normalized.startsWith("licensed to the apache software foundation") ||
>                  normalized.startsWith("copyright 2004 the apache software foundation")) {
>         error(element, "comment is really a license");
>       }
>     }
>   }
> {code}
> If there are problems then they just appear as errors from the output of 
> {{javadoc}} like usual:
> {noformat}
> javadoc: error - org.apache.lucene.nodoc (package): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNearQuery.java:190:
>  error - SpanNearWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanContainingQuery.java:54:
>  error - SpanContainingWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanWithinQuery.java:55:
>  error - SpanWithinWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanTermQuery.java:94:
>  error - SpanTermWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanNotQuery.java:109:
>  error - SpanNotWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanOrQuery.java:139:
>  error - SpanOrWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/spans/SpanPositionCheckQuery.java:77:
>  error - SpanPositionCheckWeight (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:61:
>  error - Collectors (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/search/MultiCollectorManager.java:89:
>  error - LeafCollectors (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:353:
>  error - PagedBytesDataOutput (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/util/PagedBytes.java:285:
>  error - PagedBytesDataInput (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/EmptyDoc.java:22:
>  error - EmptyDoc (class): javadocs are missing
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/LicenseDoc.java:36:
>  error - LicenseDoc (class): comment is really a license
> /home/rmuir/workspace/lucene-solr/lucene/core/src/java/org/apache/lucene/nodoc/NoDoc.java:19:
>  error - NoDoc (class): javadocs are missing
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':lucene:core:javadoc'.
> > Javadoc generation failed. Generated Javadoc options file (useful for 
> > 

[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2020-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032910#comment-17032910
 ] 

ASF subversion and git services commented on LUCENE-8279:
-

Commit f41eabdc5fa091079b83cdc7813cdcfb05dfbf46 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f41eabd ]

LUCENE-8279: fix javadocs wrong header levels and accessibility issues

Java 13 adds a new doclint check under "accessibility" verifying that the html
header nesting level isn't crazy.

Many are incorrect because the html4-style javadocs had horrible
font sizes, so developers used the wrong header level to work around it.
This is not an issue in trunk (always html5).

Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
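
For illustration, a hypothetical before/after of such a shift (not taken from 
the commit itself):

{code}
// Before: a heading inside the comment clashes with the <h1> that the
// standard doclet already generates for the page, and skips levels.
/**
 * ...
 * <h1>Term Dictionary</h1>
 */

// After: headings inside the comment start below the generated one and
// nest in sequence.
/**
 * ...
 * <h2>Term Dictionary</h2>
 * <h3>Term Metadata</h3>
 */
{code}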


> Improve CheckIndex on norms
> ---
>
> Key: LUCENE-8279
> URL: https://issues.apache.org/jira/browse/LUCENE-8279
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.0
>
> Attachments: LUCENE-8279.patch, LUCENE-8279.patch
>
>
> We should improve CheckIndex to make sure that terms and norms agree on which 
> documents have a value on an indexed field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8729) Java 13: Fix Javadocs (accessibility) issues

2020-02-08 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-8729.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

The header levels are corrected in master and the check is re-enabled.

> Java 13: Fix Javadocs (accessibility) issues
> 
>
> Key: LUCENE-8729
> URL: https://issues.apache.org/jira/browse/LUCENE-8729
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/javadocs
>Reporter: Uwe Schindler
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8279.patch, LUCENE-8729-workaround.patch
>
>
> On Policeman Jenkins I installed a preview release of JDK 13. The Oracle 
> supplied one does not yet have the issue, but nightly builds by Alexej 
> Shipilev contain a patch that does additional checks on Javadoc comments when 
> doclint is enabled, so the next OpenJDK builds by Oracle will likely have the 
> same issue. It already fails in "javac". The output is: 
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/275/consoleText
> The problem is HTML headings (like "H1" inside javadoc comments clashing with 
> the "H1" generated by the Javadoc output, or "H3" without "H2"); in JDK 11 
> there is already a note in the Javadoc spec 
> (https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html): 
> "When writing documentation comments for members, it is best not to use HTML 
> heading tags such as <h1> and <h2>, because the standard doclet creates an 
> entire structured document, and these structural tags might interfere with 
> the formatting of the generated document."
> The error is the following:
> {noformat}
> [mkdir] Created dir: 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] Compiling 868 source files to 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java:98:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Term Dictionary
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/lucene80/package-info.java:21:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Apache Lucene - Index File Formats
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:41:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Basic Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:68:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Geospatial Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:78:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Advanced usage
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/search/Sort.java:37:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Valid Types of Values
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/util/packed/package-info.java:34:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * In-memory structures
> [javac]^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 7 errors
> {noformat}
> I think we should fix this and maybe not use headings at all (as suggested 
> in the spec), or fix them to be at least correct. Some hints to issues in 
> the latest JDK docs: https://bugs.openjdk.java.net/browse/JDK-8220379
> Not sure about doclint in general; I'll ask on the mailing lists how this 
> affects 3rd-party code!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9214) enable documentation-lint on EA versions (14 and 15)

2020-02-08 Thread Robert Muir (Jira)
Robert Muir created LUCENE-9214:
---

 Summary: enable documentation-lint on EA versions (14 and 15)
 Key: LUCENE-9214
 URL: https://issues.apache.org/jira/browse/LUCENE-9214
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


I think we should add them to the supported list. Why not detect/report issues 
we find? We can always disable one if there is a particular problem.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8729) Java 13: Fix Javadocs (accessibility) issues

2020-02-08 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032907#comment-17032907
 ] 

Robert Muir commented on LUCENE-8729:
-

Attached patch. I "shifted" heading levels in the impacted documents to be 
correct.

There was a missing table caption.
In some cases the {{h1}} around the first sentence was removed, as javadoc 
already uses a breakiterator and does special stuff with it (e.g. inserts it 
into summaries).

Nothing very exciting. I will test documentation-lint with various JDK 
versions so we are sure it doesn't break builds.

> Java 13: Fix Javadocs (accessibility) issues
> 
>
> Key: LUCENE-8729
> URL: https://issues.apache.org/jira/browse/LUCENE-8729
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/javadocs
>Reporter: Uwe Schindler
>Priority: Major
> Attachments: LUCENE-8279.patch, LUCENE-8729-workaround.patch
>
>
> On Policeman Jenkins I installed a preview release of JDK 13. The Oracle 
> supplied one does not yet have the issue, but nightly builds by Alexej 
> Shipilev contain a patch that does additional checks on Javadoc comments when 
> doclint is enabled, so the next OpenJDK builds by Oracle will likely have the 
> same issue. It already fails in "javac". The output is: 
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/275/consoleText
> The problem is HTML headings (like "H1" inside javadoc comments clashing with 
> the "H1" generated by the Javadoc output, or "H3" without "H2"); in JDK 11 
> there is already a note in the Javadoc spec 
> (https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html): 
> "When writing documentation comments for members, it is best not to use HTML 
> heading tags such as <h1> and <h2>, because the standard doclet creates an 
> entire structured document, and these structural tags might interfere with 
> the formatting of the generated document."
> The error is the following:
> {noformat}
> [mkdir] Created dir: 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] Compiling 868 source files to 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java:98:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Term Dictionary
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/lucene80/package-info.java:21:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Apache Lucene - Index File Formats
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:41:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Basic Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:68:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Geospatial Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:78:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Advanced usage
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/search/Sort.java:37:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Valid Types of Values
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/util/packed/package-info.java:34:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * In-memory structures
> [javac]^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 7 errors
> {noformat}
> I think we should fix this and maybe not use headings at all (as suggested 
> in the spec), or fix them to be at least correct. Some hints to issues in 
> the latest JDK docs: https://bugs.openjdk.java.net/browse/JDK-8220379
> Not sure about doclint in general; I'll ask on the mailing lists how this 
> affects 3rd-party code!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, 

[jira] [Updated] (LUCENE-8729) Java 13: Fix Javadocs (accessibility) issues

2020-02-08 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-8729:

Attachment: LUCENE-8279.patch

> Java 13: Fix Javadocs (accessibility) issues
> 
>
> Key: LUCENE-8729
> URL: https://issues.apache.org/jira/browse/LUCENE-8729
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/javadocs
>Reporter: Uwe Schindler
>Priority: Major
> Attachments: LUCENE-8279.patch, LUCENE-8729-workaround.patch
>
>
> On Policeman Jenkins I installed a preview release of JDK 13. The Oracle 
> supplied one does not yet have the issue, but nightly builds by Alexej 
> Shipilev contain a patch that does additional checks on Javadoc comments when 
> doclint is enabled, so the next OpenJDK builds by Oracle will likely have the 
> same issue. It already fails in "javac". The output is: 
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/275/consoleText
> The problem is HTML headings (like "H1" inside javadoc comments clashing with 
> the "H1" generated by the Javadoc output, or "H3" without "H2"); in JDK 11 
> there is already a note in the Javadoc spec 
> (https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html): 
> "When writing documentation comments for members, it is best not to use HTML 
> heading tags such as <h1> and <h2>, because the standard doclet creates an 
> entire structured document, and these structural tags might interfere with 
> the formatting of the generated document."
> The error is the following:
> {noformat}
> [mkdir] Created dir: 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] Compiling 868 source files to 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java:98:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Term Dictionary
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/lucene80/package-info.java:21:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Apache Lucene - Index File Formats
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:41:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Basic Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:68:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Geospatial Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:78:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Advanced usage
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/search/Sort.java:37:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Valid Types of Values
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/util/packed/package-info.java:34:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * In-memory structures
> [javac]^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 7 errors
> {noformat}
> I think we should fix this and maybe not use headings at all (as suggested 
> in the spec), or fix them to be at least correct. Some hints to issues in 
> the latest JDK docs: https://bugs.openjdk.java.net/browse/JDK-8220379
> Not sure about doclint in general; I'll ask on the mailing lists how this 
> affects 3rd-party code!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8729) Java 13: Fix Javadocs (accessibility) issues

2020-02-08 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032906#comment-17032906
 ] 

Robert Muir commented on LUCENE-8729:
-

I am working on it. The problems are mostly caused by horrible style issues 
with larger headings in the old javadocs format: it would make the font size 
so absurdly big that people would use the wrong heading level. But trunk only 
generates html5 javadocs, so we can fix it. The checker makes it trivial.

> Java 13: Fix Javadocs (accessibility) issues
> 
>
> Key: LUCENE-8729
> URL: https://issues.apache.org/jira/browse/LUCENE-8729
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/javadocs
>Reporter: Uwe Schindler
>Priority: Major
> Attachments: LUCENE-8729-workaround.patch
>
>
> On Policeman Jenkins I installed a preview release of JDK 13. The Oracle 
> supplied one does not yet have the issue, but nightly builds by Alexej 
> Shipilev contain a patch that does additional checks on Javadoc comments when 
> doclint is enabled, so the next OpenJDK builds by Oracle will likely have the 
> same issue. It already fails in "javac". The output is: 
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/275/consoleText
> The problem is HTML headings (like "H1" inside javadoc comments clashing with 
> the "H1" generated by the Javadoc output, or "H3" without "H2"); in JDK 11 
> there is already a note in the Javadoc spec 
> (https://docs.oracle.com/en/java/javase/11/docs/specs/doc-comment-spec.html): 
> "When writing documentation comments for members, it is best not to use HTML 
> heading tags such as <h1> and <h2>, because the standard doclet creates an 
> entire structured document, and these structural tags might interfere with 
> the formatting of the generated document."
> The error is the following:
> {noformat}
> [mkdir] Created dir: 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] Compiling 868 source files to 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/build/core/classes/java
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java:98:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Term Dictionary
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/codecs/lucene80/package-info.java:21:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Apache Lucene - Index File Formats
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:41:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Basic Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:68:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Geospatial Point Types
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/index/PointValues.java:78:
>  error: unexpected heading used: , compared to implicit preceding 
> heading: 
> [javac]  * Advanced usage
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/search/Sort.java:37:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * Valid Types of Values
> [javac]^
> [javac] 
> /home/jenkins/workspace/Lucene-Solr-8.x-Linux/lucene/core/src/java/org/apache/lucene/util/packed/package-info.java:34:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [javac]  * In-memory structures
> [javac]^
> [javac] Note: Some input files use or override a deprecated API.
> [javac] Note: Recompile with -Xlint:deprecation for details.
> [javac] 7 errors
> {noformat}
> I think we should fix this and maybe not use headings at all (as suggested 
> in the spec), or fix them to be at least correct. Some hints to issues in 
> the latest JDK docs: https://bugs.openjdk.java.net/browse/JDK-8220379
> Not sure about doclint in general; I'll ask on the mailing lists how this 
> affects 3rd-party code!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9201) Port documentation-lint task to Gradle build

2020-02-08 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032905#comment-17032905
 ] 

Robert Muir commented on LUCENE-9201:
-

If I remember correctly, the problem with "just" converting to 
package-info.java is the Lucene packages split across modules; something goes 
bad (I think doc links, among other issues). I will at least assess how bad 
that is to fix (I am not really sure), so we fight the tooling less. A major 
release is the time to fix it.

> Port documentation-lint task to Gradle build
> 
>
> Key: LUCENE-9201
> URL: https://issues.apache.org/jira/browse/LUCENE-9201
> Project: Lucene - Core
>  Issue Type: Sub-task
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: javadocGRADLE.png, javadocHTML4.png, javadocHTML5.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ant build's "documentation-lint" target consists of these two sub-targets:
>  * "-ecj-javadoc-lint" (Javadoc linting by ECJ)
>  * "-documentation-lint" (missing javadocs / broken links check by python 
> scripts)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14245) Validate Replica / ReplicaInfo on creation

2020-02-08 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14245.
-
Resolution: Fixed

> Validate Replica / ReplicaInfo on creation
> --
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable, and their fields should be 
> validated on creation.
> Some users reported that, very rarely, during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write 
> incomplete replica infos to ZK (e.g. node_name = null).
> This problem is difficult to reproduce, but we should add safeguards anyway 
> to prevent writing such corrupted replica info to ZK.
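
A minimal sketch of the kind of creation-time validation being proposed 
(hypothetical; the field set and messages are illustrative, not the actual 
patch):

{code}
import java.util.Objects;

// Hypothetical sketch: reject incomplete replica infos at construction time,
// so nothing with e.g. node_name = null can ever be written to ZK.
public final class ReplicaInfo {
  private final String name;
  private final String nodeName;
  private final String collection;

  public ReplicaInfo(String name, String nodeName, String collection) {
    this.name = Objects.requireNonNull(name, "name must not be null");
    this.nodeName = Objects.requireNonNull(nodeName, "node_name must not be null");
    this.collection = Objects.requireNonNull(collection, "collection must not be null");
  }
}
{code}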



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org