[GitHub] [lucene-solr] dsmiley commented on pull request #1574: SOLR-14566: Log NOW value on coordinator node for dist search requests

2020-06-13 Thread GitBox


dsmiley commented on pull request #1574:
URL: https://github.com/apache/lucene-solr/pull/1574#issuecomment-643716917


   +1 clever and useful



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-13 Thread GitBox


janhoy commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439772292



##
File path: solr/core/src/java/org/apache/solr/core/SolrPaths.java
##
@@ -128,4 +130,33 @@ private static void logOnceInfo(String key, String msg) {
   log.info(msg);
 }
   }
+
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element. The following paths will fail validation
+   * 
+   *   Relative paths starting with ..
+   *   Windows UNC paths (\\host\share\path)
+   *   Absolute paths which are not below the list of allowed paths
+   * 
+   * @param pathToAssert path to check
+   * @param allowPaths list of paths that should be allowed prefixes
+   * @throws SolrException if path is outside allowed paths
+   */
+  public static void assertPathAllowed(Path pathToAssert, Set<Path> allowPaths) throws SolrException {
+    if (OS.isFamilyWindows() && pathToAssert.toString().startsWith("\\\\")) {

Review comment:
   I have not tested this on Windows. On my Mac, the `Path` class uses a macOS 
implementation, so I think it will not detect a UNC-style path: it cannot 
normalize the path or make it absolute. I therefore scoped the check to Windows 
only. I test the string form before normalizing, since `normalize()` may mess 
up UNC paths.
   
   I decided to block UNC paths entirely instead of trying to be smart about 
it. Users can always map a drive letter to the desired share to work around it?
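
The validation order described in this comment and in the javadoc above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `PathGuard`, the `isWindows` flag (passed in so the UNC branch can be exercised on any OS) and the use of `IllegalArgumentException` in place of `SolrException` are all assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

/** Hypothetical stand-in for the validation discussed above; not the code from the PR. */
final class PathGuard {
  private PathGuard() {}

  /**
   * @param rawPath    path as supplied by the caller, before any normalization
   * @param allowPaths absolute, normalized directories that absolute paths must live under
   * @param isWindows  passed in explicitly (an assumption of this sketch) instead of probing the OS
   */
  static void assertPathAllowed(String rawPath, Set<Path> allowPaths, boolean isWindows) {
    // Test the raw string first: normalizing may change how a UNC path looks.
    if (isWindows && rawPath.startsWith("\\\\")) {
      throw new IllegalArgumentException("UNC paths are not supported: " + rawPath);
    }
    Path path = Paths.get(rawPath).normalize();
    // After normalize(), a relative path that escapes upwards still begins with "..".
    if (path.startsWith("..")) {
      throw new IllegalArgumentException("Path must not start with '..': " + rawPath);
    }
    // Remaining relative paths are accepted as-is.
    if (!path.isAbsolute()) {
      return;
    }
    // Absolute paths must sit below one of the allowed prefixes.
    for (Path allowed : allowPaths) {
      if (path.startsWith(allowed)) {
        return;
      }
    }
    throw new IllegalArgumentException("Path is outside the allowed paths: " + rawPath);
  }
}
```

`Path.startsWith` compares whole name elements, so an allowed prefix of `/var/solr` would not accidentally admit `/var/solr2`.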








[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-13 Thread GitBox


janhoy commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439772139



##
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##
@@ -1259,6 +1277,20 @@ public SolrCore create(String coreName, Path 
instancePath, Map p
 }
   }
 
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element.
+   * @param path path to check
+   * @throws SolrException if path is outside allowed paths
+   */
+  public void assertPathAllowed(Path path) throws SolrException {
+if (path.normalize().equals(path) && !path.isAbsolute()) return;

Review comment:
   I pushed a commit addressing this. Quite a few changes, but we now 
detect `..` specifically.








[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134945#comment-17134945
 ] 

ASF subversion and git services commented on SOLR-13169:


Commit 396490b65ca1af6ff1f1157a9896c9528c234eea in lucene-solr's branch 
refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=396490b ]

SOLR-13169 Improve docs for MOVEREPLICA - better parity with ref guide for v2 
api descriptions


> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
>   "type": "object",
>   "documentation": "https://lucene.apache.org/solr/guide/collections-api.html#movereplica",
>   "description": "This command moves a replica from one node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` may be reused.",
>   "properties": {
>     "replica": {
>       "type": "string",
>       "description": "The name of the replica"
>     },
>     "shard": {
>       "type": "string",
>       "description": "The name of the shard"
>     },
>     "sourceNode": {
>       "type": "string",
>       "description": "The name of the node that contains the replica."
>     },
>     "targetNode": {
>       "type": "string",
>       "description": "The name of the destination node. This parameter is required."
>     },
>     "waitForFinalState": {
>       "type": "boolean",
>       "default": "false",
>       "description": "Wait for the moved replica to become active."
>     },
>     "timeout": {
>       "type": "integer",
>       "default": 600,
>       "description": "Timeout to wait for replica to become active. For very large replicas this may need to be increased."
>     },
>     "inPlaceMove": {
>       "type": "boolean",
>       "default": "true",
>       "description": "For replicas that use shared filesystems allow 'in-place' move that reuses shared data."
>     }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.
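
For orientation, a V1 MOVEREPLICA call is a plain HTTP request against the Collections API with the parameters listed above. The sketch below only assembles such a URL with JDK classes. The class name `MoveReplicaUrl`, the host, and the collection, replica and node names in the usage note are placeholders, and the choice to send only `collection`, `replica` and `targetNode` follows the quoted remark that `shard` is ignored when `replica` is given. Which parameters are truly required is exactly what this issue is trying to pin down.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a V1 MOVEREPLICA request URL; it sends nothing. */
final class MoveReplicaUrl {
  private MoveReplicaUrl() {}

  static String build(String solrBaseUrl, String collection, String replica, String targetNode) {
    // LinkedHashMap keeps the parameters in insertion order for a predictable URL.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("action", "MOVEREPLICA");
    params.put("collection", collection);
    params.put("replica", replica);
    params.put("targetNode", targetNode);

    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      // Node names contain ':' which must be percent-encoded in a query string.
      query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return solrBaseUrl + "/admin/collections?" + query;
  }
}
```

For example, `MoveReplicaUrl.build("http://localhost:8983/solr", "test", "core_node6", "localhost:7574_solr")` yields a URL ending in `action=MOVEREPLICA&collection=test&replica=core_node6&targetNode=localhost%3A7574_solr`.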



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-13169:

Attachment: SOLR-13169.patch







[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134939#comment-17134939
 ] 

Gus Heck commented on SOLR-13169:
-

Corrections from another read-through, plus documentation for other parameters. 
I'm choosing not to document `waitForFinalState` at this time because it's 
unclear what value it has. This command already waits for the add command to 
complete, and making the add command wait/block on its own doesn't seem useful 
(alternatively, my understanding of that parameter is flawed and I shouldn't 
write it into the docs). I opened SOLR-14568, which may change the docs for 
timeout slightly. This turned into a lot more than originally anticipated, so 
I'm attaching a patch summarizing the changes to the ref guide in case that 
helps folks look over what I've done. Given no objections, I'll port whatever 
applies to 8.x next weekend (and address any objections that do come up). 







[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134936#comment-17134936
 ] 

ASF subversion and git services commented on SOLR-13169:


Commit b00d747eb6a94ab5775258b032e621f998ec44ba in lucene-solr's branch 
refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b00d747 ]

SOLR-13169 Improve docs for MOVEREPLICA - document additional existing 
parameters, second pass fixing spelling and other details.


> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: screenshot-1.png, testing.txt
>
>






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1573: Cleanup TermsHashPerField

2020-06-13 Thread GitBox


mikemccand commented on a change in pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573#discussion_r439751347



##
File path: 
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
##
@@ -207,8 +202,6 @@ public void newPostingsArray() {
 
   @Override
   ParallelPostingsArray createPostingsArray(int size) {
-IndexOptions indexOptions = fieldInfo.getIndexOptions();
-assert indexOptions != IndexOptions.NONE;

Review comment:
   Hmm why not keep this assertion (to confirm that if the field is not 
somehow indexed we are not accidentally/incorrectly running this code)?

##
File path: 
lucene/core/src/java/org/apache/lucene/index/ParallelPostingsArray.java
##
@@ -22,14 +22,14 @@
   final static int BYTES_PER_POSTING = 3 * Integer.BYTES;
 
   final int size;
-  final int[] textStarts;
-  final int[] intStarts;
-  final int[] byteStarts;
+  final int[] textStarts; // maps term ID to the terms text start in the bytesHash
+  final int[] addressOffset; // maps term ID to current stream address

Review comment:
   +1 for this renaming!

##
File path: lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java
##
@@ -19,203 +19,207 @@
 
 import java.io.IOException;
 
-import org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute;
-import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
 import org.apache.lucene.util.ByteBlockPool;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefHash.BytesStartArray;
 import org.apache.lucene.util.BytesRefHash;
 import org.apache.lucene.util.Counter;
 import org.apache.lucene.util.IntBlockPool;
 
+/**
+ * This class allows to store streams of information per term without knowing
+ * the size of the stream ahead of time. Each stream typically encodes one 
level
+ * of information like term frequency per document or term proximity. 
Internally
+ * this class allocates a linked list of slices that can be read by a {@link 
ByteSliceReader}
+ * for each term. Terms are first deduplicated in a {@link BytesRefHash} once 
this is done
+ * internal data-structures point to the current offset of each stream that 
can be written to.
+ */
 abstract class TermsHashPerField implements Comparable {
   private static final int HASH_INIT_SIZE = 4;
 
-  final TermsHash termsHash;
-
-  final TermsHashPerField nextPerField;
-  protected final DocumentsWriterPerThread.DocState docState;
-  protected final FieldInvertState fieldState;
-  TermToBytesRefAttribute termAtt;
-  protected TermFrequencyAttribute termFreqAtt;
-
-  // Copied from our perThread
-  final IntBlockPool intPool;
+  private final TermsHashPerField nextPerField;
+  private final IntBlockPool intPool;
   final ByteBlockPool bytePool;
-  final ByteBlockPool termBytePool;
-
-  final int streamCount;
-  final int numPostingInt;
-
-  protected final FieldInfo fieldInfo;
-
-  final BytesRefHash bytesHash;
+  // for each term we store an integer per stream that points into the bytePool above
+  // the address is updated once data is written to the stream to point to the next free offset
+  // this the terms stream. The start address for the stream is stored in postingsArray.byteStarts[termId]
+  // This is initialized in the #addTerm method, either to a brand new per term stream if the term is new or
+  // to the addresses where the term stream was written to when we saw it the last time.
+  private int[] termStreamAddressBuffer;
+  private int streamAddressOffset;
+  private final int streamCount;
+  private final String fieldName;
+  final IndexOptions indexOptions;
+  /* This stores the actual term bytes for postings and offsets into the parent hash in the case that this
+  * TermsHashPerField is hashing term vectors.*/
+  private final BytesRefHash bytesHash;
 
   ParallelPostingsArray postingsArray;
-  private final Counter bytesUsed;
+  private int lastDocID; // only with assert
 
   /** streamCount: how many streams this field stores per term.
* E.g. doc(+freq) is 1 stream, prox+offset is a second. */
-
-  public TermsHashPerField(int streamCount, FieldInvertState fieldState, TermsHash termsHash, TermsHashPerField nextPerField, FieldInfo fieldInfo) {
-intPool = termsHash.intPool;
-bytePool = termsHash.bytePool;
-termBytePool = termsHash.termBytePool;
-docState = termsHash.docState;
-this.termsHash = termsHash;
-bytesUsed = termsHash.bytesUsed;
-this.fieldState = fieldState;
+  TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool,
+                    Counter bytesUsed, TermsHashPerField nextPerField, String fieldName, IndexOptions indexOptions) {
+this.intPool = intPool;
+this.bytePool = bytePool;
 this.streamCount = streamCount;
-numPostingInt = 2*streamCount;
-this.fieldInfo = fieldInfo;
+this.fieldName = fieldName;
 this.nextPerField =
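
The javadoc added to TermsHashPerField in this diff describes two pieces of bookkeeping: terms are deduplicated to a term ID, and each term ID owns a fixed number of append-only streams whose final size is not known up front. A deliberately simplified toy version of that idea is sketched below. It is not how Lucene implements it: Lucene packs all streams into shared block pools as linked slices, while this sketch uses a `HashMap` and one `ByteArrayOutputStream` per stream, and the class name `PerTermStreams` and its methods are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "N growable streams per deduplicated term"; not Lucene's data layout. */
final class PerTermStreams {
  private final int streamCount;
  // Stands in for the term deduplication step: term text -> term ID.
  private final Map<String, Integer> termIds = new HashMap<>();
  // streams.get(termId)[stream] is the growable stream for that term.
  private final List<ByteArrayOutputStream[]> streams = new ArrayList<>();

  PerTermStreams(int streamCount) {
    this.streamCount = streamCount;
  }

  /** Returns the existing ID for a known term, or allocates fresh streams for a new one. */
  int add(String term) {
    Integer existing = termIds.get(term);
    if (existing != null) {
      return existing;
    }
    int termId = streams.size();
    ByteArrayOutputStream[] perTerm = new ByteArrayOutputStream[streamCount];
    for (int i = 0; i < streamCount; i++) {
      perTerm[i] = new ByteArrayOutputStream();
    }
    streams.add(perTerm);
    termIds.put(term, termId);
    return termId;
  }

  /** Appends one byte to the given stream of the given term. */
  void writeByte(int termId, int stream, int b) {
    streams.get(termId)[stream].write(b);
  }

  /** Returns a copy of everything written so far to one stream of one term. */
  byte[] read(int termId, int stream) {
    return streams.get(termId)[stream].toByteArray();
  }
}
```

With `streamCount` set to 2, stream 0 could hold doc/freq data and stream 1 positions, which mirrors the "doc(+freq) is 1 stream, prox+offset is a second" comment in the diff.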

[GitHub] [lucene-solr] msokolov commented on pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-06-13 Thread GitBox


msokolov commented on pull request #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-643654018


   This is so close! just needs precommit fix I think?






[jira] [Created] (SOLR-14568) org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout

2020-06-13 Thread Gus Heck (Jira)
Gus Heck created SOLR-14568:
---

 Summary: org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses 
hard coded timeout
 Key: SOLR-14568
 URL: https://issues.apache.org/jira/browse/SOLR-14568
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: master (9.0)
Reporter: Gus Heck


org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica gained a hardcoded timeout 
in SOLR-11045 but there is no clear reason discussed in that ticket and no 
comment in the code to indicate why it is ignoring the value of the timeout 
parameter already passed into that method. 

This should be clarified in code and documented ([~caomanhdat]?) or the timeout 
parameter should be supported. It sure seems like we should support the API 
parameter, but from the pattern of commits this looks potentially intentional 
and has survived several revisions, so I hesitate to just change it without 
input/confirmation. If this can be clarified soon, I'll document the result 
in SOLR-13169; otherwise I'll just document the state as it is, and the docs 
can be updated if there are changes resulting from this ticket.
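
To make the concern concrete: the pattern in question is a method that accepts a timeout but then waits on a fixed value. The sketch below shows, in generic form, what honoring the caller's value looks like. It is not the `MoveReplicaCmd` code; `TimeoutSketch`, `waitFor` and the 50 ms polling interval are invented for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Generic polling helper that respects the timeout it is given; hypothetical, not Solr code. */
final class TimeoutSketch {
  private TimeoutSketch() {}

  /** Polls the condition until it holds or timeoutSeconds have elapsed. */
  static boolean waitFor(BooleanSupplier condition, long timeoutSeconds) {
    // The deadline is derived from the parameter rather than from a constant.
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

If the fixed wait turns out to be intentional, the alternative raised in the ticket is to keep it and say so in a code comment and in the docs.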






[jira] [Updated] (SOLR-14567) Fix or suppress remaining warnings in solrj

2020-06-13 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-14567:
--
Summary: Fix or suppress remaining warnings in solrj  (was: Fix or suppress 
up remaining warnings in solrj)

> Fix or suppress remaining warnings in solrj
> ---
>
> Key: SOLR-14567
> URL: https://issues.apache.org/jira/browse/SOLR-14567
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> This is another place where the number of warnings per directory is getting 
> too small to do individually, so I'll do them all in a bunch.
> Note: this will exclude autoscaling.






[jira] [Reopened] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice

2020-06-13 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reopened SOLR-14417:
-

> Gradle build sometimes fails RE BlockPoolSlice
> --
>
> Key: SOLR-14417
> URL: https://issues.apache.org/jira/browse/SOLR-14417
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: David Smiley
>Priority: Minor
>
> There seems to be some package visibility hacks around our Hdfs integration:
> {{/Users/dsmiley/SearchDev/lucene-solr/solr/core/src/test/org/apache/solr/cloud/hdfs/HdfsTestUtil.java:125:
>  error: BlockPoolSlice is not public in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl; cannot be accessed 
> from outside package}}
> {{List<Class<?>> modifiedHadoopClasses = Arrays.asList(BlockPoolSlice.class, 
> DiskChecker.class,}}
> This happens on my Gradle build when running {{gradlew testClasses}} (i.e. to 
> compile tests) but Ant proceeded without issue.  The work-around is to run 
> {{gradlew clean}} first but really I want our build to be smarter here.
> CC [~krisden]






[jira] [Commented] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice

2020-06-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134845#comment-17134845
 ] 

Gus Heck commented on SOLR-14417:
-

I just hit this when running a test via IntelliJ (which is using Gradle). My 
IDE tells me we have our own version of this class that is public, but when I 
search classes in IntelliJ, it shows me that it can find both our version and a 
version of the class in hadoop-hdfs-3.2.0.jar, the latter of which is not 
public. This appears to be a classpath ordering inconsistency...







[jira] [Commented] (SOLR-14564) Fix or suppress remaining warnings in solr/core

2020-06-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134830#comment-17134830
 ] 

ASF subversion and git services commented on SOLR-14564:


Commit 65e34449d12ed54abea8648a40b5c3e66e33bbca in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=65e3444 ]

SOLR-14564: Fix or suppress remaining warnings in solr/core


> Fix or suppress remaining warnings in solr/core
> ---
>
> Key: SOLR-14564
> URL: https://issues.apache.org/jira/browse/SOLR-14564
> Project: Solr
>  Issue Type: Sub-task
> Environment: It's getting to the point where the overhead of cleaning 
> up individual directories is getting to be a pain. So this will be 2-3 
> commits of fixes in whatever order I find them when compiling in IntelliJ.
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>







[jira] [Resolved] (SOLR-14564) Fix or suppress remaining warnings in solr/core

2020-06-13 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14564.
---
Fix Version/s: 8.6
   Resolution: Fixed

They said it couldn't be done! On to the last two JIRAs for SolrJ warnings.

> Fix or suppress remaining warnings in solr/core
> ---
>
> Key: SOLR-14564
> URL: https://issues.apache.org/jira/browse/SOLR-14564
> Project: Solr
>  Issue Type: Sub-task
> Environment: It's getting to the point where the overhead of cleaning 
> up individual directories is getting to be a pain. So this will be 2-3 
> commits of fixes in whatever order I find them when compiling in IntelliJ.
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>







[jira] [Commented] (SOLR-14564) Fix or suppress remaining warnings in solr/core

2020-06-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134829#comment-17134829
 ] 

ASF subversion and git services commented on SOLR-14564:


Commit a41aa20b0afaadf47ec6e58476a947c6936c1921 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a41aa20 ]

SOLR-14564: Fix or suppress remaining warnings in solr/core


> Fix or suppress remaining warnings in solr/core
> ---
>
> Key: SOLR-14564
> URL: https://issues.apache.org/jira/browse/SOLR-14564
> Project: Solr
>  Issue Type: Sub-task
> Environment: It's getting to the point where the overhead of cleaning 
> up individual directories is getting to be a pain. So this will be 2-3 
> commits of fixes in whatever order I find them when compiling in IntelliJ.
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>







[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-13 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134827#comment-17134827
 ] 

Jun Ohtani commented on LUCENE-9390:


I've made a pull request. 

https://github.com/apache/lucene-solr/pull/1577

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuation and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  






[GitHub] [lucene-solr] johtani opened a new pull request #1577: LUCENE-9390: JapaneseTokenizer discards token that is all punctuation characters only

2020-06-13 Thread GitBox


johtani opened a new pull request #1577:
URL: https://github.com/apache/lucene-solr/pull/1577


   # Description
   
   When the discard-punctuation flag is true, check each token and omit it only 
if all of its characters are punctuation.
   Currently, JapaneseTokenizer discards a token whenever its first character 
is punctuation.
   
   # Solution
   
   Add an isAllPunctuation method for testing tokens.
   
   # Tests
   
   Ensure that a token is discarded if all of its characters are punctuation,
and that a token is not discarded if it starts with a punctuation character but 
also contains non-punctuation characters.
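
   The difference between the two checks can be sketched as below. This is only an illustration under assumptions, not the code in this pull request: the class name `PunctuationSketch` is hypothetical, and the set of Unicode categories treated as punctuation here is a simplified guess that may differ from what the tokenizer actually uses.

```java
final class PunctuationSketch {

  // Simplified stand-in for a per-character punctuation test. The category
  // list is an assumption made for this sketch.
  static boolean isPunctuation(char ch) {
    switch (Character.getType(ch)) {
      case Character.SPACE_SEPARATOR:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.CONNECTOR_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
      case Character.MATH_SYMBOL:
      case Character.CURRENCY_SYMBOL:
      case Character.MODIFIER_SYMBOL:
      case Character.OTHER_SYMBOL:
        return true;
      default:
        return false;
    }
  }

  // Old-style check: looks at the first character of the token only.
  static boolean startsWithPunctuation(char[] buffer, int offset, int length) {
    return length > 0 && isPunctuation(buffer[offset]);
  }

  // New-style check: true only if every character in the token is punctuation.
  // Note that a zero-length range is vacuously "all punctuation" here.
  static boolean isAllPunctuation(char[] buffer, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      if (!isPunctuation(buffer[i])) {
        return false;
      }
    }
    return true;
  }
}
```

   With a token such as `(株)`, `startsWithPunctuation` returns true (so the token would be dropped), while `isAllPunctuation` returns false (so the token is kept).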
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14567) Clean up remaining warnings in solrj

2020-06-13 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14567:
-

 Summary: Clean up remaining warnings in solrj
 Key: SOLR-14567
 URL: https://issues.apache.org/jira/browse/SOLR-14567
 Project: Solr
  Issue Type: Sub-task
Reporter: Erick Erickson
Assignee: Erick Erickson


This is another place where the number of warnings per directory is getting too 
small to do individually, so I'll do them all in a bunch.

Note: this will exclude autoscaling.






[jira] [Updated] (SOLR-14567) Fix or suppress up remaining warnings in solrj

2020-06-13 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-14567:
--
Summary: Fix or suppress up remaining warnings in solrj  (was: Clean up 
remaining warnings in solrj)

> Fix or suppress up remaining warnings in solrj
> --
>
> Key: SOLR-14567
> URL: https://issues.apache.org/jira/browse/SOLR-14567
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> This is another place where the number of warnings per directory is getting 
> too small to do individually, so I'll do them all in a bunch.
> Note: this will exclude autoscaling.






[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 9.0

2020-06-13 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134776#comment-17134776
 ] 

Erick Erickson commented on SOLR-12823:
---

Well, if you'd ever like to apply that knowledge to some of the intermittently 
failing tests, you'd be a hero ;)

Two resources IDK if you're aware of:

Hoss' rollups: [http://fucit.org/solr-jenkins-reports/]

Beasting script: [https://gist.github.com/markrmiller/dbdb792216dc98b018ad]

I have a spare machine lying around so if you're so inclined I'd be happy to 
beast things. A number of the tests fail a small percentage of the time...

> remove clusterstate.json in Lucene/Solr 9.0
> ---
>
> Key: SOLR-12823
> URL: https://issues.apache.org/jira/browse/SOLR-12823
> Project: Solr
>  Issue Type: Task
>Reporter: Varun Thacker
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> clusterstate.json is an artifact of a pre 5.0 Solr release. We should remove 
> that in 9.0
> It stays empty unless you explicitly ask to create the collection with the 
> old "stateFormat" and there is no reason for one to create a collection with 
> the old stateFormat.
> We should also remove the "stateFormat" argument in create collection
> We should also remove MIGRATESTATEVERSION as well
>  
>  






[GitHub] [lucene-solr] s1monw commented on pull request #1576: Alternative approach to LUCENE-8962

2020-06-13 Thread GitBox


s1monw commented on pull request #1576:
URL: https://github.com/apache/lucene-solr/pull/1576#issuecomment-643609384


   see #1552 for reference






[GitHub] [lucene-solr] s1monw opened a new pull request #1576: Alternative approach to LUCENE-8962

2020-06-13 Thread GitBox


s1monw opened a new pull request #1576:
URL: https://github.com/apache/lucene-solr/pull/1576


   
   
   
   






[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-13 Thread GitBox


janhoy commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439730789



##
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##
@@ -1259,6 +1277,20 @@ public SolrCore create(String coreName, Path 
instancePath, Map p
 }
   }
 
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, 
coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element.
+   * @param path path to check
+   * @throws SolrException if path is outside allowed paths
+   */
+  public void assertPathAllowed(Path path) throws SolrException {
+if (path.normalize().equals(path) && !path.isAbsolute()) return;

Review comment:
   You are right. We need a more thorough check
   * Disallow relative paths starting with "."
   * Always `normalize()` the path before `toAbsolutePath()` to catch the 
`/var/solr/../../etc` case, else that example would return true for 
`startsWith("/var/solr")`
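
   A minimal sketch of the check described above, assuming a Unix-like filesystem. It is not the code in this pull request: the class `PathCheckSketch` and the method `isAllowed` are hypothetical names, and Windows UNC handling is left out.

```java
import java.nio.file.Path;
import java.util.Set;

final class PathCheckSketch {

  // Returns true if the path may be used. Hypothetical helper for illustration.
  static boolean isAllowed(Path path, Set<Path> allowedRoots) {
    // Collapse "." and ".." segments first, so that "/var/solr/../../etc"
    // becomes "/etc" before any prefix comparison is made.
    Path normalized = path.normalize();

    if (!normalized.isAbsolute()) {
      // A relative path is rejected if it still climbs out after normalizing.
      return !normalized.startsWith("..");
    }

    // An absolute path must sit below one of the allowed roots.
    for (Path root : allowedRoots) {
      if (normalized.startsWith(root.normalize())) {
        return true;
      }
    }
    return false;
  }
}
```

   Without the `normalize()` call, `Paths.get("/var/solr/../../etc").startsWith("/var/solr")` compares name elements literally and returns true, which is exactly the case the check has to catch.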








[GitHub] [lucene-solr] s1monw commented on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-13 Thread GitBox


s1monw commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643599140


   > > @mikemccand thanks for replying to all these comments. I do understand 
that this change has an impact and I agree we should add this functionality. I 
just disagree with how it's done and how much code is used. I will go and 
reply to some of your comments directly; in the meantime I went ahead and 
prototyped some ideas on how this can be less intrusive and reuse code. I pushed 
one commit here 
[s1monw@3864b6c](https://github.com/s1monw/lucene-solr/commit/3864b6c2b631879fa1e995d47ed2b84aae054747)
 to showcase what I mean. I even think we can get away without a new method on 
MergePolicy but that's too much for the prototype. I'd be ok with adding a 
setting to IWC if we can't agree on a different way.
   > 
   > Thanks @s1monw! I would love if we could find a simple way to implement 
this feature as long as it keeps the "no wasted work" (merge either finishes in 
time, and is reflected in the commit point, or does not, but still runs to 
completion and is reflected later). I will review your prototype soon ... I'm 
mostly offline this weekend but will try to look soon.
   
   Thanks @mikemccand. I pushed another commit to my prototype to make it almost 
the same as this change but with a bit more code reuse, I think. Please take a 
look at it here: 
https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962
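
   The "no wasted work" requirement quoted above (a merge is either reflected in this commit point or runs to completion and is reflected later) can be illustrated with a bounded wait that never cancels the work. This is only a sketch using plain `java.util.concurrent`, not IndexWriter or MergePolicy code; `MergeOnCommitSketch` and `awaitMerge` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class MergeOnCommitSketch {

  // Waits up to maxWaitMillis for the merge. Returns true if it finished in
  // time and can be included in the current commit, false otherwise.
  static boolean awaitMerge(CompletableFuture<Void> merge, long maxWaitMillis) {
    try {
      merge.get(maxWaitMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // Deliberately no cancel(): the merge keeps running and its result
      // becomes visible in a later commit, so the work is not thrown away.
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      throw new IllegalStateException("merge failed", e.getCause());
    }
  }
}
```

   The commit path would call `awaitMerge` once per pending merge and include only those for which it returns true.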






[GitHub] [lucene-solr] s1monw edited a comment on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-13 Thread GitBox


s1monw edited a comment on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643486962


   @mikemccand thanks for replying to all these comments. I do understand that 
this change has an impact and I agree we should add this functionality. I just 
disagree with how it's done and how much code is used. I will go and reply 
to some of your comments directly; in the meantime I went ahead and prototyped 
some ideas on how this can be less intrusive and reuse code. I pushed one 
commit here 
https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962 to 
showcase what I mean. I even think we can get away without a new method on 
MergePolicy but that's too much for the prototype. I'd be ok with adding a 
setting to IWC if we can't agree on a different way. 






[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-13 Thread GitBox


s1monw commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439647024



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException 
{
   // sneak into the commit point:
   toCommit = segmentInfos.clone();
 
+  if (anyChanges) {
+// Find any merges that can execute on commit (per 
MergePolicy).
+MergePolicy.MergeSpecification mergeSpec =

Review comment:
   I tried to showcase this here 
https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962




