[GitHub] [lucene-solr] dsmiley commented on pull request #1574: SOLR-14566: Log NOW value on coordinator node for dist search requests
dsmiley commented on pull request #1574: URL: https://github.com/apache/lucene-solr/pull/1574#issuecomment-643716917 +1 clever and useful This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439772292

## File path: solr/core/src/java/org/apache/solr/core/SolrPaths.java ##
@@ -128,4 +130,33 @@ private static void logOnceInfo(String key, String msg) {
       log.info(msg);
     }
   }
+
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element. The following paths will fail validation
+   *
+   * Relative paths starting with ..
+   * Windows UNC paths (\\host\share\path)
+   * Absolute paths which are not below the list of allowed paths
+   *
+   * @param pathToAssert path to check
+   * @param allowPaths list of paths that should be allowed prefixes
+   * @throws SolrException if path is outside allowed paths
+   */
+  public static void assertPathAllowed(Path pathToAssert, Set allowPaths) throws SolrException {
+    if (OS.isFamilyWindows() && pathToAssert.toString().startsWith("\\\\")) {

Review comment:
I have not tested this on Windows. On my Mac, the `Path` class uses an OSX implementation, so I think it will not detect the UNC-style path; it does not manage to normalize it or make it absolute, so I scoped the check to Windows only. I test on the string version before normalizing, since normalize may mess up UNC paths. I decided to block UNC totally instead of trying to be smart about it. Users can always map a drive letter to the desired share to work around it?
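The three rejection rules listed in the Javadoc above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the class name `PathCheckSketch`, the method `isAllowed`, and the `windows` flag are hypothetical names chosen for the example, whereas the PR itself calls `OS.isFamilyWindows()` and throws `SolrException`. The sketch also rejects any `..` element, which is somewhat stricter than the "starting with .." wording.

```java
import java.nio.file.Path;
import java.util.Set;

/** Hypothetical sketch; class and method names are illustrative only. */
public class PathCheckSketch {

  /**
   * Returns true if the path passes the three checks described in the Javadoc.
   * The caller passes the windows flag so the UNC rule can be exercised on any OS.
   */
  static boolean isAllowed(Path path, Set<Path> allowPaths, boolean windows) {
    // Rule 1: reject UNC paths by looking at the raw string before normalizing.
    if (windows && path.toString().startsWith("\\\\")) {
      return false;
    }
    // Rule 2: reject any path that contains a ".." element.
    for (Path element : path) {
      if (element.toString().equals("..")) {
        return false;
      }
    }
    // A relative path without ".." is accepted here; the caller is assumed
    // to resolve it below an allowed root.
    if (!path.isAbsolute()) {
      return true;
    }
    // Rule 3: an absolute path must sit below one of the allowed prefixes.
    Path normalized = path.normalize();
    for (Path allowed : allowPaths) {
      if (normalized.startsWith(allowed)) {
        return true;
      }
    }
    return false;
  }
}
```

Checking the string form first mirrors the reasoning in the review comment: normalization may alter a UNC path, so the prefix test runs before any `Path` manipulation.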
[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439772139

## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ##
@@ -1259,6 +1277,20 @@ public SolrCore create(String coreName, Path instancePath, Map p
     }
   }
+
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element.
+   * @param path path to check
+   * @throws SolrException if path is outside allowed paths
+   */
+  public void assertPathAllowed(Path path) throws SolrException {
+    if (path.normalize().equals(path) && !path.isAbsolute()) return;

Review comment:
I pushed a commit addressing this. Quite a few changes, but we now detect `..` specifically.
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134945#comment-17134945 ]

ASF subversion and git services commented on SOLR-13169:

Commit 396490b65ca1af6ff1f1157a9896c9528c234eea in lucene-solr's branch refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=396490b ]

SOLR-13169 Improve docs for MOVEREPLICA - better parity with ref guide for v2 api descriptions

> Move Replica Docs need improvement (V1 and V2 introspect)
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
> Issue Type: Improvement
> Components: v2 API
> Reporter: Gus Heck
> Priority: Major
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
> At a minimum required parameters should be noted equally in both places.
> Conversation with [~ab] indicates that there are also some discrepancies in
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
>   "type": "object",
>   "documentation": "https://lucene.apache.org/solr/guide/collections-api.html#movereplica",
>   "description": "This command moves a replica from one node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` may be reused.",
>   "properties": {
>     "replica": {
>       "type": "string",
>       "description": "The name of the replica"
>     },
>     "shard": {
>       "type": "string",
>       "description": "The name of the shard"
>     },
>     "sourceNode": {
>       "type": "string",
>       "description": "The name of the node that contains the replica."
>     },
>     "targetNode": {
>       "type": "string",
>       "description": "The name of the destination node. This parameter is required."
>     },
>     "waitForFinalState": {
>       "type": "boolean",
>       "default": "false",
>       "description": "Wait for the moved replica to become active."
>     },
>     "timeout": {
>       "type": "integer",
>       "default": 600,
>       "description": "Timeout to wait for replica to become active. For very large replicas this may need to be increased."
>     },
>     "inPlaceMove": {
>       "type": "boolean",
>       "default": "true",
>       "description": "For replicas that use shared filesystems allow 'in-place' move that reuses shared data."
>     }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck updated SOLR-13169:
    Attachment: SOLR-13169.patch
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134939#comment-17134939 ]

Gus Heck commented on SOLR-13169:

Corrections from another read-through, and documentation for other parameters. Choosing not to document `waitForFinalState` at this time because it's unclear what value it has. This command already waits for the completion of the add command, and causing the add command to wait/block on its own doesn't seem useful (alternatively, my understanding of that parameter is flawed and I shouldn't write it into the docs). Opened SOLR-14568, which may change the docs for timeout slightly.

This turned into a lot more than originally anticipated, so I am attaching a patch summarizing the changes to the ref guide in case that helps folks look over what I've done. Given no objections, I'll port whatever applies to 8.x down to 8.x next weekend (and fix any objections).
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134936#comment-17134936 ]

ASF subversion and git services commented on SOLR-13169:

Commit b00d747eb6a94ab5775258b032e621f998ec44ba in lucene-solr's branch refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b00d747 ]

SOLR-13169 Improve docs for MOVEREPLICA - document additional existing parameters, second pass fixing spelling and other details.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1573: Cleanup TermsHashPerField
mikemccand commented on a change in pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573#discussion_r439751347

## File path: lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java ##
@@ -207,8 +202,6 @@ public void newPostingsArray() {
   @Override
   ParallelPostingsArray createPostingsArray(int size) {
-    IndexOptions indexOptions = fieldInfo.getIndexOptions();
-    assert indexOptions != IndexOptions.NONE;

Review comment:
Hmm why not keep this assertion (to confirm that if the field is not somehow indexed we are not accidentally/incorrectly running this code)?

## File path: lucene/core/src/java/org/apache/lucene/index/ParallelPostingsArray.java ##
@@ -22,14 +22,14 @@
   final static int BYTES_PER_POSTING = 3 * Integer.BYTES;
   final int size;
-  final int[] textStarts;
-  final int[] intStarts;
-  final int[] byteStarts;
+  final int[] textStarts; // maps term ID to the terms text start in the bytesHash
+  final int[] addressOffset; // maps term ID to current stream address

Review comment:
+1 for this renaming!

## File path: lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java ##
@@ -19,203 +19,207 @@
 import java.io.IOException;
-import org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute;
-import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
 import org.apache.lucene.util.ByteBlockPool;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefHash.BytesStartArray;
 import org.apache.lucene.util.BytesRefHash;
 import org.apache.lucene.util.Counter;
 import org.apache.lucene.util.IntBlockPool;
+/**
+ * This class allows to store streams of information per term without knowing
+ * the size of the stream ahead of time. Each stream typically encodes one level
+ * of information like term frequency per document or term proximity. Internally
+ * this class allocates a linked list of slices that can be read by a {@link ByteSliceReader}
+ * for each term. Terms are first deduplicated in a {@link BytesRefHash} once this is done
+ * internal data-structures point to the current offset of each stream that can be written to.
+ */
 abstract class TermsHashPerField implements Comparable {
   private static final int HASH_INIT_SIZE = 4;
-  final TermsHash termsHash;
-
-  final TermsHashPerField nextPerField;
-  protected final DocumentsWriterPerThread.DocState docState;
-  protected final FieldInvertState fieldState;
-  TermToBytesRefAttribute termAtt;
-  protected TermFrequencyAttribute termFreqAtt;
-
-  // Copied from our perThread
-  final IntBlockPool intPool;
+  private final TermsHashPerField nextPerField;
+  private final IntBlockPool intPool;
   final ByteBlockPool bytePool;
-  final ByteBlockPool termBytePool;
-
-  final int streamCount;
-  final int numPostingInt;
-
-  protected final FieldInfo fieldInfo;
-
-  final BytesRefHash bytesHash;
+  // for each term we store an integer per stream that points into the bytePool above
+  // the address is updated once data is written to the stream to point to the next free offset
+  // this the terms stream. The start address for the stream is stored in postingsArray.byteStarts[termId]
+  // This is initialized in the #addTerm method, either to a brand new per term stream if the term is new or
+  // to the addresses where the term stream was written to when we saw it the last time.
+  private int[] termStreamAddressBuffer;
+  private int streamAddressOffset;
+  private final int streamCount;
+  private final String fieldName;
+  final IndexOptions indexOptions;
+  /* This stores the actual term bytes for postings and offsets into the parent hash in the case that this
+   * TermsHashPerField is hashing term vectors.*/
+  private final BytesRefHash bytesHash;
   ParallelPostingsArray postingsArray;
-  private final Counter bytesUsed;
+  private int lastDocID; // only with assert
   /** streamCount: how many streams this field stores per term.
    * E.g. doc(+freq) is 1 stream, prox+offset is a second. */
-
-  public TermsHashPerField(int streamCount, FieldInvertState fieldState, TermsHash termsHash, TermsHashPerField nextPerField, FieldInfo fieldInfo) {
-    intPool = termsHash.intPool;
-    bytePool = termsHash.bytePool;
-    termBytePool = termsHash.termBytePool;
-    docState = termsHash.docState;
-    this.termsHash = termsHash;
-    bytesUsed = termsHash.bytesUsed;
-    this.fieldState = fieldState;
+  TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool,
+                    Counter bytesUsed, TermsHashPerField nextPerField, String fieldName, IndexOptions indexOptions) {
+    this.intPool = intPool;
+    this.bytePool = bytePool;
     this.streamCount = streamCount;
-    numPostingInt = 2*streamCount;
-    this.fieldInfo = fieldInfo;
+    this.fieldName = fieldName;
     this.nextPerField =
[GitHub] [lucene-solr] msokolov commented on pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
msokolov commented on pull request #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-643654018

This is so close! just needs precommit fix I think?
[jira] [Created] (SOLR-14568) org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout
Gus Heck created SOLR-14568:

Summary: org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout
Key: SOLR-14568
URL: https://issues.apache.org/jira/browse/SOLR-14568
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: master (9.0)
Reporter: Gus Heck

org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica gained a hardcoded timeout in SOLR-11045, but there is no clear reason discussed in that ticket and no comment in the code to indicate why it is ignoring the value of the timeout parameter already passed into that method. This should be clarified in code and documented ([~caomanhdat]?) or the timeout parameter should be supported.

It sure seems like we should support the API parameter, but from the pattern of commits this looks potentially intentional and has survived several revisions, so I hesitate to just change it without input/confirmation.

If this can be clarified soon, I'll document the result in SOLR-13169; otherwise I'll just document the state as it is, and the docs can be updated if there are changes resulting from this ticket.
[jira] [Updated] (SOLR-14567) Fix or suppress remaining warnings in solrj
[ https://issues.apache.org/jira/browse/SOLR-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-14567:
    Summary: Fix or suppress remaining warnings in solrj (was: Fix or suppress up remaining warnings in solrj)

> Fix or suppress remaining warnings in solrj
>
> Key: SOLR-14567
> URL: https://issues.apache.org/jira/browse/SOLR-14567
> Project: Solr
> Issue Type: Sub-task
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
>
> This is another place where the number of warnings per directory is getting
> too small to do individually, so I'll do them all in a bunch.
> Note: this will exclude autoscaling.
[jira] [Reopened] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice
[ https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck reopened SOLR-14417:

> Gradle build sometimes fails RE BlockPoolSlice
>
> Key: SOLR-14417
> URL: https://issues.apache.org/jira/browse/SOLR-14417
> Project: Solr
> Issue Type: Task
> Components: Build
> Reporter: David Smiley
> Priority: Minor
>
> There seems to be some package visibility hacks around our Hdfs integration:
> {{/Users/dsmiley/SearchDev/lucene-solr/solr/core/src/test/org/apache/solr/cloud/hdfs/HdfsTestUtil.java:125: error: BlockPoolSlice is not public in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl; cannot be accessed from outside package}}
> {{List> modifiedHadoopClasses = Arrays.asList(BlockPoolSlice.class, DiskChecker.class,}}
> This happens on my Gradle build when running {{gradlew testClasses}} (i.e. to
> compile tests) but Ant proceeded without issue. The work-around is to run
> {{gradlew clean}} first but really I want our build to be smarter here.
> CC [~krisden]
[jira] [Commented] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice
[ https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134845#comment-17134845 ]

Gus Heck commented on SOLR-14417:

I just hit this when running a test via IntelliJ (which is using Gradle). My IDE tells me we have our own version of this class that is public, but when I search classes in IntelliJ, it shows me that it can find both our version and a version of the class in hadoop-hdfs-3.2.0.jar, the latter of which is not public. This appears to be a classpath ordering inconsistency.
[jira] [Commented] (SOLR-14564) Fix or suppress remaining warnings in solr/core
[ https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134830#comment-17134830 ]

ASF subversion and git services commented on SOLR-14564:

Commit 65e34449d12ed54abea8648a40b5c3e66e33bbca in lucene-solr's branch refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=65e3444 ]

SOLR-14564: Fix or suppress remaining warnings in solr/core

> Fix or suppress remaining warnings in solr/core
>
> Key: SOLR-14564
> URL: https://issues.apache.org/jira/browse/SOLR-14564
> Project: Solr
> Issue Type: Sub-task
> Environment: It's getting to the point where the overhead of cleaning
> up individual directories is getting to be a pain. So this will be 2-3
> commits of fixes in whatever order I find them when compiling in IntelliJ.
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
[jira] [Resolved] (SOLR-14564) Fix or suppress remaining warnings in solr/core
[ https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-14564.
    Fix Version/s: 8.6
    Resolution: Fixed

They said it couldn't be done! On to the last two JIRAs for SolrJ warnings.
[jira] [Commented] (SOLR-14564) Fix or suppress remaining warnings in solr/core
[ https://issues.apache.org/jira/browse/SOLR-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134829#comment-17134829 ]

ASF subversion and git services commented on SOLR-14564:

Commit a41aa20b0afaadf47ec6e58476a947c6936c1921 in lucene-solr's branch refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a41aa20 ]

SOLR-14564: Fix or suppress remaining warnings in solr/core
[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134827#comment-17134827 ]

Jun Ohtani commented on LUCENE-9390:

I've made a pull request.
https://github.com/apache/lucene-solr/pull/1577

> Kuromoji tokenizer discards tokens if they start with a punctuation character
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jim Ferenczi
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This issue was first raised in Elasticsearch
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries
> that mix punctuations and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuations are automatically removed by
> default (discardPunctuation is true). I think the code was written this way
> because we expect punctuations to be separated from normal tokens but there
> are exceptions in the original dictionary. Maybe we should check the entire
> token when discarding punctuations ?
[GitHub] [lucene-solr] johtani opened a new pull request #1577: LUCENE-9390: JapaneseTokenizer discards token that is all punctuation characters only
johtani opened a new pull request #1577: URL: https://github.com/apache/lucene-solr/pull/1577 # Description When the discard-punctuation flag is true, check each token and omit it only if it consists entirely of punctuation characters. Currently, JapaneseTokenizer discards any token whose first character is punctuation. # Solution Add an isAllPunctuation method for testing a token. # Tests Ensure that a token is discarded if it consists only of punctuation characters, and that it is not discarded if it starts with a punctuation character but also contains non-punctuation characters. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `ant precommit` and the appropriate test suite. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
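The behavior change described in #1577 can be sketched roughly as below. This is an illustration only, not the code in the pull request: the class name `PunctuationSketch`, the helpers `startsWithPunctuation` and `shouldDiscard`, and the exact set of Unicode categories treated as punctuation are all assumptions. Only the method name `isAllPunctuation` comes from the PR description.

```java
// Hypothetical sketch; the real JapaneseTokenizer may classify characters differently.
final class PunctuationSketch {

  // Assumption: "punctuation" means the Unicode punctuation categories only.
  static boolean isPunctuation(char ch) {
    switch (Character.getType(ch)) {
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.CONNECTOR_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
        return true;
      default:
        return false;
    }
  }

  // Behavior described as current: only the first character is inspected.
  static boolean startsWithPunctuation(CharSequence token) {
    return token.length() > 0 && isPunctuation(token.charAt(0));
  }

  // Behavior described as proposed: every character must be punctuation.
  // An empty token is vacuously "all punctuation" in this sketch.
  static boolean isAllPunctuation(CharSequence token) {
    for (int i = 0; i < token.length(); i++) {
      if (!isPunctuation(token.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  // A token is dropped only when the flag is set and the whole token is punctuation.
  static boolean shouldDiscard(CharSequence token, boolean discardPunctuation) {
    return discardPunctuation && isAllPunctuation(token);
  }
}
```

With this sketch, the dictionary entry `(株)` from LUCENE-9390 starts with punctuation but is not all punctuation, so `shouldDiscard` keeps it even when the flag is true.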
[jira] [Created] (SOLR-14567) Clean up remaining warnings in solrj
Erick Erickson created SOLR-14567: - Summary: Clean up remaining warnings in solrj Key: SOLR-14567 URL: https://issues.apache.org/jira/browse/SOLR-14567 Project: Solr Issue Type: Sub-task Reporter: Erick Erickson Assignee: Erick Erickson This is another place where the number of warnings per directory is getting too small to do individually, so I'll do them all in a bunch. Note: this will exclude autoscaling.
[jira] [Updated] (SOLR-14567) Fix or suppress up remaining warnings in solrj
[ https://issues.apache.org/jira/browse/SOLR-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-14567: -- Summary: Fix or suppress up remaining warnings in solrj (was: Clean up remaining warnings in solrj) > Fix or suppress up remaining warnings in solrj > -- > > Key: SOLR-14567 > URL: https://issues.apache.org/jira/browse/SOLR-14567 > Project: Solr > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > This is another place where the number of warnings per directory is getting > too small to do individually, so I'll do them all in a bunch. > Note: this will exclude autoscaling.
[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 9.0
[ https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134776#comment-17134776 ] Erick Erickson commented on SOLR-12823: --- Well, if you'd ever like to apply that knowledge to some of the intermittently failing tests, you'd be a hero ;) Two resources IDK if you're aware of: Hoss' rollups: [http://fucit.org/solr-jenkins-reports/] Beasting script: [https://gist.github.com/markrmiller/dbdb792216dc98b018ad] I have a spare machine lying around, so if you're so inclined I'd be happy to beast things. A number of the tests fail a small percentage of the time... > remove clusterstate.json in Lucene/Solr 9.0 > --- > > Key: SOLR-12823 > URL: https://issues.apache.org/jira/browse/SOLR-12823 > Project: Solr > Issue Type: Task >Reporter: Varun Thacker >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Time Spent: 4h 50m > Remaining Estimate: 0h > > clusterstate.json is an artifact of a pre 5.0 Solr release. We should remove > that in 9.0 > It stays empty unless you explicitly ask to create the collection with the > old "stateFormat" and there is no reason for one to create a collection with > the old stateFormat. > We should also remove the "stateFormat" argument in create collection > We should also remove MIGRATESTATEVERSION as well
[GitHub] [lucene-solr] s1monw commented on pull request #1576: Alternative approach to LUCENE-8962
s1monw commented on pull request #1576: URL: https://github.com/apache/lucene-solr/pull/1576#issuecomment-643609384 see #1552 for reference
[GitHub] [lucene-solr] s1monw opened a new pull request #1576: Alternative approach to LUCENE-8962
s1monw opened a new pull request #1576: URL: https://github.com/apache/lucene-solr/pull/1576
[GitHub] [lucene-solr] janhoy commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy commented on a change in pull request #1572: URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439730789 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -1259,6 +1277,20 @@ public SolrCore create(String coreName, Path instancePath, Map p } } + /** + * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths + * specified in solr.xml's allowPaths element. + * @param path path to check + * @throws SolrException if path is outside allowed paths + */ + public void assertPathAllowed(Path path) throws SolrException { +if (path.normalize().equals(path) && !path.isAbsolute()) return; Review comment: You are right. We need a more thorough check: * Disallow relative paths starting with "." * Always `normalize()` the path before `toAbsolutePath()` to catch the `/var/solr/../../etc` case; otherwise that example would return true for `startsWith("/var/solr")`
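The ordering issue raised in this review comment can be illustrated with plain `java.nio.file` calls. The sketch below is not the code from the pull request: the class name `PathAllowSketch`, the method `isAllowed`, and the rule that a relative path is rejected when its normalized form starts with `..` are assumptions made for illustration. It also assumes Unix-style paths.

```java
import java.nio.file.Path;
import java.util.Set;

// Hypothetical sketch of "normalize before comparing prefixes".
final class PathAllowSketch {

  // Returns true if the candidate stays inside one of the allowed roots.
  static boolean isAllowed(Path candidate, Set<Path> allowedRoots) {
    // Normalize first so that ".." segments are resolved before any prefix check.
    Path normalized = candidate.normalize();

    if (!candidate.isAbsolute()) {
      // Assumption: a relative path is acceptable unless it climbs out of its base.
      return !normalized.startsWith("..");
    }

    for (Path root : allowedRoots) {
      // Path.startsWith compares whole name elements, not raw string prefixes.
      if (normalized.startsWith(root.normalize())) {
        return true;
      }
    }
    return false;
  }
}
```

Without the `normalize()` call, `Paths.get("/var/solr/../../etc").startsWith("/var/solr")` is true because the first two name elements match, which is the case the comment warns about. After normalization the path becomes `/etc` and the prefix check fails.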
[GitHub] [lucene-solr] s1monw commented on pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643599140 > > @mikemccand thanks for replying to all these comments. I do understand that this change has an impact and I agree we should add this functionality. I just disagree with how it's done and how much code is used. I will go and reply to some of your comments directly; in the meantime I went ahead and prototyped some ideas on how this can be less intrusive and reuse code. I pushed one commit here [s1monw@3864b6c](https://github.com/s1monw/lucene-solr/commit/3864b6c2b631879fa1e995d47ed2b84aae054747) to showcase what I mean. I even think we can get away without a new method on MergePolicy but that's too much for the prototype. I'd be ok with adding a setting to IWC if we can't agree on a different way. > > Thanks @s1monw! I would love it if we could find a simple way to implement this feature as long as it keeps the "no wasted work" property (a merge either finishes in time, and is reflected in the commit point, or does not, but still runs to completion and is reflected later). I will review your prototype soon ... I'm mostly offline this weekend but will try to look soon. Thanks @mikemccand. I pushed another commit to my prototype to make it almost the same as this change but with a bit more code reuse, I think. Please take a look here: https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962
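The "no wasted work" property quoted above (wait a bounded time for the merge, and if it does not finish, let it keep running rather than cancel it) can be shown with a generic `java.util.concurrent` sketch. This is not Lucene's IndexWriter code and does not use any Lucene API: the class `BoundedWaitSketch` and the method `awaitWithoutCancel` are hypothetical names, and a plain `Future` stands in for a merge.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bounded wait that never cancels the underlying work.
final class BoundedWaitSketch {

  // Returns the result if the work finishes within the budget, else empty.
  // On timeout the work is left running so its result can be used later.
  static <T> Optional<T> awaitWithoutCancel(Future<T> work, long timeoutMillis) {
    try {
      return Optional.ofNullable(work.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      // Deliberately no work.cancel(...): the effort already spent is not thrown away.
      return Optional.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }
}
```

A caller would include the result in the current commit when the `Optional` is present, and otherwise pick it up on a later commit once the future completes.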
[GitHub] [lucene-solr] s1monw edited a comment on pull request #1552: LUCENE-8962: merge small segments on commit
s1monw edited a comment on pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643486962 @mikemccand thanks for replying to all these comments. I do understand that this change has an impact and I agree we should add this functionality. I just disagree with how it's done and how much code is used. I will go and reply to some of your comments directly; in the meantime I went ahead and prototyped some ideas on how this can be less intrusive and reuse code. I pushed one commit here https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962 to showcase what I mean. I even think we can get away without a new method on MergePolicy but that's too much for the prototype. I'd be ok with adding a setting to IWC if we can't agree on a different way.
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439647024 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges) { +// Find any merges that can execute on commit (per MergePolicy). +MergePolicy.MergeSpecification mergeSpec = Review comment: I tried to showcase this here https://github.com/apache/lucene-solr/compare/master...s1monw:LUCENE-8962