[jira] [Created] (SOLR-14976) New collection is not getting created.

2020-11-01 Thread Prince (Jira)
Prince created SOLR-14976:
-

 Summary: New collection is not getting created.
 Key: SOLR-14976
 URL: https://issues.apache.org/jira/browse/SOLR-14976
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 7.7.3
Reporter: Prince
 Attachments: Solr_Zk_Logs.txt

Hi Team,

We aren't able to create a new collection, either from solradmin UI or from 
CLI. Once solr is restarted, we are able to proceed with creating collections 
and the same scenario repeats over.

Attached zookeeper and solr logs to the case.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8626) standardise test class naming

2020-11-01 Thread Marcus Eagan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224350#comment-17224350
 ] 

Marcus Eagan commented on LUCENE-8626:
--

[I added another 36 tests to this 
ticket|https://github.com/apache/lucene-solr/pull/2053]

> standardise test class naming
> -
>
> Key: LUCENE-8626
> URL: https://issues.apache.org/jira/browse/LUCENE-8626
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Christine Poerschke
>Priority: Major
> Attachments: SOLR-12939.01.patch, SOLR-12939.02.patch, 
> SOLR-12939.03.patch, SOLR-12939_hoss_validation_groovy_experiment.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This was mentioned and proposed on the dev mailing list. Starting this ticket 
> here to start to make it happen?
> History: This ticket was created as 
> https://issues.apache.org/jira/browse/SOLR-12939 ticket and then got 
> JIRA-moved to become https://issues.apache.org/jira/browse/LUCENE-8626 ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14413) allow timeAllowed and cursorMark parameters

2020-11-01 Thread Yevhen Tienkaiev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224345#comment-17224345
 ] 

Yevhen Tienkaiev edited comment on SOLR-14413 at 11/1/20, 10:18 PM:


Sounds good to me, since *partialResults* will show when I get partial results 
or not. It still a valid compromise.
Thank you!


was (Author: hronom):
Sounds good to me, since *partialResults* will show when I get partial results 
or not.
Thank you!

> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Priority: Minor
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 
> PM.png, image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, 
> image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and 
> timeAllowed parameters were not allowed in combination ("Can not search using 
> both cursorMark and timeAllowed")
> , from [QueryComponent.java|#L359]]:
>  
> {code:java}
>  
>  if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not 
> search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + 
> CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters

2020-11-01 Thread Yevhen Tienkaiev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224345#comment-17224345
 ] 

Yevhen Tienkaiev commented on SOLR-14413:
-

Sounds good to me, since *partialResults* will show when I get partial results 
or not.
Thank you!

> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Priority: Minor
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 
> PM.png, image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, 
> image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and 
> timeAllowed parameters were not allowed in combination ("Can not search using 
> both cursorMark and timeAllowed")
> , from [QueryComponent.java|#L359]]:
>  
> {code:java}
>  
>  if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not 
> search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + 
> CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

2020-11-01 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224262#comment-17224262
 ] 

Michael Sokolov commented on LUCENE-9583:
-

I did fairly recently work up a version of LUCENE-9322/LUCENE-9004  based on a 
graph of docids, without random access, and I guess I didn't like the way it 
came out. Maybe we'll end up going back there, given the weight of opinion on 
this thread, but for the record, the things I didn't like:

1. An additional lookup is required for every access (mapping docid to internal 
storage offset). Additional memory is required especially when merging sorted 
indexes.
2. I found that in addition to reset(), I needed to add clone() because these 
iterators/accessors have internal state (the vector value), and clone() allows 
you to have two at once over the same underlying vectors that don't interfere 
with each other. So eg when constructing the graph, one accessor traverses all 
the values, while the other is used to pull related values for comparison. And 
of course calling reset/advance/access means three calls when one would do.
3. In order to efficiently limit the number of outgoing nodes in the graph, we 
would want to keep a score-based priority queue per node (while constructing 
the graph). This means nodes are stored unordered, and in order to use a 
forward-only iterator to traverse a nodes' neighbors, they must be sorted 
repeatedly during graph traversal (or reset() called between every lookup).
4. We don't have any concrete ideas how we would make use of a restriction to 
forward-only iteration. Maybe in the future there would be an optimization that 
requires it, but so far I haven't seen any proposal.

I do agree that exposing an API based on internal ordinals (not docids) is 
somewhat unsatisfying. Maybe we can bury it more deeply in the codec, or maybe 
there's a middle ground here, an API with direct lookup using docids?

I did not do careful performance studies of these different approaches, and I 
take the point that differences might be small. Speed is a feature here, so I 
don't think we should ignore it, and some data should help us make a decision. 
I can try to work up a study (KnnGraphTester is what I am using to measure 
perf) to see if there's a difference. Maybe I can resurrect the docid-based 
patch I had working at some point, and even try restricting to forward-only 
iteration, which I have to say to me seems radical!


> How should we expose VectorValues.RandomAccess?
> ---
>
> Key: LUCENE-9583
> URL: https://issues.apache.org/jira/browse/LUCENE-9583
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we do also currently make use of the RA API when merging vector fields 
> over sorted indexes. Without it, we would need to load all vectors into RAM  
> while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
>  to {{VectorValues.RandomAccess}}. This I think we could move back, and 
> handle the HNSW requirements for search elsewhere. I wonder if that would 
> alleviate the major concern here? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

2020-11-01 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224226#comment-17224226
 ] 

Tomoko Uchida commented on LUCENE-9583:
---

bq. I wonder if the graph approach really needs random access. For each node we 
need to access the list of neighbors so this pattern doesn't require random 
access because the list is already sorted by doc ids. So instead of adding 
another interface I wonder what do you think of adding a reset method in 
VectorValues ? For each node, the pattern to access would be to reset the 
iterator first and then move it to the first neighbor. We can make optimization 
internally to provide fast reset so that we don't need two implementations for 
the first two approaches that we foresee ?

I would prefer this approach, encouraging forward only iteration and calling 
reset() when needed, even if it could look a bit non-intuitive to implement 
graph based aknn search.
I feel exposing public APIs for "random" access pattern needs more careful 
decision and we should start from conservative ways (we already discussed about 
that several times and couldn't reach an agreement so far, according to my 
understanding).

> How should we expose VectorValues.RandomAccess?
> ---
>
> Key: LUCENE-9583
> URL: https://issues.apache.org/jira/browse/LUCENE-9583
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we do also currently make use of the RA API when merging vector fields 
> over sorted indexes. Without it, we would need to load all vectors into RAM  
> while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
>  to {{VectorValues.RandomAccess}}. This I think we could move back, and 
> handle the HNSW requirements for search elsewhere. I wonder if that would 
> alleviate the major concern here? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org