Re: Folding Repeated Letters

2020-10-08 Thread Mike Drob
I was thinking about that, but there are words that are legitimately
different with repeated consonants. My primary school teacher lost hair
over getting us to learn the difference between desert and dessert.

Maybe we need something that can borrow the boosting behaviour of fuzzy
query - match the exact term, but also the neighbors with a slight deboost,
so that if the main term exists those others won't show up.
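
As a rough illustration of both points (a sketch, not something tested against Solr): the snippet below shows that a repeat-folding rule maps "dessert" onto "desert", and then builds request parameters that weight an exact field above a folded one. The field names text_exact and text_norepeat are hypothetical and the boost values are arbitrary. Using edismax with qf boosts is only one possible way to approximate the "slight deboost" idea, and on its own it would not hide the folded matches when an exact match exists.

```python
import re


def fold_repeats(token: str) -> str:
    """Lowercase a token and collapse each run of a repeated character."""
    return re.sub(r"(.)\1+", r"\1", token.lower())


def boosted_params(user_query: str) -> dict:
    """Build edismax parameters that prefer exact matches over folded ones.

    text_exact and text_norepeat are hypothetical field names; the boosts
    are arbitrary placeholders.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "text_exact^2 text_norepeat^0.5",
    }


# Folding cannot tell these two words apart.
words_collide = fold_repeats("dessert") == fold_repeats("desert")
```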

On Thu, Oct 8, 2020 at 5:46 PM Andy Webb  wrote:

> How about something like this?
>
> {
> "add-field-type": [
> {
> "name": "norepeat",
> "class": "solr.TextField",
> "analyzer": {
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> },
> {
> "class": "solr.PatternReplaceFilterFactory",
> "pattern": "(.)\\1+",
> "replacement": "$1"
> }
> ]
> }
> }
> ]
> }
>
> This finds a match...
>
> http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes&analysis.query=YyyeeEssSs&analysis.fieldtype=norepeat
>
> Andy
>
>
>
> On Thu, 8 Oct 2020 at 23:02, Mike Drob  wrote:
>
> > I'm looking for a way to transform words with repeated letters into the
> > same token - does something like this exist out of the box? Do our
> > stemmers support it?
> >
> > For example, say I would want all of these terms to return the same
> > search results:
> >
> > YES
> > YESSS
> > YYYEEESSS
> > YYEE[...]S
> >
> > I don't know how long a user would hold down the S key at the end to
> > capture their level of excitement, and I don't want to manually define
> > synonyms for every length.
> >
> > I'm pretty sure that I don't want PhoneticFilter here, maybe
> > PatternReplace? Not a huge fan of how that one is configured, and I think
> > I'd have to set up a bunch of patterns inline for it?
> >
> > Mike
> >
>


Re: Question about solr commits

2020-10-08 Thread Erick Erickson
This is a bit confused. There will be only one timer that starts at time T when
the first doc comes in. At T+15 seconds, all docs that have been received since
time T will be committed. The first doc to hit Solr _after_ T+15 seconds starts
a single new timer and the process repeats.
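
The timer behaviour described above can be sketched as a small simulation. This is only a model of the description, not Solr's implementation: commit_times is a made-up helper, times are in seconds, and a doc arriving exactly at the deadline is assumed to be covered by the pending commit.

```python
def commit_times(arrivals: list[float], max_time: float) -> list[float]:
    """Return the times at which autoCommit would fire for the given arrivals."""
    commits = []
    deadline = None  # no timer is running until the first doc arrives
    for t in sorted(arrivals):
        # Only a doc that arrives after the current deadline starts a new timer.
        if deadline is None or t > deadline:
            deadline = t + max_time
            commits.append(deadline)
    return commits
```

With a 15 second interval, docs at t=0 and t=10 give a single commit at t=15; a further doc at t=20 starts a new timer that fires at t=35.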

Best,
Erick

> On Oct 8, 2020, at 2:26 PM, Rahul Goswami  wrote:
> 
> Shawn,
> So if the autoCommit interval is 15 seconds, and one update request arrives
> at t=0 and another at t=10 seconds, then will there be two timers one
> expiring at t=15 and another at t=25 seconds, but this would amount to ONLY
> ONE commit at t=15 since that one would include changes from both updates.
> Is this understanding correct ?
> 
> Thanks,
> Rahul
> 
> On Wed, Oct 7, 2020 at 11:39 PM yaswanth kumar 
> wrote:
> 
>> Thank you very much both Eric and Shawn
>> 
>> Sent from my iPhone
>> 
>>> On Oct 7, 2020, at 10:41 PM, Shawn Heisey  wrote:
>>> 
>>> On 10/7/2020 4:40 PM, yaswanth kumar wrote:
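>>>> I have the below in my solrconfig.xml
>>>>
>>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>>   <updateLog>
>>>>     <str name="dir">${solr.Data.dir:}</str>
>>>>   </updateLog>
>>>>   <autoCommit>
>>>>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>>>     <openSearcher>false</openSearcher>
>>>>   </autoCommit>
>>>>   <autoSoftCommit>
>>>>     <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
>>>>   </autoSoftCommit>
>>>> </updateHandler>
>>>>
>>>> Does this mean even though we are always sending data with commit=false on
>>>> update solr api, the above should do the commit every minute (60000 ms)
>>>> right?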
>>> 
>>> Assuming that you have not defined the "solr.autoCommit.maxTime" and/or
>>> "solr.autoSoftCommit.maxTime" properties, this config has autoCommit set to
>>> 60 seconds without opening a searcher, and autoSoftCommit set to 5 seconds.
>>>
>>> So five seconds after any indexing begins, Solr will do a soft commit.
>>> When that commit finishes, changes to the index will be visible to
>>> queries.  One minute after any indexing begins, Solr will do a hard commit,
>>> which guarantees that data is written to disk, but it will NOT open a new
>>> searcher, which means that when the hard commit happens, any pending
>>> changes to the index will not be visible.
>>>
>>> It's not "every five seconds" or "every 60 seconds" ... When any changes
>>> are made, Solr starts a timer.  When the timer expires, the commit is
>>> fired.  If no changes are made, no commits happen, because the timer isn't
>>> started.
>>> 
>>> Thanks,
>>> Shawn
>> 
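
The quoted config relies on the ${property:default} form, where the value after the colon applies only when the property is not defined. A minimal sketch of that lookup follows; resolve is a made-up helper that imitates the substitution and is not Solr's own code.

```python
import re

# Matches ${name:default}; the default may be empty.
PLACEHOLDER = re.compile(r"\$\{([^:}]+):([^}]*)\}")


def resolve(value: str, props: dict) -> str:
    """Replace ${name:default} with props[name], or the default if unset."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(2)), value)
```

For example, resolve("${solr.autoSoftCommit.maxTime:5000}", {}) gives "5000", while passing {"solr.autoSoftCommit.maxTime": "1000"} overrides it.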



Re: Folding Repeated Letters

2020-10-08 Thread Andy Webb
How about something like this?

{
"add-field-type": [
{
"name": "norepeat",
"class": "solr.TextField",
"analyzer": {
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.PatternReplaceFilterFactory",
"pattern": "(.)\\1+",
"replacement": "$1"
}
]
}
}
]
}

This finds a match...
http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes&analysis.query=YyyeeEssSs&analysis.fieldtype=norepeat
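
To see why this matches, the chain can be approximated outside Solr. The sketch below is only an imitation of the analyzer: str.split() stands in for StandardTokenizerFactory (an assumption that holds only for simple whitespace-separated input), and Python's \1 plays the role of Java's $1 in the replacement. The pattern is the one from the field type, with the JSON escaping removed.

```python
import re

REPEAT = re.compile(r"(.)\1+")  # a character followed by one or more copies of itself


def analyze(text: str) -> list[str]:
    """Approximate the chain: tokenize, lowercase, collapse repeated characters."""
    return [REPEAT.sub(r"\1", token.lower()) for token in text.split()]
```

Both the indexed value "Yes" and the query "YyyeeEssSs" come out as the single token "yes".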

Andy



On Thu, 8 Oct 2020 at 23:02, Mike Drob  wrote:

> I'm looking for a way to transform words with repeated letters into the
> same token - does something like this exist out of the box? Do our stemmers
> support it?
>
> For example, say I would want all of these terms to return the same search
> results:
>
> YES
> YESSS
> YYYEEESSS
> YYEE[...]S
>
> I don't know how long a user would hold down the S key at the end to
> capture their level of excitement, and I don't want to manually define
> synonyms for every length.
>
> I'm pretty sure that I don't want PhoneticFilter here, maybe
> PatternReplace? Not a huge fan of how that one is configured, and I think
> I'd have to set up a bunch of patterns inline for it?
>
> Mike
>


Folding Repeated Letters

2020-10-08 Thread Mike Drob
I'm looking for a way to transform words with repeated letters into the
same token - does something like this exist out of the box? Do our stemmers
support it?

For example, say I would want all of these terms to return the same search
results:

YES
YESSS
YYYEEESSS
YYEE[...]S

I don't know how long a user would hold down the S key at the end to
capture their level of excitement, and I don't want to manually define
synonyms for every length.

I'm pretty sure that I don't want PhoneticFilter here, maybe
PatternReplace? Not a huge fan of how that one is configured, and I think
I'd have to set up a bunch of patterns inline for it?

Mike


Re: Term too complex for spellcheck.q param

2020-10-08 Thread Andy Webb
I added the maxQueryLength option to DirectSolrSpellChecker in
https://issues.apache.org/jira/browse/SOLR-14131 - that landed in 8.5.0 so
should be available to you.
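
As an extra guard in front of Solr, one could also drop overly long terms before they reach spellcheck.q. The sketch below is a client-side workaround and does not show how maxQueryLength is implemented; the function name and the limit of 20 characters are assumptions.

```python
def trim_spellcheck_q(query: str, max_term_length: int = 20) -> str:
    """Keep only whitespace-separated terms at or below the length limit."""
    kept = [term for term in query.split() if len(term) <= max_term_length]
    return " ".join(kept)
```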

Andy

On Wed, 7 Oct 2020 at 23:53, gnandre  wrote:

> Is there a way to truncate spellcheck.q param value from Solr side?
>
> On Wed, Oct 7, 2020, 6:22 PM gnandre  wrote:
>
> > Thanks. Is this going to be fixed in some future version?
> >
> > On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:
> >
> >> Right now the only solution is to use a shorter term.
> >>
> >> In a fuzzy query you could also try using a lower edit distance e.g.
> >> term~1 (default is 2), but I’m not sure what the syntax for a spellcheck
> >> would be.
> >>
> >> Mike
> >>
> >> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
> >>
> >> > Hi,
> >> >
> >> > I am getting following error when I pass '
> >> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> >> > ' in spellcheck.q param. How to avoid this error? I am using Solr 8.5.2
> >> >
> >> > {
> >> >   "error": {
> >> > "code": 500,
> >> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> >> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> >> > "trace":
> >> "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
> >> > Term too complex:
> >> > 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
> >> > org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
> >> > org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
> >> > org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:125)\n\tat
> >> > org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:92)\n\tat
> >> > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
> >> > org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
> >> > org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
> >> > org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
> >> > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
> >> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> >> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> >> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> >> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
> >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
> >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
> >> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
> >> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >> > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
> >> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
> >> > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
> >> > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
> >> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
> >> > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
> >> > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
> >> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
> >> > org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
> >> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
> >> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n

Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
Could be fun red/blue team exercise. Just watch out for those
cryptominers that get in through Solr injection (among many other
unsecured methods) and are a real pain to remove.

Regards,
   Alex.
P.s. Don't ask me how I know :-(
P.p.s. Read-only docker container may still be a good layer of defence
on top of everything. Respawn it every hour, if needed.

On Thu, 8 Oct 2020 at 15:05, David Hastings  wrote:
>
> Welp. Never mind I refer back to point #1 this is a bad idea
>
> > On Oct 8, 2020, at 3:01 PM, Alexandre Rafalovitch  
> > wrote:
> >
> > The update handlers are now implicitly defined (3 or 4 of them). So,
> > it actually needs to be explicitly shadowed and overridden with other
> > Noop handler. And block Config API to avoid attackers creating new
> > handlers.
> >
> > Regards,
> >   Alex.
> >
> >> On Thu, 8 Oct 2020 at 14:54, David Hastings  wrote:
> >>
> >> Well that’s why I suggested deleting the update handler :)
> >>
> >>> On Oct 8, 2020, at 2:52 PM, Walter Underwood wrote:
> >>>
> >>> Let me know where it is and I’ll delete all the documents in your 
> >>> collection.
> >>> It is easy, just one HTTP request.
> >>>
> >>> https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitch wrote:
> >>>>
> >>>> I think there were past discussions about people doing but they really
> >>>> really knew what they were doing from a security perspective, not just
> >>>> Solr one.
> >>>>
> >>>> You are increasing your risk factor a lot, so you need to think
> >>>> through this. What are you protecting and what are you exposing. Are
> >>>> you trying to protect the updates? You may be able to do it with - for
> >>>> example - read-only docker container, or with embedded Solr or/and
> >>>> with reverse proxy.
> >>>>
> >>>> Are you trying to protect some of the data from being read? Even harder.
> >>>>
> >>>> There are implicit handlers, admin handlers, 'qt' to select query
> >>>> parser, etc. Lots of things to think about.
> >>>>
> >>>> It just may not be worth it.
> >>>>
> >>>> Regards,
> >>>> Alex.
> >>>>
> >>>>
> > On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  
> > wrote:
> >
> > Hi!
> >
> > We're looking into the option of setting up search with Solr without an
> > intermediary application. This would mean our backend would index data 
> > into
> > Solr and we would have a public Solr endpoint on the internet that would
> > receive search requests directly.
> >
> > Since I couldn't find an existing solution similar to ours, I would 
> > like to
> > know whether it's possible to secure Solr in a way that allows anyone 
> > only
> > read-access only to collections and how to achieve that. Specifically
> > because of this part of the documentation
> > :
> >
> > *No Solr API, including the Admin UI, is designed to be exposed to
> > non-trusted parties. Tune your firewall so that only trusted computers 
> > and
> > people are allowed access. Because of this, the project will not regard
> > e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> > ask you to report such issues in JIRA.*
> > Is there a way we can restrict read-only access to Solr collections so 
> > as
> > to allow users to make search requests directly to it or should we 
> > always
> > keep our Solr instances completely private?
> >
> > Thanks in advance!
> >
> > Best regards,
> > Marco Godinho
> >>>


[ANNOUNCE] Apache Solr 8.6.3 released

2020-10-08 Thread Jason Gerlowski
The Lucene PMC is pleased to announce the release of Apache Solr 8.6.3.

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document handling, and
geospatial search. Solr is highly scalable, providing fault tolerant
distributed search and indexing, and powers the search and navigation
features of many of the world's largest internet sites.

Solr 8.6.3 is available for immediate download at:
  

### Solr 8.6.3 Release Highlights:

 * SOLR-14898: Prevent duplicate header accumulation on internally
forwarded requests
 * SOLR-14768: Fix HTTP multipart POST requests to Solr (8.6.0 regression)
 * SOLR-14859: PrefixTree-based fields now reject invalid schema
properties instead of quietly failing certain queries
 * SOLR-14663: CREATE ConfigSet action now copies base node content

Please refer to the Upgrade Notes in the Solr Ref Guide for
information on upgrading from previous Solr versions:
  

Please read CHANGES.txt for a full list of bugfixes:
  

Solr 8.6.3 also includes bugfixes in the corresponding Apache Lucene release:
  

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not have
replicated the release yet. If that is the case, please try another mirror.
This also applies to Maven access.


Re: Solr endpoint on the public internet

2020-10-08 Thread David Hastings
Welp. Never mind, I refer back to point #1: this is a bad idea.

> On Oct 8, 2020, at 3:01 PM, Alexandre Rafalovitch  wrote:
> 
> The update handlers are now implicitly defined (3 or 4 of them). So,
> it actually needs to be explicitly shadowed and overridden with other
> Noop handler. And block Config API to avoid attackers creating new
> handlers.
> 
> Regards,
>   Alex.
> 
>> On Thu, 8 Oct 2020 at 14:54, David Hastings  wrote:
>> 
>> Well that’s why I suggested deleting the update handler :)
>> 
>>> On Oct 8, 2020, at 2:52 PM, Walter Underwood wrote:
>>> 
>>> Let me know where it is and I’ll delete all the documents in your 
>>> collection.
>>> It is easy, just one HTTP request.
>>> 
>>> https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitch wrote:
>>>>
>>>> I think there were past discussions about people doing but they really
>>>> really knew what they were doing from a security perspective, not just
>>>> Solr one.
>>>>
>>>> You are increasing your risk factor a lot, so you need to think
>>>> through this. What are you protecting and what are you exposing. Are
>>>> you trying to protect the updates? You may be able to do it with - for
>>>> example - read-only docker container, or with embedded Solr or/and
>>>> with reverse proxy.
>>>>
>>>> Are you trying to protect some of the data from being read? Even harder.
>>>>
>>>> There are implicit handlers, admin handlers, 'qt' to select query
>>>> parser, etc. Lots of things to think about.
>>>>
>>>> It just may not be worth it.
>>>>
>>>> Regards,
>>>> Alex.
>>>>
>>>>
> On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  
> wrote:
> 
> Hi!
> 
> We're looking into the option of setting up search with Solr without an
> intermediary application. This would mean our backend would index data 
> into
> Solr and we would have a public Solr endpoint on the internet that would
> receive search requests directly.
> 
> Since I couldn't find an existing solution similar to ours, I would like 
> to
> know whether it's possible to secure Solr in a way that allows anyone only
> read-access only to collections and how to achieve that. Specifically
> because of this part of the documentation
> :
> 
> *No Solr API, including the Admin UI, is designed to be exposed to
> non-trusted parties. Tune your firewall so that only trusted computers and
> people are allowed access. Because of this, the project will not regard
> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> ask you to report such issues in JIRA.*
> Is there a way we can restrict read-only access to Solr collections so as
> to allow users to make search requests directly to it or should we always
> keep our Solr instances completely private?
> 
> Thanks in advance!
> 
> Best regards,
> Marco Godinho
>>> 


Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
The update handlers are now implicitly defined (3 or 4 of them). So they
actually need to be explicitly shadowed and overridden with a no-op
handler. Also block the Config API so that attackers cannot create new
handlers.

Regards,
   Alex.

On Thu, 8 Oct 2020 at 14:54, David Hastings  wrote:
>
> Well that’s why I suggested deleting the update handler :)
>
> > On Oct 8, 2020, at 2:52 PM, Walter Underwood  wrote:
> >
> > Let me know where it is and I’ll delete all the documents in your 
> > collection.
> > It is easy, just one HTTP request.
> >
> > https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitch  
> >> wrote:
> >>
> >> I think there were past discussions about people doing but they really
> >> really knew what they were doing from a security perspective, not just
> >> Solr one.
> >>
> >> You are increasing your risk factor a lot, so you need to think
> >> through this. What are you protecting and what are you exposing. Are
> >> you trying to protect the updates? You may be able to do it with - for
> >> example - read-only docker container, or with embedded Solr or/and
> >> with reverse proxy.
> >>
> >> Are you trying to protect some of the data from being read? Even harder.
> >>
> >> There are implicit handlers, admin handlers, 'qt' to select query
> >> parser, etc. Lots of things to think about.
> >>
> >> It just may not be worth it.
> >>
> >> Regards,
> >>  Alex.
> >>
> >>
> >>> On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  
> >>> wrote:
> >>>
> >>> Hi!
> >>>
> >>> We're looking into the option of setting up search with Solr without an
> >>> intermediary application. This would mean our backend would index data 
> >>> into
> >>> Solr and we would have a public Solr endpoint on the internet that would
> >>> receive search requests directly.
> >>>
> >>> Since I couldn't find an existing solution similar to ours, I would like 
> >>> to
> >>> know whether it's possible to secure Solr in a way that allows anyone only
> >>> read-access only to collections and how to achieve that. Specifically
> >>> because of this part of the documentation
> >>> :
> >>>
> >>> *No Solr API, including the Admin UI, is designed to be exposed to
> >>> non-trusted parties. Tune your firewall so that only trusted computers and
> >>> people are allowed access. Because of this, the project will not regard
> >>> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> >>> ask you to report such issues in JIRA.*
> >>> Is there a way we can restrict read-only access to Solr collections so as
> >>> to allow users to make search requests directly to it or should we always
> >>> keep our Solr instances completely private?
> >>>
> >>> Thanks in advance!
> >>>
> >>> Best regards,
> >>> Marco Godinho
> >


Re: Solr endpoint on the public internet

2020-10-08 Thread David Hastings
Well that’s why I suggested deleting the update handler :)

> On Oct 8, 2020, at 2:52 PM, Walter Underwood  wrote:
> 
> Let me know where it is and I’ll delete all the documents in your collection.
> It is easy, just one HTTP request.
> 
> https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitch  
>> wrote:
>> 
>> I think there were past discussions about people doing but they really
>> really knew what they were doing from a security perspective, not just
>> Solr one.
>> 
>> You are increasing your risk factor a lot, so you need to think
>> through this. What are you protecting and what are you exposing. Are
>> you trying to protect the updates? You may be able to do it with - for
>> example - read-only docker container, or with embedded Solr or/and
>> with reverse proxy.
>> 
>> Are you trying to protect some of the data from being read? Even harder.
>> 
>> There are implicit handlers, admin handlers, 'qt' to select query
>> parser, etc. Lots of things to think about.
>> 
>> It just may not be worth it.
>> 
>> Regards,
>>  Alex.
>> 
>> 
>>> On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  
>>> wrote:
>>> 
>>> Hi!
>>> 
>>> We're looking into the option of setting up search with Solr without an
>>> intermediary application. This would mean our backend would index data into
>>> Solr and we would have a public Solr endpoint on the internet that would
>>> receive search requests directly.
>>> 
>>> Since I couldn't find an existing solution similar to ours, I would like to
>>> know whether it's possible to secure Solr in a way that allows anyone only
>>> read-access only to collections and how to achieve that. Specifically
>>> because of this part of the documentation
>>> :
>>> 
>>> *No Solr API, including the Admin UI, is designed to be exposed to
>>> non-trusted parties. Tune your firewall so that only trusted computers and
>>> people are allowed access. Because of this, the project will not regard
>>> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
>>> ask you to report such issues in JIRA.*
>>> Is there a way we can restrict read-only access to Solr collections so as
>>> to allow users to make search requests directly to it or should we always
>>> keep our Solr instances completely private?
>>> 
>>> Thanks in advance!
>>> 
>>> Best regards,
>>> Marco Godinho
> 


Re: Solr endpoint on the public internet

2020-10-08 Thread Walter Underwood
Let me know where it is and I’ll delete all the documents in your collection.
It is easy, just one HTTP request.

https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 8, 2020, at 11:49 AM, Alexandre Rafalovitch  wrote:
> 
> I think there were past discussions about people doing but they really
> really knew what they were doing from a security perspective, not just
> Solr one.
> 
> You are increasing your risk factor a lot, so you need to think
> through this. What are you protecting and what are you exposing. Are
> you trying to protect the updates? You may be able to do it with - for
> example - read-only docker container, or with embedded Solr or/and
> with reverse proxy.
> 
> Are you trying to protect some of the data from being read? Even harder.
> 
> There are implicit handlers, admin handlers, 'qt' to select query
> parser, etc. Lots of things to think about.
> 
> It just may not be worth it.
> 
> Regards,
>   Alex.
> 
> 
> On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  wrote:
>> 
>> Hi!
>> 
>> We're looking into the option of setting up search with Solr without an
>> intermediary application. This would mean our backend would index data into
>> Solr and we would have a public Solr endpoint on the internet that would
>> receive search requests directly.
>> 
>> Since I couldn't find an existing solution similar to ours, I would like to
>> know whether it's possible to secure Solr in a way that allows anyone only
>> read-access only to collections and how to achieve that. Specifically
>> because of this part of the documentation
>> :
>> 
>> *No Solr API, including the Admin UI, is designed to be exposed to
>> non-trusted parties. Tune your firewall so that only trusted computers and
>> people are allowed access. Because of this, the project will not regard
>> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
>> ask you to report such issues in JIRA.*
>> Is there a way we can restrict read-only access to Solr collections so as
>> to allow users to make search requests directly to it or should we always
>> keep our Solr instances completely private?
>> 
>> Thanks in advance!
>> 
>> Best regards,
>> Marco Godinho



Re: Solr endpoint on the public internet

2020-10-08 Thread Alexandre Rafalovitch
I think there were past discussions about people doing this, but they
really, really knew what they were doing from a security perspective, not
just a Solr one.

You are increasing your risk factor a lot, so you need to think
through this. What are you protecting and what are you exposing? Are
you trying to protect the updates? You may be able to do it with - for
example - a read-only docker container, or with embedded Solr and/or
with a reverse proxy.

Are you trying to protect some of the data from being read? Even harder.

There are implicit handlers, admin handlers, 'qt' to select the request
handler, etc. Lots of things to think about.

It just may not be worth it.

Regards,
   Alex.


On Thu, 8 Oct 2020 at 14:27, Marco Aurélio  wrote:
>
> Hi!
>
> We're looking into the option of setting up search with Solr without an
> intermediary application. This would mean our backend would index data into
> Solr and we would have a public Solr endpoint on the internet that would
> receive search requests directly.
>
> Since I couldn't find an existing solution similar to ours, I would like to
> know whether it's possible to secure Solr in a way that allows anyone only
> read-access only to collections and how to achieve that. Specifically
> because of this part of the documentation
> :
>
> *No Solr API, including the Admin UI, is designed to be exposed to
> non-trusted parties. Tune your firewall so that only trusted computers and
> people are allowed access. Because of this, the project will not regard
> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> ask you to report such issues in JIRA.*
> Is there a way we can restrict read-only access to Solr collections so as
> to allow users to make search requests directly to it or should we always
> keep our Solr instances completely private?
>
> Thanks in advance!
>
> Best regards,
> Marco Godinho


Re: Solr endpoint on the public internet

2020-10-08 Thread Jörn Franke


It is like opening a database to the Internet - you simply don’t do it, and I
don’t recommend it.

If you want to do it despite the anti-pattern, use the latest Solr version and
put a reverse proxy in front. Always use authentication and authorization.
Allow only a minimal set of API endpoints and no Admin UI. Limit the IPs that
can access it. Do not use it for confidential data.
If data (even public data!) gets leaked from your Solr instance, it is very bad
for the reputation of your organisation.

Future versions will allow disabling modules that are problematic for
security. Better wait for them. Still, I would not do it in the first place -
you also would not open databases to the Internet. I also could not find a use
case for which this is needed.
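
The "minimal API endpoints" rule for a reverse proxy can be sketched as an allow-list check. This is an illustration of the idea only and is nowhere near a complete security measure: the collection name products, the permitted parameters, and the is_allowed helper are all assumptions, and it does not inspect what is inside q (for example local params).

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical policy: one collection, one handler, a short list of parameters.
ALLOWED_PATH = "/solr/products/select"
ALLOWED_PARAMS = {"q", "fq", "rows", "start", "sort", "fl", "wt"}


def is_allowed(method: str, url: str) -> bool:
    """Return True only for GET requests to the search handler with known params."""
    parts = urlsplit(url)
    if method.upper() != "GET" or parts.path != ALLOWED_PATH:
        return False
    params = parse_qs(parts.query, keep_blank_values=True)
    # Anything outside the allow-list (for example qt) rejects the whole request.
    return set(params) <= ALLOWED_PARAMS
```

Rejecting unknown parameters outright, rather than stripping them, keeps the policy easy to reason about.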

> Am 08.10.2020 um 20:27 schrieb Marco Aurélio :
> 
> Hi!
> 
> We're looking into the option of setting up search with Solr without an
> intermediary application. This would mean our backend would index data into
> Solr and we would have a public Solr endpoint on the internet that would
> receive search requests directly.
> 
> Since I couldn't find an existing solution similar to ours, I would like to
> know whether it's possible to secure Solr in a way that allows anyone only
> read-access only to collections and how to achieve that. Specifically
> because of this part of the documentation
> :
> 
> *No Solr API, including the Admin UI, is designed to be exposed to
> non-trusted parties. Tune your firewall so that only trusted computers and
> people are allowed access. Because of this, the project will not regard
> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> ask you to report such issues in JIRA.*
> Is there a way we can restrict read-only access to Solr collections so as
> to allow users to make search requests directly to it or should we always
> keep our Solr instances completely private?
> 
> Thanks in advance!
> 
> Best regards,
> Marco Godinho


Re: Solr endpoint on the public internet

2020-10-08 Thread Dave
#1. This is a HORRIBLE IDEA
#2. If I were going to do this, I would remove the update request handler as
well as the entire admin UI from the Solr instance, and set up replication from
a secure Solr instance on an interval. This way no one could send an update or
delete command, you could still update the index, and it would still be
readable. Just remove any request handler that isn’t search or replication, and
put the replication only on a port shared between the master and slave.

> On Oct 8, 2020, at 2:27 PM, Marco Aurélio  wrote:
> 
> Hi!
> 
> We're looking into the option of setting up search with Solr without an
> intermediary application. This would mean our backend would index data into
> Solr and we would have a public Solr endpoint on the internet that would
> receive search requests directly.
> 
> Since I couldn't find an existing solution similar to ours, I would like to
> know whether it's possible to secure Solr in a way that allows anyone only
> read-access only to collections and how to achieve that. Specifically
> because of this part of the documentation
> :
> 
> *No Solr API, including the Admin UI, is designed to be exposed to
> non-trusted parties. Tune your firewall so that only trusted computers and
> people are allowed access. Because of this, the project will not regard
> e.g., Admin UI XSS issues as security vulnerabilities. However, we still
> ask you to report such issues in JIRA.*
> Is there a way we can restrict read-only access to Solr collections so as
> to allow users to make search requests directly to it or should we always
> keep our Solr instances completely private?
> 
> Thanks in advance!
> 
> Best regards,
> Marco Godinho


Solr endpoint on the public internet

2020-10-08 Thread Marco Aurélio
Hi!

We're looking into the option of setting up search with Solr without an
intermediary application. This would mean our backend would index data into
Solr and we would have a public Solr endpoint on the internet that would
receive search requests directly.

Since I couldn't find an existing solution similar to ours, I would like to
know whether it's possible to secure Solr in a way that gives anyone
read-only access to collections, and how to achieve that. I ask specifically
because of this part of the documentation:

*No Solr API, including the Admin UI, is designed to be exposed to
non-trusted parties. Tune your firewall so that only trusted computers and
people are allowed access. Because of this, the project will not regard
e.g., Admin UI XSS issues as security vulnerabilities. However, we still
ask you to report such issues in JIRA.*

Is there a way to restrict access to Solr collections to read-only, so that
users can make search requests directly to it, or should we always keep our
Solr instances completely private?

Thanks in advance!

Best regards,
Marco Godinho


Re: Question about solr commits

2020-10-08 Thread Rahul Goswami
Shawn,
So if the autoCommit interval is 15 seconds, and one update request arrives
at t=0 and another at t=10 seconds, will there be two timers, one
expiring at t=15 and another at t=25 seconds? And would this amount to ONLY
ONE commit at t=15, since that one would include changes from both updates?
Is this understanding correct?

Thanks,
Rahul

On Wed, Oct 7, 2020 at 11:39 PM yaswanth kumar 
wrote:

> Thank you very much both Eric and Shawn
>
> Sent from my iPhone
>
> > On Oct 7, 2020, at 10:41 PM, Shawn Heisey  wrote:
> >
> > On 10/7/2020 4:40 PM, yaswanth kumar wrote:
> >> I have the below in my solrconfig.xml
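> >> <updateHandler class="solr.DirectUpdateHandler2">
> >>   <updateLog>
> >>     <str name="dir">${solr.Data.dir:}</str>
> >>   </updateLog>
> >>   <autoCommit>
> >>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> >>     <openSearcher>false</openSearcher>
> >>   </autoCommit>
> >>   <autoSoftCommit>
> >>     <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> >>   </autoSoftCommit>
> >> </updateHandler>
> >> Does this mean even though we are always sending data with commit=false
> >> on update solr api, the above should do the commit every minute
> >> (60000 ms) right?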
> >
> > Assuming that you have not defined the "solr.autoCommit.maxTime" and/or
> > "solr.autoSoftCommit.maxTime" properties, this config has autoCommit set
> > to 60 seconds without opening a searcher, and autoSoftCommit set to 5
> > seconds.
> >
> > So five seconds after any indexing begins, Solr will do a soft commit.
> > When that commit finishes, changes to the index will be visible to
> > queries.  One minute after any indexing begins, Solr will do a hard
> > commit, which guarantees that data is written to disk, but it will NOT
> > open a new searcher, which means that when the hard commit happens, any
> > pending changes to the index will not be visible.
> >
> > It's not "every five seconds" or "every 60 seconds" ... When any changes
> > are made, Solr starts a timer.  When the timer expires, the commit is
> > fired.  If no changes are made, no commits happen, because the timer
> > isn't started.
> >
> > Thanks,
> > Shawn
>
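
The timer behaviour described in the quoted explanation can be sketched as a small simulation. This is a simplified model, not Solr's implementation: it assumes (as Erick's reply in this thread says) that there is a single timer, started by the first update after the previous commit, and that updates arriving while it is running do not start another one. The function name and the time units are made up for illustration.

```python
def commit_times(update_times, max_time):
    """Return the times at which autoCommit would fire in this model.

    The first update after a commit starts one timer of length `max_time`;
    later updates that arrive before it expires ride along with that commit.
    """
    commits = []
    deadline = None  # expiry time of the running timer, if any
    for t in sorted(update_times):
        if deadline is not None and t >= deadline:
            commits.append(deadline)  # timer expired before this update
            deadline = None
        if deadline is None:
            deadline = t + max_time  # this update starts a new timer
    if deadline is not None:
        commits.append(deadline)  # the last running timer eventually fires
    return commits


# Rahul's scenario: maxTime of 15 seconds, updates at t=0 and t=10.
print(commit_times([0, 10], 15))
# A later update at t=20 arrives after the first commit and starts a new timer.
print(commit_times([0, 10, 20], 15))
```

In this model the two updates at t=0 and t=10 share one commit at t=15, and no commit is ever scheduled when there are no updates.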


'Exists' query not working for geospatial fields in Solr >= 8.5.0?

2020-10-08 Thread Ondra Horak
Hi,

I just found that Solr queries like field:* no longer work for
fields of type SpatialRecursivePrefixTreeFieldType. They seem to work
in 8.4.1, but since 8.5.0 they just give an empty result. Is this
intended behaviour, or a bug?

Looking at Solr release notes I'd say it might be a consequence of a
bugfix introduced in Solr 8.5.0:
SOLR-11746: Adding existence queries for PointFields.
DocValuesFieldExistsQuery and NormsFieldExistsQuery are used for
existence queries when possible.
(Houston Putman, hossman, Kai Chan)

What would be the best way to replace this type of query? I tried
a query like field:** as a workaround, which works but is quite
inefficient. Another workaround is to search with a large distance to
match any possible point. This is pretty fast (in fact, with my data
it is even faster than field:* in 8.4.1) but it seems like an ugly
hack. Anyway, I would welcome a more transparent behaviour.


Regards,

Ondra Horak
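
For readers who want to try the "large distance" workaround, here is a minimal sketch of how such a request could be assembled. The field name "location", the centre point 0,0 and the distance of 20100 km (chosen to exceed half the Earth's circumference) are assumptions for illustration; the geofilt parser and its sfield, pt and d parameters are standard Solr spatial syntax, but whether this matches every indexed shape on a given version should be verified.

```python
from urllib.parse import urlencode


def exists_workaround_params(field, distance_km=20100):
    """Build request parameters that match any document with a point in `field`.

    Uses a geofilt filter centred on 0,0 with a radius large enough to cover
    the whole globe (an assumption, see the note above).
    """
    fq = f"{{!geofilt sfield={field} pt=0,0 d={distance_km}}}"
    return {"q": "*:*", "fq": fq}


params = exists_workaround_params("location")
print(params["fq"])
# URL-encoded form, ready to append to a /select request.
print(urlencode(params))
```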


Re: Solr 8.6.2 - Admin UI Issue

2020-10-08 Thread Vinay Rajput
Thanks everyone for your replies.

I definitely cleared browser cache and also tried in incognito mode to rule
out this possibility. I think @Kevin got it right. This is the same issue
already reported in SOLR-14549


Thanks,
Vinay

On Thu, Oct 8, 2020 at 7:16 PM Kevin Risden  wrote:

> Since the image didn't come through - it could be
> https://issues.apache.org/jira/browse/SOLR-14549
>
> Definitely make sure to clear cache to ensure that JS files aren't cached,
> but if that doesn't fix it see if SOLR-14549 is related.
> Kevin Risden
>
>
>
> On Thu, Oct 8, 2020 at 9:38 AM Eric Pugh 
> wrote:
>
> > I’ve seen this behavior as well jumping between versions of Solr.
> > Typically in the browser console I see some sort of very opaque
> Javascript
> > error.
> >
> > > On Oct 8, 2020, at 5:54 AM, Colvin Cowie 
> > wrote:
> > >
> > > Images won't be included on the mailing list. You need to put them
> > > somewhere else and link to them.
> > >
> > > With that said, if you're switching between versions, maybe your
> browser
> > > has the old UI cached? Try clearing the cache / viewing it in a private
> > > window and see if it's any different.
> > >
> > > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput wrote:
> > >
> > >> Hi All,
> > >>
> > >> We are currently using Solr 7.3.1 in cloud mode and planning to
> upgrade.
> > >> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all
> > >> necessary configs, I noticed one issue in admin UI.
> > >>
> > >> If I select a collection and go to files, it shows the content tree
> > having
> > >> all files and folders present in that collection. In Solr 8.6.2, it is
> > >> somehow not showing the folders correctly. In my screenshot, you can
> see
> > >> that velocity and xslt are the folders and we have some config files
> > inside
> > >> these two folders. Because of this issue, I can't click on folder
> nodes
> > and
> > >> see children nodes. I checked the network calls and it looks like we
> are
> > >> getting the correct data from Solr. So, it looks like an Admin UI
> issue
> > to
> > >> me.
> > >>
> > >> Does anyone know if this is a *known issue* or if I am missing
> > >> something here? Has anyone noticed a similar issue? I can confirm
> > >> that it works fine with Solr 7.3.1.
> > >>
> > >> [image: image.png][image: image.png]
> > >>
> > >> Left image is for 8.6.2 and right image is for 7.3.1
> > >>
> > >> Thanks,
> > >> Vinay
> >
> > ___
> > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> > http://www.opensourceconnections.com <
> > http://www.opensourceconnections.com/> | My Free/Busy <
> > http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> > https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> >
> >
>


Re: Solr 8.6.2 - Admin UI Issue

2020-10-08 Thread Kevin Risden
Since the image didn't come through - it could be
https://issues.apache.org/jira/browse/SOLR-14549

Definitely make sure to clear cache to ensure that JS files aren't cached,
but if that doesn't fix it see if SOLR-14549 is related.
Kevin Risden



On Thu, Oct 8, 2020 at 9:38 AM Eric Pugh 
wrote:

> I’ve seen this behavior as well jumping between versions of Solr.
> Typically in the browser console I see some sort of very opaque Javascript
> error.
>
> > On Oct 8, 2020, at 5:54 AM, Colvin Cowie 
> wrote:
> >
> > Images won't be included on the mailing list. You need to put them
> > somewhere else and link to them.
> >
> > With that said, if you're switching between versions, maybe your browser
> > has the old UI cached? Try clearing the cache / viewing it in a private
> > window and see if it's any different.
> >
> > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput wrote:
> >
> >> Hi All,
> >>
> >> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade.
> >> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all
> >> necessary configs, I noticed one issue in admin UI.
> >>
> >> If I select a collection and go to files, it shows the content tree
> having
> >> all files and folders present in that collection. In Solr 8.6.2, it is
> >> somehow not showing the folders correctly. In my screenshot, you can see
> >> that velocity and xslt are the folders and we have some config files
> inside
> >> these two folders. Because of this issue, I can't click on folder nodes
> and
> >> see children nodes. I checked the network calls and it looks like we are
> >> getting the correct data from Solr. So, it looks like an Admin UI issue
> to
> >> me.
> >>
> >> Does anyone know if this is a *known issue* or if I am missing something
> >> here? Has anyone noticed a similar issue? I can confirm that it works
> >> fine with Solr 7.3.1.
> >>
> >> [image: image.png][image: image.png]
> >>
> >> Left image is for 8.6.2 and right image is for 7.3.1
> >>
> >> Thanks,
> >> Vinay
>
>


Re: Solr 8.6.2 - Admin UI Issue

2020-10-08 Thread Eric Pugh
I’ve seen this behavior as well jumping between versions of Solr. Typically
in the browser console I see some sort of very opaque JavaScript error.

> On Oct 8, 2020, at 5:54 AM, Colvin Cowie  wrote:
> 
> Images won't be included on the mailing list. You need to put them
> somewhere else and link to them.
> 
> With that said, if you're switching between versions, maybe your browser
> has the old UI cached? Try clearing the cache / viewing it in a private
> window and see if it's any different.
> 
> On Wed, 7 Oct 2020 at 11:22, Vinay Rajput wrote:
> 
>> Hi All,
>> 
>> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade.
>> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all
>> necessary configs, I noticed one issue in admin UI.
>> 
>> If I select a collection and go to files, it shows the content tree having
>> all files and folders present in that collection. In Solr 8.6.2, it is
>> somehow not showing the folders correctly. In my screenshot, you can see
>> that velocity and xslt are the folders and we have some config files inside
>> these two folders. Because of this issue, I can't click on folder nodes and
>> see children nodes. I checked the network calls and it looks like we are
>> getting the correct data from Solr. So, it looks like an Admin UI issue to
>> me.
>> 
>> Does anyone know if this is a *known issue* or if I am missing something
>> here? Has anyone noticed a similar issue? I can confirm that it works
>> fine with Solr 7.3.1.
>> 
>> [image: image.png][image: image.png]
>> 
>> Left image is for 8.6.2 and right image is for 7.3.1
>> 
>> Thanks,
>> Vinay

___
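Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com |
My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>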

This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



Re: Master/Slave

2020-10-08 Thread Eric Pugh
I’ve met folks who’ve actually used streaming expressions to move data
around, if you are looking for an “all Solr” approach.  If you go down that
route, I’d love to hear how it works.

> On Oct 8, 2020, at 7:10 AM, Erick Erickson  wrote:
> 
> What Jan said. I wanted to add that the replication API also makes use of it. 
> A little-known fact: you can use the replication API in SolrCloud _without_ 
> configuring anything in solrconfig.xml. You can specify the URL to pull from 
> on the fly in the command….
> 
> Best,
> Erick
> 
>> On Oct 8, 2020, at 2:54 AM, Jan Høydahl  wrote:
>> 
>> The API that enables master/slave is the ReplicationHandler, where the 
>> follower (slave) pulls index files from leader (master).
>> This same API is used in SolrCloud for the PULL replica type, and also as a 
>> fallback for full recovery if transaction log is not enough. 
>> So I don’t see it going away anytime soon, even if the non-cloud deployment 
>> style is less promoted in the documentation.
>> 
>> Jan
>> 
>>> On 6 Oct 2020, at 16:25, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
>>> 
>>>> it better not ever be depreciated.  it has been the most reliable
>>>> mechanism for its purpose
>>> 
>>> I would like to know whether that is the consensus of Solr developers.
>>> 
>>> We had been scrambling to move from Master/Slave to CDCR based on the 
>>> assertion that CDCR support would last far longer than Master/Slave support.
>>> 
>>> Can we now assume safely that this assertion is now completely moot? Can we 
>>> now assume safely that Master/Slave is likely to be supported for the 
>>> foreseeable future? Or are we forced to assume that Master/Slave support 
>>> will evaporate shortly after the now-evaporated CDCR support?
>>> 
>>> -Original Message-
>>> From: David Hastings  
>>> Sent: Wednesday, September 30, 2020 3:10 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Master/Slave
>>> 
>>>> whether we should expect Master/Slave replication also to be deprecated
>>> 
>>> it better not ever be depreciated.  it has been the most reliable mechanism
>>> for its purpose, solr cloud isnt going to replace standalone, if it does,
>>> thats when I guess I stop upgrading or move to elastic
>>> 
>>> On Wed, Sep 30, 2020 at 2:58 PM Oakley, Craig (NIH/NLM/NCBI) [C]
>>>  wrote:
>>> 
>>>> Based on the thread below (reading "legacy" as meaning "likely to be
>>>> deprecated in later versions"), we have been working to extract ourselves
>>>> from Master/Slave replication.
>>>>
>>>> Most of our collections need to be in two data centers (a read/write copy
>>>> in one local data center: the disaster-recovery-site SolrCloud could be
>>>> read-only). We also need redundancy within each data center for when one
>>>> host or another is unavailable. We implemented this by having different
>>>> SolrClouds in the different data centers; with Master/Slave replication
>>>> pulling data from one of the read/write replicas to each of the Slave
>>>> replicas in the disaster-recovery-site read-only SolrCloud. Additionally,
>>>> for some collections, there is a desire to have local read-only replicas
>>>> remain unchanged for querying during the loading process: for these
>>>> collections, there is a local read/write loading SolrCloud, a local
>>>> read-only querying SolrCloud (normally configured for Master/Slave
>>>> replication from one of the replicas of the loader SolrCloud to both
>>>> replicas of the query SolrCloud, but with Master/Slave disabled when the
>>>> load was in progress on the loader SolrCloud, and with Master/Slave resumed
>>>> after the loaded data passes QA checks).
>>>>
>>>> Based on the thread below, we made an attempt to switch to CDCR. The main
>>>> reason for wanting to change was that CDCR was said to be the supported
>>>> mechanism, and the replacement for Master/Slave replication.
>>>>
>>>> After multiple unsuccessful attempts to get CDCR to work, we ended up with
>>>> reproducible cases of CDCR losing data in transit. In June, I initiated a
>>>> thread in this group asking for clarification of how/whether CDCR could be
>>>> made reliable. This seemed to me to be met with deafening silence until the
>>>> announcement in July of the release of Solr 8.6 and the deprecation of CDCR.
>>>>
>>>> So we are left with the question whether we should expect Master/Slave
>>>> replication also to be deprecated; and if so, with what is it expected to
>>>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume
>>>> that Master/Slave replication will continue to be supported after all
>>>> (since the assertion that it would be replaced by CDCR has been
>>>> discredited)? In either case, are there other suggested implementations of
>>>> having a read-only SolrCloud receive data from a read/write SolrCloud?
>>>>
>>>>
>>>> Thanks
>>>>
>>>> -Original Message-
>>>> From: Shawn Heisey
>>>> Sent: Tuesday, May 21, 2019 11:15 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: SolrClo

Re: Master/Slave

2020-10-08 Thread Erick Erickson
What Jan said. I wanted to add that the replication API also makes use of it. A 
little-known fact: you can use the replication API in SolrCloud _without_ 
configuring anything in solrconfig.xml. You can specify the URL to pull from on 
the fly in the command….

Best,
Erick
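
As a rough illustration of specifying the source "on the fly", the sketch below builds a one-off fetchindex request for the replication handler. The host names and core names are hypothetical, and the parameter name masterUrl is the one documented for Solr 8.x (later versions may name it differently), so check the reference guide for the version in use before relying on it.

```python
from urllib.parse import urlencode


def fetchindex_url(target_core_url, source_core_url):
    """Build a one-time 'fetchindex' request URL for the replication handler.

    `target_core_url` is the core that should pull the index and
    `source_core_url` is the core to pull from; both are placeholders here.
    """
    params = urlencode({"command": "fetchindex", "masterUrl": source_core_url})
    return f"{target_core_url}/replication?{params}"


# Hypothetical hosts and core names, for illustration only.
url = fetchindex_url(
    "http://replica-host:8983/solr/mycoll_shard1_replica_n1",
    "http://source-host:8983/solr/mycoll_shard1_replica_n1",
)
print(url)
```

Because the source is passed as a request parameter, nothing about the source needs to be declared in solrconfig.xml for this one-time pull.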

> On Oct 8, 2020, at 2:54 AM, Jan Høydahl  wrote:
> 
> The API that enables master/slave is the ReplicationHandler, where the 
> follower (slave) pulls index files from leader (master).
> This same API is used in SolrCloud for the PULL replica type, and also as a 
> fallback for full recovery if transaction log is not enough. 
> So I don’t see it going away anytime soon, even if the non-cloud deployment 
> style is less promoted in the documentation.
> 
> Jan
> 
>> On 6 Oct 2020, at 16:25, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
>> 
>>> it better not ever be depreciated.  it has been the most reliable mechanism 
>>> for its purpose
>> 
>> I would like to know whether that is the consensus of Solr developers.
>> 
>> We had been scrambling to move from Master/Slave to CDCR based on the 
>> assertion that CDCR support would last far longer than Master/Slave support.
>> 
>> Can we now assume safely that this assertion is now completely moot? Can we 
>> now assume safely that Master/Slave is likely to be supported for the 
>> foreseeable future? Or are we forced to assume that Master/Slave support 
>> will evaporate shortly after the now-evaporated CDCR support?
>> 
>> -Original Message-
>> From: David Hastings  
>> Sent: Wednesday, September 30, 2020 3:10 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Master/Slave
>> 
>>> whether we should expect Master/Slave replication also to be deprecated
>> 
>> it better not ever be depreciated.  it has been the most reliable mechanism
>> for its purpose, solr cloud isnt going to replace standalone, if it does,
>> thats when I guess I stop upgrading or move to elastic
>> 
>> On Wed, Sep 30, 2020 at 2:58 PM Oakley, Craig (NIH/NLM/NCBI) [C]
>>  wrote:
>> 
>>> Based on the thread below (reading "legacy" as meaning "likely to be
>>> deprecated in later versions"), we have been working to extract ourselves
>>> from Master/Slave replication
>>> 
>>> Most of our collections need to be in two data centers (a read/write copy
>>> in one local data center: the disaster-recovery-site SolrCloud could be
>>> read-only). We also need redundancy within each data center for when one
>>> host or another is unavailable. We implemented this by having different
>>> SolrClouds in the different data centers; with Master/Slave replication
>>> pulling data from one of the read/write replicas to each of the Slave
>>> replicas in the disaster-recovery-site read-only SolrCloud. Additionally,
>>> for some collections, there is a desire to have local read-only replicas
>>> remain unchanged for querying during the loading process: for these
>>> collections, there is a local read/write loading SolrCloud, a local
>>> read-only querying SolrCloud (normally configured for Master/Slave
>>> replication from one of the replicas of the loader SolrCloud to both
>>> replicas of the query SolrCloud, but with Master/Slave disabled when the
>>> load was in progress on the loader SolrCloud, and with Master/Slave resumed
>>> after the loaded data passes QA checks).
>>> 
>>> Based on the thread below, we made an attempt to switch to CDCR. The main
>>> reason for wanting to change was that CDCR was said to be the supported
>>> mechanism, and the replacement for Master/Slave replication.
>>> 
>>> After multiple unsuccessful attempts to get CDCR to work, we ended up with
>>> reproducible cases of CDCR losing data in transit. In June, I initiated a
>>> thread in this group asking for clarification of how/whether CDCR could be
>>> made reliable. This seemed to me to be met with deafening silence until the
>>> announcement in July of the release of Solr 8.6 and the deprecation of CDCR.
>>> 
>>> So we are left with the question whether we should expect Master/Slave
>>> replication also to be deprecated; and if so, with what is it expected to
>>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume
>>> that Master/Slave replication will continue to be supported after all
>>> (since the assertion that it would be replaced by CDCR has been
>>> discredited)? In either case, are there other suggested implementations of
>>> having a read-only SolrCloud receive data from a read/write SolrCloud?
>>> 
>>> 
>>> Thanks
>>> 
>>> -Original Message-
>>> From: Shawn Heisey 
>>> Sent: Tuesday, May 21, 2019 11:15 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: SolrCloud (7.3) and Legacy replication slaves
>>> 
>>> On 5/21/2019 8:48 AM, Michael Tracey wrote:
>>>> Is it possible to set up an existing SolrCloud cluster as the master for
>>>> legacy replication to a slave server or two?   It looks like another option
>>>> is to use uni-directional CDCR, but not sure what is the best option in this
>>>> case.
>>> 
>>

Re: Solr 8.6.2 - Admin UI Issue

2020-10-08 Thread Colvin Cowie
Images won't be included on the mailing list. You need to put them
somewhere else and link to them.

With that said, if you're switching between versions, maybe your browser
has the old UI cached? Try clearing the cache / viewing it in a private
window and see if it's any different.

On Wed, 7 Oct 2020 at 11:22, Vinay Rajput  wrote:

> Hi All,
>
> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade.
> When I bootstrapped Solr 8.6.2 in my local machine and uploaded all
> necessary configs, I noticed one issue in admin UI.
>
> If I select a collection and go to files, it shows the content tree having
> all files and folders present in that collection. In Solr 8.6.2, it is
> somehow not showing the folders correctly. In my screenshot, you can see
> that velocity and xslt are the folders and we have some config files inside
> these two folders. Because of this issue, I can't click on folder nodes and
> see children nodes. I checked the network calls and it looks like we are
> getting the correct data from Solr. So, it looks like an Admin UI issue to
> me.
>
> Does anyone know if this is a *known issue* or if I am missing something
> here? Has anyone noticed a similar issue? I can confirm that it works
> fine with Solr 7.3.1.
>
> [image: image.png][image: image.png]
>
> Left image is for 8.6.2 and right image is for 7.3.1
>
> Thanks,
> Vinay
>