[CVE-2020-13941] Apache Solr information disclosure vulnerability

2020-08-14 Thread David Smiley
Reported in SOLR-14515 (private) and fixed in SOLR-14561 (public), released
in Solr version 8.6.0.
The Replication handler (
https://lucene.apache.org/solr/guide/8_6/index-replication.html#http-api-commands-for-the-replicationhandler)
allows the commands backup, restore and deleteBackup. Each of these takes a
location parameter, which was not validated, i.e., you could read/write to
any location the solr user can access.

On a Windows system, SMB paths such as \\10.0.0.99\share\folder may also be
used, leading to:
* The possibility of restoring another SolrCore from a server on the
network (or mounted remote file system), which may lead to:
** Exposing search index data that the attacker should otherwise not have
access to
** Replacing the index data entirely by loading it from a remote file
system that the attacker controls

* Launching SMB attacks, which may result in:
** The exfiltration of sensitive data such as OS user hashes (NTLM/LM
hashes)
** In case of misconfigured systems, SMB Relay Attacks, which can lead to
user impersonation on SMB shares or, in a worst-case scenario, Remote Code
Execution

The solution implemented to address these issues was to:
* Restrict the location parameter to trusted paths
* Prevent remote connection when using Windows UNC Paths
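
The two measures above can be illustrated with a minimal sketch. This is not
the code from SOLR-14561; the class name BackupLocationCheck, its methods and
the /var/solr/backups root are hypothetical and only show the general idea:
reject UNC paths outright, then accept a location only if its normalized
form lies under a trusted root.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Solr's actual fix.
final class BackupLocationCheck {

    // A Windows UNC path starts with two backslashes, e.g. \\10.0.0.99\share\folder.
    static boolean isUncPath(String location) {
        return location.startsWith("\\\\");
    }

    // True only if the location is not a UNC path and, once normalized,
    // lies under one of the trusted roots.
    static boolean isAllowed(String location, List<Path> trustedRoots) {
        if (location == null || location.isEmpty() || isUncPath(location)) {
            return false;
        }
        final Path candidate;
        try {
            // normalize() collapses "." and ".." so that a location such as
            // "<root>/../../etc" cannot escape the trusted root.
            candidate = Paths.get(location).toAbsolutePath().normalize();
        } catch (InvalidPathException e) {
            return false;
        }
        for (Path root : trustedRoots) {
            if (candidate.startsWith(root.toAbsolutePath().normalize())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The trusted root below is an assumption made for this example.
        List<Path> roots = Arrays.asList(Paths.get("/var/solr/backups"));
        System.out.println(isAllowed("/var/solr/backups/core1", roots));
        System.out.println(isAllowed("/var/solr/backups/../../../etc", roots));
        System.out.println(isAllowed("\\\\10.0.0.99\\share\\folder", roots));
    }
}
```

The sketch compares normalized paths only; it does not resolve symbolic
links, which a real check would probably also need to consider.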

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


[ANNOUNCE] Apache Solr 8.6.1 released

2020-08-14 Thread Houston Putman
The Lucene PMC is pleased to announce the release of Apache Solr 8.6.1.

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document handling, and geospatial search. Solr is highly
scalable, providing fault tolerant distributed search and indexing, and
powers the search and navigation features of many of the world's largest
internet sites.

Solr 8.6.1 is available for immediate download at:

  

### Solr 8.6.1 Release Highlights:

 * SOLR-14665: Revert SOLR-12845 adding of default autoscaling cluster
policy, due to performance issues
 * SOLR-14671: Parsing dynamic ZK config sometimes causes
NumberFormatException

Please refer to the Upgrade Notes in the Solr Ref Guide for information on
upgrading from previous Solr versions:

  

Please read CHANGES.txt for a full list of bugfixes:

  

Solr 8.6.1 also includes bugfixes in the corresponding Apache Lucene
release:

  

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror.
This also applies to Maven access.


[ANNOUNCE] Apache Lucene 8.6.1 released

2020-08-14 Thread Houston Putman
The Lucene PMC is pleased to announce the release of Apache Lucene 8.6.1.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:

  

### Lucene 8.6.1 Release Highlights:

 * LUCENE-9443: The UnifiedHighlighter was closing the underlying reader
when there were multiple term-vector fields.

Please read CHANGES.txt for a full list of changes:

  

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror.
This also applies to Maven access.


Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again

2020-08-14 Thread Roman Chyla
Hi Mike,

Thanks for the question! And sorry for the delay, I didn't manage to
get to it yesterday. I have generated better output, marked with (*)
where it currently fails the first time, and also included one extra
case to illustrate the PositionLength attribute.

assertU(adoc("id", "603", "bibcode", "xx603",
    "title", "THE HUBBLE constant: a summary of the hubble space telescope program"));


term=hubble posInc=2 posLen=1 type=word offsetStart=4 offsetEnd=10
term=acr::hubble posInc=0 posLen=1 type=ACRONYM offsetStart=4 offsetEnd=10
term=constant posInc=1 posLen=1 type=word offsetStart=11 offsetEnd=20
term=summary posInc=1 posLen=1 type=word offsetStart=23 offsetEnd=30
term=hubble posInc=1 posLen=1 type=word offsetStart=38 offsetEnd=44
term=syn::hubble space telescope posInc=0 posLen=3 type=SYNONYM offsetStart=38 offsetEnd=60
term=syn::hst posInc=0 posLen=3 type=SYNONYM offsetStart=38 offsetEnd=60
term=acr::hst posInc=0 posLen=3 type=ACRONYM offsetStart=38 offsetEnd=60
* term=space posInc=1 posLen=1 type=word offsetStart=45 offsetEnd=50
term=telescope posInc=1 posLen=1 type=word offsetStart=51 offsetEnd=60
term=program posInc=1 posLen=1 type=word offsetStart=61 offsetEnd=68

* - fails because offsetEnd < lastToken.offsetEnd. If reordered
(the multi-token synonym emitted as the last token), it would fail as
well, because of the check for lastToken.beginOffset <
currentToken.beginOffset. Basically, any reordering would result in a
failure (unless offsets are trimmed).
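
To make the two failure modes concrete, here is a minimal sketch of the
checks as described in this message. It is a simplified model, not Lucene's
actual indexing code; the class OffsetOrderCheck and its methods are
hypothetical, and the offsets are copied from the dump above.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the two offset checks described in the message.
// Not Lucene's actual indexing code; all names here are hypothetical.
final class OffsetOrderCheck {

    static final class Token {
        final String term;
        final int offsetStart;
        final int offsetEnd;

        Token(String term, int offsetStart, int offsetEnd) {
            this.term = term;
            this.offsetStart = offsetStart;
            this.offsetEnd = offsetEnd;
        }
    }

    // Returns the index of the first token whose start or end offset is
    // smaller than that of the token before it, or -1 if there is none.
    static int firstViolation(List<Token> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            Token last = tokens.get(i - 1);
            Token current = tokens.get(i);
            if (current.offsetStart < last.offsetStart
                    || current.offsetEnd < last.offsetEnd) {
                return i;
            }
        }
        return -1;
    }

    // Emission order from the dump: the synonym right after its first word.
    static List<Token> synonymFirst() {
        return Arrays.asList(
                new Token("hubble", 38, 44),
                new Token("syn::hubble space telescope", 38, 60),
                new Token("space", 45, 50),
                new Token("telescope", 51, 60));
    }

    // Reordered: the multi-token synonym emitted as the last token.
    static List<Token> synonymLast() {
        return Arrays.asList(
                new Token("hubble", 38, 44),
                new Token("space", 45, 50),
                new Token("telescope", 51, 60),
                new Token("syn::hubble space telescope", 38, 60));
    }

    public static void main(String[] args) {
        // Prints the term of the first offending token in each ordering.
        System.out.println(synonymFirst().get(firstViolation(synonymFirst())).term);
        System.out.println(synonymLast().get(firstViolation(synonymLast())).term);
    }
}
```

In the first ordering the offending token is `space` (its end offset 50 is
below the synonym's 60); in the second it is the synonym itself (its start
offset 38 is below 51), which matches the point that either ordering fails.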



The following example has an additional twist because of `space-time`;
the tokenizer first splits the word and generates two new tokens --
those alternative tokens are then used to find synonyms (space ==
universe)

assertU(adoc("id", "605", "bibcode", "xx604",
"title", "MIT and anti de sitter space-time"));


term=xx604 posInc=1 posLen=1 type=word offsetStart=0 offsetEnd=13
term=mit posInc=1 posLen=1 type=word offsetStart=0 offsetEnd=3
term=acr::mit posInc=0 posLen=1 type=ACRONYM offsetStart=0 offsetEnd=3
term=syn::massachusetts institute of technology posInc=0 posLen=1 type=SYNONYM offsetStart=0 offsetEnd=3
term=syn::mit posInc=0 posLen=1 type=SYNONYM offsetStart=0 offsetEnd=3
term=acr::mit posInc=0 posLen=1 type=ACRONYM offsetStart=0 offsetEnd=3
term=anti posInc=1 posLen=1 type=word offsetStart=8 offsetEnd=12
term=syn::ads posInc=0 posLen=4 type=SYNONYM offsetStart=8 offsetEnd=28
term=acr::ads posInc=0 posLen=4 type=ACRONYM offsetStart=8 offsetEnd=28
term=syn::anti de sitter space posInc=0 posLen=4 type=SYNONYM offsetStart=8 offsetEnd=28
term=syn::antidesitter spacetime posInc=0 posLen=4 type=SYNONYM offsetStart=8 offsetEnd=28
term=syn::antidesitter space posInc=0 posLen=4 type=SYNONYM offsetStart=8 offsetEnd=28
* term=de posInc=1 posLen=1 type=word offsetStart=13 offsetEnd=15
term=sitter posInc=1 posLen=1 type=word offsetStart=16 offsetEnd=22
term=space posInc=1 posLen=1 type=word offsetStart=23 offsetEnd=28
term=syn::universe posInc=0 posLen=1 type=SYNONYM offsetStart=23 offsetEnd=28
term=time posInc=1 posLen=1 type=word offsetStart=29 offsetEnd=33
term=spacetime posInc=0 posLen=1 type=word offsetStart=23 offsetEnd=33

So far, all of these cases could be handled with the new position
length attribute. But let us look at a case where that would fail too.

assertU(adoc("id", "606", "bibcode", "xx604",
    "title", "Massachusetts Institute of Technology and antidesitter space-time"));


term=massachusetts posInc=1 posLen=1 type=word offsetStart=0 offsetEnd=12
term=syn::massachusetts institute of technology posInc=0 posLen=4 type=SYNONYM offsetStart=0 offsetEnd=36
term=syn::mit posInc=0 posLen=4 type=SYNONYM offsetStart=0 offsetEnd=36
term=acr::mit posInc=0 posLen=4 type=ACRONYM offsetStart=0 offsetEnd=36
term=institute posInc=1 posLen=1 type=word offsetStart=13 offsetEnd=22
term=technology posInc=1 posLen=1 type=word offsetStart=26 offsetEnd=36
term=antidesitter posInc=1 posLen=1 type=word offsetStart=41 offsetEnd=53
term=syn::ads posInc=0 posLen=2 type=SYNONYM offsetStart=41 offsetEnd=59
term=acr::ads posInc=0 posLen=2 type=ACRONYM offsetStart=41 offsetEnd=59
term=syn::anti de sitter space posInc=0 posLen=2 type=SYNONYM offsetStart=41 offsetEnd=59
term=syn::antidesitter spacetime posInc=0 posLen=2 type=SYNONYM offsetStart=41 offsetEnd=59
term=syn::antidesitter space posInc=0 posLen=2 type=SYNONYM offsetStart=41 offsetEnd=59
term=space posInc=1 posLen=1 type=word offsetStart=54 offsetEnd=59
term=syn::universe posInc=0 posLen=1 type=SYNONYM offsetStart=54 offsetEnd=59
term=time posInc=1 posLen=1 type=word offsetStart=60 offsetEnd=64
term=spacetime posInc=0 posLen=1 type=word offsetStart=54 offsetEnd=64

Notice the posLen=4 of MIT; it would cover tokens `massachusetts
institute technology antidesitter` while offsets are still correct.
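
The span arithmetic behind that observation can be sketched as follows. This
is only an illustration; the class PositionSpan and its methods are
hypothetical, and the posInc/posLen values are copied from the dump above.
Absolute positions are accumulated from posInc, and a token with posLen=n is
taken to cover positions [p, p+n).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; values are copied from the dump above.
final class PositionSpan {

    static final class Token {
        final String term;
        final int posInc;
        final int posLen;
        final String type;

        Token(String term, int posInc, int posLen, String type) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
            this.type = type;
        }
    }

    // Terms of the type=word tokens whose position falls inside the span
    // [position, position + posLen) of the token at spanIndex.
    static List<String> coveredWords(List<Token> tokens, int spanIndex) {
        int[] positions = new int[tokens.size()];
        int position = -1; // so that the first posInc=1 lands on position 0
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).posInc;
            positions[i] = position;
        }
        int start = positions[spanIndex];
        int end = start + tokens.get(spanIndex).posLen;
        List<String> covered = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean inSpan = positions[i] >= start && positions[i] < end;
            if ("word".equals(tokens.get(i).type) && inSpan) {
                covered.add(tokens.get(i).term);
            }
        }
        return covered;
    }

    // The first tokens of document 606, in emission order.
    static List<Token> example606() {
        return Arrays.asList(
                new Token("massachusetts", 1, 1, "word"),
                new Token("syn::mit", 0, 4, "SYNONYM"),
                new Token("institute", 1, 1, "word"),
                new Token("technology", 1, 1, "word"),
                new Token("antidesitter", 1, 1, "word"),
                new Token("space", 1, 1, "word"));
    }

    public static void main(String[] args) {
        // Index 1 is the syn::mit token with posLen=4.
        System.out.println(coveredWords(example606(), 1));
    }
}
```

Because the dump shows posInc=1 for `technology` (no position is kept for
`of`), the four positions spanned by syn::mit run on into `antidesitter`.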

This would, I think, affect not only highlighting, but also search
(which is, at least for us, more important). But I can imagine that in
more NLP-related 

Re: One tlog remaining after commit after upgrade to 8.6.0?

2020-08-14 Thread Erick Erickson
Does it rotate? I.e. is there a new one after every commit?

If you have steps to repro I can take a look. I’ve also been fooled by
having ZK_HOST defined when I _think_ I’m running standalone;
that’s caused some head-scratching…

Erick

> On Aug 14, 2020, at 4:41 AM, Dawid Weiss wrote:
> 
> Hmm... I've upgraded a Solr instance (not a cloud one) from 7.x to
> 8.6.0 and the same code always produces one remaining unflushable tlog
> file (external hard commit passes but tlog remains open and
> unflushed).
> 
> Is there anything that's changed and that I'm missing?
> 
> Dawid
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 





Re: Welcome Munendra SN to the PMC

2020-08-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Welcome Munendra!

From: dev@lucene.apache.org At: 08/07/20 02:38:27 To: dev@lucene.apache.org
Subject: Re: Welcome Munendra SN to the PMC

Congrats Munendra!


-Yonik


On Sun, Aug 2, 2020 at 7:20 PM Ishan Chattopadhyaya wrote:

I am pleased to announce that Munendra SN has accepted the PMC's invitation to
join.

Congratulations and welcome, Munendra!




Re: Welcome Gus Heck to the PMC

2020-08-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Welcome Gus!

From: dev@lucene.apache.org At: 08/07/20 02:38:58 To: dev@lucene.apache.org
Subject: Re: Welcome Gus Heck to the PMC

Congrats Gus!

-Yonik

On Sun, Aug 2, 2020 at 7:21 PM Ishan Chattopadhyaya wrote:

I am pleased to announce that Gus Heck has accepted the PMC's invitation to
join.

Congratulations and welcome, Gus!




Re: Welcome Namgyu Kim to the PMC

2020-08-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Welcome Namgyu!

From: dev@lucene.apache.org At: 08/07/20 02:39:56 To: dev@lucene.apache.org
Subject: Re: Welcome Namgyu Kim to the PMC

Congrats Namgyu!
 
-Yonik


On Sun, Aug 2, 2020 at 7:19 PM Ishan Chattopadhyaya wrote:

I am pleased to announce that Namgyu Kim has accepted the PMC's invitation to
join.

Congratulations and welcome, Namgyu!




Re: Survey on ManagedResources feature

2020-08-14 Thread Jan Høydahl
I imagine that some users have built custom UIs to manage stopwords or synonyms
over REST instead of having to copy files to Solr or Zookeeper.
The question is whether to try to improve the security of the APIs, or to
disable them by default and document the limitations related to using them,
which could be a tradeoff for users to make until we come up with a better set
of APIs to replace them.

Jan

> On 14 Aug 2020, at 09:32, Matthias Krueger wrote:
> 
> 
> 
> As authentication is plugged into the SolrDispatchFilter I would assume that 
> you would need to be authenticated to read/write Managed Resources but no 
> authorization is checked (i.e. any authenticated user can read/write them), 
> correct?
> 
> Anyway, I came across Managed Resources in at least two scenarios:
> 
> * The LTR plugin is using them for updating model/features.
> * I use ManagedResource#StorageIO and its implementations as a convenient way to
> abstract away the underlying config storage when creating plugins that need
> to support both SolrCloud and Solr Standalone.
>
> IMO an abstraction that allows distributing configuration (ML models, 
> configuration snippets, external file fields...) that exceeds the typical ZK 
> size limits to SolrCloud while also supporting Solr Standalone would be nice 
> to have.
> 
> Matt
> 
> 
> 
> On 12.08.20 02:08, Noble Paul wrote:
>> The end point is served by restlet. So, your rules are not going to be 
>> honored. The rules work only if it is served by a Solr request handler
>> 
>> On Wed, Aug 12, 2020, 12:46 AM Jason Gerlowski wrote:
>> Hey Noble,
>> 
>> Can you explain what you mean when you say it's not secured?  Just for
>> those of us who haven't been following the discussion so far?  On the
>> surface of things users taking advantage of our RuleBasedAuth plugin
>> can secure this API like they can any other HTTP API.  Or are you
>> talking about some other security aspect here?
>> 
>> Jason
>> 
>> On Tue, Aug 11, 2020 at 9:55 AM Noble Paul wrote:
>> >
>> > Hi all,
>> > The end-point for Managed resources is not secured. So it needs to be
>> > fixed/eliminated.
>> >
>> > I would like to know what is the level of adoption for that feature
>> > and if it is a critical feature for users.
>> >
>> > Another possibility is to offer a replacement for the feature using a
>> > different API
>> >
>> > Your feedback will help us decide on what a potential solution should be
>> >
>> > --
>> > -
>> > Noble Paul



Re: Survey on ManagedResources feature

2020-08-14 Thread Matthias Krueger

As authentication is plugged into the SolrDispatchFilter I would assume
that you would need to be authenticated to read/write Managed Resources
but no authorization is checked (i.e. any authenticated user can
read/write them), correct?

Anyway, I came across Managed Resources in at least two scenarios:

  * The LTR plugin is using them for updating model/features.
  * I use ManagedResource#StorageIO and its implementations as a
convenient way to abstract away the underlying config storage when
creating plugins that need to support both SolrCloud and Solr
Standalone.

IMO an abstraction that allows distributing configuration (ML models,
configuration snippets, external file fields...) that exceeds the
typical ZK size limits to SolrCloud while also supporting Solr
Standalone would be nice to have.

Matt


On 12.08.20 02:08, Noble Paul wrote:
> The end point is served by restlet. So, your rules are not going to be
> honored. The rules work only if it is served by a Solr request handler
>
> On Wed, Aug 12, 2020, 12:46 AM Jason Gerlowski wrote:
>
> Hey Noble,
>
> Can you explain what you mean when you say it's not secured?  Just for
> those of us who haven't been following the discussion so far?  On the
> surface of things users taking advantage of our RuleBasedAuth plugin
> can secure this API like they can any other HTTP API.  Or are you
> talking about some other security aspect here?
>
> Jason
>
> On Tue, Aug 11, 2020 at 9:55 AM Noble Paul wrote:
> >
> > Hi all,
> > The end-point for Managed resources is not secured. So it needs to be
> > fixed/eliminated.
> >
> > I would like to know what is the level of adoption for that feature
> > and if it is a critical feature for users.
> >
> > Another possibility is to offer a replacement for the feature using a
> > different API
> >
> > Your feedback will help us decide on what a potential solution should be
> >
> > --
> > -
> > Noble Paul
>


One tlog remaining after commit after upgrade to 8.6.0?

2020-08-14 Thread Dawid Weiss
Hmm... I've upgraded a Solr instance (not a cloud one) from 7.x to
8.6.0 and the same code always produces one remaining unflushable tlog
file (external hard commit passes but tlog remains open and
unflushed).

Is there anything that's changed and that I'm missing?

Dawid
