Re: Fatal exception in solr 1.3+ replication

2008-11-16 Thread William Pierce
Not easily, no... It has occurred twice on my machine, but what triggers it I 
do not know.  Mark Miller has provided some explanations for what may be 
going on in Lucene that may be causing this. Cf. his last email.


- Bill

--
From: Noble Paul [EMAIL PROTECTED]
Sent: Saturday, November 15, 2008 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Fatal exception in solr 1.3+ replication


Is this issue visible consistently? I mean, are you able to
reproduce it easily?

On Fri, Nov 14, 2008 at 11:15 PM, William Pierce [EMAIL PROTECTED] 
wrote:

Folks:

I am using the nightly build of 1.3 as of Oct 23 so as to use the 
replication handler.   I am running on Windows 2003 Server with Tomcat 
6.0.14.   Everything was running fine until I noticed that certain 
updated records were not showing up on the slave.  Further investigation 
showed me that the failures have indeed been occurring since early this 
morning with a fatal exception. Here is a segment of the tomcat log:

 INFO: Total time taken for download : 0 secs
 Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller 
fetchLatestIndex

 INFO: Conf files are not downloaded or are in sync
 Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 
commit

 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler 
doSnapPull

 SEVERE: SnapPull failed
 org.apache.solr.common.SolrException: Snappull failed :
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
  at 
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)

  at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
  at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)

  at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.RuntimeException: 
org.apache.lucene.store.AlreadyClosedException: this Directory is closed

  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
  at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)

  at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
  at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)

  ... 11 more
 Caused by: org.apache.lucene.store.AlreadyClosedException: this 
Directory is closed

  at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
  at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
  at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
  at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
  at 
org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
  at 
org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)

  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
  ... 14 more
 Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 
commit


Any ideas, anyone?

-- Bill




--
--Noble Paul



Re: Fatal exception in solr 1.3+ replication

2008-11-16 Thread Mark Miller
I meant patch as in a source code patch, so I'm afraid you're kind of in a 
tough spot. That's part of the 'trunk running' risk, unfortunately...


You've done it once though, so I am sure you can manage again, right? 
I'm not sure exactly what state your checkout is in (though I suppose I can 
guess close from the date), which makes producing a source patch 
difficult, but essentially hiding the problem is pretty simple.


Take your checkout (which I hope you have or can get again... the only 
safe way to play the trunk game) and make the following simple changes:


We need to stop using the SolrIndexSearcher constructors that take a 
String or File rather than a Directory or Reader. The key spot is in 
SolrCore:


line: approx 1010

   try {
     newestSearcher = getNewestSearcher(false);
     if (newestSearcher != null) {
       IndexReader currentReader = newestSearcher.get().getReader();
       String newIndexDir = getNewIndexDir();
       if (new File(getIndexDir()).equals(new File(newIndexDir))) {
         IndexReader newReader = currentReader.reopen();

         if (newReader == currentReader) {
           currentReader.incRef();
         }

         tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true);
       } else {
         tmp = new SolrIndexSearcher(this, schema, "main", newIndexDir, true);
       }
     } else {
       tmp = new SolrIndexSearcher(this, schema, "main", getNewIndexDir(), true);

You can see that the two lower SolrIndexSearchers are inited with a String. 
You want to init them with a Directory instead:


   tmp = new SolrIndexSearcher(this, schema, "main", 
FSDirectory.getDirectory(newIndexDir), true, true);  // you will need to 
add another true param here


By passing a Directory rather than a String, the underlying IndexReaders 
will not try to close the Directory and you won't hit that error. Trunk 
no longer has this problem exposed because we now supply  Directories to 
these constructors (though for a different reason).


If you're hesitant about any of this, you might try trunk and just test it 
out after looking at the changes that have been put in, or you might 
email me privately and I may be able to point you to some alternate options.


- Mark


William Pierce wrote:

Mark,

That sounds great! Good luck with the cleaning :-)

Let me know how I can get a patch --- I'd prefer not to do a Solr build 
from source since we are not Java-savvy here :-(


- Bill

--
From: Mark Miller [EMAIL PROTECTED]
Sent: Saturday, November 15, 2008 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Fatal exception in solr 1.3+ replication

Okay, I'm fairly certain I've found it. As usual, take a walk and the 
solution pops into your head out of the blue.


It looks like Lucene's IndexReader reopen call is not very friendly 
with the FSDirectory implementation. If you call reopen and it 
returns a new IndexReader, it creates a new reference on the 
Directory - so if you reopen an IndexReader that was originally 
opened with a non-Directory parameter (String or File instead), both 
Readers (the reopened one and the one you're reopening on) will close 
the Directory when they close. That's not right. That's how we get to 0 
faster than we should. So it's kind of a Lucene issue.
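To see why that pushes the count to 0 too fast, here is a toy model of the per-path reference counting described in this thread (a sketch only, not the real Lucene FSDirectory API): one logical open, but two readers that each call close, so the directory is gone while a reader still holds it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of FSDirectory-style per-path reference counting (not the
// real Lucene API). open() bumps the count for a path, close() drops it;
// the directory is really closed only when the count reaches zero.
public class RefCountDemo {
    static final Map<String, Integer> refCounts = new HashMap<>();

    static void open(String path) {
        refCounts.merge(path, 1, Integer::sum);
    }

    static void close(String path) {
        refCounts.merge(path, -1, Integer::sum);
    }

    static boolean isOpen(String path) {
        return refCounts.getOrDefault(path, 0) > 0;
    }

    public static void main(String[] args) {
        open("/index");  // the original reader, opened via a String path
        // reopen() hands back a second reader but no second open() happened,
        // yet BOTH readers will call close() when they are done:
        close("/index"); // the reopened reader closes
        System.out.println(isOpen("/index")); // false: count already at 0
        close("/index"); // the original reader closes; count goes negative
        // any further use of the directory now fails, which is the
        // AlreadyClosedException in the stack trace above
    }
}
```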


My guess that this is hidden in the trunk was right, because I think 
we are no longer using String, File based IndexReader opens, which 
means our IndexReaders don't attempt to close their underlying 
Directories now.


I can probably send you a patch for the revision you're on to hide this 
as well, but I'm already in the doghouse on cleaning right now ; ) 
The way my brain works, I'll probably be back to this later though.


- Mark


William Pierce wrote:
Trunk may actually still hide the issue (possibly), but something 
really funky seems to have gone on and I can't find it yet. Do you 
have any custom code interacting with solr?


None whatsoever... I am using out-of-the-box Solr 1.3 (build of 
10/23).  I am using my C# app to send HTTP requests to my Solr instance.


Is there something you want me to try at my end that might give you 
a clue? Let me know and I can try to help out.


Best,

- Bill

--
From: Mark Miller [EMAIL PROTECTED]
Sent: Saturday, November 15, 2008 10:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Fatal exception in solr 1.3+ replication

Haven't given up, but this one has really got me so far. For every path, 
FSDirectory allows just one instance of FSDirectory to exist, and 
it keeps a ref count of how many have been returned from 
openDirectory for a given path. An FSDirectory will not actually be 
closed unless all references to it are released (it doesn't 
actually even do anything in close, other than drop the reference). 
So pretty much, the only way to get into trouble is to call close 
enough times to equal how many times you called openDirectory, and 
then try to use the FSDirectory 

Re: Solr Sorting, merging/weighting sort fields

2008-11-16 Thread lajkonik86

I have trouble balancing popularity against search relevance.
The trouble is combining boost factors with an mm (minimum match) of less than
100%.
The mm leads the search to also return less relevant items.

Two conflicting main scenarios exist:
- generic category search (say, something like tft): mainly I just want to
list the most popular items
- product-specific search: eos 400d, for instance

If I set a low boost on popularity, the category search doesn't attach enough
significance to popularity.
If I set a high boost on popularity, the search for eos 400d ends up with
the more popular eos 450d.

Anything higher than ^5 messes up the second scenario and anything below
^500 messes up the first scenario.
There are two problems in solving this:
- seemingly equally matching items return strongly different relevancy
scores, based on string length
- the effect of popularity on the score seems to be additive instead of
multiplicative.

Any tips on how to better understand these effects?
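To make the additive effect concrete, here is a small arithmetic sketch (the relevance and popularity numbers are made up for illustration): with dismax, bf adds the boost function's value onto the relevance score, so once weight * popularity dwarfs the relevance differences, relevance ordering is lost.

```java
// Toy arithmetic for the additive bf behaviour described above.
// The relevance and popularity values are hypothetical.
public class BoostMath {
    // dismax bf: final score = relevance + weight * popularity
    static double score(double relevance, double popularity, double weight) {
        return relevance + weight * popularity;
    }

    public static void main(String[] args) {
        // Say "eos 400d" gives the 400d relevance 1.0 and the more
        // popular 450d relevance 0.8; popularity is 3 vs 9.
        System.out.println(score(1.0, 3, 0.02) > score(0.8, 9, 0.02)); // true: low boost, relevance wins
        System.out.println(score(1.0, 3, 0.5)  > score(0.8, 9, 0.5));  // false: high boost, popularity swamps it
    }
}
```

A multiplicative combination (relevance * f(popularity)) scales scores rather than swamping them, which is why no single additive boost weight can serve both scenarios.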


Two examples:

<str name="qf">
name^1.0 category^1.1 description^0.2  color^0.2
</str>
<str name="bf">
 popularity^5.0
</str>
<str name="mm">1</str>

and 

popularity^1500
mm set to default




Walter Underwood wrote:
 
 The boost is a way to adjust the weight of that field, just like you
 adjust the weight of any other field. If the boost is dominating the
 score, reduce the weight and vice versa.
 
 wunder
 
 On 5/10/07 9:22 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
 
 
 : Is this correct?  bf is a boosting function, so a function is needed
 there,
 no?
 
 : If I'm not missing something, the ^0.5 is just a boost, and popularity
 : is just a (numeric) field.  So boosting a numeric field wouldn't make
 : sense, but applying it to a function would. Am I missing something?
 
 the function parser does the right thing when you give it a bare field
 name, from the javadocs...
 
 http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema)
 // Numeric fields default to correct type
 // (ie: IntFieldSource or FloatFieldSource)
 // Others use implicit ord(...) to generate numeric field value
 myfield
 
 you are correct about 0.5 being the boost; using either the _val_ hack on
 the SolrQueryParser or the bf param of dismax, the ^0.5 will be used
 as a boost on the resulting function query...
 
 qt=standard&q=%2Bfoo%20_val_:popularity^0.5
 qt=dismax&q=foo&bf=popularity^0.5
 
 
 -Hoss
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-Sorting%2C-merging-weighting-sort-fields-tp10405022p20525893.html
Sent from the Solr - User mailing list archive at Nabble.com.



Build Solr to run SolrJS

2008-11-16 Thread JCodina

I downloaded solr/trunk and built it;
everything seems to work except that the VelocityResponseWriter is not in
the war file,
and Tomcat gives a configuration error when using the conf.xml of
SolrJS.
Any suggestion on how to build Solr to work with SolrJS?


Thanks
Joan Codina 
-- 
View this message in context: 
http://www.nabble.com/Build-Solr-to-run-SolrJS-tp20526644p20526644.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Build Solr to run SolrJS

2008-11-16 Thread Erik Hatcher
Joan - I'll have a look at this in the near future.  SolrJS was using  
a custom version of a VelocityResponseWriter patch, but since then I  
have committed a version of that code to the contrib/velocity area of  
Solr.  contrib/velocity probably does not work with SolrJS currently,  
but we'll get that fixed up.


Maybe it all works using the codebase here?  http://wiki.apache.org/solr/SolrJS 



Matthias and Ryan - let's get SolrJS integrated into contrib/ 
velocity.  Any objections/reservations?


Erik




On Nov 16, 2008, at 10:59 AM, JCodina wrote:



I downloaded solr/trunk and build it,
everything seems to work except that the VelocityResponseWriter is  
not in

the war file
and tomcat , gives an error of configuration when using the conf.xml  
of the

solrjs.
Any suggestion on how to build the solr to work with solrjs??


Thanks
Joan Codina
--
View this message in context: 
http://www.nabble.com/Build-Solr-to-run-SolrJS-tp20526644p20526644.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Build Solr to run SolrJS

2008-11-16 Thread Matthias Epheser

Erik Hatcher schrieb:
Joan - I'll have a look at this in the near future.  SolrJS was using a 
custom version of a VelocityResponseWriter patch, but since then I have 
committed a version of that code to the contrib/velocity area of Solr.  
contrib/velocity probably does not work with SolrJS currently, but we'll 
get that fixed up.




I'll also have a look at the VelocityResponseWriter from 1.3 and current trunk, 
concerning solrjs tomorrow.


Maybe it all works using the the codebase here?  
http://wiki.apache.org/solr/SolrJS


The solr.war of the testdata in solrjs trunk is a custom build that was created 
before the official 1.3.0 release. It includes the Velocity patch, so it should 
work. As mentioned, it is a custom build from august, so we should aim to make 
it work with 1.3 and trunk.




Matthias and Ryan - let's get SolrJS integrated into contrib/velocity.  
Any objections/reservations?


As SolrJS may be used without Velocity at all (using e.g. ClientSideWidgets), is 
it possible to put it into contrib/javascript and create a dependency on 
contrib/velocity for ServerSideWidgets?


If that's ok, I'll have a look at the directory structure and the current ant 
build.xml to make them fit into the common solr structure and build.


regards,
matthias





Erik




On Nov 16, 2008, at 10:59 AM, JCodina wrote:



I downloaded solr/trunk and build it,
everything seems to work except that the VelocityResponseWriter is not in
the war file
and tomcat , gives an error of configuration when using the conf.xml 
of the

solrjs.
Any suggestion on how to build the solr to work with solrjs??


Thanks
Joan Codina
--
View this message in context: 
http://www.nabble.com/Build-Solr-to-run-SolrJS-tp20526644p20526644.html

Sent from the Solr - User mailing list archive at Nabble.com.






Solr security

2008-11-16 Thread Erik Hatcher
I'm pondering the viability of running Solr as effectively a UI  
server... what I mean by that is having a public facing browser-based  
application hitting a Solr backend directly for JSON, XML, etc data.


I know folks are doing this (I won't name names, in case this thread  
comes up with any vulnerabilities that would affect such existing  
environments).


Let's just assume a typical deployment environment... replicated  
Solr's behind a load balancer, maybe even a caching proxy.

What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out of this is a Solr deployment configuration  
suitable for direct browser access, but we're not safely there yet, are  
we?  Is this an absurd goal?  Must we always have a moving piece  
between browser and data/search servers?


Thanks,
Erik



Re: Solr security

2008-11-16 Thread Ian Holsman

Erik Hatcher wrote:
I'm pondering the viability of running Solr as effectively a UI 
server... what I mean by that is having a public facing browser-based 
application hitting a Solr backend directly for JSON, XML, etc data.


I know folks are doing this (I won't name names, in case this thread 
comes up with any vulnerabilities that would effect such existing 
environments).


Let's just assume a typical deployment environment... replicated 
Solr's behind a load balancer, maybe even a caching proxy.

What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out this is a Solr deployment configuration 
suitable for direct browser access, but we're not safely there yet are 
we?  Is this an absurd goal?  Must we always have a moving piece 
between browser and data/search servers?


Thanks,
Erik




First thing I would look at is disabling write access, or writing a 
servlet that sits on top of the write handler to filter your data.


Second thing I would be concerned about is people writing DoS queries 
that bypass the cache.


so you may need to write your own custom request handler to filter out 
that kind of thing.
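A minimal sketch of the kind of front door Ian describes (hypothetical names, not a real Solr or servlet API): refuse writes outright, cap the result window, and whitelist the read handlers before anything reaches Solr.

```java
// Hypothetical gate in front of Solr: whitelist read handlers,
// refuse /update entirely, and cap the result window.
public class RequestGate {
    static final int MAX_ROWS = 100;

    static boolean allow(String path, int rows) {
        if (path.startsWith("/update")) {
            return false;                  // read-only front end
        }
        if (rows > MAX_ROWS) {
            return false;                  // cap result windows
        }
        return path.startsWith("/select"); // whitelist, don't blacklist
    }

    public static void main(String[] args) {
        System.out.println(allow("/select", 10));   // true
        System.out.println(allow("/update", 10));   // false: writes blocked
        System.out.println(allow("/select", 5000)); // false: too many rows
    }
}
```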




Re: Solr security

2008-11-16 Thread Erik Hatcher


On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
First thing I would look at is disabling write access, or writing a  
servlet that sits on top of the write handler to filter your data.


We can turn off all the update handlers, but how does that affect  
replication?  Can a Solr replicant be entirely read-only in the HTTP  
request sense?


Second thing I would be concerned about is people writing DoS  
queries that bypass the cache.



so you may need to write your own custom request handler to filter  
out that kind of thing.


Is this a concern that can be punted to whatever you'd naturally be  
putting in front of Solr anyway, or a proxy tier that can have DoS-  
blocking rules?  I mean, if you're deploying a Struts app that hits Solr  
under the covers, how do you protect against DoS on that?  A malicious  
user could keep sending queries indirectly to a Solr through a whole  
lot of public apps now.  In other words, another tier in front of Solr  
doesn't add (much) to DoS protection for an underlying Solr, no?


Erik



Re: Solr security

2008-11-16 Thread Ryan McKinley
I'm not totally sure what you are suggesting.  Is there a general way  
people deal with security and search?


I'm assuming we already have good ways (better ways) to make sure  
people are authorized/logged in etc.  What do you imagine solr  
security would add?


FYI, I used to have a custom RequestHandler that got the user principal  
from the HttpServletRequest (I have a custom SolrDispatchFilter that  
adds that to the context) and then augments the query with a filter  
that limits to stuff that user can see.  I replaced all that with  
something that adds the filter to the Solrj query.


Assuming it is safe and all that, what do you think we could add  
that would be general enough?


ryan


On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:

I'm pondering the viability of running Solr as effectively a UI  
server... what I mean by that is having a public facing browser- 
based application hitting a Solr backend directly for JSON, XML, etc  
data.


I know folks are doing this (I won't name names, in case this thread  
comes up with any vulnerabilities that would effect such existing  
environments).


Let's just assume a typical deployment environment... replicated  
Solr's behind a load balancer, maybe even a caching proxy.

What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out this is a Solr deployment configuration  
suitable for direct browser access, but we're not safely there yet  
are we?  Is this an absurd goal?  Must we always have a moving piece  
between browser and data/search servers?


Thanks,
Erik





Re: Solr security

2008-11-16 Thread Mark Miller
Plus, it's just too big a can of worms for Solr to handle. You could  
protect up to a small point, but a real DDoS attack is not going to be  
defended against by Solr. At best we could put in 'kiddie' protection.


- Mark


On Nov 16, 2008, at 5:51 PM, Erik Hatcher [EMAIL PROTECTED]  
wrote:




On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
First thing I would look at is disabling write access, or writing a  
servlet that sits on top of the write handler to filter your data.


We can turn off all the update handlers, but how does that affect  
replication?  Can a Solr replicant be entirely read-only in the HTTP  
request sense?


Second thing I would be concerned about is people writing DoS  
queries that bypass the cache.



so you may need to write your own custom request handler to filter  
out that kind of thing.


Is this a concern that can be punted to what you'd naturally be  
putting in front of Solr anyway or a proxy tier that can have DoS  
blocking rules?  I mean, if you're deploying a Struts that hits Solr  
under the covers, how do you prevent against DoS on that? A  
malicious user could keep sending queries indirectly to a Solr  
through a whole lot of public apps now.  In other words, another  
tier in front of Solr doesn't add (much) to DoS protection to an  
underlying Solr, no?


   Erik



Re: Solr security

2008-11-16 Thread Erik Hatcher
What about SolrJS?   Isn't it designed to hit a Solr directly?  (Sure,  
as long as the response looked like a Solr response, it could have come  
through some magic 'security' tier.)


Erik

On Nov 16, 2008, at 5:54 PM, Ryan McKinley wrote:
I'm not totally sure what you are suggesting.  Is there a general  
way people deal with security and search?


I'm assuming we already have good ways (better ways) to make sure  
people are authorized/logged in etc.  What do you imagine solr  
security would add?


FYI, I used to have a custom RequstHandler that got the user  
principal from the HttpServletRequest (I have a custom  
SolrDispatchFilter that adds that to the context) and then augments  
the query with a filter that limits to stuff that user can see.  I  
replaced all that with a something that adds the filter to the Solrj  
query.


Assuming it is safe and all that, what do you think we could add  
that would be general enough?


ryan


On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:

I'm pondering the viability of running Solr as effectively a UI  
server... what I mean by that is having a public facing browser- 
based application hitting a Solr backend directly for JSON, XML,  
etc data.


I know folks are doing this (I won't name names, in case this  
thread comes up with any vulnerabilities that would effect such  
existing environments).


Let's just assume a typical deployment environment... replicated  
Solr's behind a load balancer, maybe even a caching proxy.

What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out this is a Solr deployment configuration  
suitable for direct browser access, but we're not safely there yet  
are we?  Is this an absurd goal?  Must we always have a moving  
piece between browser and data/search servers?


Thanks,
Erik





Re: Solr security

2008-11-16 Thread Ian Holsman

Erik Hatcher wrote:


On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
First thing I would look at is disabling write access, or writing a 
servlet that sits on top of the write handler to filter your data.


We can turn off all the update handlers, but how does that affect 
replication?  Can a Solr replicant be entirely read-only in the HTTP 
request sense?


Second thing I would be concerned about is people writing DoS queries 
that bypass the cache.



so you may need to write your own custom request handler to filter 
out that kind of thing.


Is this a concern that can be punted to what you'd naturally be 
putting in front of Solr anyway or a proxy tier that can have DoS 
blocking rules?  I mean, if you're deploying a Struts that hits Solr 
under the covers, how do you prevent against DoS on that?  A malicious 
user could keep sending queries indirectly to a Solr through a whole 
lot of public apps now.  In other words, another tier in front of Solr 
doesn't add (much) to DoS protection to an underlying Solr, no?


famous last words and all, but you shouldn't be just passing what a user 
types directly into an application, should you?


I'd be parsing out wildcards, boosts, and fuzzy searches (or at least 
thinking about the effects).
I mean, "jakarta apache"~1000 or roam~0.1 aren't as efficient as a 
regular query.


but they don't let me into design meetings any more ;(

Erik






Re: Solr security

2008-11-16 Thread Walter Underwood
Agreed, it is pretty easy to create a large variety of denial
of service attacks with sorts, wildcards, requesting a large
number of results, or a page deep in the results.

We have protected against several different DoS problems
in our front-end code.

wunder

On 11/16/08 3:12 PM, Ian Holsman [EMAIL PROTECTED] wrote:

 Erik Hatcher wrote:
 
 On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
 First thing I would look at is disabling write access, or writing a
 servlet that sits on top of the write handler to filter your data.
 
 We can turn off all the update handlers, but how does that affect
 replication?  Can a Solr replicant be entirely read-only in the HTTP
 request sense?
 
 Second thing I would be concerned about is people writing DoS queries
 that bypass the cache.
 
 
 so you may need to write your own custom request handler to filter
 out that kind of thing.
 
 Is this a concern that can be punted to what you'd naturally be
 putting in front of Solr anyway or a proxy tier that can have DoS
 blocking rules?  I mean, if you're deploying a Struts that hits Solr
 under the covers, how do you prevent against DoS on that?  A malicious
 user could keep sending queries indirectly to a Solr through a whole
 lot of public apps now.  In other words, another tier in front of Solr
 doesn't add (much) to DoS protection to an underlying Solr, no?
 
 famous last words and all, but you shouldn't be just passing what a user
 types directly into a application should you?
 
 I'd be parsing out wildcards, boosts, and fuzzy searches (or at least
 thinking about the effects).
 I mean jakarta apache~1000 or roam~0.1 aren't as efficient as a
 regular query.
 
 but they don't let me into design meetings any more ;(
 Erik
 
 
 



Re: Solr security

2008-11-16 Thread Ryan McKinley
my assumption with solrjs is that you are hitting read-only solr  
servers that you don't mind if people query directly.  It would not be  
appropriate for something where you don't want people (who really  
care) to know you are running solr and could execute arbitrary queries.


Since it is an example, I don't mind leaving the /admin interface open  
on:

http://example.solrstuff.org/solrjs/admin/
but /update has a password:
http://example.solrstuff.org/solrjs/update

I have said in the past I like the idea of a read-only flag in solr  
config that would throw an error if you try to do something with the  
UpdateHandler.  However there are other ways to do that also.


ryan


On Nov 16, 2008, at 6:03 PM, Erik Hatcher wrote:

What about SolrJS?   Isn't it designed to hit a Solr directly?   
(Sure, as long as the response looked like Solr response, it could  
have come through some magic 'security' tier).


Erik

On Nov 16, 2008, at 5:54 PM, Ryan McKinley wrote:
I'm not totally sure what you are suggesting.  Is there a general  
way people deal with security and search?


I'm assuming we already have good ways (better ways) to make sure  
people are authorized/logged in etc.  What do you imagine solr  
security would add?


FYI, I used to have a custom RequstHandler that got the user  
principal from the HttpServletRequest (I have a custom  
SolrDispatchFilter that adds that to the context) and then augments  
the query with a filter that limits to stuff that user can see.  I  
replaced all that with a something that adds the filter to the  
Solrj query.


Assuming it is safe and all that, what do you think we could add  
that would be general enough?


ryan


On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:

I'm pondering the viability of running Solr as effectively a UI  
server... what I mean by that is having a public facing browser- 
based application hitting a Solr backend directly for JSON, XML,  
etc data.


I know folks are doing this (I won't name names, in case this  
thread comes up with any vulnerabilities that would effect such  
existing environments).


Let's just assume a typical deployment environment... replicated  
Solr's behind a load balancer, maybe even a caching proxy.

What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out this is a Solr deployment  
configuration suitable for direct browser access, but we're not  
safely there yet are we?  Is this an absurd goal?  Must we always  
have a moving piece between browser and data/search servers?


Thanks,
Erik







Re: Solr security

2008-11-16 Thread Ryan McKinley


I'd be parsing out wildcards, boosts, and fuzzy searches (or at  
least thinking about the effects).
I mean jakarta apache~1000 or roam~0.1 aren't as efficient as a  
regular query.




Even if you leave the solr instance public, you can still limit  
grossly inefficient params by forcing things to use the dismax query  
parser.  You can use invariants to lock what options are available.
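For example, an invariants block in solrconfig.xml along these lines (handler name and field list are illustrative only) pins the query parser and fields so a caller can't override them:

```xml
<!-- Sketch only: locks defType and qf for a public-facing handler. -->
<requestHandler name="/public" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="qf">name^1.0 description^0.2</str>
  </lst>
</requestHandler>
```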


I suppose we don't have a way to say the *maximum* number of rows you  
can request is 100 (or something like that)


ryan


Re: Solr security

2008-11-16 Thread Walter Underwood
Limiting the maximum number of rows doesn't work, because
they can request rows 2-20100. --wunder

On 11/16/08 3:27 PM, Ryan McKinley [EMAIL PROTECTED] wrote:

 
 I'd be parsing out wildcards, boosts, and fuzzy searches (or at
 least thinking about the effects).
 I mean jakarta apache~1000 or roam~0.1 aren't as efficient as a
 regular query.
 
 
 Even if you leave the solr instance public, you can still limit
 grossly inefficent params by forcing things to use  the dismax query
 parser.  You can use invariants to lock what options are available.
 
 I suppose we don't have a way to say the *maximum* number of rows you
 can request is 100 (or something like that)
 
 ryan