RE: logging revisited...

2008-12-04 Thread Will Johnson
To a certain extent SLF4J makes this decision a fairly small one, namely:
which API do you want to code to inside Solr, and which jars do you want to
ship as part of the distribution.  It doesn't really matter whether you pick
commons-logging, log4j or slf4j; all have drop-in replacements via SLF4J.
There is also one for java.util.logging, however it requires custom code to
activate since you can't replace java.* classes.  End users get to do pretty
much whatever they want as far as logging goes if you use SLF4J.
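For reference, that activation is only a few lines with the jul-to-slf4j
bridge -- a minimal sketch, assuming the jul-to-slf4j jar is on the classpath:

import java.util.logging.Handler;
import java.util.logging.LogManager;
import org.slf4j.bridge.SLF4JBridgeHandler;

public class JulBridgeSetup {
  public static void main(String[] args) {
    // drop JUL's default handlers so records aren't logged twice
    java.util.logging.Logger root = LogManager.getLogManager().getLogger("");
    for (Handler h : root.getHandlers()) {
      root.removeHandler(h);
    }
    // route all JUL records to whatever backend slf4j is bound to
    SLF4JBridgeHandler.install();

    java.util.logging.Logger.getLogger("demo")
        .info("this JUL message now flows through slf4j");
  }
}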

SLF4J has also updated their 'legacy' page since the last time I looked,
which was roughly the last time this came up:

http://www.slf4j.org/legacy.html

We chose to code against the SLF4J APIs as it seemed like where things
were going (including Solr), and it gave us and our customers the ability to
switch to something else with minimal effort.  We also ship log4j+config
jars by default because it had the richest config/appender set at the time,
though the logback project seems like it might be catching up.  (Good thing
we can switch with no code changes.)
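For anyone who hasn't used it, coding against the slf4j API looks roughly
like this (the class name is made up, just for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class IndexService {  // hypothetical class, for illustration only
  private static final Logger log = LoggerFactory.getLogger(IndexService.class);

  public void addDocument(String id) {
    // parameterized messages: no string concatenation unless the level is on
    log.debug("adding document {}", id);
    try {
      // ... do the actual work ...
    } catch (Exception e) {
      log.error("failed to add document " + id, e);
    }
  }
}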

- will



-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 04, 2008 4:44 PM
To: solr-dev@lucene.apache.org
Subject: logging revisited...

While I'm on a roll tossing stuff out there

Since SOLR-560, solr depends on SLF4J as the logging interface.
However, since we also depend on HttpClient, we *also* depend on
commons-logging.  This is strange.  Our maven artifacts now depend on two
logging frameworks!

However the good folks at SLF4j have a nice solution -- a drop in  
replacement for commons-logging that uses slf4j.

HttpClient discussed switching to SLF4J for version 4.  They decided
not to because the slf4j drop-in replacement gives their users even
more options.  In Droids we had the same discussion, and now use the
commons-logging API.

So, with that in mind, I think we should consider using the
commons-logging API and shipping the .war file with the slf4j drop-in
replacement.  The behavior will be identical and there will be one
fewer library.  The loss is the potential to use some of slf4j's more
advanced logging features, but I don't see us taking advantage of
those anyway.
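For comparison, the commons-logging side of this is equally small -- a
sketch with a made-up class name; with the jcl-over-slf4j jar in the .war,
these calls route through slf4j unchanged:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class UpdateProcessor {  // hypothetical class, for illustration only
  private static final Log log = LogFactory.getLog(UpdateProcessor.class);

  public void process(int numDocs) {
    if (log.isDebugEnabled()) {
      // commons-logging has no {} placeholders, so guard the concatenation
      log.debug("processing " + numDocs + " docs");
    }
    log.info("update finished");
  }
}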

ryan












[jira] Updated: (SOLR-560) Convert JDK logging to SLF4J

2008-05-16 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-560:
--

Attachment: SOLR-560-slf4j.patch

patch updated for the latest trunk.  i also tested that it works with slf4j 
redirecting to log4j.  

> Convert JDK logging to SLF4J
> 
>
> Key: SOLR-560
> URL: https://issues.apache.org/jira/browse/SOLR-560
> Project: Solr
>  Issue Type: Wish
>Reporter: Ryan McKinley
> Fix For: 1.3
>
> Attachments: slf4j-api-1.5.0.jar, slf4j-jdk14-1.5.0.jar, 
> SOLR-560-slf4j.patch, SOLR-560-slf4j.patch, SOLR-560-slf4j.patch
>
>
> After lots of discussion, we should consider using SLF4j to enable more 
> flexibility in logging configuration.
> See:
> http://www.nabble.com/Solr-Logging-td16836646.html
> http://www.nabble.com/logging-through-log4j-td13747253.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Solr Logging

2008-05-05 Thread Will Johnson
A little late to the email party but...

[   ] Keep solr logging as is. (JUL)
[ X ] Convert solr logging to SLF4J

And SOLR-560 looks good too.

- will





RE: Solr Logging

2008-04-22 Thread Will Johnson
>If you mean "i have to write code to create a logging implementation" then 
>yes ... that is true ... someone, somewhere, has to write an 
>implementation of the JDK Logging API in order for you to use that 
>implementation -- and if you don't like any of the other implementations out 
>there, then you might have to write your own.  :)

Correct, but there are a number of existing frameworks out there that
already do all of this for you; most of them even let you pick your
underlying logger, so if you've already written a fancy JUL "rotating,
parsing and email me when things get bad" handler then you can still use it.
I do agree that commons logging is a bit 'off' to say the least, and many
projects including Jetty6 are moving to SLF4J.  Also, if solr as an
application is using Solrj under the hood for federation, it would seem that
solr is already using 2 different logging mechanisms.  For consistency's sake
we should consolidate on one single configuration mechanism.  It would seem
that one of the following would make sense:

* change solrj to be JUL based.  I think you already said that would be bad
since it's a library and should not impose logging choices
* change solr to be commons logging based.  I agree it's a bit awkward with
all the classloading but it is a ~standard to a large extent
* change both to be 'framework XYZ' based.  Fyi: slf4j already has a creepy
little migrator tool that might be of use.

In the end, I already have my shim that does the necessary translation, but
it's nowhere near a general solution that the log4j community could benefit
from.  As long as things are consistent and easy to configure to get
standard logging functionality, I'm happy.

- will

Not to pimp out slf4j too much, but the base implementation is only ~22k, or
about the same size as commons-csv, which is also a dependency.



RE: Solr Logging

2008-04-22 Thread Will Johnson
(putting on flame suit)

I'd be in favor, seeing as how I spent a good bit of time 2 months ago
writing JUL handlers and log managers to forward log messages to our logging
framework (log4j).  Pretty much any alternative (Commons, Log4j, SLF4J) is
better, since all of them allow you to _configure_ your underlying
implementation (including JUL if that's what you're into).  JUL, on the other
hand, ~requires you to write code to switch logging implementations or even
do basic things like rotate log files.  SLF4J seems especially slim and nice
these days, but really anything is better than JUL.
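(A simplified sketch of what such a forwarding handler looks like -- not our
actual code, and the level mapping here is crude:)

import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import org.apache.log4j.Logger;

public class Log4jForwardingHandler extends Handler {
  // forward each JUL record to a log4j logger of the same name
  public void publish(LogRecord record) {
    String name =
        record.getLoggerName() == null ? "root" : record.getLoggerName();
    Logger target = Logger.getLogger(name);
    int level = record.getLevel().intValue();
    if (level >= Level.SEVERE.intValue()) {
      target.error(record.getMessage(), record.getThrown());
    } else if (level >= Level.WARNING.intValue()) {
      target.warn(record.getMessage(), record.getThrown());
    } else if (level >= Level.INFO.intValue()) {
      target.info(record.getMessage());
    } else {
      target.debug(record.getMessage());
    }
  }

  public void flush() {}
  public void close() {}
}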

If others are really serious about it, I'd be happy to help the cause.  It
should be a fairly quick refactor, and we could leave the default configured
logger as JUL via whatever framework we end up going with.

- will

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 22, 2008 11:48 AM
To: solr-dev@lucene.apache.org
Subject: Solr Logging

Anyone have good tips on working w/ java.util.logging (JUL)?  For one,
the configuration seems to be per JVM, which isn't all that useful in
a webapp environment.
http://www.crazysquirrel.com/computing/java/logging.jspx
has some tips for Tomcat, but I am using Jetty.  Not to mention, it
seems, that if one wants to implement their own Handler, they have to
somehow figure out how to get it in the right classloader, since the
JVM classloader can't seem to find it if it is packaged in a WAR.
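(One workaround sketch is re-reading a logging.properties bundled with the
webapp at startup, roughly as below -- but it still mutates the JVM-wide
LogManager, which is exactly the per-JVM problem:)

import java.io.InputStream;
import java.util.logging.LogManager;

public class JulWebappConfig {
  // call from e.g. a ServletContextListener at webapp startup
  public static void configure() throws Exception {
    InputStream in =
        JulWebappConfig.class.getResourceAsStream("/logging.properties");
    if (in != null) {
      try {
        // replaces the JVM-wide JUL configuration -- webapps will fight
        LogManager.getLogManager().readConfiguration(in);
      } finally {
        in.close();
      }
    }
  }
}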

I know logging is sometimes a religious debate, but would others  
consider a patch that switched Solr to use log4j?  Or, commons- 
logging?  I just don't think JUL is up to snuff when it comes to  
logging.  It's a PITA to configure, is not flexible, doesn't play nice  
with other logging systems and, all in all, just seems like crappy  
design by committee where the lowest common denominator won out.

The switch is quite painless, and the former offers a lot more
flexibility, while the latter allows one to plug in whatever they see
fit.  I will work up a patch so people can at least see the options.


Cheers,
Grant



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-19 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570319#action_12570319
 ] 

Will Johnson commented on SOLR-342:
---

the new solr with the new lucene did the trick.  i made the mistake of using 
the 2.3 tag instead of the branch before, which was why i still saw the 
problem.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-15 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569408#action_12569408
 ] 

Will Johnson commented on SOLR-342:
---

i switched to the lucene 2.3 branch, updated (and confirmed that yonik's 1 line 
change was in place), reran the tests and still saw the same problem.  if i 
missed something please let me know.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-10 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567508#action_12567508
 ] 

Will Johnson commented on SOLR-342:
---

we are doing multi-threaded indexing, and searching while indexing; however, the 
'bad' results come back after the indexing run is finished and the index itself 
is static.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567235#action_12567235
 ] 

Will Johnson commented on SOLR-342:
---

we're using SolrCore in terms of:

core = new SolrCore("foo", dataDir, solrConfig, solrSchema);
UpdateHandler updateHandler = core.getUpdateHandler();
updateHandler.addDoc(command);

which is a bit more low-level than normal; however, when we flipped back to solr 
trunk + lucene 2.3 everything was fine, so it leads me to believe that we are ok 
in that respect.

i was going to try and reproduce with lucene directly also but that too is a 
bit outside the scope of what i have time for at the moment.  

and we're not getting any exceptions, just bad search results.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567218#action_12567218
 ] 

Will Johnson commented on SOLR-342:
---

we're not using parallel reader but we are using direct core access instead of 
going over http.  as for doc size, we're indexing wikipedia but creating a 
number of extra fields.  they are only large in comparison to the 'large 
volume' tests i've seen in most of the solr and lucene tests.  

- will

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567198#action_12567198
 ] 

Will Johnson commented on SOLR-342:
---

we have: 

10 
64 
2147483647 

and i'm working on a unit test, but just adding a few terms per doc doesn't seem 
to trigger it, at least not 'quickly.'


> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567147#action_12567147
 ] 

Will Johnson commented on SOLR-342:
---

patched solr + lucene trunk is still broken.  if anyone has any pointers for 
ways to coax this problem to happen before we get 20-30k large docs in the 
system, let me know and we can start working on a unit test; otherwise it's 
going to take a while to reproduce anything.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-08 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567099#action_12567099
 ] 

Will Johnson commented on SOLR-342:
---

I think we're running into a very serious issue with trunk + this patch.  
either the document summaries are not matched or the overall matching is 
'wrong'.  i did find this in the lucene jira: LUCENE-994 

"Note that these changes will break users of ParallelReader because the
parallel indices will no longer have matching docIDs. Such users need
to switch IndexWriter back to flushing by doc count, and switch the
MergePolicy back to LogDocMergePolicy. It's likely also necessary to
switch the MergeScheduler back to SerialMergeScheduler to ensure
deterministic docID assignment."

we're seeing rather consistent bad results, but only after 20-30k documents and 
multiple commits, and wondering if anyone else is seeing anything.  i've 
verified that the results are bad even through luke, which would seem to remove 
the search side of the solr equation.   the basic test case is to search for 
title:foo and get back documents that only have title:bar.  we're going to 
start on a unit test, but given the document counts and the corpus we're testing 
against it may be a while, so i thought i'd ask to see if anyone had any hints.

removing this patch seems to remove the issue, so it doesn't appear to be a 
lucene problem



> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2007-12-26 Thread Will Johnson (JIRA)
XmlUpdateRequestHandler bad documents mid batch aborts rest of batch


 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson


Has anyone run into the problem of handling bad documents / failures mid-batch?  
Ie:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="some_date_field">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now solr adds the first doc and then aborts.  It would seem like it 
should either fail the entire batch or log a message/return a code and then 
continue on to add doc 3.  Option 1 would seem to be much harder to accomplish 
and possibly require more memory while Option 2 would require more information 
to come back from the API.  I'm about to dig into this but I thought I'd ask to 
see if anyone had any suggestions, thoughts or comments.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Resource contention problem in Solrj

2007-12-18 Thread Will Johnson
Fyi: the CommonsHttpSolrServer already has methods to do all of those
things:

  /** set connectionTimeout on the underlying
      MultiThreadedHttpConnectionManager */
  public void setConnectionTimeout(int timeout) {
    _connectionManager.getParams().setConnectionTimeout(timeout);
  }

  /** set maxConnectionsPerHost on the underlying
      MultiThreadedHttpConnectionManager */
  public void setDefaultMaxConnectionsPerHost(int connections) {
    _connectionManager.getParams().setDefaultMaxConnectionsPerHost(connections);
  }

  /** set maxTotalConnection on the underlying
      MultiThreadedHttpConnectionManager */
  public void setMaxTotalConnections(int connections) {
    _connectionManager.getParams().setMaxTotalConnections(connections);
  }


You can also get the underlying connection factory if you want to do other
crazier stuff.

  public MultiThreadedHttpConnectionManager getConnectionManager() {
return _connectionManager;
  }
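So client code can just do something like this (a sketch; the url and
numbers are made up):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ClientSetup {
  public static CommonsHttpSolrServer create() throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.setConnectionTimeout(5000);           // ms to open a connection
    server.setDefaultMaxConnectionsPerHost(20);  // the default is only 2
    server.setMaxTotalConnections(50);
    return server;
  }
}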



- will

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, December 17, 2007 9:18 PM
To: solr-dev@lucene.apache.org
Subject: Re: Resource contention problem in Solrj

Excellent!  Thanks for diagnosing this!
-Yonik

On Dec 17, 2007 9:00 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> There seems to be a resource contention problem with Solrj under load. To
> reproduce the problem: set up a sample webapp with solrj connected to a HTTP
> Solr instance and hammer the webapp with Apache ab (say 10 concurrent
> connections with 100 requests). You'll notice that the webapp's servlet
> container quickly consumes 100% CPU and stays there unless you restart it. I
> can confirm that this happens with both Tomcat and Jetty. Meanwhile, the
> server that Solr is deployed on seems to be running fine.
>
> From this observation, I suspect that Solrj has a connection contention
> problem. And this seems to be the case if you look at CommonsHttpSolrServer.
> This class uses MultiThreadedHttpConnectionManager, which has
> maxConnectionsPerHost set to 2 by default. When the number of threads
> increases, this is obviously not enough and leads to connection contention
> problems. I quickly solved the problem by adding another constructor to
> CommonsHttpSolrServer that allows setting maxConnectionsPerHost and
> maxTotalConnections:
>
> public CommonsHttpSolrServer(int maxConsPerHost, int maxTotalCons, String
>     solrServerUrl) throws MalformedURLException {
>   this(solrServerUrl);
>   this.maxConsPerHost = maxConsPerHost;
>   this.maxTotalCons = maxTotalCons;
>   HttpConnectionManagerParams params = new HttpConnectionManagerParams();
>   params.setDefaultMaxConnectionsPerHost(maxConsPerHost);
>   params.setMaxTotalConnections(maxTotalCons);
>   _connectionManager.setParams(params);
> }
>
> Hope this information would help others.
>
> --
> Regards,
>
> Cuong Hoang
>



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-12-03 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547925
 ] 

Will Johnson commented on SOLR-342:
---

is there any update on getting this patch committed?  we needed to be able to 
set some of the buffer sizes, so the script wouldn't help.  have other people 
experienced troubles with 2.3 and/or this patch that i should be wary of?

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-12-03 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547854
 ] 

Will Johnson commented on SOLR-342:
---

just a comment to say that we added this patch and saw rather significant 
improvements, on the order of 10-25% for different index tests.

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-421) Make SolrParams serializable

2007-12-03 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547853
 ] 

Will Johnson commented on SOLR-421:
---

i also added 'implements java.io.Serializable' to:

SolrRequest
SolrInputField
SolrInputDocument

i'd generate a patch, but my tree is so heavily patched for SOLR-342 (which 
rocks by the way) that i'm hesitant to try anything too ambitious this morning.
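a quick sanity check for the change would be a serialization round trip along
these lines (a sketch, assuming the 'implements Serializable' additions are
applied):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import org.apache.solr.common.SolrInputDocument;

public class SerializationCheck {
  public static void main(String[] args) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");

    // serialize, then deserialize, as RMI would
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new ObjectOutputStream(bytes).writeObject(doc);
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    SolrInputDocument copy = (SolrInputDocument) in.readObject();

    System.out.println(copy.getFieldValue("id"));  // expect: 1
  }
}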

> Make SolrParams serializable
> 
>
> Key: SOLR-421
> URL: https://issues.apache.org/jira/browse/SOLR-421
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Trivial
>
> Making SolrParams serializable will allow it to be sent over RMI or used in 
> other tools that require serialization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-421) Make SolrParams serializable

2007-11-28 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546249
 ] 

Will Johnson commented on SOLR-421:
---

it would also be good to make the same changes to all of the solrj library 
classes.  i know they are meant to be sent over http with solr-specific 
marshaling, but we ended up needing to send some solrj request objects to 
another box via RMI and it was a bit of a pain.


> Make SolrParams serializable
> 
>
> Key: SOLR-421
> URL: https://issues.apache.org/jira/browse/SOLR-421
> Project: Solr
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Trivial
>
> Making SolrParams serializable will allow it to be sent over RMI or used in 
> other tools that require serialization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: schema query

2007-11-05 Thread Will Johnson
Check out luke:

http://wiki.apache.org/solr/LukeRequestHandler
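You can hit it directly over HTTP (http://localhost:8983/solr/admin/luke) or,
from java, via solrj -- roughly this sketch:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class SchemaQuery {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    LukeResponse rsp = new LukeRequest().process(server);
    // print every field solr currently knows about
    for (String field : rsp.getFieldInfo().keySet()) {
      System.out.println(field);
    }
  }
}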

- will

-Original Message-
From: S DALAL [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 05, 2007 7:29 AM
To: solr-dev@lucene.apache.org
Subject: schema query

Hi,
   Is there a way to query for the schema or the field properties ?  To give
a overview, i want to plug Solr to a web crawler and the index the pages
crawled. So, while indexing the crawler needs to know about the fields to
create the document.

One way, i can think of is to scrape the
http://localhost:9696/solr/admin/get-file.jsp?file=schema.xml  page, is
there a existing better way ?

thanks and regards
dalal



RE: CommonsHttpSolrServer and multithread

2007-10-18 Thread Will Johnson
You can also get a hold of the underlying MultiThreadedHttpConnectionManager
if you want to tweak the configuration further:


public class CommonsHttpSolrServer {
  ...
  public MultiThreadedHttpConnectionManager getConnectionManager()
}
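for example (a sketch):

import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ConnectionTweaks {
  public static void tweak(CommonsHttpSolrServer server) {
    MultiThreadedHttpConnectionManager mgr = server.getConnectionManager();
    // anything the manager exposes is fair game, e.g.:
    mgr.getParams().setStaleCheckingEnabled(false); // skip stale checks
    mgr.closeIdleConnections(30 * 1000);            // close sockets idle >30s
  }
}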

- will

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 18, 2007 12:08 PM
To: solr-dev@lucene.apache.org
Subject: Re: CommonsHttpSolrServer and multithread

> 
> but Is CommonsHttpSolrServer thread-safe?
> 

It better be!  To the best of my knowledge, it is.  If you have any 
troubles with it, we need to fix them.

the underlying connections are thread safe:
http://jakarta.apache.org/httpcomponents/httpclient-3.x/threading.html

we use MultiThreadedHttpConnectionManager

ryan



Re: svn commit: r577427 - in /lucene/solr/trunk/client/java/solrj/test/org/apache/solr/client/solrj: LargeVolumeTestBase.java embedded/LargeVolumeEmbeddedTest.java embedded/LargeVolumeJettyTest.java

2007-09-20 Thread Will Johnson
Even if we used a dependency management tool, the junit/ant integration
still requires that developers have the ant-junit bindings (aka:
ant-junit.jar) in the classpath when the build.xml is parsed.  Supposedly
you can explicitly declare the junit tasks with your own taskdef (and
identify the location of the jars yourself), but the jars still have to
exist when that taskdef is evaluated -- which makes it hard to then pull
those jars as part of a target.

Everybody i've ever talked to who i felt confident knew more about ant
than me (with Erik at the top of the list) has said the same thing: "Put
junit and ant-junit in your ANT_LIB ... don't even try to do anything
else, it will just burn you."



we do the following:

[build.xml snippet stripped by the mail archive]

and so on

it works nicely for all the main targets (compile, test, etc).  i also just
verified that the same method works in the solr build file.  it guarantees
that everyone is running the exact same version of junit and doesn't require
any extra steps for developers to be able to build/test code.  there are lots
of other ways to do this, including a custom taskdef, but the above method is
pretty straightforward and ~vanilla ant.


- will




[jira] Updated: (SOLR-360) Multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-360:
--

Attachment: TestJettyLargeVolume.java

i'll work on the patch to make it cleaner and run with the build process, but i 
wanted to get this up as soon as possible.  if you drop it into 
/client/java/solrj/test/org/apache/solr/client/solrj/embedded it compiles/runs 
with eclipse.
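the shape of the test is roughly the following (a sketch of what the attached
file does, not the attachment itself):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LargeVolumeSketch {
  public static void main(String[] args) throws Exception {
    final SolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    for (int t = 0; t < 2; t++) {          // >1 threads triggers the failure
      final int threadId = t;
      new Thread() {
        public void run() {
          try {
            for (int batch = 0; batch < 25; batch++) {
              // small docs: an id plus one single-term field, 200 per batch
              List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
              for (int i = 0; i < 200; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", threadId + "-" + batch + "-" + i);
                doc.addField("name", "foo");
                docs.add(doc);
              }
              server.add(docs);
            }
            server.commit();
          } catch (Exception e) {
            e.printStackTrace();           // multithreaded runs fail here
          }
        }
      }.start();
    }
  }
}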

> Multithread update client causes exceptions and dropped documents
> -
>
> Key: SOLR-360
> URL: https://issues.apache.org/jira/browse/SOLR-360
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
> Environment: test fails on both pc + mac, tomcat + jetty all java 1.6
>Reporter: Will Johnson
> Attachments: TestJettyLargeVolume.java
>
>
> we were doing some performance testing for the updating aspects of solr and 
> ran into what seems to be a large problem.  we're creating small documents 
> with an id and one field of 1 term only submitting them in batches of 200 
> with commits every 5000 docs.  when we run the client with 1 thread 
> everything is fine.  when we run it with >1 threads things go south (stack 
> trace is below).  i've attached the junit test which shows the problem.  this 
> happens on both a mac and a pc and when running solr in both jetty and 
> tomcat.  i'll create a junit issue if necessary but i thought i'd see if 
> anyone else had run into this problem first.   
> also, the problem does not seem to surface under solr1.2
> (RyanM suggested it might be related to SOLR-215)
> (output from junit test)
> Started thread: 0
> Started thread: 1
> org.apache.solr.common.SolrException: 
> Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje
> Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecur

[jira] Created: (SOLR-360) Multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson (JIRA)
Multithread update client causes exceptions and dropped documents
-

 Key: SOLR-360
 URL: https://issues.apache.org/jira/browse/SOLR-360
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
 Environment: test fails on both pc + mac, tomcat + jetty all java 1.6
Reporter: Will Johnson


we were doing some performance testing for the updating aspects of solr and ran 
into what seems to be a large problem.  we're creating small documents with an 
id and one field of 1 term only submitting them in batches of 200 with commits 
every 5000 docs.  when we run the client with 1 thread everything is fine.  
when we run it with >1 threads things go south (stack trace is below).  i've 
attached the junit test which shows the problem.  this happens on both a mac 
and a pc and when running solr in both jetty and tomcat.  i'll create a junit 
issue if necessary but i thought i'd see if anyone else had run into this 
problem first.   

also, the problem does not seem to surface under solr1.2

(RyanM suggested it might be related to SOLR-215)

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException: 
Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserj

Re: boosting a query by a function of other fields

2007-09-06 Thread Will Johnson

I haven't yet looked at SOLR-192 to see how it is done there, though.

-Mike



it's nowhere near perfect, but it did at least pass some unit tests.
my immediate need for that bit of functionality has lessened, but
i'd be happy to keep working and testing if anyone has comments on
the patch.  it does (as the ticket states) depend on a lucene patch
at the moment to get field names etc, but that could probably be
removed if necessary.


- will



[jira] Updated: (SOLR-192) Move FunctionQuery to Lucene

2007-08-29 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-192:
--

Attachment: SOLR-192-functionQueries.patch

patch attached that uses the functionality from lucene instead of solr.  there 
were some changes in the api in the solr->lucene transition so there was one 
api change to a private static method in solr.search.QueryParsing.  this patch 
also relies on LUCENE-989 (http://issues.apache.org/jira/browse/LUCENE-989) to 
get access to field names.   a future patch could then get access to the 
statistics for exposing in results.

- will

> Move FunctionQuery to Lucene
> 
>
> Key: SOLR-192
> URL: https://issues.apache.org/jira/browse/SOLR-192
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Grant Ingersoll
> Attachments: SOLR-192-functionQueries.patch
>
>
> FunctionQuery is a useful concept to have in Lucene core.  Deprecate the Solr 
> implementation and migrate it Lucene core.  Have the deprecated Solr version 
> call the Lucene version.
> See https://issues.apache.org/jira/browse/LUCENE-446

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-192) Move FunctionQuery to Lucene

2007-08-27 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523042
 ] 

Will Johnson commented on SOLR-192:
---

is anyone currently working on doing this migration?  i submitted a patch to 
the lucene project tracker (https://issues.apache.org/jira/browse/LUCENE-989) 
and was going to post a patch for solr to use the new features but the 
implementations look to be reasonably different.   

> Move FunctionQuery to Lucene
> 
>
> Key: SOLR-192
> URL: https://issues.apache.org/jira/browse/SOLR-192
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Grant Ingersoll
>
> FunctionQuery is a useful concept to have in Lucene core.  Deprecate the Solr 
> implementation and migrate it Lucene core.  Have the deprecated Solr version 
> call the Lucene version.
> See https://issues.apache.org/jira/browse/LUCENE-446

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513912
 ] 

Will Johnson commented on SOLR-215:
---

did anything ever get baked into the patch for handling the core name as a cgi 
param instead of as a url path element?  the email thread we had going didn't 
seem to come to any hard conclusions, but i'd like to lobby for it as part of 
the spec.  i read through the patch but couldn't quite follow things enough 
to tell.

> Multiple Solr Cores
> ---
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
> solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr which also brings 
> multiple indexes capability.
> The patch file to grab is solr-215.patch.zip (see MISC session below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accomodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases however where having multiple indexes or multiple 
> cores through Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) and 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependant.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web-application, you can have one web-application that hosts more than one 
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton by a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) allows to easily manage 
> them.
> You declare one servlet filter mapping per core you want to expose in the 
> web.xml; this allows easy to access each core through a different url. 
> USAGE (example web deployment, patch installed):
> Step0
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
> monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core0 index; 2 
> documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core1 index; no 
> documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to index of core0 and the mon*.xml to the index of core1;
> running queries from the admin interface, you can verify indexes have 
> different content. 
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the 2 other.
> SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
> //Accessing a core:
> SolrCore core = SolrCore.getCore("core0"); 
> PATCH MODIFICATIONS DETAILS (per package):
> org.apache.solr.core:
> The heaviest modifications ar

[jira] Updated: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-312:
--

Attachment: create-solrj-javadoc.patch

simple patch to add new task

> create solrj javadoc in build.xml
> -
>
> Key: SOLR-312
> URL: https://issues.apache.org/jira/browse/SOLR-312
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
> Environment: a new task in build.xml named javadoc-solrj that does 
> pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
> heavily based on the example from the solr core javadoc target.
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: create-solrj-javadoc.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)
create solrj javadoc in build.xml
-

 Key: SOLR-312
 URL: https://issues.apache.org/jira/browse/SOLR-312
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
 Environment: a new task in build.xml named javadoc-solrj that does 
pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Will Johnson
>comments?

Hooray, and very cool.  I didn't know you only needed a locking
mechanism if you have multiple index writers, so the use of NoLock
by default makes perfect sense.

A quick stability update: Since I first submitted the patch ~2 months
ago we've had 0 lockups with it running in all our test environments.  

- will


[jira] Updated: (SOLR-267) log handler + query + hits

2007-07-11 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

new patch produces the following output:

Jul 11, 2007 1:35:19 PM org.apache.solr.core.SolrCore execute
INFO: webapp=/solr path=/select/ 
params=indent=on&start=0&q=solr&version=2.2&rows=10 hits=0 status=0 QTime=79

and adds a NamedList toLog as yonik suggested.  

> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-11 Thread Will Johnson
Most of the time I, and I imagine others, don't know the set of cores
ahead of time.  It seems somewhat wasteful to create a ton of solr
server connections when a single one can handle things just as easily.
I guess I don't see why this param should be any different from any
others like output formats etc.

As for POSTs, you can still have cgi arguments and access them via the
same servlet request parameters while accessing the input stream.

I'll leave the efficiency issues to people more familiar with the patch,
but if it has to be in the url then you force people using solrj and
other similar apis to create a Map and manage them
that way.

- will

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, July 11, 2007 1:20 PM
To: solr-dev@lucene.apache.org
Subject: Re: [jira] Commented: (SOLR-215) Multiple Solr Cores

On 7/11/07, Will Johnson <[EMAIL PROTECTED]> wrote:
> I think it would be nice to have the core name
> specified as a CGI param instead of (or in addition to) a url path.
> Otherwise, large sections of client code (such as solrj/solr#) will need
> to be changed.

Only if you want to talk to multiple cores over a single "connection", right?
Hopefully existing client code will allow the specification of the URL, and
one would use http://localhost:8983/solr/core1/

Still might be useful as a param *if* it can be done efficiently.
I wonder about the case when the param comes in via POST though.

-Yonik


RE: [jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-11 Thread Will Johnson
>One question I had was about backward compatibility... is there a way to
>register a null or default core that reverts to the original paths?  Are
>there any other backward compatible gotchas (not related to custom java
>code)?

I'm very excited about this patch as it would remove my current scheme
of running shell scripts to hot deploy new solr webapps on the fly. 

Along with registering a default core so that all existing code/tests
continue to work I think it would be nice to have the core name
specified as a CGI param instead of (or in addition to) a url path.
Otherwise, large sections of client code (such as solrj/solr#) will need
to be changed.  

For example:

http://localhost:8983/solr/select?q=foo&core=core1
http://localhost:8983/solr/update?core=core1 

- will


[jira] Updated: (SOLR-280) slightly more efficient SolrDocument implementation

2007-07-02 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-280:
--

Attachment: SOLR-280-SolrDocument2-API-Compatibility.patch

>The API changes mostly affect solrj users. 

being one of those heavily affected users i created the attached patch to make 
us unaffected.  (or at least i went from a few hundred compile errors to 0)

the following methods were added back and are mostly 1-5 line wrappers to the 
existing methods or underlying data structures (a sketch of one follows the list).

setField(String, Object)
getFieldValue(String)
getFieldValues(String)
addField(String, Object)
getFieldNames() 
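
(for flavor, roughly what one of those wrappers looks like over the new
Map<String,Object> layout -- a sketch, not the patch verbatim:)

    import java.util.Collection;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SolrDocumentCompat {
      // new layout: the value is either a bare Object or a Collection<Object>
      private final Map<String, Object> _fields = new LinkedHashMap<String, Object>();

      @SuppressWarnings("unchecked")
      public Collection<Object> getFieldValues(String name) {
        Object v = _fields.get(name);
        if (v == null) return null;
        if (v instanceof Collection) return (Collection<Object>) v;
        return Collections.singletonList(v);   // wrap the single value
      }
    }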

- will

> slightly more efficient SolrDocument implementation
> ---
>
> Key: SOLR-280
> URL: https://issues.apache.org/jira/browse/SOLR-280
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-280-SolrDocument2-API-Compatibility.patch, 
> SOLR-280-SolrDocument2.patch, SOLR-280-SolrDocument2.patch
>
>
> Following discussion in SOLR-272
> This implementation stores fields as a Map<String,Object> rather than a 
> Map<String,Collection<Object>>.  The API changes slightly in that:
>  getFieldValue( name ) returns a Collection if there is more than one field 
> value and an Object if there is only one.
> getFirstValue( name ) returns a single value for the field.  This is intended 
> to make things easier for client applications.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509160
 ] 

Will Johnson commented on SOLR-278:
---

I guess I was hoping for a superset of features in
LukeResponse.FieldInfo which will be partially set by the schema and
partially set by the luke-ish info.  We could even merge the two if
it made sense.

In the end I need to get a list of "fields that solr currently knows
about" which seems to be a grouping of both the schema and the index via
dynamic fields.  The current patch does this but I think there is a
better approach somewhere out there.

- will



> LukeRequest/Response for handling show=schema
> -
>
> Key: SOLR-278
> URL: https://issues.apache.org/jira/browse/SOLR-278
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LukeSchemaHandling.patch
>
>
> the soon to be attached patch adds a method to LukeRequest to set the option 
> for showing schema from SOLR-266.  the patch also modifies LukeResponse to 
> handle the schema info in the same manner as the fields from the 'normal' 
> luke response.  i think it's worth talking about unifying the response format 
> so that they aren't different but that's a larger discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-278:
--

Attachment: LukeSchemaHandling.patch

> LukeRequest/Response for handling show=schema
> -
>
> Key: SOLR-278
> URL: https://issues.apache.org/jira/browse/SOLR-278
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LukeSchemaHandling.patch
>
>
> the soon to be attached patch adds a method to LukeRequest to set the option 
> for showing schema from SOLR-266.  the patch also modifies LukeResponse to 
> handle the schema info in the same manner as the fields from the 'normal' 
> luke response.  i think it's worth talking about unifying the response format 
> so that they aren't different but that's a larger discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)
LukeRequest/Response for handling show=schema
-

 Key: SOLR-278
 URL: https://issues.apache.org/jira/browse/SOLR-278
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3


the soon to be attached patch adds a method to LukeRequest to set the option 
for showing schema from SOLR-266.  the patch also modifies LukeResponse to 
handle the schema info in the same manner as the fields from the 'normal' luke 
response.  i think it's worth talking about unifying the response format so 
that they aren't different but that's a larger discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-267) log handler + query + hits

2007-06-27 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

new patch that writes out request params as cgi instead of NL.toString() for 
pasting into a browser.  i also figured out the HttpResponseHeader however i'm 
not sure how people feel about having that info duplicated in the solr logs, 
the http headers/access logs and the actual solr response.  in any case the 
following logic would go into SolrDispatchFilter: (but is not in this patch)




> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-267) log handler + query + hits

2007-06-26 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508163
 ] 

Will Johnson commented on SOLR-267:
---

A few responses rolled up:

Yonik Seeley commented on SOLR-267:
---


After having used this for a ~week now I kind of do too.  I can work on
a patch that switches that log component back unless someone else (who
wants it more) beats me to it.

"hits".

Agreed, I'd love to have query pipelines and indexing pipelines for
processing logic but that's a much bigger effort.  At the moment it's
only 1 line extra in each of the 'real' query handlers which doesn't
seem too bad.


Ian Holsman commented on SOLR-267:
--

>won't the line get very long?  >you might need/want to put in some quotes around the query.

It will look very long :)  As long as there are no spaces which the url
encoding should handle I think things are ok (this assumes we're going
to switch back to cgi params)

it >in)

Not that I know how to do.  Since the dispatch filter is a filter not a
servlet it doesn't have access to an HttpServletResponse, only a
ServletResponse which means it can't set HttpHeaders.  This was my
original idea for how to solve this problem and seems a bit more
'standard' anyways but I hit a dead end without getting more hackish
than usual.

- will

 


> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-267) log handler + query + hits

2007-06-22 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

new patch to promote responseHeader from a de facto standard to an api standard 
in SolrQueryResponse.  this enables the SolrCore.execute() method to simply 
print out its contents containing any info people want logged.  the format now 
is:

 Jun 22, 2007 10:37:25 AM org.apache.solr.core.SolrCore execute
INFO: webapp=/solr path=/select/ hits=0 status=0 QTime=0 
params={indent=on,start=0,q=solr,version=2.2,rows=10}

which is fully labeled but i think makes things much easier to read/parse as 
you can look for labels instead of positions which may or may not change.

> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-267) log handler + query + hits

2007-06-22 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507312
 ] 

Will Johnson commented on SOLR-267:
---

One thing that comes to mind is making the response header a standard
part of the SolrQueryResponse object with get/set/add methods; then the
log message can just print out whatever is in the response header. With
trunk, it already contains much of the same data (status, qtime,
params).  The only issue is that in order to keep things 'clean' the
output would change to being fully labeled:

webapp=/solr path=/select/ status=0 QTime=63
params={indent=on,start=0,q=*:*,version=2.2,rows=10} myotherheader=foo

In general I think this makes things much cleaner and easier to read but
it does break backwards compatibility for log parsing purposes.  Any
other ideas?

- will







> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-06-21 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506983
 ] 

Will Johnson commented on SOLR-240:
---

>so lockups still happened with the patch, they just took longer
>>than without?


No, after I applied the patch I have never seen a lockup. 

>oldest Solr collections have been running in CNET for 2 years now, and
>I've never seen this happen).   What I *have* seen is that exact
>exception when the server died, restarted, and then couldn't grab the
>write lock normally due to not a big enough heap causing excessive
>GC and leading resin's wrapper to restart the container.

Another reason to use native locking.  From the lucene native fs lock
javadocs:  "Furthermore, if the JVM crashes, the OS will free any held
locks, whereas SimpleFSLockFactory will keep the locks held, requiring
manual removal before re-running Lucene."  

My hunch (and that's all it is) is that people seeing/not seeing the
issue may come down to usage patterns.  My project is heavily focused on
low indexing latency so we're doing huge numbers of
add/deletes/commits/searches in very fast succession and from multiple
clients.  A more batch oriented update usage pattern may not see the
issue.

The patch seems safe because, as is, it doesn't change any api or cause
any change to existing functionality whatsoever unless you use the new
option in solrconfig.  I would argue that using native locking should be
the default though.

- will
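
(for reference, a rough sketch of what native locking looks like at the
Lucene level -- Lucene 2.1+ API; the solrconfig hook added by the patch may
be named differently:)

    import java.io.IOException;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NativeFSLockFactory;

    public class NativeLockSketch {
      // open an index directory with OS-level (java.nio) locking so a
      // crashed JVM can't leave a stale write.lock behind
      public static FSDirectory open(String indexPath) throws IOException {
        FSDirectory dir = FSDirectory.getDirectory(indexPath);
        dir.setLockFactory(new NativeFSLockFactory(indexPath));
        return dir;
      }
    }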


> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.2
> Environment: windows xp
>Reporter: Will Johnson
> Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, 
> ThrashIndex.java
>
>
> when running the soon to be attached sample application against solr it will 
> eventually die.  this same error has happened on both windows and rh4 linux.  
> the app is just submitting docs with an id in batches of 10, performing a 
> commit then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-267) log handler + query + hits

2007-06-21 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

slight update to log the webapp name which is set in the SolrDispatchFilter.  
this lets you distinguish between multiple solr instances for tracking 
purposes. 

log output now looks like:

Jun 21, 2007 10:31:05 AM org.apache.solr.core.SolrCore execute
INFO: /solr /select/ indent=on&start=0&q=*:*&version=2.2&rows=10 hits=5 0 62

> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, 
> LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Updated: (SOLR-267) log handler + query + hits

2007-06-21 Thread Will Johnson
>This produces log messages that look like this:
>INFO: /select q=solr&wt=python&indent=on hits=1 0 94
>
>If there was no DocSet, it would look like this:
>INFO: /select q=solr&wt=python&indent=on - 0 94

I would think that tacking the new stats onto the end of the line would
be better than in the middle.  Usually when I parse log files it
involves something like:

String[] arr = line.split(" ");
String code = arr[3];
String time = arr[4];

instead of the following, which is what it seems you're implying that
people are doing:

String[] arr = line.split(" ");
String code = arr[arr.length - 2];
String time = arr[arr.length - 1];

but then again, I don't have any code written to parse things yet so
backwards compatibility isn't an issue for me and either format is fine.
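
(if the fully-labeled format wins out, parsing by key instead of position
sidesteps the ordering question entirely; a rough sketch:)

    import java.util.HashMap;
    import java.util.Map;

    public class QueryLogLine {
      // turns "webapp=/solr path=/select/ hits=0 status=0 QTime=79" into a
      // map so it no longer matters where a stat sits on the line
      public static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<String, String>();
        for (String token : line.split(" ")) {
          int eq = token.indexOf('=');
          if (eq > 0) {
            fields.put(token.substring(0, eq), token.substring(eq + 1));
          }
        }
        return fields;
      }
    }

e.g. parse(line).get("QTime") instead of arr[4].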


[jira] Updated: (SOLR-267) log handler + query + hits

2007-06-20 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

updated patch to work in SolrCore.execute() instead.  i annotated the msg with 
hits=## as requested however i left time unlabeled for backwards compatibility 
and i had no idea what the static '0' was but i left it there just to be safe 
as well.   i think it might be good to clean that up and i'm happy to but i 
don't know by whom or how those numbers are being used today.

> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>    Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-267) log handler + query + hits

2007-06-20 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Description: 
adds a logger to log handler, query string and hit counts for each query



  was:
adds a logger



Summary: log handler + query + hits  (was: log handler + query + )

> log handler + query + hits
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch
>
>
> adds a logger to log handler, query string and hit counts for each query

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-267) log handler + query +

2007-06-20 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-267:
--

Attachment: LogQueryHitCounts.patch

hit a random key a little fast on the last post.  the attached patch adds a 
logger to the Standard and DisMax request handlers to log the handler name, 
query string and hit count for each query.  

> log handler + query + 
> --
>
> Key: SOLR-267
> URL: https://issues.apache.org/jira/browse/SOLR-267
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: LogQueryHitCounts.patch
>
>
> adds a logger

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-267) log handler + query +

2007-06-20 Thread Will Johnson (JIRA)
log handler + query + 
--

 Key: SOLR-267
 URL: https://issues.apache.org/jira/browse/SOLR-267
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3


adds a logger



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: requestsPerSecond, averageResponseTime

2007-06-20 Thread Will Johnson
The one thing most people (ie product managers) want to see is the
number of times that users get 0 hits for a query but that doesn't seem
to be logged anywhere in solr that's easily accessible in log files.  Am
I missing something very obvious or should we try and fix this somehow?
I know some other engines will log the number of hits in with the query
log which seems like a nice way of doing things.  

Any ideas or pointers?

- will

 

-Original Message-
From: Clay Webster [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 20, 2007 10:33 AM
To: solr-dev@lucene.apache.org
Subject: Re: requestsPerSecond, averageResponseTime

Hey Ian, these version with all the parameter options only shows the
table headers.. no data.  (No requests?)

PS: I think there's interest. ;-)

--cw


On 6/19/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
>
> I've been working on a tool to parse log files to get some of this
kind
> of information as well
>
> it's really alpha, but if your curious the dummy system is here:
>
> http://pyro.holsman.net:9081/top/ -- slightly obfuscated queries (to
> roll them up)
> http://pyro.holsman.net:9081/overall/?period=5m&hours=12 -- # of
> requests, response time, and deviation in that
>
>
http://pyro.holsman.net:9081/overall/?period=5m&hours=12&format=csv&cols
=1,2,5,6,7,8
> - same thing as a CSV file and showing selected columns
>
>
> The aim is to use this as a data source  for something like cacti and
> sticking a flash graph on top of it.
>
> If there is enough interest I can contribute this to solr
>
> Yonik Seeley wrote:
> > requestsPerSecond and averageResponseTime were added to statistics
for
> > each response handler.  Are these statistics really useful enough to
> > keep as-is?
> >
> > averageResponseTime is cumulative since the server started, so it's
> > not useful for monitoring purposes, but only benchmarking purposes
(it
> > won't tell you if your queries are getting slower all of a sudden).
> > (it will also count slower warming queries, not just live queries).
> >
> > requestsPerSecond is likewise flawed... it won't let you detect a
> > flood of traffic or a dropoff.  Also, if you turned off traffic to
the
> > server yesterday, that will continue to be reflected in the
> > requestsPerSecond today.
> >
> > Since it seems like these parameters are only useful for
benchmarking
> > (which can easily be done from log files), perhaps we should defer
> > adding them until we can come up with versions that are useful for
> > monitoring?
> >
> > -Yonik
> >
>
>


[jira] Updated: (SOLR-176) Add detailed timing data to query response output

2007-06-20 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-176:
--

Attachment: RequesthandlerBase.patch

a slightly more ambitious patch that tracks: 

* total number of requests/errors
* requests/errors in the current interval (interval defined in solrconfig)
* requests/errors as of the start of the last interval
* avg request times for total / current interval (a sketch of the bookkeeping follows)
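
(minimal sketch of that interval bookkeeping; names are illustrative, not the
patch's:)

    // rolls the counter window when the configured interval elapses, so a
    // monitor can read "requests in the last interval" instead of lifetime totals
    public class IntervalStats {
      private final long intervalMillis;
      private long windowStart = System.currentTimeMillis();
      private long total, current, lastInterval;

      public IntervalStats(long intervalMillis) {
        this.intervalMillis = intervalMillis;
      }

      public synchronized void increment() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= intervalMillis) {
          lastInterval = current;   // freeze the window that just ended
          current = 0;
          windowStart = now;
        }
        total++;
        current++;
      }

      public synchronized long getTotal() { return total; }
      public synchronized long getLastInterval() { return lastInterval; }
    }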



> Add detailed timing data to query response output
> -
>
> Key: SOLR-176
> URL: https://issues.apache.org/jira/browse/SOLR-176
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Mike Klaas
>Assignee: Mike Klaas
>Priority: Minor
> Fix For: 1.3
>
> Attachments: dtiming.patch, dtiming.patch, dtiming.patch, 
> dtiming.patch, RequesthandlerBase.patch, RequesthandlerBase.patch
>
>
> see 
> http://www.nabble.com/%27accumulate%27-copyField-for-faceting-tf3329986.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: requestsPerSecond, averageResponseTime

2007-06-19 Thread Will Johnson
>Has anyone tried to get solr statistics with cacti/nagios?  If it isn't

>too difficult, I would like to set this up.

>Can cacti read & parse a file?

Generally speaking nagios/cacti are as powerful as you are with bash.
We haven't done it yet but it's a requirement at my company to integrate
the three in the next couple months.  Our current plan is to get as many
stats into the statistics page and then have a shell script grab the xml
(possibly with an xsl) and then feed that into the monitoring apps.
From there it's ringing pagers and usage graphs galore.

When we get something working I'll make sure to post a write up on the
list unless someone else beats me to it.

- will


RE: requestsPerSecond, averageResponseTime

2007-06-19 Thread Will Johnson
Would it be better to have an option to record traffic for the last 'x
minutes/seconds/hours' configurable on a per handler basis?  The goal is
to have hooks for nagios/cacti/etc to be able to pull live status info
for monitoring purposes.  If you want fine grained performance history
then log files are the best approach, I just think a way to have beepers
go off if a server starts getting huge amounts of traffic is a good
thing.  

For the record nagios and/or cacti could both keep track of 'in the last
x' type of statistics based on totals but having solr compute that
automatically would be nice.  

- will


 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, June 19, 2007 10:27 AM
To: solr-dev@lucene.apache.org
Subject: requestsPerSecond, averageResponseTime

requestsPerSecond and averageResponseTime were added to statistics for
each response handler.  Are these statistics really useful enough to
keep as-is?

averageResponseTime is cumulative since the server started, so it's
not useful for monitoring purposes, but only benchmarking purposes (it
won't tell you if your queries are getting slower all of a sudden).
(it will also count slower warming queries, not just live queries).

requestsPerSecond is likewise flawed... it won't let you detect a
flood of traffic or a dropoff.  Also, if you turned off traffic to the
server yesterday, that will continue to be reflected in the
requestsPerSecond today.

Since it seems like these parameters are only useful for benchmarking
(which can easily be done from log files), perhaps we should defer
adding them until we can come up with versions that are useful for
monitoring?

-Yonik


RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

2007-06-19 Thread Will Johnson
>i haven't read anything in the jira issue this references, but in instances
>where reliability and uptime are of high concern, you'll typically have a
>master/multi-slave setup with the slaves sitting behind a load balancer
>-- in that configuration, you can deploy any change to your schema by:

>this process results in 0 downtime for any schema.xml change, regardless
>whether the changes require rebuilding your index.

True, but that implies indexing downtime which is also bad.  Also, the
master/slave setups kill indexing latency which is my primary concern
and the reason I went with solr to begin with.  Also, while your
suggested steps work they're a bit heavy on the operations side compared
to a client's ability to add a field by hitting a url.

>if you change/add a copyField declaration, you'll need to reindex ...
>copyField is evaluated when a document is being indexed.

True, but not if you haven't fed any data into that copy field yet.  Ie
'from now on' I want all data from field x copied into field y.  

- will



RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

2007-06-18 Thread Will Johnson
>I haven't looked at the patch, but have a couple questions:
>* What is the motivation/use case for editing the schema at runtime?  (I'm not 
>suggesting there aren't good ones, just curious)

to add new fields on the fly without having any search downtime

>* Would changes be saved?

the patch as is just re-reads the schema from the location it was originally set 
from.  all changes are 'saved'

> * Why not dynamic fields?

because the field names start to get too complex.  for example you could model 
the id field in the default schema as a dynamic field:

 <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

becomes:

*_str_it_st_rt_mvf

working that out for all possible combinations seems a bit onerous.  the 
default dynamic fields cover most cases but i'm sure my product managers will 
want one that i don't have the day after we go live.  also, if i have extra 
info about a field, like the fact that i don't want it stored, i should be able 
to take advantage of that without having to bounce anything.


> it seems to me that restarting a webapp and suffering
> downtime is a heavy price to pay just to add a new field or even to just
> change an existing field property.

>*adding* fields should be relatively straightforward -- the more I learn about 
>lucene indexing (indexes), it seems like most schema *changes* require the 
>index to be rebuilt anyway.

correct, i'm fine if we want to restrict the schema 'changes' to only allow the 
addition of new fields, but the index schema also reflects things like default 
query parsing options and copy fields which shouldn't require any index changes 
at all, which is why i went for a looser approach to start.

- will

 



[jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

2007-06-18 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505938
 ] 

Will Johnson commented on SOLR-265:
---

After doing some more thinking about the issue after I submitted the
patch I agree that there probably are some threading issues to work out.
I was working on another approach that was much larger (only keep 1 copy
in SolrCore accessible by getSchema() and add protection there) but that
required a much larger code change than the one posted so I went with
the shorter to at least promote discussion.  If the single schema
getter() makes sense, I'll be happy to provide such a patch. 

There do seem to be other alternatives though:

First is a ModifySchema handler that could support adding fields etc
which should be easier to defend against from a synchronization
standpoint. At least there are fewer times when fields.clear() has been
called but new values have not been added back.  As this is all I care
about at the moment I'd be happy, but I assume someone might want to do
something else more complex.

The second is to wrap up the clear/repopulate methods with some basic
protection but actually allow different schemas inside a single request.
This could be done by requiring all new schemas to be 'compatible' in
some defined way.  Since there doesn't seem to be any validation that
goes on if I stop the app, change the schema and then restart it,
compatible might just mean valid xml.  If field 'new_x' suddenly appears
during the middle of my post it shouldn't have any effect as my posted
data won't contain 'new_x.'  from a client's contractual perspective, if
you want new fields processed correctly you have to wait for
updateSchema to finish.

In any case, it seems to me that restarting a webapp and suffering
downtime is a heavy price to pay just to add a new field or even to just
change an existing field property.

- will






> Make IndexSchema updateable in live system
> --
>
> Key: SOLR-265
> URL: https://issues.apache.org/jira/browse/SOLR-265
> Project: Solr
>  Issue Type: Improvement
>      Components: update
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: IndexSchemaReload.patch
>
>
> I've seen a few items on the mailing lists recently surrounding updating a 
> schema on the file system and not seeing the changes get propagated.  while I 
> think that automatically detecting schema changes on disk may be unrealistic, 
> I do think it would be useful to be able to update the schema without having 
> to bounce the webapp.  the forthcoming patch adds a method to SolrCore to do 
> just that as well as a request handler to be able to call said method.  
> The patch as it exists is a straw man for discussion.  The one thing that 
> concerned me was making IndexSchema schema non-final in SolrCore.  I'm not 
> quite sure why it needs to be final to begin with so perhaps someone can shed 
> some light on the situation.  Also, I think it would be useful to be able to 
> upload a schema through the admin GUI, have it persisted to disk and then 
> call reloadSchema() but that seemed like a good bit of effort for a patch that 
> might not even be a good idea.
> I'd also point out that this specific problem is one I've been trying to 
> address recently and while I think it could be solved with various dynamic 
> fields the combination of all the options for fields seemed to create too 
> many variables to make useful dynamic name patterns.
> Thoughts?
> - will  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-265) Make IndexSchema updateable in live system

2007-06-18 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-265:
--

Attachment: IndexSchemaReload.patch

updates to:

* solrconfig.xml to register the handler
* IndexSchema to add a reload() method that clears all internal data structures 
and calls readConfig()
* a new o.a.s.handler.admin.IndexSchemaRequestHandler to trigger the updating 
(rough sketch below)
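
(rough sketch of the shape of the handler -- names per the list above, the
rest illustrative; the threading questions from the comment above still apply:)

    public class IndexSchemaRequestHandler extends RequestHandlerBase {
      public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
          throws Exception {
        IndexSchema schema = req.getSchema();
        // naive guard only -- readers already holding the schema can still
        // observe the window where reload() has cleared but not repopulated it
        synchronized (schema) {
          schema.reload();   // clears internal maps, re-runs readConfig()
        }
        rsp.add("status", "schema reloaded");
      }
      public String getDescription() { return "reloads the IndexSchema"; }
      public String getSource()      { return "sketch"; }
      public String getSourceId()    { return "sketch"; }
      public String getVersion()     { return "0"; }
    }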



> Make IndexSchema updateable in live system
> --
>
> Key: SOLR-265
> URL: https://issues.apache.org/jira/browse/SOLR-265
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: IndexSchemaReload.patch
>
>
> I've seen a few items on the mailing lists recently surrounding updating a 
> schema on the file system and not seeing the changes get propagated.  while I 
> think that automatically detecting schema changes on disk may be unrealistic, 
> I do think it would be useful to be able to update the schema without having 
> to bounce the webapp.  the forthcoming patch adds a method to SolrCore to do 
> just that as well as a request handler to be able to call said method.  
> The patch as it exists is a straw man for discussion.  The one thing that 
> concerned me was making IndexSchema schema non-final in SolrCore.  I'm not 
> quite sure why it needs to be final to begin with so perhaps someone can shed 
> some light on the situation.  Also, I think it would be useful to be able to 
> upload a schema through the admin GUI, have it persisted to disk and then 
> call reloadSchema() but that seemed like a good bit of effort for a patch that 
> might not even be a good idea.
> I'd also point out that this specific problem is one I've been trying to 
> address recently and while I think it could be solved with various dynamic 
> fields the combination of all the options for fields seemed to create too 
> many variables to make useful dynamic name patterns.
> Thoughts?
> - will  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-265) Make IndexSchema updateable in live system

2007-06-18 Thread Will Johnson (JIRA)
Make IndexSchema updateable in live system
--

 Key: SOLR-265
 URL: https://issues.apache.org/jira/browse/SOLR-265
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3


I've seen a few items on the mailing lists recently surrounding updating a 
schema on the file system and not seeing the changes get propagated.  while I 
think that automatically detecting schema changes on disk may be unrealistic, I 
do think it would be useful to be able to update the schema without having to 
bounce the webapp.  the forthcoming patch adds a method to SolrCore to do just 
that as well as a request handler to be able to call said method.  

The patch as it exists is a straw man for discussion.  The one thing that 
concerned me was making IndexSchema schema non-final in SolrCore.  I'm not 
quite sure why it needs to be final to begin with so perhaps someone can shed 
some light on the situation.  Also, I think it would be useful to be able to 
upload a schema through the admin GUI, have it persisted to disk and then call 
reloadSchema() but that seemed like a good bit of effort for a patch that might 
not even be a good idea.

I'd also point out that this specific problem is one I've been trying to 
address recently and while I think it could be solved with various dynamic 
fields the combination of all the options for fields seemed to create too many 
variables to make useful dynamic name patterns.

Thoughts?

- will  


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-06-18 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505846
 ] 

Will Johnson commented on SOLR-239:
---

after looking at all the dependencies for IndexSchema and with the addition of 
the new solrj stuff in trunk i no longer think this approach is the correct way 
to go about things.  the LukeRequest/LukeResponse seems to give most of the 
same info with ~0 overhead and it's already checked in.  

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch
>
>
> Soon to follow patch adds a constructor to IndexSchema to allow them to be 
> created directly from InputStreams.  The overall logic for the Core's use of 
> the IndexSchema creation/use does not change however this allows java clients 
> like those in SOLR-20 to be able to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities which is useful for 
> building generic search UI's.  ie provide a drop down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: search components (plugins)

2007-06-11 Thread Will Johnson
Sorry, I forgot to turn on my _wild_ideas_ flag before that last post.

That being said, you could build the notion of dependencies into each
stage and have the search logic computed based on those dependencies;
alternatively you could do pre/post methods for each processing stage
that allow each stage hands-on access to the searcher...  crap, looks
like ryan beat me by 3 minutes.  oh well, what he said.

- will

 

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 11, 2007 2:27 PM
To: solr-dev@lucene.apache.org
Subject: Re: search components (plugins)


:   // choose one query method
:   docs = Query( req, debug )
:- standard
:- dismax
:- mlt (as input)

there are two small hitches to an approach like this.  the first is that
you'd like to reuse more of the query processing than to just say "go
pick the list of docs based on the request" ...ideally we'd want things like
"fq" parsing/processing to be refactored so it can be reused by both
standard and dismax and mlt, but that requires changing the API to be
something like...
   Query, Filter = MakeQuery(req, debug)
..and you delegate to the outermost "controller" to deal with the actual
conversion to generate the "docs" ... except you also have to worry about
start, rows, sort etc at that level, which makes it a lot less clean.

the second hitch is that "docs" only makes sense in pseudo code ... in
reality there are DocSets and DocLists, and the efficiencies of getting
only one instead of both can be significant, but if the first phase of
processing doesn't know what expectations the later phases have (facet
or not?  returns a DocList in the response or not?) it may have to assume
you need both.

:   // zero or more...
:   info[] = Info( req, docs, debug )
:+ facet
:+ mlt (on each result)

this for the record is what i was kind of aiming for back when i made
the SimpleFacets class ... give it the docset and some Solr Params, and then
ask it for what you want (either some specific piece of functionality
like getFacetFieldCounts, or all possible types of facets even if you
don't know what they are with getFacetCounts()).

-Hoss



[jira] Updated: (SOLR-259) More descriptive text on improperly set solr/home

2007-06-11 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-259:
--

Attachment: betterSolrHomeError.patch

+import java.util.logging.Level;

and a simple

log.log(Level.SEVERE, "Could not start SOLR. Check solr/home property", t);
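
(i.e., presumably wrapped around the filter's init work -- a sketch, not the
actual patch hunks:)

    import java.util.logging.Level;
    import java.util.logging.Logger;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;

    public class SolrDispatchFilterSketch {
      static final Logger log =
          Logger.getLogger(SolrDispatchFilterSketch.class.getName());

      public void init(FilterConfig config) throws ServletException {
        try {
          // ... existing init: resolve solr/home, load solrconfig.xml ...
        } catch (Throwable t) {
          // the patch: put a hint in the container's own log before failing
          log.log(Level.SEVERE, "Could not start SOLR. Check solr/home property", t);
          throw new ServletException(t);
        }
      }
    }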


> More descriptive text on improperly set solr/home
> -
>
> Key: SOLR-259
> URL: https://issues.apache.org/jira/browse/SOLR-259
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: betterSolrHomeError.patch
>
>
> when solr/home is set improperly, tomcat (and other containers) fail to log 
> any useful error messages because everything goes to SolrConfig.severeErrors 
> instead of some basic container level logs.  the soon to be attached 1.5 line 
> patch adds a simple log message to the standard container logs to tell you to 
> check your settings and tell you what solr/home is currently set to.  
> Before the patch if solr/home is improperly set you get:
> Jun 11, 2007 2:21:13 PM org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init()
> Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config getInstanceDir
> INFO: Using JNDI solr.home: 
> C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
> Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config setInstanceDir
> INFO: Solr home set to 
> 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
> Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
> SEVERE: Error filterStart
> Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
> SEVERE: Context [/solr] startup failed due to previous errors
> After the patch you get:
> un 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init()
> Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config getInstanceDir
> INFO: Using JNDI solr.home: 
> C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
> Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config setInstanceDir
> INFO: Solr home set to 
> 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
> Jun 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:66)
>   at 
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
>   at 
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>   at 
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>   at 
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3693)
>   at 
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4340)
>   at 
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
>   at 
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
>   at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
>   at 
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
>   at 
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
>   at 
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
>   at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
>   at 
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
>   at 
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
>   at 
> org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
>   at 
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
>   at 
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
>   at 
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
>   at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Error in solrconfig.xml
>   at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:90)
>   ... 20 more
> Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' 
> in cl

[jira] Created: (SOLR-259) More descriptive text on improperly set solr/home

2007-06-11 Thread Will Johnson (JIRA)
More descriptive text on improperly set solr/home
-

 Key: SOLR-259
 URL: https://issues.apache.org/jira/browse/SOLR-259
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.2
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.2


when solr/home is set improperly, tomcat (and other containers) fail to log any 
useful error messages because everything goes to SolrConfig.severeErrors 
instead of some basic container level logs.  the soon to be attached 1.5 line 
patch adds a simple log message to the standard container logs to tell you to 
check your settings and tell you what solr/home is currently set to.  

Before the patch if solr/home is improperly set you get:

Jun 11, 2007 2:21:13 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: 
C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 
'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors

After the patch you get:

un 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: 
C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 
'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
Jun 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.ExceptionInInitializerError
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:66)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3693)
at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4340)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at 
org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
at 
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
at 
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
at 
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Error in solrconfig.xml
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:90)
... 20 more
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
classpath or 
'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/conf/', 
cwd=C:\data\apps\tomcat6.0.13\bin
at org.apache.solr.core.Config.openResource(Config.java:357)
at org.apache.solr.core.SolrConfig.initConfig(SolrConfig.java:79)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:87)
... 20 more
Jun 11, 2007 2:30:37 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
Jun 11, 2007 2:30:37 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-236) Field collapsing

2007-06-11 Thread Will Johnson
And one other point, one of the reasons why it's hard to find an example
of post-faceting is that many of the major engines can't do it. 

- will

-Original Message-----
From: Will Johnson [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 11, 2007 11:05 AM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-236) Field collapsing

>I assumed they would... I think our signals might be crossed w.r.t.
>the meaning of pre or post collapsing.  Faceting "post collapsing" I
>took to mean that the base docset would be restricted to the top "n"
>of each category.

In my view, faceting should occur on the full collapsed result set.  Ie
break down 100 hits to 50 unique ones, then compute facets on those 50
even though you may only return 10 to the user.

>circuitcity does it how I would expect... field collapsing does not
>affect the facets on the left.
>For example, if I search for memory, a facet tells me that there are
>70 under "Digital Cameras".  If I look down the collapsed results,
>"Digital Cameras" only shows the top match, but has a link to "View
>all 70 matches".

I agree, circuit city is a use case where you want pre-faceting.  If you
think about site collapsing though I may see that there are 57 documents
in my result set of type x, then clicking on type x should show me 57
docs.

>15 documents displayed to the user, or 15 total documents that matched
>the query?
>If the latter, I don't see how you could get greater than 15 for any
>facet count.

If I see that there are 15 of type x and click on it then 'total result
found' on the next page should say 15, not any higher.


-Yonik


RE: [jira] Commented: (SOLR-236) Field collapsing

2007-06-11 Thread Will Johnson
>I assumed they would... I think our signals might be crossed w.r.t.
>the meaning of pre or post collapsing.  Faceting "post collapsing" I
>took to mean that the base docset would be restricted to the top "n"
>of each category.

In my view, faceting should occur on the full collapsed result set.  Ie
break down 100 hits to 50 unique ones, then compute facets on those 50
even though you may only return 10 to the user.

>circuitcity does it how I would expect... field collapsing does not
>affect the facets on the left.
>For example, if I search for memory, a facet tells me that there are
>70 under "Digital Cameras".  If I look down the collapsed results,
>"Digital Cameras" only shows the top match, but has a link to "View
>all 70 matches".

I agree, circuit city is a use case where you want pre-faceting.  If you
think about site collapsing though I may see that there are 57 documents
in my result set of type x, then clicking on type x should show me 57
docs.

>15 documents displayed to the user, or 15 total documents that matched
>the query?
>If the latter, I don't see how you could get greater than 15 for any
>facet count.

If I see that there are 15 of type x and click on it then 'total result
found' on the next page should say 15, not any higher.


-Yonik


RE: [jira] Commented: (SOLR-236) Field collapsing

2007-06-11 Thread Will Johnson
Having worked on a number of customer implementations regarding this
feature I can say that the number one requirement is for the facet
counts to be accurate post collapsing.  It all comes down to the user
experience.  For example, if I run a query that get collapsed and has a
facet count for the non-collapsed value then when I click on that facet
for refinement the number of hits in my subsequent query will not match
the number of hits displayed by that facet count.  Ie if it says there
are 10 docs in my result set of type x then when I click on type x I
expect to get back 10 hits.  Further, I could easily end up with a
result set with 15 total hits but a facet count that says there are 200
results of type x which is very disconcerting from a user perspective. 

I agree that there are times when pre-faceting is also good, but
post-faceting has always been a rather hard requirement for most
ecommerce/data discovery sites.  (a toy sketch of the collapse-then-facet
ordering follows below)

- will
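
(the toy sketch -- plain in-memory maps, nothing Solr-specific: collapse
first, keep one doc per key, then facet over the survivors:)

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class CollapseThenFacet {
      // keep the first doc seen per collapse key, then count facet values over
      // the collapsed set only, so facet counts match what a facet click returns
      public static Map<String, Integer> facetAfterCollapse(
          List<Map<String, String>> hits, String collapseField, String facetField) {
        Map<String, Map<String, String>> collapsed =
            new LinkedHashMap<String, Map<String, String>>();
        for (Map<String, String> doc : hits) {
          String key = doc.get(collapseField);
          if (!collapsed.containsKey(key)) collapsed.put(key, doc);
        }
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Map<String, String> doc : collapsed.values()) {
          String v = doc.get(facetField);
          Integer c = counts.get(v);
          counts.put(v, c == null ? 1 : c + 1);
        }
        return counts;
      }
    }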

-Original Message-
From: Emmanuel Keller (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Sunday, June 10, 2007 7:33 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing


[
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12503162 ] 

Emmanuel Keller commented on SOLR-236:
--

Do we have to make a choice ? Both behaviors are interesting. 
What about a new parameter like collapse.facet=[pre|post] ?



> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Emmanuel Keller
> Attachments: field_collapsing_1.1.0.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a
given field to a single entry in the result set. Site collapsing is a
special case of this, where all results for a given web site is
collapsed into one or two entries in the result set, typically with an
associated "more documents from this site" link. See also Duplicate
detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed
before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: search components (plugins)

2007-06-11 Thread Will Johnson
Some thoughts:

One of the most powerful and useful concepts that many of the other
engines (well the good ones) use is the notion of processing pipelines.

For queries this means a series of stages that do things such as:

* faceting
* collapsing
* applying default values
* spell checking
* adding in promotions/boosted content
* applying relevancy logic
* more like this

But it is also heavily used at indexing time.  The more complex engines
use these pipelines for all kinds of crazy stuff like converting
msoffice docs, ocr, speech to text, etc which I think is what nutch does
to some extent.  However solr could still use the same notion to do more
lower level operations like:

* applying synonyms
* removing/renaming fields
* translating xml formats (it would be nice to have any update handler
be able to apply an xslt on incoming data)
* validate incoming data against some business logic

I think much of this is wrapped up in the field definitions at the
moment but it could be extended to be more document aware.

Anything that makes chaining of pre-built processing easier would be
nice.  In addition, if these stages are specified in solrconfig then
decisions like 'do I want faceting before or after collapsing' become
simple cut/paste choices, not code changes.
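
A minimal sketch of what such a configurable chain of stages could look
like (QueryStage, QueryContext, and QueryPipeline are hypothetical names
for illustration, not existing Solr classes):

import java.util.List;

interface QueryStage {
  // each stage reads and augments a shared per-request context
  void process(QueryContext ctx);
}

class QueryContext {
  // would hold the parsed query, intermediate hits, facet counts, etc.
}

class QueryPipeline {
  private final List<QueryStage> stages;

  QueryPipeline(List<QueryStage> stages) {
    this.stages = stages;
  }

  void run(QueryContext ctx) {
    // reordering faceting vs. collapsing is just reordering this list,
    // i.e. the cut/paste-in-solrconfig idea described above
    for (QueryStage stage : stages) {
      stage.process(ctx);
    }
  }
}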

Further, if the last processing step is 'index this doc' or 'search the
index' those should be easy to replace with 'send this doc to segment x'
or 'search all the sub indexes' with simple xml config file changes
assuming those stages exist. (which again is how many of the other
engines do things)  

- will



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Sunday, June 10, 2007 12:51 PM
To: solr-dev@lucene.apache.org
Subject: search components (plugins)

Some people have needed some custom query logic, and they had to
implement their own request handlers.  They still wanted all of the
other functionality (or almost all of it), so they were forced to copy the
standard request handler or dismax, or both.  That's not the easiest thing
to maintain, and it could be more elegant.

Another layer of plugins sounded like overkill at first, but I'm
starting to rethink it, esp in the face of the expanding number of
different variations:
  - standard
  - dismax
  - more-like-this
  - field collapsing

Seems like we should be able to more easily mix and match, or add new
pieces, w/o having whole new request handlers.
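
For illustration only, the kind of plugin layer this implies might look
roughly like the following -- the SearchComponent interface and the
component names here are a hypothetical sketch, not an actual Solr API:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface SearchComponent {
  // each component contributes one piece of work to the request
  void process(Map<String, String[]> params, Map<String, Object> response);
}

class ComponentRegistry {
  private final Map<String, SearchComponent> byName =
      new LinkedHashMap<String, SearchComponent>();

  void register(String name, SearchComponent component) {
    byName.put(name, component);
  }

  // a handler would resolve its chain from config, e.g.
  // "query", "facet", "mlt", "collapse"
  List<SearchComponent> chain(String... names) {
    List<SearchComponent> chain = new ArrayList<SearchComponent>();
    for (String name : names) {
      chain.add(byName.get(name));
    }
    return chain;
  }
}

A request handler would then be a thin shell that just runs its
configured chain in order.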

Looking toward the future, and distributed search, this might be a
natural place to add hooks to implement that distributed logic.  This
would allow other people to efficiently support their custom
functionality in a distributed environment.

Thoughts?

-Yonik


RE: [jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-06-08 Thread Will Johnson
Has anyone thought of adding the docsum time to the qtime, or possibly
adding separate timing information for the real 'solr query time'?
While my bosses are very pleased that most searches seem to take ~5ms, it
does seem a bit misleading.

I'll take a crack at a patch unless there is a reason not to.

- will

-Original Message-
From: Ryan McKinley (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 08, 2007 1:09 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-20) A simple Java client for updating
and searching


[
https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.pl
ugin.system.issuetabpanels:comment-tabpanel#action_12502885 ] 

Ryan McKinley commented on SOLR-20:
---

I don't know if you are on solr-dev; Yonik noted that the QTime does not
include the time to write the response, only the query time.  To get an
accurate number for how long the whole query takes, check your app
server logs:
http://www.nabble.com/Re%3A-A-simple-Java-client-for-updating-and-searching-tf3890950.html

To get a quick response from solr, try rows=0 or a 404 path.  (Of
course, the speed will depend on your network connection speed between
client and server.)
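
To see that gap in practice, one can time the full round trip on the
client and compare it against the QTime in the response.  A rough,
self-contained sketch (the URL and query here are just examples):

import java.io.InputStream;
import java.net.URL;

public class TimeQuery {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/select?q=solr&rows=100");
    long start = System.currentTimeMillis();
    InputStream in = url.openStream();
    byte[] buf = new byte[8192];
    // drain the whole response so writing/transfer time is included
    while (in.read(buf) != -1) {
    }
    in.close();
    long total = System.currentTimeMillis() - start;
    // compare this wall-clock number to the QTime reported in the XML
    System.out.println("full round trip: " + total + " ms");
  }
}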

> A simple Java client for updating and searching
> ---
>
> Key: SOLR-20
> URL: https://issues.apache.org/jira/browse/SOLR-20
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
> Environment: all
>Reporter: Darren Erik Vengroff
>Priority: Minor
> Attachments: DocumentManagerClient.java,
DocumentManagerClient.java, solr-client-java-2.zip.zip,
solr-client-java.zip, solr-client-sources.jar, solr-client.zip,
solr-client.zip, solr-client.zip, solrclient_addqueryfacet.zip,
SolrClientException.java, SolrServerException.java
>
>
> I wrote a simple little client class that can connect to a Solr server
and issue add, delete, commit and optimize commands using Java methods.
I'm posting here for review and comments as suggested by Yonik.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-06-08 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

new patch that includes a GetFile servlet to possibly replace get-file.jsp, 
because get-file.jsp writes out invalid xml.  

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-236) Field collapsing

2007-06-05 Thread Will Johnson
I haven't looked at any of the patches, but I can comment on some other uses
for the feature that are in production today with major vendors.  While
it's used for site collapsing ala google it's also heavily used in
ecommerce settings.  Check out BestBuy.com/circuitcity/etc and do a
search for some really generic word like 'cable' and notice all the
groups of items; BB shows 3 per group, CC shows 1 per group.  In each
case it's not clear that the number of docs is really limited at all, ie
it's more important to get back all the categories with n docs per
category and the counts per category than it is to get back a fixed
number of results or even categories for that matter.  Also notice that
neither of these sites allow you to page through the categorized
results.

I'd also point out that many vendors require the collapsing field to be
an int instead of a string and then force the front end to do the
mapping.  just one more thing to consider

- will

 

-Original Message-
From: Yonik Seeley (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 05, 2007 9:01 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing


[
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12501550 ] 

Yonik Seeley commented on SOLR-236:
---

I guess adjacent collapsing can make sense when one is sorting by the
field that is being collapsed.

For the normal collapsing though, this patch appears to implement it by
changing the sort order to the collapsing field (normally not desired).
For example, if sorting by relevance and collapsing on a field, one
would normally want the groups sorted by relevance (with the group
relevance defined as the max score of its members).

As far as how to do paging, it makes sense to rigidly define it in terms
of number of documents, regardless of how many documents are in each
group.  Going back to google, it always displays the first 10 documents,
but a variable number of groups.   That does mean that a group could be
split across pages.  It would actually be much simpler (IMO) to always
return a fixed number of groups rather than a fixed number of documents,
but I don't think this would be less useful to people.  Thoughts?

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Emmanuel Keller
> Attachments: field_collapsing_1.1.0.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a
given field to a single entry in the result set. Site collapsing is a
special case of this, where all results for a given web site is
collapsed into one or two entries in the result set, typically with an
associated "more documents from this site" link. See also Duplicate
detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed
before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-06-04 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

New patch that addresses all 6 suggestions.  The one thing that is interesting is 
that using http://localhost:8983/solr/admin/get-file.jsp?file=schema.xml does 
not work, as it prints out a number of newlines before the XML declaration, which 
causes it to be invalid.  I'm not quite sure how to fix this without rewriting 
get-file.jsp as a servlet and making sure it only prints out the xml.

In any case it does work against URLs that only contain valid xml; however, I 
wasn't sure how we go about testing things that require the example to be 
running.  (The test is therefore commented out.)

as for motivations, yes it does require a good bit of overhead and i think it 
would be good to have a 'lighter' IndexSchema implementation for client api's.  
i do think, however, that it's nice to know exactly what is running and to be 
able to inspect each field's capabilities, so i'm not sure what the right thing 
to do is.

- will


> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-06-01 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500704
 ] 

Will Johnson commented on SOLR-239:
---

after seeing that i'd need to regenerate a patch for the new IndexSchema's 
SolrException handling i got to thinking about ways to preserve the 
getInputStream() functionality.  tracing things down a bit it seems to all fall 
to Config.openResource(fileName).  i was wondering if it might not be better to 
extend that code to handle URL's as well as file names by looking for http:// 
at the beginning of the resourceName.  this might open up other avenues for 
centralized configuration of all of solr in the future but it does at least 
solve this problem and maintain more backwards compatibility with the existing 
api.  

thoughts?
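
For what it's worth, a minimal sketch of that idea, assuming the existing
local-file behavior is kept as the fallback (this is an illustration, not
the actual Config.openResource code):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

class ResourceOpener {
  static InputStream openResource(String resource) throws IOException {
    // names that look like URLs get fetched remotely; everything else
    // keeps the existing local-file (or classpath) behavior
    if (resource.startsWith("http://")) {
      return new URL(resource).openStream();
    }
    return new FileInputStream(resource);
  }
}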


> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
>     Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch, IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-29 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499843
 ] 

Will Johnson commented on SOLR-240:
---

i get the stacktrace below with the latest from head with useNativeLocks turned 
off (from my patch).  this took about 2 minutes to reproduce on my windows 
laptop.

one thing i thought of is that the local antivirus scanning / backup software 
we run here may be getting in the way.  i know many other search engines and 
high-performance databases out there have issues with antivirus software because 
it causes similar locking issues.  i'm disabling as much of the IT 'malware' as 
possible and seeing better results even with default locking; however, i had 
everything running when i had good results with the native locking enabled, so 
it still seems to be a good idea to use the patch (or something similar).

- will

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: [EMAIL PROTECTED]
b822c61c394dd5f449aaf5e5717356-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:70)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
at org.apache.lucene.index.IndexWriter.&lt;init&gt;(IndexWriter.java:391)
at 
org.apache.solr.update.SolrIndexWriter.&lt;init&gt;(SolrIndexWriter.java:81)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
at 
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:79)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:198)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:166)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.2
> Environment: windows xp
>Reporter: Will Johnson
> Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, 
> ThrashIndex.java
>
>
> when running the soon-to-be-attached sample application against solr, it will 
> eventually die.  this same error has happened on both windows and rh4 linux.  
> the app is just submitting docs with an id in batches of 10, performing a 
> commit, then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-25 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

updated with a fixed and tested raw-schema.jsp, and added back the IndexSchema 
testDynamicCopy() test.



> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch, IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson
i'll have another go at the patch tomorrow morning: testing the raw-schema.jsp 
(even if it's not used) and putting back the test.
 
- will



From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thu 5/24/2007 6:02 PM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream 
instead of Config file




: 2) why did you remove testDynamicCopy() from IndexSchemaTest ?
:
: because it had nothing to do with testing the index schema.  as far as i
: could tell it was a ctrl-c / ctrl-v error.  that or i'm really blind and
: happy to put it back.

i don't see a test with that name defined anywhere.  it's testing that you
can declare dynamic fields and copy them using copyField ... that sounds
like an IndexSchemaTest to me  (lots of other schema-related tests may be
in BasicFunctionalityTest or ConvertedLegacyTest, but we should try to use
the class-specific test classes when the test is very narrow)

: 3) raw-schema.jsp on the trunk appears to be completely broken (multiple
: <%@ page contentType="..."%> declarations), and not linked to from the

: my patch worked but i also saw that it wasn't linked anywhere.

i thought your patch left the multiple contentType declarations, but i
don't remember for certain now ... it's a trivial issue either way.




-Hoss





RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson
1) there is a public API change here by removing the getInputStream() method 
from IndexSchema.  probably not a big deal but important that we consider it.

true, that call wasn't used anywhere else in the solr trunk code.  also, after 
a lot of thought i realized that it's in general a poor idea to rely on getting 
an input stream in any reliable fashion other than when it's first opened 
(many don't support reset).  i can put it back easily if people are that worried 
about breaking compatibility, but in general it seems like it's asking for 
trouble without knowing the implementation.

2) why did you remove testDynamicCopy() from IndexSchemaTest ?

because it had nothing to do with testing the index schema.  as far as i could 
tell it was a ctrl-c / ctrl-v error.  that or i'm really blind and happy to put 
it back.

3) raw-schema.jsp on the trunk appears to be completely broken (multiple <%@ 
page contentType="..."%> declarations), and not linked to from the admin screen 
anyway ... we might want to just remove it completely and make a note in the 
CHANGES in case people have the old URL bookmarked.

my patch worked but i also saw that it wasn't linked anywhere.
 
- will
 

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>    Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by. 

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

the attached patch (IndexSchemaStream2.patch) includes a cleaned-up test case 
as well as making the IndexSchema constructors throw a SolrException since they 
are reading InputStreams (which they were before).  i think perhaps they should 
throw something a bit 'stronger' but that seemed to have more wide-reaching 
implications.


> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
> IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Will Johnson
Good point, I was proposing it as an alternative to myfield_facet since
that seems to overload the field name a bit too much.  I agree that
solrconfig + specialized request handlers are a much better location for
that kind of stuff.  

Also, the reason other engines require you to mark the fields in the
index definition is because they actually index the data differently if
it is a facet vs a normal indexed field.  It's cool that solr doesn't
have to do this but there may be a case where it would be a good idea
someday. 

- will

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 23, 2007 6:34 PM
To: Solr Dev
Subject: RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet
on all fields (without knowing what they are)


: What about adding an optional parameter to the field definition in the
: IndexSchema for defaultFacet="true/false".  This would make solr's

information should go in the schema.xml if it is inherent to the data
and the physical index.  Things should go in the solrconfig.xml if they
relate to how the index is used -- a master might have a different
solrconfig than a slave because it doesn't get used for queries, while two
different slaves might have different solrconfigs because they get used by
different sets of clients and need different cache configs or request
handler configs -- but all three would use the same schema.xml because the
physical index is the same in all cases.

a mechanism already exists to say "by default, i want clients to get
facets on certain fields" in the solrconfig.xml; it's just a default param
for the requestHandler ...

  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="facet.field">category</str>
      <str name="facet.field">author</str>
      <str name="facet.field">type</str>
      ...
    </lst>
  </requestHandler>

...then the params are defaulted for everyone, and the only thing the user
needs in the URL is "facet=true" ... or that can be defaulted as well.


-Hoss



RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Will Johnson
What about adding an optional parameter to the field definition in the
IndexSchema for defaultFacet="true/false".  This would make solr's
functionality/configuration similar to that of many of the major search engine
vendors and keep people from having to follow naming conventions for
fields.  Then facet.field=* just turns on those fields with
defaultFacet="true" but still lets you facet on others if you deem
necessary.  If there were a list of default facet fields it might also
let the index warming process pre-cache the results of those filter
queries which would be a nice side benefit.

The *_facet thing scares me because I'm afraid I'll eventually be
'forced' to have field names like:

myfield_facet_vector_stem_morelikethis_highlight.

- will

-Original Message-
From: Ryan McKinley (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 23, 2007 3:38 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on
all fields (without knowing what they are)


[
https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12498338 ] 

Ryan McKinley commented on SOLR-247:


> 
> There are *lots* of reasons why a field might be indexed though, so
faceting on every indexed field doesn't seem like it would ever make
sense.
> 

agreed, but *_facet would be useful

> 
> if we do this, i would think it only makes sense to generalize the use
of "*" in both fl and facet.field into a true glob style syntax

One issue is that fl=XXX is typically a field list separated with "," or
"|", while facet.field expects each field as a separate parameter.




> Allow facet.field=* to facet on all fields (without knowing what they
are)
>

--
>
> Key: SOLR-247
> URL: https://issues.apache.org/jira/browse/SOLR-247
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-247-FacetAllFields.patch
>
>
> I don't know if this is a good idea to include -- it is potentially a
bad idea to use it, but that can be ok.
> This came out of trying to use faceting for the LukeRequestHandler top
term collecting.
> http://www.nabble.com/Luke-request-handler-issue-tf3762155.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-176) Add detailed timing data to query response output

2007-05-17 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-176:
--

Attachment: RequesthandlerBase.patch

added some average stats to RequestHandlerBase.  all of the same info can be 
obtained by parsing the log files, but having it show up on the admin screens 
and jmx is simple and nice to have.  stats added: avgTimePerRequest and 
avgRequestsPerSecond.
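
Roughly, such averages can be derived from two counters plus the
handler's start time; a sketch of the bookkeeping (field and method
names here are illustrative, not the actual patch):

class HandlerStats {
  private final long startTime = System.currentTimeMillis();
  private long numRequests;
  private long totalTimeMs;

  // called once per request with the elapsed handling time
  synchronized void record(long elapsedMs) {
    numRequests++;
    totalTimeMs += elapsedMs;
  }

  synchronized double avgTimePerRequest() {
    return numRequests == 0 ? 0.0 : (double) totalTimeMs / numRequests;
  }

  synchronized double avgRequestsPerSecond() {
    double upSeconds = (System.currentTimeMillis() - startTime) / 1000.0;
    return upSeconds <= 0 ? 0.0 : numRequests / upSeconds;
  }
}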

> Add detailed timing data to query response output
> -
>
> Key: SOLR-176
> URL: https://issues.apache.org/jira/browse/SOLR-176
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.2
>Reporter: Mike Klaas
> Assigned To: Mike Klaas
>Priority: Minor
> Fix For: 1.2
>
> Attachments: dtiming.patch, dtiming.patch, RequesthandlerBase.patch
>
>
> see 
> http://www.nabble.com/%27accumulate%27-copyField-for-faceting-tf3329986.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-240:
--

Attachment: IndexWriter2.patch

the attached patch adds a param to SolrIndexConfig called useNativeLocks.  the 
default is false, which keeps with the existing design of using 
SimpleFSLockFactory.  if people think we should allow fully pluggable locking 
mechanisms i'm game, but i wasn't quite sure how to tackle that problem.  

as for testing, i wasn't quite sure how to run tests to ensure that the locks 
were working beyond some basic println's (which passed).  if anyone has good 
ideas i'm all ears.

- will
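
The gist of the flag, in sketch form (an illustration of the idea, not
the patch itself; it assumes Lucene's NativeFSLockFactory and
SimpleFSLockFactory, and the hypothetical helper name is mine):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.NativeFSLockFactory;
import org.apache.lucene.store.SimpleFSLockFactory;

class LockFactoryChooser {
  static LockFactory forIndexDir(File indexDir, boolean useNativeLocks)
      throws IOException {
    // native OS locks can't be left stale by a crashed JVM, but aren't
    // safe on every file system, hence the off-by-default config flag
    return useNativeLocks
        ? new NativeFSLockFactory(indexDir)
        : new SimpleFSLockFactory(indexDir);
  }
}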


> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.2
>     Environment: windows xp
>Reporter: Will Johnson
> Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, 
> ThrashIndex.java
>
>
> when running the soon-to-be-attached sample application against solr, it will 
> eventually die.  this same error has happened on both windows and rh4 linux.  
> the app is just submitting docs with an id in batches of 10, performing a 
> commit, then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: (solr 240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson
On my XP laptop it takes a couple of minutes; on the Linux server it took 2
days.

- will

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, May 15, 2007 4:57 PM
To: solr-dev@lucene.apache.org
Subject: Re: (solr 240) java.io.IOException: Lock obtain timed out:
SimpleFSLock

I've been running this for an hour so far... how long does it
normally take you to get an exception?


RE: [jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson
True, but the javadocs for the Standard Lock's implementation classes
also say they don't work:

http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html

Further, NFS locking is also clearly stated to not work in the
SimpleFSLockFactory:

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/or
g/apache/lucene/store/SimpleFSLockFactory.html

So it appears we're in between a lock and a hard place...  (oh the 80's
sitcom humor)

Adding a config parameter sounds good too but the new patch is no worse
than what exists in terms of javadoc warnings and has been shown to
actually fix what I would imagine is a rather standard configuration
(local disk xp/rh)

- will

 

-Original Message-
From: Hoss Man (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 15, 2007 4:27 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-240) java.io.IOException: Lock obtain
timed out: SimpleFSLock


[
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12496115 ] 

Hoss Man commented on SOLR-240:
---

the idea of using different lock implementations has come up in the
past, 

http://www.nabble.com/switch-to-native-locks-by-default--tf2967095.html

one reason not to hardcode native locks was that not all file systems
support them -- so we left in the usage of SimpleFSLock because it's the
most generally reusable.

rather than change from one hardcoded lock type to another hardcoded
lock type, we should support a config option for making the choice ...
perhaps even adding a SolrLockFactory that defines an init(NamedList)
method and creating simple Solr subclasses of all the core Lucene
LockFactory impls so it's easy for people to write their own if they
want (and we don't just have "if (lockType.equals("simple"))..." type
config parsing).

> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>    Affects Versions: 1.2
> Environment: windows xp
>Reporter: Will Johnson
> Attachments: IndexWriter.patch, stacktrace.txt,
ThrashIndex.java
>
>
> when running the soon-to-be-attached sample application against solr,
it will eventually die.  this same error has happened on both windows
and rh4 linux.  the app is just submitting docs with an id in batches of
10, performing a commit, then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-240:
--

Attachment: IndexWriter.patch

I found this:

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/or
g/apache/lucene/store/NativeFSLockFactory.html

And so I made the attached patch which seems to run at least 100x longer
than without.

- will







> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.2
> Environment: windows xp
>    Reporter: Will Johnson
> Attachments: IndexWriter.patch, stacktrace.txt, ThrashIndex.java
>
>
> when running the soon-to-be-attached sample application against solr, it will 
> eventually die.  this same error has happened on both windows and rh4 linux.  
> the app is just submitting docs with an id in batches of 10, performing a 
> commit, then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-240:
--

Attachment: stacktrace.txt
ThrashIndex.java

> java.io.IOException: Lock obtain timed out: SimpleFSLock
> 
>
> Key: SOLR-240
> URL: https://issues.apache.org/jira/browse/SOLR-240
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.2
> Environment: windows xp
>    Reporter: Will Johnson
> Attachments: stacktrace.txt, ThrashIndex.java
>
>
> when running the soon-to-be-attached sample application against solr, it will 
> eventually die.  this same error has happened on both windows and rh4 linux.  
> the app is just submitting docs with an id in batches of 10, performing a 
> commit, then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-05-15 Thread Will Johnson (JIRA)
java.io.IOException: Lock obtain timed out: SimpleFSLock


 Key: SOLR-240
 URL: https://issues.apache.org/jira/browse/SOLR-240
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.2
 Environment: windows xp
Reporter: Will Johnson


when running the soon-to-be-attached sample application against solr, it will 
eventually die.  this same error has happened on both windows and rh4 linux.  
the app is just submitting docs with an id in batches of 10, performing a 
commit, then repeating over and over again.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-15 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

patch updated.  now with the added benefit of compiling.

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-14 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream.patch

patch with test cases attached.  i also had to change raw-schema.jsp to be a 
redirect to get-files.jsp; however, it wasn't clear that raw-schema.jsp was in 
use anymore.

> Read IndexSchema from InputStream instead of Config file
> 
>
> Key: SOLR-239
> URL: https://issues.apache.org/jira/browse/SOLR-239
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2
> Environment: all
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: IndexSchemaStream.patch
>
>
> Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
> created directly from InputStreams.  The overall logic for the core's use of 
> IndexSchema does not change; however, this allows java clients 
> like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
> parsed, the client can inspect an index's capabilities, which is useful for 
> building generic search UIs, e.g. providing a drop-down list of fields to 
> search/sort by.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-14 Thread Will Johnson (JIRA)
Read IndexSchema from InputStream instead of Config file


 Key: SOLR-239
 URL: https://issues.apache.org/jira/browse/SOLR-239
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.2
 Environment: all
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.2


Soon-to-follow patch adds a constructor to IndexSchema to allow schemas to be 
created directly from InputStreams.  The overall logic for the core's use of 
IndexSchema does not change; however, this allows java clients 
like those in SOLR-20 to parse an IndexSchema.  Once a schema is 
parsed, the client can inspect an index's capabilities, which is useful for 
building generic search UIs, e.g. providing a drop-down list of fields to 
search/sort by.  



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Updated: (SOLR-217) schema option to ignore unused fields

2007-05-07 Thread Will Johnson
Any update on this?  I'm one little * away from having a clean
build/test.

- will

-Original Message-
From: Hoss Man (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 01, 2007 7:42 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Updated: (SOLR-217) schema option to ignore unused
fields


 [
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.p
lugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-217:
--

Attachment: ignoreUnnamedFields_v3.patch

added a simple test to the existing patch.

one thing to note is that this will result in the field being "ignored"
if you try to query on it as well ... but this is a more general problem
of what to do when people try to query on an unindexed field (see
SOLR-223)

will commit in a day or so barring objections

> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreNonIndexedNonStoredField.patch,
ignoreUnnamedFields.patch, ignoreUnnamedFields_v3.patch,
ignoreUnnamedFields_v3.patch
>
>
> One thing that causes problems for me (and i assume others) is that
Solr is schema-strict in that unknown fields cause solr to throw
exceptions and there is no way to relax this constraint.  this can cause
all sorts of serious problems if you have automated feeding applications
that do things like SELECT * FROM table1 or where you want to add other
fields to the document for processing purposes before sending them to
solr but don't want to deal with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient

2007-05-04 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493784
 ] 

Will Johnson commented on SOLR-86:
--

has anyone brought up the idea of creating post.bat and post.sh scripts that 
use this java class instead of the curl example that currently ships in 
example/exampledocs?  it would be one less thing for people to figure out and 
possibly screw up. 

> [PATCH]  standalone updater cli based on httpClient
> ---
>
> Key: SOLR-86
> URL: https://issues.apache.org/jira/browse/SOLR-86
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Thorsten Scherler
> Assigned To: Erik Hatcher
> Attachments: simple-post-tool-2007-02-15.patch, 
> simple-post-tool-2007-02-16.patch, 
> simple-post-using-urlconnection-approach.patch, solr-86.diff, solr-86.diff
>
>
> We need a cross platform replacement for the post.sh. 
> The attached code is a direct replacement of the post.sh since it is actually 
> doing the same exact thing.
> In the future one can extend the CLI with other feature like auto commit, 
> etc.. 
> Right now the code assumes that SOLR-85 is applied since we using the servlet 
> of this issue to actually do the update.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-05-01 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492902
 ] 

Will Johnson commented on SOLR-20:
--

the new api's work great, thanks!  what's the plan for this going forward?  i'd 
like to start doing some work on this as it's rather critical to my current 
project and an area i've dealt with a lot in the past.  assuming it's not 
getting dumped into org.apache.* land any time soon, are you accepting patches 
to this code?  if so i have some modifications to the api's that i think will 
make them easier to use (such as a method to set FacetParams on SolrQuery) and 
i'll even flesh out the SolrServerTest for fun.  

also, i noticed that all the methods on SolrServer throw undeclared 
SolrExceptions, which extend RuntimeException, when things go south.  should 
those throw some other sort of non-ignorable exception like a new 
SolrServerException?  while it made coding/compiling easier to leave out all 
the usually required try's and catches, it made running/debugging much less 
enjoyable.

- will

> A simple Java client for updating and searching
> ---
>
> Key: SOLR-20
> URL: https://issues.apache.org/jira/browse/SOLR-20
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
> Environment: all
>Reporter: Darren Erik Vengroff
>Priority: Minor
> Attachments: DocumentManagerClient.java, DocumentManagerClient.java, 
> solr-client-java-2.zip.zip, solr-client-java.zip, solr-client-sources.jar, 
> solr-client.zip, solr-client.zip, solr-client.zip, 
> solrclient_addqueryfacet.zip, SolrClientException.java, 
> SolrServerException.java
>
>
> I wrote a simple little client class that can connect to a Solr server and 
> issue add, delete, commit and optimize commands using Java methods.  I'm 
> posting here for review and comments as suggested by Yonik.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-04-30 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492700
 ] 

Will Johnson commented on SOLR-20:
--

the trunk version at http://solrstuff.org/svn/solrj/  seems to be missing a 
dependency and a copy of SolrParams.  ant returns

compile:
[javac] Compiling 40 source files to C:\data\workspace\solrj\bin
[javac] 
C:\data\workspace\solrj\src\org\apache\solr\client\solrj\impl\XMLResponseParser.java:10:
 package javax.xml.stream does not exist
[javac] import javax.xml.stream.XMLInputFactory;



[javac] 
C:\data\workspace\solrj\src\org\apache\solr\client\solrj\query\SolrQuery.java:10:
 cannot find symbol
[javac] symbol  : class SolrParams
[javac] location: package org.apache.solr.util
[javac] import org.apache.solr.util.SolrParams;

> A simple Java client for updating and searching
> ---
>
> Key: SOLR-20
> URL: https://issues.apache.org/jira/browse/SOLR-20
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
> Environment: all
>Reporter: Darren Erik Vengroff
>Priority: Minor
> Attachments: DocumentManagerClient.java, DocumentManagerClient.java, 
> solr-client-java-2.zip.zip, solr-client-java.zip, solr-client-sources.jar, 
> solr-client.zip, solr-client.zip, solr-client.zip, 
> solrclient_addqueryfacet.zip, SolrClientException.java, 
> SolrServerException.java
>
>
> I wrote a simple little client class that can connect to a Solr server and 
> issue add, delete, commit and optimize commands using Java methods.  I'm 
> posting here for review and comments as suggested by Yonik.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-217) schema option to ignore unused fields

2007-04-30 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-217:
--

Attachment: ignoreUnnamedFields_v3.patch

v3 patch included.  this version of the patch also takes into account the 
suggested example/solr/conf/schema.xml changes.  

> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreNonIndexedNonStoredField.patch, 
> ignoreUnnamedFields.patch, ignoreUnnamedFields_v3.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-217) schema option to ignore unused fields

2007-04-30 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492664
 ] 

Will Johnson commented on SOLR-217:
---

since we now have required fields 
(http://issues.apache.org/jira/browse/SOLR-181) any chance we can have ignored 
fields as well?  let me know if something else needs to be done to the patch 
but as far as i can tell the code works and people seem to agree that it's the 
correct approach.

- will

> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreNonIndexedNonStoredField.patch, 
> ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson
I agree, the default schema should preserve the strictness of the
existing core as it's already helped me figure out more than a few
problems.  Having the documented option to bypass that error is also
nice.  

Fyi:  the second patch does include a log.finest() message about
ignoring the field.  I wasn't sure what level would be appropriate but
that was the same level used in the rest of the class.

- will

-Original Message-
From: J.J. Larrea (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 2:54 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-217) schema option to ignore unused
fields


[
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12492369 ] 

J.J. Larrea commented on SOLR-217:
--

+1 to Hoss' elaboration of Yonik's suggested approach, except that for
reverse-compatibility (where we DO want an error for unknown fields)
schema.xml should probably read something like:

   <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
   ...
   <!-- <dynamicField name="*" type="ignored"/> -->


> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>    Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreNonIndexedNonStoredField.patch,
ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that
Solr is schema-strict in that unknown fields cause solr to throw
exceptions and there is no way to relax this constraint.  this can cause
all sorts of serious problems if you have automated feeding applications
that do things like SELECT * FROM table1 or where you want to add other
fields to the document for processing purposes before sending them to
solr but don't want to deal with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-217:
--

Attachment: ignoreNonIndexedNonStoredField.patch

I like that solution and I can definitely see the advantages of having
dumb_*=ignored and so on.  How does this patch sound instead of the
previous:


public Field createField(SchemaField field, String externalVal, float boost) {
  String val;
  try {
    val = toInternal(externalVal);
  } catch (NumberFormatException e) {
    throw new SolrException(500, "Error while creating field '" + field +
        "' from value '" + externalVal + "'", e, false);
  }
  if (val == null) return null;
  if (!field.indexed() && !field.stored()) {
    log.finest("Ignoring unindexed/unstored field: " + field);
    return null;
  }

  ... blah blah blah


- will






> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreNonIndexedNonStoredField.patch, 
> ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson
So are you proposing that the DocumentBuilder check those properties on
the field before it adds the field, or do we need to add checks
everywhere else to make sure nothing happens?  

I'm happy to make either change and resubmit the patch.

- will

-Original Message-
From: Erik Hatcher (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 12:11 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-217) schema option to ignore unused
fields


[
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.p
lugin.system.issuetabpanels:comment-tabpanel#action_12492332 ] 

Erik Hatcher commented on SOLR-217:
---

I like Yonik's suggestion of allowing unstored+unindexed fields to be
no-op.

> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492326
 ] 

Will Johnson commented on SOLR-217:
---

i was actually taking this requirement from the other enterprise search
engines i've worked with, which do this by default.  ie, solr is the one
that's different in this case.  your *->nothing method sounds good as well,
but it doesn't seem as obvious to someone reading the schema or trying to
feed data.  you might also run into problems later on if fields grow other
'things to do' properties beyond indexing and searching.

- will




> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>    Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-217:
--

Attachment: ignoreUnnamedFields.patch

the attached patch solves this problem by adding a new option to schema.xml 
that allows unnamed fields, including those that don't match any dynamic 
field, to be ignored.  the default is false if the attribute is missing, 
which is consistent with existing SOLR behavior.  if you want to enable this 
feature, the schema.xml would look like:

  blah blah blah ...

  blah blah blah ...
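
in other words, something along these lines (a sketch only -- i'm guessing
the attribute name from the patch file name, so check the patch itself for
the exact spelling):

  <!-- hypothetical attribute name, guessed from the patch file name -->
  <schema name="example" version="1.1" ignoreUnnamedFields="true">
    ... field and fieldType declarations as usual ...
  </schema>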

> schema option to ignore unused fields
> -
>
> Key: SOLR-217
> URL: https://issues.apache.org/jira/browse/SOLR-217
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.2
>
> Attachments: ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is 
> schema-strict in that unknown fields cause solr to throw exceptions and there 
> is no way to relax this constraint.  this can cause all sorts of serious 
> problems if you have automated feeding applications that do things like 
> SELECT * FROM table1 or where you want to add other fields to the document 
> for processing purposes before sending them to solr but don't want to deal 
> with 'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-217) schema option to ignore unused fields

2007-04-27 Thread Will Johnson (JIRA)
schema option to ignore unused fields
-

 Key: SOLR-217
 URL: https://issues.apache.org/jira/browse/SOLR-217
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.2
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.2
 Attachments: ignoreUnnamedFields.patch

One thing that causes problems for me (and i assume others) is that Solr is 
schema-strict in that unknown fields cause solr to throw exceptions and there 
is no way to relax this constraint.  this can cause all sorts of serious 
problems if you have automated feeding applications that do things like SELECT 
* FROM table1 or where you want to add other fields to the document for 
processing purposes before sending them to solr but don't want to deal with 
'cleanup'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


