RE: logging revisited...
To a certain extent SLF4J makes this decision a fairly small one, namely: what API do you want to code to inside Solr, and what jars do you want to ship as part of the distribution? It doesn't really matter if you pick commons-logging, log4j or slf4j; all have drop-in replacements via SLF4J. They also have one for java.util.logging, however it requires custom code to activate since you can't replace java.* classes. End users get to do pretty much whatever they want as far as logging goes if you use SLF4J. SLF4J has also updated their 'legacy' page since the last time I looked, which was ~the last time this came up: http://www.slf4j.org/legacy.html

We chose to code against the slf4j APIs as it seemed like it was where things were going (including solr) and gave us and our customers the ability to switch to something else with minimal effort. We also ship log4j + config jars by default because it had the richest config/appender set at the time, however the logback project seems like it might be catching up. (Good thing we can switch with no code changes.)

- will

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 4:44 PM
To: solr-dev@lucene.apache.org
Subject: logging revisited...

While I'm on a roll tossing stuff out there... Since SOLR-560, solr depends on SLF4J as the logging interface. However, since we also depend on HttpClient, we *also* depend on commons-logging. This is strange: our maven artifacts now depend on two logging frameworks! However, the good folks at SLF4J have a nice solution -- a drop-in replacement for commons-logging that uses slf4j. HttpClient discussed switching to SLF4J for version 4. They decided not to because the slf4j drop-in replacement gives their users even more options. In Droids we had the same discussion, and now use the commons-logging API. So, with that in mind, I think we should consider using the commons-logging API and shipping the .war file with the slf4j drop-in replacement.
The behavior will be identical and there will be one fewer library. The loss is the potential to use some of slf4j's more advanced logging features, but I don't see us taking advantage of those anyway. ryan
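The "code to one API, swap the backend with no code changes" idea at the heart of this thread can be sketched in plain Java. This is a toy facade for illustration only -- the interface and class names here are invented and are not slf4j's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the facade pattern slf4j uses: application code
// depends only on the interface; the binding is chosen at deploy time.
interface Log {
    void info(String msg);
}

// One "binding": forward to java.util.logging (stands in for slf4j-jdk14).
class JulLog implements Log {
    private final java.util.logging.Logger jul;
    JulLog(String name) { this.jul = java.util.logging.Logger.getLogger(name); }
    public void info(String msg) { jul.info(msg); }
}

// Another "binding": capture messages in memory (stands in for log4j/logback).
class MemoryLog implements Log {
    final List<String> messages = new ArrayList<String>();
    public void info(String msg) { messages.add(msg); }
}

public class FacadeSketch {
    // Application code touches only the Log interface, so swapping
    // JulLog for MemoryLog requires no changes here.
    static void doWork(Log log) {
        log.info("committing index");
    }

    public static void main(String[] args) {
        MemoryLog mem = new MemoryLog();
        doWork(mem);
        System.out.println(mem.messages.get(0)); // committing index
    }
}
```

In the real slf4j case the "swap" is even less invasive than this sketch suggests: the binding is picked up from whichever slf4j-*.jar is on the classpath, with no wiring code at all.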
[jira] Updated: (SOLR-560) Convert JDK logging to SLF4J
[ https://issues.apache.org/jira/browse/SOLR-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-560: -- Attachment: SOLR-560-slf4j.patch patch updated for the latest trunk. i also tested that it works with slf4j redirecting to log4j. > Convert JDK logging to SLF4J > > > Key: SOLR-560 > URL: https://issues.apache.org/jira/browse/SOLR-560 > Project: Solr > Issue Type: Wish >Reporter: Ryan McKinley > Fix For: 1.3 > > Attachments: slf4j-api-1.5.0.jar, slf4j-jdk14-1.5.0.jar, > SOLR-560-slf4j.patch, SOLR-560-slf4j.patch, SOLR-560-slf4j.patch > > > After lots of discussion, we should consider using SLF4j to enable more > flexibility in logging configuration. > See: > http://www.nabble.com/Solr-Logging-td16836646.html > http://www.nabble.com/logging-through-log4j-td13747253.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Solr Logging
A little late to the email party but... [ ] Keep solr logging as is. (JUL) [ X ] Convert solr logging to SLF4J And SOLR-560 looks good too. - will
RE: Solr Logging
>If you mean "i have to write code to create a logging implementation" then >yes ... that is true ... someone, somewhere, has to write an >implementation of the JDK Logging API in order for you to use that >implementation -- and if you don't like any of the other implementations out >there, then you might have to write your own. :)

Correct, but there are a number of existing frameworks out there that already do all of this for you; most of them even let you pick your underlying logger, so if you've already written a fancy JUL "rotating, parsing and email me when things get bad" handler then you can still use it. I do agree that commons-logging is a bit 'off' to say the least, and many projects including Jetty6 are moving to SLF4J. Also, if solr as an application is using Solrj under the hood for federation, it would seem that solr is already using 2 different logging mechanisms. For consistency's sake we should consolidate on one single configuration mechanism. It would seem that one of the following would make sense:

* change solrj to be JUL based. I think you already said that would be bad since it's a library and should not impose logging choices
* change solr to be commons-logging based. I agree it's a bit awkward with all the classloading but it is a ~standard to a large extent
* change both to be 'framework XYZ' based. Fyi: slf4j already has a creepy little migrator tool that might be of use.

In the end, I already have my shim that does the necessary translation, but it's nowhere near a general solution that the log4j community could benefit from. As long as things are consistent and easy to configure to get standard logging functionality I'm happy.

- will

Not to pimp out slf4j too much, but the base implementation is only ~22k, or about the same size as commons-csv which is also a dependency.
RE: Solr Logging
(putting on flame suit) I'd be in favor, seeing as how I spent a good bit of time 2 months ago writing JUL handlers and log managers to forward log messages to our logging framework (log4j). Pretty much any alternative (Commons, Log4j, SLF4J) is better, since all of them allow you to _configure_ your underlying implementation (including JUL if that's what you're into). JUL on the other hand ~requires you to write code to switch logging implementations or even do basic things like rotate log files. SLF4J seems especially slim and nice these days, but really anything is better than JUL. If others are really serious about it, I'd be happy to help the cause. It should be a fairly quick refactor, and we could leave the default configured logger as JUL via whatever framework we end up going with.

- will

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 22, 2008 11:48 AM
To: solr-dev@lucene.apache.org
Subject: Solr Logging

Anyone have good tips on working w/ java.util.logging (JUL)? For one, the configuration seems to be per JVM, which isn't all that useful in a webapp environment. http://www.crazysquirrel.com/computing/java/logging.jspx has some tips for Tomcat, but I am using Jetty. Not to mention, it seems, that if one wants to implement their own Handler, they have to somehow figure out how to get it in the right classloader, since the JVM classloader can't seem to find it if it is packaged in a WAR. I know logging is sometimes a religious debate, but would others consider a patch that switched Solr to use log4j? Or commons-logging? I just don't think JUL is up to snuff when it comes to logging. It's a PITA to configure, is not flexible, doesn't play nice with other logging systems and, all in all, just seems like crappy design by committee where the lowest common denominator won out. The switch is quite painless, and the former offers a lot more flexibility, while the latter allows one to plug in whatever they see fit.
I will work up a patch so people can at least see the options. Cheers, Grant
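For reference, the custom-Handler work being discussed looks roughly like this: a minimal JUL Handler registered programmatically rather than through the per-JVM logging.properties file. This is a self-contained sketch (the logger name and messages are made up); it does not address the WAR classloader problem described above, since the handler class still has to be visible to whoever registers it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulHandlerSketch {
    // A minimal JUL Handler that buffers formatted messages in memory.
    // Forwarding to another framework (e.g. log4j) would replace the
    // lines.add(...) call with a call into that framework.
    static class BufferHandler extends Handler {
        final List<String> lines = new ArrayList<String>();
        @Override public void publish(LogRecord record) {
            if (isLoggable(record)) {
                lines.add(record.getLevel() + " " + record.getMessage());
            }
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("solr.sketch");
        log.setUseParentHandlers(false);   // don't also write to the console
        BufferHandler buf = new BufferHandler();
        log.addHandler(buf);               // programmatic, not logging.properties
        log.info("opening searcher");
        System.out.println(buf.lines);     // [INFO opening searcher]
    }
}
```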
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570319#action_12570319 ] Will Johnson commented on SOLR-342: --- the new solr with the new lucene did the trick. i made the mistake of using the 2.3 tag instead of the branch before, which was why i still saw the problem. > Add support for Lucene's new Indexing and merge features (excluding > Document/Field/Token reuse) > --- > > Key: SOLR-342 > URL: https://issues.apache.org/jira/browse/SOLR-342 > Project: Solr > Issue Type: Improvement > Components: update >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, > SOLR-342.patch, SOLR-342.tar.gz > > > LUCENE-843 adds support for new indexing capabilities using the > setRAMBufferSizeMB() method that should significantly speed up indexing for > many applications. To fix this, we will need trunk version of Lucene (or > wait for the next official release of Lucene) > Side effect of this is that Lucene's new, faster StandardTokenizer will also > be incorporated. > Also need to think about how we want to incorporate the new merge scheduling > functionality (new default in Lucene is to do merges in a background thread) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569408#action_12569408 ] Will Johnson commented on SOLR-342: --- i switched to the lucene 2.3 branch, updated (and confirmed that yonik's 1 line change was in place), reran the tests and still saw the same problem. if i missed something please let me know. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567508#action_12567508 ] Will Johnson commented on SOLR-342: --- we are doing multi-threaded indexing and searching while indexing however the 'bad' results come back after the indexing run is finished and the index itself is static. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567235#action_12567235 ] Will Johnson commented on SOLR-342: --- we're using SolrCore in terms of: core = new SolrCore("foo", dataDir, solrConfig, solrSchema); UpdateHandler updateHandler = core.getUpdateHandler(); updateHandler.addDoc(command); which is a bit more low-level than normal, however when we flipped back to solr trunk + lucene 2.3 everything was fine, so it leads me to believe that we are ok in that respect. i was going to try to reproduce with lucene directly also but that too is a bit outside the scope of what i have time for at the moment. and we're not getting any exceptions, just bad search results. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567218#action_12567218 ] Will Johnson commented on SOLR-342: --- we're not using parallel reader but we are using direct core access instead of going over http. as for doc size, we're indexing wikipedia but creating a number of extra fields. they are only large in comparison to any of the 'large volume' tests i've seen in most of the solr and lucene tests. - will -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567198#action_12567198 ] Will Johnson commented on SOLR-342: --- we have: 10 64 2147483647 and i'm working on a unit test, but just adding a few terms per doc doesn't seem to trigger it, at least not 'quickly.' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567147#action_12567147 ] Will Johnson commented on SOLR-342: --- patched solr + lucene trunk is still broken. if anyone has any pointers for ways to coax this problem to happen before we get 20-30k large docs in the system, let me know and we can start working on a unit test; otherwise it's going to take a while to reproduce anything. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567099#action_12567099 ] Will Johnson commented on SOLR-342: --- I think we're running into a very serious issue with trunk + this patch. either the document summaries are not matched or the overall matching is 'wrong'. i did find this in the lucene jira: LUCENE-994 "Note that these changes will break users of ParallelReader because the parallel indices will no longer have matching docIDs. Such users need to switch IndexWriter back to flushing by doc count, and switch the MergePolicy back to LogDocMergePolicy. It's likely also necessary to switch the MergeScheduler back to SerialMergeScheduler to ensure deterministic docID assignment." we're seeing rather consistent bad results but only after 20-30k documents and multiple commits, and wondering if anyone else is seeing anything. i've verified that the results are bad even through Luke, which would seem to remove the search side of the solr equation. the basic test case is to search for title:foo and get back documents that only have title:bar. we're going to start on a unit test, but given the document counts and the corpus we're testing against it may be a while, so i thought i'd ask to see if anyone had any hints. removing this patch seems to remove the issue, so it doesn't appear to be a lucene problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
XmlUpdateRequestHandler bad documents mid batch aborts rest of batch Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Reporter: Will Johnson Has anyone run into the problem of handling bad documents / failures mid-batch? Ie: an <add> batch of docs 1, 2 and 3, where doc 2 contains a bad value (say, I_AM_A_BAD_DATE) in a date field. Right now solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
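Option 2 (record the failure and keep going) can be sketched independently of Solr's internals. Everything here is hypothetical stand-in code, not Solr's actual update path: `Doc` is a toy document and the date parse stands in for whatever per-document validation actually fails:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Stand-in for a document: just an id and a date string.
    static class Doc {
        final String id, date;
        Doc(String id, String date) { this.id = id; this.date = date; }
    }

    // Option 2: add every valid doc, collect the ids that failed so the
    // response can report them instead of aborting mid-batch.
    static List<String> addBatch(List<Doc> batch, List<Doc> index) {
        List<String> failed = new ArrayList<String>();
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        for (Doc doc : batch) {
            try {
                fmt.parse(doc.date);  // per-doc validation that may throw
                index.add(doc);
            } catch (ParseException e) {
                failed.add(doc.id);   // record the failure and continue
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<Doc>();
        List<Doc> batch = new ArrayList<Doc>();
        batch.add(new Doc("1", "2008-01-24"));
        batch.add(new Doc("2", "I_AM_A_BAD_DATE"));
        batch.add(new Doc("3", "2008-01-25"));
        System.out.println(addBatch(batch, index)); // [2]  -- docs 1 and 3 indexed
    }
}
```

This also shows why Option 2 needs a richer API response: the list of failed ids has to travel back to the client somehow, whereas Option 1 (fail the whole batch) would need the docs buffered until the batch is known to be clean.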
RE: Resource contention problem in Solrj
Fyi: the CommonsHttpSolrServer already has methods to do all of those things:

  /** set connectionTimeout on the underlying MultiThreadedHttpConnectionManager */
  public void setConnectionTimeout(int timeout) {
    _connectionManager.getParams().setConnectionTimeout(timeout);
  }

  /** set maxConnectionsPerHost on the underlying MultiThreadedHttpConnectionManager */
  public void setDefaultMaxConnectionsPerHost(int connections) {
    _connectionManager.getParams().setDefaultMaxConnectionsPerHost(connections);
  }

  /** set maxTotalConnection on the underlying MultiThreadedHttpConnectionManager */
  public void setMaxTotalConnections(int connections) {
    _connectionManager.getParams().setMaxTotalConnections(connections);
  }

You can also get the underlying connection manager if you want to do other, crazier stuff:

  public MultiThreadedHttpConnectionManager getConnectionManager() {
    return _connectionManager;
  }

- will

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, December 17, 2007 9:18 PM
To: solr-dev@lucene.apache.org
Subject: Re: Resource contention problem in Solrj

Excellent! Thanks for diagnosing this! -Yonik

On Dec 17, 2007 9:00 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> There seems to be a resource contention problem with Solrj under load. To reproduce the problem: set up a sample webapp with solrj, connect to an HTTP Solr instance, and hammer the webapp with Apache ab (say 10 concurrent connections with 100 requests). You'll notice that the webapp's servlet container quickly consumes 100% CPU and stays there unless you restart it. I can confirm that this happens with both Tomcat and Jetty. Meanwhile, the server that Solr is deployed on seems to be running fine.
>
> From this observation, I suspect that Solrj has a connection contention problem. And this seems to be the case if you look at CommonsHttpSolrServer. This class uses MultiThreadedHttpConnectionManager, which has maxConnectionsPerHost set to 2 by default. When the number of threads increases, this is obviously not enough and leads to a connection contention problem. I quickly solved the problem by adding another constructor to CommonsHttpSolrServer that allows setting maxConnectionsPerHost and maxTotalConnections:
>
> public CommonsHttpSolrServer(int maxConsPerHost, int maxTotalCons, String solrServerUrl) throws MalformedURLException {
>   this(solrServerUrl);
>   this.maxConsPerHost = maxConsPerHost;
>   this.maxTotalCons = maxTotalCons;
>   HttpConnectionManagerParams params = new HttpConnectionManagerParams();
>   params.setDefaultMaxConnectionsPerHost(maxConsPerHost);
>   params.setMaxTotalConnections(maxTotalCons);
>   _connectionManager.setParams(params);
> }
>
> Hope this information would help others.
>
> --
> Regards,
>
> Cuong Hoang
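The failure mode described above -- many request threads queueing on a 2-connection-per-host pool -- can be sketched with a stdlib semaphore standing in for the connection pool. No HttpClient is involved and the sizes are illustrative, but it shows why raising maxConnectionsPerHost matters: peak concurrency is capped at the pool size no matter how many client threads you run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolContentionSketch {
    // Run `threads` concurrent "requests" against a pool of `poolSize`
    // connections; each request holds a connection for `holdMillis`.
    // Returns the peak number of requests actually inside the pool.
    static int run(int threads, int poolSize, long holdMillis)
            throws InterruptedException {
        final Semaphore pool = new Semaphore(poolSize);  // the "connection pool"
        final AtomicInteger inFlight = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            exec.execute(() -> {
                try {
                    pool.acquire();                      // blocks: the contention
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(holdMillis);        // the "HTTP request"
                    } finally {
                        inFlight.decrementAndGet();
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 10 client threads but only 2 connections: at most 2 requests are
        // ever in flight; the other 8 just queue.
        System.out.println(run(10, 2, 50));  // 2
    }
}
```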
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547925 ] Will Johnson commented on SOLR-342: --- is there any update on getting this patch committed? we needed to be able to set some of the buffer sizes, so the script wouldn't help. have other people experienced troubles with 2.3 and/or this patch that i should be wary of? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547854 ] Will Johnson commented on SOLR-342: --- just a comment to say that we added this patch and saw rather significant improvements, on the order of 10-25% for different index tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-421) Make SolrParams serializable
[ https://issues.apache.org/jira/browse/SOLR-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547853 ] Will Johnson commented on SOLR-421: --- i also added 'implements java.io.Serializable' to: SolrRequest SolrInputField SolrInputDocument i'd generate a patch but my tree is so heavily patched for SOLR-342 (which rocks, by the way) that i'm hesitant to try anything too ambitious this morning. > Make SolrParams serializable > > > Key: SOLR-421 > URL: https://issues.apache.org/jira/browse/SOLR-421 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Priority: Trivial > > Making SolrParams serializable will allow it to be sent over RMI or used in > other tools that require serialization. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-421) Make SolrParams serializable
[ https://issues.apache.org/jira/browse/SOLR-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546249 ] Will Johnson commented on SOLR-421: --- it would also be good to make the same changes to all of the solrj library classes as well. i know they are meant to be sent over http with solr specific marshaling, but we ended up needing to send some solrj request objects to another box via RMI and it was a bit of a pain. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: schema query
Check out Luke: http://wiki.apache.org/solr/LukeRequestHandler

- will

-Original Message-
From: S DALAL [mailto:[EMAIL PROTECTED]
Sent: Monday, November 05, 2007 7:29 AM
To: solr-dev@lucene.apache.org
Subject: schema query

Hi, Is there a way to query for the schema or the field properties? To give an overview, i want to plug Solr into a web crawler and then index the pages crawled. So, while indexing, the crawler needs to know about the fields to create the document. One way i can think of is to scrape the http://localhost:9696/solr/admin/get-file.jsp?file=schema.xml page; is there an existing better way? thanks and regards dalal
RE: CommonsHttpSolrServer and multithread
You can also get hold of the underlying MultiThreadedHttpConnectionManager if you want to tweak the configuration further:

  public class CommonsHttpSolrServer {
    ...
    public MultiThreadedHttpConnectionManager getConnectionManager()
  }

- will

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 18, 2007 12:08 PM
To: solr-dev@lucene.apache.org
Subject: Re: CommonsHttpSolrServer and multithread

> but Is CommonsHttpSolrServer thread-safe?

It better be! To the best of my knowledge, it is. If you have any troubles with it, we need to fix them. the underlying connections are thread safe: http://jakarta.apache.org/httpcomponents/httpclient-3.x/threading.html we use MultiThreadedHttpConnectionManager

ryan
Re: svn commit: r577427 - in /lucene/solr/trunk/client/java/solrj/test/org/apache/solr/client/solrj: LargeVolumeTestBase.java embedded/LargeVolumeEmbeddedTest.java embedded/LargeVolumeJettyTest.java
Even if we used a dependency management tool, the junit/ant integration still requires that developers have the ant-junit bindings (aka ant-junit.jar) in the classpath when the build.xml is parsed. Supposedly you can explicitly declare the junit tasks with your own taskdef and identify the location of the jars yourself, but the jars still have to exist when that taskdef is evaluated -- which makes it hard to then pull those jars as part of a target. Everybody i've ever talked to who i felt confident knew more about ant than me (with Erik at the top of the list) has said the same thing: "Put junit and ant-junit in your ANT_LIB ... don't even try to do anything else, it will just burn you." we do the following: ... and so on. it works nicely for all the main targets (compile, test, etc). i also just verified that the same method works in the solr build file. it guarantees that everyone is running the exact same version of junit and doesn't require any extra steps for developers to be able to build/test code. there are lots of other ways to do this including a custom taskdef but the above method is pretty straightforward and ~vanilla ant. - will
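As a rough illustration of the explicit-taskdef approach mentioned above, a hypothetical build.xml fragment might look like this. The lib/ directory and jar name patterns are assumptions for illustration; only the JUnitTask classname is Ant's real optional-tasks class, and as noted, these jars must already exist on disk when the taskdef is evaluated:

```xml
<!-- Hypothetical sketch: pin junit + ant-junit from a lib/ dir in the tree. -->
<path id="junit.classpath">
  <fileset dir="lib">
    <include name="junit-*.jar"/>
    <include name="ant-junit-*.jar"/>
  </fileset>
</path>

<!-- Explicitly bind the <junit> task to those jars instead of ANT_LIB. -->
<taskdef name="junit"
         classname="org.apache.tools.ant.taskdefs.optional.junit.JUnitTask"
         classpathref="junit.classpath"/>
```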
[jira] Updated: (SOLR-360) Multithread update client causes exceptions and dropped documents
[ https://issues.apache.org/jira/browse/SOLR-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-360: -- Attachment: TestJettyLargeVolume.java i'll work on the patch to make it cleaner and run with the build process but i wanted to get this up as soon as possible. if you drop it into /client/java/solrj/test/org/apache/solr/client/solrj/embedded it compiles/runs with eclipse. > Multithread update client causes exceptions and dropped documents > - > > Key: SOLR-360 > URL: https://issues.apache.org/jira/browse/SOLR-360 > Project: Solr > Issue Type: Bug > Components: update >Affects Versions: 1.3 > Environment: test fails on both pc + mac, tomcat + jetty all java 1.6 >Reporter: Will Johnson > Attachments: TestJettyLargeVolume.java > > > we were doing some performance testing for the updating aspects of solr and > ran into what seems to be a large problem. we're creating small documents > with an id and one field of 1 term only, submitting them in batches of 200 > with commits every 5000 docs. when we run the client with 1 thread > everything is fine. when we run it with >1 threads things go south (stack > trace is below). i've attached the junit test which shows the problem. this > happens on both a mac and a pc and when running solr in both jetty and > tomcat. i'll create a jira issue if necessary but i thought i'd see if > anyone else had run into this problem first.
> also, the problem does not seem to surface under solr1.2
> (RyanM suggested it might be related to SOLR-215)
> (output from junit test)
> Started thread: 0
> Started thread: 1
> org.apache.solr.common.SolrException (decoded):
> java.lang.IllegalStateException: Current state is not among the states START_ELEMENT, ATTRIBUTE valid for getAttributeCount()
>   at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getAttributeCount(XMLStreamReaderImpl.java:598)
>   at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:335)
>   at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:181)
>   at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:109)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:804)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>   ...
> (the same IllegalStateException trace is printed a second time for the other thread)
[jira] Created: (SOLR-360) Multithread update client causes exceptions and dropped documents
Multithread update client causes exceptions and dropped documents - Key: SOLR-360 URL: https://issues.apache.org/jira/browse/SOLR-360 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Environment: test fails on both pc + mac, tomcat + jetty all java 1.6 Reporter: Will Johnson we were doing some performance testing for the updating aspects of solr and ran into what seems to be a large problem. we're creating small documents with an id and one field of 1 term only, submitting them in batches of 200 with commits every 5000 docs. when we run the client with 1 thread everything is fine. when we run it with >1 threads things go south (stack trace is below). i've attached the junit test which shows the problem. this happens on both a mac and a pc and when running solr in both jetty and tomcat. i'll create a jira issue if necessary but i thought i'd see if anyone else had run into this problem first. also, the problem does not seem to surface under solr1.2 (RyanM suggested it might be related to SOLR-215) (output from junit test) Started thread: 0 Started thread: 1 org.apache.solr.common.SolrException:
java.lang.IllegalStateException: Current state is not among the states START_ELEMENT, ATTRIBUTE valid for getAttributeCount()
  at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getAttributeCount(XMLStreamReaderImpl.java:598)
  at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:335)
  at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:181)
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:109)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:804)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
  ...
(the same trace is printed a second time for the other thread)
Re: boosting a query by a function of other fields
I haven't yet looked at SOLR-192 to see how it is done there, though. -Mike it's nowhere near perfect but it did at least pass some unit tests. my immediate need for that bit of functionality has lessened but i'd be happy to keep working and testing if anyone has comments on the patch. it does (as the ticket states) depend on a lucene patch at the moment to get field names etc but that could probably be removed if necessary. - will
[jira] Updated: (SOLR-192) Move FunctionQuery to Lucene
[ https://issues.apache.org/jira/browse/SOLR-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-192: -- Attachment: SOLR-192-functionQueries.patch patch attached that uses the functionality from lucene instead of solr. there were some changes in the api in the solr->lucene transition so there was one api change to a private static method in solr.search.QueryParsing. this patch also relies on LUCENE-989 (http://issues.apache.org/jira/browse/LUCENE-989) to get access to field names. a future patch could then get access to the statistics for exposing in results. - will > Move FunctionQuery to Lucene > > > Key: SOLR-192 > URL: https://issues.apache.org/jira/browse/SOLR-192 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Grant Ingersoll > Attachments: SOLR-192-functionQueries.patch > > > FunctionQuery is a useful concept to have in Lucene core. Deprecate the Solr > implementation and migrate it Lucene core. Have the deprecated Solr version > call the Lucene version. > See https://issues.apache.org/jira/browse/LUCENE-446 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-192) Move FunctionQuery to Lucene
[ https://issues.apache.org/jira/browse/SOLR-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523042 ] Will Johnson commented on SOLR-192: --- is anyone currently working on doing this migration? i submitted a patch to the lucene project tracker (https://issues.apache.org/jira/browse/LUCENE-989) and was going to post a patch for solr to use the new features but the implementations look to be reasonably different. > Move FunctionQuery to Lucene > > > Key: SOLR-192 > URL: https://issues.apache.org/jira/browse/SOLR-192 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Grant Ingersoll > > FunctionQuery is a useful concept to have in Lucene core. Deprecate the Solr > implementation and migrate it Lucene core. Have the deprecated Solr version > call the Lucene version. > See https://issues.apache.org/jira/browse/LUCENE-446 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-215) Multiple Solr Cores
[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513912 ] Will Johnson commented on SOLR-215: --- did anything ever get baked into the patch for handling the core name as a cgi param instead of as a url path element? the email thread we had going didn't seem to come to any hard conclusions but i'd like to lobby for it as a part of the spec. i read through the patch but i couldn't quite follow things enough to tell. > Multiple Solr Cores > --- > > Key: SOLR-215 > URL: https://issues.apache.org/jira/browse/SOLR-215 > Project: Solr > Issue Type: Improvement >Reporter: Henri Biestro >Priority: Minor > Attachments: solr-215.patch, solr-215.patch, solr-215.patch, > solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, > solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, > solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, > solr-trunk-542847.patch, solr-trunk-src.patch > > > WHAT: > As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index. > This patch is intended to allow multiple cores in Solr which also brings > multiple indexes capability. > The patch file to grab is solr-215.patch.zip (see MISC session below). > WHY: > The current Solr practical wisdom is that one schema - thus one index - is > most likely to accomodate your indexing needs, using a filter to segregate > documents if needed. If you really need multiple indexes, deploy multiple web > applications. > There are a some use cases however where having multiple indexes or multiple > cores through Solr itself may make sense. > Multiple cores: > Deployment issues within some organizations where IT will resist deploying > multiple web applications. > Seamless schema update where you can create a new core and switch to it > without starting/stopping servers. 
> Embedding Solr in your own application (instead of 'raw' Lucene) and > functionally need to segregate schemas & collections. > Multiple indexes: > Multiple language collections where each document exists in different > languages, analysis being language dependant. > Having document types that have nothing (or very little) in common with > respect to their schema, their lifetime/update frequencies or even collection > sizes. > HOW: > The best analogy is to consider that instead of deploying multiple > web-application, you can have one web-application that hosts more than one > Solr core. The patch does not change any of the core logic (nor the core > code); each core is configured & behaves exactly as the one core in 1.2; the > various caches are per-core & so is the info-bean-registry. > What the patch does is replace the SolrCore singleton by a collection of > cores; all the code modifications are driven by the removal of the different > singletons (the config, the schema & the core). > Each core is 'named' and a static map (keyed by name) allows to easily manage > them. > You declare one servlet filter mapping per core you want to expose in the > web.xml; this allows easy to access each core through a different url. 
> USAGE (example web deployment, patch installed):
> Step0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on the core0 index; 2 documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on the core1 index; no documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to the index of core0 and the mon*.xml to the index of core1; running queries from the admin interface, you can verify the indexes have different content.
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the 2 others
> SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
> //Accessing a core:
> SolrCore core = SolrCore.getCore("core0");
> PATCH MODIFICATIONS DETAILS (per package):
> org.apache.solr.core:
> The heaviest modifications ar
[jira] Updated: (SOLR-312) create solrj javadoc in build.xml
[ https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-312: -- Attachment: create-solrj-javadoc.patch simple patch to add the new task > create solrj javadoc in build.xml > - > > Key: SOLR-312 > URL: https://issues.apache.org/jira/browse/SOLR-312 > Project: Solr > Issue Type: New Feature > Components: clients - java >Affects Versions: 1.3 > Environment: a new task in build.xml named javadoc-solrj that does > pretty much what you'd expect. creates a new folder build/docs/api-solrj. > heavily based on the example from the solr core javadoc target. >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: create-solrj-javadoc.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-312) create solrj javadoc in build.xml
create solrj javadoc in build.xml - Key: SOLR-312 URL: https://issues.apache.org/jira/browse/SOLR-312 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Environment: a new task in build.xml named javadoc-solrj that does pretty much what you'd expect. creates a new folder build/docs/api-solrj. heavily based on the example from the solr core javadoc target. Reporter: Will Johnson Priority: Minor Fix For: 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
>comments? Hooray, and very cool. I didn't know you only need a locking mechanism if you have multiple index writers, so the use of NoLock by default makes perfect sense. A quick stability update: since I first submitted the patch ~2 months ago we've had 0 lockups with it running in all our test environments. - will
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch new patch produces the following output: Jul 11, 2007 1:35:19 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ params=indent=on&start=0&q=solr&version=2.2&rows=10 hits=0 status=0 QTime=79 and adds a NamedList toLog as yonik suggested. > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [jira] Commented: (SOLR-215) Multiple Solr Cores
Most of the time I, and I imagine others, don't know the set of cores ahead of time. It seems somewhat wasteful to create a ton of solr server connections when a single one can handle things just as easily. I guess I don't see why this param should be any different from any others like output formats etc. As for POSTs, you can still have cgi arguments and access them via the same servlet request parameters while accessing the input stream. I'll leave the efficiency issues to people more familiar with the patch but if it has to be in the url then you force people using solrj and other similar apis to create a Map and manage them that way. - will -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Wednesday, July 11, 2007 1:20 PM To: solr-dev@lucene.apache.org Subject: Re: [jira] Commented: (SOLR-215) Multiple Solr Cores On 7/11/07, Will Johnson <[EMAIL PROTECTED]> wrote: > I think it would be nice to have the core name > specified as a CGI param instead of (or in addition to) a url path. > Otherwise, large sections of client code (such as solrj/solr#) will need > to be changed. Only if you want to talk to multiple cores over a single "connection", right? Hopefully existing client code will allow the specification of the URL, and one would use http://localhost:8983/solr/core1/ Still might be useful as a param *if* it can be done efficiently. I wonder about the case when the param comes in via POST though. -Yonik
RE: [jira] Commented: (SOLR-215) Multiple Solr Cores
>One question I had was about backward compatibility... is there a way to >register a null or default core that reverts to the original paths? Are >there any other backward compatible gotchas (not related to custom java >code)? I'm very excited about this patch as it would remove my current scheme of running shell scripts to hot deploy new solr webapps on the fly. Along with registering a default core so that all existing code/tests continue to work, I think it would be nice to have the core name specified as a CGI param instead of (or in addition to) a url path. Otherwise, large sections of client code (such as solrj/solr#) will need to be changed. For example: http://localhost:8983/solr/select?q=foo&core=core1 http://localhost:8983/solr/update?core=core1 - will
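The resolution order being lobbied for above can be sketched with a hypothetical stdlib-only helper (not code from the patch: the method name, the fallback rules, and the "DEFAULT" core are all illustrative assumptions): prefer an explicit core= parameter, fall back to the first path element, else use a default core.

```java
public class CoreNameSketch {
    // Hypothetical resolver: an explicit ?core= parameter wins, then the
    // first path element after the webapp root, then a default core name.
    public static String resolveCore(String pathInfo, String coreParam) {
        if (coreParam != null && !coreParam.isEmpty()) return coreParam;
        if (pathInfo != null) {
            String p = pathInfo.startsWith("/") ? pathInfo.substring(1) : pathInfo;
            int slash = p.indexOf('/');
            String head = slash < 0 ? p : p.substring(0, slash);
            // Treat known handler paths as "no core given" (assumed rule).
            if (!head.isEmpty() && !head.equals("select") && !head.equals("update")) {
                return head;
            }
        }
        return "DEFAULT";
    }

    public static void main(String[] args) {
        System.out.println(resolveCore("/select", "core1"));    // core1 (CGI param wins)
        System.out.println(resolveCore("/core1/select", null)); // core1 (path element)
        System.out.println(resolveCore("/select", null));       // DEFAULT
    }
}
```

This shows why the CGI-param form keeps solrj-style clients simple: one base URL, and the core becomes just another request parameter.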
[jira] Updated: (SOLR-280) slightly more efficient SolrDocument implementation
[ https://issues.apache.org/jira/browse/SOLR-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-280: -- Attachment: SOLR-280-SolrDocument2-API-Compatibility.patch >The API changes mostly affect solrj users. being one of those heavily affected users i created the attached patch to make us unaffected (or at least i went from a few hundred compile errors to 0). the following methods were added back and are mostly 1-5 line wrappers around the existing methods or underlying data structures:
setField(String, Object)
getFieldValue(String)
getFieldValues(String)
addField(String, Object)
getFieldNames()
- will
> slightly more efficient SolrDocument implementation > --- > > Key: SOLR-280 > URL: https://issues.apache.org/jira/browse/SOLR-280 > Project: Solr > Issue Type: Improvement >Reporter: Ryan McKinley >Assignee: Ryan McKinley >Priority: Minor > Attachments: SOLR-280-SolrDocument2-API-Compatibility.patch, > SOLR-280-SolrDocument2.patch, SOLR-280-SolrDocument2.patch > > > Following discussion in SOLR-272 > This implementation stores fields as a Map<String,Object> rather than a > Map<String,Collection<Object>>. The API changes slightly in that: > getFieldValue( name ) returns a Collection if there is more than one value > and an Object if there is only one. > getFirstValue( name ) returns a single value for the field. This is intended > to make things easier for client applications. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
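As a rough illustration of the map design the issue describes (a hypothetical stand-in, not the actual SolrDocument code: a field holds a single Object until a second value arrives, then a Collection), with method names mirroring the wrappers listed above:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Map<String,Object>-backed document where a field's
// value is an Object for one value and a Collection once there are several.
public class DocSketch {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    @SuppressWarnings("unchecked")
    public void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);                  // first value: store directly
        } else if (existing instanceof Collection) {
            ((Collection<Object>) existing).add(value); // already multi-valued
        } else {
            List<Object> vals = new ArrayList<>();    // promote to a Collection
            vals.add(existing);
            vals.add(value);
            fields.put(name, vals);
        }
    }

    public Object getFieldValue(String name) { return fields.get(name); }

    public Object getFirstValue(String name) {
        Object v = fields.get(name);
        if (v instanceof Collection) {
            Collection<?> c = (Collection<?>) v;
            return c.isEmpty() ? null : c.iterator().next();
        }
        return v;
    }

    public static void main(String[] args) {
        DocSketch doc = new DocSketch();
        doc.addField("id", "42");
        doc.addField("cat", "a");
        doc.addField("cat", "b");
        System.out.println(doc.getFieldValue("id"));  // 42
        System.out.println(doc.getFirstValue("cat")); // a
        System.out.println(((Collection<?>) doc.getFieldValue("cat")).size()); // 2
    }
}
```

The compatibility wrappers in the patch presumably reduce to exactly this kind of one-to-five-line delegation, which is why restoring them was cheap.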
[jira] Commented: (SOLR-278) LukeRequest/Response for handling show=schema
[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509160 ] Will Johnson commented on SOLR-278: --- I guess I was hoping for a superset of features in LukeResponse.FieldInfo which will be partially set by the schema and partially set by the luke-ish info. We could even merge the two if it made sense. In the end I need to get a list of "fields that solr currently knows about" which seems to be a grouping of both the schema and the index via dynamic fields. The current patch does this but I think there is a better approach somewhere out there. - will > LukeRequest/Response for handling show=schema > - > > Key: SOLR-278 > URL: https://issues.apache.org/jira/browse/SOLR-278 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LukeSchemaHandling.patch > > > the soon to be attached patch adds a method to LukeRequest to set the option > for showing schema from SOLR-266. the patch also modifies LukeResponse to > handle the schema info in the same manner as the fields from the 'normal' > luke response. i think it's worth talking about unifying the response format > so that they aren't different but that's a larger discussion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-278) LukeRequest/Response for handling show=schema
[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-278: -- Attachment: LukeSchemaHandling.patch > LukeRequest/Response for handling show=schema > - > > Key: SOLR-278 > URL: https://issues.apache.org/jira/browse/SOLR-278 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LukeSchemaHandling.patch > > > the soon to be attached patch adds a method to LukeRequest to set the option > for showing schema from SOLR-266. the patch also modifies LukeResponse to > handle the schema info in the same manner as the fields from the 'normal' > luke response. i think it's worth talking about unifying the response format > so that they aren't different but that's a larger discussion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-278) LukeRequest/Response for handling show=schema
LukeRequest/Response for handling show=schema - Key: SOLR-278 URL: https://issues.apache.org/jira/browse/SOLR-278 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Will Johnson Priority: Minor Fix For: 1.3 the soon to be attached patch adds a method to LukeRequest to set the option for showing schema from SOLR-266. the patch also modifies LukeResponse to handle the schema info in the same manner as the fields from the 'normal' luke response. i think it's worth talking about unifying the response format so that they aren't different but that's a larger discussion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch new patch that writes out request params as cgi instead of NL.toString() for pasting into a browser. i also figured out the HttpResponseHeader, however i'm not sure how people feel about having that info duplicated in the solr logs, the http headers/access logs and the actual solr response. in any case the following logic would go into SolrDispatchFilter (but is not in this patch): > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508163 ] Will Johnson commented on SOLR-267: --- A few responses rolled up: Yonik Seeley commented on SOLR-267: --- After having used this for a ~week now I kind of do too. I can work on a patch that switches that log component back unless someone else (who wants it more) beats me to it. "hits". Agreed, I'd love to have query pipelines and indexing pipelines for processing logic but that's a much bigger effort. At the moment it's only 1 extra line in each of the 'real' query handlers which doesn't seem too bad. Ian Holsman commented on SOLR-267: -- long? >you might need/want to put in some quotes around the query. It will look very long :) As long as there are no spaces, which the url encoding should handle, I think things are ok (this assumes we're going to switch back to cgi params) it >in) Not that I know how to do. Since the dispatch filter is a filter, not a servlet, it doesn't have access to an HttpServletResponse, only a ServletResponse, which means it can't set HttpHeaders. This was my original idea for how to solve this problem and it seems a bit more 'standard' anyways, but I hit a dead end without getting more hackish than usual. - will > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch new patch to promote responseHeader from a de facto standard to an api standard in SolrQueryResponse. this enables the SolrCore.execute() method to simply print out its contents containing any info people want logged. the format now is: Jun 22, 2007 10:37:25 AM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/select/ hits=0 status=0 QTime=0 params={indent=on,start=0,q=solr,version=2.2,rows=10} which is fully labeled but i think makes things much easier to read/parse as you can look for labels instead of positions which may or may not change. > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
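The point of the fully labeled format is that consumers can grep for labels instead of positions. A small sketch of parsing such a line (a hypothetical helper, not Solr code; it assumes values contain no spaces, as in the example above, so the brace-wrapped params token is captured whole):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineSketch {
    // Pulls label=value pairs out of a fully labeled log line; each value
    // runs to the next whitespace, so params={...} stays a single token.
    public static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = Pattern.compile("(\\w+)=(\\S+)").matcher(line);
        while (m.find()) out.put(m.group(1), m.group(2));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> f = parse(
            "webapp=/solr path=/select/ hits=0 status=0 QTime=0 "
            + "params={indent=on,start=0,q=solr,version=2.2,rows=10}");
        System.out.println(f.get("hits"));  // 0
        System.out.println(f.get("QTime")); // 0
    }
}
```

If a label is ever dropped or reordered in a future release, this style of parser keeps working, which is the robustness argument made in the comment.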
[jira] Commented: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507312 ] Will Johnson commented on SOLR-267: --- One thing that comes to mind is making the response header a standard part of the SolrQueryResponse object with get/set/add methods; then the log message can just print out whatever is in the response header. With trunk, it already contains much of the same data (status, qtime, params). The only issue is that in order to keep things 'clean' the output would change to being fully labeled: webapp=/solr path=/select/ status=0 QTime=63 params={indent=on,start=0,q=*:*,version=2.2,rows=10} myotherheader=foo In general I think this makes things much cleaner and easier to read but it does break backwards compatibility for log parsing purposes. Any other ideas? - will > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506983 ] Will Johnson commented on SOLR-240: --- >> longer than without? No, after I applied the patch I have never seen a lockup. >> (the oldest Solr collections have been running in CNET for 2 years now, and I've never seen this happen) What I *have* seen is that exact exception when the server died, restarted, and then couldn't grab the write lock normally: a too-small heap caused excessive GC, leading resin's wrapper to restart the container. Another reason to use native locking. From the lucene native fs lock javadocs: "Furthermore, if the JVM crashes, the OS will free any held locks, whereas SimpleFSLockFactory will keep the locks held, requiring manual removal before re-running Lucene." My hunch (and that's all it is) is that people seeing/not seeing the issue may come down to usage patterns. My project is heavily focused on low indexing latency so we're doing huge numbers of adds/deletes/commits/searches in very fast succession and from multiple clients. A more batch oriented update usage pattern may not see the issue. The patch, as is, doesn't change any API or cause any change of existing functionality whatsoever unless you use the new option in solrconfig. I would argue that using native locking should be the default though. - will > java.io.IOException: Lock obtain timed out: SimpleFSLock > > > Key: SOLR-240 > URL: https://issues.apache.org/jira/browse/SOLR-240 > Project: Solr > Issue Type: Bug > Components: update >Affects Versions: 1.2 > Environment: windows xp >Reporter: Will Johnson > Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, > ThrashIndex.java > > > when running the soon to be attached sample application against solr it will > eventually die. this same error has happened on both windows and rh4 linux. 
> the app is just submitting docs with an id in batches of 10, performing a > commit then repeating over and over again.
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch slight update to log the webapp name which is set in the SolrDispatchFilter. this lets you distinguish between multiple solr instances for tracking purposes. log output now looks like: Jun 21, 2007 10:31:05 AM org.apache.solr.core.SolrCore execute INFO: /solr /select/ indent=on&start=0&q=*:*&version=2.2&rows=10 hits=5 0 62 > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch, > LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query
RE: [jira] Updated: (SOLR-267) log handler + query + hits
>This produces log messages that look like this: >INFO: /select q=solr&wt=python&indent=on hits=1 0 94 > >If there was no DocSet, it would look like this: >INFO: /select q=solr&wt=python&indent=on - 0 94 I would think that tacking the new stats onto the end of the line would be better than in the middle. Usually when I parse log files it involves something like: String[] arr = line.split(" "); code = arr[3]; time = arr[4]; instead of the following, which is what it seems you're implying people are doing: String[] arr = line.split(" "); code = arr[arr.length-2]; time = arr[arr.length-1]; but then again, I don't have any code written to parse things yet so backwards compatibility isn't an issue for me and either format is fine.
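The fully labeled key=value format discussed in this thread would make position-based splitting unnecessary: a parser can look fields up by name. A minimal sketch of that idea (not Solr code; the class name and log line are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: parse the labeled "key=value" log format (webapp=/solr path=/select/ ...)
// into a map, so fields are found by label rather than by position.
public class LabeledLogParser {
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                // split each token on the first '='; tokens without one are skipped
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "webapp=/solr path=/select/ hits=0 status=0 QTime=0";
        Map<String, String> f = parse(line);
        System.out.println(f.get("hits") + " " + f.get("QTime")); // prints "0 0"
    }
}
```

With this approach, adding a new label (like myotherheader=foo) or reordering fields doesn't break existing parsers, which is the argument made above for labeled output.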
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch updated patch to work in SolrCore.execute() instead. i annotated the msg with hits=## as requested, however i left time unlabeled for backwards compatibility, and i had no idea what the static '0' was but i left it there just to be safe as well. i think it might be good to clean that up and i'm happy to, but i don't know who or how those numbers are being used today. > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search > Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch, LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query
[jira] Updated: (SOLR-267) log handler + query + hits
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Description: adds a logger to log handler, query string and hit counts for each query was: adds a logger Summary: log handler + query + hits (was: log handler + query + ) > log handler + query + hits > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch > > > adds a logger to log handler, query string and hit counts for each query
[jira] Updated: (SOLR-267) log handler + query +
[ https://issues.apache.org/jira/browse/SOLR-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-267: -- Attachment: LogQueryHitCounts.patch hit a random key a little fast on the last post. the attached patch adds a logger to the Standard and DisMax request handlers to log the handler name, query string and hit count for each query. > log handler + query + > -- > > Key: SOLR-267 > URL: https://issues.apache.org/jira/browse/SOLR-267 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: LogQueryHitCounts.patch > > > adds a logger
[jira] Created: (SOLR-267) log handler + query +
log handler + query + -- Key: SOLR-267 URL: https://issues.apache.org/jira/browse/SOLR-267 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Will Johnson Priority: Minor Fix For: 1.3 adds a logger
RE: requestsPerSecond, averageResponseTime
The one thing most people (ie product managers) want to see is the number of times that users get 0 hits for a query, but that doesn't seem to be logged anywhere in solr that's easily accessible in log files. Am I missing something very obvious or should we try and fix this somehow? I know some other engines will log the number of hits in with the query log which seems like a nice way of doing things. Any ideas or pointers? - will -Original Message- From: Clay Webster [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 20, 2007 10:33 AM To: solr-dev@lucene.apache.org Subject: Re: requestsPerSecond, averageResponseTime Hey Ian, the version with all the parameter options only shows the table headers... no data. (No requests?) PS: I think there's interest. ;-) --cw On 6/19/07, Ian Holsman <[EMAIL PROTECTED]> wrote: > > I've been working on a tool to parse log files to get some of this kind > of information as well > > it's really alpha, but if you're curious the dummy system is here: > > http://pyro.holsman.net:9081/top/ -- slightly obfuscated queries (to > roll them up) > http://pyro.holsman.net:9081/overall/?period=5m&hours=12 -- # of > requests, response time, and deviation in that > > http://pyro.holsman.net:9081/overall/?period=5m&hours=12&format=csv&cols=1,2,5,6,7,8 > - same thing as a CSV file and showing selected columns > > > The aim is to use this as a data source for something like cacti and > sticking a flash graph on top of it. > > If there is enough interest I can contribute this to solr > > Yonik Seeley wrote: > > requestsPerSecond and averageResponseTime were added to statistics for > > each response handler. Are these statistics really useful enough to > > keep as-is? > > > > averageResponseTime is cumulative since the server started, so it's > > not useful for monitoring purposes, but only benchmarking purposes (it > > won't tell you if your queries are getting slower all of a sudden). 
> > (it will also count slower warming queries, not just live queries). > > > > requestsPerSecond is likewise flawed... it won't let you detect a > > flood of traffic or a dropoff. Also, if you turned off traffic to the > > server yesterday, that will continue to be reflected in the > > requestsPerSecond today. > > > > Since it seems like these parameters are only useful for benchmarking > > (which can easily be done from log files), perhaps we should defer > > adding them until we can come up with versions that are useful for > > monitoring? > > > > -Yonik > > > >
[jira] Updated: (SOLR-176) Add detailed timing data to query response output
[ https://issues.apache.org/jira/browse/SOLR-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-176: -- Attachment: RequesthandlerBase.patch a slightly more ambitious patch that tracks: * total number of requests/errors * requests/errors in the current interval (interval defined in solrconfig) * requests/errors as of the start of the last interval * avg request times for total / current interval > Add detailed timing data to query response output > - > > Key: SOLR-176 > URL: https://issues.apache.org/jira/browse/SOLR-176 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.2 >Reporter: Mike Klaas >Assignee: Mike Klaas >Priority: Minor > Fix For: 1.3 > > Attachments: dtiming.patch, dtiming.patch, dtiming.patch, > dtiming.patch, RequesthandlerBase.patch, RequesthandlerBase.patch > > > see > http://www.nabble.com/%27accumulate%27-copyField-for-faceting-tf3329986.html
RE: requestsPerSecond, averageResponseTime
>Has anyone tried to get solr statistics with cacti/nagios? If it isn't >too difficult, I would like to set this up. >Can cacti read & parse a file? Generally speaking nagios/cacti are as powerful as you are with bash. We haven't done it yet but it's a requirement at my company to integrate the three in the next couple of months. Our current plan is to get as many stats as possible into the statistics page and then have a shell script grab the xml (possibly with an xsl) and feed that into the monitoring apps. From there it's ringing pagers and usage graphs galore. When we get something working I'll make sure to post a write up on the list unless someone else beats me to it. - will
RE: requestsPerSecond, averageResponseTime
Would it be better to have an option to record traffic for the last 'x minutes/seconds/hours' configurable on a per handler basis? The goal is to have hooks for nagios/cacti/etc to be able to pull live status info for monitoring purposes. If you want fine grained performance history then log files are the best approach, I just think a way to have beepers go off if a server starts getting huge amounts of traffic is a good thing. For the record nagios and/or cacti could both keep track of 'in the last x' type of statistics based on totals but having solr compute that automatically would be nice. - will -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, June 19, 2007 10:27 AM To: solr-dev@lucene.apache.org Subject: requestsPerSecond, averageResponseTime requestsPerSecond and averageResponseTime were added to statistics for each response handler. Are these statistics really useful enough to keep as-is? averageResponseTime is cumulative since the server started, so it's not useful for monitoring purposes, but only benchmarking purposes (it won't tell you if your queries are getting slower all of a sudden). (it will also count slower warming queries, not just live queries). requestsPerSecond is likewise flawed... it won't let you detect a flood of traffic or a dropoff. Also, if you turned off traffic to the server yesterday, that will continue to be reflected in the requestsPerSecond today. Since it seems like these parameters are only useful for benchmarking (which can easily be done from log files), perhaps we should defer adding them until we can come up with versions that are useful for monitoring? -Yonik
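The "traffic in the last x minutes/seconds/hours" counters discussed above could be tracked roughly as follows. This is a hedged sketch, not the actual RequesthandlerBase.patch; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: per-handler stats that keep a running total plus counts for the
// current and last completed interval, so a monitor can spot spikes/dropoffs.
public class IntervalStats {
    private final long intervalMillis;   // interval length, e.g. from solrconfig
    private long intervalStart;
    private final AtomicLong total = new AtomicLong();
    private long current;   // requests in the interval in progress
    private long previous;  // requests in the last completed interval

    public IntervalStats(long intervalMillis) {
        this.intervalMillis = intervalMillis;
        this.intervalStart = System.currentTimeMillis();
    }

    public synchronized void recordRequest() {
        rollIfNeeded();
        total.incrementAndGet();
        current++;
    }

    public synchronized long requestsInLastInterval() {
        rollIfNeeded();
        return previous;
    }

    public long totalRequests() { return total.get(); }

    // when the interval has elapsed, the in-progress count becomes "previous"
    private void rollIfNeeded() {
        long now = System.currentTimeMillis();
        if (now - intervalStart >= intervalMillis) {
            previous = current;
            current = 0;
            intervalStart = now;
        }
    }
}
```

Unlike a cumulative requestsPerSecond, requestsInLastInterval() goes back to zero when traffic stops, which is what makes it usable for the beeper-style alerting described above.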
RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system
>i haven't read anything in the jira issue this references, but in instances >where reliability and uptime are of high concern, you'll typically have a >master/multi-slave setup with the slaves sitting behind a load balancer >-- in that configuration, you can deploy any change to your schema by: >this process results in 0 downtime for any schema.xml change, regardless of >whether the changes require rebuilding your index. True, but that implies indexing downtime which is also bad. Also, master/slave setups kill indexing latency, which is my primary concern and the reason I went with solr to begin with. also, while your suggested steps work, they're a bit heavy on the operations side compared to a client's ability to add a field by hitting a url. >if you change/add a copyField declaration, you'll need to reindex ... >copyField is evaluated when a document is being indexed. True, but not if you haven't fed any data into that copy field yet. ie 'from now on' I want all data from field x copied into field y. - will
RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system
>I haven't looked at the patch, but have a couple questions: >* What is the motivation/use case for editing the schema at runtime? (I'm not >suggesting there aren't good ones, just curious) to add new fields on the fly without having any search downtime >* Would changes be saved? the patch as is just re-reads the schema from the location it's originally set from. all changes are 'saved' > * Why not dynamic fields? because the field names start to get too complex. for example you could model the id field in the default schema as a dynamic field: becomes: *_str_it_st_rt_mvf working that out for all possible combinations seems a bit onerous. the default dynamic fields cover most cases but i'm sure my product managers will want one that i don't have the day after we go live. also, if i have extra info about a field, like the fact that i don't want it stored, i should be able to take advantage of that without having to bounce anything. > it seems to me that restarting a webapp and suffering > downtime is a heavy price to pay just to add a new field or even to just > change an existing field property. >*adding* fields should be relatively straightforward -- the more I learn about >lucene indexing, it seems like most schema *changes* require the >index to be rebuilt anyway. correct, i'm fine if we want to restrict the schema 'changes' to only allow the addition of new fields, but the index schema also reflects things like default query parsing options and copy fields which shouldn't require any index changes at all, which is why i went for a more loose approach to start. - will
[jira] Commented: (SOLR-265) Make IndexSchema updateable in live system
[ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505938 ] Will Johnson commented on SOLR-265: --- After doing some more thinking about the issue after I submitted the patch, I agree that there probably are some threading issues to work out. I was working on another approach that was much larger (only keep 1 copy in SolrCore accessible by getSchema() and add protection there) but that required a much larger code change than the one posted, so I went with the shorter one to at least promote discussion. If the single schema getter() makes sense, I'll be happy to provide such a patch. There do seem to be other alternatives though: First is a ModifySchema handler that could support adding fields etc., which should be easier to defend from a synchronization standpoint. At least there are fewer times when fields.clear() has been called but new values have not been added back. As this is all I care about at the moment I'd be happy, but I assume someone might want to do something else more complex. The second is to wrap up the clear/repopulate methods with some basic protection but actually allow different schemas inside a single request. This could be done by requiring all new schemas to be 'compatible' in some defined way. Since there doesn't seem to be any validation that goes on if I stop the app, change the schema and then restart it, compatible might just mean valid xml. If field 'new_x' suddenly appears in the middle of my post it shouldn't have any effect as my posted data won't contain 'new_x'. From a client's contractual perspective, if you want new fields processed correctly you have to wait for updateSchema to finish. In any case, it seems to me that restarting a webapp and suffering downtime is a heavy price to pay just to add a new field or even to just change an existing field property. 
- will > Make IndexSchema updateable in live system > -- > > Key: SOLR-265 > URL: https://issues.apache.org/jira/browse/SOLR-265 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: IndexSchemaReload.patch > > > I've seen a few items on the mailing lists recently surrounding updating a > schema on the file system and not seeing the changes get propagated. while I > think that automatically detecting schema changes on disk may be unrealistic, > I do think it would be useful to be able to update the schema without having > to bounce the webapp. the forthcoming patch adds a method to SolrCore to do > just that, as well as a request handler to be able to call said method. > The patch as it exists is a straw man for discussion. The one thing that > concerned me was making IndexSchema schema non-final in SolrCore. I'm not > quite sure why it needs to be final to begin with so perhaps someone can shed > some light on the situation. Also, I think it would be useful to be able to > upload a schema through the admin GUI, have it persisted to disk and then > call reloadSchema(), but that seemed like a good bit of effort for a patch that > might not even be a good idea. > I'd also point out that this specific problem is one I've been trying to > address recently and while I think it could be solved with various dynamic > fields, the combination of all the options for fields seemed to create too > many variables to make useful dynamic name patterns. > Thoughts? > - will
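The synchronization concern raised above (a window where fields.clear() has run but the new values haven't been added back) is the usual argument for the "single getter" alternative: build a complete new schema object and swap one reference atomically, so no request ever sees a half-populated schema. A rough sketch of that idea, with HypotheticalSchema standing in for IndexSchema (names are illustrative, not Solr's API):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: reload by building a fresh, immutable schema and swapping the
// reference, instead of clearing and repopulating a live one.
public class SchemaHolder {
    public static class HypotheticalSchema {
        private final Map<String, String> fields; // field name -> field type

        public HypotheticalSchema(Map<String, String> fields) {
            this.fields = Map.copyOf(fields); // immutable snapshot
        }

        public String fieldType(String name) { return fields.get(name); }
    }

    private final AtomicReference<HypotheticalSchema> schema = new AtomicReference<>();

    // In-flight requests keep whatever schema they already fetched; only
    // requests starting after the swap see the new one. There is never a
    // moment where the visible schema is empty or half-built.
    public void reload(Map<String, String> newFields) {
        schema.set(new HypotheticalSchema(newFields));
    }

    public HypotheticalSchema getSchema() { return schema.get(); }
}
```

This matches the "clients that want new fields processed correctly wait for updateSchema to finish" contract described in the comment: the swap is the single point at which the new fields become visible.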
[jira] Updated: (SOLR-265) Make IndexSchema updateable in live system
[ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-265: -- Attachment: IndexSchemaReload.patch updates to: * solrconfig.xml to register the handler * IndexSchema to add a reload() method that clears all internal data structures and calls readconfig() * a new o.a.s.handler.admin.IndexSchemaRequestHandler to trigger the updating > Make IndexSchema updateable in live system > -- > > Key: SOLR-265 > URL: https://issues.apache.org/jira/browse/SOLR-265 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.3 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.3 > > Attachments: IndexSchemaReload.patch > > > I've seen a few items on the mailing lists recently surrounding updating a > schema on the file system and not seeing the changes get propagated. while I > think that automatically detecting schema changes on disk may be unrealistic, > I do think it would be useful to be able to update the schema without having > to bounce the webapp. the forthcoming patch adds a method to SolrCore to do > just that, as well as a request handler to be able to call said method. > The patch as it exists is a straw man for discussion. The one thing that > concerned me was making IndexSchema schema non-final in SolrCore. I'm not > quite sure why it needs to be final to begin with so perhaps someone can shed > some light on the situation. Also, I think it would be useful to be able to > upload a schema through the admin GUI, have it persisted to disk and then > call reloadSchema(), but that seemed like a good bit of effort for a patch that > might not even be a good idea. > I'd also point out that this specific problem is one I've been trying to > address recently and while I think it could be solved with various dynamic > fields, the combination of all the options for fields seemed to create too > many variables to make useful dynamic name patterns. > Thoughts? 
> - will
[jira] Created: (SOLR-265) Make IndexSchema updateable in live system
Make IndexSchema updateable in live system -- Key: SOLR-265 URL: https://issues.apache.org/jira/browse/SOLR-265 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Will Johnson Priority: Minor Fix For: 1.3 I've seen a few items on the mailing lists recently surrounding updating a schema on the file system and not seeing the changes get propagated. while I think that automatically detecting schema changes on disk may be unrealistic, I do think it would be useful to be able to update the schema without having to bounce the webapp. the forthcoming patch adds a method to SolrCore to do just that, as well as a request handler to be able to call said method. The patch as it exists is a straw man for discussion. The one thing that concerned me was making IndexSchema schema non-final in SolrCore. I'm not quite sure why it needs to be final to begin with so perhaps someone can shed some light on the situation. Also, I think it would be useful to be able to upload a schema through the admin GUI, have it persisted to disk and then call reloadSchema(), but that seemed like a good bit of effort for a patch that might not even be a good idea. I'd also point out that this specific problem is one I've been trying to address recently and while I think it could be solved with various dynamic fields, the combination of all the options for fields seemed to create too many variables to make useful dynamic name patterns. Thoughts? - will
[jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505846 ] Will Johnson commented on SOLR-239: --- after looking at all the dependencies for IndexSchema and with the addition of the new solrj stuff in trunk, i no longer think this approach is the correct way to go about things. the LukeRequest/LukeResponse seems to give most of the same info with ~0 overhead and it's already checked in. > Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change; however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities, which is useful for > building generic search UI's, ie providing a drop down list of fields to > search/sort by.
RE: search components (plugins)
Sorry, I forgot to turn on my _wild_ideas_ flag before that last post. That being said, you could build the notion of dependencies into each stage and have the search logic computed based on those dependencies; alternatively you could do pre/post methods for each processing stage that allow each stage hands-on access to the searcher... crap, looks like ryan beat me by 3 minutes. oh well, what he said. - will -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, June 11, 2007 2:27 PM To: solr-dev@lucene.apache.org Subject: Re: search components (plugins) : // choose one query method : docs = Query( req, debug ) :- standard :- dismax :- mlt (as input) there are two small hitches to an approach like this. the first is that you'd like to reuse more of the query processing than to just say "go pick the list of docs based on the request" ...ideally we'd want things like "fq" parsing/processing to be refactored so it can be reused by standard, dismax and mlt alike, but that requires changing the API to be something like... Query, Filter = MakeQuery(req, debug) ...and you delegate to the outermost "controller" to deal with the actual conversion to generate the "docs" ... except you also have to worry about start, rows, sort etc at that level, which makes it a lot less clean. the second hitch is that "docs" only makes sense in pseudo code ... in reality there are DocSets and DocLists, and the efficiencies of getting only one instead of both can be significant, but if the first phase of processing doesn't know what expectations the later phases have (facet or not? return a DocList in the response or not?) it may have to assume you need both. : // zero or more... : info[] = Info( req, docs, debug ) :+ facet :+ mlt (on each result) this for the record is what i was kind of aiming for back when i made the SimpleFacets class ... 
give it the docset and some SolrParams, and then ask it for what you want (either some specific piece of functionality like getFacetFieldCounts, or all possible types of facets even if you don't know what they are, with getFacetCounts()). -Hoss
[jira] Updated: (SOLR-259) More descriptive text on improperly set solr/home
[ https://issues.apache.org/jira/browse/SOLR-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-259: -- Attachment: betterSolrHomeError.patch +import java.util.logging.Level and a simple log.log(Level.SEVERE, "Could not start SOLR. Check solr/home property", t); > More descriptive text on improperly set solr/home > - > > Key: SOLR-259 > URL: https://issues.apache.org/jira/browse/SOLR-259 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: betterSolrHomeError.patch > > > when solr/home is set improperly, tomcat (and other containers) fail to log > any useful error messages because everything goes to SolrConfig.severeErrors > instead of some basic container level logs. the soon to be attached 1.5 line > patch adds a simple log message to the standard container logs to tell you to > check your settings and tell you what solr/home is currently set to. > Before the patch if solr/home is improperly set you get: > Jun 11, 2007 2:21:13 PM org.apache.solr.servlet.SolrDispatchFilter init > INFO: SolrDispatchFilter.init() > Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config getInstanceDir > INFO: Using JNDI solr.home: > C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr > Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config setInstanceDir > INFO: Solr home set to > 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/' > Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start > SEVERE: Error filterStart > Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start > SEVERE: Context [/solr] startup failed due to previous errors > After the patch you get: > un 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init > INFO: SolrDispatchFilter.init() > Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config getInstanceDir > INFO: Using JNDI solr.home: > 
C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr > Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config setInstanceDir > INFO: Solr home set to > 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/' > Jun 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init > SEVERE: Could not start SOLR. Check solr/home property > java.lang.ExceptionInInitializerError > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:66) > at > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) > at > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) > at > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) > at > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3693) > at > org.apache.catalina.core.StandardContext.start(StandardContext.java:4340) > at > org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) > at > org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) > at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) > at > org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626) > at > org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) > at > org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) > at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206) > at > org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293) > at > org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117) > at > org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337) > at > org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601) > at > org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610) > at 
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.lang.RuntimeException: Error in solrconfig.xml > at org.apache.solr.core.SolrConfig.(SolrConfig.java:90) > ... 20 more > Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' > in cl
[jira] Created: (SOLR-259) More descriptive text on improperly set solr/home
More descriptive text on improperly set solr/home
-------------------------------------------------
Key: SOLR-259
URL: https://issues.apache.org/jira/browse/SOLR-259
Project: Solr
Issue Type: Improvement
Affects Versions: 1.2
Reporter: Will Johnson
Priority: Minor
Fix For: 1.2

When solr/home is set improperly, tomcat (and other containers) fail to log any useful error messages because everything goes to SolrConfig.severeErrors instead of some basic container-level logs. The soon-to-be-attached 1.5-line patch adds a simple log message to the standard container logs to tell you to check your settings and to tell you what solr/home is currently set to.

Before the patch, if solr/home is improperly set you get:

Jun 11, 2007 2:21:13 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
Jun 11, 2007 2:21:13 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
Jun 11, 2007 2:21:13 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors

After the patch you get:

Jun 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config getInstanceDir
INFO: Using JNDI solr.home: C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr
Jun 11, 2007 2:30:37 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/'
Jun 11, 2007 2:30:37 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.ExceptionInInitializerError
	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:66)
	at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
	at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
	at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
	at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3693)
	at org.apache.catalina.core.StandardContext.start(StandardContext.java:4340)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
	at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
	at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
	at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
	at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
	at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
	at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
	at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
	at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Error in solrconfig.xml
	at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:90)
	... 20 more
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'C:\data\workspace\gciTrunk\infrastructure\gciSolr\build\solr/conf/', cwd=C:\data\apps\tomcat6.0.13\bin
	at org.apache.solr.core.Config.openResource(Config.java:357)
	at org.apache.solr.core.SolrConfig.initConfig(SolrConfig.java:79)
	at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:87)
	... 20 more
Jun 11, 2007 2:30:37 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
Jun 11, 2007 2:30:37 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr] startup failed due to previous errors

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
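The 1.5-line patch itself isn't attached to this message, but the shape of the fix is simple: catch the startup failure, write a hint to the container's own log, and rethrow. A minimal standalone sketch of that idea; the class and method names here are illustrative, not the actual SolrDispatchFilter code:

```java
import java.util.logging.Logger;

class StartupGuard {
    private static final Logger log = Logger.getLogger(StartupGuard.class.getName());

    /** Run core initialization; on failure, log a container-visible hint and rethrow. */
    static void init(Runnable coreInit) {
        try {
            coreInit.run();
        } catch (RuntimeException e) {
            // The gist of the patch: the likely cause shows up in the container's
            // standard log instead of only in SolrConfig.severeErrors.
            log.severe("Could not start SOLR. Check solr/home property");
            throw e;
        }
    }
}
```

The exception still propagates unchanged; the only addition is the SEVERE line that Tomcat prints alongside its own "Error filterStart".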
RE: [jira] Commented: (SOLR-236) Field collapsing
And one other point: one of the reasons why it's hard to find an example of post-faceting is that many of the major engines can't do it.

- will

-----Original Message-----
From: Will Johnson [mailto:[EMAIL PROTECTED]
Sent: Monday, June 11, 2007 11:05 AM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-236) Field collapsing

> I assumed they would... I think our signals might be crossed w.r.t. the meaning of pre or post collapsing. Faceting "post collapsing" I took to mean that the base docset would be restricted to the top "n" of each category.

In my view, faceting should occur on the full collapsed result set. I.e. break down 100 hits to 50 unique ones, then compute facets on those 50 even though you may only return 10 to the user.

> circuitcity does it how I would expect... field collapsing does not affect the facets on the left. For example, if I search for memory, a facet tells me that there are 70 under "Digital Cameras". If I look down the collapsed results, "Digital Cameras" only shows the top match, but has a link to "View all 70 matches".

I agree, Circuit City is a use case where you want pre-faceting. If you think about site collapsing though, I may see that there are 57 documents in my result set of type x; then clicking on type x should show me 57 docs.

> 15 documents displayed to the user, or 15 total documents that matched the query? If the latter, I don't see how you could get greater than 15 for any facet count.

If I see that there are 15 of type x and click on it, then 'total results found' on the next page should say 15, not any higher.

-Yonik
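The semantics being argued for here ("break down 100 hits to 50 unique ones, then compute facets on those 50") can be sketched with plain collections. This is only an illustration of post-collapse facet counting, not Solr's DocSet-based implementation:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PostCollapseFacets {
    /** Collapse docs on one field, then count facet values over the collapsed set. */
    static Map<String, Integer> facetAfterCollapse(
            List<Map<String, String>> docs, String collapseField, String facetField) {
        // Keep only the first doc per collapse key (e.g. one hit per site).
        Map<String, Map<String, String>> collapsed = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            collapsed.putIfAbsent(doc.get(collapseField), doc);
        }
        // Facet counts are computed on the collapsed docs, so clicking a facet
        // value yields exactly that many hits on the refinement page.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : collapsed.values()) {
            counts.merge(doc.get(facetField), 1, Integer::sum);
        }
        return counts;
    }
}
```

With post-collapse counts, a facet that says "15 of type x" leads to a page with exactly 15 hits, which is the consistency requirement described above.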
RE: [jira] Commented: (SOLR-236) Field collapsing
> I assumed they would... I think our signals might be crossed w.r.t. the meaning of pre or post collapsing. Faceting "post collapsing" I took to mean that the base docset would be restricted to the top "n" of each category.

In my view, faceting should occur on the full collapsed result set. I.e. break down 100 hits to 50 unique ones, then compute facets on those 50 even though you may only return 10 to the user.

> circuitcity does it how I would expect... field collapsing does not affect the facets on the left. For example, if I search for memory, a facet tells me that there are 70 under "Digital Cameras". If I look down the collapsed results, "Digital Cameras" only shows the top match, but has a link to "View all 70 matches".

I agree, Circuit City is a use case where you want pre-faceting. If you think about site collapsing though, I may see that there are 57 documents in my result set of type x; then clicking on type x should show me 57 docs.

> 15 documents displayed to the user, or 15 total documents that matched the query? If the latter, I don't see how you could get greater than 15 for any facet count.

If I see that there are 15 of type x and click on it, then 'total results found' on the next page should say 15, not any higher.

-Yonik
RE: [jira] Commented: (SOLR-236) Field collapsing
Having worked on a number of customer implementations regarding this feature, I can say that the number one requirement is for the facet counts to be accurate post-collapsing. It all comes down to the user experience. For example, if I run a query that gets collapsed and has a facet count for the non-collapsed value, then when I click on that facet for refinement the number of hits in my subsequent query will not match the number of hits displayed by that facet count. I.e. if it says there are 10 docs in my result set of type x, then when I click on type x I expect to get back 10 hits. Further, I could easily end up with a result set with 15 total hits but a facet count that says there are 200 results of type x, which is very disconcerting from a user perspective. I agree that there are times when pre-faceting is also good, but post-faceting has always been a rather hard requirement for most ecommerce/data discovery sites.

- will

-----Original Message-----
From: Emmanuel Keller (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 10, 2007 7:33 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503162 ]

Emmanuel Keller commented on SOLR-236:
--

Do we have to make a choice? Both behaviors are interesting. What about a new parameter like collapse.facet=[pre|post]?

> Field collapsing > > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.2 >Reporter: Emmanuel Keller > Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch > > > This patch include a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set.
Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection." > http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation add 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version (1.2) > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: search components (plugins)
Some thoughts:

One of the most powerful and useful concepts that many of the other engines (well, the good ones) use is the notion of processing pipelines. For queries this means a series of stages that do things such as:

* faceting
* collapsing
* applying default values
* spell checking
* adding in promotions/boosted content
* applying relevancy logic
* more like this

But it is also heavily used at indexing time. The more complex engines use these pipelines for all kinds of crazy stuff like converting MS Office docs, OCR, speech to text, etc., which I think is what Nutch does to some extent. However solr could still use the same notion to do lower-level operations like:

* applying synonyms
* removing/renaming fields
* translating xml formats (it would be nice to have any update handler be able to apply an xslt on incoming data)
* validating incoming data against some business logic

I think much of this is wrapped up in the field definitions at the moment, but it could be extended to be more document-aware. Anything that makes chaining of pre-built processing easier would be nice. In addition, if these stages are specified in solrconfig, then decisions like 'do I want faceting before or after collapsing' become simple cut/paste choices, not code changes. Further, if the last processing step is 'index this doc' or 'search the index', those should be easy to replace with 'send this doc to segment x' or 'search all the sub-indexes' with simple xml config file changes, assuming those stages exist (which again is how many of the other engines do things).

- will

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Sunday, June 10, 2007 12:51 PM
To: solr-dev@lucene.apache.org
Subject: search components (plugins)

Some people have needed some custom query logic, and they had to implement their own request handlers.
They still wanted all of the other functionality (or almost all), so they are forced to copy the standard request handler or dismax, or both. That's not the easiest to maintain, and could be more elegant. Another layer of plugins sounded like overkill at first, but I'm starting to rethink it, esp in the face of the expanding number of different variations: - standard - dismax - more-like-this - field collapsing Seems like we should be able to more easily mix and match, or add new pieces, w/o having whole new request handlers. Looking toward the future, and distributed search, this might be a natural place to add hooks to implement that distributed logic. This would allow other people to efficiently support their custom functionality in a distributed environment. Thoughts? -Yonik
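The pipeline idea above (an ordered list of stages, where reordering is a config edit rather than a code change) can be sketched with a hypothetical interface; none of these names exist in Solr:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class Pipeline {
    // A stage is just a transform over the request/response state
    // (represented here as a plain String for brevity).
    private final List<UnaryOperator<String>> stages = new ArrayList<>();

    Pipeline add(UnaryOperator<String> stage) {
        stages.add(stage);
        return this;
    }

    String run(String input) {
        String current = input;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }
}
```

Whether faceting runs before or after collapsing is then just the order of two add(...) calls, which is the cut/paste property described above when the stage list lives in solrconfig.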
RE: [jira] Commented: (SOLR-20) A simple Java client for updating and searching
Has anyone thought of adding the docsum time to the qtime, or possibly adding separate timing information for the real 'solr query time'? While my bosses are very pleased that most searches seem to take ~5ms, it does seem a bit misleading. I'll take a crack at a patch unless there is a reason not to.

- will

-----Original Message-----
From: Ryan McKinley (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Friday, June 08, 2007 1:09 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-20) A simple Java client for updating and searching

[ https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502885 ]

Ryan McKinley commented on SOLR-20:
---

I don't know if you are on solr-dev; Yonik noted that the QTime does not include the time to write the response, only the query time. To get an accurate number for how long the whole query takes, check your app server logs: http://www.nabble.com/Re%3A-A-simple-Java-client-for-updating-and-searching-tf3890950.html

To get a quick response from solr, try rows=0 or a 404 path. (Of course, the speed will depend on your network connection speed between client and server.)

> A simple Java client for updating and searching > --- > > Key: SOLR-20 > URL: https://issues.apache.org/jira/browse/SOLR-20 > Project: Solr > Issue Type: New Feature > Components: clients - java > Environment: all >Reporter: Darren Erik Vengroff >Priority: Minor > Attachments: DocumentManagerClient.java, DocumentManagerClient.java, solr-client-java-2.zip.zip, solr-client-java.zip, solr-client-sources.jar, solr-client.zip, solr-client.zip, solr-client.zip, solrclient_addqueryfacet.zip, SolrClientException.java, SolrServerException.java > > > I wrote a simple little client class that can connect to a Solr server and issue add, delete, commit and optimize commands using Java methods. I'm posting here for review and comments as suggested by Yonik. -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
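Since QTime covers only query execution and not response writing, total latency has to be measured from the client side with plain wall-clock timing. A trivial sketch; the operation being timed below is a stand-in for an actual HTTP query, not SolrJ code:

```java
class WallClock {
    /** Time an arbitrary operation in milliseconds, the way a client sees it. */
    static long timeMillis(Runnable op) {
        long start = System.nanoTime();
        op.run();  // e.g. issue the HTTP request and fully read the response
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

The difference between this client-side number and the reported QTime is roughly the response-writing plus transport cost that the ~5ms figure hides.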
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-239: -- Attachment: IndexSchemaStream2.patch new patch that includes a GetFile servlet to possibly replace get-file.jsp due to the fact that it writes out invalid xml. > Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities which is useful for > building generic search UI's. ie provide a drop down list of fields to > search/sort by. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
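The client-side use case, parsing schema.xml from a stream and listing fields for a drop-down, can be sketched with the JDK's own XML parser. This assumes only the well-known schema.xml layout and is not the IndexSchema constructor added by the patch:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SchemaFields {
    /** Pull the name attribute of every <field> out of a schema.xml stream. */
    static List<String> fieldNames(InputStream schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(schemaXml);
        NodeList fields = doc.getElementsByTagName("field");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```

This only works if the server hands back well-formed XML, which is exactly why the stray newlines from get-file.jsp matter.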
RE: [jira] Commented: (SOLR-236) Field collapsing
I haven't looked at any of the patches, but I can comment on some other uses for the feature that are in production today with major vendors. While it's used for site collapsing a la Google, it's also heavily used in ecommerce settings. Check out BestBuy.com/circuitcity/etc and do a search for some really generic word like 'cable' and notice all the groups of items; BB shows 3 per group, CC shows 1 per group. In each case it's not clear that the number of docs is really limited at all; i.e. it's more important to get back all the categories with n docs per category, and the counts per category, than it is to get back a fixed number of results or even categories for that matter. Also notice that neither of these sites allows you to page through the categorized results. I'd also point out that many vendors require the collapsing field to be an int instead of a string and then force the front end to do the mapping. Just one more thing to consider.

- will

-----Original Message-----
From: Yonik Seeley (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 05, 2007 9:01 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing

[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501550 ]

Yonik Seeley commented on SOLR-236:
---

I guess adjacent collapsing can make sense when one is sorting by the field that is being collapsed. For the normal collapsing though, this patch appears to implement it by changing the sort order to the collapsing field (normally not desired). For example, if sorting by relevance and collapsing on a field, one would normally want the groups sorted by relevance (with the group relevance defined as the max score of its members). As far as how to do paging, it makes sense to rigidly define it in terms of number of documents, regardless of how many documents are in each group.
Going back to google, it always displays the first 10 documents, but a variable number of groups. That does mean that a group could be split across pages. It would actually be much simpler (IMO) to always return a fixed number of groups rather than a fixed number of documents, but I don't think this would be less useful to people. Thoughts? > Field collapsing > > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.2 >Reporter: Emmanuel Keller > Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch > > > This patch include a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection." > http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation add 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version (1.2) > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
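Yonik's preferred behavior, groups sorted by relevance with group relevance defined as the max score of its members, can be sketched like this (illustrative only; the actual patch operates on Lucene doc ids and scores, not string pairs):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelevanceCollapse {
    /** Each hit is (collapse key, score); returns group keys ordered by best member score. */
    static List<String> collapseByMaxScore(List<Map.Entry<String, Double>> hits) {
        Map<String, Double> best = new HashMap<>();
        for (Map.Entry<String, Double> hit : hits) {
            best.merge(hit.getKey(), hit.getValue(), Math::max);
        }
        List<String> keys = new ArrayList<>(best.keySet());
        // Groups sorted by relevance (max member score), not by the collapse field.
        keys.sort(Comparator.comparingDouble((String k) -> best.get(k)).reversed());
        return keys;
    }
}
```

Sorting on the collapse field itself, as the patch apparently does, would instead order groups alphabetically by key regardless of score.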
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

New patch that addresses all 6 suggestions. The one thing that is interesting is that using http://localhost:8983/solr/admin/get-file.jsp?file=schema.xml does not work, as it prints out a number of newlines before the XML declaration, which causes it to be invalid. I'm not quite sure how to fix this without rewriting get-file.jsp as a servlet and making sure it only prints out the xml. In any case it does work against URLs that only contain valid xml; however, I wasn't sure how we go about testing things that require the example to be running. (The test is therefore commented out.)

As for motivations, yes it does require a good bit of overhead, and I think it would be good to have a 'lighter' IndexSchema implementation for client APIs. I do think however that it's nice to know exactly what is running and to be able to inspect each field's capabilities, so I'm not sure what the right thing to do is.

- will

> Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch, IndexSchemaStream2.patch, IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities which is useful for > building generic search UI's.
ie provide a drop down list of fields to > search/sort by. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500704 ]

Will Johnson commented on SOLR-239:
---

After seeing that I'd need to regenerate a patch for the new IndexSchema's SolrException handling, I got to thinking about ways to preserve the getInputStream() functionality. Tracing things down a bit, it seems to all fall to Config.openResource(fileName). I was wondering if it might not be better to extend that code to handle URLs as well as file names by looking for http:// at the beginning of the resourceName. This might open up other avenues for centralized configuration of all of solr in the future, but it does at least solve this problem and maintain more backwards compatibility with the existing API. Thoughts?

> Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch, IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities which is useful for > building generic search UI's. ie provide a drop down list of fields to > search/sort by.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
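Will's proposal, dispatching inside Config.openResource on an http:// prefix, might look roughly like the following. The method name matches the mail, but the body is a guess at the idea, not the actual Config code:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

class Resources {
    /** Open a resource by name, treating http(s) names as URLs, everything else as files. */
    static InputStream openResource(String resourceName) throws IOException {
        if (resourceName.startsWith("http://") || resourceName.startsWith("https://")) {
            // Centralized configuration: schema.xml could live on a config server.
            return new URL(resourceName).openStream();
        }
        return new FileInputStream(resourceName);
    }
}
```

Existing callers passing file names keep working unchanged, which is the backwards-compatibility point being made.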
[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499843 ]

Will Johnson commented on SOLR-240:
---

I get the stacktrace below with the latest from head with useNativeLocks turned off (from my patch). This took about 2 minutes to reproduce on my windows laptop. One thing I thought of is that local antivirus scanning / backup software, which we run here, may be getting in the way. I know many other search engines / high-performance databases out there have issues with antivirus software because it causes similar locking issues. I'm disabling as much of the IT 'malware' as possible and seeing better results even with default locking; however, I had everything running when I had good results with the native locking enabled, so it still seems to be a good idea to use the patch (or something similar).

- will

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: [EMAIL PROTECTED] b822c61c394dd5f449aaf5e5717356-write.lock
	at org.apache.lucene.store.Lock.obtain(Lock.java:70)
	at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:391)
	at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81)
	at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
	at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
	at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:79)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:198)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:166)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

> java.io.IOException: Lock obtain timed out: SimpleFSLock > > > Key: SOLR-240 > URL: https://issues.apache.org/jira/browse/SOLR-240 > Project: Solr > Issue Type: Bug > Components: update >Affects Versions: 1.2 > Environment: windows xp >Reporter: Will Johnson > Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, > ThrashIndex.java > > > when running the soon to be attached sample application against solr it will > eventually die. this same error has happened on both windows and rh4 linux. > the app is just submitting docs with an id in batches of 10, performing a > commit then repeating over and over again.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
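For context on the 'native locks' the patch enables: Lucene's NativeFSLockFactory is built on java.nio file locks, which are taken at the OS level (and are therefore also what antivirus/backup tools can contend with). A standalone illustration of acquiring such a lock, unrelated to the actual Solr/Lucene code:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

class NativeLockDemo {
    /** Try to take an OS-level lock on a file; return whether it was acquired. */
    static boolean tryLock(File lockFile) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock();  // null if another process holds it
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }
}
```

Because the OS arbitrates the lock, a crashed process releases it automatically, unlike SimpleFSLock's marker file, which can be left behind and cause exactly these timeouts.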
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

Updated with a fixed and tested raw-schema.jsp, and added back the IndexSchema testDynamicCopy() test.

> Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch, IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities which is useful for > building generic search UI's. ie provide a drop down list of fields to > search/sort by.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file
I'll have another go at the patch tomorrow morning: testing the raw-schema.jsp (even if it's not used) and putting back the test.

- will

From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thu 5/24/2007 6:02 PM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

: 2) why did you remove testDynamicCopy() from IndexSchemaTest ?
:
: because it had nothing to do with testing the index schema. as far as i could tell it was a ctrl-c / ctrl-v error. that or i'm really blind and happy to put it back.

I don't see a test with that name defined anywhere. It's testing that you can declare dynamic fields and copy them using copyField ... that sounds like an IndexSchemaTest to me (lots of other schema-related tests may be in BasicFunctionalityTest or ConvertedLegacyTest, but we should try to use the class-specific test classes when the test is very narrow).

: 3) raw-schema.jsp on the trunk appears to be completely broken (multiple <%@ page contentType="..."%> declarations), and not linked to from the
:
: my patch worked but i also saw that it wasn't linked anywhere.

I thought your patch left the multiple contentType declarations, but I don't remember for certain now ... it's a trivial issue either way.

-Hoss
RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file
1) there is a public API change here by removing the getInputStream() method from IndexSchema. probably not a big deal but important that we consider it.

True, that call wasn't used anywhere else in the solr trunk code. Also, after a lot of thought I realized that it's in general a poor idea to rely on getting an input stream in any reliable fashion other than when it's first opened (many don't support reset). I can put it back easily if people are that worried about breaking compatibility, but in general it seems like it's asking for trouble without knowing the implementation.

2) why did you remove testDynamicCopy() from IndexSchemaTest ?

Because it had nothing to do with testing the index schema. As far as I could tell it was a ctrl-c / ctrl-v error. That, or I'm really blind and happy to put it back.

3) raw-schema.jsp on the trunk appears to be completely broken (multiple <%@ page contentType="..."%> declarations), and not linked to from the admin screen anyway ... we might want to just remove it completely and make a note in the CHANGES in case people have the old URL bookmarked.

My patch worked, but I also saw that it wasn't linked anywhere.

- will

> Read IndexSchema from InputStream instead of Config file > > > Key: SOLR-239 > URL: https://issues.apache.org/jira/browse/SOLR-239 > Project: Solr > Issue Type: Improvement > Affects Versions: 1.2 > Environment: all >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, > IndexSchemaStream2.patch > > > Soon to follow patch adds a constructor to IndexSchema to allow them to be > created directly from InputStreams. The overall logic for the Core's use of > the IndexSchema creation/use does not change however this allows java clients > like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is > parsed, the client can inspect an index's capabilities which is useful for > building generic search UI's.
ie provide a drop down list of fields to > search/sort by. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
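The reset caveat raised above is easy to demonstrate: most InputStream implementations backed by files or sockets cannot be read a second time. A minimal illustrative sketch, not Solr code (the class and method names here are invented for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Shows why re-reading a schema InputStream is unreliable: an in-memory
// stream supports mark/reset, but streams backed by files or sockets
// generally do not, so a second read cannot be guaranteed.
public class StreamResetCheck {
    public static boolean canReRead(InputStream in) {
        return in.markSupported();
    }

    public static void main(String[] args) {
        InputStream inMemory = new ByteArrayInputStream("<schema/>".getBytes());
        System.out.println(canReRead(inMemory)); // true for ByteArrayInputStream
    }
}
```

This is why parsing the schema once, at open time, is the safer contract.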
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-239: -- Attachment: IndexSchemaStream2.patch the attached patch (IndexSchemaStream2.patch) includes a cleaned-up test case as well as making the IndexSchema constructors throw a SolrException since they are reading InputStreams (which they were before). i think perhaps they should throw something a bit 'stronger' but that seemed to have more wide-reaching implications.
RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)
Good point, I was proposing it as an alternative to myfield_facet since that seems to overload the field name a bit too much. I agree that solrconfig + specialized request handlers are a much better location for that kind of stuff. Also, the reason other engines require you to mark the fields in the index definition is because they actually index the data differently if it is a facet vs a normal indexed field. It's cool that solr doesn't have to do this but there may be a case where it would be a good idea someday. - will -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 23, 2007 6:34 PM To: Solr Dev Subject: RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are) : What about adding an optional parameter to the field definition in the : IndexSchema for defaultFacet="true/false". This would make solr's : functionality/configuration similar to many of the major search engine vendors. things should go in schema.xml if they are inherent to the data and the physical index. Things should go in the solrconfig.xml if they relate to how the index is used -- a master might have a different solrconfig than a slave because it doesn't get used for queries, while two different slaves might have different solrconfigs because they get used by different sets of clients and need different cache configs or request handler configs -- but all three would use the same schema.xml because the physical index is the same in all cases. a mechanism already exists to say "by default, i want clients to get facets on certain fields" in the solrconfig.xml, it's just a default param for the requestHandler ... <lst name="defaults"> <str name="facet.field">category</str> <str name="facet.field">author</str> <str name="facet.field">type</str> </lst> ... then the params are defaulted for everyone, and the only thing the user needs in the URL is "facet=true" ... or that can be defaulted as well. -Hoss
RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)
What about adding an optional parameter to the field definition in the IndexSchema for defaultFacet="true/false". This would make solr's functionality/configuration similar to many of the major search engine vendors and keep people from having to follow naming conventions for fields. Then facet.field=* just turns on those fields with defaultFacet="true" but still lets you facet on others if you deem necessary. If there were a list of default facet fields it might also let the index warming process pre-cache the results of those filter queries, which would be a nice side benefit. The *_facet thing scares me because I'm afraid I'll eventually be 'forced' to have field names like: myfield_facet_vector_stem_morelikethis_highlight. - will -Original Message- From: Ryan McKinley (JIRA) [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 23, 2007 3:38 PM To: solr-dev@lucene.apache.org Subject: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are) [ https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498338 ] Ryan McKinley commented on SOLR-247: > > There are *lots* of reasons why a field might be indexed though, so faceting on every indexed field doesn't seem like it would ever make sense. > agreed, but *_facet would be useful > > if we do this, i would think it only makes sense to generalize the use of "*" in both fl and facet.field into a true glob style syntax One issue is that fl=XXX is typically a field list separated with "," or "|", facet.field expects each field as a separate parameter.
> Allow facet.field=* to facet on all fields (without knowing what they are) > -- > > Key: SOLR-247 > URL: https://issues.apache.org/jira/browse/SOLR-247 > Project: Solr > Issue Type: Improvement >Reporter: Ryan McKinley >Priority: Minor > Attachments: SOLR-247-FacetAllFields.patch > > > I don't know if this is a good idea to include -- it is potentially a bad idea to use it, but that can be ok. > This came out of trying to use faceting for the LukeRequestHandler top term collecting. > http://www.nabble.com/Luke-request-handler-issue-tf3762155.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
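The glob-style syntax floated in the thread above (generalizing "*" in fl and facet.field) could look something like the following sketch. This is not Solr code; FieldGlob and expand are hypothetical names, shown only to illustrate expanding a pattern against the set of known field names:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Expand a glob such as "*" or "*_facet" against known field names by
// translating it to a regex: quote everything literally, then let each
// '*' match any run of characters.
public class FieldGlob {
    public static List<String> expand(String glob, List<String> fieldNames) {
        String regex = Pattern.quote(glob).replace("*", "\\E.*\\Q");
        Pattern p = Pattern.compile(regex);
        return fieldNames.stream()
                .filter(f -> p.matcher(f).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fields = List.of("author_facet", "type_facet", "body");
        System.out.println(expand("*_facet", fields)); // [author_facet, type_facet]
    }
}
```

Because facet.field takes each field as a separate parameter, an implementation would emit one parameter per matched name rather than a joined list.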
[jira] Updated: (SOLR-176) Add detailed timing data to query response output
[ https://issues.apache.org/jira/browse/SOLR-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-176: -- Attachment: RequesthandlerBase.patch added some average stats to RequestHandlerBase. all of the same info can be obtained by parsing the log files but having it show up on the admin screens and jmx is simple and nice to have. stats added: avgTimePerRequest and avgRequestsPerSecond. > Add detailed timing data to query response output > - > > Key: SOLR-176 > URL: https://issues.apache.org/jira/browse/SOLR-176 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.2 >Reporter: Mike Klaas > Assigned To: Mike Klaas >Priority: Minor > Fix For: 1.2 > > Attachments: dtiming.patch, dtiming.patch, RequesthandlerBase.patch > > > see > http://www.nabble.com/%27accumulate%27-copyField-for-faceting-tf3329986.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
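The two statistics the patch above adds need very little bookkeeping: a request counter plus a running total of elapsed time. A rough sketch of that idea (the class and field names are assumptions, not the actual RequestHandlerBase code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Track request count and cumulative elapsed time; derive both averages
// from those two counters plus the handler's start time.
public class HandlerStats {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong totalTimeMs = new AtomicLong();
    private final long startTimeMs = System.currentTimeMillis();

    public void record(long elapsedMs) {
        requests.incrementAndGet();
        totalTimeMs.addAndGet(elapsedMs);
    }

    public double avgTimePerRequest() {
        long n = requests.get();
        return n == 0 ? 0.0 : (double) totalTimeMs.get() / n;
    }

    public double avgRequestsPerSecond() {
        long upMs = Math.max(1, System.currentTimeMillis() - startTimeMs);
        return requests.get() * 1000.0 / upMs;
    }

    public static void main(String[] args) {
        HandlerStats s = new HandlerStats();
        s.record(10);
        s.record(30);
        System.out.println(s.avgTimePerRequest()); // 20.0
    }
}
```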
[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-240: -- Attachment: IndexWriter2.patch the attached patch adds a param to SolrIndexConfig called useNativeLocks. the default is false, which keeps with the existing design using SimpleFSLockFactory. if people think we should allow fully pluggable locking mechanisms i'm game but i wasn't quite sure how to tackle that problem. as for testing, i wasn't quite sure how to run tests to ensure that the locks were working beyond some basic println's (which passed). if anyone has good ideas i'm all ears. - will > java.io.IOException: Lock obtain timed out: SimpleFSLock > > > Key: SOLR-240 > URL: https://issues.apache.org/jira/browse/SOLR-240 > Project: Solr > Issue Type: Bug > Components: update >Affects Versions: 1.2 > Environment: windows xp >Reporter: Will Johnson > Attachments: IndexWriter.patch, IndexWriter2.patch, stacktrace.txt, > ThrashIndex.java > > > when running the soon to be attached sample application against solr it will > eventually die. this same error has happened on both windows and rh4 linux. > the app is just submitting docs with an id in batches of 10, performing a > commit then repeating over and over again. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: (solr 240) java.io.IOException: Lock obtain timed out: SimpleFSLock
On my XP laptop it takes a couple of minutes, on the Linux server it took 2 days. - will -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, May 15, 2007 4:57 PM To: solr-dev@lucene.apache.org Subject: Re: (solr 240) java.io.IOException: Lock obtain timed out: SimpleFSLock I've been running this for an hour so far... how long does it normally take you to get an exception?
RE: [jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
True, but the javadocs for the standard lock implementation classes also say they don't work: http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html Further, NFS locking is also clearly stated to not work in the SimpleFSLockFactory: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/store/SimpleFSLockFactory.html So it appears we're in between a lock and a hard place... (oh the 80's sitcom humor) Adding a config parameter sounds good too but the new patch is no worse than what exists in terms of javadoc warnings and has been shown to actually fix what I would imagine is a rather standard configuration (local disk xp/rh). - will -Original Message- From: Hoss Man (JIRA) [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 15, 2007 4:27 PM To: solr-dev@lucene.apache.org Subject: [jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock [ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496115 ] Hoss Man commented on SOLR-240: --- the idea of using different lock implementations has come up in the past, http://www.nabble.com/switch-to-native-locks-by-default--tf2967095.html one reason not to hardcode native locks was because not all file systems support it -- so we left in the usage of SimpleFSLock because it's the most generally reusable. rather than change from one hardcoded lock type to another hardcoded lock type, we should support a config option for making the choice ... perhaps even adding a SolrLockFactory that defines an init(NamedList) method and creating simple Solr subclasses of all the core Lucene LockFactory impls so it's easy for people to write their own if they want (and we don't just have "if (lockType.equals("simple"))..." type config parsing).
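The config-driven lock selection suggested above (instead of scattered string comparisons) might be sketched like this; LockConfig and LockChoice are invented names for illustration, not Solr's actual API:

```java
import java.util.Map;

// Map a lockType config string to a lock implementation choice in one
// place, with a default and a loud failure on unknown values, rather
// than if/else string checks sprinkled through the code.
public class LockConfig {
    enum LockChoice { SIMPLE, NATIVE, NONE }

    private static final Map<String, LockChoice> REGISTRY = Map.of(
            "simple", LockChoice.SIMPLE,
            "native", LockChoice.NATIVE,
            "none", LockChoice.NONE);

    public static LockChoice fromConfig(String lockType) {
        // Missing config falls back to the historical default.
        LockChoice c = REGISTRY.get(lockType == null ? "simple" : lockType);
        if (c == null) {
            throw new IllegalArgumentException("Unknown lockType: " + lockType);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(fromConfig("native")); // NATIVE
    }
}
```

A fully pluggable design would replace the enum with a registry of factory classes initialized from the config, as the SolrLockFactory idea describes.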
[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-240: -- Attachment: IndexWriter.patch I found this: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/store/NativeFSLockFactory.html And so I made the attached patch, which seems to run at least 100x longer than without. - will
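NativeFSLockFactory builds on OS-level file locks exposed through java.nio. The underlying primitive can be demonstrated with plain JDK code; this is the mechanism, not Lucene or Solr code, and the class and method names are invented for the example:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Try to take an exclusive OS-level lock on a file: tryLock() returns
// null (without blocking) if another process already holds the lock.
public class NativeLockDemo {
    public static boolean acquire(Path lockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            boolean held = lock != null;
            if (held) lock.release();
            return held;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".lock");
        System.out.println(acquire(p)); // true: nothing else holds the lock
    }
}
```

Because the OS releases such locks when a process dies, native locks avoid the stale-lock-file problem that simple file-existence locking can leave behind.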
[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
[ https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-240: -- Attachment: stacktrace.txt ThrashIndex.java
[jira] Created: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock
java.io.IOException: Lock obtain timed out: SimpleFSLock Key: SOLR-240 URL: https://issues.apache.org/jira/browse/SOLR-240 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.2 Environment: windows xp Reporter: Will Johnson when running the soon to be attached sample application against solr it will eventually die. this same error has happened on both windows and rh4 linux. the app is just submitting docs with an id in batches of 10, performing a commit then repeating over and over again. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-239: -- Attachment: IndexSchemaStream2.patch patch updated. now with the added benefit of compiling.
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-239: -- Attachment: IndexSchemaStream.patch patch with test cases attached. i also had to change raw-schema.jsp to be a redirect to get-files.jsp, however it wasn't clear that raw-schema.jsp was in use anymore.
[jira] Created: (SOLR-239) Read IndexSchema from InputStream instead of Config file
Read IndexSchema from InputStream instead of Config file Key: SOLR-239 URL: https://issues.apache.org/jira/browse/SOLR-239 Project: Solr Issue Type: Improvement Affects Versions: 1.2 Environment: all Reporter: Will Johnson Priority: Minor Fix For: 1.2 Soon to follow patch adds a constructor to IndexSchema to allow them to be created directly from InputStreams. The overall logic for the Core's use of the IndexSchema creation/use does not change however this allows java clients like those in SOLR-20 to be able to parse an IndexSchema. Once a schema is parsed, the client can inspect an index's capabilities which is useful for building generic search UI's. ie provide a drop down list of fields to search/sort by. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [jira] Updated: (SOLR-217) schema option to ignore unused fields
Any update on this? I'm one little * away from having a clean build/test. - will -Original Message- From: Hoss Man (JIRA) [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 01, 2007 7:42 PM To: solr-dev@lucene.apache.org Subject: [jira] Updated: (SOLR-217) schema option to ignore unused fields [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-217: -- Attachment: ignoreUnnamedFields_v3.patch added a simple test to the existing patch. one thing to note is that this will result in the field being "ignored" if you try to query on it as well ... but this is a more general problem of what to do when people try to query on an unindexed field (see SOLR-223) will commit in a day or so barring objections > schema option to ignore unused fields > - > > Key: SOLR-217 > URL: https://issues.apache.org/jira/browse/SOLR-217 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.2 >Reporter: Will Johnson >Priority: Minor > Fix For: 1.2 > > Attachments: ignoreNonIndexedNonStoredField.patch, ignoreUnnamedFields.patch, ignoreUnnamedFields_v3.patch, ignoreUnnamedFields_v3.patch > > > One thing that causes problems for me (and i assume others) is that Solr is schema-strict in that unknown fields cause solr to throw exceptions and there is no way to relax this constraint. this can cause all sorts of serious problems if you have automated feeding applications that do things like SELECT * FROM table1 or where you want to add other fields to the document for processing purposes before sending them to solr but don't want to deal with 'cleanup' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient
[ https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493784 ] Will Johnson commented on SOLR-86: -- has anyone brought up the idea of creating post.bat and post.sh scripts that use this java class instead of the curl example that currently ships in example/exampledocs? it would be one less thing for people to figure out and possibly screw up. > [PATCH] standalone updater cli based on httpClient > --- > > Key: SOLR-86 > URL: https://issues.apache.org/jira/browse/SOLR-86 > Project: Solr > Issue Type: New Feature > Components: update >Reporter: Thorsten Scherler > Assigned To: Erik Hatcher > Attachments: simple-post-tool-2007-02-15.patch, > simple-post-tool-2007-02-16.patch, > simple-post-using-urlconnection-approach.patch, solr-86.diff, solr-86.diff > > > We need a cross platform replacement for the post.sh. > The attached code is a direct replacement of the post.sh since it is actually > doing the same exact thing. > In the future one can extend the CLI with other feature like auto commit, > etc.. > Right now the code assumes that SOLR-85 is applied since we using the servlet > of this issue to actually do the update. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-20) A simple Java client for updating and searching
[ https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492902 ] Will Johnson commented on SOLR-20: -- the new api's work great, thanks! what's the plan for this going forward? i'd like to start doing some work on this as it's rather critical to my current project and an area i've dealt with a lot in the past. assuming it's not getting dumped into org.apache.* land any time soon are you accepting patches to this code? if so i have some modifications to the api's that i think will make them easier to use (such as a method to set FacetParams on SolrQuery) and i'll even flesh out the SolrServerTest for fun. also, i noticed that all the methods on SolrServer throw undeclared SolrExceptions which extend RuntimeException when things go south. should those throw some other sort of non-ignorable exception like a new SolrServerException? while it made coding/compiling easier to leave out all the usually required try's and catches it made running/debugging much less enjoyable. - will > A simple Java client for updating and searching > --- > > Key: SOLR-20 > URL: https://issues.apache.org/jira/browse/SOLR-20 > Project: Solr > Issue Type: New Feature > Components: clients - java > Environment: all >Reporter: Darren Erik Vengroff >Priority: Minor > Attachments: DocumentManagerClient.java, DocumentManagerClient.java, > solr-client-java-2.zip.zip, solr-client-java.zip, solr-client-sources.jar, > solr-client.zip, solr-client.zip, solr-client.zip, > solrclient_addqueryfacet.zip, SolrClientException.java, > SolrServerException.java > > > I wrote a simple little client class that can connect to a Solr server and > issue add, delete, commit and optimize commands using Java methods. I'm > posting here for review and comments as suggested by Yonik. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
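The checked-exception alternative raised above works because the compiler forces callers to handle the failure rather than letting it propagate silently. A standalone sketch (only the SolrServerException name comes from the issue's attachments; the bodies here are guesses for illustration):

```java
// A checked exception wrapping the underlying failure: callers of query()
// must catch or declare it, unlike an undeclared RuntimeException.
public class CheckedDemo {
    static class SolrServerException extends Exception {
        SolrServerException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void query() throws SolrServerException {
        try {
            throw new RuntimeException("connection refused"); // stand-in for an HTTP failure
        } catch (RuntimeException e) {
            throw new SolrServerException("query failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            query();
        } catch (SolrServerException e) {
            System.out.println("caught: " + e.getMessage()); // caught: query failed
        }
    }
}
```

The trade-off is exactly the one described in the email: more try/catch noise at compile time in exchange for failures that cannot be accidentally ignored at run time.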
[jira] Commented: (SOLR-20) A simple Java client for updating and searching
[ https://issues.apache.org/jira/browse/SOLR-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492700 ] Will Johnson commented on SOLR-20: -- the trunk version at http://solrstuff.org/svn/solrj/ seems to be missing a dependency and a copy of SolrParams. ant returns compile: [javac] Compiling 40 source files to C:\data\workspace\solrj\bin [javac] C:\data\workspace\solrj\src\org\apache\solr\client\solrj\impl\XMLResponseParser.java:10: package javax.xml.stream does not exist [javac] import javax.xml.stream.XMLInputFactory; [javac] C:\data\workspace\solrj\src\org\apache\solr\client\solrj\query\SolrQuery.java:10: cannot find symbol [javac] symbol : class SolrParams [javac] location: package org.apache.solr.util [javac] import org.apache.solr.util.SolrParams;
[jira] Updated: (SOLR-217) schema option to ignore unused fields
[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-217: -- Attachment: ignoreUnnamedFields_v3.patch v3 patch included. this version of the patch also takes into account the suggested example/solr/conf/schema.xml changes.
[jira] Commented: (SOLR-217) schema option to ignore unused fields
[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492664 ] Will Johnson commented on SOLR-217: --- since we now have required fields (http://issues.apache.org/jira/browse/SOLR-181) any chance we can have ignored fields as well? let me know if something else needs to be done to the patch but as far as i can tell the code works and people seem to agree that it's the correct approach. - will
RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields
I agree, the default schema should preserve the strictness of the existing core as it's already helped me figure out more than a few problems. Having the documented option to bypass that error is also nice. FYI: the second patch does include a log.finest() message about ignoring the field. I wasn't sure what level would be appropriate but that was the same level used in the rest of the class. - will -Original Message- From: J.J. Larrea (JIRA) [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 2:54 PM To: solr-dev@lucene.apache.org Subject: [jira] Commented: (SOLR-217) schema option to ignore unused fields [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492369 ] J.J. Larrea commented on SOLR-217: -- +1 to Hoss' elaboration of Yonik's suggested approach, except for reverse-compatibility (where we DO want an error for unknown fields) schema.xml should probably read something like: ...
[jira] Updated: (SOLR-217) schema option to ignore unused fields
[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Johnson updated SOLR-217: -- Attachment: ignoreNonIndexedNonStoredField.patch I like that solution and I can definitely see the advantages of having dumb_*=ignored and so on. How does this patch sound instead of the previous:

public Field createField(SchemaField field, String externalVal, float boost) {
  String val;
  try {
    val = toInternal(externalVal);
  } catch (NumberFormatException e) {
    throw new SolrException(500, "Error while creating field '" + field +
        "' from value '" + externalVal + "'", e, false);
  }
  if (val == null) return null;
  if (!field.indexed() && !field.stored()) {
    log.finest("Ignoring unindexed/unstored field: " + field);
    return null;
  }
  ... blah blah blah

- will
RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields
So are you proposing that the DocumentBuilder check those properties on the field before it adds the field, or do we need to add checks everywhere else to make sure nothing happens? I'm happy to make either change and resubmit the patch.

- will

-----Original Message-----
From: Erik Hatcher (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Friday, April 27, 2007 12:11 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-217) schema option to ignore unused fields

[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492332 ]

Erik Hatcher commented on SOLR-217:
-----------------------------------

I like Yonik's suggestion of allowing unstored+unindexed fields to be a no-op.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-217) schema option to ignore unused fields
[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492326 ]

Will Johnson commented on SOLR-217:
-----------------------------------

I was actually taking this requirement from the other enterprise search engines I've worked with, which do this by default; i.e., Solr is the different one in this case. Your *->nothing method sounds good as well, but it doesn't seem as obvious to someone reading the schema or trying to feed data. You might also run into problems later on when there are other properties for 'things to do' with fields besides indexing and searching.

- will

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-217) schema option to ignore unused fields
[ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-217:
------------------------------

    Attachment: ignoreUnnamedFields.patch

The attached patch solves this problem by adding a new option to schema.xml that allows unnamed fields, including those that don't match dynamic fields, to be ignored. The default is false if the attribute is missing, which is consistent with existing Solr behavior. If you want to enable this feature, the schema.xml would look like: blah blah blah ... blah blah blah ...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
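The actual schema.xml example was stripped by the list archive (the "blah blah blah ..." above). A hedged reconstruction of what the toggle might look like follows; the attribute name is inferred from the patch filename (ignoreUnnamedFields.patch) and may not match the real patch:

```xml
<!-- Hypothetical sketch only: the attribute name below is inferred from the
     patch filename, not copied from the patch. Omitting it (or setting it
     to false) preserves Solr's existing strict behavior. -->
<schema name="example" version="1.1" ignoreUnnamedFields="true">
  <!-- ... types and fields as usual ... -->
</schema>
```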
[jira] Created: (SOLR-217) schema option to ignore unused fields
schema option to ignore unused fields
-------------------------------------

                 Key: SOLR-217
                 URL: https://issues.apache.org/jira/browse/SOLR-217
             Project: Solr
          Issue Type: Improvement
          Components: update
    Affects Versions: 1.2
            Reporter: Will Johnson
            Priority: Minor
             Fix For: 1.2
         Attachments: ignoreUnnamedFields.patch

One thing that causes problems for me (and I assume others) is that Solr is schema-strict: unknown fields cause Solr to throw exceptions, and there is no way to relax this constraint. This can cause all sorts of serious problems if you have automated feeding applications that do things like SELECT * FROM table1, or where you want to add other fields to the document for processing purposes before sending them to Solr but don't want to deal with 'cleanup'.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.