[jira] [Commented] (SOLR-3975) Document Summarization toolkit, using LSA techniques
[ https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482972#comment-13482972 ] Otis Gospodnetic commented on SOLR-3975: Nice, 170KB patch there Lance! :) I see lots of classes don't have ASL btw. > Document Summarization toolkit, using LSA techniques > > > Key: SOLR-3975 > URL: https://issues.apache.org/jira/browse/SOLR-3975 > Project: Solr > Issue Type: New Feature >Reporter: Lance Norskog >Priority: Minor > Attachments: 4.1.summary.patch, reuters.sh > > > This package analyzes sentences and words as used across sentences to rank > the most important sentences and words. The general topic is called "document > summarization" and is a popular research topic in textual analysis. > How to use: > 1) Check out the 4.x branch, apply the patch, build, and run the solr/example > instance. > 2) Download the first Reuters article corpus from: > http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz > 3) Unpack this into a directory. > 4) Run the attached 'reuters.sh' script: > sh reuters.sh directory http://localhost:8983/solr/collection1 > 5) Wait several minutes. > Now go to http://localhost:8983/solr/collection1/browse?summary=true and look > at the large gray box marked 'Document Summary'. This has a table of > statistics about the analysis, the three most important sentences, and > several of the most important words in the documents. The sentences have the > important words in italics. > The code is packaged as a search component and as an analysis handler. The > /browse demo uses the search component, and you can also post raw text to > http://localhost:8983/solr/collection1/analysis/summary. Here is a sample > command: > {code} > curl -s > "http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" > --data-binary @$FILE -H 'Content-type:application/xml' > {code} > This is an implementation of LSA-based document summarization. 
A short > explanation and a long evaluation are described in my blog, [Uncle Lance's > Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: > [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482960#comment-13482960 ] Shawn Heisey commented on SOLR-1972: Solr handler statistics were already published in JMX, this just added some entries. I pulled up jconsole and connected to my patched solr server and the new stats were there. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Fix For: 4.1 > > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
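The bounded window of recent data points requested in the issue description can be sketched roughly as follows. This is only an illustration, not the code in any of the attached patches: the class name `RecentRequestTimes`, its methods, and the nearest-rank percentile rule are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical helper: keeps the most recent N request times and reports percentiles. */
final class RecentRequestTimes {
    private final long[] window; // ring buffer holding the latest samples (milliseconds)
    private int next = 0;        // slot that the next sample will overwrite
    private int size = 0;        // number of valid samples, never more than window.length

    RecentRequestTimes(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.window = new long[capacity];
    }

    /** Records one request time; once the window is full the oldest sample is overwritten. */
    synchronized void record(long millis) {
        window[next] = millis;
        next = (next + 1) % window.length;
        if (size < window.length) {
            size++;
        }
    }

    /** Nearest-rank percentile (0 < p <= 100) over the retained samples only. */
    synchronized long percentile(double p) {
        if (size == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = Arrays.copyOf(window, size); // valid samples occupy slots 0..size-1
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * size);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a capacity such as the 1024 or 4096 suggested in the description, memory use stays fixed, and slow startup queries stop influencing the percentiles once they have been overwritten by newer samples.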
[jira] [Commented] (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482950#comment-13482950 ] Otis Gospodnetic commented on SOLR-1395: Does anyone really need this? If so, I'm curious why? Or should we close this? > Integrate Katta > --- > > Key: SOLR-1395 > URL: https://issues.apache.org/jira/browse/SOLR-1395 > Project: Solr > Issue Type: New Feature >Affects Versions: 1.4 >Reporter: Jason Rutherglen >Priority: Minor > Fix For: 4.1 > > Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, > katta-core-0.6-dev.jar, katta.node.properties, katta-solrcores.jpg, > katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, > solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, > solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr1395.jpg, > solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, > solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, > solr-1395-katta-0.6.3-4.patch, solr-1395-katta-0.6.3-5.patch, > solr-1395-katta-0.6.3-6.patch, solr-1395-katta-0.6.3-7.patch, > SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, > test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar > > Original Estimate: 336h > Remaining Estimate: 336h > > We'll integrate Katta into Solr so that: > * Distributed search uses Hadoop RPC > * Shard/SolrCore distribution and management > * Zookeeper based failover > * Indexes may be built using Hadoop -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482948#comment-13482948 ] Otis Gospodnetic commented on SOLR-1972: [~romseygeek] are these metrics being published in JMX? They should be, and Coda's Metrics should make that easy. I had a 2.5-second look at the patch and didn't find any mention of "jmx". > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Fix For: 4.1 > > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-1972: --- Component/s: web gui > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Fix For: 4.1 > > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096. 
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-1972: --- Fix Version/s: 4.1 > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Fix For: 4.1 > > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096. 
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1959 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1959/ Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC All tests passed Build Log: [...truncated 24574 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919: exec returned: 1 Total time: 32 minutes 15 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
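The documentation-lint step above fails because a constructor and two getters have no javadocs. As a rough sketch of what the check expects, here is a small, fully documented class with the same member shapes as those reported. It is a hypothetical stand-in named `ClassificationResultSketch`, not the actual Lucene source, and the wording of the comments is an assumption.

```java
/**
 * Hypothetical stand-in for a small result holder, used only to show
 * the doc comments a missing-javadocs check looks for.
 */
final class ClassificationResultSketch {
    private final String assignedClass;
    private final double score;

    /**
     * Creates a result.
     *
     * @param assignedClass the class label chosen for the input
     * @param score         the score attached to that label
     */
    public ClassificationResultSketch(String assignedClass, double score) {
        this.assignedClass = assignedClass;
        this.score = score;
    }

    /**
     * Returns the class label chosen for the input.
     *
     * @return the label passed to the constructor
     */
    public String getAssignedClass() {
        return assignedClass;
    }

    /**
     * Returns the score attached to the assigned class.
     *
     * @return the score passed to the constructor
     */
    public double getScore() {
        return score;
    }
}
```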
[jira] [Created] (SOLR-3982) No way to get current dataimport status from admin GUI
Shawn Heisey created SOLR-3982: -- Summary: No way to get current dataimport status from admin GUI Key: SOLR-3982 URL: https://issues.apache.org/jira/browse/SOLR-3982 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0 Reporter: Shawn Heisey Fix For: 4.1 The dataimport section under each core on the admin gui does not provide a way to get the current import status. I actually would like to see it automatically pull the status as soon as you click on "Dataimport" ... I have never seen an import status with a qtime above 1 millisecond. A refresh icon/link would be good to have as well. Additional note: the resulting URL in the address bar is a little odd: http://server:port/solr/#/corename/dataimport//dataimport -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482921#comment-13482921 ] Shawn Heisey commented on SOLR-1972: I never know when things are threadsafe, although it does seem to work. Does a static class member variable 'automatically' become threadsafe, or would volatile be required? Was protected the right way to do that? > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
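On the thread-safety question in the comment above: in Java, `static` only means there is one copy of the field per class, and `protected` only controls access; neither adds synchronization. `volatile` makes writes visible to other threads but does not make a read-modify-write such as `count++` atomic. One common option is `java.util.concurrent.atomic.AtomicLong`, sketched below. The class `RequestCounter` and its methods are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared counter that is safe to update from many threads. */
final class RequestCounter {
    // One instance per class because it is static; atomic because it is an AtomicLong.
    private static final AtomicLong REQUESTS = new AtomicLong();

    static void increment() {
        REQUESTS.incrementAndGet(); // atomic read-modify-write
    }

    static long get() {
        return REQUESTS.get();
    }

    /** Starts the given number of threads, each incrementing perThread times; returns the net increase. */
    static long hammer(int threads, final int perThread) {
        long before = get();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait so that every increment is counted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return get() - before;
    }
}
```

Replacing the `AtomicLong` with a plain or even `volatile` `long` and `count++` could lose updates under contention, which is why code that "seems to work" in light testing is not evidence of thread safety.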
[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3336 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3336/ All tests passed Build Log: [...truncated 24623 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:60: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:252: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1919: exec returned: 1 Total time: 43 minutes 24 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-1972: --- Attachment: SOLR-1972_metrics.patch I think I fixed it. It looks like if you pass the same combination of arguments to the newCounter/newTimer methods, you actually get back the same object as the last time it was called with those parameters, not a new one. There is an alternate form of the constructor that takes a "scope" argument. I could have appended the new value to the name argument, but since they were kind enough to provide something separate... > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972.patch, SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. 
The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
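The behavior described in the comment above (the same arguments return the same object, and a separate "scope" value yields a distinct one) is what a registry keyed on its arguments would do. The sketch below only illustrates that idea; `CounterRegistry` and its key format are hypothetical and are not the Metrics library's implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical registry that hands back one counter per (owner, name, scope) combination. */
final class CounterRegistry {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Returns the existing counter for this combination, creating it only on first use. */
    AtomicLong counter(Class<?> owner, String name, String scope) {
        String key = owner.getName() + "/" + name + (scope == null ? "" : "/" + scope);
        return counters.computeIfAbsent(key, k -> new AtomicLong());
    }
}
```

Under this model, two request handlers that register a counter with the same class and name would share one object unless each passes its own scope, which matches the fix described in the comment.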
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482915#comment-13482915 ] Robert Muir commented on SOLR-3981: --- That adoc() you are using doesn't work with boosts. (I found this from another test.) > docBoost is compounded on copyField > --- > > Key: SOLR-3981 > URL: https://issues.apache.org/jira/browse/SOLR-3981 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.1 > > Attachments: SOLR-3981.patch, SOLR-3981.patch > > > As noted by Toke in a comment on SOLR-3875... > https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233 > {quote} > While boosting of multi-value fields is handled correctly in Solr 4.0.0, > boosting for copyFields is not. A sample document: > {code} > > Insane score Example. Score = 10E9 > Document boost broken for copyFields > video ThomasEgense and Toke Eskildsen > Test > bug > something else > bug > bug > > {code} > The fields name, manu, cat, features, keywords and content get copied to > text and a search for thomasegense matches the text-field with query > explanation > {code} > 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result > of: > 70384.67 = fieldWeight in 0, product of: > 1.0 = tf(freq=1.0), with freq of: > 1.0 = termFreq=1.0 > 0.30685282 = idf(docFreq=1, maxDocs=1) > 229376.0 = fieldNorm(doc=0) > {code} > If the two last fields keywords and content are removed from the sample > document, the score is reduced by a factor 100 (docBoost^2). > {quote} > (This is a continuation of some of the problems caused by the changes made > when the concept of docBoost was eliminated from the underlying IndexWriter > code, and overlooked due to the lack of testing of docBoosts at the solr > level - SOLR-3885)
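The compounding described in the quoted report can be illustrated with simple arithmetic. This sketch assumes the document boost is multiplied into the target field's norm once per value copied into it, and assumes a boost of 10 (inferred from the reported factor of 100 = docBoost^2). The class, method names, and value counts are hypothetical and do not reproduce Solr's indexing code.

```java
/** Hypothetical arithmetic model of a document boost applied once per copied value. */
final class CompoundedBoost {
    /** Multiplier that ends up in the norm when the boost is applied for every copied value. */
    static double compoundedMultiplier(double docBoost, int copiedValues) {
        double multiplier = 1.0;
        for (int i = 0; i < copiedValues; i++) {
            multiplier *= docBoost; // the boost is applied again for each copied value
        }
        return multiplier;
    }

    /** Multiplier one would expect if the boost were applied once per document. */
    static double intendedMultiplier(double docBoost) {
        return docBoost;
    }
}
```

Under these assumptions, dropping two copied values divides the multiplier by docBoost squared, which is consistent with the factor of 100 mentioned in the report.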
[jira] [Updated] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent
[ https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Daubman updated SOLR-2052: Attachment: SOLR-2052-4_0_0.patch SOLR-2052-trunk.patch Attaching patches against 4_0_0 and trunk > Allow for a list of filter queries and a single docset filter in > QueryComponent > --- > > Key: SOLR-2052 > URL: https://issues.apache.org/jira/browse/SOLR-2052 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0-ALPHA > Environment: Mac OS X, Java 1.6 >Reporter: Stephen Green >Priority: Minor > Fix For: 4.1 > > Attachments: SOLR-2052-2.patch, SOLR-2052-3-6-1.patch, > SOLR-2052-3.patch, SOLR-2052-4_0_0.patch, SOLR-2052-4.patch, SOLR-2052.patch, > SOLR-2052-trunk.patch > > > SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries > or a single filter (as a DocSet), but not both. This restriction seems > arbitrary, and there are cases where we can have both a list of filter > queries and a DocSet generated by some other non-query process (e.g., > filtering documents according to IDs pulled from some other source like a > database.) > Fixing this requires a few small changes to SolrIndexSearcher to allow both > of these to be set for a QueryCommand and to take both into account when > evaluating the query. It also requires a modification to ResponseBuilder to > allow setting the single filter at query time. > I've run into this against 1.4, but the same holds true for the trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
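The idea of honoring both a list of filter queries and one externally supplied document set can be sketched as a plain set intersection. The example below uses `java.util.BitSet` as a stand-in for a DocSet; the class `CombinedFilter` and its method are hypothetical and are not the SolrIndexSearcher or ResponseBuilder API.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration: intersect per-filter-query matches with one external doc set. */
final class CombinedFilter {
    /**
     * @param filterQueryMatches one bit set of matching doc ids per filter query
     * @param externalDocSet     doc ids from a non-query source (for example a database), or null
     * @param maxDoc             one more than the largest doc id in the index
     * @return doc ids accepted by every filter query and by the external set
     */
    static BitSet intersect(List<BitSet> filterQueryMatches, BitSet externalDocSet, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with every document allowed
        for (BitSet matches : filterQueryMatches) {
            result.and(matches); // each filter query narrows the set
        }
        if (externalDocSet != null) {
            result.and(externalDocSet); // the external set narrows it further
        }
        return result;
    }
}
```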
[jira] [Updated] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3981: --- Attachment: SOLR-3981.patch bq. i want to work on a test that actually indexes a doc and inspects the encoded norms just to be certain i'm not missing something. Updated patch adds this to the test -- kludgy to reach this deep into the lucene code in the solr test, but do-able. Unfortunately the test fails because the decoded norms from the index wind up being way _lower_ than the expected values. At first i thought it was just because i forgot to factor in the term length in my expected norm, but even taking that into account the numbers are still way off. i'm guessing either i don't understand something about the new 4.0 APIs for getting the DocValues/Norms, or i've got some trivially silly bug that i'm blind to because i've been staring at it too long. I'd appreciate a second set of eyes. > docBoost is compounded on copyField > --- > > Key: SOLR-3981 > URL: https://issues.apache.org/jira/browse/SOLR-3981 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.1 > > Attachments: SOLR-3981.patch, SOLR-3981.patch > > > As noted by Toke in a comment on SOLR-3875... > https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233 > {quote} > While boosting of multi-value fields is handled correctly in Solr 4.0.0, > boosting for copyFields is not. A sample document: > {code} > > Insane score Example. 
Score = 10E9 > Document boost broken for copyFields > video ThomasEgense and Toke Eskildsen > Test > bug > something else > bug > bug > > {code} > The fields name, manu, cat, features, keywords and content get copied to > text and a search for thomasegense matches the text-field with query > explanation > {code} > 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result > of: > 70384.67 = fieldWeight in 0, product of: > 1.0 = tf(freq=1.0), with freq of: > 1.0 = termFreq=1.0 > 0.30685282 = idf(docFreq=1, maxDocs=1) > 229376.0 = fieldNorm(doc=0) > {code} > If the two last fields keywords and content are removed from the sample > document, the score is reduced by a factor 100 (docBoost^2). > {quote} > (This is a continuation of some of the problems caused by the changes made > when the concept of docBoost was eliminated from the underlying IndexWriter > code, and overlooked due to the lack of testing of docBoosts at the solr > level - SOLR-3885)
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-1604: --- Attachment: ComplexPhrase.zip Includes a README.txt that contains instructions for Solr 4.0.0 > Wildcards, ORs etc inside Phrase Queries > > > Key: SOLR-1604 > URL: https://issues.apache.org/jira/browse/SOLR-1604 > Project: Solr > Issue Type: Improvement > Components: query parsers, search >Affects Versions: 1.4 >Reporter: Ahmet Arslan >Priority: Minor > Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, > ComplexPhraseQueryParser.java, ComplexPhrase.zip, ComplexPhrase.zip, > ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, > SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch > > > Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports > wildcards, ORs, ranges, fuzzies inside phrase queries.
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b58) - Build # 1957 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1957/ Java: 32bit/jdk1.8.0-ea-b58 -client -XX:+UseSerialGC All tests passed Build Log: [...truncated 22501 lines...] [javadoc] Generating Javadoc [javadoc] Javadoc execution [javadoc] Loading source files for package org.apache.lucene... [javadoc] warning: [options] bootstrap class path not set in conjunction with -source 1.7 [javadoc] Loading source files for package org.apache.lucene.analysis... [javadoc] Loading source files for package org.apache.lucene.analysis.tokenattributes... [javadoc] Loading source files for package org.apache.lucene.codecs... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40.values... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene41... [javadoc] Loading source files for package org.apache.lucene.codecs.perfield... [javadoc] Loading source files for package org.apache.lucene.document... [javadoc] Loading source files for package org.apache.lucene.index... [javadoc] Loading source files for package org.apache.lucene.search... [javadoc] Loading source files for package org.apache.lucene.search.payloads... [javadoc] Loading source files for package org.apache.lucene.search.similarities... [javadoc] Loading source files for package org.apache.lucene.search.spans... [javadoc] Loading source files for package org.apache.lucene.store... [javadoc] Loading source files for package org.apache.lucene.util... [javadoc] Loading source files for package org.apache.lucene.util.automaton... [javadoc] Loading source files for package org.apache.lucene.util.fst... [javadoc] Loading source files for package org.apache.lucene.util.mutable... [javadoc] Loading source files for package org.apache.lucene.util.packed... [javadoc] Constructing Javadoc information... 
[javadoc] Standard Doclet version 1.8.0-ea [javadoc] Building tree for all the packages and classes... [javadoc] Building index for all the packages and classes... [javadoc] Building index for all classes... [javadoc] Generating /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/docs/core/help-doc.html... [javadoc] 1 warning [...truncated 44 lines...] [javadoc] Generating Javadoc [javadoc] Javadoc execution [javadoc] Loading source files for package org.apache.lucene.analysis.ar... [javadoc] warning: [options] bootstrap class path not set in conjunction with -source 1.7 [javadoc] Loading source files for package org.apache.lucene.analysis.bg... [javadoc] Loading source files for package org.apache.lucene.analysis.br... [javadoc] Loading source files for package org.apache.lucene.analysis.ca... [javadoc] Loading source files for package org.apache.lucene.analysis.charfilter... [javadoc] Loading source files for package org.apache.lucene.analysis.cjk... [javadoc] Loading source files for package org.apache.lucene.analysis.commongrams... [javadoc] Loading source files for package org.apache.lucene.analysis.compound... [javadoc] Loading source files for package org.apache.lucene.analysis.compound.hyphenation... [javadoc] Loading source files for package org.apache.lucene.analysis.core... [javadoc] Loading source files for package org.apache.lucene.analysis.cz... [javadoc] Loading source files for package org.apache.lucene.analysis.da... [javadoc] Loading source files for package org.apache.lucene.analysis.de... [javadoc] Loading source files for package org.apache.lucene.analysis.el... [javadoc] Loading source files for package org.apache.lucene.analysis.en... [javadoc] Loading source files for package org.apache.lucene.analysis.es... [javadoc] Loading source files for package org.apache.lucene.analysis.eu... [javadoc] Loading source files for package org.apache.lucene.analysis.fa... 
[javadoc] Loading source files for package org.apache.lucene.analysis.fi... [javadoc] Loading source files for package org.apache.lucene.analysis.fr... [javadoc] Loading source files for package org.apache.lucene.analysis.ga... [javadoc] Loading source files for package org.apache.lucene.analysis.gl... [javadoc] Loading source files for package org.apache.lucene.analysis.hi... [javadoc] Loading source files for package org.apache.lucene.analysis.hu... [javadoc] Loading source files for package org.apache.lucene.analysis.hunspell... [javadoc] Loading source files for package org.apache.lucene.analysis.hy... [javadoc] Loading source files for package org.apache.lucene.analysis.id... [javadoc] Loading source files for package org.apache.lucene.analysis.in... [javadoc] Loading source files for package org.apache.lucene.analysis.it... [javadoc] Loading source files for package org.apache.lucene.analysis.lv... [javadoc] Loading source files for package org.apache.lucene.analysis.miscellaneous... [javadoc] Loadin
[jira] [Updated] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3981: --- Attachment: SOLR-3981.patch patch with the test i was working on, as well as a fix... the Document itself can serve as the "set" to keep track of which field names have already been added. because the final boost for the field name is the product of the individual boosts, we don't have to ensure that the (solr) docBoost and (solr) fieldBoost(s) are combined into the _first_ value of each copyField -- we just have to ensure that each is only used once. (multiple copyFields with the same dest will result in them being multiplied in the final dest field's norm but that's always been true) i'm still running the full test suite, and i want to work on a test that actually indexes a doc and inspects the encoded norms just to be certain i'm not missing something. > docBoost is compounded on copyField > --- > > Key: SOLR-3981 > URL: https://issues.apache.org/jira/browse/SOLR-3981 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.1 > > Attachments: SOLR-3981.patch > > > As noted by Toke in a comment on SOLR-3875... > https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233 > {quote} > While boosting of multi-value fields is handled correctly in Solr 4.0.0, > boosting for copyFields is not. A sample document: > {code} > > Insane score Example. 
Score = 10E9 > Document boost broken for copyFields > video ThomasEgense and Toke Eskildsen > Test > bug > something else > bug > bug > > {code} > The fields name, manu, cat, features, keywords and content gets copied to > text and a search for thomasegense matches the text-field with query > explanation > {code} > 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result > of: > 70384.67 = fieldWeight in 0, product of: > 1.0 = tf(freq=1.0), with freq of: > 1.0 = termFreq=1.0 > 0.30685282 = idf(docFreq=1, maxDocs=1) > 229376.0 = fieldNorm(doc=0) > {code} > If the two last fields keywords and content are removed from the sample > document, the score is reduced by a factor 100 (docBoost^2). > {quote} > (This is a continuation of some of the problems caused by the changes made > when the concept of docBoost was eliminated from the underly IndexWRiter > code, and overlooked due to the lack of testing of docBoosts at the solr > level - SOLR-3885)) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
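A minimal Java sketch of the idea described in the comment above, in which a boost is used only once per destination field name and the document built so far doubles as the "set" of names already seen. This is not the code from SOLR-3981.patch: the class `BoostOnceSketch`, the `FieldValue` type and every method name are hypothetical, and a plain list stands in for the Lucene Document. The docBoost of 10 is the value implied by the "factor 100 (docBoost^2)" remark.

```java
import java.util.ArrayList;
import java.util.List;

class BoostOnceSketch {
    /** Hypothetical stand-in for one indexed field value and the boost attached to it. */
    static final class FieldValue {
        final String name;
        final float boost;
        FieldValue(String name, float boost) { this.name = name; this.boost = boost; }
    }

    /** True if the document built so far already holds a value for this field name. */
    static boolean contains(List<FieldValue> doc, String name) {
        for (FieldValue f : doc) {
            if (f.name.equals(name)) return true;
        }
        return false;
    }

    /** Adds a value; the boost is kept only for the first value of a field name. */
    static void add(List<FieldValue> doc, String name, float boost) {
        float applied = contains(doc, name) ? 1.0f : boost;
        doc.add(new FieldValue(name, applied));
    }

    /** Product of the boosts of all values of one field name. */
    static float collectiveBoost(List<FieldValue> doc, String name) {
        float product = 1.0f;
        for (FieldValue f : doc) {
            if (f.name.equals(name)) product *= f.boost;
        }
        return product;
    }

    /** Copies 'sources' source fields into the destination "text" and returns its boost. */
    static float copyFieldBoost(int sources, float docBoost) {
        List<FieldValue> doc = new ArrayList<>();
        for (int i = 0; i < sources; i++) {
            add(doc, "text", docBoost);
        }
        return collectiveBoost(doc, "text");
    }

    public static void main(String[] args) {
        // Six source fields copied into "text" with a docBoost of 10.
        System.out.println(copyFieldBoost(6, 10.0f));
    }
}
```

With this rule the destination field's collective boost stays at the docBoost however many source fields are copied into it, rather than growing with each copy.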
[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core
[ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482774#comment-13482774 ] Mark Miller commented on SOLR-3939: --- In two other issues I was working on, unrelated changes seemed to start causing test fails in one of the solrcloud tests - it's a fail I had seen sometimes in the past on Apache jenkins. A fail about waiting to notice a live node drop. It seems that was caused by this - it took some time to trace it back here. One of the nodes doesn't see a live node change because it is stuck in a leader election loop. Given that, I plan on committing what I have so far - so it stops blocking my other two issues. We can then iterate further on trunk. > Solr Cloud recovery and leader election when unloading leader core > -- > > Key: SOLR-3939 > URL: https://issues.apache.org/jira/browse/SOLR-3939 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-BETA, 4.0 >Reporter: Joel Bernstein >Assignee: Mark Miller >Priority: Critical > Labels: 4.0.1_Candidate > Fix For: 4.1, 5.0 > > Attachments: cloud2.log, cloud.log, SOLR-3939.patch, SOLR-3939.patch > > > When a leader core is unloaded using the core admin api, the followers in the > shard go into recovery but do not come out. Leader election doesn't take > place and the shard goes down. > This affects the ability to move a micro-shard from one Solr instance to > another Solr instance. > The problem does not occur 100% of the time but a large % of the time. > To set up a test, start up Solr Cloud with a single shard. Add cores to that > shard as replicas using core admin. Then unload the leader core using core > admin. -- This message is automatically generated by JIRA. 
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1955 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1955/ Java: 32bit/jdk1.7.0_07 -client -XX:+UseParallelGC All tests passed Build Log: [...truncated 24576 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919: exec returned: 1 Total time: 28 minutes 8 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_07 -client -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField
[ https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482765#comment-13482765 ] Hoss Man commented on SOLR-3981: Toke suggested in SOLR-3875... {quote} One solution would be to keep track of used fields (directly specified as well as copyFields) and only assign the full boost once per document. If the number of unique fields/document is low, a simple list would probably be the fastest and with low GC impact. For a higher number of unique fields, a Set might be better. An optimization would be to only create the tracking structure once a boost != 1.0f is encountered and only store the fields with boost != 1.0f, so that an update without boosts would not get a performance penalty. {quote} I _was_ thinking that a more straightforward solution would be to build up the entire "Document" w/o any regard to the docBoost, and then only at the end loop over the fields in that Document and multiply in the docBoost if it's indexed & !omitNorms -- but then i realized that at that level there is no general way to "set" the boost. I'm working on a patch with a test demonstrating the problem ... that may help inform an appropriate solution. > docBoost is compounded on copyField > --- > > Key: SOLR-3981 > URL: https://issues.apache.org/jira/browse/SOLR-3981 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.1 > > > As noted by Toke in a comment on SOLR-3875... > https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233 > {quote} > While boosting of multi-value fields is handled correctly in Solr 4.0.0, > boosting for copyFields is not. A sample document: > {code} > > Insane score Example. 
Score = 10E9 > Document boost broken for copyFields > video ThomasEgense and Toke Eskildsen > Test > bug > something else > bug > bug > > {code} > The fields name, manu, cat, features, keywords and content gets copied to > text and a search for thomasegense matches the text-field with query > explanation > {code} > 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result > of: > 70384.67 = fieldWeight in 0, product of: > 1.0 = tf(freq=1.0), with freq of: > 1.0 = termFreq=1.0 > 0.30685282 = idf(docFreq=1, maxDocs=1) > 229376.0 = fieldNorm(doc=0) > {code} > If the two last fields keywords and content are removed from the sample > document, the score is reduced by a factor 100 (docBoost^2). > {quote} > (This is a continuation of some of the problems caused by the changes made > when the concept of docBoost was eliminated from the underly IndexWRiter > code, and overlooked due to the lack of testing of docBoosts at the solr > level - SOLR-3885)) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3875) Document boost does not work correctly when using multi-valued fields
[ https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482746#comment-13482746 ] Hoss Man commented on SOLR-3875: Toke: thanks for following up - too bad we didn't catch this other problem before 4.0. I've spun off SOLR-3981 to work on this since SOLR-3875 is already resolved and listed as fixed in 4.0 (we can't (sanely) re-open issues that were recorded in CHANGES.txt for official releases since it would leave users confused as to what parts of those issues were resolved in each version) > Document boost does not work correctly when using multi-valued fields > - > > Key: SOLR-3875 > URL: https://issues.apache.org/jira/browse/SOLR-3875 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis, update >Affects Versions: 4.0-BETA >Reporter: Toke Eskildsen >Assignee: Hoss Man >Priority: Critical > Fix For: 4.0, 4.1, 5.0 > > Attachments: SOLR-3875.patch > > > In Solr 4 BETA & trunk, document boosts skews the ranking for documents with > multi value fields tremendously. A document boost of 5 combined with 15 > values in a multi value field results in scores above 1,000,000,000, while a > boost of 0,5 results in scores below 0,001. The error is not present in Solr > 3.6. > Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder > committed 20110827 (@1162347) by Mike McCandless, as part of work done on > LUCENE-2308. The problem is that Lucene multiplies the boosts of multiple > instances of the same field when updating the index. > The old DocumentBuilder, used in Lucene 3.6, handled this by calculating the > score for the field (docBoost*fieldBoost) and assigning it to the first > instance of the field, then setting the boost to 1.0f and assigning that to > subsequent instances of the field. This effectively assigned > docBoost*fieldBoost to the field, regardless of the number of instances. 
> The updated DocumentBuilder (see > https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup), > used in Lucene 4 BETA & trunk, also assigns docBoost*fieldBoost to the first > instance of the field. Then it sets fieldBoost = docBoost and continues to > assign docBoost*fieldBoost to subsequent instances. Using the example > mentioned above, the generated IndexableFields will get assigned boosts of 5, > 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances of the > same field will have a collective boost of 5*25^14. > This can be demonstrated with the Solr tutorial example by indexing the > sample documents and adding the document > {code:xml} > > > Insane score Example. Score = 10E9 > Document boost broken for multivalued fields > Thomas Egense and Toke Eskildsen > Test > bug > insane_boost > something else > something else > something else > something else > something else > something else > something else > something else > something else > something else > something else > something else > something else > > > {code} > The _manu_ & _features_-fields gets copied to _text_ and a search for > _thomas_ matches the _text_-field with query explanation > {code:xml} > > 2.44373361E10 = (MATCH) weight(text:thomas in 0) [DefaultSimilarity], result > of: > 2.44373361E10 = fieldWeight in 0, product of: > 1.0 = tf(freq=1.0), with freq of: > 1.0 = termFreq=1.0 > 3.2512918 = idf(docFreq=3, maxDocs=38) > 7.5161928E9 = fieldNorm(doc=0) > > {code} > Thomas and I are too pressed for time to attempt a proper patch at the > moment, but we guess that a reversion to the old algorithm of assigning the > combined boost to the first instance and 1.0f to all subsequent instances > would work? -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
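The boost arithmetic in the SOLR-3875 description can be reproduced with a short Java sketch. It only models the per-instance boosts and their product as the report describes them; it is not the DocumentBuilder code, it ignores length normalization and norm encoding, and the class and method names are hypothetical. A fieldBoost of 1.0 is assumed, since the report only gives the docBoost of 5.

```java
class CompoundedBoostSketch {
    /**
     * Behaviour described for the 4.0-BETA DocumentBuilder: the first instance gets
     * docBoost*fieldBoost, then fieldBoost is overwritten with docBoost, so every
     * later instance gets docBoost*docBoost. Lucene multiplies all of them together.
     */
    static double compoundedBoost(double docBoost, double fieldBoost, int instances) {
        double product = 1.0;
        double currentFieldBoost = fieldBoost;
        for (int i = 0; i < instances; i++) {
            product *= docBoost * currentFieldBoost;
            currentFieldBoost = docBoost; // the step the report identifies as the problem
        }
        return product;
    }

    /**
     * Behaviour described for the old (3.6) DocumentBuilder: the combined boost goes
     * on the first instance and 1.0f on every later one.
     */
    static double firstInstanceOnlyBoost(double docBoost, double fieldBoost, int instances) {
        double product = 1.0;
        for (int i = 0; i < instances; i++) {
            product *= (i == 0) ? docBoost * fieldBoost : 1.0;
        }
        return product;
    }

    public static void main(String[] args) {
        // docBoost 5, 15 values in one multi-valued field, as in the report.
        System.out.println(compoundedBoost(5.0, 1.0, 15));        // about 5 * 25^14
        System.out.println(firstInstanceOnlyBoost(5.0, 1.0, 15)); // stays at 5.0
    }
}
```

For three instances the compounded product is 5 * 25 * 25 = 3125, while the old scheme stays at 5 regardless of the number of instances.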
[jira] [Updated] (LUCENE-4501) optimize 4.1 codec's encoding of frequencies
[ https://issues.apache.org/jira/browse/LUCENE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4501: Attachment: LUCENE-4501.patch This seems to shave about 1.5% off the .doc file in my tests. I'm worried about making this PF confusing with these optimizations though. > optimize 4.1 codec's encoding of frequencies > > > Key: LUCENE-4501 > URL: https://issues.apache.org/jira/browse/LUCENE-4501 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Robert Muir > Attachments: LUCENE-4501.patch > > > If we wanted, we could encode freq-1 into the FOR blocks (since it cannot be > 0) and save some space. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
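A rough Java sketch of why storing freq-1 can save space, assuming that a FOR block packs all of its values with the number of bits needed for the largest value in the block. It is not taken from LUCENE-4501.patch; the class `FreqMinusOneSketch` and its methods are hypothetical and no real Lucene API is used.

```java
class FreqMinusOneSketch {
    /** Bits needed to represent a non-negative value; 0 needs no bits at all. */
    static int bitsRequired(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    /** Bits per value for a block, driven by the largest value in it. */
    static int bitsPerValue(int[] block) {
        int max = 0;
        for (int v : block) {
            max = Math.max(max, v);
        }
        return bitsRequired(max);
    }

    /** Stores freq-1, which is valid because a term frequency is never 0. */
    static int[] encode(int[] freqs) {
        int[] stored = new int[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] < 1) throw new IllegalArgumentException("freq must be >= 1");
            stored[i] = freqs[i] - 1;
        }
        return stored;
    }

    /** Restores the original frequencies by adding 1 back. */
    static int[] decode(int[] stored) {
        int[] freqs = new int[stored.length];
        for (int i = 0; i < stored.length; i++) {
            freqs[i] = stored[i] + 1;
        }
        return freqs;
    }

    /** True if decode(encode(freqs)) gives back the input unchanged. */
    static boolean roundTrips(int[] freqs) {
        int[] back = decode(encode(freqs));
        for (int i = 0; i < freqs.length; i++) {
            if (back[i] != freqs[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 2, 4, 1};
        System.out.println(bitsPerValue(freqs));         // bits per value as-is
        System.out.println(bitsPerValue(encode(freqs))); // bits per value after freq-1
    }
}
```

The saving only appears when the largest frequency in a block is a power of two (for example a block where every freq is 1 drops to zero bits), which may be one reason the measured gain on the .doc file is small.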
[jira] [Created] (LUCENE-4501) optimize 4.1 codec's encoding of frequencies
Robert Muir created LUCENE-4501: --- Summary: optimize 4.1 codec's encoding of frequencies Key: LUCENE-4501 URL: https://issues.apache.org/jira/browse/LUCENE-4501 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir If we wanted, we could encode freq-1 into the FOR blocks (since it cannot be 0) and save some space. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3981) docBoost is compounded on copyField
Hoss Man created SOLR-3981: -- Summary: docBoost is compounded on copyField Key: SOLR-3981 URL: https://issues.apache.org/jira/browse/SOLR-3981 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.1 As noted by Toke in a comment on SOLR-3875... https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233 {quote} While boosting of multi-value fields is handled correctly in Solr 4.0.0, boosting for copyFields are not. A sample document: {code} Insane score Example. Score = 10E9 Document boost broken for copyFields video ThomasEgense and Toke Eskildsen Test bug something else bug bug {code} The fields name, manu, cat, features, keywords and content gets copied to text and a search for thomasegense matches the text-field with query explanation {code} 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of: 70384.67 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 0.30685282 = idf(docFreq=1, maxDocs=1) 229376.0 = fieldNorm(doc=0) {code} If the two last fields keywords and content are removed from the sample document, the score is reduced by a factor 100 (docBoost^2). {quote} (This is a continuation of some of the problems caused by the changes made when the concept of docBoost was eliminated from the underly IndexWRiter code, and overlooked due to the lack of testing of docBoosts at the solr level - SOLR-3885)) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482734#comment-13482734 ] Chris Russell commented on SOLR-2894: - In my experience using this patch, it seems that it does not over-request when enforcing a limit? This is problematic because, for example, in a situation where you have many slaves and you are pivoting on a fairly evenly distributed field and setting your facet limit to X, the Xth distinct value for that field by document count on each slave is likely to be different. The result is that some facet values close to your limit boundary will not get reported for aggregation, which will make your ultimate results somewhat inaccurate. It was my impression that other facet-based features of solr over-request when there is a limit to combat this situation? For example if you specify limit 10, the distributed query might have limit 100 or 1000, and then during aggregation it would be limited to the top 10. I am working on similar functionality for this patch. > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher > Fix For: 4.1 > > Attachments: distributed_pivot.patch, distributed_pivot.patch, > SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
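The effect described in the comment above can be illustrated with a small, self-contained Java sketch. It does not use any Solr API; the shard contents, the class `OverRequestSketch` and its methods are made up for illustration, and ties are broken by value name only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OverRequestSketch {
    /** Top 'limit' entries by count (descending), ties broken by value name. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    /** Asks every shard for its top 'shardLimit' values, merges them, returns the winner. */
    static String topValue(List<Map<String, Integer>> shards, int shardLimit) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : top(shard, shardLimit)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return top(merged, 1).get(0).getKey(); // assumes at least one value was reported
    }

    /** Two made-up shards: "b" is second on each shard but first overall (9 + 9 = 18). */
    static List<Map<String, Integer>> exampleShards() {
        Map<String, Integer> shardA = new HashMap<>();
        shardA.put("a", 10);
        shardA.put("b", 9);
        Map<String, Integer> shardB = new HashMap<>();
        shardB.put("c", 10);
        shardB.put("b", 9);
        List<Map<String, Integer>> shards = new ArrayList<>();
        shards.add(shardA);
        shards.add(shardB);
        return shards;
    }

    public static void main(String[] args) {
        System.out.println(topValue(exampleShards(), 1)); // per-shard limit equals user limit
        System.out.println(topValue(exampleShards(), 2)); // over-requesting one extra value
    }
}
```

With a per-shard limit of 1 the globally most frequent value "b" is never reported by either shard, so the merged answer is wrong; requesting one extra value per shard is enough to recover it in this toy case.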
[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets
[ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482727#comment-13482727 ] Terrance A. Snyder commented on SOLR-3583: -- [~selah] Please do! Your contribution is amazing and pushes SOLR into a brave new world. > Percentiles for facets, pivot facets, and distributed pivot facets > -- > > Key: SOLR-3583 > URL: https://issues.apache.org/jira/browse/SOLR-3583 > Project: Solr > Issue Type: Improvement >Reporter: Chris Russell >Priority: Minor > Labels: newbie, patch > Fix For: 4.1 > > Attachments: SOLR-3583.patch > > > Built on top of SOLR-2894 (includes Apr 25th version) this patch adds > percentiles and averages to facets, pivot facets, and distributed pivot > facets by making use of range facet internals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455182#comment-13455182 ] Chris Russell edited comment on SOLR-2894 at 10/23/12 9:45 PM: --- Regarding facet.pivot.limit.method and facet.limit, it looks like these are not checked on a per-field basis? So, if a user sets different limits for different fields and wants 'combined' limiting, that is not possible? For example a user might set: f.field1.facet.limit=10 f.field1.facet.pivot.limit.method=combined f.field2.facet.limit=20 And the combined method will not be used... If the user sets facet.pivot.limit.method=combined it looks like the same limit will be used for all fields? Whatever the global facet.limit is set to? was (Author: selah): Regarding facet.pivot.limit.method and facet.limit, it looks like these are not checked on a per-field basis? So, if a user sets different limits for different fields and wants 'combined' limiting, that is not possible? For example a user might set: f.field1.facet.limit=10 f.field1.facet.pivot.limit.method=combined f.field2.facet.limit=20 And the combined method will not be used... If the user sets facet.pivot.limit.method=combined it looks like the same limit will be used for all fields? Whatever the global facet.limit is set to? Unfortunate. > Implement distributed pivot faceting > > > Key: SOLR-2894 > URL: https://issues.apache.org/jira/browse/SOLR-2894 > Project: Solr > Issue Type: Improvement >Reporter: Erik Hatcher > Fix For: 4.1 > > Attachments: distributed_pivot.patch, distributed_pivot.patch, > SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch > > > Following up on SOLR-792, pivot faceting currently only supports > undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets
[ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482725#comment-13482725 ] Chris Russell commented on SOLR-3583: - I have gotten some time recently to work on this. I have disentangled my additions from the SOLR-2894 patch, and will be making a few enhancements before attempting to make it trunk-compatible. > Percentiles for facets, pivot facets, and distributed pivot facets > -- > > Key: SOLR-3583 > URL: https://issues.apache.org/jira/browse/SOLR-3583 > Project: Solr > Issue Type: Improvement >Reporter: Chris Russell >Priority: Minor > Labels: newbie, patch > Fix For: 4.1 > > Attachments: SOLR-3583.patch > > > Built on top of SOLR-2894 (includes Apr 25th version) this patch adds > percentiles and averages to facets, pivot facets, and distributed pivot > facets by making use of range facet internals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully
[ https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482689#comment-13482689 ] Mark Miller commented on SOLR-3964: --- bq. Solr does not return error, This is a limitation of the current collections API - you don't currently get a response - it just drops the create command on the queue where the overseer will pull it. Optionally waiting around for completion or adding some way to check the status of the cmd is something we need to add. > Solr does not return error, even though create collection unsuccessfully > - > > Key: SOLR-3964 > URL: https://issues.apache.org/jira/browse/SOLR-3964 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: milesli > Labels: lack, message, response > Original Estimate: 6h > Remaining Estimate: 6h > > Solr does not return error, > even though create/delete collection unsuccessfully; > even though the request URL is incorrect; > (example: > http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_milesnumShards=3&numReplicas=2&collection.configName=myconf) > even though pass the collection name already exists; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1953 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1953/ Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC All tests passed Build Log: [...truncated 24575 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919: exec returned: 1 Total time: 28 minutes 46 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1401343 - /lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java
Just peeking at the code, Tommaso -- Map classCounts = new HashMap(); this will cause unpredictable results in case of class ties (hash map order varies from vm to vm). I think it'd be better to make it an ordered set (on the class name for example), then the ties would always be broken in the same way. Alternatively, you can add another condition in the if loop searching for the winning class (and resolving the ties). Just a thought. D. On Tue, Oct 23, 2012 at 6:32 PM, wrote: > Author: tommaso > Date: Tue Oct 23 16:32:05 2012 > New Revision: 1401343 > > URL: http://svn.apache.org/viewvc?rev=1401343&view=rev > Log: > [LUCENE-4345] - adding @lucene.experimental annotation to kNN > > Modified: > > lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java > > Modified: > lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java > URL: > http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java?rev=1401343&r1=1401342&r2=1401343&view=diff > == > --- > lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java > (original) > +++ > lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java > Tue Oct 23 16:32:05 2012 > @@ -33,6 +33,7 @@ import java.util.Map; > /** > * A k-Nearest Neighbor classifier (see > http://en.wikipedia.org/wiki/K-nearest_neighbors) based > * on {@link MoreLikeThis} > + * @lucene.experimental > */ > public class KNearestNeighborClassifier implements Classifier { > > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
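The tie-breaking suggestion above can be sketched roughly as follows. This is not the committed KNearestNeighborClassifier code: the class name TieBreakSketch, the method majorityClass and the String/Integer types are assumptions made for illustration. It swaps the HashMap for a TreeMap so that classes are visited in name order, and uses a strict comparison so that ties always go to the same class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Lucene classification module.
class TieBreakSketch {

    /** Returns the most frequent label; ties go to the lexicographically smallest label. */
    static String majorityClass(List<String> neighborLabels) {
        // TreeMap iterates keys in sorted order, so the iteration order is the same on every JVM.
        Map<String, Integer> classCounts = new TreeMap<>();
        for (String label : neighborLabels) {
            Integer count = classCounts.get(label);
            classCounts.put(label, count == null ? 1 : count + 1);
        }
        String winner = null;
        int best = 0;
        for (Map.Entry<String, Integer> entry : classCounts.entrySet()) {
            // Strict '>' keeps the earlier (smaller) label when two counts are equal.
            if (entry.getValue() > best) {
                best = entry.getValue();
                winner = entry.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Two votes each: the tie is resolved by name order, independent of hash ordering.
        System.out.println(majorityClass(Arrays.asList("sports", "politics", "sports", "politics")));
    }
}
```

Any fixed ordering would do; name order is simply the one mentioned in the mail.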
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482647#comment-13482647 ] Alan Woodward commented on SOLR-1972: - Hm, OK. I'm creating the various Metrics objects in the base class constructor, and registering them by class using this.getClass(). Only problem here is that in a super constructor, getClass() returns the superclass. Oops. If I move the object creation to init() I get other errors, because RequestHandlers are registered with JMX before init() is called, and JMX calls getStatistics() to get all the various measurement names and register them. Maybe put a guard in getStatistics to check if the counters are null, and if they are, instantiate them? Seems a bit hacky though. Let me have a think about this. In re the precision of the measurements, the jscript in the front end could presumably round them to 2 sig figs - that way they look prettier in the UI, but are still precise for any client that wants to use it. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. 
This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
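The request in the issue description (keep a bounded number of recent data points and report median, 95th and 99th percentile) could look something like the sketch below. It is not taken from any of the attached patches: the class RecentRequestTimes, its methods and the nearest-rank percentile rule are all assumptions chosen to keep the example small.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the code from the SOLR-1972 patches.
class RecentRequestTimes {
    private final double[] window; // ring buffer holding the most recent samples
    private int next = 0;          // slot that the next sample overwrites
    private int size = 0;          // number of valid samples, at most window.length

    RecentRequestTimes(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        window = new double[capacity];
    }

    /** Records one request time, evicting the oldest sample once the window is full. */
    synchronized void record(double millis) {
        window[next] = millis;
        next = (next + 1) % window.length;
        if (size < window.length) {
            size++;
        }
    }

    /** Nearest-rank percentile over the retained samples, for p in (0, 100]. */
    synchronized double percentile(double p) {
        if (size == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.copyOf(window, size);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * size);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        RecentRequestTimes times = new RecentRequestTimes(4);
        // One slow warm-up query followed by four fast ones; the slow one falls out of the window.
        for (double t : new double[] {5000, 10, 20, 30, 40}) {
            times.record(t);
        }
        System.out.println("median=" + times.percentile(50) + " p95=" + times.percentile(95));
    }
}
```

The capacity plays the role of the configurable count mentioned in the description (1024 or 4096 there); sorting a copy on every read is the simplest choice and would likely need revisiting for large windows.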
[jira] [Resolved] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4498. - Resolution: Fixed Fix Version/s: 5.0 4.1 > pulse docfreq=1 DOCS_ONLY for 4.1 codec > --- > > Key: LUCENE-4498 > URL: https://issues.apache.org/jira/browse/LUCENE-4498 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Robert Muir > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, > LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch > > > We have pulsing codec, but currently this has some downsides: > * its very general, wrapping an arbitrary postingsformat and pulsing > everything in the postings for an arbitrary docfreq/totalTermFreq cutoff > * reuse is hairy: because it specializes its enums based on these cutoffs, > when walking thru terms e.g. merging there is a lot of sophisticated stuff to > avoid the worst cases where we clone indexinputs for tons of terms. > On the other hand the way the 4.1 codec encodes "primary key" fields is > pretty silly, we write the docStartFP vlong in the term dictionary metadata, > which tells us where to seek in the .doc to read our one lonely vint. > I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just > write the lone doc delta where we would write docStartFP. > We can avoid the hairy reuse problem too, by just supporting this in > refillDocs() in BlockDocsEnum instead of specializing. > This would remove the additional seek for "primary key" fields without really > any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully
[ https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482575#comment-13482575 ]

Hoss Man commented on SOLR-3964:

can you please elaborate on what problem you are seeing. specifically:
1) how are you running solr and what do your configs look like (ie: it appears you are running in cloud mode, but that's not certain)
2) what commands/requests do you execute that don't behave the way you expect
3) what response do you expect from those commands/requests
4) what response do you _actually_ get from those commands/requests

off the cuff i suspect that unless you made a cut/paste mistake when creating this issue, the problem you are having is that you are missing a "&" in your URL, and what solr is doing is creating a collection with the name "tenancy_milesnumShards=3" when what you really want is a collection named "tenancy_miles" (which you imply already exists, but haven't provided any concrete details for us to be certain)

> Solr does not return error, even though create collection unsuccessfully
> -
>
> Key: SOLR-3964
> URL: https://issues.apache.org/jira/browse/SOLR-3964
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
>Affects Versions: 4.0
>Reporter: milesli
> Labels: lack, message, response
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> Solr does not return error,
> even though create/delete collection unsuccessfully;
> even though the request URL is incorrect;
> (example:
> http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_milesnumShards=3&numReplicas=2&collection.configName=myconf)
> even though pass the collection name already exists;
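The effect of the missing "&" can be seen by splitting the reporter's query string by hand. The sketch below is a plain-Java illustration, not Solr's own request parsing: the class QueryStringSketch and its parse method are made up for this example, and URL decoding is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; Solr's real parameter handling is more involved.
class QueryStringSketch {

    /** Splits a raw query string into name/value pairs without URL decoding. */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('='); // only the first '=' separates name from value
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "action=CREATE&name=tenancy_milesnumShards=3"
                + "&numReplicas=2&collection.configName=myconf";
        Map<String, String> params = parse(query);
        // The name swallows the next parameter because nothing separates them.
        System.out.println("name      = " + params.get("name"));
        System.out.println("numShards = " + params.get("numShards"));
    }
}
```

With the URL from the report, the name parameter comes out as tenancy_milesnumShards=3 and no numShards parameter is present at all, which matches the suspicion in the comment.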
Re: branches/lucene_solr_4_0 and 4.0.1?
On Tue, Oct 23, 2012 at 2:10 PM, Mark Miller wrote: > I'd say two things: > > there are def some bad bugs already that would warrant a 4.01. > > I'd push for 4.1 well before jan. +1 I'd add that just because there hasn't been a lot of time to find additional bugs in 4.0 doesn't mean that we should artificially delay a 4.0.1. If/when more bugs are found after that, we can always do a 4.0.2 (if 4.1 still isn't imminent). -Yonik http://lucidworks.com > - Mark > > Sent from my iPhone > > On Oct 19, 2012, at 6:57 AM, Erick Erickson wrote: > >> Personally, I suspect that enough people are going to hop on the 4.0 >> code that _something_ will come bubbling up out of the cracks that >> needs to be addressed. I mean there's a lot that's in that release, plus >> things that people are geeked to try. Not necessarily killer bugs, more >> like enhancements. >> >> So I'm rather expecting a relatively quick turn-around for 4.1 and wouldn't >> push for a 4.0.1 unless and until there's a killer bug. Which, as Robert >> says, there aren't any examples of in the CHANGES file yet, so no reason >> for a 4.0.1. >> >> I'll throw out a straw-man proposal of targeting January for 4.1. Not a hard >> date, more a proposal for taking stock after the Holidays and seeing what >> we think. >> >> Besides, even though I don't hava a hand in it, is such a pain, especially >> for people who'd rather be coding >> >> Erick >> >> On Thu, Oct 18, 2012 at 7:58 PM, Robert Muir wrote: >>> On Thu, Oct 18, 2012 at 4:53 PM, Mark Miller wrote: I don't think a 4.0.1 would be strange at all. >>> >>> I just think it would be strange since there aren't really any serious >>> bugs yet in the lucene CHANGES.txt? I also don't think there has been >>> enough time for anyone to actually find any bugs, its only been like 6 >>> days since we released. >>> 4.X is essentially trunk to me now. I would put in changes that I want to bake for future 4.1, 4.2, 4.3, etc changes. 
>>> >>> Sure, well there aren't many architectural changes yet since 4.0, and >>> currently we have the ability to make and bake large changes to lucene >>> in many cases (block postings format, compressed stored fields, etc) >>> without introducing risk, since they are just experimental until we >>> decide to fold them into the default. >>> >>> But personally as soon I hit some limit in the codec API (which I >>> expect will happen), or want to work on something biggish like >>> positions iterators, I'll be looking at doing that kind of breaking >>> change only in trunk. >>> >>> I just think we shouldn't hold back from that: we should develop in a >>> correct and safe way and not backport scary stuff or majorly break >>> APIs to get them out faster, instead 4.x should stay stable and we >>> should plan on 5.x being in our own lifetimes. >>> >>> i dont want there to be the assumption that 5.0 is 3 years out. >>> When you have bad bugs, you don't want to worry about what's baking - you just want to put out a bug fix release. >>> >>> I totally agree with this! But I have serious concerns about the >>> ability for this community to say "hey we fixed some nasty shit, lets >>> get a bugfix out ASAP". Nobody is really testing until release >>> candidates are issued, the 72-hour voting period designed to be fair >>> to devs in different timezones is bastardized as some iterative QA >>> cycle, etc etc. 
>>> >>> So if we are going to go thru all the trouble, I'd rather it be a 4.1
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1951 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1951/ Java: 32bit/jdk1.7.0_07 -client -XX:+UseSerialGC All tests passed Build Log: [...truncated 24575 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919: exec returned: 1 Total time: 30 minutes 39 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_07 -client -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: branches/lucene_solr_4_0 and 4.0.1?
I'd say two things: there are def some bad bugs already that would warrant a 4.01. I'd push for 4.1 well before jan. - Mark Sent from my iPhone On Oct 19, 2012, at 6:57 AM, Erick Erickson wrote: > Personally, I suspect that enough people are going to hop on the 4.0 > code that _something_ will come bubbling up out of the cracks that > needs to be addressed. I mean there's a lot that's in that release, plus > things that people are geeked to try. Not necessarily killer bugs, more > like enhancements. > > So I'm rather expecting a relatively quick turn-around for 4.1 and wouldn't > push for a 4.0.1 unless and until there's a killer bug. Which, as Robert > says, there aren't any examples of in the CHANGES file yet, so no reason > for a 4.0.1. > > I'll throw out a straw-man proposal of targeting January for 4.1. Not a hard > date, more a proposal for taking stock after the Holidays and seeing what > we think. > > Besides, even though I don't hava a hand in it, is such a pain, especially > for people who'd rather be coding > > Erick > > On Thu, Oct 18, 2012 at 7:58 PM, Robert Muir wrote: >> On Thu, Oct 18, 2012 at 4:53 PM, Mark Miller wrote: >>> I don't think a 4.0.1 would be strange at all. >> >> I just think it would be strange since there aren't really any serious >> bugs yet in the lucene CHANGES.txt? I also don't think there has been >> enough time for anyone to actually find any bugs, its only been like 6 >> days since we released. >> >>> >>> 4.X is essentially trunk to me now. I would put in changes that I want >>> to bake for future 4.1, 4.2, 4.3, etc changes. >> >> Sure, well there aren't many architectural changes yet since 4.0, and >> currently we have the ability to make and bake large changes to lucene >> in many cases (block postings format, compressed stored fields, etc) >> without introducing risk, since they are just experimental until we >> decide to fold them into the default. 
>> >> But personally as soon I hit some limit in the codec API (which I >> expect will happen), or want to work on something biggish like >> positions iterators, I'll be looking at doing that kind of breaking >> change only in trunk. >> >> I just think we shouldn't hold back from that: we should develop in a >> correct and safe way and not backport scary stuff or majorly break >> APIs to get them out faster, instead 4.x should stay stable and we >> should plan on 5.x being in our own lifetimes. >> >> i dont want there to be the assumption that 5.0 is 3 years out. >> >>> >>> When you have bad bugs, you don't want to worry about what's baking - >>> you just want to put out a bug fix release. >> >> I totally agree with this! But I have serious concerns about the >> ability for this community to say "hey we fixed some nasty shit, lets >> get a bugfix out ASAP". Nobody is really testing until release >> candidates are issued, the 72-hour voting period designed to be fair >> to devs in different timezones is bastardized as some iterative QA >> cycle, etc etc. >> >> So if we are going to go thru all the trouble, I'd rather it be a 4.1 >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3335 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3335/ All tests passed Build Log: [...truncated 24628 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] Missing javadocs were found! BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:60: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:252: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1919: exec returned: 1 Total time: 46 minutes 30 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501 ] Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:48 PM: -- I have now discovered a real problem. All of my search handlers have the exact same statistics. I have defined /lbcheck, /ncdismax, /search, and /select. That brings to mind a potential test that could be added -- make sure that when you issue queries against one handler, stats on other handlers do not see their numbers go up. No idea how to write it, though. was (Author: elyograg): I have now discovered a real problem. All of my search handlers have the exact same statistics. I have defined /lbcheck, /ncdismax, /search, and /select. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. 
I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
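The test idea in the comment above (query one handler, check that the other handlers' numbers do not move) might take roughly this shape. Everything here is hypothetical: FakeHandler is a stand-in rather than a Solr class, and a real test would go through Solr's own test harness instead. The counter is passed in so that the shared-statistics symptom can be reproduced on purpose.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; FakeHandler is not a Solr request handler.
class HandlerStatsIsolationSketch {

    /** Stand-in for a handler whose request counter is supplied by the caller. */
    static class FakeHandler {
        private final AtomicLong requests;

        FakeHandler(AtomicLong requests) {
            this.requests = requests;
        }

        void handleRequest() {
            requests.incrementAndGet();
        }

        long requestCount() {
            return requests.get();
        }
    }

    /** Sends n requests to target and reports whether every other handler's count stayed put. */
    static boolean othersUnaffected(FakeHandler target, int n, FakeHandler... others) {
        long[] before = new long[others.length];
        for (int i = 0; i < others.length; i++) {
            before[i] = others[i].requestCount();
        }
        for (int i = 0; i < n; i++) {
            target.handleRequest();
        }
        for (int i = 0; i < others.length; i++) {
            if (others[i].requestCount() != before[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FakeHandler search = new FakeHandler(new AtomicLong());
        FakeHandler select = new FakeHandler(new AtomicLong());
        System.out.println("separate counters isolated: " + othersUnaffected(search, 5, select));

        AtomicLong shared = new AtomicLong();
        FakeHandler a = new FakeHandler(shared);
        FakeHandler b = new FakeHandler(shared);
        System.out.println("shared counter isolated: " + othersUnaffected(a, 5, b));
    }
}
```

The second case deliberately wires two handlers to one counter, mimicking the "all handlers show the exact same statistics" symptom; the check fails for it and passes for independent counters.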
[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501 ] Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:40 PM: -- I have now discovered a real problem. All of my search handlers have the exact same statistics. I have defined /lbcheck, /ncdismax, /search, and /select. was (Author: elyograg): I have now discovered what a real problem. All of my search handlers have the exact same statistics. I have defined /lbcheck, /ncdismax, /search, and /select. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. 
> The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501 ] Shawn Heisey commented on SOLR-1972: I have now discovered what a real problem. All of my search handlers have the exact same statistics. I have defined /lbcheck, /ncdismax, /search, and /select. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Commented] (LUCENE-4494) Add phoenetic algorithm Match Rating approach to lucene
[ https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482476#comment-13482476 ] Ryan McKinley commented on LUCENE-4494: --- This looks great -- though it seems like the more appropriate home is in commons codec: http://commons.apache.org/codec/api-release/org/apache/commons/codec/language/package-summary.html > Add phoenetic algorithm Match Rating approach to lucene > --- > > Key: LUCENE-4494 > URL: https://issues.apache.org/jira/browse/LUCENE-4494 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 4.0-ALPHA >Reporter: Colm Rice >Priority: Minor > Fix For: 4.1 > > Attachments: LUCENE-4494.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > I want to add MatchRatingApproach algorithm to the Lucene project. > What I have at the moment is a class called > org.apache.lucene.analysis.phoenetic.MatchRatingApproach implementing > StringEncoder > I have a pretty comprehensive test file located at: > org.apache.lucene.analysis.phonetic.MatchRatingApproachTests > It's not exactly existing pattern so I'm going to need a bit of advice here. > Thanks! Feel free to email. > FYI: It my first contribitution so be gentle :-) C# is my native. > Reference: http://en.wikipedia.org/wiki/Match_rating_approach -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482475#comment-13482475 ] Shawn Heisey commented on SOLR-1972: I would also move avgTimePerRequest to right before medianRequestTime so similar numbers are all together. I was going to suggest giving it a new name to jive with the additions, but there are likely a lot of existing customer scripts that rely on that name -- including some of mine. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. 
I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1950 - Failure!
I committed a fix: classification module needs to link to queries javadocs. But there are more problems, 'ant documentation-lint' still fails: [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] missing Constructors: KNearestNeighborClassifier(int) [exec] [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html [exec] missing Constructors: ClassificationResult(java.lang.String, double) [exec] missing Methods: getAssignedClass() [exec] missing Methods: getScore() [exec] [exec] Missing javadocs were found! BUILD FAILED On Tue, Oct 23, 2012 at 1:07 PM, Policeman Jenkins Server wrote: > Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1950/ > Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC > > All tests passed > > Build Log: > [...truncated 23898 lines...] > -documentation-lint: > [echo] Checking for broken links... > [exec] > [exec] Crawl/parse... > [exec] > [exec] Verify... > [exec] > [exec] > file:///build/docs/classification/org/apache/lucene/classification/class-use/Classifier.html > [exec] BROKEN LINK: > file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html > [exec] > [exec] > file:///build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html > [exec] BROKEN LINK: > file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html > [exec] > [exec] > file:///build/docs/classification/org/apache/lucene/classification/package-summary.html > [exec] BROKEN LINK: > file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html > [exec] > [exec] Broken javadocs links were found! 
> > BUILD FAILED > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The > following error occurred while executing this line: > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:235: The > following error occurred while executing this line: > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1908: > exec returned: 1 > > Total time: 35 minutes 35 seconds > Build step 'Invoke Ant' marked build as failure > Archiving artifacts > Recording test results > Description set: Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC > Email was triggered for: Failure > Sending email for trigger: Failure
[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482470#comment-13482470 ] Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:13 PM: -- This is lightyears beyond what I could have hoped for when I first opened this issue. I do have one more picky note: the extreme floating point precision of the output. Here's what I am getting: handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483.059463 avgTimePerRequest: 3820.2185330714287 avgRequestsPerSecond: 0.1592740558481214 5minRateReqsPerSecond: 0.6393213424767414 15minRateReqsPerSecond: 0.7422605686168207 medianRequestTime: 2537.401157 75thPcRequestTime: 7728.151086 95thPcRequestTime: 8963.867643 99thPcRequestTime: 8963.867643 999thPcRequestTime: 8963.867643 Here's what I would like to see instead ... lower precision, and rounded up when the first eliminated digit is 5 or higher. handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483 avgTimePerRequest: 3820.22 avgRequestsPerSecond: 0.16 5minRateReqsPerSecond: 0.64 15minRateReqsPerSecond: 0.74 medianRequestTime: 2537.40 75thPcRequestTime: 7728.15 95thPcRequestTime: 8963.87 99thPcRequestTime: 8963.87 999thPcRequestTime: 8963.87 was (Author: elyograg): This is lightyears beyond what I could have hoped for when I first opened this issue. I do have one more picky note: the extreme floating point precision of the output. Here's what I am getting: handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483.059463 avgTimePerRequest: 3820.2185330714287 avgRequestsPerSecond: 0.1592740558481214 5minRateReqsPerSecond: 0.6393213424767414 15minRateReqsPerSecond: 0.7422605686168207 medianRequestTime: 2537.401157 75thPcRequestTime: 7728.151086 95thPcRequestTime: 8963.867643 99thPcRequestTime: 8963.867643 999thPcRequestTime: 8963.867643 Here's what I would like to see instead ... 
lower precision, and rounded up when the first eliminated digit is 5 or higher. handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483 avgTimePerRequest: 3820 avgRequestsPerSecond: 0.16 5minRateReqsPerSecond: 0.64 15minRateReqsPerSecond: 0.74 medianRequestTime: 2537.40 75thPcRequestTime: 7728.15 95thPcRequestTime: 8963.87 99thPcRequestTime: 8963.87 999thPcRequestTime: 8963.87 > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482470#comment-13482470 ] Shawn Heisey commented on SOLR-1972: This is lightyears beyond what I could have hoped for when I first opened this issue. I do have one more picky note: the extreme floating point precision of the output. Here's what I am getting: handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483.059463 avgTimePerRequest: 3820.2185330714287 avgRequestsPerSecond: 0.1592740558481214 5minRateReqsPerSecond: 0.6393213424767414 15minRateReqsPerSecond: 0.7422605686168207 medianRequestTime: 2537.401157 75thPcRequestTime: 7728.151086 95thPcRequestTime: 8963.867643 99thPcRequestTime: 8963.867643 999thPcRequestTime: 8963.867643 Here's what I would like to see instead ... lower precision, and rounded up when the first eliminated digit is 5 or higher. handlerStart: 1351011529717 requests: 14 errors: 0 timeouts: 0 totalTime:53483 avgTimePerRequest: 3820 avgRequestsPerSecond: 0.16 5minRateReqsPerSecond: 0.64 15minRateReqsPerSecond: 0.74 medianRequestTime: 2537.40 75thPcRequestTime: 7728.15 95thPcRequestTime: 8963.87 99thPcRequestTime: 8963.87 999thPcRequestTime: 8963.87 > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. 
This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
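Editorial sketch of the rounding requested in the comment above: lower precision, rounded up when the first eliminated digit is 5 or higher. This is a minimal illustration in Python, not the formatting code in the patch; the function name `round_half_up` is an assumption. It uses the standard `decimal` module because Python's built-in `round()` rounds ties to the nearest even digit and works on binary floats, so it does not reliably give the half-up behaviour described.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places=2):
    """Round to `places` decimals, rounding up when the first dropped digit is >= 5."""
    quantum = Decimal(10) ** -places  # Decimal('0.01') when places == 2
    # Going through str() uses the shortest decimal text of the float,
    # so 0.125 is treated as exactly 0.125 rather than its binary expansion.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


avg_time = round_half_up(3820.2185330714287)       # avgTimePerRequest
avg_rate = round_half_up(0.1592740558481214)       # avgRequestsPerSecond
median_time = round_half_up(2537.401157)           # medianRequestTime
total_time = round_half_up(53483.059463, places=0)  # totalTime
```

The inputs are the values quoted in the comment; the results keep trailing zeros (for example `2537.40`), matching the requested output.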
[jira] [Updated] (SOLR-3979) slf4j bindings other than jdk -- cannot change log levels
[ https://issues.apache.org/jira/browse/SOLR-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-3979: Attachment: log4j-solr-stuff.zip Here are some implementations of the LogWatcher that work for log4j. These were included in the main distribution, but since they make for a weird compile/test classpath, they were removed. I think there is a vague plan to switch to log4j as the default provider, but there is no activity there... > slf4j bindings other than jdk -- cannot change log levels > - > > Key: SOLR-3979 > URL: https://issues.apache.org/jira/browse/SOLR-3979 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Shawn Heisey > Fix For: 4.1 > > Attachments: log4j-solr-stuff.zip > > > Once I finally got log4j logging working, I was slightly surprised by the > message related to SOLR-3426. I did not really consider that to be a big > deal, because if I want to look at my log, I'll be on the commandline anyway. > I was even more surprised to find that I cannot change any of the log levels > from the admin gui. My default log level is WARN for performance reasons, > but every once in a while I like to bump the log level to INFO to > troubleshoot a specific problem, then turn it back down. This is very easy > with jdk logging in either 3.x or 4.0. I changed to log4j because it easily > allows me to put the date of a log message on the same line as the first line > of the actual log message, so when I grep for things, I have the timestamp in > the grep output. > Currently the only way for me to change my log level is by updating > log4j.properties and restarting Solr. If the capability to figure this out > on a class-by-class basis isn't there with log4j, I would at least like to be > able to set the root logging level. Is that possible?
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1950 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1950/ Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC All tests passed Build Log: [...truncated 23898 lines...] -documentation-lint: [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [exec] [exec] file:///build/docs/classification/org/apache/lucene/classification/class-use/Classifier.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html [exec] [exec] file:///build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html [exec] [exec] file:///build/docs/classification/org/apache/lucene/classification/package-summary.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html [exec] [exec] Broken javadocs links were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:235: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1908: exec returned: 1 Total time: 35 minutes 35 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482459#comment-13482459 ] Alan Woodward commented on SOLR-1972: - Hey Stefan, yeah I've been thinking about graphs, pictures are always good. For this sort of information, though, I think you generally want time-series representations, and for that you want proper monitoring software (something like graphite). So I think the best thing we can do here is just expose the data, and let people plug in their own monitors. Unless you have a better idea for how to represent it? > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. 
The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
Re: Lucene build & ivy problems
Yes. That's what I thought. And that is how it should work. But that is not what happened. Ivy did not need to resolve anything, but it called out to a resolver which it could not see. After that, it called out over and over. I had to completely re-download everything. Since a full download worked, I think it boogered up its cache and then confused itself. Lance - Original Message - | From: "Uwe Schindler" | To: dev@lucene.apache.org | Sent: Monday, October 22, 2012 12:03:35 AM | Subject: RE: Lucene build & ivy problems | It only downloads on the first try, later builds never download | anything unless dependencies have changed. And if you would be able | to * not * download them, your build would not succeed.
[jira] [Resolved] (SOLR-3966) LangID not to log WARN
[ https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-3966. Resolution: Fixed Assignee: Hoss Man Thanks Markus Committed revision 1401340. - trunk Committed revision 1401341. - 4x > LangID not to log WARN > -- > > Key: SOLR-3966 > URL: https://issues.apache.org/jira/browse/SOLR-3966 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.0 >Reporter: Markus Jelsma >Assignee: Hoss Man > Fix For: 4.1, 5.0 > > Attachments: SOLR-3966-trunk-1.patch > > > The LangID UpdateProcessor emits the warning below for documents that do not > contain an input field. The level should go to DEBUG or be removed. It is not > uncommon to see a log full of these messages just because not all documents > contain all the fields we're mapping. > {code}Oct 19, 2012 11:23:43 AM > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process > WARNING: Document does not contain input field . Skipping > this{code}
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482448#comment-13482448 ] Tommaso Teofili commented on LUCENE-4345: - I've just committed some slight improvements to testing and a basic MLT-based kNearestNeighbor classifier (with a bunch of TODOs); comments are welcome :) > Create a Classification module > -- > > Key: LUCENE-4345 > URL: https://issues.apache.org/jira/browse/LUCENE-4345 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili >Priority: Minor > Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, > SOLR-3700_2.patch, SOLR-3700.patch > > > Lucene/Solr can host huge sets of documents containing lots of information in > fields so that these can be used as training examples (w/ features) in order > to very quickly create classifier algorithms to use on new documents and / > or to provide an additional service. > So the idea is to create a contrib module (called 'classification') to host a > ClassificationComponent that will use already seen data (the indexed > documents / fields) to classify new documents / text fragments. > The first version will contain a (simplistic) Lucene based Naive Bayes > classifier but more implementations should be added in the future.
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482431#comment-13482431 ] Stefan Matheis (steffkes) commented on SOLR-1972: - Hey, I applied the patch to see where the output goes and how it looks in the admin GUI. Right now, it's listed in the stats section, as a table with all the other given attributes. How helpful would it be to have some kind of graph here? Perhaps one like we already have on the dashboard to show the memory usage? Let me know what you think about it :) > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query. I don't know > if this is something Solr does already. It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482423#comment-13482423 ] Erick Erickson commented on SOLR-1293: -- Otis: I'm not sure I understand this. As I'm looking at this particular implementation, all the potential cores (configuration, data files, etc) are already on the particular node; it's just a matter of loading/unloading them. If you're thinking about SolrCloud/ZK, oh my aching head! I guess I'd propose that how this all works with ZK be split off to different tickets altogether; it's too much for me to deal with. I'm explicitly thinking of this as having no cluster-awareness; it's all local to a single Solr node. Any meta-level coordination on which node a particular query _should_ be routed to is assumed to be out of scope, at least for this version. That said, I can certainly see the value in what you're talking about; that's just not the use case I'm trying to address. > Support for large no:of cores and faster loading/unloading of cores > --- > > Key: SOLR-1293 > URL: https://issues.apache.org/jira/browse/SOLR-1293 > Project: Solr > Issue Type: New Feature > Components: multicore >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: SOLR-1293.patch > > > Solr , currently ,is not very suitable for a large no:of homogeneous cores > where you require fast/frequent loading/unloading of cores . usually a core > is required to be loaded just to fire a search query or to just index one > document > The requirements of such a system are. > * Very efficient loading of cores . Solr cannot afford to read and parse and > create Schema, SolrConfig Objects for each core each time the core has to be > loaded ( SOLR-919 , SOLR-920) > * START STOP core . Currently it is only possible to unload a core (SOLR-880) > * Automatic loading of cores .
If a core is present and it is not loaded and > a request comes for that load it automatically before serving up a request > * As there are a large no:of cores , all the cores cannot be kept loaded > always. There has to be an upper limit beyond which we need to unload a few > cores (probably the least recently used ones) > * Automatic allotment of dataDir for cores. If the no:of cores is too high all > the cores' dataDirs cannot live in the same dir. There is an upper limit on > the no:of dirs you can create in a unix dir w/o affecting performance
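Editorial sketch of two requirements from the issue description above: loading a core automatically on its first request, and unloading the least recently used cores once an upper limit is reached. This is a minimal illustration in Python, not Solr's implementation or the attached patch; `LruCoreCache`, `max_loaded`, `loader`, and `unloaded_log` are hypothetical names, and the "core" here is just whatever object the loader returns.

```python
from collections import OrderedDict


class LruCoreCache:
    """Hypothetical cache: lazily loads cores and evicts the least recently used."""

    def __init__(self, max_loaded, loader):
        self.max_loaded = max_loaded   # upper limit on simultaneously loaded cores
        self.loader = loader           # callable: core name -> loaded core object
        self.loaded = OrderedDict()    # ordered from least to most recently used
        self.unloaded_log = []         # names of evicted cores, oldest first

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name]
        core = self.loader(name)           # automatic load on first request
        self.loaded[name] = core
        while len(self.loaded) > self.max_loaded:
            evicted, _ = self.loaded.popitem(last=False)  # drop the LRU entry
            self.unloaded_log.append(evicted)
        return core


cache = LruCoreCache(max_loaded=2, loader=lambda name: {"name": name})
for requested in ("a", "b", "a", "c"):
    cache.get(requested)
```

After the four requests, core `b` has been unloaded because `a` was touched more recently. A real implementation would also have to close the evicted core's resources and cope with requests that arrive while a core is being unloaded, which this sketch ignores.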
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482415#comment-13482415 ] Alan Woodward commented on SOLR-1972: - I think we'd probably want to keep the percentile list limited to start with. People can always ask for improvements later if they need them. I think this is ready to go in? > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482413#comment-13482413 ] Otis Gospodnetic commented on SOLR-1293: General comment: We may want the index/core re-opener to remain aware of previous locations (nodes) on which cores were opened for the purposes of reusing any possible OS-level caches that may still exist on those nodes for that core. For example, if the cluster has nodes 1-100 and core Foo was on nodes 1, 2, and 3 before it was closed, then maybe next time it needs to be opened it would ideally be opened on those 1, 2, and 3 nodes. Of course, nodes 1, 2, or 3 may no longer be around or may be currently overloaded, in which case alternative nodes need to be picked. > Support for large no:of cores and faster loading/unloading of cores > --- > > Key: SOLR-1293 > URL: https://issues.apache.org/jira/browse/SOLR-1293 > Project: Solr > Issue Type: New Feature > Components: multicore >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: SOLR-1293.patch > > > Solr , currently ,is not very suitable for a large no:of homogeneous cores > where you require fast/frequent loading/unloading of cores . usually a core > is required to be loaded just to fire a search query or to just index one > document > The requirements of such a system are. > * Very efficient loading of cores . Solr cannot afford to read and parse and > create Schema, SolrConfig Objects for each core each time the core has to be > loaded ( SOLR-919 , SOLR-920) > * START STOP core . Currently it is only possible to unload a core (SOLR-880) > * Automatic loading of cores . If a core is present and it is not loaded and > a request comes for that load it automatically before serving up a request > * As there are a large no:of cores , all the cores cannot be kept loaded > always.
There has to be an upper limit beyond which we need to unload a few > cores (probably the least recently used ones) > * Automatic allotment of dataDir for cores. If the no:of cores is too high all > the cores' dataDirs cannot live in the same dir. There is an upper limit on > the no:of dirs you can create in a unix dir w/o affecting performance
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-1972: Attachment: SOLR-1972_metrics.patch New patch, adding 75th and 999th percentile, making the stats names less insanely long, and adding the metrics- threads to the test excluder thingy. All solr-core tests pass. > Need additional query stats in admin interface - median, 95th and 99th > percentile > - > > Key: SOLR-1972 > URL: https://issues.apache.org/jira/browse/SOLR-1972 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.4 >Reporter: Shawn Heisey >Priority: Minor > Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, > elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, > SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, > SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, > SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, > SOLR-1972-url_pattern.patch > > > I would like to see more detailed query statistics from the admin GUI. This > is what you can get now: > requests : 809 > errors : 0 > timeouts : 0 > totalTime : 70053 > avgTimePerRequest : 86.59209 > avgRequestsPerSecond : 0.8148785 > I'd like to see more data on the time per request - median, 95th percentile, > 99th percentile, and any other statistical function that makes sense to > include. In my environment, the first bunch of queries after startup tend to > take several seconds each. I find that the average value tends to be useless > until it has several thousand queries under its belt and the caches are > thoroughly warmed. The statistical functions I have mentioned would quickly > eliminate the influence of those initial slow queries. > The system will have to store individual data about each query. I don't know > if this is something Solr does already. 
It would be nice to have a > configurable count of how many of the most recent data points are kept, to > control the amount of memory the feature uses. The default value could be > something like 1024 or 4096.
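The request above (percentiles over a bounded number of recent data points) could be sketched roughly as follows. This is only an illustration, not code from any of the attached patches: the class name `RecentTimings`, its methods, and the nearest-rank percentile rule are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical helper (not Solr code): keeps only the most recent N request times. */
final class RecentTimings {
    private final long[] window; // ring buffer holding the latest samples
    private int next = 0;        // slot that the next sample overwrites
    private int count = 0;       // number of valid samples, at most window.length

    RecentTimings(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        window = new long[capacity];
    }

    /** Records one request time, overwriting the oldest sample once the buffer is full. */
    synchronized void record(long elapsedMillis) {
        window[next] = elapsedMillis;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Nearest-rank percentile over the retained samples; p must be in (0, 100]. */
    synchronized long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = Arrays.copyOf(window, count); // slots 0..count-1 are the valid ones
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * count); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        RecentTimings timings = new RecentTimings(1024); // 1024 is one of the defaults suggested above
        timings.record(5000); // slow query right after startup
        for (int i = 0; i < 2000; i++) {
            timings.record(80 + (i % 20));
        }
        System.out.println("median=" + timings.percentile(50) + " p99=" + timings.percentile(99));
    }
}
```

Because old samples fall out of the window, the slow warm-up queries stop influencing the reported values once enough newer requests have arrived; memory use is fixed by the configured capacity.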
[jira] [Resolved] (SOLR-1531) Provide an option to remove the data directory on core unload
[ https://issues.apache.org/jira/browse/SOLR-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-1531. -- Resolution: Duplicate Fix Version/s: (was: 4.1) 3.3 4.0 This was fixed by https://issues.apache.org/jira/browse/SOLR-2610. > Provide an option to remove the data directory on core unload > - > > Key: SOLR-1531 > URL: https://issues.apache.org/jira/browse/SOLR-1531 > Project: Solr > Issue Type: Improvement >Reporter: Shalin Shekhar Mangar > Fix For: 4.0, 3.3 > > Attachments: SOLR-1531.patch > > > Currently the unload command keeps the core's data on disk even though the > details of the core are deleted from configuration. Solr should have an option > of cleaning the data directory on unload of a core.
[jira] [Updated] (SOLR-880) SolrCore should have a a lazy startup option
[ https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-880: Description: * a core should have an option of loadOnStartup=true|false. default should be true If there are too many cores (tens of thousands) where each of them may be used occasionally, we should not load all of them at once. In the runtime I should be able to STOP and START a core on demand. A listing command would let me know which one is present and what is up and what is down. A stopped core must not use any resource was: * We must have an option to STOP and START a core. * a core should have an option of loadOnStartup=true|false. default should be true * A list command which can give the names of all cores and some meta information like status If there are too many cores (tens of thousands) where each of them may be used occasionally, we should not load all of them at once. In the runtime I should be able to STOP and START a core on demand. A listing command would let me know which one is present and what is up and what is down. A stopped core must not use any resource Summary: SolrCore should have a a lazy startup option (was: SolrCore should have a STOP option and a lazy startup option) Removed STOP from description, functionality is handled by UNLOAD Broke out the "add a list command" to its own JIRA, see: https://issues.apache.org/jira/browse/SOLR-3980 > SolrCore should have a a lazy startup option > > > Key: SOLR-880 > URL: https://issues.apache.org/jira/browse/SOLR-880 > Project: Solr > Issue Type: Improvement > Components: multicore >Reporter: Noble Paul >Assignee: Erick Erickson > Attachments: SOLR-880.patch > > > * a core should have an option of loadOnStartup=true|false. default should be > true > If there are too many cores (tens of thousands) where each of them may be > used occasionally, we should not load all of them at once.
In the runtime I > should be able to STOP and START a core on demand. A listing command would > let me know which one is present and what is up and what is down. A stopped > core must not use any resource
[jira] [Created] (SOLR-3980) Incorporate lazily-loaded cores into core listings for clients
Erick Erickson created SOLR-3980: Summary: Incorporate lazily-loaded cores into core listings for clients Key: SOLR-3980 URL: https://issues.apache.org/jira/browse/SOLR-3980 Project: Solr Issue Type: Improvement Components: multicore, web gui Affects Versions: 4.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 4.1 Part of SOLR-1293 (supporting lots of cores) will require we do something to allow clients (particularly the admin GUI) to get a full list of all possible cores, whether they've been loaded or not.
[jira] [Updated] (SOLR-880) SolrCore should have a STOP option and a lazy startup option
[ https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-880: Attachment: SOLR-880.patch First cut at the basics, putting up a preliminary version for comments. The general approach here is that, for any lazy cores, keep a separate list of SolrCoreDescriptors. When we get a core, if it's not already loaded, look in this separate list and create it at that point. Note a bunch of things: 1> many of the changes in CoreContainer are that I factored out creating cores from local files and Zookeeper into two methods; I was having a hard time keeping the zk and non-zk bits separate. 2> There are some TODOs and EOEs that I have to take out. 3> I'm not all that happy with the tests, especially making new config directories just for this case with tests. But I was going a bit crazy yesterday trying to use the "usual" methods for writing tests, but as far as I can tell, there are built-in assumptions in things like TestHarness that don't work well with different cores. Any suggestions? 4> All tests pass. I fired up an example in our standard multicore system, and it's actually kinda cool. The admin console doesn't show the lazy core, but I can index to it with post.jar, then the admin screen shows it and I can query it. I can shut down and restart and the first query on the lazy core then returns results, even though it again isn't in the admin screen. 5> I haven't tested this all that thoroughly, this is preliminary for comments. This is part of SOLR-1293. 6> Next up is SOLR-1028, limiting the number of cores that can be loaded simultaneously. 7> I'm quite sure I'll screw up the reference counting and/or there are nooks and crannies that I don't even know exist. Please let me know of any off the tops of your heads! 8> All tests pass. Can I ship it now? 
> SolrCore should have a STOP option and a lazy startup option > > > Key: SOLR-880 > URL: https://issues.apache.org/jira/browse/SOLR-880 > Project: Solr > Issue Type: Improvement > Components: multicore >Reporter: Noble Paul >Assignee: Erick Erickson > Attachments: SOLR-880.patch > > > * We must have an option to STOP and START a core. > * a core should have an option of loadOnStartup=true|false. default should be > true > * A list command which can give the names of all cores and some meta > information like status > If there are too many cores (tens of thousands) where each of them may be > used occasionally, we should not load all of them at once. In the runtime I > should be able to STOP and START a core on demand. A listing command would > let me know which one is present and what is up and what is down. A stopped > core must not use any resource
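The approach described in the comment above (keep descriptors for lazy cores in a separate list and create the core on first access) might look roughly like the sketch below. It is not taken from SOLR-880.patch: `LazyCoreRegistry`, its methods, and the use of a plain instance-directory string as the descriptor are hypothetical stand-ins, and reference counting and core unloading are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the container logic; C is whatever type represents a loaded core. */
final class LazyCoreRegistry<C> {
    // Descriptors (here just an instance directory) for cores with loadOnStartup=false.
    private final Map<String, String> lazyDescriptors = new LinkedHashMap<>();
    // Cores that have actually been created.
    private final Map<String, C> loaded = new LinkedHashMap<>();
    // Assumed factory that builds a core from its descriptor.
    private final Function<String, C> loader;

    LazyCoreRegistry(Function<String, C> loader) {
        this.loader = loader;
    }

    /** Eager cores are created immediately; lazy ones are only remembered. */
    synchronized void register(String name, String instanceDir, boolean loadOnStartup) {
        if (loadOnStartup) {
            loaded.put(name, loader.apply(instanceDir));
        } else {
            lazyDescriptors.put(name, instanceDir);
        }
    }

    /** Returns the core, creating it on first access if it was registered as lazy. */
    synchronized C getCore(String name) {
        C core = loaded.get(name);
        if (core != null) {
            return core;
        }
        String instanceDir = lazyDescriptors.remove(name);
        if (instanceDir == null) {
            return null; // unknown core
        }
        core = loader.apply(instanceDir);
        loaded.put(name, core);
        return core;
    }

    synchronized boolean isLoaded(String name) {
        return loaded.containsKey(name);
    }

    public static void main(String[] args) {
        LazyCoreRegistry<String> registry = new LazyCoreRegistry<>(dir -> "core@" + dir);
        registry.register("collection1", "/cores/collection1", true);
        registry.register("archive", "/cores/archive", false);
        System.out.println(registry.isLoaded("archive")); // not yet created
        System.out.println(registry.getCore("archive"));  // created on demand
    }
}
```

A listing command like the one split out into SOLR-3980 would need to report both maps, since a lazy core that has never been requested only exists as a descriptor.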
[jira] [Created] (SOLR-3979) slf4j bindings other than jdk -- cannot change log levels
Shawn Heisey created SOLR-3979: -- Summary: slf4j bindings other than jdk -- cannot change log levels Key: SOLR-3979 URL: https://issues.apache.org/jira/browse/SOLR-3979 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Shawn Heisey Fix For: 4.1 Once I finally got log4j logging working, I was slightly surprised by the message related to SOLR-3426. I did not really consider that to be a big deal, because if I want to look at my log, I'll be on the commandline anyway. I was even more surprised to find that I cannot change any of the log levels from the admin gui. My default log level is WARN for performance reasons, but every once in a while I like to bump the log level to INFO to troubleshoot a specific problem, then turn it back down. This is very easy with jdk logging in either 3.x or 4.0. I changed to log4j because it easily allows me to put the date of a log message on the same line as the first line of the actual log message, so when I grep for things, I have the timestamp in the grep output. Currently the only way for me to change my log level is by updating log4j.properties and restarting Solr. If the capability to figure this out on a class-by-class basis isn't there with log4j, I would at least like to be able to set the root logging level. Is that possible?
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482307#comment-13482307 ] Robert Muir commented on LUCENE-4498: - Committed to trunk. Will give that flonkings builder some time... > pulse docfreq=1 DOCS_ONLY for 4.1 codec > --- > > Key: LUCENE-4498 > URL: https://issues.apache.org/jira/browse/LUCENE-4498 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Robert Muir > Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, > LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch > > > We have pulsing codec, but currently this has some downsides: > * it's very general, wrapping an arbitrary postingsformat and pulsing > everything in the postings for an arbitrary docfreq/totalTermFreq cutoff > * reuse is hairy: because it specializes its enums based on these cutoffs, > when walking thru terms e.g. merging there is a lot of sophisticated stuff to > avoid the worst cases where we clone indexinputs for tons of terms. > On the other hand the way the 4.1 codec encodes "primary key" fields is > pretty silly, we write the docStartFP vlong in the term dictionary metadata, > which tells us where to seek in the .doc to read our one lonely vint. > I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just > write the lone doc delta where we would write docStartFP. > We can avoid the hairy reuse problem too, by just supporting this in > refillDocs() in BlockDocsEnum instead of specializing. > This would remove the additional seek for "primary key" fields without really > any of the downsides of pulsing today.
[jira] [Updated] (SOLR-3966) LangID not to log WARN
[ https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3966: Attachment: SOLR-3966-trunk-1.patch > LangID not to log WARN > -- > > Key: SOLR-3966 > URL: https://issues.apache.org/jira/browse/SOLR-3966 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.0 >Reporter: Markus Jelsma > Fix For: 4.1, 5.0 > > Attachments: SOLR-3966-trunk-1.patch > > > The LangID UpdateProcessor emits the warning below for documents that do not > contain an input field. The level should go to DEBUG or be removed. It is not > uncommon to see a log full of these messages just because not all documents > contain all the fields we're mapping. > {code}Oct 19, 2012 11:23:43 AM > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process > WARNING: Document does not contain input field . Skipping > this{code}
[jira] [Updated] (SOLR-3966) LangID not to log WARN
[ https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3966: Attachment: (was: SOLR-3966-trunk-1.patch) > LangID not to log WARN > -- > > Key: SOLR-3966 > URL: https://issues.apache.org/jira/browse/SOLR-3966 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.0 >Reporter: Markus Jelsma > Fix For: 4.1, 5.0 > > Attachments: SOLR-3966-trunk-1.patch > > > The LangID UpdateProcessor emits the warning below for documents that do not > contain an input field. The level should go to DEBUG or be removed. It is not > uncommon to see a log full of these messages just because not all documents > contain all the fields we're mapping. > {code}Oct 19, 2012 11:23:43 AM > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process > WARNING: Document does not contain input field . Skipping > this{code}
[jira] [Updated] (LUCENE-4494) Add phoenetic algorithm Match Rating approach to lucene
[ https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colm Rice updated LUCENE-4494: -- Attachment: LUCENE-4494.patch Match Rating Approach (MRA) phonetic algorithm & associated tests. I hope... :-) > Add phoenetic algorithm Match Rating approach to lucene > --- > > Key: LUCENE-4494 > URL: https://issues.apache.org/jira/browse/LUCENE-4494 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 4.0-ALPHA >Reporter: Colm Rice >Priority: Minor > Fix For: 4.1 > > Attachments: LUCENE-4494.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > I want to add MatchRatingApproach algorithm to the Lucene project. > What I have at the moment is a class called > org.apache.lucene.analysis.phoenetic.MatchRatingApproach implementing > StringEncoder > I have a pretty comprehensive test file located at: > org.apache.lucene.analysis.phonetic.MatchRatingApproachTests > It doesn't exactly follow the existing pattern so I'm going to need a bit of advice here. > Thanks! Feel free to email. > FYI: It's my first contribution so be gentle :-) C# is my native language. > Reference: http://en.wikipedia.org/wiki/Match_rating_approach
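For readers unfamiliar with the algorithm, here is a minimal sketch of the MRA encoding step only, based on the rules as summarized in the Wikipedia article referenced above (drop vowels except a leading one, collapse doubled consonants, then keep the first three and last three letters of longer codes). It is not the code from LUCENE-4494.patch; the class `MatchRatingSketch` is hypothetical, and applying the doubled-consonant rule after vowel removal is an assumption of this sketch. The comparison half of the approach is not covered.

```java
import java.util.Locale;

/** Hypothetical illustration of MRA encoding; not the class attached to the issue. */
final class MatchRatingSketch {

    static String encode(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // this sketch ignores anything that is not a plain ASCII letter
            }
            boolean vowel = "AEIOU".indexOf(c) >= 0;
            if (vowel && code.length() > 0) {
                continue; // rule 1: delete vowels unless the vowel begins the name
            }
            if (!vowel && code.length() > 0 && code.charAt(code.length() - 1) == c) {
                continue; // rule 2: drop the second of a doubled consonant
            }
            code.append(c);
        }
        if (code.length() > 6) {
            // rule 3: reduce to six letters by joining the first three and the last three
            return code.substring(0, 3) + code.substring(code.length() - 3);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Byrne", "Smith", "Catherine", "Kathryn"}) {
            System.out.println(name + " -> " + encode(name));
        }
    }
}
```

Note that Y is treated as a consonant here, which is why "Smyth" and "Smith" produce different codes.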
[jira] [Updated] (SOLR-3966) LangID not to log WARN
[ https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3966: Attachment: SOLR-3966-trunk-1.patch Patch removing the warning. > LangID not to log WARN > -- > > Key: SOLR-3966 > URL: https://issues.apache.org/jira/browse/SOLR-3966 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.0 >Reporter: Markus Jelsma > Fix For: 4.1, 5.0 > > Attachments: SOLR-3966-trunk-1.patch > > > The LangID UpdateProcessor emits the warning below for documents that do not > contain an input field. The level should go to DEBUG or be removed. It is not > uncommon to see a log full of these messages just because not all documents > contain all the fields we're mapping. > {code}Oct 19, 2012 11:23:43 AM > org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process > WARNING: Document does not contain input field . Skipping > this{code}
Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 71 - Failure
On Tue, Oct 23, 2012 at 2:41 AM, Uwe Schindler wrote: > Ah, we got a hprof heap dump... :-) YAY! Mike McCandless http://blog.mikemccandless.com
[jira] [Commented] (SOLR-3885) audit solr DocumentBuilder logic & tests
[ https://issues.apache.org/jira/browse/SOLR-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482240#comment-13482240 ] Toke Eskildsen commented on SOLR-3885: -- It seems that there are indeed problems with copyFields and that it is a blocker for using docBoosts with the standard practice catch-all copyField. I have updated SOLR-3875 with a description of the problem. > audit solr DocumentBuilder logic & tests > > > Key: SOLR-3885 > URL: https://issues.apache.org/jira/browse/SOLR-3885 > Project: Solr > Issue Type: Improvement >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 4.1 > > > Spun off of SOLR-3875: it would be good to audit DocumentBuilder carefully > and ensure that there are adequate tests for the various edge cases (ie: > docboosts, copyfield, multivalued fields, various combinations of, etc..) and > special types of fields (ie: polyfields). > There also seems to be some dead code here that can likely be cleaned up.
[jira] [Commented] (SOLR-3875) Document boost does not work correctly when using multi-valued fields
[ https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482233#comment-13482233 ] Toke Eskildsen commented on SOLR-3875: -- Unfortunately, the bug is only partly solved. Thomas and I encountered strange scores again. While boosting of multi-value fields is handled correctly in Solr 4.0.0, boosting for copyFields are not. A sample document: {code} Insane score Example. Score = 10E9 Document boost broken for copyFields video ThomasEgense and Toke Eskildsen Test bug something else bug bug {code} The fields _name_, _manu_, _cat_, _features_, keywords and _content_ gets copied to text and a search for thomasegense matches the text-field with query explanation {code} 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of: 70384.67 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 0.30685282 = idf(docFreq=1, maxDocs=1) 229376.0 = fieldNorm(doc=0) {code} If the two last fields _keywords_ and _content_ are removed from the sample document, the score is reduced by a factor 100 (docBoost^2). The current DocumentBuilder https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1389648&view=markup works roughly like this: {code} foreach (field) { boost = docBoost*fieldBoost foreach (value) { assignField(field, value, boost) foreach (copyField) { assignField(copyField, value, boost) } boost = 1f } } {code} When all fields share the same copyField (_text_ in this example), the copyField will have the full boost assigned for each directly specified field which uses that copyField. That's 5 times with the provided sample, so the total boost for the field _text_ will be 10^5. One solution would be to keep track of used fields (directly specified as well as copyFields) and only assign the full boost once per document. 
If the number of unique fields/document is low, a simple list would probably be the fastest and with low GC impact. For a higher number of unique fields, a Set might be better. An optimization would be to only create the tracking structure once a boost != 1.0f is encountered and only store the fields with boost != 1.0f, so that an update without boosts would not get a performance penalty. > Document boost does not work correctly when using multi-valued fields > - > > Key: SOLR-3875 > URL: https://issues.apache.org/jira/browse/SOLR-3875 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis, update >Affects Versions: 4.0-BETA >Reporter: Toke Eskildsen >Assignee: Hoss Man >Priority: Critical > Fix For: 4.0, 4.1, 5.0 > > Attachments: SOLR-3875.patch > > > In Solr 4 BETA & trunk, document boosts skews the ranking for documents with > multi value fields tremendously. A document boost of 5 combined with 15 > values in a multi value field results in scores above 1,000,000,000, while a > boost of 0,5 results in scores below 0,001. The error is not present in Solr > 3.6. > Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder > committed 20110827 (@1162347) by Mike McCandless, as part of work done on > LUCENE-2308. The problem is that Lucene multiplies the boosts of multiple > instances of the same field when updating the index. > The old DocumentBuilder, used in Lucene 3.6, handled this by calculating the > score for the field (docBoost*fieldBoost) and assigning it to the first > instance of the field, then setting the boost to 1.0f and assigning that to > subsequent instances of the field. This effectively assigned > docBoost*fieldBoost to the field, regardless of the number of instances. 
> The updated DocumentBuilder (see > https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup), > used in Lucene 4 BETA & trunk, also assigns docBoost*fieldBoost to the first > instance of the field. Then it sets fieldBoost = docBoost and continues to > assign docBoost*fieldBoost to subsequent instances. Using the example > mentioned above, the generated IndexableFields will get assigned boosts of 5, > 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances of the > same field will have a collective boost of 5*25^14. > This can be demonstrated with the Solr tutorial example by indexing the > sample documents and adding the document > {code:xml} > > > Insane score Example. Score = 10E9 > Document boost broken for multivalued fields > Thomas Egense and Toke Eskildsen > Test > bug > insane_boos
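The copyField problem and the "track used fields" fix suggested in the SOLR-3875 comment above can be illustrated with a small simulation. This is a sketch under stated assumptions, not DocumentBuilder code: `BoostTrackingSketch` and its methods are hypothetical, each source field carries a single value with a field boost of 1, and the index is modelled as simply multiplying together every boost assigned to the same field name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical simulation of how boosts accumulate per field; not Solr code. */
final class BoostTrackingSketch {

    /** Models the multiplication of boosts across instances of the same field. */
    private static void assign(Map<String, Double> boosts, String field, double boost) {
        boosts.merge(field, boost, (a, b) -> a * b);
    }

    /** Behaviour described in the comment: every source field hands its full boost to the shared copyField. */
    static Map<String, Double> withoutTracking(List<String> sourceFields, String copyField, double docBoost) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        for (String field : sourceFields) {
            assign(boosts, field, docBoost);
            assign(boosts, copyField, docBoost);
        }
        return boosts;
    }

    /** Suggested fix: remember which target fields already received the full boost. */
    static Map<String, Double> withTracking(List<String> sourceFields, String copyField, double docBoost) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        Set<String> alreadyBoosted = new HashSet<>();
        for (String field : sourceFields) {
            // Set.add returns true only the first time a field name is seen.
            assign(boosts, field, alreadyBoosted.add(field) ? docBoost : 1.0);
            assign(boosts, copyField, alreadyBoosted.add(copyField) ? docBoost : 1.0);
        }
        return boosts;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "manu", "cat", "features", "keywords");
        System.out.println("untracked text boost: " + withoutTracking(fields, "text", 10.0).get("text"));
        System.out.println("tracked text boost:   " + withTracking(fields, "text", 10.0).get("text"));
    }
}
```

With five source fields and a document boost of 10, the untracked variant compounds the boost on the shared target five times (the 10^5 mentioned in the comment), while the tracked variant applies it once. As the comment notes, a real implementation could defer creating the tracking set until a boost other than 1.0f is encountered.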
[jira] [Commented] (SOLR-3978) CoreAdmin - configName definition
[ https://issues.apache.org/jira/browse/SOLR-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482176#comment-13482176 ] Gianluca Varisco commented on SOLR-3978: JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr-3.6.1/staging/ -XX:+DisableExplicitGC -Xms8192M -Xmx8192M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled" > CoreAdmin - configName definition > - > > Key: SOLR-3978 > URL: https://issues.apache.org/jira/browse/SOLR-3978 > Project: Solr > Issue Type: Bug > Components: multicore > Environment: * Solr 3.6.1 > * Jetty 8.1.5.v20120716 >Reporter: Gianluca Varisco >Priority: Minor > > Hello, > I'm trying to define a bunch of cores as follows: > dataDir="/opt/solr-3.6.1/staging/venus/data/" > configName="/shop/www/htdocs/venus/shop.staging/solr/app/conf/solrconfig.xml" > schemaName="/shop/www/htdocs/venus/shop.staging/solr/app/conf/schema.xml" /> > Is it possible to point configName and schemaName to a different path? It > works if conf/solrconfig.xml is added in /opt/solr-3.6.1/staging/venus/ > Am I missing something? Trace output is attached. 
> SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in > classpath or '/opt/solr-3.6.1/staging/venus/conf/', cwd=/opt/jetty/staging > at > org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:273) > at > org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:239) > at org.apache.solr.core.Config.(Config.java:141) > at org.apache.solr.core.SolrConfig.(SolrConfig.java:138) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:452) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) > at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719) > at > org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258) > at > org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1233) > at > org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:701) > at > org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:475) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36) > at > org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183) > at > org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491) > at > org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138) > at > org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142) > at > 
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53) > at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604) > at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535) > at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398) > at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552) > at > org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:75) > at > org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53) > at > org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91) > at org.eclipse.jetty.server.Server.doStart(Server.java:272) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59) > at > org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1260) >
[jira] [Created] (SOLR-3978) CoreAdmin - configName definition
Gianluca Varisco created SOLR-3978:

Summary: CoreAdmin - configName definition
Key: SOLR-3978
URL: https://issues.apache.org/jira/browse/SOLR-3978
Project: Solr
Issue Type: Bug
Components: multicore
Environment: * Solr 3.6.1
             * Jetty 8.1.5.v20120716
Reporter: Gianluca Varisco
Priority: Minor

Hello,

I'm trying to define a bunch of cores as follows:

Is it possible to point configName and schemaName to a different path? It works if conf/solrconfig.xml is added in /opt/solr-3.6.1/staging/venus/

Am I missing something? Trace output is attached.

SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or '/opt/solr-3.6.1/staging/venus/conf/', cwd=/opt/jetty/staging
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:273)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:239)
    at org.apache.solr.core.Config.<init>(Config.java:141)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:138)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:452)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
    at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1233)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:701)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:475)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
    at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
    at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
    at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
    at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
    at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
    at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
    at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
    at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:75)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
    at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
    at org.eclipse.jetty.server.Server.doStart(Server.java:272)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
    at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1260)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1183)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.eclipse.jetty.start.Main.invokeMain(Main.java:462)
    at org.eclipse.jetty.start.Main.start(Main.java:610)
    at org.eclipse.jetty.start.Main.main(Main.java:86)

-- This message is automatically gen
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482171#comment-13482171 ]

Shawn Heisey commented on SOLR-1972:

After poking around a lot looking for a way to bump the reservoir size, I finally came across the paper on reservoir sampling by Vitter. After even more poking around, I think I get it now. Their small reservoir apparently really does give statistically relevant results over millions or billions of total samples. If it didn't give them numbers they could use, they would have already made it larger.

Do you think it's worthwhile to give people the ability to customize the percentile list -- turn some of the standard percentiles off, and/or add custom ones? As soon as we conclude that including the full predefined set won't present a performance problem because it only gets calculated when the admin GUI is accessed, there'll be someone who has created hundreds of request handlers and polls the statistics for all of them once a minute. I can also see someone wanting to see the 12th and 87th percentiles for some reason neither of us can fathom, but makes perfect sense to them.

> Need additional query stats in admin interface - median, 95th and 99th percentile
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 1.4
> Reporter: Shawn Heisey
> Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch
>
> I would like to see more detailed query statistics from the admin GUI. This is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785
> I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
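
For readers following the SOLR-1972 discussion above, here is a minimal sketch of how a fixed-size reservoir can feed percentile estimates. It is not the code from any of the attached patches or from the library the comment refers to. The class name LatencyReservoir, its methods, the capacity of 1024, and the nearest-rank percentile rule are all assumptions made for illustration. The sampling step is the simple uniform scheme (Algorithm R) described in Vitter's paper: once the reservoir is full, the n-th value seen replaces a random slot with probability capacity/n.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical fixed-size uniform sample of request times (Algorithm R). */
final class LatencyReservoir {
    private final long[] samples;
    private final Random random;
    private long seen = 0; // total number of values recorded so far

    LatencyReservoir(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.samples = new long[capacity];
        this.random = new Random(seed);
    }

    /** Records one request time; memory use stays bounded by the capacity. */
    void record(long elapsedMillis) {
        seen++;
        if (seen <= samples.length) {
            // Reservoir not full yet: keep every value.
            samples[(int) (seen - 1)] = elapsedMillis;
        } else {
            // Pick a slot in [0, seen); only slots inside the reservoir are
            // overwritten, so the new value is kept with probability capacity/seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < samples.length) {
                samples[(int) slot] = elapsedMillis;
            }
        }
    }

    /** Nearest-rank percentile of the current sample, for p in [0, 100]. */
    long percentile(double p) {
        if (p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        int n = (int) Math.min(seen, samples.length);
        if (n == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samples, n);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * n / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Because percentile takes an arbitrary p, a caller could ask for a custom list (the 12th and 87th, say) without any change to what is stored. The sort happens only when a percentile is requested, so the per-request cost is one counter increment and at most one array write.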