[jira] [Commented] (SOLR-3975) Document Summarization toolkit, using LSA techniques

2012-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482972#comment-13482972
 ] 

Otis Gospodnetic commented on SOLR-3975:


Nice, 170KB patch there Lance! :)
I see lots of classes don't have ASL btw.

> Document Summarization toolkit, using LSA techniques
> 
>
> Key: SOLR-3975
> URL: https://issues.apache.org/jira/browse/SOLR-3975
> Project: Solr
>  Issue Type: New Feature
>Reporter: Lance Norskog
>Priority: Minor
> Attachments: 4.1.summary.patch, reuters.sh
>
>
> This package analyzes sentences and words as used across sentences to rank 
> the most important sentences and words. The general topic is called "document 
> summarization" and is a popular research topic in textual analysis. 
> How to use:
> 1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
> instance.
> 2) Download the first Reuters article corpus from:
> http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
> 3) Unpack this into a directory.
> 4) Run the attached 'reuters.sh' script:
> sh reuters.sh directory http://localhost:8983/solr/collection1
> 5) Wait several minutes.
> Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
> at the large gray box marked 'Document Summary'. This has a table of 
> statistics about the analysis, the three most important sentences, and 
> several of the most important words in the documents. The sentences have the 
> important words in italics.
> The code is packaged as a search component and as an analysis handler. The 
> /browse demo uses the search component, and you can also post raw text to  
> http://localhost:8983/solr/collection1/analysis/summary. Here is a sample 
> command:
> {code}
> curl -s \
>   "http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" \
>   --data-binary @$FILE -H 'Content-type:application/xml'
> {code}
> This is an implementation of LSA-based document summarization. A short 
> explanation and a long evaluation are described in my blog, [Uncle Lance's 
> Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: 
> [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html]
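For readers unfamiliar with the general idea, here is a minimal sketch of LSA-style sentence ranking. It is not taken from the patch: the function name rank_sentences, the tokenizer, and the choice to use only the first singular vector (approximated by power iteration) are all assumptions made for illustration. The attached code may weight terms and pick sentences quite differently.

```python
import math
import re


def rank_sentences(sentences, iterations=100):
    """Return sentence indices, most important first (illustrative only)."""
    if not sentences:
        return []
    # Build a term-by-sentence count matrix: rows are terms, columns are sentences.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(sentences)
    A = [[0.0] * n for _ in vocab]
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t]][j] += 1.0

    # Power iteration on A^T A approximates the first right singular vector,
    # i.e. how strongly each sentence loads on the dominant "topic".
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Sentences with the largest weight on the dominant topic rank first.
    return sorted(range(n), key=lambda j: abs(v[j]), reverse=True)


docs = [
    "Solr indexes documents.",
    "Solr searches documents quickly.",
    "Cats sleep.",
]
order = rank_sentences(docs)
```

In this toy input the two Solr sentences share vocabulary, so they dominate the first singular vector and the unrelated sentence ranks last. A fuller LSA summarizer would normally use several singular vectors and some term weighting.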

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482960#comment-13482960
 ] 

Shawn Heisey commented on SOLR-1972:


Solr handler statistics were already published in JMX; this just added some 
entries.  I pulled up jconsole and connected to my patched Solr server and the 
new stats were there.

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1
>
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
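The request above can be sketched roughly as follows. This is only an illustration of the idea, not Solr code: the class name RecentRequestStats, the nearest-rank percentile rule, and the sample timings are assumptions. It keeps a bounded window of the most recent request times so that early slow queries age out.

```python
import math
from collections import deque


class RecentRequestStats:
    """Keep the most recent `window` request times and report percentiles."""

    def __init__(self, window=1024):
        # A deque with maxlen silently drops the oldest sample once full,
        # which bounds the memory used by the feature.
        self.samples = deque(maxlen=window)

    def record(self, elapsed_ms):
        self.samples.append(elapsed_ms)

    def percentile(self, p):
        """Nearest-rank percentile (0 < p <= 100) of the retained samples."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = math.ceil(p / 100.0 * len(ordered))
        return ordered[max(rank, 1) - 1]

    def median(self):
        return self.percentile(50)


# Tiny window for demonstration; the issue suggests 1024 or 4096 as defaults.
stats = RecentRequestStats(window=4)
for ms in [5000, 4000, 90, 80, 85, 95]:  # two slow warm-up queries first
    stats.record(ms)

median_ms = stats.median()
p95_ms = stats.percentile(95)
```

With a window of four, the two multi-second warm-up queries have already been dropped, so the median reflects only the warmed-up requests, whereas a running average over all six samples would still be above 1500 ms.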




[jira] [Commented] (SOLR-1395) Integrate Katta

2012-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482950#comment-13482950
 ] 

Otis Gospodnetic commented on SOLR-1395:


Does anyone really need this?  If so, I'm curious why?
Or should we close this?


> Integrate Katta
> ---
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.1
>
> Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
> katta-core-0.6-dev.jar, katta.node.properties, katta-solrcores.jpg, 
> katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
> solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
> solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr1395.jpg, 
> solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
> solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, 
> solr-1395-katta-0.6.3-4.patch, solr-1395-katta-0.6.3-5.patch, 
> solr-1395-katta-0.6.3-6.patch, solr-1395-katta-0.6.3-7.patch, 
> SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
> test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482948#comment-13482948
 ] 

Otis Gospodnetic commented on SOLR-1972:


[~romseygeek] are these metrics being published in JMX?  They should be, and 
Coda's Metrics should make that easy.  I had a 2.5-second look at the patch and 
didn't find any mentions of "jmx".

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1
>
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch




[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-1972:
---

Component/s: web gui

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1
>
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch




[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-1972:
---

Fix Version/s: 4.1

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1
>
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1959 - Failure!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1959/
Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 24574 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, 
double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 32 minutes 15 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Created] (SOLR-3982) No way to get current dataimport status from admin GUI

2012-10-23 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-3982:
--

 Summary: No way to get current dataimport status from admin GUI
 Key: SOLR-3982
 URL: https://issues.apache.org/jira/browse/SOLR-3982
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


The dataimport section under each core on the admin gui does not provide a way 
to get the current import status.  I actually would like to see it 
automatically pull the status as soon as you click on "Dataimport" ... I have 
never seen an import status with a qtime above 1 millisecond.  A refresh 
icon/link would be good to have as well.

Additional note: the resulting URL in the address bar is a little odd:
http://server:port/solr/#/corename/dataimport//dataimport





[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482921#comment-13482921
 ] 

Shawn Heisey commented on SOLR-1972:


I never know when things are threadsafe, although it does seem to work.  Does a 
static class member variable 'automatically' become threadsafe, or would 
volatile be required?  Was protected the right way to do that?


> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch




[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3336 - Still Failing

2012-10-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3336/

All tests passed

Build Log:
[...truncated 24623 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, 
double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:60:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:252:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 43 minutes 24 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-1972:
---

Attachment: SOLR-1972_metrics.patch

I think I fixed it.

It looks like if you pass the same combination of arguments to the 
newCounter/newTimer methods, you actually get back the same object as the last 
time it was called with those parameters, not a new one.  There is an alternate 
form of the constructor that takes a "scope" argument.  I could have appended 
the new value to the name argument, but since they were kind enough to provide 
something separate...
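To illustrate the behaviour described above, here is a small stand-in written from scratch. It is not the Metrics library's API: MetricsRegistry, new_timer, and the key layout are hypothetical names chosen for this sketch. It only shows why identical arguments yield a shared object and how a separate "scope" value keeps per-handler metrics apart.

```python
class Timer:
    """Trivial timer stand-in that only counts updates."""

    def __init__(self):
        self.count = 0

    def update(self):
        self.count += 1


class MetricsRegistry:
    """Hypothetical get-or-create registry keyed by (owner, name, scope)."""

    def __init__(self):
        self._metrics = {}

    def new_timer(self, owner, name, scope=None):
        key = (owner, name, scope)
        # Same key as an earlier call: hand back the existing object.
        if key not in self._metrics:
            self._metrics[key] = Timer()
        return self._metrics[key]


registry = MetricsRegistry()

# Without a scope, two handlers of the same class end up sharing one timer.
shared_a = registry.new_timer("RequestHandlerBase", "requestTimes")
shared_b = registry.new_timer("RequestHandlerBase", "requestTimes")

# With a scope (for example the handler path), each handler gets its own.
select_timer = registry.new_timer("RequestHandlerBase", "requestTimes", "/select")
update_timer = registry.new_timer("RequestHandlerBase", "requestTimes", "/update")

select_timer.update()
```

Appending the handler name to the name argument would produce distinct keys as well; using the dedicated scope slot simply keeps the metric name itself unchanged.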

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972-url_pattern.patch




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482915#comment-13482915
 ] 

Robert Muir commented on SOLR-3981:
---

The adoc() you are using doesn't work with boosts. (I found this from another 
test.)


> docBoost is compounded on copyField
> ---
>
> Key: SOLR-3981
> URL: https://issues.apache.org/jira/browse/SOLR-3981
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1
>
> Attachments: SOLR-3981.patch, SOLR-3981.patch
>
>
> As noted by Toke in a comment on SOLR-3875...
> https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
> {quote}
> While boosting of multi-value fields is handled correctly in Solr 4.0.0, 
> boosting for copyFields are not. A sample document:
> {code}
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for copyFields
>   video ThomasEgense and Toke Eskildsen
>   Test
>   bug
>   something else
>   bug
>   bug
>   
> {code}
> The fields name, manu, cat, features, keywords and content gets copied to 
> text and a search for thomasegense matches the text-field with query 
> explanation
> {code}
> 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result 
> of:
>   70384.67 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 0.30685282 = idf(docFreq=1, maxDocs=1)
> 229376.0 = fieldNorm(doc=0)
> {code}
> If the two last fields keywords and content are removed from the sample 
> document, the score is reduced by a factor 100 (docBoost^2).
> {quote}
> (This is a continuation of some of the problems caused by the changes made 
> when the concept of docBoost was eliminated from the underlying IndexWriter 
> code, and overlooked due to the lack of testing of docBoosts at the Solr 
> level - SOLR-3885)
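The arithmetic behind the "factor 100 (docBoost^2)" remark can be sketched as below. This is a deliberately simplified model, not Solr's norm computation: it assumes the document boost is 10 (inferred from 100 being docBoost^2, since the sample document's markup did not survive in this message) and that the boost is multiplied in once per source field copied into the destination. Length normalization and the lossy norm encoding are ignored.

```python
def compounded_boost(doc_boost, copied_sources):
    """Boost on the copyField destination if docBoost is re-applied per source."""
    return doc_boost ** copied_sources


doc_boost = 10.0  # assumption: inferred from "factor 100 (docBoost^2)"

# Six source fields (name, manu, cat, features, keywords, content) copied to text.
with_all_fields = compounded_boost(doc_boost, 6)

# Dropping keywords and content leaves four sources.
without_last_two = compounded_boost(doc_boost, 4)

ratio = with_all_fields / without_last_two

# What one would expect instead: the boost applied exactly once.
expected_boost = compounded_boost(doc_boost, 1)
```

Under these assumptions the ratio between the two cases is 100, matching the reduction reported in the quoted comment, while the intended boost on the destination field would be just 10.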




[jira] [Updated] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent

2012-10-23 Thread Aaron Daubman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Daubman updated SOLR-2052:


Attachment: SOLR-2052-4_0_0.patch
SOLR-2052-trunk.patch

Attaching patches against 4_0_0 and trunk

> Allow for a list of filter queries and a single docset filter in 
> QueryComponent
> ---
>
> Key: SOLR-2052
> URL: https://issues.apache.org/jira/browse/SOLR-2052
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0-ALPHA
> Environment: Mac OS X, Java 1.6
>Reporter: Stephen Green
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-2052-2.patch, SOLR-2052-3-6-1.patch, 
> SOLR-2052-3.patch, SOLR-2052-4_0_0.patch, SOLR-2052-4.patch, SOLR-2052.patch, 
> SOLR-2052-trunk.patch
>
>
> SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries 
> or a single filter (as a DocSet), but not both.  This restriction seems 
> arbitrary, and there are cases where we can have both a list of filter 
> queries and a DocSet generated by some other non-query process (e.g., 
> filtering documents according to IDs pulled from some other source like a 
> database.)
> Fixing this requires a few small changes to SolrIndexSearcher to allow both 
> of these to be set for a QueryCommand and to take both into account when 
> evaluating the query.  It also requires a modification to ResponseBuilder to 
> allow setting the single filter at query time.
> I've run into this against 1.4, but the same holds true for the trunk.




[jira] [Updated] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3981:
---

Attachment: SOLR-3981.patch

bq. i want to work on a test that actually indexes a doc and inspects the 
encoded norms just to be certain i'm not missing something.

Updated patch adds this to the test -- kludgy to reach this deep into the 
lucene code in the solr test, but do-able.  

Unfortunately the test fails because the decoded norms from the index wind up 
being way _lower_ than the expected values.  

At first i thought it was just because i forgot to factor in the term length in 
my expected norm, but even taking that into account the numbers are still way 
off.  i'm guessing either i don't understand something about the new 4.0 APIs 
for getting the DocValues/Norms, or i've got some trivially silly bug that i'm 
blind to because i've been staring at it too long.

I'd appreciate a second set of eyes.

> docBoost is compounded on copyField
> ---
>
> Key: SOLR-3981
> URL: https://issues.apache.org/jira/browse/SOLR-3981
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1
>
> Attachments: SOLR-3981.patch, SOLR-3981.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2012-10-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-1604:
---

Attachment: ComplexPhrase.zip

Includes a README.txt that contains instructions for Solr 4.0.0

> Wildcards, ORs etc inside Phrase Queries
> 
>
> Key: SOLR-1604
> URL: https://issues.apache.org/jira/browse/SOLR-1604
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers, search
>Affects Versions: 1.4
>Reporter: Ahmet Arslan
>Priority: Minor
> Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
> ComplexPhraseQueryParser.java, ComplexPhrase.zip, ComplexPhrase.zip, 
> ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
> SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch
>
>
> Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
> wildcards, ORs, ranges, fuzzies inside phrase queries.




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b58) - Build # 1957 - Failure!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1957/
Java: 32bit/jdk1.8.0-ea-b58 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 22501 lines...]
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene...
  [javadoc] warning: [options] bootstrap class path not set in conjunction with 
-source 1.7
  [javadoc] Loading source files for package org.apache.lucene.analysis...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.tokenattributes...
  [javadoc] Loading source files for package org.apache.lucene.codecs...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40.values...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene41...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.perfield...
  [javadoc] Loading source files for package org.apache.lucene.document...
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package 
org.apache.lucene.search.payloads...
  [javadoc] Loading source files for package 
org.apache.lucene.search.similarities...
  [javadoc] Loading source files for package org.apache.lucene.search.spans...
  [javadoc] Loading source files for package org.apache.lucene.store...
  [javadoc] Loading source files for package org.apache.lucene.util...
  [javadoc] Loading source files for package org.apache.lucene.util.automaton...
  [javadoc] Loading source files for package org.apache.lucene.util.fst...
  [javadoc] Loading source files for package org.apache.lucene.util.mutable...
  [javadoc] Loading source files for package org.apache.lucene.util.packed...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.8.0-ea
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/docs/core/help-doc.html...
  [javadoc] 1 warning

[...truncated 44 lines...]
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.analysis.ar...
  [javadoc] warning: [options] bootstrap class path not set in conjunction with 
-source 1.7
  [javadoc] Loading source files for package org.apache.lucene.analysis.bg...
  [javadoc] Loading source files for package org.apache.lucene.analysis.br...
  [javadoc] Loading source files for package org.apache.lucene.analysis.ca...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.charfilter...
  [javadoc] Loading source files for package org.apache.lucene.analysis.cjk...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.commongrams...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.compound...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.compound.hyphenation...
  [javadoc] Loading source files for package org.apache.lucene.analysis.core...
  [javadoc] Loading source files for package org.apache.lucene.analysis.cz...
  [javadoc] Loading source files for package org.apache.lucene.analysis.da...
  [javadoc] Loading source files for package org.apache.lucene.analysis.de...
  [javadoc] Loading source files for package org.apache.lucene.analysis.el...
  [javadoc] Loading source files for package org.apache.lucene.analysis.en...
  [javadoc] Loading source files for package org.apache.lucene.analysis.es...
  [javadoc] Loading source files for package org.apache.lucene.analysis.eu...
  [javadoc] Loading source files for package org.apache.lucene.analysis.fa...
  [javadoc] Loading source files for package org.apache.lucene.analysis.fi...
  [javadoc] Loading source files for package org.apache.lucene.analysis.fr...
  [javadoc] Loading source files for package org.apache.lucene.analysis.ga...
  [javadoc] Loading source files for package org.apache.lucene.analysis.gl...
  [javadoc] Loading source files for package org.apache.lucene.analysis.hi...
  [javadoc] Loading source files for package org.apache.lucene.analysis.hu...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.hunspell...
  [javadoc] Loading source files for package org.apache.lucene.analysis.hy...
  [javadoc] Loading source files for package org.apache.lucene.analysis.id...
  [javadoc] Loading source files for package org.apache.lucene.analysis.in...
  [javadoc] Loading source files for package org.apache.lucene.analysis.it...
  [javadoc] Loading source files for package org.apache.lucene.analysis.lv...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.miscellaneous...
  [javadoc] Loadin

[jira] [Updated] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3981:
---

Attachment: SOLR-3981.patch

patch with the test i was working on, as well as a fix...

the Document itself can serve as the "set" to keep track of which field names 
have already been added.  because the final boost for the field name is the 
product of the individual boosts, we don't have to ensure that the (solr) 
docBoost and (solr) fieldBoost(s) are combined into the _first_ value of each 
copyField -- we just have to ensure that each is only used once.  (multiple 
copyFields with the same dest will result in them being multiplied in the final 
dest field's norm but that's always been true)
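
To make the idea above concrete, here is a minimal Python sketch. It is not the code in the attached patch and not Solr's DocumentBuilder. The function name field_boosts, its arguments, and the way the compounding behaviour is modelled are all assumptions made for illustration; a plain dict stands in for the Document as the "set" of field names already added.

```python
from math import prod


def field_boosts(values, copy_fields, doc_boost, once_per_dest):
    """Return {field name: product of the boosts of all values stored under it}.

    values        : list of (source field name, fieldBoost), one entry per value
    copy_fields   : dict mapping a source field name to its copyField destinations
    once_per_dest : True  -> docBoost is applied once per destination field name
                    False -> docBoost is re-applied for every source field
                             (the compounding behaviour, as modelled here)
    """
    per_field = {}        # field name -> boosts of each value stored under it
    seen_sources = set()  # source fields whose fieldBoost was already used
    for source, field_boost in values:
        first_of_source = source not in seen_sources
        seen_sources.add(source)
        for dest in [source] + copy_fields.get(source, []):
            # the fieldBoost of a source field is only used for its first value
            boost = field_boost if first_of_source else 1.0
            if once_per_dest:
                # per_field doubles as the "set" of names already added
                if dest not in per_field:
                    boost *= doc_boost
            elif first_of_source:
                boost *= doc_boost
            per_field.setdefault(dest, []).append(boost)
    # the index multiplies the boosts of all instances of the same field
    return {name: prod(boosts) for name, boosts in per_field.items()}


values = [("name", 1.0), ("manu", 1.0), ("cat", 1.0)]
copy_fields = {"name": ["text"], "manu": ["text"], "cat": ["text"]}

compounded = field_boosts(values, copy_fields, doc_boost=10.0, once_per_dest=False)
applied_once = field_boosts(values, copy_fields, doc_boost=10.0, once_per_dest=True)
print(compounded["text"], applied_once["text"])
```

With three source fields copied into text and a docBoost of 10, the compounding variant multiplies the docBoost into text three times, while the once-per-name variant uses it a single time.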

i'm still running the full test suite, and i want to work on a test that 
actually indexes a doc and inspects the encoded norms just to be certain i'm 
not missing something.

> docBoost is compounded on copyField
> ---
>
> Key: SOLR-3981
> URL: https://issues.apache.org/jira/browse/SOLR-3981
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1
>
> Attachments: SOLR-3981.patch
>
>
> As noted by Toke in a comment on SOLR-3875...
> https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
> {quote}
> While boosting of multi-value fields is handled correctly in Solr 4.0.0, 
> boosting for copyFields is not. A sample document:
> {code}
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for copyFields
>   video ThomasEgense and Toke Eskildsen
>   Test
>   bug
>   something else
>   bug
>   bug
>   
> {code}
> The fields name, manu, cat, features, keywords and content get copied to 
> text and a search for thomasegense matches the text-field with query 
> explanation
> {code}
> 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result 
> of:
>   70384.67 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 0.30685282 = idf(docFreq=1, maxDocs=1)
> 229376.0 = fieldNorm(doc=0)
> {code}
> If the two last fields keywords and content are removed from the sample 
> document, the score is reduced by a factor of 100 (docBoost^2).
> {quote}
> (This is a continuation of some of the problems caused by the changes made 
> when the concept of docBoost was eliminated from the underlying IndexWriter 
> code, and overlooked due to the lack of testing of docBoosts at the solr 
> level - SOLR-3885)




[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

2012-10-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482774#comment-13482774
 ] 

Mark Miller commented on SOLR-3939:
---

In two other issues I was working on, unrelated changes seemed to start causing 
test fails in one of the solrcloud tests - it's a fail I had seen sometimes in 
the past on Apache jenkins. A fail about waiting to notice a live node drop. It 
seems that was caused by this - it took some time to trace it back here. One of 
the nodes doesn't see a live node change because he is stuck in a leader 
election loop.

Given that, I plan on committing what I have so far - so it stops blocking my 
other two issues. We can then iterate further on trunk.

> Solr Cloud recovery and leader election when unloading leader core
> --
>
> Key: SOLR-3939
> URL: https://issues.apache.org/jira/browse/SOLR-3939
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-BETA, 4.0
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Critical
>  Labels: 4.0.1_Candidate
> Fix For: 4.1, 5.0
>
> Attachments: cloud2.log, cloud.log, SOLR-3939.patch, SOLR-3939.patch
>
>
> When a leader core is unloaded using the core admin api, the followers in the 
> shard go into recovery but do not come out. Leader election doesn't take 
> place and the shard goes down.
> This affects the ability to move a micro-shard from one Solr instance to 
> another Solr instance.
> The problem does not occur 100% of the time but a large % of the time. 
> To set up a test, start up Solr Cloud with a single shard. Add cores to that 
> shard as replicas using core admin. Then unload the leader core using core 
> admin. 




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1955 - Failure!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1955/
Java: 32bit/jdk1.7.0_07 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 24576 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, 
double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 28 minutes 8 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_07 -client -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482765#comment-13482765
 ] 

Hoss Man commented on SOLR-3981:


Toke suggested in SOLR-3875...

{quote}
One solution would be to keep track of used fields (directly specified as well 
as copyFields) and only assign the full boost once per document. If the number 
of unique fields/document is low, a simple list would probably be the fastest 
and with low GC impact. For a higher number of unique fields, a Set might be 
better. An optimization would be to only create the tracking structure once a 
boost != 1.0f is encountered and only store the fields with boost != 1.0f, so 
that an update without boosts would not get a performance penalty.
{quote}

I _was_ thinking that a more straightforward solution would be to build up the 
entire "Document" w/o any regard to the docBoost, and then only at the end loop 
over the fields in that Document and multiply in the docBoost if it's indexed & 
!omitNorms -- but then i realized that at that level there is no general way to 
"set" the boost.

I'm working on a patch with a test demonstrating the problem ... that may help 
inform an appropriate solution.

> docBoost is compounded on copyField
> ---
>
> Key: SOLR-3981
> URL: https://issues.apache.org/jira/browse/SOLR-3981
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1
>
>
> As noted by Toke in a comment on SOLR-3875...
> https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233
> {quote}
> While boosting of multi-value fields is handled correctly in Solr 4.0.0, 
> boosting for copyFields is not. A sample document:
> {code}
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for copyFields
>   video ThomasEgense and Toke Eskildsen
>   Test
>   bug
>   something else
>   bug
>   bug
>   
> {code}
> The fields name, manu, cat, features, keywords and content get copied to 
> text and a search for thomasegense matches the text-field with query 
> explanation
> {code}
> 70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result 
> of:
>   70384.67 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 0.30685282 = idf(docFreq=1, maxDocs=1)
> 229376.0 = fieldNorm(doc=0)
> {code}
> If the two last fields keywords and content are removed from the sample 
> document, the score is reduced by a factor of 100 (docBoost^2).
> {quote}
> (This is a continuation of some of the problems caused by the changes made 
> when the concept of docBoost was eliminated from the underlying IndexWriter 
> code, and overlooked due to the lack of testing of docBoosts at the solr 
> level - SOLR-3885)




[jira] [Commented] (SOLR-3875) Document boost does not work correctly when using multi-valued fields

2012-10-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482746#comment-13482746
 ] 

Hoss Man commented on SOLR-3875:


Toke: thanks for following up - too bad we didn't catch this other problem 
before 4.0.

I've spun off SOLR-3981 to work on this since SOLR-3875 is already resolved and 
listed as fixed in 4.0 (we can't (sanely) re-open issues that were recorded in 
CHANGES.txt for official releases since it would leave users confused as to 
what parts of those issues were resolved in each version)

> Document boost does not work correctly when using multi-valued fields
> -
>
> Key: SOLR-3875
> URL: https://issues.apache.org/jira/browse/SOLR-3875
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, update
>Affects Versions: 4.0-BETA
>Reporter: Toke Eskildsen
>Assignee: Hoss Man
>Priority: Critical
> Fix For: 4.0, 4.1, 5.0
>
> Attachments: SOLR-3875.patch
>
>
> In Solr 4 BETA & trunk, document boosts skew the ranking for documents with 
> multi-value fields tremendously. A document boost of 5 combined with 15 
> values in a multi-value field results in scores above 1,000,000,000, while a 
> boost of 0.5 results in scores below 0.001. The error is not present in Solr 
> 3.6.
> Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder 
> committed 20110827 (@1162347) by Mike McCandless, as part of work done on 
> LUCENE-2308. The problem is that Lucene multiplies the boosts of multiple 
> instances of the same field when updating the index.
> The old DocumentBuilder, used in Lucene 3.6, handled this by calculating the 
> score for the field (docBoost*fieldBoost) and assigning it to the first 
> instance of the field, then setting the boost to 1.0f and assigning that to 
> subsequent instances of the field. This effectively assigned 
> docBoost*fieldBoost to the field, regardless of the number of instances.
> The updated DocumentBuilder (see 
> https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup),
>  used in Lucene 4 BETA & trunk, also assigns docBoost*fieldBoost to the first 
> instance of the field. Then it sets fieldBoost = docBoost and continues to 
> assign docBoost*fieldBoost to subsequent instances. Using the example 
> mentioned above, the generated IndexableFields will get assigned boosts of 5, 
> 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances of the 
> same field will have a collective boost of 5*25^14.
> This can be demonstrated with the Solr tutorial example by indexing the 
> sample documents and adding the document 
> {code:xml}
> 
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for multivalued fields
>   Thomas Egense and Toke Eskildsen
>   Test
>   bug
>   insane_boost
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else
>   something else  
> 
> 
> {code}
> The _manu_ & _features_-fields get copied to _text_ and a search for 
> _thomas_ matches the _text_-field with query explanation
> {code:xml}
> 
> 2.44373361E10 = (MATCH) weight(text:thomas in 0) [DefaultSimilarity], result 
> of:
>   2.44373361E10 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 3.2512918 = idf(docFreq=3, maxDocs=38)
> 7.5161928E9 = fieldNorm(doc=0)
> 
> {code}
> Thomas and I are too pressed for time to attempt a proper patch at the 
> moment, but we guess that a reversion to the old algorithm of assigning the 
> combined boost to the first instance and 1.0f to all subsequent instances 
> would work?
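
The arithmetic in the description above can be checked with a short Python sketch. It only reproduces the raw boost product for the described case (docBoost of 5, fieldBoost assumed to be 1, 15 instances of the same field). The fieldNorm shown in the explain output also goes through length normalization and a lossy norm encoding, so it should not be expected to equal this number.

```python
from math import prod

doc_boost = 5    # docBoost from the example in the description
instances = 15   # number of values in the multi-value field

# As described: the first instance gets docBoost*fieldBoost = 5; after that
# fieldBoost is set to docBoost, so every later instance gets 5*5.
per_instance = [doc_boost] + [doc_boost * doc_boost] * (instances - 1)

# Lucene multiplies the boosts of all instances of the same field.
collective = prod(per_instance)   # 5 * 25**14, roughly 1.86e20
intended = doc_boost              # what a single docBoost*fieldBoost would give

print(collective)
print(collective // intended)     # how far off the raw product is
```

Integers are used here so the product stays exact; the old 3.6 behaviour described above (combined boost on the first instance, 1.0f on the rest) would give a product equal to `intended`.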




[jira] [Updated] (LUCENE-4501) optimize 4.1 codec's encoding of frequencies

2012-10-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4501:


Attachment: LUCENE-4501.patch

This seems to shave about 1.5% off the .doc file in my tests.

I'm worried about making this PF confusing with these optimizations though.

> optimize 4.1 codec's encoding of frequencies
> 
>
> Key: LUCENE-4501
> URL: https://issues.apache.org/jira/browse/LUCENE-4501
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Robert Muir
> Attachments: LUCENE-4501.patch
>
>
> If we wanted, we could encode freq-1 into the FOR blocks (since it cannot be 
> 0) and save some space.




[jira] [Created] (LUCENE-4501) optimize 4.1 codec's encoding of frequencies

2012-10-23 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4501:
---

 Summary: optimize 4.1 codec's encoding of frequencies
 Key: LUCENE-4501
 URL: https://issues.apache.org/jira/browse/LUCENE-4501
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir


If we wanted, we could encode freq-1 into the FOR blocks (since it cannot be 0) 
and save some space.
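
As a rough illustration of where the saving would come from, here is a Python sketch of a simplified bit-packing model. It is not the 4.1 postings format code: the helper names and the sample block are made up, and the only assumption is that a packed block spends as many bits per value as its largest value needs.

```python
def bits_per_value(block):
    """Bits needed per value if the whole block is packed at one width."""
    return max(block).bit_length()


def encode_freqs(freqs):
    """Store freq-1 instead of freq; valid because a freq is never 0."""
    if any(f < 1 for f in freqs):
        raise ValueError("term frequencies must be >= 1")
    return [f - 1 for f in freqs]


def decode_freqs(stored):
    """Undo encode_freqs."""
    return [s + 1 for s in stored]


freqs = [1, 2, 4, 1, 1, 3, 2, 4]   # hypothetical block of term frequencies
stored = encode_freqs(freqs)

print(bits_per_value(freqs), bits_per_value(stored))
print(decode_freqs(stored) == freqs)
```

In this model the width only drops when the largest freq in a block is an exact power of two (here 4 becomes 3), which may be why the measured saving is modest.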




[jira] [Created] (SOLR-3981) docBoost is compounded on copyField

2012-10-23 Thread Hoss Man (JIRA)
Hoss Man created SOLR-3981:
--

 Summary: docBoost is compounded on copyField
 Key: SOLR-3981
 URL: https://issues.apache.org/jira/browse/SOLR-3981
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.1


As noted by Toke in a comment on SOLR-3875...

https://issues.apache.org/jira/browse/SOLR-3875?focusedCommentId=13482233&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13482233

{quote}
While boosting of multi-value fields is handled correctly in Solr 4.0.0, 
boosting for copyFields is not. A sample document:
{code}

  Insane score Example. Score = 10E9 
  Document boost broken for copyFields
  video ThomasEgense and Toke Eskildsen
  Test
  bug
  something else
  bug
  bug
  
{code}
The fields name, manu, cat, features, keywords and content get copied to text 
and a search for thomasegense matches the text-field with query explanation
{code}
70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result 
of:
  70384.67 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
0.30685282 = idf(docFreq=1, maxDocs=1)
229376.0 = fieldNorm(doc=0)
{code}
If the two last fields keywords and content are removed from the sample 
document, the score is reduced by a factor of 100 (docBoost^2).
{quote}

(This is a continuation of some of the problems caused by the changes made when 
the concept of docBoost was eliminated from the underlying IndexWriter code, and 
overlooked due to the lack of testing of docBoosts at the solr level - 
SOLR-3885)




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2012-10-23 Thread Chris Russell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482734#comment-13482734
 ] 

Chris Russell commented on SOLR-2894:
-

In my experience using this patch, it seems that it does not over-request when 
enforcing a limit?
This is problematic because, for example, in a situation where you have many 
slaves and you are pivoting on a fairly evenly distributed field and setting 
your facet limit to X, the Xth distinct value for that field by document count 
on each slave is likely to be different.  The result is that some facet values 
close to your limit boundary will not get reported for aggregation, which will 
make your ultimate results somewhat inaccurate.

It was my impression that other facet-based features of solr over-request when 
there is a limit to combat this situation?  For example if you specify limit 
10, the distributed query might have limit 100 or 1000, and then during 
aggregation it would be limited to the top 10.
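
A small Python sketch of the effect described above, under made-up data: two shards, a facet limit of 2, and a value ("c") that sits just below the limit boundary on each shard. The function names and the per-shard limit of 20 are assumptions for illustration, not what the patch or Solr's facet code does.

```python
from collections import Counter


def shard_top(counts, limit):
    """What one shard reports: its own top `limit` values by document count."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:limit])


def distributed_top(shards, limit, per_shard_limit):
    """Sum the per-shard reports, then keep the overall top `limit` values."""
    merged = Counter()
    for counts in shards:
        merged.update(shard_top(counts, per_shard_limit))
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


shards = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
]

no_over_request = distributed_top(shards, limit=2, per_shard_limit=2)
over_request = distributed_top(shards, limit=2, per_shard_limit=20)
print(no_over_request)
print(over_request)
```

Without over-requesting, "c" (16 documents in total) is never reported by either shard and drops out of the result; asking each shard for more than the final limit lets it be aggregated correctly.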

I am working on similar functionality for this patch.

> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
> Fix For: 4.1
>
> Attachments: distributed_pivot.patch, distributed_pivot.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

2012-10-23 Thread Terrance A. Snyder (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482727#comment-13482727
 ] 

Terrance A. Snyder commented on SOLR-3583:
--

[~selah] Please do! Your contribution is amazing and pushes SOLR into a brave 
new world.

> Percentiles for facets, pivot facets, and distributed pivot facets
> --
>
> Key: SOLR-3583
> URL: https://issues.apache.org/jira/browse/SOLR-3583
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris Russell
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 4.1
>
> Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds 
> percentiles and averages to facets, pivot facets, and distributed pivot 
> facets by making use of range facet internals.  




[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2012-10-23 Thread Chris Russell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455182#comment-13455182
 ] 

Chris Russell edited comment on SOLR-2894 at 10/23/12 9:45 PM:
---

Regarding facet.pivot.limit.method and facet.limit, it looks like these are not 
checked on a per-field basis?
So, if a user sets different limits for different fields and wants 'combined' 
limiting, that is not possible?
For example a user might set:

f.field1.facet.limit=10
f.field1.facet.pivot.limit.method=combined
f.field2.facet.limit=20 

And the combined method will not be used...
If the user sets facet.pivot.limit.method=combined it looks like the same limit 
will be used for all fields?  Whatever the global facet.limit is set to?
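
For comparison, this is a Python sketch of the per-field lookup being asked about: try f.<field>.<param> first and fall back to the global <param>. The helper name field_param and the global facet.limit=100 entry are hypothetical; the other parameters are the ones from the example above. It illustrates the per-field override convention only and says nothing about how the patch currently reads its parameters.

```python
def field_param(params, field, name, default=None):
    """Return f.<field>.<name> if set, else the global <name>, else default."""
    return params.get(f"f.{field}.{name}", params.get(name, default))


params = {
    "facet.limit": "100",  # hypothetical global value
    "f.field1.facet.limit": "10",
    "f.field1.facet.pivot.limit.method": "combined",
    "f.field2.facet.limit": "20",
}

for field in ("field1", "field2", "field3"):
    print(
        field,
        field_param(params, field, "facet.limit"),
        field_param(params, field, "facet.pivot.limit.method"),
    )
```

With a lookup like this, field1 would get limit 10 with the combined method, field2 would get limit 20 with no method set, and any other field would fall back to the global limit.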


  was (Author: selah):
Regarding facet.pivot.limit.method and facet.limit, it looks like these are 
not checked on a per-field basis?
So, if a user sets different limits for different fields and wants 'combined' 
limiting, that is not possible?
For example a user might set:

f.field1.facet.limit=10
f.field1.facet.pivot.limit.method=combined
f.field2.facet.limit=20 

And the combined method will not be used...
If the user sets facet.pivot.limit.method=combined it looks like the same limit 
will be used for all fields?  Whatever the global facet.limit is set to?
Unfortunate.
  
> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
> Fix For: 4.1
>
> Attachments: distributed_pivot.patch, distributed_pivot.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

2012-10-23 Thread Chris Russell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482725#comment-13482725
 ] 

Chris Russell commented on SOLR-3583:
-

I have gotten some time recently to work on this.
I have disentangled my additions from the SOLR-2894 patch, and will be making a 
few enhancements before attempting to make it trunk-compatible.

> Percentiles for facets, pivot facets, and distributed pivot facets
> --
>
> Key: SOLR-3583
> URL: https://issues.apache.org/jira/browse/SOLR-3583
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris Russell
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 4.1
>
> Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds 
> percentiles and averages to facets, pivot facets, and distributed pivot 
> facets by making use of range facet internals.  




[jira] [Commented] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully

2012-10-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482689#comment-13482689
 ] 

Mark Miller commented on SOLR-3964:
---

bq. Solr does not return error,

This is a limitation of the current collections API - you don't currently get a 
response - it just drops the create command on the queue where the overseer 
will pull it.

Optionally waiting around for completion or adding some way to check the status 
of the cmd is something we need to add.

> Solr does not return error, even though create collection unsuccessfully 
> -
>
> Key: SOLR-3964
> URL: https://issues.apache.org/jira/browse/SOLR-3964
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
>Reporter: milesli
>  Labels: lack, message, response
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Solr does not return error,
>  even though create/delete collection unsuccessfully; 
>  even though the request URL is incorrect;
> (example: 
> http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_milesnumShards=3&numReplicas=2&collection.configName=myconf)
>  even though pass the collection name  already exists;




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1953 - Failure!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1953/
Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 24575 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 28 minutes 46 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_07 -server -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




Re: svn commit: r1401343 - /lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java

2012-10-23 Thread Dawid Weiss
Just peeking at the code, Tommaso --

Map classCounts = new HashMap();

this will cause unpredictable results in case of class ties (hash map
order varies from vm to vm). I think it'd be better to make it an
ordered set (on the class name for example), then the ties would
always be broken in the same way. Alternatively, you can add another
condition in the if loop searching for the winning class (and
resolving the ties).
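
A minimal sketch of the first suggestion, assuming the neighbor labels are plain strings. The class and method names (DeterministicVote, winner) are hypothetical and not part of the committed classifier; a sorted map stands in for the "ordered set" idea, so ties always go to the lexicographically smallest class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the actual KNearestNeighborClassifier code.
final class DeterministicVote {

    /** Returns the most frequent label; ties go to the smallest label, null if the list is empty. */
    static String winner(List<String> neighborClasses) {
        // TreeMap iterates keys in sorted order, so the iteration order
        // does not depend on hash codes or on the JVM in use.
        Map<String, Integer> classCounts = new TreeMap<>();
        for (String c : neighborClasses) {
            classCounts.merge(c, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            // Strict '>' keeps the earlier (smaller) label when counts are equal.
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two votes each: the tie is always resolved the same way.
        System.out.println(winner(Arrays.asList("sports", "politics", "sports", "politics")));
    }
}
```

The second suggestion (an extra condition in the winner search) would amount to comparing class names whenever the counts are equal, which gives the same result without changing the map type.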

Just a thought.

D.

On Tue, Oct 23, 2012 at 6:32 PM,   wrote:
> Author: tommaso
> Date: Tue Oct 23 16:32:05 2012
> New Revision: 1401343
>
> URL: http://svn.apache.org/viewvc?rev=1401343&view=rev
> Log:
> [LUCENE-4345] - adding @lucene.experimental annotation to kNN
>
> Modified:
> 
> lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java
>
> Modified: 
> lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java?rev=1401343&r1=1401342&r2=1401343&view=diff
> ==
> --- 
> lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java
>  (original)
> +++ 
> lucene/dev/trunk/lucene/classification/src/java/org/apache/lucene/classification/KNearestNeighborClassifier.java
>  Tue Oct 23 16:32:05 2012
> @@ -33,6 +33,7 @@ import java.util.Map;
>  /**
>   * A k-Nearest Neighbor classifier (see 
> http://en.wikipedia.org/wiki/K-nearest_neighbors) based
>   * on {@link MoreLikeThis}
> + * @lucene.experimental
>   */
>  public class KNearestNeighborClassifier implements Classifier {
>
>
>




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482647#comment-13482647
 ] 

Alan Woodward commented on SOLR-1972:
-

Hm, OK.  I'm creating the various Metrics objects in the base class 
constructor, and registering them by class using this.getClass().  Only problem 
here is that in a super constructor, getClass() returns the superclass.  Oops.

If I move the object creation to init() I get other errors, because 
RequestHandlers are registered with JMX before init() is called, and JMX calls 
getStatistics() to get all the various measurement names and register them.

Maybe put a guard in getStatistics to check if the counters are null, and if 
they are, instantiate them?  Seems a bit hacky though.  Let me have a think 
about this.
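
A rough sketch of the guard idea, assuming a single request counter. LazyStatsHandler and its members are hypothetical stand-ins rather than Solr's RequestHandlerBase or the Metrics library; the point is only that the counter is created on first access, so a caller that arrives before init() (as JMX registration does here) still gets a usable value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical handler; names are illustrative only.
class LazyStatsHandler {
    // Deliberately not created in the constructor.
    private volatile AtomicLong requests;

    // Creates the counter on first use (double-checked, so it is created once).
    private AtomicLong requests() {
        AtomicLong r = requests;
        if (r == null) {
            synchronized (this) {
                if (requests == null) {
                    requests = new AtomicLong();
                }
                r = requests;
            }
        }
        return r;
    }

    void handleRequest() {
        requests().incrementAndGet();
    }

    // Safe to call before any request has been handled and before init().
    Map<String, Object> getStatistics() {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("requests", requests().get());
        return stats;
    }

    public static void main(String[] args) {
        LazyStatsHandler h = new LazyStatsHandler();
        System.out.println(h.getStatistics()); // counter is created here, on demand
        h.handleRequest();
        System.out.println(h.getStatistics());
    }
}
```

Routing every access through one private accessor keeps the null check in a single place, which is arguably less hacky than scattering guards through getStatistics().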

In re the precision of the measurements, the jscript in the front end could 
presumably round them to 2 sig figs - that way they look prettier in the UI, 
but are still precise for any client that wants to use it.

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
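
The request above could be sketched roughly as follows. This is only an illustration: RecentTimes is a hypothetical class, and the nearest-rank percentile method is one assumed choice among several, not something the issue specifies. It keeps a bounded window of the most recent request times, so early slow queries drop out once enough newer samples arrive.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sliding window of request times; not an existing Solr class.
final class RecentTimes {
    private final int capacity;                       // e.g. 1024 or 4096
    private final Deque<Long> window = new ArrayDeque<>();

    RecentTimes(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Records one request time, evicting the oldest sample when full.
    synchronized void record(long elapsedMillis) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(elapsedMillis);
    }

    /** Nearest-rank percentile (0 < p <= 100) of the retained samples; -1 if empty. */
    synchronized long percentile(double p) {
        if (window.isEmpty()) {
            return -1;
        }
        long[] sorted = new long[window.size()];
        int i = 0;
        for (long v : window) {
            sorted[i++] = v;
        }
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        RecentTimes times = new RecentTimes(4);
        long[] samples = {5000, 10, 20, 30, 40};      // first query is a cold-cache outlier
        for (long s : samples) {
            times.record(s);
        }
        System.out.println("median=" + times.percentile(50)
                + " p95=" + times.percentile(95)
                + " p99=" + times.percentile(99));
    }
}
```

Sorting a copy on every call costs O(n log n) per read, which is acceptable for a window of a few thousand samples read occasionally from an admin page.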




[jira] [Resolved] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4498.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.1

> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---
>
> Key: LUCENE-4498
> URL: https://issues.apache.org/jira/browse/LUCENE-4498
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Robert Muir
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
> LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch
>
>
> We have pulsing codec, but currently this has some downsides:
> * its very general, wrapping an arbitrary postingsformat and pulsing 
> everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
> * reuse is hairy: because it specializes its enums based on these cutoffs, 
> when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
> avoid the worst cases where we clone indexinputs for tons of terms.
> On the other hand the way the 4.1 codec encodes "primary key" fields is 
> pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
> which tells us where to seek in the .doc to read our one lonely vint.
> I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
> write the lone doc delta where we would write docStartFP. 
> We can avoid the hairy reuse problem too, by just supporting this in 
> refillDocs() in BlockDocsEnum instead of specializing.
> This would remove the additional seek for "primary key" fields without really 
> any of the downsides of pulsing today.




[jira] [Commented] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully

2012-10-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482575#comment-13482575
 ] 

Hoss Man commented on SOLR-3964:


can you please elaborate on what problem you are seeing.

specifically: 
1) how are you running solr and what do your configs look like (ie: it appears 
you are running in cloud mode, but that's not certain)
2) what commands/requests do you execute that don't behave the way you expect
3) what response do you expect from those commands/requests
4) what response do you _actually_ get from those commands/requests


off the cuff i suspect that unless you made a cut/paste mistake when creating 
this issue, the problem you are having is that you are missing a "&" in your 
URL, and what solr is doing is creating a collection with the name 
"tenancy_milesnumShards=3" when what you really want is a collection named 
"tenancy_miles" (which you imply already exists, but haven't provided any 
concrete details for us to be certain)
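
To illustrate the suspicion, here is a simplified query-string split applied to the URL from the issue. QueryParams is a hypothetical helper, not Solr's parameter parsing, and it skips URL decoding; it only shows how a missing "&" folds the next parameter into the value of "name".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration; Solr's own request parsing is not shown here.
final class QueryParams {

    // Splits on '&', then on the first '=' of each pair. No URL decoding.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string copied from the issue description (note: no '&' before numShards).
        String query = "action=CREATE&name=tenancy_milesnumShards=3"
                + "&numReplicas=2&collection.configName=myconf";
        Map<String, String> params = parse(query);
        System.out.println("name      = " + params.get("name"));
        System.out.println("numShards = " + params.get("numShards"));
    }
}
```

With this split, "name" carries the value "tenancy_milesnumShards=3" and no "numShards" parameter exists at all.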

> Solr does not return error, even though create collection unsuccessfully 
> -
>
> Key: SOLR-3964
> URL: https://issues.apache.org/jira/browse/SOLR-3964
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
>Reporter: milesli
>  Labels: lack, message, response
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Solr does not return error,
>  even though create/delete collection unsuccessfully; 
>  even though the request URL is incorrect;
> (example: 
> http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_milesnumShards=3&numReplicas=2&collection.configName=myconf)
>  even though pass the collection name  already exists;




Re: branches/lucene_solr_4_0 and 4.0.1?

2012-10-23 Thread Yonik Seeley
On Tue, Oct 23, 2012 at 2:10 PM, Mark Miller  wrote:
> I'd say two things:
>
> there are def some bad bugs already that would warrant a 4.01.
>
> I'd push for 4.1 well before jan.

+1

I'd add that just because there hasn't been a lot of time to find
additional bugs in 4.0 doesn't mean that we should artificially delay
a 4.0.1.  If/when more bugs are found after that, we can always do a
4.0.2 (if 4.1 still isn't imminent).

-Yonik
http://lucidworks.com



> - Mark
>
> Sent from my iPhone
>
> On Oct 19, 2012, at 6:57 AM, Erick Erickson  wrote:
>
>> Personally, I suspect that enough people are going to hop on the 4.0
>> code that _something_ will come bubbling up out of the cracks that
>> needs to be addressed. I mean there's a lot that's in that release, plus
>> things that people are geeked to try. Not necessarily killer bugs, more
>> like enhancements.
>>
>> So I'm rather expecting a relatively quick turn-around for 4.1 and wouldn't
>> push for a 4.0.1 unless and until there's a killer bug. Which, as Robert
>> says, there aren't any examples of in the CHANGES file yet, so no reason
>> for a 4.0.1.
>>
>> I'll throw out a straw-man proposal of targeting January for 4.1. Not a hard
>> date, more a proposal for taking stock after the Holidays and seeing what
>> we think.
>>
>> Besides, even though I don't have a hand in it, is such a pain, especially
>> for people who'd rather be coding
>>
>> Erick
>>
>> On Thu, Oct 18, 2012 at 7:58 PM, Robert Muir  wrote:
>>> On Thu, Oct 18, 2012 at 4:53 PM, Mark Miller  wrote:
>>>> I don't think a 4.0.1 would be strange at all.
>>>
>>> I just think it would be strange since there aren't really any serious
>>> bugs yet in the lucene CHANGES.txt? I also don't think there has been
>>> enough time for anyone to actually find any bugs, its only been like 6
>>> days since we released.
>>>

>>>> 4.X is essentially trunk to me now. I would put in changes that I want
>>>> to bake for future 4.1, 4.2, 4.3, etc changes.
>>>
>>> Sure, well there aren't many architectural changes yet since 4.0, and
>>> currently we have the ability to make and bake large changes to lucene
>>> in many cases (block postings format, compressed stored fields, etc)
>>> without introducing risk, since they are just experimental until we
>>> decide to fold them into the default.
>>>
>>> But personally as soon I hit some limit in the codec API (which I
>>> expect will happen), or want to work on something biggish like
>>> positions iterators, I'll be looking at doing that kind of breaking
>>> change only in trunk.
>>>
>>> I just think we shouldn't hold back from that: we should develop in a
>>> correct and safe way and not backport scary stuff or majorly break
>>> APIs to get them out faster, instead 4.x should stay stable and we
>>> should plan on 5.x being in our own lifetimes.
>>>
>>> i dont want there to be the assumption that 5.0 is 3 years out.
>>>

>>>> When you have bad bugs, you don't want to worry about what's baking -
>>>> you just want to put out a bug fix release.
>>>
>>> I totally agree with this! But I have serious concerns about the
>>> ability for this community to say "hey we fixed some nasty shit, lets
>>> get a bugfix out ASAP". Nobody is really testing until release
>>> candidates are issued, the 72-hour voting period designed to be fair
>>> to devs in different timezones is bastardized as some iterative QA
>>> cycle, etc etc.
>>>
>>> So if we are going to go thru all the trouble, I'd rather it be a 4.1
>>>



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_07) - Build # 1951 - Still Failing!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1951/
Java: 32bit/jdk1.7.0_07 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 24575 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:252: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 30 minutes 39 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_07 -client -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




Re: branches/lucene_solr_4_0 and 4.0.1?

2012-10-23 Thread Mark Miller
I'd say two things: 

there are def some bad bugs already that would warrant a 4.01. 

I'd push for 4.1 well before jan. 

- Mark

Sent from my iPhone

On Oct 19, 2012, at 6:57 AM, Erick Erickson  wrote:

> Personally, I suspect that enough people are going to hop on the 4.0
> code that _something_ will come bubbling up out of the cracks that
> needs to be addressed. I mean there's a lot that's in that release, plus
> things that people are geeked to try. Not necessarily killer bugs, more
> like enhancements.
> 
> So I'm rather expecting a relatively quick turn-around for 4.1 and wouldn't
> push for a 4.0.1 unless and until there's a killer bug. Which, as Robert
> says, there aren't any examples of in the CHANGES file yet, so no reason
> for a 4.0.1.
> 
> I'll throw out a straw-man proposal of targeting January for 4.1. Not a hard
> date, more a proposal for taking stock after the Holidays and seeing what
> we think.
> 
> Besides, even though I don't have a hand in it, is such a pain, especially
> for people who'd rather be coding
> 
> Erick
> 
> On Thu, Oct 18, 2012 at 7:58 PM, Robert Muir  wrote:
>> On Thu, Oct 18, 2012 at 4:53 PM, Mark Miller  wrote:
>>> I don't think a 4.0.1 would be strange at all.
>> 
>> I just think it would be strange since there aren't really any serious
>> bugs yet in the lucene CHANGES.txt? I also don't think there has been
>> enough time for anyone to actually find any bugs, its only been like 6
>> days since we released.
>> 
>>> 
>>> 4.X is essentially trunk to me now. I would put in changes that I want
>>> to bake for future 4.1, 4.2, 4.3, etc changes.
>> 
>> Sure, well there aren't many architectural changes yet since 4.0, and
>> currently we have the ability to make and bake large changes to lucene
>> in many cases (block postings format, compressed stored fields, etc)
>> without introducing risk, since they are just experimental until we
>> decide to fold them into the default.
>> 
>> But personally as soon I hit some limit in the codec API (which I
>> expect will happen), or want to work on something biggish like
>> positions iterators, I'll be looking at doing that kind of breaking
>> change only in trunk.
>> 
>> I just think we shouldn't hold back from that: we should develop in a
>> correct and safe way and not backport scary stuff or majorly break
>> APIs to get them out faster, instead 4.x should stay stable and we
>> should plan on 5.x being in our own lifetimes.
>> 
>> i dont want there to be the assumption that 5.0 is 3 years out.
>> 
>>> 
>>> When you have bad bugs, you don't want to worry about what's baking -
>>> you just want to put out a bug fix release.
>> 
>> I totally agree with this! But I have serious concerns about the
>> ability for this community to say "hey we fixed some nasty shit, lets
>> get a bugfix out ASAP". Nobody is really testing until release
>> candidates are issued, the 72-hour voting period designed to be fair
>> to devs in different timezones is bastardized as some iterative QA
>> cycle, etc etc.
>> 
>> So if we are going to go thru all the trouble, I'd rather it be a 4.1
>> 



[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3335 - Failure

2012-10-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3335/

All tests passed

Build Log:
[...truncated 24628 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:60:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:252:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1919:
 exec returned: 1

Total time: 46 minutes 30 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501
 ] 

Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:48 PM:
--

I have now discovered a real problem.  All of my search handlers have the exact 
same statistics.  I have defined /lbcheck, /ncdismax, /search, and /select.

That brings to mind a potential test that could be added -- make sure that when 
you issue queries against one handler, stats on other handlers do not see their 
numbers go up.  No idea how to write it, though.
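
One possible shape for such a test, sketched with plain Java stand-ins. The Handler class and the othersUnchanged check are hypothetical and do not use Solr's test framework or real request handlers; the idea is to snapshot every handler's counter, send requests to one handler, and verify that only that handler's counter moved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for request handlers; not Solr classes.
final class HandlerStatsIsolation {

    static final class Handler {
        // Per-instance counter: sharing this across handlers is the bug being tested for.
        private final AtomicLong requests = new AtomicLong();

        void handle() {
            requests.incrementAndGet();
        }

        long requests() {
            return requests.get();
        }
    }

    // Sends n requests to 'target' and checks that only its counter changed.
    static boolean othersUnchanged(Map<String, Handler> handlers, String target, int n) {
        Map<String, Long> before = new LinkedHashMap<>();
        for (Map.Entry<String, Handler> e : handlers.entrySet()) {
            before.put(e.getKey(), e.getValue().requests());
        }
        for (int i = 0; i < n; i++) {
            handlers.get(target).handle();
        }
        for (Map.Entry<String, Handler> e : handlers.entrySet()) {
            long expected = before.get(e.getKey()) + (e.getKey().equals(target) ? n : 0);
            if (e.getValue().requests() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Handler> handlers = new LinkedHashMap<>();
        for (String name : new String[] {"/lbcheck", "/ncdismax", "/search", "/select"}) {
            handlers.put(name, new Handler());
        }
        System.out.println("isolated: " + othersUnchanged(handlers, "/search", 3));
    }
}
```

In a real test the handlers would be the configured search handlers and the counters would come from their reported statistics, but the snapshot-and-compare structure would stay the same.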


  was (Author: elyograg):
I have now discovered a real problem.  All of my search handlers have the 
exact same statistics.  I have defined /lbcheck, /ncdismax, /search, and 
/select.
  
> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501
 ] 

Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:40 PM:
--

I have now discovered a real problem.  All of my search handlers have the exact 
same statistics.  I have defined /lbcheck, /ncdismax, /search, and /select.

  was (Author: elyograg):
I have now discovered what a real problem.  All of my search handlers have 
the exact same statistics.  I have defined /lbcheck, /ncdismax, /search, and 
/select.
  
> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482501#comment-13482501
 ] 

Shawn Heisey commented on SOLR-1972:


I have now discovered what a real problem.  All of my search handlers have the 
exact same statistics.  I have defined /lbcheck, /ncdismax, /search, and 
/select.

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
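
The bounded history described in the issue text above could be sketched roughly as follows. This is only an illustration under stated assumptions, not what any of the attached patches do: the class name RecentRequestTimes, its methods, and the nearest-rank percentile definition are all hypothetical choices made for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch: keeps only the most recent N request times and reports percentiles over them. */
public class RecentRequestTimes {
    private final double[] window; // ring buffer holding the most recent samples
    private int next = 0;          // slot the next sample will be written to
    private int count = 0;         // number of valid samples, never more than window.length

    /** maxSamples plays the role of the configurable count (e.g. 1024 or 4096). */
    public RecentRequestTimes(int maxSamples) {
        if (maxSamples <= 0) {
            throw new IllegalArgumentException("maxSamples must be positive");
        }
        this.window = new double[maxSamples];
    }

    /** Records one request time, overwriting the oldest sample once the window is full. */
    public synchronized void record(double millis) {
        window[next] = millis;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Nearest-rank percentile (0 < p <= 100) over the retained samples; NaN if nothing was recorded. */
    public synchronized double percentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        if (count == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.copyOf(window, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * count); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a window of 4, recording 5000, 10, 20, 30 and 40 drops the slow startup sample, so the median reflects only the warmed-up requests. Sorting on every call is acceptable for a sketch but would likely need revisiting for a busy handler.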




[jira] [Commented] (LUCENE-4494) Add phoenetic algorithm Match Rating approach to lucene

2012-10-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482476#comment-13482476
 ] 

Ryan McKinley commented on LUCENE-4494:
---

This looks great -- though it seems like the more appropriate home is in 
commons codec:
http://commons.apache.org/codec/api-release/org/apache/commons/codec/language/package-summary.html


> Add phoenetic algorithm Match Rating approach to lucene
> ---
>
> Key: LUCENE-4494
> URL: https://issues.apache.org/jira/browse/LUCENE-4494
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Colm Rice
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4494.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I want to add MatchRatingApproach algorithm to the Lucene project. 
> What I have at the moment is a class called 
> org.apache.lucene.analysis.phoenetic.MatchRatingApproach implementing 
> StringEncoder
> I have a pretty comprehensive test file located at: 
> org.apache.lucene.analysis.phonetic.MatchRatingApproachTests
> It doesn't exactly follow an existing pattern, so I'm going to need a bit of advice here. 
> Thanks! Feel free to email.
> FYI: It's my first contribution, so be gentle :-) C# is my native language.
> Reference: http://en.wikipedia.org/wiki/Match_rating_approach




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482475#comment-13482475
 ] 

Shawn Heisey commented on SOLR-1972:


I would also move avgTimePerRequest to right before medianRequestTime so 
similar numbers are all together.  I was going to suggest giving it a new name 
to jibe with the additions, but there are likely a lot of existing customer 
scripts that rely on that name -- including some of mine.


> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1950 - Failure!

2012-10-23 Thread Robert Muir
I committed a fix: classification module needs to link to queries javadocs.

But there are more problems, 'ant documentation-lint' still fails:

 [echo] Checking for missing docs...
 [exec]
 [exec] build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   missing Constructors: KNearestNeighborClassifier(int)
 [exec]
 [exec] build/docs/classification/org/apache/lucene/classification/ClassificationResult.html
 [exec]   missing Constructors: ClassificationResult(java.lang.String, double)
 [exec]   missing Methods: getAssignedClass()
 [exec]   missing Methods: getScore()
 [exec]
 [exec] Missing javadocs were found!

BUILD FAILED
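
For illustration, the kind of javadoc the lint is asking for might look like the sketch below. Only the constructor signature and the two getter names come from the lint output above; the field names, comment wording, and class body are assumptions, and this is not the actual Lucene source.

```java
/**
 * Hypothetical sketch of a documented result holder. Only the member
 * signatures are taken from the lint output; everything else is assumed.
 */
public class ClassificationResult {
    private final String assignedClass;
    private final double score;

    /**
     * Creates a classification result.
     *
     * @param assignedClass the class assigned by the classifier
     * @param score the score computed for the assigned class
     */
    public ClassificationResult(String assignedClass, double score) {
        this.assignedClass = assignedClass;
        this.score = score;
    }

    /**
     * Returns the class assigned by the classifier.
     *
     * @return the assigned class
     */
    public String getAssignedClass() {
        return assignedClass;
    }

    /**
     * Returns the score computed for the assigned class.
     *
     * @return the score
     */
    public double getScore() {
        return score;
    }
}
```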

On Tue, Oct 23, 2012 at 1:07 PM, Policeman Jenkins Server
 wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1950/
> Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC
>
> All tests passed
>
> Build Log:
> [...truncated 23898 lines...]
> -documentation-lint:
>  [echo] Checking for broken links...
>  [exec]
>  [exec] Crawl/parse...
>  [exec]
>  [exec] Verify...
>  [exec]
>  [exec] 
> file:///build/docs/classification/org/apache/lucene/classification/class-use/Classifier.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
>  [exec]
>  [exec] 
> file:///build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
>  [exec]
>  [exec] 
> file:///build/docs/classification/org/apache/lucene/classification/package-summary.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
>  [exec]
>  [exec] Broken javadocs links were found!
>
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:235: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1908:
>  exec returned: 1
>
> Total time: 35 minutes 35 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Description set: Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>
>



[jira] [Comment Edited] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482470#comment-13482470
 ] 

Shawn Heisey edited comment on SOLR-1972 at 10/23/12 5:13 PM:
--

This is lightyears beyond what I could have hoped for when I first opened this 
issue.  I do have one more picky note: the extreme floating point precision of 
the output.  Here's what I am getting:

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483.059463
avgTimePerRequest: 3820.2185330714287
avgRequestsPerSecond: 0.1592740558481214
5minRateReqsPerSecond: 0.6393213424767414
15minRateReqsPerSecond: 0.7422605686168207
medianRequestTime: 2537.401157
75thPcRequestTime: 7728.151086
95thPcRequestTime: 8963.867643
99thPcRequestTime: 8963.867643
999thPcRequestTime: 8963.867643

Here's what I would like to see instead ... lower precision, and rounded up 
when the first eliminated digit is 5 or higher.

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483
avgTimePerRequest: 3820.22
avgRequestsPerSecond: 0.16
5minRateReqsPerSecond: 0.64
15minRateReqsPerSecond: 0.74
medianRequestTime: 2537.40
75thPcRequestTime: 7728.15
95thPcRequestTime: 8963.87
99thPcRequestTime: 8963.87
999thPcRequestTime: 8963.87
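
The rounding asked for above (fewer digits, rounded up when the first dropped digit is 5 or higher) is half-up rounding. A minimal sketch with java.math.BigDecimal follows; the class name StatRounding and the helper round are hypothetical and not part of any patch on this issue.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: formats a statistic with a fixed number of decimal places, rounding half up. */
public class StatRounding {
    public static String round(double value, int places) {
        // BigDecimal.valueOf goes through Double.toString, so the shortest decimal
        // representation of the double is what gets rounded.
        return BigDecimal.valueOf(value)
                .setScale(places, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(round(53483.059463, 0));       // 53483
        System.out.println(round(3820.2185330714287, 2)); // 3820.22
        System.out.println(round(0.1592740558481214, 2)); // 0.16
        System.out.println(round(2537.401157, 2));        // 2537.40
        System.out.println(round(8963.867643, 2));        // 8963.87
    }
}
```

Returning a string keeps trailing zeros such as 2537.40, which a plain double would lose when printed.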


  was (Author: elyograg):
This is lightyears beyond what I could have hoped for when I first opened 
this issue.  I do have one more picky note: the extreme floating point 
precision of the output.  Here's what I am getting:

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483.059463
avgTimePerRequest: 3820.2185330714287
avgRequestsPerSecond: 0.1592740558481214
5minRateReqsPerSecond: 0.6393213424767414
15minRateReqsPerSecond: 0.7422605686168207
medianRequestTime: 2537.401157
75thPcRequestTime: 7728.151086
95thPcRequestTime: 8963.867643
99thPcRequestTime: 8963.867643
999thPcRequestTime: 8963.867643

Here's what I would like to see instead ... lower precision, and rounded up 
when the first eliminated digit is 5 or higher.

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483
avgTimePerRequest: 3820
avgRequestsPerSecond: 0.16
5minRateReqsPerSecond: 0.64
15minRateReqsPerSecond: 0.74
medianRequestTime: 2537.40
75thPcRequestTime: 7728.15
95thPcRequestTime: 8963.87
99thPcRequestTime: 8963.87
999thPcRequestTime: 8963.87

  
> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.


[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482470#comment-13482470
 ] 

Shawn Heisey commented on SOLR-1972:


This is lightyears beyond what I could have hoped for when I first opened this 
issue.  I do have one more picky note: the extreme floating point precision of 
the output.  Here's what I am getting:

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483.059463
avgTimePerRequest: 3820.2185330714287
avgRequestsPerSecond: 0.1592740558481214
5minRateReqsPerSecond: 0.6393213424767414
15minRateReqsPerSecond: 0.7422605686168207
medianRequestTime: 2537.401157
75thPcRequestTime: 7728.151086
95thPcRequestTime: 8963.867643
99thPcRequestTime: 8963.867643
999thPcRequestTime: 8963.867643

Here's what I would like to see instead ... lower precision, and rounded up 
when the first eliminated digit is 5 or higher.

handlerStart: 1351011529717
requests: 14
errors: 0
timeouts: 0
totalTime: 53483
avgTimePerRequest: 3820
avgRequestsPerSecond: 0.16
5minRateReqsPerSecond: 0.64
15minRateReqsPerSecond: 0.74
medianRequestTime: 2537.40
75thPcRequestTime: 7728.15
95thPcRequestTime: 8963.87
99thPcRequestTime: 8963.87
999thPcRequestTime: 8963.87


> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




[jira] [Updated] (SOLR-3979) slf4j bindings other than jdk -- cannot change log levels

2012-10-23 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-3979:


Attachment: log4j-solr-stuff.zip

Here are some implementations of the LogWatcher that work for log4j.

These were included in the main distribution, but since that makes for a weird 
compile/test classpath, they were removed.  I think there is a vague plan to 
switch to log4j as the default provider, but no activity there...


> slf4j bindings other than jdk -- cannot change log levels
> -
>
> Key: SOLR-3979
> URL: https://issues.apache.org/jira/browse/SOLR-3979
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1
>
> Attachments: log4j-solr-stuff.zip
>
>
> Once I finally got log4j logging working, I was slightly surprised by the 
> message related to SOLR-3426.  I did not really consider that to be a big 
> deal, because if I want to look at my log, I'll be on the commandline anyway.
> I was even more surprised to find that I cannot change any of the log levels 
> from the admin gui.  My default log level is WARN for performance reasons, 
> but every once in a while I like to bump the log level to INFO to 
> troubleshoot a specific problem, then turn it back down.  This is very easy 
> with jdk logging in either 3.x or 4.0.  I changed to log4j because it easily 
> allows me to put the date of a log message on the same line as the first line 
> of the actual log message, so when I grep for things, I have the timestamp in 
> the grep output.
> Currently the only way for me to change my log level is by updating 
> log4j.properties and restarting Solr.  If the capability to figure this out 
> on a class-by-class basis isn't there with log4j, I would at least like to be 
> able to set the root logging level.  Is that possible?
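
For comparison, the description notes that changing levels at runtime is easy with jdk logging. A minimal sketch of setting the root level with java.util.logging is shown below; it does not answer the log4j question raised in the issue, and the class name RootLogLevel and method setRootLevel are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: changes the JDK root logger level at runtime, without a restart. */
public class RootLogLevel {
    /** Sets the root logger level and returns the previous one, so it can be restored later. */
    public static Level setRootLevel(Level newLevel) {
        Logger root = Logger.getLogger(""); // the empty name denotes the root logger
        Level previous = root.getLevel();
        root.setLevel(newLevel);
        // Note: handlers carry their own levels; a handler set to WARNING
        // would still filter out INFO records after this call.
        return previous;
    }
}
```

Bumping to INFO for troubleshooting and turning it back down then becomes a pair of calls: `Level old = RootLogLevel.setRootLevel(Level.INFO);` followed later by `RootLogLevel.setRootLevel(old);`.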




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1950 - Failure!

2012-10-23 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1950/
Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 23898 lines...]
-documentation-lint:
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [exec] 
 [exec] 
file:///build/docs/classification/org/apache/lucene/classification/class-use/Classifier.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
 [exec] 
 [exec] 
file:///build/docs/classification/org/apache/lucene/classification/KNearestNeighborClassifier.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
 [exec] 
 [exec] 
file:///build/docs/classification/org/apache/lucene/classification/package-summary.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.mlt.MoreLikeThis.html
 [exec] 
 [exec] Broken javadocs links were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:235: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1908:
 exec returned: 1

Total time: 35 minutes 35 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_35 -client -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482459#comment-13482459
 ] 

Alan Woodward commented on SOLR-1972:
-

Hey Stefan, yeah I've been thinking about graphs, pictures are always good.  
For this sort of information, though, I think you generally want time-series 
representations, and for that you want proper monitoring software (something 
like graphite).  So I think the best thing we can do here is just expose the 
data, and let people plug in their own monitors.  Unless you have a better idea 
for how to represent it?

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




Re: Lucene build & ivy problems

2012-10-23 Thread Lance Norskog
Yes. That's what I thought. And that is how it should work. But that is not 
what happened. Ivy did not need to resolve anything, but it called out to a 
resolver which it could not see. After that, it called out over and over. I had 
to completely re-download everything. Since a full download worked, I think it 
boogered up its cache and then confused itself. 

Lance 

- Original Message -

| From: "Uwe Schindler" 
| To: dev@lucene.apache.org
| Sent: Monday, October 22, 2012 12:03:35 AM
| Subject: RE: Lucene build & ivy problems

| It only downloads on the first try, later builds never download
| anything unless dependencies have changed. And if you would be able
| to * not * download them, your build would not succeed.


[jira] [Resolved] (SOLR-3966) LangID not to log WARN

2012-10-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3966.


Resolution: Fixed
  Assignee: Hoss Man

Thanks Markus

Committed revision 1401340. - trunk
Committed revision 1401341. - 4x


> LangID not to log WARN
> --
>
> Key: SOLR-3966
> URL: https://issues.apache.org/jira/browse/SOLR-3966
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Markus Jelsma
>Assignee: Hoss Man
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3966-trunk-1.patch
>
>
> The LangID UpdateProcessor emits the warning below for documents that do not 
> contain an input field. The level should go to DEBUG or be removed. It is not 
> uncommon to see a log full of these messages just because not all documents 
> contain all the fields we're mapping. 
> {code}Oct 19, 2012 11:23:43 AM 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process
> WARNING: Document  does not contain input field . Skipping 
> this{code} 




[jira] [Commented] (LUCENE-4345) Create a Classification module

2012-10-23 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482448#comment-13482448
 ] 

Tommaso Teofili commented on LUCENE-4345:
-

I've just committed some slight improvements to testing and a basic MLT-based 
kNearestNeighbor classifier (with a bunch of TODOs). Comments are welcome :)

> Create a Classification module
> --
>
> Key: LUCENE-4345
> URL: https://issues.apache.org/jira/browse/LUCENE-4345
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>Priority: Minor
> Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, 
> SOLR-3700_2.patch, SOLR-3700.patch
>
>
> Lucene/Solr can host huge sets of documents containing lots of information in 
> fields so that these can be used as training examples (w/ features) in order 
> to very quickly create classifier algorithms to use on new documents and / 
> or to provide an additional service.
> So the idea is to create a contrib module (called 'classification') to host a 
> ClassificationComponent that will use already seen data (the indexed 
> documents / fields) to classify new documents / text fragments.
> The first version will contain a (simplistic) Lucene based Naive Bayes 
> classifier but more implementations should be added in the future.




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482431#comment-13482431
 ] 

Stefan Matheis (steffkes) commented on SOLR-1972:
-

Hey, I applied the patch to see where the output goes and how it looks in the 
admin GUI. Right now it's listed in the stats section, as a table with all 
the other given attributes. How helpful would it be to have some kind of graph 
here? Perhaps one like we already have on the dashboard for memory usage? 
Let me know what you think about it :)

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-23 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482423#comment-13482423
 ] 

Erick Erickson commented on SOLR-1293:
--

Otis:

I'm not sure I understand this. As I'm looking at this particular 
implementation, all the potential cores (configuration, data files, etc) are 
already on the particular node, it's just a matter of loading/unloading them. 
If you're thinking about SolrCloud/ZK, oh my aching head! I guess I'd propose 
that how this all works with ZK be split off to different tickets altogether; 
it's too much for me to deal with.

I'm explicitly thinking of this as having no cluster-awareness, it's all local 
to a single Solr node. Any meta-level coordination on which node a particular 
query _should_ be routed to is assumed to be out of scope, at least for this 
version.

That said, I can certainly see the value in what you're talking about; that's 
just not the use case I'm trying to address.

> Support for large no:of cores and faster loading/unloading of cores
> ---
>
> Key: SOLR-1293
> URL: https://issues.apache.org/jira/browse/SOLR-1293
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Reporter: Noble Paul
> Fix For: 4.1
>
> Attachments: SOLR-1293.patch
>
>
> Solr is currently not very suitable for a large number of homogeneous cores 
> where you require fast/frequent loading/unloading of cores. Usually a core 
> is required to be loaded just to fire a search query or to index one 
> document.
> The requirements of such a system are:
> * Very efficient loading of cores. Solr cannot afford to read, parse, and 
> create Schema and SolrConfig objects for each core each time the core has to 
> be loaded (SOLR-919, SOLR-920)
> * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
> * Automatic loading of cores. If a core is present but not loaded and a 
> request comes in for it, load it automatically before serving the request
> * As there are a large number of cores, all the cores cannot be kept loaded 
> at all times. There has to be an upper limit beyond which we need to unload a 
> few cores (probably the least recently used ones)
> * Automatic allotment of dataDir for cores. If the number of cores is too 
> high, all the cores' dataDirs cannot live in the same dir. There is an upper 
> limit on the number of dirs you can create in a Unix dir without affecting 
> performance
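
The "upper limit, unload the least recently used ones" requirement quoted above can be sketched with a plain access-ordered map. This is only an illustration under stated assumptions: the class name LruCoreCache is hypothetical, a String stands in for a loaded core, and nothing here is taken from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: holds at most maxLoaded entries and evicts the least
 * recently used one. The String value stands in for a loaded core.
 */
final class LruCoreCache extends LinkedHashMap<String, String> {
    private final int maxLoaded;

    LruCoreCache(int maxLoaded) {
        // accessOrder=true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxLoaded = maxLoaded;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // A real implementation would have to close the evicted core here.
        return size() > maxLoaded;
    }
}
```

With a limit of 2, loading core1 and core2, touching core1 and then loading core3 drops core2. The map is not thread-safe on its own, so a real container would need to synchronize around it.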

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482415#comment-13482415
 ] 

Alan Woodward commented on SOLR-1972:
-

I think we'd probably want to keep the percentile list limited to start with.  
People can always ask for improvements later if they need them.

I think this is ready to go in?

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
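
The request quoted above (keep a configurable number of recent data points and report median, 95th and 99th percentile) could look roughly like the following. This is a minimal sketch, not the code in any attached patch; the class name RecentQueryTimes and the nearest-rank percentile definition are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed-size ring buffer of the most recent request times. */
final class RecentQueryTimes {
    private final double[] window;
    private int count; // number of valid samples, at most window.length
    private int next;  // slot that the next sample overwrites

    RecentQueryTimes(int capacity) {
        window = new double[capacity];
    }

    void record(double millis) {
        window[next] = millis;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Nearest-rank percentile for p in (0, 100]; NaN when nothing was recorded. */
    double percentile(double p) {
        if (count == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.copyOf(window, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * count);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Because old samples are overwritten, the slow queries right after startup fall out of the window once enough newer requests have been recorded. Memory use is bounded by the capacity passed to the constructor.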




[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482413#comment-13482413
 ] 

Otis Gospodnetic commented on SOLR-1293:


General comment:
We may want the index/core re-opener to remain aware of previous locations 
(nodes) on which cores were opened for the purposes of reusing any possible 
OS-level caches that may still exist on those nodes for that core.  For 
example, if the cluster has nodes 1-100 and core Foo was on nodes 1, 2, and 3 
before it was closed, then maybe next time it needs to be opened it would 
ideally be opened on those 1, 2, and 3 nodes.  Of course, nodes 1, 2, or 3 may 
no longer be around or may be currently overloaded, in which case 
alternative nodes need to be picked.
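
To make the idea concrete, here is a rough sketch of the selection rule: prefer the nodes that hosted the core before, and fall back to other nodes when those are gone or overloaded. Everything in it is hypothetical (the class CorePlacement, the notion of a "healthy" set, node names as strings); Solr has no such API that I'm aware of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of cache-friendly node selection when reopening a core. */
final class CorePlacement {
    /**
     * @param previous nodes the core was open on before it was closed
     * @param healthy  nodes that are currently up and not overloaded
     * @param allNodes every node in the cluster, in fallback order
     * @param replicas how many nodes the core should be opened on
     */
    static List<String> chooseNodes(List<String> previous, Set<String> healthy,
                                    List<String> allNodes, int replicas) {
        List<String> chosen = new ArrayList<>();
        // First pass: reuse earlier locations so warm OS-level caches can be hit.
        for (String node : previous) {
            if (chosen.size() < replicas && healthy.contains(node) && !chosen.contains(node)) {
                chosen.add(node);
            }
        }
        // Second pass: fill any remaining slots with other healthy nodes.
        for (String node : allNodes) {
            if (chosen.size() >= replicas) {
                break;
            }
            if (healthy.contains(node) && !chosen.contains(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

If the core was on nodes 1, 2 and 3 and node 2 has since disappeared, this picks nodes 1 and 3 again and adds the first other healthy node.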


> Support for large no:of cores and faster loading/unloading of cores
> ---
>
> Key: SOLR-1293
> URL: https://issues.apache.org/jira/browse/SOLR-1293
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Reporter: Noble Paul
> Fix For: 4.1
>
> Attachments: SOLR-1293.patch
>
>
> Solr is currently not very suitable for a large number of homogeneous cores 
> where you require fast/frequent loading/unloading of cores. Usually a core 
> is required to be loaded just to fire a search query or to index one 
> document.
> The requirements of such a system are:
> * Very efficient loading of cores. Solr cannot afford to read, parse, and 
> create Schema and SolrConfig objects for each core each time the core has to 
> be loaded (SOLR-919, SOLR-920)
> * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
> * Automatic loading of cores. If a core is present but not loaded and a 
> request comes in for it, load it automatically before serving the request
> * As there are a large number of cores, all the cores cannot be kept loaded 
> at all times. There has to be an upper limit beyond which we need to unload a 
> few cores (probably the least recently used ones)
> * Automatic allotment of dataDir for cores. If the number of cores is too 
> high, all the cores' dataDirs cannot live in the same dir. There is an upper 
> limit on the number of dirs you can create in a Unix dir without affecting 
> performance




[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-1972:


Attachment: SOLR-1972_metrics.patch

New patch, adding 75th and 999th percentile, making the stats names less 
insanely long, and adding the metrics- threads to the test excluder thingy.  
All solr-core tests pass.

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972_metrics.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, 
> SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.




[jira] [Resolved] (SOLR-1531) Provide an option to remove the data directory on core unload

2012-10-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-1531.
--

   Resolution: Duplicate
Fix Version/s: (was: 4.1)
   3.3
   4.0

This was fixed by https://issues.apache.org/jira/browse/SOLR-2610. 

> Provide an option to remove the data directory on core unload
> -
>
> Key: SOLR-1531
> URL: https://issues.apache.org/jira/browse/SOLR-1531
> Project: Solr
>  Issue Type: Improvement
>Reporter: Shalin Shekhar Mangar
> Fix For: 4.0, 3.3
>
> Attachments: SOLR-1531.patch
>
>
> Currently the unload command keeps the core's data on disk even though the 
> details of the core is deleted from configuration. Solr should have an option 
> of cleaning the data directory on unload of a core.




[jira] [Updated] (SOLR-880) SolrCore should have a a lazy startup option

2012-10-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-880:


Description: 
* a core should have an option of loadOnStartup=true|false. default should be 
true

If there are too many cores (tens of thousands) where each of them may be used 
occasionally, we should not load all of them at once. At runtime I should 
be able to STOP and START a core on demand. A listing command would let me know 
which one is present and what is up and what is down. A stopped core must not 
use any resource


  was:
* We must have an option to STOP and START a core. 
* a core should have an option of loadOnStartup=true|false. default should be 
true
* A list command which can give the names of all cores and some meta 
information like status

If there are too many cores (tens of thousands) where each of them may be used 
occasionally, we should not load all of them at once. At runtime I should 
be able to STOP and START a core on demand. A listing command would let me know 
which one is present and what is up and what is down. A stopped core must not 
use any resource


Summary: SolrCore should have a a lazy startup option  (was: SolrCore 
should have a STOP option and a lazy startup option)

Removed STOP from the description; that functionality is handled by UNLOAD.

Broke out the "add a list command" to its own JIRA, see: 
https://issues.apache.org/jira/browse/SOLR-3980

> SolrCore should have a a lazy startup option
> 
>
> Key: SOLR-880
> URL: https://issues.apache.org/jira/browse/SOLR-880
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Noble Paul
>Assignee: Erick Erickson
> Attachments: SOLR-880.patch
>
>
> * a core should have an option of loadOnStartup=true|false. default should be 
> true
> If there are too many cores (tens of thousands) where each of them may be 
> used occasionally, we should not load all of them at once. At runtime I 
> should be able to STOP and START a core on demand. A listing command would 
> let me know which one is present and what is up and what is down. A stopped 
> core must not use any resource




[jira] [Created] (SOLR-3980) Incorporate lazily-loaded cores into core listings for clients

2012-10-23 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-3980:


 Summary: Incorporate lazily-loaded cores into core listings for 
clients
 Key: SOLR-3980
 URL: https://issues.apache.org/jira/browse/SOLR-3980
 Project: Solr
  Issue Type: Improvement
  Components: multicore, web gui
Affects Versions: 4.1
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.1


Part of SOLR-1293 (supporting lots of cores) will require we do something to 
allow clients (particularly the admin GUI) to get a full list of all possible 
cores, whether they've been loaded or not.




[jira] [Updated] (SOLR-880) SolrCore should have a STOP option and a lazy startup option

2012-10-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-880:


Attachment: SOLR-880.patch

First cut at the basics, putting up a preliminary version for comments. 

The general approach here is that, for any lazy cores, keep a separate list of 
SolrCoreDescriptors. When we get a core, if it's not already loaded, look in 
this separate list and create it at that point.
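
In very rough outline, the approach just described (a separate list of descriptors for lazy cores, consulted on lookup) might look like the sketch below. It is not the patch: LazyCoreRegistry and its methods are hypothetical names, a String instance directory stands in for a descriptor, and the real CoreContainer has reference counting and ZooKeeper concerns that are ignored here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: cores are created on first request from a descriptor. */
final class LazyCoreRegistry<C> {
    private final Map<String, C> loaded = new HashMap<>();
    // Separate list of not-yet-loaded cores: core name -> instance directory.
    private final Map<String, String> lazyDescriptors = new HashMap<>();
    private final Function<String, C> loader;

    LazyCoreRegistry(Function<String, C> loader) {
        this.loader = loader;
    }

    synchronized void registerLazy(String name, String instanceDir) {
        lazyDescriptors.put(name, instanceDir);
    }

    /** Returns the core, creating it on first use; null if the name is unknown. */
    synchronized C getCore(String name) {
        C core = loaded.get(name);
        if (core != null) {
            return core;
        }
        String instanceDir = lazyDescriptors.get(name);
        if (instanceDir == null) {
            return null;
        }
        core = loader.apply(instanceDir);
        loaded.put(name, core);
        return core;
    }

    synchronized boolean isLoaded(String name) {
        return loaded.containsKey(name);
    }
}
```

A lazy core is invisible to isLoaded() until the first getCore() call for it, which matches the admin console behaviour noted in this comment: the core only shows up after something has touched it.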

Note a bunch of things:

1> Many of the changes in CoreContainer are because I factored out creating 
cores from local files and ZooKeeper into two methods; I was having a hard time 
keeping the zk and non-zk bits separate.

2> There are some TODOs and EOEs that I have to take out.

3> I'm not all that happy with the tests, especially making new config 
directories just for this case. I was going a bit crazy yesterday trying to use 
the "usual" methods for writing tests but, as far as I can tell, there are 
built-in assumptions in things like TestHarness that don't work well with 
different cores. Any suggestions?

4> All tests pass. I fired up an example in our standard multicore system, and 
it's actually kinda cool. The admin console doesn't show the lazy core, but I 
can index to it with post.jar, then the admin screen shows it and I can query 
it. I can shut down and restart and the first query on the lazy core then 
returns results, even though it again isn't in the admin screen.

5> I haven't tested this all that thoroughly, this is preliminary for comments. 
This is part of SOLR-1293.

6> Next up is SOLR-1028, limiting the number of cores that can be loaded 
simultaneously. 

7> I'm quite sure I'll screw up the reference counting and/or there are nooks 
and crannies that I don't even know exist. Please let me know of any off the 
tops of your heads!

8> All tests pass. Can I ship it now? 



> SolrCore should have a STOP option and a lazy startup option
> 
>
> Key: SOLR-880
> URL: https://issues.apache.org/jira/browse/SOLR-880
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Noble Paul
>Assignee: Erick Erickson
> Attachments: SOLR-880.patch
>
>
> * We must have an option to STOP and START a core. 
> * a core should have an option of loadOnStartup=true|false. default should be 
> true
> * A list command which can give the names of all cores and some meta 
> information like status
> If there are too many cores (tens of thousands) where each of them may be 
> used occasionally, we should not load all of them at once. At runtime I 
> should be able to STOP and START a core on demand. A listing command would 
> let me know which one is present and what is up and what is down. A stopped 
> core must not use any resource




[jira] [Created] (SOLR-3979) slf4j bindings other than jdk -- cannot change log levels

2012-10-23 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-3979:
--

 Summary: slf4j bindings other than jdk -- cannot change log levels
 Key: SOLR-3979
 URL: https://issues.apache.org/jira/browse/SOLR-3979
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1


Once I finally got log4j logging working, I was slightly surprised by the 
message related to SOLR-3426.  I did not really consider that to be a big deal, 
because if I want to look at my log, I'll be on the command line anyway.

I was even more surprised to find that I cannot change any of the log levels 
from the admin gui.  My default log level is WARN for performance reasons, but 
every once in a while I like to bump the log level to INFO to troubleshoot a 
specific problem, then turn it back down.  This is very easy with jdk logging 
in either 3.x or 4.0.  I changed to log4j because it easily allows me to put 
the date of a log message on the same line as the first line of the actual log 
message, so when I grep for things, I have the timestamp in the grep output.

Currently the only way for me to change my log level is by updating 
log4j.properties and restarting Solr.  If the capability to figure this out on 
a class-by-class basis isn't there with log4j, I would at least like to be able 
to set the root logging level.  Is that possible?
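
For comparison, this is roughly what changing the root level at runtime looks like with the JDK binding, which is the case that already works. It is a sketch using only java.util.logging; the helper class RootLogLevel is a hypothetical name, and it says nothing about how (or whether) the admin GUI could do the same for log4j.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper around the java.util.logging root logger. */
final class RootLogLevel {
    /** Sets the root logger's level and returns the level it had before. */
    static Level set(Level newLevel) {
        // The JDK root logger is the one registered under the empty name.
        Logger root = Logger.getLogger("");
        Level previous = root.getLevel();
        root.setLevel(newLevel);
        return previous;
    }
}
```

Bumping to INFO for troubleshooting and then restoring the returned level afterwards would cover the workflow described above. Handlers carry their own levels as well, so a handler set stricter than the logger would still filter messages.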





[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482307#comment-13482307
 ] 

Robert Muir commented on LUCENE-4498:
-

Committed to trunk. Will give that flonkings builder some time...

> pulse docfreq=1 DOCS_ONLY for 4.1 codec
> ---
>
> Key: LUCENE-4498
> URL: https://issues.apache.org/jira/browse/LUCENE-4498
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Robert Muir
> Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
> LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch
>
>
> We have a pulsing codec, but currently this has some downsides:
> * it's very general, wrapping an arbitrary postingsformat and pulsing 
> everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
> * reuse is hairy: because it specializes its enums based on these cutoffs, 
> when walking through terms (e.g. when merging) there is a lot of 
> sophisticated stuff to avoid the worst cases where we clone indexinputs for 
> tons of terms.
> On the other hand, the way the 4.1 codec encodes "primary key" fields is 
> pretty silly: we write the docStartFP vlong in the term dictionary metadata, 
> which tells us where to seek in the .doc to read our one lonely vint.
> I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
> write the lone doc delta where we would write docStartFP. 
> We can avoid the hairy reuse problem too, by just supporting this in 
> refillDocs() in BlockDocsEnum instead of specializing.
> This would remove the additional seek for "primary key" fields without really 
> any of the downsides of pulsing today.
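
A toy model of the proposal quoted above, to show where the seek goes away. This is not the codec code: SingletonPulsingSketch, encodeSlot and readDocs are hypothetical names, the "slot" is a plain long rather than a vlong in the term dictionary, and the callback stands in for seeking into the .doc file.

```java
import java.util.function.LongFunction;

/** Hypothetical sketch of storing the lone doc in the docStartFP slot. */
final class SingletonPulsingSketch {
    /** Value to store in the term metadata slot that normally holds docStartFP. */
    static long encodeSlot(int docFreq, long docStartFP, int loneDocDelta) {
        // With exactly one document, the doc delta itself replaces the file pointer.
        return docFreq == 1 ? loneDocDelta : docStartFP;
    }

    /** Reads a term's docs, calling seekAndRead only when docFreq is above 1. */
    static int[] readDocs(int docFreq, long slot, LongFunction<int[]> seekAndRead) {
        if (docFreq == 1) {
            // No seek: the slot already holds the only document.
            return new int[] { (int) slot };
        }
        return seekAndRead.apply(slot);
    }
}
```

Since docFreq is already part of the term metadata, the reader can tell the two cases apart without an extra flag, at least in this simplified model.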




[jira] [Updated] (SOLR-3966) LangID not to log WARN

2012-10-23 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-3966:


Attachment: SOLR-3966-trunk-1.patch

> LangID not to log WARN
> --
>
> Key: SOLR-3966
> URL: https://issues.apache.org/jira/browse/SOLR-3966
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Markus Jelsma
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3966-trunk-1.patch
>
>
> The LangID UpdateProcessor emits the warning below for documents that do not 
> contain an input field. The level should go to DEBUG or be removed. It is not 
> uncommon to see a log full of these messages just because not all documents 
> contain all the fields we're mapping. 
> {code}Oct 19, 2012 11:23:43 AM 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process
> WARNING: Document  does not contain input field . Skipping 
> this{code} 




[jira] [Updated] (SOLR-3966) LangID not to log WARN

2012-10-23 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-3966:


Attachment: (was: SOLR-3966-trunk-1.patch)

> LangID not to log WARN
> --
>
> Key: SOLR-3966
> URL: https://issues.apache.org/jira/browse/SOLR-3966
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Markus Jelsma
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3966-trunk-1.patch
>
>
> The LangID UpdateProcessor emits the warning below for documents that do not 
> contain an input field. The level should go to DEBUG or be removed. It is not 
> uncommon to see a log full of these messages just because not all documents 
> contain all the fields we're mapping. 
> {code}Oct 19, 2012 11:23:43 AM 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process
> WARNING: Document  does not contain input field . Skipping 
> this{code} 




[jira] [Updated] (LUCENE-4494) Add phoenetic algorithm Match Rating approach to lucene

2012-10-23 Thread Colm Rice (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colm Rice updated LUCENE-4494:
--

Attachment: LUCENE-4494.patch

Match Rating Approach (MRA) phonetic algorithm & associated tests. 
I hope... 

:-)

> Add phoenetic algorithm Match Rating approach to lucene
> ---
>
> Key: LUCENE-4494
> URL: https://issues.apache.org/jira/browse/LUCENE-4494
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Colm Rice
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4494.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I want to add MatchRatingApproach algorithm to the Lucene project. 
> What I have at the moment is a class called 
> org.apache.lucene.analysis.phoenetic.MatchRatingApproach implementing 
> StringEncoder
> I have a pretty comprehensive test file located at: 
> org.apache.lucene.analysis.phonetic.MatchRatingApproachTests
> It doesn't exactly follow the existing patterns, so I'm going to need a bit 
> of advice here. Thanks! Feel free to email.
> FYI: It's my first contribution so be gentle :-) C# is my native language.
> Reference: http://en.wikipedia.org/wiki/Match_rating_approach
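
For readers unfamiliar with the algorithm, here is a small sketch of the encoding half of the Match Rating Approach as I understand it from the referenced page: drop vowels unless they start the name, collapse doubled consonants, and keep only the first three and last three letters if more than six remain. It is not the class in the attached patch; MatchRatingSketch is a hypothetical name, it does not implement StringEncoder, and the comparison step of MRA is left out.

```java
import java.util.Locale;

/** Hypothetical sketch of the Match Rating Approach encoding rules. */
final class MatchRatingSketch {
    static String encode(String name) {
        // Keep plain A-Z letters only; accents and punctuation are simply dropped here.
        String letters = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            char c = letters.charAt(i);
            // Rule 1: delete vowels unless the vowel begins the word.
            if (i > 0 && "AEIOU".indexOf(c) >= 0) {
                continue;
            }
            // Rule 2: drop a letter that repeats the previously kept one.
            if (code.length() > 0 && code.charAt(code.length() - 1) == c) {
                continue;
            }
            code.append(c);
        }
        // Rule 3: reduce to six letters by joining the first three and last three.
        if (code.length() > 6) {
            return code.substring(0, 3) + code.substring(code.length() - 3);
        }
        return code.toString();
    }
}
```

With this sketch, encode("Byrne") gives "BYRN", encode("Smith") gives "SMTH" and encode("Catherine") gives "CTHRN".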




[jira] [Updated] (SOLR-3966) LangID not to log WARN

2012-10-23 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-3966:


Attachment: SOLR-3966-trunk-1.patch

Patch removing the warning.

> LangID not to log WARN
> --
>
> Key: SOLR-3966
> URL: https://issues.apache.org/jira/browse/SOLR-3966
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Markus Jelsma
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-3966-trunk-1.patch
>
>
> The LangID UpdateProcessor emits the warning below for documents that do not 
> contain an input field. The level should go to DEBUG or be removed. It is not 
> uncommon to see a log full of these messages just because not all documents 
> contain all the fields we're mapping. 
> {code}Oct 19, 2012 11:23:43 AM 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor process
> WARNING: Document  does not contain input field . Skipping 
> this{code} 




Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 71 - Failure

2012-10-23 Thread Michael McCandless
On Tue, Oct 23, 2012 at 2:41 AM, Uwe Schindler  wrote:

> Ah, we got a hprof heap dump... :-)

YAY!

Mike McCandless

http://blog.mikemccandless.com




[jira] [Commented] (SOLR-3885) audit solr DocumentBuilder logic & tests

2012-10-23 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482240#comment-13482240
 ] 

Toke Eskildsen commented on SOLR-3885:
--

It seems that there are indeed problems with copyFields and that it is a 
blocker for using docBoosts with the standard practice catch-all copyField. I 
have updated SOLR-3875 with a description of the problem.

> audit solr DocumentBuilder logic & tests
> 
>
> Key: SOLR-3885
> URL: https://issues.apache.org/jira/browse/SOLR-3885
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.1
>
>
> Spun off of SOLR-3875: it would be good to audit DocumentBuilder carefully 
> and ensure that there are adequate tests for the various edge cases (ie: 
> docboosts, copyfield, multivalued fields, various combinations of, etc..) and 
> special types of fields (ie: polyfields).
> There also seems to be some dead code here that can likely be cleaned up.




[jira] [Commented] (SOLR-3875) Document boost does not work correctly when using multi-valued fields

2012-10-23 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482233#comment-13482233
 ] 

Toke Eskildsen commented on SOLR-3875:
--

Unfortunately, the bug is only partly solved. Thomas and I encountered strange 
scores again. While boosting of multi-value fields is handled correctly in Solr 
4.0.0, boosting for copyFields is not. A sample document:

{code}
   
  Insane score Example. Score = 10E9 
  Document boost broken for copyFields
  video ThomasEgense and Toke Eskildsen
  Test
  bug
  something else
  bug
  bug
  
{code}

The fields _name_, _manu_, _cat_, _features_, _keywords_ and _content_ get 
copied to _text_, and a search for thomasegense matches the text field with the 
query explanation

{code}
70384.67 = (MATCH) weight(text:thomasegense in 0) [DefaultSimilarity], result of:
  70384.67 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    0.30685282 = idf(docFreq=1, maxDocs=1)
    229376.0 = fieldNorm(doc=0)
{code}

If the last two fields _keywords_ and _content_ are removed from the sample 
document, the score is reduced by a factor of 100 (docBoost^2).

The current DocumentBuilder 
https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1389648&view=markup
 works roughly like this:

{code}
foreach (field) {
  boost = docBoost*fieldBoost
  foreach (value) {
    assignField(field, value, boost)
    foreach (copyField) {
      assignField(copyField, value, boost)
    }
    boost = 1f
  }
}
{code}

When all fields share the same copyField (_text_ in this example), the 
copyField will have the full boost assigned for each directly specified field 
which uses that copyField. That's 5 times with the provided sample, so the 
total boost for the field _text_ will be 10^5.

One solution would be to keep track of used fields (directly specified as well 
as copyFields) and only assign the full boost once per document. If the number 
of unique fields/document is low, a simple list would probably be the fastest 
and with low GC impact. For a higher number of unique fields, a Set might be 
better. An optimization would be to only create the tracking structure once a 
boost != 1.0f is encountered and only store the fields with boost != 1.0f, so 
that an update without boosts would not get a performance penalty.
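
The tracking idea could be sketched as follows. This is a simplified model, not a patch for DocumentBuilder: BoostOnceSketch and its methods are hypothetical, a single boost value stands in for docBoost*fieldBoost, and the result is just the list of boosts each destination field would receive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: apply the full boost only once per destination field. */
final class BoostOnceSketch {
    /**
     * @param doc         source field name to its values
     * @param copyTargets source field name to its copyField destinations
     * @param boost       stands in for docBoost*fieldBoost
     * @return for each destination field, the boost given to each instance
     */
    static Map<String, List<Float>> assign(Map<String, List<String>> doc,
                                           Map<String, List<String>> copyTargets,
                                           float boost) {
        Map<String, List<Float>> assigned = new LinkedHashMap<>();
        Set<String> alreadyBoosted = new HashSet<>();
        for (Map.Entry<String, List<String>> field : doc.entrySet()) {
            List<String> targets =
                copyTargets.getOrDefault(field.getKey(), Collections.<String>emptyList());
            // One pass per value: each value creates one instance per destination.
            for (int i = 0; i < field.getValue().size(); i++) {
                record(assigned, alreadyBoosted, field.getKey(), boost);
                for (String target : targets) {
                    record(assigned, alreadyBoosted, target, boost);
                }
            }
        }
        return assigned;
    }

    private static void record(Map<String, List<Float>> assigned, Set<String> alreadyBoosted,
                               String field, float boost) {
        // Set.add returns true only the first time a field is seen in this document.
        float effective = alreadyBoosted.add(field) ? boost : 1f;
        assigned.computeIfAbsent(field, k -> new ArrayList<>()).add(effective);
    }

    /** Lucene multiplies the boosts of all instances of a field; this mimics that. */
    static float product(List<Float> boosts) {
        float result = 1f;
        for (float b : boosts) {
            result *= b;
        }
        return result;
    }
}
```

With five source fields that all copy into text and a boost of 10, the product for text comes out as 10 rather than 10^5. The sketch always allocates the tracking set; the lazy allocation suggested above for updates without boosts is left out for brevity.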

> Document boost does not work correctly when using multi-valued fields
> -
>
> Key: SOLR-3875
> URL: https://issues.apache.org/jira/browse/SOLR-3875
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, update
>Affects Versions: 4.0-BETA
>Reporter: Toke Eskildsen
>Assignee: Hoss Man
>Priority: Critical
> Fix For: 4.0, 4.1, 5.0
>
> Attachments: SOLR-3875.patch
>
>
> In Solr 4 BETA & trunk, document boosts skew the ranking for documents with 
> multi-value fields tremendously. A document boost of 5 combined with 15 
> values in a multi-value field results in scores above 1,000,000,000, while a 
> boost of 0.5 results in scores below 0.001. The error is not present in Solr 
> 3.6.
> Thomas Egense and I have tracked it down to a change in Solr DocumentBuilder 
> committed 20110827 (@1162347) by Mike McCandless, as part of work done on 
> LUCENE-2308. The problem is that Lucene multiplies the boosts of multiple 
> instances of the same field when updating the index.
> The old DocumentBuilder, used in Solr 3.6, handled this by calculating the 
> boost for the field (docBoost*fieldBoost) and assigning it to the first 
> instance of the field, then setting the boost to 1.0f and assigning that to 
> subsequent instances of the field. This effectively assigned 
> docBoost*fieldBoost to the field, regardless of the number of instances.
> The updated DocumentBuilder (see 
> https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_0/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java?revision=1388778&view=markup),
>  used in Solr 4 BETA & trunk, also assigns docBoost*fieldBoost to the first 
> instance of the field. Then it sets fieldBoost = docBoost and continues to 
> assign docBoost*fieldBoost to subsequent instances. Using the example 
> mentioned above, the generated IndexableFields will get assigned boosts of 5, 
> 5*5, 5*5... 5*5. As Lucene multiplies all the values, 15 instances of the 
> same field will have a collective boost of 5*25^14.
> This can be demonstrated with the Solr tutorial example by indexing the 
> sample documents and adding the document 
> {code:xml}
> 
> 
>   Insane score Example. Score = 10E9 
>   Document boost broken for multivalued fields
>   Thomas Egense and Toke Eskildsen
>   Test
>   bug
>   insane_boos

[jira] [Commented] (SOLR-3978) CoreAdmin - configName definition

2012-10-23 Thread Gianluca Varisco (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482176#comment-13482176
 ] 

Gianluca Varisco commented on SOLR-3978:


JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr-3.6.1/staging/ -XX:+DisableExplicitGC 
-Xms8192M -Xmx8192M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled"

> CoreAdmin - configName definition
> -
>
> Key: SOLR-3978
> URL: https://issues.apache.org/jira/browse/SOLR-3978
> Project: Solr
>  Issue Type: Bug
>  Components: multicore
> Environment: * Solr 3.6.1
> * Jetty 8.1.5.v20120716
>Reporter: Gianluca Varisco
>Priority: Minor
>
> Hello,
> I'm trying to define a bunch of cores as follows:
>  dataDir="/opt/solr-3.6.1/staging/venus/data/" 
> configName="/shop/www/htdocs/venus/shop.staging/solr/app/conf/solrconfig.xml" 
> schemaName="/shop/www/htdocs/venus/shop.staging/solr/app/conf/schema.xml" />
> Is it possible to point configName and schemaName to a different path? It 
> works if conf/solrconfig.xml is added in /opt/solr-3.6.1/staging/venus/
> Am I missing something? Trace output is attached.
> SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
> classpath or '/opt/solr-3.6.1/staging/venus/conf/', cwd=/opt/jetty/staging
>   at 
> org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:273)
>   at 
> org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:239)
>   at org.apache.solr.core.Config.<init>(Config.java:141)
>   at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:138)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:452)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
>   at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
>   at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
>   at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
>   at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:701)
>   at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:475)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
>   at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
>   at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
>   at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
>   at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
>   at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
>   at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
>   at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
>   at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
>   at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
>   at 
> org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:75)
>   at 
> org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
>   at org.eclipse.jetty.server.Server.doStart(Server.java:272)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>   at 
> org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1260)
>   

[jira] [Created] (SOLR-3978) CoreAdmin - configName definition

2012-10-23 Thread Gianluca Varisco (JIRA)
Gianluca Varisco created SOLR-3978:
--

 Summary: CoreAdmin - configName definition
 Key: SOLR-3978
 URL: https://issues.apache.org/jira/browse/SOLR-3978
 Project: Solr
  Issue Type: Bug
  Components: multicore
 Environment: * Solr 3.6.1
* Jetty 8.1.5.v20120716
Reporter: Gianluca Varisco
Priority: Minor


Hello,

I'm trying to define a bunch of cores as follows:



Is it possible to point configName and schemaName to a different path? It works 
if conf/solrconfig.xml is added in /opt/solr-3.6.1/staging/venus/

Am I missing something? Trace output is attached.

SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
classpath or '/opt/solr-3.6.1/staging/venus/conf/', cwd=/opt/jetty/staging
at 
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:273)
at 
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:239)
at org.apache.solr.core.Config.<init>(Config.java:141)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:138)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:452)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:701)
at 
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:475)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
at 
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
at 
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
at 
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
at 
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:75)
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
at org.eclipse.jetty.server.Server.doStart(Server.java:272)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at 
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1260)
at java.security.AccessController.doPrivileged(Native Method)
at 
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1183)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:462)
at org.eclipse.jetty.start.Main.start(Main.java:610)
at org.eclipse.jetty.start.Main.main(Main.java:86)

--
This message is automatically gen

[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-23 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482171#comment-13482171
 ] 

Shawn Heisey commented on SOLR-1972:


After poking around a lot looking for a way to bump the reservoir size, I 
finally came across the paper on reservoir sampling by Vitter.  After even more 
poking around, I think I get it now.  Their small reservoir apparently really 
does give statistically relevant results over millions or billions of total 
samples.  If it didn't give them numbers they could use, they would have 
already made it larger.

Do you think it's worthwhile to give people the ability to customize the 
percentile list -- turn some of the standard percentiles off, and/or add custom 
ones?  As soon as we conclude that including the full predefined set won't 
present a performance problem because it only gets calculated when the admin 
GUI is accessed, there'll be someone who has created hundreds of request 
handlers and polls the statistics for all of them once a minute.  I can also 
see someone wanting to see the 12th and 87th percentiles for some reason 
neither of us can fathom, but makes perfect sense to them.
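
For anyone else trying to picture it, here is a small Java sketch of the basic 
reservoir technique (Vitter's Algorithm R) with an arbitrary percentile lookup 
on top. It is only an illustration: the class name, the nearest-rank percentile 
rule and the capacity used in the example are my assumptions, not what any 
patch or library on this issue actually does.

```java
import java.util.Arrays;
import java.util.Random;

final class ReservoirPercentiles {
    private final long[] reservoir;
    private final Random random;
    private long seen = 0;

    ReservoirPercentiles(int capacity, long seed) {
        this.reservoir = new long[capacity];
        this.random = new Random(seed);
    }

    /** Records one sample, e.g. a request time in milliseconds. */
    void update(long value) {
        seen++;
        if (seen <= reservoir.length) {
            reservoir[(int) (seen - 1)] = value; // still filling the reservoir
        } else {
            // Pick a slot uniformly from [0, seen); the new sample is kept
            // only if that slot falls inside the reservoir.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < reservoir.length) {
                reservoir[(int) slot] = value;
            }
        }
    }

    /** Number of samples currently held; never exceeds the capacity. */
    int size() {
        return (int) Math.min(seen, reservoir.length);
    }

    /** Nearest-rank percentile over the held samples, for 0 < p <= 100. */
    double percentile(double p) {
        int n = size();
        if (n == 0) {
            return Double.NaN;
        }
        long[] sorted = Arrays.copyOf(reservoir, n);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * n / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        ReservoirPercentiles stats = new ReservoirPercentiles(1024, 42L);
        for (long ms = 1; ms <= 100; ms++) {
            stats.update(ms);
        }
        // Any percentile can be requested, including unusual ones.
        for (double p : new double[] {12, 50, 87}) {
            System.out.println(p + " -> " + stats.percentile(p));
        }
    }
}
```

Memory stays fixed at the reservoir capacity no matter how many samples are 
recorded, and the percentiles are only sorted out when somebody asks for them.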


> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
> Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
