[jira] [Updated] (SOLR-5247) Custom per core properties not persisted on API CREATE with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris F updated SOLR-5247:
--
Description:
This part has been solved. See comments.

When using old-style solr.xml I can define custom properties per core like so:
{code:xml}
<cores adminPath="/admin/cores" defaultCoreName="core1">
  <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
    <property name="foo" value="bar" />
  </core>
</cores>
{code}
I can then use the property foo in schema.xml or solrconfig.xml like this:
{code:xml}
<str name="foo">${foo}</str>
{code}
After switching to the new-style solr.xml with separate core.properties files per core, this does not work anymore. I guess the corresponding core.properties file should look like this:
{code}
config=solrconfig.xml
name=core1
schema=schema.xml
foo=bar
{code}
(I also tried property.foo=bar.) With that, I get the following error when reloading the core:
{code}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No system property or default value specified for foo value:${foo}
{code}
I can successfully reload the core if I use ${foo:undefined}, but the value of foo will always be "undefined" then. When trying to create a new core with a URL like this:
{code}
http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
{code}
the property foo will not appear in the core.properties file. Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?
was: (the same description, without the "This part has been solved. See comments" preamble)

Custom per core properties not persisted on API CREATE with new-style solr.xml
--
Key: SOLR-5247
URL: https://issues.apache.org/jira/browse/SOLR-5247
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Critical
Labels: 4.4, core.properties, discovery, new-style, property, solr.xml
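The ${name} / ${name:default} substitution behavior described in the report can be sketched in plain Java. This is a hypothetical illustration of the substitution semantics only, not Solr's actual resolver; the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Solr-style ${name} / ${name:default} placeholder
// resolution against a set of core properties. NOT Solr's implementation.
public class PropertySubstitution {
    // ${name} or ${name:default}
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String def = m.group(2);                // null if no ":default" part
            String value = props.getOrDefault(name, def);
            if (value == null) {
                // Mirrors the "No system property or default value" error above
                throw new RuntimeException(
                    "No system property or default value specified for " + name);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("foo", "bar");
        // Property present: substituted normally
        System.out.println(resolve("<str name=\"foo\">${foo}</str>", props));
        // Property absent but a default is given: falls back to the default
        System.out.println(resolve("${missing:undefined}", props));
    }
}
```

With a default supplied, a missing property silently degrades to the default, which is why ${foo:undefined} always yields "undefined" when foo is never actually defined anywhere.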
[jira] [Updated] (SOLR-5247) Custom per core properties not persisted on API CREATE with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris F updated SOLR-5247:
--
Summary: Custom per core properties not persisted on API CREATE with new-style solr.xml (was: Support for custom per core properties missing with new-style solr.xml)

Custom per core properties not persisted on API CREATE with new-style solr.xml
--
Key: SOLR-5247
URL: https://issues.apache.org/jira/browse/SOLR-5247
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Critical
Labels: 4.4, core.properties, discovery, new-style, property, solr.xml

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris F updated SOLR-5247:
--
Summary: Support for custom per core properties missing with new-style solr.xml (was: Custom per core properties not persisted on API CREATE with new-style solr.xml)
[jira] [Updated] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris F updated SOLR-5247:
--
Priority: Trivial (was: Critical)
Description: (as before, except that the final paragraph now reads:) the property foo will not appear in core.properties. However, I can use it in schema.xml, but only until restarting the servlet container. After that, the property is lost. Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?
Re: [VOTE] Release Lucene/Solr 4.5.0 RC1
Hi Chris, On Fri, Sep 20, 2013 at 2:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote: I *think* this means that we just need to backport r1522884 to the 4_5 branch, but i don't think we need a re-spin. Thanks for reporting this error. I agree this doesn't need a respin, especially given that the fix is to ignore the javadoc bug on the checker side. I'll backport the commit to lucene_solr_4_5. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 4.5.0 RC1
On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote: I'll backport the commit to lucene_solr_4_5. Oh, I see you have already done that, thanks! -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5109) EliasFano value index
[ https://issues.apache.org/jira/browse/LUCENE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772858#comment-13772858 ] Adrien Grand commented on LUCENE-5109:
--
Thanks for the update, this looks interesting!

bq. use this to implement EliasFanoValueIndexedDocIdSet, test, maybe benchmark

This can be useful to test the overhead of the index compared to EliasFanoDocIdSet, but given that we are probably going to want an index almost every time, maybe we could just make EliasFanoDocIdSet use an index by default, potentially giving the ability to disable indexing by passing indexInterval=Integer.MAX_VALUE (like the other sets).

bq. add broadword bit selection

I'm looking forward to it!

EliasFano value index
-
Key: LUCENE-5109
URL: https://issues.apache.org/jira/browse/LUCENE-5109
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Paul Elschot
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5109.patch, LUCENE-5109.patch

Index upper bits of Elias-Fano sequence.
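For readers unfamiliar with the structure under discussion, here is a toy sketch of Elias-Fano encoding of a monotone sequence: each value's low bits are stored verbatim, and the high parts are unary-coded gaps. That unary stream is the "upper bits" sequence that LUCENE-5109 proposes to index for faster skipping. This is an illustrative sketch only, not Lucene's EliasFanoDocIdSet; all names are invented, and the linear-scan decoder is exactly what an index over the upper bits would avoid.

```java
import java.util.ArrayList;
import java.util.List;

// Toy Elias-Fano encoding of a monotone (non-decreasing) sequence.
// NOT Lucene's EliasFanoDocIdSet; for illustration only.
public class EliasFanoSketch {
    final int numLowBits;                              // L = floor(log2(upperBound / n))
    final List<Long> lowBits = new ArrayList<>();      // L explicit low bits per value
    final List<Boolean> highBits = new ArrayList<>();  // unary-coded high parts

    EliasFanoSketch(long[] values, long upperBound) {
        long ratio = Math.max(1, upperBound / values.length);
        numLowBits = 63 - Long.numberOfLeadingZeros(ratio);
        long prevHigh = 0;
        for (long v : values) {
            lowBits.add(v & ((1L << numLowBits) - 1));
            long high = v >>> numLowBits;
            // unary gap: (high - prevHigh) zeros, then a one for this value
            for (long i = prevHigh; i < high; i++) highBits.add(false);
            highBits.add(true);
            prevHigh = high;
        }
    }

    // Decode value #index by scanning the unary upper bits: the high part is
    // the number of zeros seen before this value's one bit.
    long get(int index) {
        int ones = 0;
        long zeros = 0;
        for (boolean b : highBits) {
            if (b) {
                if (ones == index) return (zeros << numLowBits) | lowBits.get(index);
                ones++;
            } else {
                zeros++;
            }
        }
        throw new IndexOutOfBoundsException("index " + index);
    }

    public static void main(String[] args) {
        EliasFanoSketch ef = new EliasFanoSketch(new long[] {5, 8, 8, 15, 32}, 36);
        System.out.println(ef.get(0)); // prints 5
        System.out.println(ef.get(3)); // prints 15
        System.out.println(ef.get(4)); // prints 32
    }
}
```

The get() method above scans the whole unary stream from the start; an index over the upper bits (e.g. recording the position of every indexInterval-th one bit) turns that scan into a short jump plus a bounded scan, which is the point of the issue.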
[jira] [Commented] (LUCENE-5123) invert the codec postings API
[ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772862#comment-13772862 ] Han Jiang commented on LUCENE-5123:
--
Nice change! Although PushFieldsConsumer is still using the old API, I like the migration of the flush() logic from FreqProxTermsWriterPerField to PushFieldsConsumer; the calling chain is much clearer at the codec level now. :) Also, I'm quite curious whether StoredFields and TermVectors will get rid of merge() later.

invert the codec postings API
-
Key: LUCENE-5123
URL: https://issues.apache.org/jira/browse/LUCENE-5123
Project: Lucene - Core
Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
Fix For: 5.0
Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch

Currently FieldsConsumer/PostingsConsumer/etc is a push-oriented API, e.g. FreqProxTermsWriter streams the postings at flush, and the default merge() takes the incoming codec API and filters out deleted docs and pushes via the same API (but that can be overridden). It could be cleaner if we allowed for a pull model instead (like DocValues). For example, maybe FreqProxTermsWriter could expose a Terms view of itself and just pass this to the codec consumer. This would give the codec more flexibility, e.g. to do multiple passes if it wanted to do things like encode high-frequency terms more efficiently with a bitset-like encoding, or other things... A codec can try to do things like this to some extent today, but it's very difficult (look at the buffering in Pulsing). We made this change with DV and it made a lot of interesting optimizations easy to implement...
Re: Can we use TREC data set in open source?
I read here http://lemurproject.org/clueweb09/ that there is a hosted version of ClueWeb09 (the latest is ClueWeb12, for which I don't find a hosted version), and to get access to it, someone from the ASF will need to sign an Organizational Agreement with them, and each individual in the project will need to sign an Individual Agreement (retained by the ASF). Perhaps this can be available only to committers.

This is nice! I'll try to ask the ASF about this.

To this day, I think the only way it will happen is for the community to build a completely open system, perhaps based off of Common Crawl or our own crawl, and host it ourselves and develop judgments, etc.

Yeah, this is what we need in ORP.

Most people like the idea, but are not sure how to distribute it in an open way (ClueWeb comes as 4 1TB disks right now) and I am also not sure how they would handle any copyright/redaction claims against it. There is, of course, little incentive for those involved to solve these, either, as most people who are interested sign the form and pay the $600 for the disks.

Sigh, yes, it is hard to make a data set totally public. Actually, one of my purposes in this question is to see whether it is acceptable in our community (i.e. lucene/solr only) to obtain a data set not open to all people. When expanded to a larger scope, the license issue is somewhat hairy... And since Shai has found a possible 'free' data set, I think it is possible for the ASF to obtain an Organizational Agreement for this. I'll try to contact the ASF and CMU about how they define a 'person with the authority' in OSS.

On Tue, Sep 17, 2013 at 6:11 AM, Grant Ingersoll gsing...@apache.org wrote: Inline below

On Sep 9, 2013, at 10:53 PM, Han Jiang jiangha...@gmail.com wrote: Back in 2007 Grant contacted NIST about making the TREC collection available to our community: http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser I think a try for this is really important to our project and people who use Lucene.
All these years the speed performance is mainly tuned on Wikipedia; however, it's not very 'standard':
* it doesn't represent how real-world search works;
* it cannot be used to evaluate the relevance of our scoring models;
* researchers tend to do experiments on other data sets, and usually it is hard to know whether Lucene is performing at its best.

And personally I agree with this line: "I think it would encourage Lucene users/developers to think about relevance as much as we think about speed." There's been much work to make Lucene's scoring models pluggable in 4.0, and it'll be great if we can explore more of it. It is very appealing to see a high-performance library work along with state-of-the-art ranking methods.

And about the TREC data set, the problems we met are:
1. NIST/TREC does not own the original collections, therefore it might be necessary to have direct contact with the organizations that really do, such as: http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html http://lemurproject.org/clueweb12/
2. Currently, there is no open-source license for any of the data sets, so it won't be as 'open' as Wikipedia is. As proposed by Grant, a possibility is to make the data set accessible only to committers instead of all users. It is not very open-source then, but the TREC data sets are public and usually available to researchers, so people can still reproduce performance tests.

I'm quite curious, has anyone explored getting an open-source license for one of those data sets? And is our community still interested in this issue after all these years?

It continues to be of interest to me. I've had various conversations throughout the years on it. Most people like the idea, but are not sure how to distribute it in an open way (ClueWeb comes as 4 1TB disks right now) and I am also not sure how they would handle any copyright/redaction claims against it.
There is, of course, little incentive for those involved to solve these, either, as most people who are interested sign the form and pay the $600 for the disks. I've had a number of conversations about how I view this to be a significant barrier to open research, esp. in under-served countries and to open source. People sympathize with me, but then move on. To this day, I think the only way it will happen is for the community to build a completely open system, perhaps based off of Common Crawl or our own crawl and host it ourselves and develop judgments, etc. We tried to get this off the ground w/ the Open Relevance Project, but there was never a sustainable effort, and thus I have little hope at this point for it (but I would love to be proven wrong) For it to succeed, I think we would need the backing of a University with students interested in curating such a collection, the judgments, etc. I think we could figure out how to distribute the data either
[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5215:
--
Attachment: LUCENE-5215.patch

Fixed BasePostingsFormatTestCase to initialize Lucene46Codec (not 45). It was the last piece of code which still used the now-deprecated Lucene45. All Lucene and Solr tests pass, so I think this is ready. BTW, I noticed that TestBackCompat suppresses Lucene41 and Lucene42. I ran it with -Dtests.codec=Lucene45 and it passed, so I'm not sure whether I should add the now-deprecated Lucene45Codec to the suppress list?

Add support for FieldInfos generation
-
Key: LUCENE-5215
URL: https://issues.apache.org/jira/browse/LUCENE-5215
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch

In LUCENE-5189 we've identified a few reasons to do that:
# If you want to update docs' values of field 'foo', where 'foo' exists in the index but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS.
# When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit, however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too.

The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch.

Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which have the same limitation -- if a Codec modifies them, they are silently ignored, since we don't gen the .si files.
I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
[jira] [Commented] (LUCENE-5123) invert the codec postings API
[ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772939#comment-13772939 ] Michael McCandless commented on LUCENE-5123:
--
Thanks Han, I do like the new API better ... I don't think we need to get rid of merge() for stored fields / term vectors, at least not yet ...
Difference between CustomScoreProvider, FunctionQuery and Expression
Hi,

In an attempt to understand how to do document-level boosting (following this thread http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%3c51221bbf.8040...@fastmail.fm%3E), I experimented with the 3 easiest ways that currently exist in Lucene (that I'm aware of, maybe there are more): two of them use CustomScoreQuery and the third uses the new Expression module. I created a simple index with two documents with the field "f" and value "test doc" (for both). I also added the field "boost" with values 1L (doc-0) and 2L (doc-1). I then searched using each method and got different results w.r.t. computed scores:

*CustomScoreProvider*
As far as I understand, you should override CustomScoreQuery.getCustomScoreProvider if you want to apply a different function than score*boost (e.g. score^boost) to the documents. Nevertheless, nothing prevents you from giving a CustomScoreProvider which reads from the 'boost' field and does the multiplication (since it receives the AtomicReaderContext). I wrote one and the result scores are:

search CustomScoreProvider
doc=1, score=0.74316853
doc=0, score=0.37158427

*FunctionQuery*
I wasn't able to find a ValueSource which reads from an NDV field, so I wrote a NumericDocValuesFieldSource which returns a LongValues that reads from the NumericDocValues (if there isn't indeed one, I can open an issue to add it).
The result scores are:

search NumericDocValuesFieldSource
doc=1, score=0.32644913
doc=0, score=0.16322456

*Expression*
I tried the new module, following TestDemoExpression, and compiled the expression using this code:

Expression expr = JavascriptCompiler.compile("_score * boost");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("boost", SortField.Type.LONG));

The result scores are:

search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791

As you can see, both the CustomScoreProvider and Expression methods return the same scores for the docs, while the FunctionQuery method returns different scores. The reason is that when using FunctionQuery, the scores of the ValueSources are multiplied by queryWeight, which seems correct to me. Expression is more about sorting than scoring as far as I understand (for instance, the result FieldDoc.score is NaN), so I'm ok with it not factoring in queryWeight (maybe we could implement such an expression?). What I like about it is that I didn't have to implement anything (e.g. NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you care about is the order of results, it gets the job done.

So between FunctionQuery and CustomScoreProvider, which is the correct way to boost a document by an NDV field? I think FunctionQuery?

Separately, I think we can improve the CSQ.getCSProvider jdocs. They say "The default implementation returns a default implementation as specified in the docs of CustomScoreProvider", but the jdocs of CSP don't mention that it multiplies.

Shai
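The score differences reported above come down to a constant per-query factor. A toy sketch (made-up numbers and names, not Lucene code, with the queryWeight value purely hypothetical) of the two combination modes: the CustomScoreProvider path multiplies the raw score by the boost, while the FunctionQuery path additionally scales the boost value source by queryWeight.

```java
// Toy illustration of the two score-combination modes discussed above.
// NOT Lucene code; all numbers are made up for the example.
public class BoostCombination {
    // CustomScoreProvider-style: raw multiplication, no normalization.
    static double customScore(double baseScore, double boost) {
        return baseScore * boost;
    }

    // FunctionQuery-style: the boost value source is additionally
    // scaled by a per-query normalization factor (queryWeight).
    static double functionQueryScore(double baseScore, double boost, double queryWeight) {
        return baseScore * boost * queryWeight;
    }

    public static void main(String[] args) {
        double base = 0.37158427;    // hypothetical per-doc base score
        double queryWeight = 0.44;   // hypothetical per-query constant
        System.out.println(customScore(base, 2.0));
        System.out.println(functionQueryScore(base, 2.0, queryWeight));
    }
}
```

Because queryWeight is constant for a given query, the two modes differ by the same factor on every document, so the resulting document ordering is identical; only the absolute score values change.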
Re: [VOTE] Release Lucene/Solr 4.5.0 RC1
+1, smoke tester is happy for me (Windows 7, 64-bit). Shai On Fri, Sep 20, 2013 at 10:26 AM, Adrien Grand jpou...@gmail.com wrote: On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote: I'll backport the commit to lucene_solr_4_5. Oh, I see you have already done that, thanks! -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
I think that I actually abused CSProvider, and it's not supposed to be used that way. It really is supposed to be used when you want to apply different combination on the two scores. While nothing prevents you from reading the scores from a different source, it's better to implement that capability through a custom ValueSource. So maybe we should put such a note on CSProvider jdocs... Shai On Fri, Sep 20, 2013 at 3:01 PM, Shai Erera ser...@gmail.com wrote: Hi In an attempt to understand how to do document-level boosting (following this thread http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%3c51221bbf.8040...@fastmail.fm%3E), I experimented with the 3 easiest ways that currently exist in Lucene (that I'm aware of, maybe there are more): two of them use CustomScoreQuery and the third uses the new Expression module. I created a simple index with two documents with the field f and value test doc (for both). I also added the field boost with values 1L (doc-0) and 2L (doc-1). I then searched using each method and got different results w.r.t. computed scores: *CustomScoreProvider * As far as I understand, you should override CustomScoreQuery.getCustomScoreProvider if you want to apply a different function than score*boost (e.g score^boost) to the documents. Nevertheless, nothing prevents you from giving a CustomScoreProvider which reads from the 'boost' field and does the multiplication (since it receives the AtomicReaderContext). I wrote one and the result scores are: search CustomScoreProvider doc=1, score=0.74316853 doc=0, score=0.37158427 *FunctionQuery * I wasn't able to find a ValueSource which reads from an NDV field, so I wrote a NumericDocValuesFieldSource which returns a LongValues that reads from the NumericDocValues (if there isn't indeed one, I can open an issue to add it). 
The result scores are: search NumericDocValuesFieldSource doc=1, score=0.32644913 doc=0, score=0.16322456 *Expression* I tried the new module, following TestDemoExpression, and compiled the expression using this code: Expression expr = JavascriptCompiler.compile("_score * boost"); SimpleBindings bindings = new SimpleBindings(); bindings.add(new SortField("_score", SortField.Type.SCORE)); bindings.add(new SortField("boost", SortField.Type.LONG)); The result scores are: search Expression doc=1, score=NaN, field=0.7431685328483582 doc=0, score=NaN, field=0.3715842664241791 As you can see, both the CustomScoreProvider and Expression methods return the same scores for the docs, while the FunctionQuery method returns different scores. The reason is that when using FunctionQuery, the scores of the ValueSources are multiplied by queryWeight, which seems correct to me. Expression is more about sorting than scoring as far as I understand (for instance, the resulting FieldDoc.score is NaN), so I'm ok with it not factoring in queryWeight (maybe we could implement such an expression?). What I like about it is that I didn't have to implement anything (e.g. NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you care about is the order of results, it gets the job done. So between FunctionQuery and CustomScoreProvider, which is the correct way to boost a document by an NDV field? I think FunctionQuery? Separately, I think we can improve the CSQ.getCSProvider jdocs. They say "The default implementation returns a default implementation as specified in the docs of CustomScoreProvider", but the jdocs of CSP don't mention that it multiplies. Shai
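The queryWeight effect discussed in the thread can be shown with plain arithmetic. A self-contained sketch (the queryWeight value below is invented for illustration and is not taken from Lucene; only the base score mirrors the numbers in the thread):

```java
// Illustrative arithmetic only: why FunctionQuery's queryWeight normalization
// changes absolute scores but not ranking.
public class ScoreNormalizationDemo {
    // score * boost, optionally scaled by a query-level weight
    public static double combine(double score, double boost, double queryWeight) {
        return score * boost * queryWeight;
    }

    public static void main(String[] args) {
        double base = 0.37158427;      // per-doc raw score from the thread
        double weight = 0.43929818;    // hypothetical queryWeight, made up here

        double custom0 = combine(base, 1.0, 1.0);    // CustomScoreProvider style
        double custom1 = combine(base, 2.0, 1.0);
        double func0 = combine(base, 1.0, weight);   // FunctionQuery style
        double func1 = combine(base, 2.0, weight);

        // Absolute values differ, but doc1 scores twice doc0 either way,
        // so the result ordering is identical.
        System.out.printf("custom: %f %f, function: %f %f%n",
                custom0, custom1, func0, func1);
    }
}
```

This is why both approaches return the docs in the same order even though the FunctionQuery scores are smaller in absolute terms.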
Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?
I notice that Noble updated the Collections API page with the information that was needed - thank you. Based on that, I updated this page: https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud Yonik or Noble, if one of you would look over the section on Document Routing, I would appreciate it. I adapted the content that was there to fit these new options, but am not entirely sure I have it right. Thanks, Cassandra On Thu, Sep 19, 2013 at 12:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Yonik / Noble / Shalin in particular: we need clarification here on these changes for 4.5... https://issues.apache.org/jira/browse/SOLR-4221?focusedCommentId=13769675&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769675 Cassandra and I were talking on IRC this morning about the state of the ref guide -- our opinion is that in terms of changes for 4.5, things look pretty good and we could probably go ahead and do an RC in parallel with the code RC1 that Adrien is currently re-spinning (which might even allow us to release/announce the ref guide in the same email as the code release itself). But the one blocker is this change discussed at the end of SOLR-4221 regarding the routeField param. Noble previously updated the ref guide documentation to include routerField... https://cwiki.apache.org/confluence/display/solr/Collections+API ...but it's not currently clear to Cassandra or myself if that documentation is still accurate -- should the references to routeField be replaced by router.field? Does the documentation need to generally be improved to refer to supporting a generic set of router.* params that are user defined? Throw us a bone here guys. Docs on new features are probably the most important part of the user guide updates, and inaccurate docs on new features are worse than no docs at all.
-Hoss
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 386 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/386/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testStressMultiThreading Error Message: Captured an uncaught exception in thread: Thread[id=3130, name=UpdateThread-1, state=RUNNABLE, group=TGRP-TestNumericDocValuesUpdates] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=3130, name=UpdateThread-1, state=RUNNABLE, group=TGRP-TestNumericDocValuesUpdates] Caused by: java.lang.OutOfMemoryError: Java heap space at __randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0) at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415) at java.lang.StringBuilder.append(StringBuilder.java:132) at java.lang.StringBuilder.append(StringBuilder.java:128) at java.util.AbstractCollection.toString(AbstractCollection.java:450) at java.lang.String.valueOf(String.java:2854) at java.lang.StringBuilder.append(StringBuilder.java:128) at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4239) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2834) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897) at org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957) Build Log: [...truncated 1672 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2> 9 20, 0025 3:45:48 ??
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[UpdateThread-1,5,TGRP-TestNumericDocValuesUpdates] [junit4] 2> java.lang.OutOfMemoryError: Java heap space [junit4] 2> at __randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0) [junit4] 2> at java.util.Arrays.copyOf(Arrays.java:2367) [junit4] 2> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) [junit4] 2> at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) [junit4] 2> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415) [junit4] 2> at java.lang.StringBuilder.append(StringBuilder.java:132) [junit4] 2> at java.lang.StringBuilder.append(StringBuilder.java:128) [junit4] 2> at java.util.AbstractCollection.toString(AbstractCollection.java:450) [junit4] 2> at java.lang.String.valueOf(String.java:2854) [junit4] 2> at java.lang.StringBuilder.append(StringBuilder.java:128) [junit4] 2> at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4239) [junit4] 2> at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2834) [junit4] 2> at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922) [junit4] 2> at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897) [junit4] 2> at org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957) [junit4] 2> [junit4] 2> 9 20, 0025 3:47:09 ??
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[UpdateThread-3,5,TGRP-TestNumericDocValuesUpdates] [junit4] 2> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit [junit4] 2> at __randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0) [junit4] 2> at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2750) [junit4] 2> at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922) [junit4] 2> at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897) [junit4] 2> at org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957) [junit4] 2> [junit4] 2> 9 20, 0025 3:47:32 ?? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[UpdateThread-8,5,TGRP-TestNumericDocValuesUpdates] [junit4] 2> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit [junit4] 2> at __randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0) [junit4] 2> at
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote: Expression I tried the new module, following TestDemoExpression, and compiled the expression using this code: Expression expr = JavascriptCompiler.compile("_score * boost"); SimpleBindings bindings = new SimpleBindings(); bindings.add(new SortField("_score", SortField.Type.SCORE)); bindings.add(new SortField("boost", SortField.Type.LONG)); [...] Expression is more about sorting than scoring as far as I understand (for instance, the result FieldDocs.score is NaN) Why does that come as a surprise to you? Pass true to IndexSearcher to get the document's score back here: === Release 2.9.0 2009-09-23 === Changes in backwards compatibility policy LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer computes a document score for each hit by default. ... (Shai Erera via Mike McCandless)
Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?
OK, I was just reviewing some of the router code changes (better late than never...) ImplicitDocIdRouter has this: if (shard == null) shard = params.get("_shard_"); //deperecated for back compat Also, it looks like router.field can be specified for the compositeId router as well. I'll update that page. -Yonik http://lucidworks.com On Fri, Sep 20, 2013 at 8:41 AM, Cassandra Targett casstarg...@gmail.com wrote: [...]
[jira] [Created] (LUCENE-5228) IndexWriter.addIndexes copies raw files but acquires no locks
Robert Muir created LUCENE-5228: --- Summary: IndexWriter.addIndexes copies raw files but acquires no locks Key: LUCENE-5228 URL: https://issues.apache.org/jira/browse/LUCENE-5228 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir I see stuff like "merge problem with lucene 3 and 4 indices" (from the solr-users list), and cannot even think how to respond to these users because so many things can go wrong with IndexWriter.addIndexes(Directory). It currently has in its javadocs: NOTE: the index in each Directory must not be changed (opened by a writer) while this method is running. This method does not acquire a write lock in each input Directory, so it is up to the caller to enforce this. This method should be acquiring locks: it's copying *RAW FILES*. Otherwise we should remove it. If someone doesn't like that, or is mad because it's 10ns slower, they can use NoLockFactory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
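The lock-before-copy pattern the issue argues for can be sketched generically with java.nio file locks. This is not Lucene's LockFactory API, just an illustration of acquiring a lock on the source directory before copying its raw files (the file names and lock-file convention here are assumptions for the sketch):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Generic sketch of "acquire a lock before copying raw files": refuse to
// copy if another process holds the source directory's lock file.
public class LockBeforeCopy {
    public static void copyLocked(Path srcDir, Path dstDir) throws IOException {
        Path lockFile = srcDir.resolve("write.lock");
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {
            if (lock == null) {
                // A writer (in another process) holds the lock: bail out
                // instead of silently copying files that may change.
                throw new IOException("source index is locked by a writer: " + srcDir);
            }
            try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
                for (Path f : files) {
                    if (!f.getFileName().toString().equals("write.lock")) {
                        Files.copy(f, dstDir.resolve(f.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        } // lock released when the try-with-resources block exits
    }
}
```

The cost of the tryLock call is tiny compared to copying segment files, which is the point Robert makes about the 10ns objection.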
Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?
On Fri, Sep 20, 2013 at 10:15 AM, Yonik Seeley yo...@lucidworks.com wrote: Also, it looks like router.field can be specified for the compositeId router as well. Actually, I decided to leave that out of the docs since on further review the implementation looks incorrect (or perhaps I don't understand the intended API). We can doc it in a future release once it's nailed down. -Yonik http://lucidworks.com
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
Yes, you're right, but that's unrelated to this thread. I passed doScore=true and the scores come out the same, meaning Expression didn't affect the actual score, only the sort-by value (which is ok). search Expression doc=1, score=0.37158427, field=0.7431685328483582 doc=0, score=0.37158427, field=0.3715842664241791 Shai On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote: [...]
[jira] [Commented] (LUCENE-5123) invert the codec postings API
[ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773028#comment-13773028 ] Robert Muir commented on LUCENE-5123: - The only reason merge() exists there is so they can implement some bulk merging optimizations? Can we remove these optimizations? Has there ever been a benchmark showing they help at all? We shouldn't have such scary code in Lucene because it looks faster. Every time I look at infostreams from merge, it's completely dominated by postings and other things. invert the codec postings API - Key: LUCENE-5123 URL: https://issues.apache.org/jira/browse/LUCENE-5123 Project: Lucene - Core Issue Type: Wish Reporter: Robert Muir Assignee: Michael McCandless Fix For: 5.0 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch Currently FieldsConsumer/PostingsConsumer/etc. is a push-oriented API, e.g. FreqProxTermsWriter streams the postings at flush, and the default merge() takes the incoming codec API and filters out deleted docs and pushes via the same API (but that can be overridden). It could be cleaner if we allowed for a pull model instead (like DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of itself and just pass this to the codec consumer. This would give the codec more flexibility to e.g. do multiple passes if it wanted to do things like encode high-frequency terms more efficiently with a bitset-like encoding or other things... A codec can try to do things like this to some extent today, but it's very difficult (look at buffering in Pulsing). We made this change with DV and it made a lot of interesting optimizations easy to implement...
[jira] [Assigned] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-5247: Assignee: Erick Erickson Support for custom per core properties missing with new-style solr.xml -- Key: SOLR-5247 URL: https://issues.apache.org/jira/browse/SOLR-5247 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.4 Reporter: Chris F Assignee: Erick Erickson Priority: Trivial Labels: 4.4, core.properties, discovery, new-style, property, solr.xml This part has been solved. See comments When using old-style solr.xml I can define custom properties per core like so: {code:xml} <cores adminPath="/admin/cores" defaultCoreName="core1"> <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml"> <property name="foo" value="bar"/> </core> </cores> {code} I can then use the property foo in schema.xml or solrconfig.xml like this: {code:xml} <str name="foo">${foo}</str> {code} After switching to the new-style solr.xml with separate core.properties files per core this does not work anymore. I guess the corresponding core.properties file should look like this: {code} config=solrconfig.xml name=core1 schema=schema.xml foo=bar {code} (I also tried property.foo=bar) With that, I get the following error when reloading the core: {code} org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No system property or default value specified for foo value:${foo} {code} I can successfully reload the core if I use $\{foo:undefined\} but the value of foo will always be undefined then. When trying to create a new core with a URL like this: {code} http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true {code} the property foo will not appear in core.properties. However, I can use it in schema.xml. But only until restarting the servlet container. After that, the property is lost. Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?
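The property.* prefix convention used on the CREATE URL can be illustrated with java.util.Properties. A hypothetical sketch of how user-defined properties might be extracted from request parameters and read back from a core.properties file; this is not Solr's actual implementation, and the helper names are invented:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: user-defined core properties are passed on the
// CoreAdmin CREATE URL as property.foo=bar; persisted core.properties
// entries are plain key=value pairs readable with java.util.Properties.
public class CorePropertiesSketch {
    // Strip the "property." prefix used on the CREATE URL to get the
    // user-defined properties that should be persisted.
    public static Properties userProperties(Properties requestParams) {
        Properties user = new Properties();
        for (String name : requestParams.stringPropertyNames()) {
            if (name.startsWith("property.")) {
                user.setProperty(name.substring("property.".length()),
                        requestParams.getProperty(name));
            }
        }
        return user;
    }

    // Parse the text of a core.properties file.
    public static Properties load(String corePropertiesText) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(corePropertiesText));
        return p;
    }
}
```

The bug reported here is that the persistence step (writing foo=bar into core.properties on CREATE) is missing, so the property survives only until the servlet container restarts.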
[jira] [Commented] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml
[ https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773093#comment-13773093 ] Erick Erickson commented on SOLR-5247: -- [~romseygeek] Do you have any insights re: whether this is still an issue in 4.5? We've both been in this code recently. I'll assign it to myself to track it, but I don't have many cycles right now, feel free to grab it if you do. Support for custom per core properties missing with new-style solr.xml -- Key: SOLR-5247 URL: https://issues.apache.org/jira/browse/SOLR-5247 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.4 Reporter: Chris F Priority: Trivial Labels: 4.4, core.properties, discovery, new-style, property, solr.xml This part has been solved. See comments When using old-style solr.xml I can define custom properties per core like so: {code:xml} <cores adminPath="/admin/cores" defaultCoreName="core1"> <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml"> <property name="foo" value="bar"/> </core> </cores> {code} I can then use the property foo in schema.xml or solrconfig.xml like this: {code:xml} <str name="foo">${foo}</str> {code} After switching to the new-style solr.xml with separate core.properties files per core this does not work anymore. I guess the corresponding core.properties file should look like this: {code} config=solrconfig.xml name=core1 schema=schema.xml foo=bar {code} (I also tried property.foo=bar) With that, I get the following error when reloading the core: {code} org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No system property or default value specified for foo value:${foo} {code} I can successfully reload the core if I use $\{foo:undefined\} but the value of foo will always be undefined then. When trying to create a new core with a URL like this: {code} http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true {code} the property foo will not appear in core.properties. However, I can use it in schema.xml. But only until restarting the servlet container. After that, the property is lost. Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's important that we have one. Shai On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote: thats what it does. its more like a computed field. and you can sort by more than one of them. please see the JIRA issue for a description of the differences between function queries. [...]
[jira] [Created] (SOLR-5258) router.field support for compositeId router
Yonik Seeley created SOLR-5258: -- Summary: router.field support for compositeId router Key: SOLR-5258 URL: https://issues.apache.org/jira/browse/SOLR-5258 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Priority: Minor Although there is code to support router.field for CompositeId, it only calculates a simple (non-compound) hash, which isn't that useful unless you don't use compound ids (this is why I changed the docs to say router.field is only supported for the implicit router). The field value should either - be used to calculate the full compound hash - be used to calculate the prefix bits, and the uniqueKey will still be used for the lower bits. For consistency, I'd suggest the former. If we want to be able to specify a separate field that is only used for the prefix bits, then perhaps that should be router.prefixField
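The "prefix bits / lower bits" split Yonik describes can be sketched with a generic 32-bit hash. String.hashCode below stands in for the MurmurHash3 that Solr actually uses, so only the bit layout is illustrative, not the real shard assignment:

```java
// Illustrative sketch of the compositeId idea: the route key contributes the
// top 16 bits of the hash and the unique key the bottom 16 bits, so all docs
// sharing a route key land in the same hash range (and thus the same shard).
// String.hashCode is a stand-in for Solr's MurmurHash3 to stay self-contained.
public class CompositeHashSketch {
    public static int compositeHash(String routeKey, String uniqueKey) {
        int prefix = routeKey.hashCode() & 0xFFFF0000;   // top 16 bits from route key
        int suffix = uniqueKey.hashCode() & 0x0000FFFF;  // bottom 16 bits from unique key
        return prefix | suffix;
    }

    public static void main(String[] args) {
        System.out.printf("tenantA!doc1 -> %08x%n", compositeHash("tenantA", "doc1"));
        System.out.printf("tenantA!doc2 -> %08x%n", compositeHash("tenantA", "doc2"));
    }
}
```

The "full compound hash" option in the issue would instead hash the route field's value for both halves, exactly as if it were the shard-key prefix of a compound id.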
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
thats what it does. its more like a computed field. and you can sort by more than one of them. please see the JIRA issue for a description of the differences between function queries. On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote: [...]
[jira] [Created] (LUCENE-5229) remove Collector specializations
Robert Muir created LUCENE-5229: --- Summary: remove Collector specializations Key: LUCENE-5229 URL: https://issues.apache.org/jira/browse/LUCENE-5229 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir There are too many collector specializations (I think 16 or 18?) and too many crazy defaults like returning NaN scores to the user by default in IndexSearcher. This confuses hotspot (I will ignore any benchmarks posted here where only one type of sort is running thru the JVM; that's unrealistic), and confuses users with stuff like NaN scores coming back by default. I have two concrete suggestions: * nuke doMaxScores. It's implicit from doScores. This is just over the top. This should also halve the collectors. * change doScores to true by default in IndexSearcher. Since Shai was confused by the NaNs by default, and he added this stuff to Lucene, that says *everything* about how wrong this default is. Someone who *does* understand what it does can simply pass false.
Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?
On Fri, Sep 20, 2013 at 10:39 AM, Yonik Seeley yo...@lucidworks.com wrote: On Fri, Sep 20, 2013 at 10:15 AM, Yonik Seeley yo...@lucidworks.com wrote: Also, it looks like router.field can be specified for the compositeId router as well. Actually, I decided to leave that out of the docs since on further review the implementation looks incorrect (or perhaps I don't understand the intended API). We can doc it in a future release once it's nailed down. I opened this issue to deal with router.field in the compositeId router: https://issues.apache.org/jira/browse/SOLR-5258 -Yonik http://lucidworks.com
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
why dont you look and see how expressions is doing it? On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote: Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's important that we have one. Shai On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote: thats what it does. its more like a computed field. and you can sort by more than one of them. please see the JIRA issue for a description of the differences between function queries. On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote: Yes, you're right, but that's unrelated to this thread. I passed doScore=true and the scores come out the same, meaning Expression didn't affect the actual score, only the sort-by value (which is ok). search Expression doc=1, score=0.37158427, field=0.7431685328483582 doc=0, score=0.37158427, field=0.3715842664241791 Shai On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote: On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote: Expression I tried the new module, following TestDemoExpression and compiled the expression using this code: Expression expr = JavascriptCompiler.compile(_score * boost); SimpleBindings bindings = new SimpleBindings(); bindings.add(new SortField(_score, SortField.Type.SCORE)); bindings.add(new SortField(boost, SortField.Type.LONG)); The result scores are: search Expression doc=1, score=NaN, field=0.7431685328483582 doc=0, score=NaN, field=0.3715842664241791 As you can see, both CustomScoreProvider and Expression methods return same scores for the docs, while the FunctionQuery method returns different scores. The reason is that when using FunctionQuery, the scores of the ValueSources are multiplied by queryWeight, which seems correct to me. Expression is more about sorting than scoring as far as I understand (for instance, the result FieldDocs.score is NaN) Why does that come as a surprise to you? Pass true to indexsearcher to get the documents score back here. 
=== Release 2.9.0 2009-09-23 === Changes in backwards compatibility policy LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer computes a document score for each hit by default. ... (Shai Erera via Mike McCandless) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5229) remove Collector specializations
[ https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773097#comment-13773097 ] Shai Erera commented on LUCENE-5229: bq. nuke doMaxScores. its implicit from doScores +1, if you ask to compute scores, you might as well get maxScore. I doubt that specialization is so important. bq. change doScores to true by default in indexsearcher I'm not sure about it. I wasn't confused by the fact that I received NaN, only pointed out that when you use Expression, the result is not in the 'score' field, but the 'field' field. I think that in most cases, if you sort, you're interested in the sort-by value, not the score. Not sure if it buys performance or not, but I think it's just redundant work. remove Collector specializations Key: LUCENE-5229 URL: https://issues.apache.org/jira/browse/LUCENE-5229 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir There are too many collector specializations (i think 16 or 18?) and too many crazy defaults like returning NaN scores to the user by default in indexsearcher. this confuses hotspot (I will ignore any benchmarks posted here where only one type of sort is running thru the JVM, thats unrealistic), and confuses users with stuff like NaN scores coming back by default. I have two concrete suggestions: * nuke doMaxScores. its implicit from doScores. This is just over the top. This should also halve the collectors. * change doScores to true by default in indexsearcher. since shai was confused by the NaNs by default, and he added this stuff to lucene, that says *everything* about how wrong this default is. Someone who *does* understand what it does can simply pass false. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
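The first suggestion above — that maxScore is implicit once scores are computed — can be seen in a toy, library-free collector sketch. The class below is hypothetical (not Lucene's actual Collector API): tracking a running maximum costs one extra comparison per collected hit, so a separate doMaxScores mode buys nothing.

```java
// Toy sketch, NOT Lucene's Collector API: once per-hit scores are tracked,
// maxScore falls out as a running maximum with no second pass.
import java.util.ArrayList;
import java.util.List;

class ScoringCollector {
    final List<float[]> hits = new ArrayList<>(); // {docId, score} pairs
    float maxScore = Float.NEGATIVE_INFINITY;

    void collect(int doc, float score) {
        hits.add(new float[] {doc, score});
        maxScore = Math.max(maxScore, score); // one comparison per hit
    }

    public static void main(String[] args) {
        ScoringCollector c = new ScoringCollector();
        c.collect(0, 0.37f);
        c.collect(1, 0.74f);
        System.out.println(c.maxScore); // running max, available for free
    }
}
```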
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
What do Expressions have to do here? Do they replace CustomScoreQuery? Maybe they should, I don't know. But today, if you want to use CSQ, to boost a document by an NDV field, you need to write a ValueSource which reads from the field. And that's the object that I don't see. Maybe you want to say that Expressions will eventually replace CSQ, and so it's moot to add a NumericDVFieldSource to Lucene? Or we want to document on CSQ that you should really consider using Expressions? Shai On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote: why dont you look and see how expressions is doing it? On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote: Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's important that we have one. Shai On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote: thats what it does. its more like a computed field. and you can sort by more than one of them. please see the JIRA issue for a description of the differences between function queries. On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote: Yes, you're right, but that's unrelated to this thread. I passed doScore=true and the scores come out the same, meaning Expression didn't affect the actual score, only the sort-by value (which is ok). 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5229) remove Collector specializations
[ https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773108#comment-13773108 ] Robert Muir commented on LUCENE-5229: - {quote} I wasn't confused by the fact that I received NaN, only pointed out that when you use Expression, the result is not in the 'score' field, but the 'field' field. {quote} You invoked IndexSearcher.search(query, filter, n, *Sort*) and you were surprised that the result of the sort goes there? I think this kinda stuff only furthers to reinforce my argument that this stuff is way too specialized and complicated. remove Collector specializations Key: LUCENE-5229 URL: https://issues.apache.org/jira/browse/LUCENE-5229 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
You asked about how to access a NumericDocValues field from a valuesource in lucene. Yet you showed an example where you did just this with expressions, so I'm just recommending you look at expressions/ source code (they use valuesource under the hood) to see how its done! On Fri, Sep 20, 2013 at 11:49 AM, Shai Erera ser...@gmail.com wrote: What do Expressions have to do here? Do they replace CustomScoreQuery? Maybe they should, I don't know. But today, if you want to use CSQ, to boost a document by an NDV field, you need to write a ValueSource which reads from the field. And that's the object that I don't see. Maybe you want to say that Expressions will eventually replace CSQ, and so it's moot to add a NumericDVFieldSource to Lucene? Or we want to document on CSQ that you should really consider using Expressions? Shai On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote: why dont you look and see how expressions is doing it? On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote: Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's important that we have one. Shai On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote: thats what it does. its more like a computed field. and you can sort by more than one of them. please see the JIRA issue for a description of the differences between function queries. On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote: Yes, you're right, but that's unrelated to this thread. I passed doScore=true and the scores come out the same, meaning Expression didn't affect the actual score, only the sort-by value (which is ok). 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5230) CJKAnalyzer can't split ;
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773147#comment-13773147 ] Littlestar commented on LUCENE-5230: sorry, I miss reset. I want to split with ;. CJKAnalyzer can't split ; --- Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5230) CJKAnalyzer can't split ;
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated LUCENE-5230: --- Description: @Test public void test_AlphaNumAnalyzer() throws IOException { Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44); TokenStream token = analyzer.tokenStream(test, new StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441)); while (token.incrementToken()) { final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class); System.out.println(termAtt.toString()); } analyzer.close(); } was: @Test public void test_AlphaNumAnalyzer() throws IOException { Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44); TokenStream token = analyzer.tokenStream(test, new StringReader(ä¸å›½)); //TokenStream token = analyzer.tokenStream(test, new StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441)); while (token.incrementToken()) { final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class); System.out.println(termAtt.toString()); } analyzer.close(); } java.lang.NullPointerException at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923) at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133) at 
org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:171) at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63) at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54) at org.apache.lucene.analysis.cjk.CJKBigramFilter.doNext(CJKBigramFilter.java:240) at org.apache.lucene.analysis.cjk.CJKBigramFilter.incrementToken(CJKBigramFilter.java:169) at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:81) Summary: CJKAnalyzer can't split ; (was: CJKAnalyzer java.lang.NullPointerException) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated LUCENE-5230: --- Summary: CJKAnalyzer java.lang.NullPointerException (was: CJKAnalyzer can't split ;) fixed. thanks. I want to split CJK string with ; and CJK bigram, but failed. CJKAnalyzer java.lang.NullPointerException -- Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5231) better interoperability of expressions/ with valuesource
Robert Muir created LUCENE-5231: --- Summary: better interoperability of expressions/ with valuesource Key: LUCENE-5231 URL: https://issues.apache.org/jira/browse/LUCENE-5231 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-5231.patch A few things i noticed, while trying to work on e.g. integration of this with solr and just playing around: * No way for a custom Bindings to currently bind the score, as the necessary stuff is package private. This adds a simple protected method to Bindings to enable this. * Expression.getValueSource() cannot in general be used easily by other things (e.g. interoperate with function queries and so on), because it expects you pass it this custom cache. This is an impl detail, its easy to remove this restriction and still compute subs only once. * if you try to bind the score and don't have the scorer setup, you should get a clear exception: not NPE. * Each binding is looked up per-segment, which is bad. we should minimize the lookups to only in the CTOR. * This makes validation considerably simpler and less error-prone, so easy that I don't think we need it in the base class either, I moved this to just a simple helper method on SimpleBindings. It also found a bug in the equals() test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5249) ClassNotFoundException due to white-spaces in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773140#comment-13773140 ] Simon Endele commented on SOLR-5249: Wow, thanks for your quick and detailed response! I'm using Eclipse with default settings, so I thought this might bother some more people like me. Eclipse inserts line-breaks and white-spaces at other places in the solrconfig.xml, which are ignored, for example in the defaults-section of a request handler: {code}str name=hl.flcontent title field1 field2 field3 field4 /str{code} Ok, this is maybe a bad example as the field list ist parsed. As far I know class names are Java identifiers, which cannot contain any white-spaces. This certain code fragment only handles class names and no files, doesn't it? ClassNotFoundException due to white-spaces in solrconfig.xml Key: SOLR-5249 URL: https://issues.apache.org/jira/browse/SOLR-5249 Project: Solr Issue Type: Bug Reporter: Simon Endele Priority: Minor Attachments: SolrResourceLoader.java.patch Original Estimate: 1h Remaining Estimate: 1h Due to auto-formatting by an text editor/IDE there may be line-breaks after class names in the solrconfig.xml, for example: {code:xml}searchComponent class=solr.SpellCheckComponent name=suggest lst name=spellchecker str name=namesuggest/str str name=classnameorg.apache.solr.spelling.suggest.Suggester/str str name=lookupImplorg.apache.solr.spelling.suggest.fst.WFSTLookupFactory /str [...] 
/lst /searchComponent{code} This will raise an exception in SolrResourceLoader as the white-spaces are not stripped from the class name: {code}Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.spelling.suggest.fst.WFSTLookupFactory ' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:449) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:471) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:467) at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:102) at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601) at org.apache.solr.core.SolrCore.init(SolrCore.java:830) ... 13 more Caused by: java.lang.ClassNotFoundException: org.apache.solr.spelling.suggest.fst.WFSTLookupFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433) ... 19 more{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
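The idea behind the attached patch — stripping whitespace before the class lookup — can be illustrated without Solr. The helper below is hypothetical (not SolrResourceLoader's actual code) and only shows that Class.forName treats trailing whitespace as part of the name:

```java
// Minimal, self-contained illustration: a class name carrying a trailing
// newline (as auto-formatted XML can yield) fails Class.forName, while
// trimming the name first succeeds.
class TrimClassName {
    static Class<?> load(String cname) throws ClassNotFoundException {
        return Class.forName(cname.trim()); // strip accidental whitespace from config
    }

    public static void main(String[] args) throws Exception {
        String fromXml = "java.lang.String\n    ";
        boolean rawFails;
        try {
            Class.forName(fromXml);      // whitespace is part of the looked-up name
            rawFails = false;
        } catch (ClassNotFoundException e) {
            rawFails = true;
        }
        System.out.println(rawFails);                 // true
        System.out.println(load(fromXml).getName());  // java.lang.String
    }
}
```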
[jira] [Resolved] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5230. - Resolution: Not A Problem You must call reset() (and also your loop should have end(), etc). See the javadocs of TokenStream. CJKAnalyzer java.lang.NullPointerException -- Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
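The consumer workflow the resolution describes (reset, incrementToken loop, end, close) can be sketched with a library-free stand-in. MockTokenStream below is hypothetical, not Lucene's TokenStream, but it shows why skipping reset() fails before the first incrementToken():

```java
// Library-free mock of the TokenStream consumer contract: reset() before
// the first incrementToken(), then end(), then close(). Skipping reset()
// is what produced the NPE in the report.
import java.util.ArrayList;
import java.util.List;

class MockTokenStream {
    private final String input;
    private String[] tokens;
    private int pos = -1;

    MockTokenStream(String input) { this.input = input; }

    void reset() { tokens = input.split(";"); pos = 0; }  // consumer MUST call this first
    boolean incrementToken() {
        if (tokens == null) throw new IllegalStateException("reset() not called");
        return pos < tokens.length && ++pos > 0;
    }
    String term() { return tokens[pos - 1]; }
    void end() { /* a real TokenStream finalizes offsets here */ }
    void close() { tokens = null; }

    public static void main(String[] args) {
        MockTokenStream ts = new MockTokenStream("aaa;bbb;ccc");
        List<String> out = new ArrayList<>();
        ts.reset();                       // the step missing in the bug report
        while (ts.incrementToken()) out.add(ts.term());
        ts.end();
        ts.close();
        System.out.println(out); // [aaa, bbb, ccc]
    }
}
```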
[jira] [Comment Edited] (SOLR-5249) ClassNotFoundException due to white-spaces in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773140#comment-13773140 ] Simon Endele edited comment on SOLR-5249 at 9/20/13 4:18 PM: - Wow, thanks for your quick and detailed response! I'm using Eclipse with default settings, so I thought this might bother some more people like me. Eclipse inserts line-breaks and white-spaces at other places in the solrconfig.xml, which are ignored, for example in the defaults-section of a request handler: {code}str name=hl.flcontent title field1 field2 field3 field4 /str{code} Ok, this is maybe a bad example as the field list is parsed. As far I know class names are Java identifiers, which cannot contain any white-spaces. This certain code fragment only handles class names and no files, doesn't it? was (Author: simon.endele): Wow, thanks for your quick and detailed response! I'm using Eclipse with default settings, so I thought this might bother some more people like me. Eclipse inserts line-breaks and white-spaces at other places in the solrconfig.xml, which are ignored, for example in the defaults-section of a request handler: {code}str name=hl.flcontent title field1 field2 field3 field4 /str{code} Ok, this is maybe a bad example as the field list ist parsed. As far I know class names are Java identifiers, which cannot contain any white-spaces. This certain code fragment only handles class names and no files, doesn't it? 
ClassNotFoundException due to white-spaces in solrconfig.xml Key: SOLR-5249 URL: https://issues.apache.org/jira/browse/SOLR-5249 Project: Solr Issue Type: Bug Reporter: Simon Endele Priority: Minor Attachments: SolrResourceLoader.java.patch Original Estimate: 1h Remaining Estimate: 1h -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException
Littlestar created LUCENE-5230: -- Summary: CJKAnalyzer java.lang.NullPointerException Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor @Test public void test_AlphaNumAnalyzer() throws IOException { Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44); TokenStream token = analyzer.tokenStream(test, new StringReader(ä¸å›½)); //TokenStream token = analyzer.tokenStream(test, new StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441)); while (token.incrementToken()) { final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class); System.out.println(termAtt.toString()); } analyzer.close(); } java.lang.NullPointerException at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923) at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133) at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:171) at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63) at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54) at org.apache.lucene.analysis.cjk.CJKBigramFilter.doNext(CJKBigramFilter.java:240) at org.apache.lucene.analysis.cjk.CJKBigramFilter.incrementToken(CJKBigramFilter.java:169) at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:81) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5231) better interoperability of expressions/ with valuesource
[ https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5231: Attachment: LUCENE-5231.patch better interoperability of expressions/ with valuesource Key: LUCENE-5231 URL: https://issues.apache.org/jira/browse/LUCENE-5231 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-5231.patch A few things i noticed, while trying to work on e.g. integration of this with solr and just playing around: * No way for a custom Bindings to currently bind the score, as the necessary stuff is package private. This adds a simple protected method to Bindings to enable this. * Expression.getValueSource() cannot in general be used easily by other things (e.g. interoperate with function queries and so on), because it expects you pass it this custom cache. This is an impl detail, its easy to remove this restriction and still compute subs only once. * if you try to bind the score and don't have the scorer setup, you should get a clear exception: not NPE. * Each binding is looked up per-segment, which is bad. we should minimize the lookups to only in the CTOR. * This makes validation considerably simpler and less error-prone, so easy that I don't think we need it in the base class either, I moved this to just a simple helper method on SimpleBindings. It also found a bug in the equals() test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5229) remove Collector specializations
[ https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773171#comment-13773171 ] Robert Muir commented on LUCENE-5229: - {quote} nuke doMaxScores. its implicit from doScores +1, if you ask to compute scores, you might as well get maxScore. I doubt that specialization is so important. {quote} I will split off a subtask for this since I dont think its controversial. I at least want to make some progress on this. Removing confusing booleans from the API of indexsearcher is also huge to me: and this will take care of one. remove Collector specializations Key: LUCENE-5229 URL: https://issues.apache.org/jira/browse/LUCENE-5229 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir There are too many collector specializations (i think 16 or 18?) and too many crazy defaults like returning NaN scores to the user by default in indexsearcher. this confuses hotspot (I will ignore any benchmarks posted here where only one type of sort is running thru the JVM, thats unrealistic), and confuses users with stuff like NaN scores coming back by default. I have two concerete suggestions: * nuke doMaxScores. its implicit from doScores. This is just over the top. This should also halve the collectors. * change doScores to true by default in indexsearcher. since shai was confused by the NaNs by default, and he added this stuff to lucene, that says *everything* about how wrong this default is. Someone who *does* understand what it does can simply pass false. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5230) CJKAnalyzer can't split ;
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated LUCENE-5230: --- Description:
{code}
@Test
public void test_AlphaNumAnalyzer() throws IOException {
    Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
    TokenStream token = analyzer.tokenStream("test", new StringReader("0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441"));
    token.reset(); //here
    while (token.incrementToken()) {
        final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class);
        System.out.println(termAtt.toString());
    }
    analyzer.close();
}
{code}
was:
{code}
@Test
public void test_AlphaNumAnalyzer() throws IOException {
    Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
    TokenStream token = analyzer.tokenStream("test", new StringReader("0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441"));
    while (token.incrementToken()) {
        final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class);
        System.out.println(termAtt.toString());
    }
    analyzer.close();
}
{code}
CJKAnalyzer can't split ; --- Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor
-- This message is automatically generated by JIRA.
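The NPE in LUCENE-5230 and the {{token.reset()}} fix both follow from the TokenStream workflow contract: since Lucene 4.x, reset() must be called before the first incrementToken(), with end() and close() afterwards. A minimal sketch of the full documented consume pattern against the 4.4 analysis API (the class name ConsumePattern is made up for illustration):

```
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumePattern {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
        TokenStream ts = analyzer.tokenStream("test", new StringReader("中国"));
        // Obtain attribute references before consuming the stream.
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                // mandatory before the first incrementToken()
            while (ts.incrementToken()) {
                System.out.println(termAtt.toString());
            }
            ts.end();                  // record final offset state
        } finally {
            ts.close();                // release resources so the analyzer can reuse the stream
        }
        analyzer.close();
    }
}
```

Skipping reset() leaves the tokenizer's input reader unassigned, which is why the failure surfaces as a NullPointerException deep inside zzRefill rather than as a clearer error message.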
[jira] [Created] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
Robert Muir created LUCENE-5232: --- Summary: Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773202#comment-13773202 ] Erick Erickson commented on LUCENE-5230: Please bring issues like this up on the user's list before raising a JIRA. CJKAnalyzer java.lang.NullPointerException -- Key: LUCENE-5230 URL: https://issues.apache.org/jira/browse/LUCENE-5230 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.4 Reporter: Littlestar Priority: Minor @Test public void test_AlphaNumAnalyzer() throws IOException { Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44); TokenStream token = analyzer.tokenStream(test, new StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441)); token.reset(); //here while (token.incrementToken()) { final CharTermAttribute termAtt = token.addAttribute(CharTermAttribute.class); System.out.println(termAtt.toString()); } analyzer.close(); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773461#comment-13773461 ] Michael McCandless commented on LUCENE-5232: I'm not sure we should do this. Today, when doMaxScore is false and doScores is true, we only score those hits that make it into the PQ, which is typically a very small subset of all hits. When an app needs scores, I think it often does not need the maxScore. Can we somehow remove specialization without losing this functionality? Decouple the two ... Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 4.5.0 RC1
: On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote: : I'll backport the commit to lucene_solr_4_5. : : Oh, I see you have already done that, thanks! Yeah, sorry -- I meant to follow up and forgot to hit send. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773489#comment-13773489 ] Robert Muir commented on LUCENE-5232: - Such users can pass their own collector. Seriously, who is using search(Query query, Filter filter, int n, Sort sort, boolean doDocScores, boolean doMaxScore), so using a sort, and asking for scores, but not asking for the maximum score. This sounds to me like someones very special use case baked into lucene: I think we should remove it. Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773508#comment-13773508 ] Robert Muir commented on LUCENE-5232: - You know, if we really want to have this crazy specialization, why not move it out to a contrib module, and just have a HuperDuperTopFieldCollector.create() method that generates bytecode for the exact number of sort fields, and a million boolean parameters passed in? I just dont think it needs to be in IndexSearcher/core lucene. Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773566#comment-13773566 ] Michael McCandless commented on LUCENE-5215: Could you make the patch with --show-copies-as-adds? Thanks! Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773567#comment-13773567 ] Michael McCandless commented on LUCENE-5232: How about never computing maxScore when sorting by field (and removing that boolean)? An app can make a custom collector if they really need that, but I suspect it's uncommon. Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[SolrCloud] is there a reason Overseer.STATE_UPDATE_DELAY is set so high?
Hi, Overseer.STATE_UPDATE_DELAY seems to be the amount of time the state updater thread goes to sleep if there's no state update queue items to process, so that it doesn't hammer zookeeper. Is it necessary to set it that high (1500ms)? We're using SolrCloud such that collections are created on the fly, and 1500ms becomes a bottleneck for creation for the entire cluster because the updater is single-threaded and it goes to sleep for 1500ms every time the outer while loop runs. Since there's only one thread trying to monitor the queue, I don't think zookeeper will mind being hit a little more frequently while the queue remains empty. If people are in general worried about lowering it, can we at least make it a property? Thanks, Jessica
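Jessica's "make it a property" suggestion could look like the sketch below: read the idle-poll delay from a system property and fall back to the current 1500ms default. This is not the actual Overseer code, and the property name "solr.overseer.stateUpdateDelay" is hypothetical, not a real Solr property.

```java
// Sketch: a configurable poll delay instead of a hard-coded 1500ms constant.
// The property name below is made up for illustration.
public class OverseerDelay {
    static final int DEFAULT_STATE_UPDATE_DELAY_MS = 1500;

    static int stateUpdateDelayMs() {
        // Integer.getInteger returns the default when the property is unset
        // or not a parsable integer.
        return Integer.getInteger("solr.overseer.stateUpdateDelay",
                                  DEFAULT_STATE_UPDATE_DELAY_MS);
    }

    public static void main(String[] args) {
        System.out.println(stateUpdateDelayMs()); // 1500 while the property is unset
        System.setProperty("solr.overseer.stateUpdateDelay", "100");
        System.out.println(stateUpdateDelayMs()); // 100
    }
}
```

Clusters that create collections on the fly could then lower the delay without a code change, while the conservative default stays in place for everyone else.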
Re: [VOTE] Release Lucene/Solr 4.5.0 RC1
: : http://people.apache.org/~jpountz/staging_area/lucene-solr-4.5.0-RC1-rev1524755/ Once I applied the javadoc linter workaround to the 4_5 branch, I found no other problems with RC1 other than LUCENE-5233 -- and I certainly don't think LUCENE-5233 is significant enough to warrant a re-spin. So I vote +1 based on the following SHA1 files...
407d517272961cc09b5b2a6dc7f414c033c2a842 *lucene-4.5.0-src.tgz
cb55b9fb36296e233d10b4dd0061af32947f1056 *lucene-4.5.0.tgz
82ed448175508792be960d31de05ea7e2815791e *lucene-4.5.0.zip
6db41833bf6763ec3b704cb343f59b779c16a841 *solr-4.5.0-src.tgz
e9150dd7c1f6046f5879196ea266505613f26506 *solr-4.5.0.tgz
0c7d4bcb5c29f67f2722b1255a5da803772c03a5 *solr-4.5.0.zip
-Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 843 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/843/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9752 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=62BF223C722178E6 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII -classpath
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 843 - Still Failing!
jvm crash On Fri, Sep 20, 2013 at 7:33 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/843/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9752 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=62BF223C722178E6 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII -classpath
[jira] [Commented] (LUCENE-5231) better interoperability of expressions/ with valuesource
[ https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773707#comment-13773707 ] ASF subversion and git services commented on LUCENE-5231: - Commit 1525192 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1525192 ] LUCENE-5231: better interoperability of expressions with valuesource better interoperability of expressions/ with valuesource Key: LUCENE-5231 URL: https://issues.apache.org/jira/browse/LUCENE-5231 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-5231.patch A few things i noticed, while trying to work on e.g. integration of this with solr and just playing around: * No way for a custom Bindings to currently bind the score, as the necessary stuff is package private. This adds a simple protected method to Bindings to enable this. * Expression.getValueSource() cannot in general be used easily by other things (e.g. interoperate with function queries and so on), because it expects you pass it this custom cache. This is an impl detail, its easy to remove this restriction and still compute subs only once. * if you try to bind the score and don't have the scorer setup, you should get a clear exception: not NPE. * Each binding is looked up per-segment, which is bad. we should minimize the lookups to only in the CTOR. * This makes validation considerably simpler and less error-prone, so easy that I don't think we need it in the base class either, I moved this to just a simple helper method on SimpleBindings. It also found a bug in the equals() test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773708#comment-13773708 ] Shai Erera commented on LUCENE-5232: bq. just have a HuperDuperTopFieldCollector.create() We have that, it's called TopFieldCollector.create(). bq. How about never computing maxScore when sorting by field +1. We can even offer such Collector. Maybe what we need is to remove that .search() method from IndexSearcher API, document that the sort methods never compute scores and that you should use TopFieldCollector.create() if you wish to do that? As for the specialization, I agree with Mike that we should decouple the two. I don't know how costly it is, in a real live system, to have a few extra 'ifs' (I don't think luceneutil lets you check that?), but I'm sure that computing a score is in most cases redundant work when sorting by a field and therefore should be avoided. Perhaps we should remove the specializations in favor of the added 'ifs' and let someone write his own Collector if he's worried about perf? Remove doMaxScore from indexsearcher, collector specializations, etc Key: LUCENE-5232 URL: https://issues.apache.org/jira/browse/LUCENE-5232 Project: Lucene - Core Issue Type: Sub-task Reporter: Robert Muir Fix For: 5.0 I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
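The decoupling being discussed is already expressible through TopFieldCollector.create() in the 4.x API: a caller can ask for per-hit scores without asking for maxScore, so only hits that enter the priority queue get scored. A hedged sketch, assuming a searcher, query, and sort are already in scope:

```
// 4.x signature: create(sort, numHits, fillFields, trackDocScores,
//                       trackMaxScore, docsScoredInOrder)
TopFieldCollector collector = TopFieldCollector.create(
    sort, 10,
    true,    // fillFields: populate FieldDoc.fields
    true,    // trackDocScores: score hits that make it into the queue
    false,   // trackMaxScore: skip it, so not every hit needs scoring
    false);  // docsScoredInOrder
searcher.search(query, collector);
TopDocs hits = collector.topDocs();  // hits.getMaxScore() is NaN in this configuration
```

This is the functionality Mike wants to preserve; the debate above is only about whether IndexSearcher's convenience methods should keep exposing the extra boolean.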
[jira] [Resolved] (LUCENE-5231) better interoperability of expressions/ with valuesource
[ https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5231. - Resolution: Fixed Fix Version/s: 4.6 5.0 better interoperability of expressions/ with valuesource Key: LUCENE-5231 URL: https://issues.apache.org/jira/browse/LUCENE-5231 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: 5.0, 4.6 Attachments: LUCENE-5231.patch A few things i noticed, while trying to work on e.g. integration of this with solr and just playing around: * No way for a custom Bindings to currently bind the score, as the necessary stuff is package private. This adds a simple protected method to Bindings to enable this. * Expression.getValueSource() cannot in general be used easily by other things (e.g. interoperate with function queries and so on), because it expects you pass it this custom cache. This is an impl detail, its easy to remove this restriction and still compute subs only once. * if you try to bind the score and don't have the scorer setup, you should get a clear exception: not NPE. * Each binding is looked up per-segment, which is bad. we should minimize the lookups to only in the CTOR. * This makes validation considerably simpler and less error-prone, so easy that I don't think we need it in the base class either, I moved this to just a simple helper method on SimpleBindings. It also found a bug in the equals() test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5231) better interoperability of expressions/ with valuesource
[ https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773709#comment-13773709 ] ASF subversion and git services commented on LUCENE-5231: - Commit 1525193 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1525193 ] LUCENE-5231: better interoperability of expressions with valuesource better interoperability of expressions/ with valuesource Key: LUCENE-5231 URL: https://issues.apache.org/jira/browse/LUCENE-5231 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-5231.patch A few things i noticed, while trying to work on e.g. integration of this with solr and just playing around: * No way for a custom Bindings to currently bind the score, as the necessary stuff is package private. This adds a simple protected method to Bindings to enable this. * Expression.getValueSource() cannot in general be used easily by other things (e.g. interoperate with function queries and so on), because it expects you pass it this custom cache. This is an impl detail, its easy to remove this restriction and still compute subs only once. * if you try to bind the score and don't have the scorer setup, you should get a clear exception: not NPE. * Each binding is looked up per-segment, which is bad. we should minimize the lookups to only in the CTOR. * This makes validation considerably simpler and less error-prone, so easy that I don't think we need it in the base class either, I moved this to just a simple helper method on SimpleBindings. It also found a bug in the equals() test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773712#comment-13773712 ] ASF subversion and git services commented on LUCENE-5207: - Commit 1525195 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1525195 ] Merged revision(s) 1525194 from lucene/dev/trunk: LUCENE-5207: Add a test that checks if the stack trace of an exception thrown from a Javascript function contains the original expression source code as the filename. lucene expressions module - Key: LUCENE-5207 URL: https://issues.apache.org/jira/browse/LUCENE-5207 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan Ernst Fix For: 5.0, 4.6 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals). So they are conceptually much more like ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than solr's function queries. Some additional notes: * In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields. * You can rank documents easily by multiple expressions (its a SortField at the end), e.g. Sort by year descending, then some function of score price and time ascending. * The provided javascript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do. * We have solr integration to contribute in the future, but this is just the standalone lucene part as a start. Since lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortField's or other expressions. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5215:
Attachment: LUCENE-5215.patch
Patch with --show-copies-as-adds

Add support for FieldInfos generation
Key: LUCENE-5215
URL: https://issues.apache.org/jira/browse/LUCENE-5215
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch

In LUCENE-5189 we've identified a few reasons to do that:
1. If you want to update docs' values of field 'foo', where 'foo' exists in the index but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS.
2. When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit, however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too.
The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch.
Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which have the same limitation -- if a Codec modifies them, the changes are silently ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773710#comment-13773710 ]

ASF subversion and git services commented on LUCENE-5207:

Commit 1525194 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1525194 ]
LUCENE-5207: Add a test that checks if the stack trace of an exception thrown from a Javascript function contains the original expression source code as the filename.
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773713#comment-13773713 ]

Robert Muir commented on LUCENE-5232:

Sorry, I guess I'm against never computing this shit... because you guys think returning NaN is ok. I don't. It's not. If you want to make these optimizations, fix the APIs so it's intuitive; otherwise, no way.

Remove doMaxScore from indexsearcher, collector specializations, etc
Key: LUCENE-5232
URL: https://issues.apache.org/jira/browse/LUCENE-5232
Project: Lucene - Core
Issue Type: Sub-task
Reporter: Robert Muir
Fix For: 5.0

I think we should just compute doMaxScore whenever doDocScores = true. This would remove 4 collector specializations and remove a boolean parameter from 4 indexsearcher methods. We can just do this in 5.0 I think.
[jira] [Commented] (SOLR-4882) Restrict SolrResourceLoader to only classloader accessible files and instance dir
[ https://issues.apache.org/jira/browse/SOLR-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773716#comment-13773716 ]

Uwe Schindler commented on SOLR-4882:

Hi, nobody commented on this issue, so I think the current patch is fine. I would like to commit this for 4.6. After that is resolved, we can also do SOLR-5234.

Restrict SolrResourceLoader to only classloader accessible files and instance dir
Key: SOLR-4882
URL: https://issues.apache.org/jira/browse/SOLR-4882
Project: Solr
Issue Type: Improvement
Affects Versions: 4.3
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 4.5, 5.0
Attachments: SOLR-4882.patch, SOLR-4882.patch

SolrResourceLoader currently allows loading files from any absolute or CWD-relative path, which is used as a fallback if the resource cannot be looked up via the class loader. We should limit this fallback to subdirectories below the instanceDir passed into the ctor. The CWD special case should be removed, too (the virtual CWD is the instance's config or root dir). The reason for this is security related: some Solr components allow passing resource paths in via REST parameters (e.g. XSL stylesheets, velocity templates, ...) and load them via the resource loader. Limiting the lookup makes it possible to disallow loading e.g. /etc/passwd as a stylesheet. In 4.4 we should add a solrconfig.xml setting to enable the old behaviour, but disable it by default, in case your existing installation requires files from outside the instance dir which are not available via the URLClassLoader used internally. In Lucene 5.0 we should not support this anymore.
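The restriction described in SOLR-4882 boils down to a canonical-path containment check: resolve the requested resource against the instance dir, normalize away ".." segments, and verify the result is still under the instance dir. A minimal sketch of that idea (the class and method names here are illustrative, not Solr's actual code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper mirroring the idea in SOLR-4882: reject resource
// paths that escape a base directory. Not Solr's actual implementation.
class ResourcePathCheck {

    // Returns true only if 'resource', resolved against 'base' and
    // normalized (".." segments collapsed), still lies under 'base'.
    // Path.startsWith compares whole path components, so a sibling
    // directory like /var/solr/core12 does not match base /var/solr/core1.
    static boolean isUnderInstanceDir(Path base, String resource) {
        Path normalizedBase = base.normalize().toAbsolutePath();
        Path resolved = normalizedBase.resolve(resource).normalize();
        return resolved.startsWith(normalizedBase);
    }

    public static void main(String[] args) {
        Path instanceDir = Paths.get("/var/solr/core1");
        System.out.println(isUnderInstanceDir(instanceDir, "conf/stylesheet.xsl")); // true
        System.out.println(isUnderInstanceDir(instanceDir, "../../../etc/passwd")); // false
    }
}
```

None of these calls touch the filesystem; the check is purely lexical on the normalized paths, which is why it can be applied before any file is opened.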
Re: Difference between CustomScoreProvider, FunctionQuery and Expression
Ok, so after I followed Rob's hints and clues, I found LongFieldSource, which uses the FieldCache API under the hood, which accesses the NDV field. I feel sorry for the poor user who will try to figure this out himself, though, because there's no evidence anywhere that this is what you should do! The LongFieldSource jdocs are completely erroneous; they look like a copy-paste bug from FloatFieldSource. Then, if you get past that and read the FieldCache.getLongs jdocs, they say:

* Checks the internal cache for an appropriate entry, and if none is
* found, *reads the terms in `field` as longs* and returns an array
* of size `reader.maxDoc()` of the value each document
* has in the given field.

Nothing about NumericDocValues. I actually looked at FieldCache before sending the first email, but all I could conclude from the jdocs is that it parses terms. I didn't bother looking at the FieldCacheImpl implementation, and users shouldn't be expected to do that. I'll open an issue to clean/clarify the javadocs.

Shai

On Fri, Sep 20, 2013 at 6:53 PM, Robert Muir rcm...@gmail.com wrote:
You asked about how to access a NumericDocValues field from a ValueSource in Lucene. Yet you showed an example where you did just this with expressions, so I'm just recommending you look at the expressions/ source code (they use ValueSource under the hood) to see how it's done!

On Fri, Sep 20, 2013 at 11:49 AM, Shai Erera ser...@gmail.com wrote:
What do Expressions have to do here? Do they replace CustomScoreQuery? Maybe they should, I don't know. But today, if you want to use CSQ to boost a document by an NDV field, you need to write a ValueSource which reads from the field. And that's the object that I don't see. Maybe you want to say that Expressions will eventually replace CSQ, and so it's moot to add a NumericDVFieldSource to Lucene? Or that we want to document on CSQ that you should really consider using Expressions?
Shai

On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote:
Why don't you look and see how expressions is doing it?

On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote:
Thanks Rob. So is there a NumericDVFieldSource-like class in Lucene? I think it's important that we have one.
Shai

On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote:
That's what it does. It's more like a computed field, and you can sort by more than one of them. Please see the JIRA issue for a description of the differences from function queries.

On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
Yes, you're right, but that's unrelated to this thread. I passed doScore=true and the scores come out the same, meaning the Expression didn't affect the actual score, only the sort-by value (which is ok).

search Expression
doc=1, score=0.37158427, field=0.7431685328483582
doc=0, score=0.37158427, field=0.3715842664241791

Shai

On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:
On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:
Expression: I tried the new module, following TestDemoExpression, and compiled the expression using this code:

Expression expr = JavascriptCompiler.compile("_score * boost");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("boost", SortField.Type.LONG));

The result scores are:

search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791

As you can see, both the CustomScoreProvider and Expression methods return the same scores for the docs, while the FunctionQuery method returns different scores. The reason is that when using FunctionQuery, the scores of the ValueSources are multiplied by queryWeight, which seems correct to me. Expression is more about sorting than scoring, as far as I understand (for instance, the resulting FieldDoc's score is NaN).

Why does that come as a surprise to you?
Pass true to IndexSearcher to get the document's score back here.

=== Release 2.9.0 2009-09-23 ===
Changes in backwards compatibility policy
LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer computes a document score for each hit by default. ... (Shai Erera via Mike McCandless)
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773721#comment-13773721 ]

Shai Erera commented on LUCENE-5232:

Maybe we can fix the API by making maxScore private on TopDocs and throwing IllegalStateException if you call the accessor while the value is NaN? I think that's overkill, though; it's enough to document that this is the behavior if you don't ask to compute maxScore.
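The accessor idea floated in the comment above can be sketched in isolation. This is a toy stand-in, not Lucene's TopDocs (class and method names here are made up for illustration): reading maxScore only succeeds if it was actually computed.

```java
// Toy stand-in for TopDocs illustrating the proposal above:
// maxScore is hidden behind an accessor that refuses to return NaN.
class TopHits {
    private final float maxScore; // NaN means "not computed"

    TopHits(float maxScore) {
        this.maxScore = maxScore;
    }

    // Throws instead of silently handing back NaN, so a caller that
    // forgot to request max-score tracking fails fast.
    float getMaxScore() {
        if (Float.isNaN(maxScore)) {
            throw new IllegalStateException(
                "maxScore was not computed; request it at search time");
        }
        return maxScore;
    }

    public static void main(String[] args) {
        System.out.println(new TopHits(0.5f).getMaxScore()); // 0.5
        new TopHits(Float.NaN).getMaxScore(); // throws IllegalStateException
    }
}
```

The trade-off the thread debates is exactly this: an exception makes the NaN sentinel impossible to misread, at the cost of an API change versus simply documenting the sentinel.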
[jira] [Created] (LUCENE-5234) Clarify FieldCache API around the use of NumericDocValues fields
Shai Erera created LUCENE-5234:

Summary: Clarify FieldCache API around the use of NumericDocValues fields
Key: LUCENE-5234
URL: https://issues.apache.org/jira/browse/LUCENE-5234
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera

Spinoff from this thread: http://lucene.markmail.org/thread/wxs6bzf2ul6go4pg. The FieldCache (and friends) API javadocs need some improvements.
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773726#comment-13773726 ]

Robert Muir commented on LUCENE-5232:

who the fuck is asking for scores, but not the max score, and why does their insanely specialized use case justify all these booleans on a central lucene class.
[jira] [Updated] (LUCENE-5234) Clarify FieldCache API around the use of NumericDocValues fields
[ https://issues.apache.org/jira/browse/LUCENE-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5234:
Attachment: LUCENE-5234.patch

Initial patch improving the longs javadocs. I think the same improvements can be made for the other types as well (float, int, etc.), but I'd like to get feedback on the wording first.
[jira] [Assigned] (LUCENE-2844) benchmark geospatial performance based on geonames.org
[ https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley reassigned LUCENE-2844:
Assignee: David Smiley

benchmark geospatial performance based on geonames.org
Key: LUCENE-2844
URL: https://issues.apache.org/jira/browse/LUCENE-2844
Project: Lucene - Core
Issue Type: New Feature
Components: modules/benchmark
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
Fix For: 5.0, 4.5
Attachments: benchmark-geo.patch, benchmark-geo.patch

Until now (with this patch), the benchmark contrib module did not include a means to test geospatial data. This patch includes some new files and changes to existing ones. Here is a summary of what is being added in this patch per file (all files below are within the benchmark contrib module), along with my notes:

Changes:
* build.xml -- Add a dependency on Lucene's spatial module and Solr.
** It was a real pain to figure out the convoluted ant build system to make this work, and I doubt I did it the proper way.
** Rob Muir thought it would be a good idea to make the benchmark contrib module a top-level module (i.e. alongside analysis/) so that it can depend on everything: http://lucene.472066.n3.nabble.com/Re-Geospatial-search-in-Lucene-Solr-tp2157146p2157824.html. I agree.
* ReadTask.java -- Added a search.useHitTotal boolean option that will use the total hits number for reporting purposes, instead of the existing behavior.
** The existing behavior (i.e. when search.useHitTotal=false) doesn't look very useful, since the response integer is the sum of several things instead of just one thing. I don't see how anyone makes use of it.
Note that on my local system, I also changed ReportTask and RepSelectByPrefTask to not include the '-' every other line, and also changed Format.java to not use commas in the numbers. These changes make copy-pasting into Excel more streamlined.

New Files:
* geoname-spatial.alg -- my algorithm file.
** Note the :0 trailing the Populate sequence. This is a trick I use to skip building the index, since it takes a while to build and I'm not interested in benchmarking index construction. You'll want to set this to :1 and then subsequently put it back for further runs, as long as you keep doc.geo.schemaField (or any other configuration element affecting the index) the same.
** In the patch, doc.geo.schemaField=geohash, but unless you're tinkering with SOLR-2155, you'll probably want to set this to latlon.
* GeoNamesContentSource.java -- a ContentSource for a geonames.org data file (either a single country like US.txt or allCountries.txt).
** Uses a subclass of DocData to store all the fields. The existing DocData wasn't very applicable to data that is not composed of a title and body.
** Doesn't reuse the docData parameter to getNextDocData(); a new one is created every time.
** Only supports content.source.forever=false.
* GeoNamesDocMaker.java -- a subclass of DocMaker that works very differently from the existing DocMaker.
** Instead of assuming that each line from geonames.org will correspond to one Lucene document, this implementation supports, via configuration, creating a variable number of documents, each with a variable number of points taken randomly from a GeoNamesContentSource.
** doc.geo.docsToGenerate: The number of documents to generate. If blank, it defaults to the number of rows in GeoNamesContentSource.
** doc.geo.avgPlacesPerDoc: The average number of places to be added to a document. A random number between 0 and one less than twice this amount is chosen on a per-document basis. If this is set to 1, then exactly one is always used. In order to support a value greater than 1, use the geohash field type and incorporate SOLR-2155 (geohash prefix technique).
** doc.geo.oneDocPerPlace: Whether at most one document should use the same place. In other words: can more than one document have the same place? If so, set this to false.
** doc.geo.schemaField: references a field name in schema.xml. The field should implement SpatialQueryable.
* GeoPerfData.java: a singleton storing data in memory that is shared by GeoNamesDocMaker.java and GeoQueryMaker.java.
** content.geo.zeroPopSubst: if a population <= 0 is encountered, use this population value instead. Default is 100.
** content.geo.maxPlaces: A limit on the number of rows read in from GeoNamesContentSource.java can be set here. Defaults to Integer.MAX_VALUE.
** GeoPerfData is primarily responsible for reading data from GeoNamesContentSource into memory to store the lat, lon, and population. When a random place is
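Taken together, the options described above might appear in an algorithm file roughly like this. This is a hypothetical excerpt, not the geoname-spatial.alg from the patch: the property names come from the description above, the values are illustrative, and the class name is given without its package.

```properties
# Hypothetical .alg excerpt using the options described in LUCENE-2844
# (values illustrative; class package omitted)
content.source=GeoNamesContentSource
content.source.forever=false
content.geo.maxPlaces=1000000
content.geo.zeroPopSubst=100
doc.geo.docsToGenerate=100000
doc.geo.avgPlacesPerDoc=1
doc.geo.oneDocPerPlace=true
doc.geo.schemaField=latlon
search.useHitTotal=true
```

With doc.geo.avgPlacesPerDoc=1, each generated document gets exactly one place; values above 1 require the geohash field type per the description above.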
[jira] [Updated] (LUCENE-2844) benchmark geospatial performance based on geonames.org
[ https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-2844:
Fix Version/s: (was: 4.5) 4.6

benchmark geospatial performance based on geonames.org
Key: LUCENE-2844
Fix For: 5.0, 4.6
[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc
[ https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773730#comment-13773730 ]

Shai Erera commented on LUCENE-5232:

Well, maybe start with why we compute maxScore at all, even for a TopScoreDocCollector? We use it to normalize document scores when doing some form of distributed search. When you use TSDC, it's easy to fill TopDocs.maxScore, because it's already known. When you sort by a field, you have to score *every* document in order to fill maxScore, as Mike pointed out, and not just those that make it into the heap based on their sort value. I think the problematic API here might be TopFieldDocs extending TopDocs. I believe that when you ask to sort, you don't need scores; that's the common case. So if we e.g. returned a TopFieldDocs which does not extend TopDocs, and FieldDoc only gave you the sort-by values plus 'doc', then we could remove doScore and doMaxScore entirely from TopFieldCollector. Let users who need the score in addition to the sort-by values write a custom Collector. Or they can put a SortField.SCORE as the last sort-by field, and they get the scores in FieldDoc.fields.
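The normalization use case mentioned above (dividing shard-local scores by that shard's max score so scores from different shards are comparable) can be sketched as follows. This is an illustrative helper with made-up names, not code from Lucene or Solr:

```java
// Minimal sketch of max-score normalization as used in some forms of
// distributed search: scale each shard-local score into [0, 1] by
// dividing by the shard's maxScore, making scores comparable across shards.
class ScoreNormalizer {

    static float[] normalize(float[] scores, float maxScore) {
        // A NaN maxScore means it was never computed (the situation the
        // thread above is arguing about), so refuse to normalize with it.
        if (Float.isNaN(maxScore) || maxScore <= 0f) {
            throw new IllegalArgumentException("maxScore must be computed and positive");
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / maxScore;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] normalized = normalize(new float[]{0.5f, 1.0f, 2.0f}, 2.0f);
        for (float f : normalized) {
            System.out.println(f); // 0.25, 0.5, 1.0
        }
    }
}
```

This is why a silently-NaN maxScore is dangerous for this use case: every normalized score would come out NaN with no error.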