[jira] [Updated] (SOLR-5247) Custom per core properties not persisted on API CREATE with new-style solr.xml

2013-09-20 Thread Chris F (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris F updated SOLR-5247:
--

Description: 
This part has been solved. See comments

When using old-style solr.xml I can define custom properties per core like so:
{code:xml}
<cores adminPath="/admin/cores" defaultCoreName="core1">
  <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
    <property name="foo" value="bar" />
  </core>
</cores>
{code}
I can then use the property foo in schema.xml or solrconfig.xml like this:
{code:xml}
<str name="foo">${foo}</str>
{code}

After switching to the new-style solr.xml with separate core.properties files 
per core this does not work anymore.

I guess the corresponding core.properties file should look like this:
{code}
config=solrconfig.xml
name=core1
schema=schema.xml
foo=bar
{code}
(I also tried property.foo=bar)

With that, I get the following error when reloading the core:
{code}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
system property or default value specified for foo value:${foo}
{code}
I can successfully reload the core if I use $\{foo:undefined\} but the value of 
foo will always be undefined then.

When trying to create a new core with a URL like this:
{code}
http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
{code}
the property foo will not appear in the core.properties file.

Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

  was:
When using old-style solr.xml I can define custom properties per core like so:
{code:xml}
<cores adminPath="/admin/cores" defaultCoreName="core1">
  <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
    <property name="foo" value="bar" />
  </core>
</cores>
{code}
I can then use the property foo in schema.xml or solrconfig.xml like this:
{code:xml}
<str name="foo">${foo}</str>
{code}

After switching to the new-style solr.xml with separate core.properties files 
per core this does not work anymore.

I guess the corresponding core.properties file should look like this:
{code}
config=solrconfig.xml
name=core1
schema=schema.xml
foo=bar
{code}
(I also tried property.foo=bar)

With that, I get the following error when reloading the core:
{code}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
system property or default value specified for foo value:${foo}
{code}
I can successfully reload the core if I use $\{foo:undefined\} but the value of 
foo will always be undefined then.

When trying to create a new core with a URL like this:
{code}
http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
{code}
the property foo will not appear in the core.properties file.

Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?


 Custom per core properties not persisted on API CREATE with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Critical
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 This part has been solved. See comments
 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with a URL like this:
 {code}
 http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
 {code}
 the property foo will not appear in the core.properties file.
 Possibly related to 
 [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

--
This message is automatically generated by JIRA.

[jira] [Updated] (SOLR-5247) Custom per core properties not persisted on API CREATE with new-style solr.xml

2013-09-20 Thread Chris F (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris F updated SOLR-5247:
--

Summary: Custom per core properties not persisted on API CREATE with 
new-style solr.xml  (was: Support for custom per core properties missing with 
new-style solr.xml)

 Custom per core properties not persisted on API CREATE with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Critical
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with a URL like this:
 {code}
 http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
 {code}
 the property foo will not appear in the core.properties file.
 Possibly related to 
 [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml

2013-09-20 Thread Chris F (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris F updated SOLR-5247:
--

Summary: Support for custom per core properties missing with new-style 
solr.xml  (was: Custom per core properties not persisted on API CREATE with 
new-style solr.xml)

 Support for custom per core properties missing with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Critical
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 This part has been solved. See comments
 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with a URL like this:
 {code}
 http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
 {code}
 the property foo will not appear in the core.properties file.
 Possibly related to 
 [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml

2013-09-20 Thread Chris F (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris F updated SOLR-5247:
--

   Priority: Trivial  (was: Critical)
Description: 
This part has been solved. See comments

When using old-style solr.xml I can define custom properties per core like so:
{code:xml}
<cores adminPath="/admin/cores" defaultCoreName="core1">
  <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
    <property name="foo" value="bar" />
  </core>
</cores>
{code}
I can then use the property foo in schema.xml or solrconfig.xml like this:
{code:xml}
<str name="foo">${foo}</str>
{code}

After switching to the new-style solr.xml with separate core.properties files 
per core this does not work anymore.

I guess the corresponding core.properties file should look like this:
{code}
config=solrconfig.xml
name=core1
schema=schema.xml
foo=bar
{code}
(I also tried property.foo=bar)

With that, I get the following error when reloading the core:
{code}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
system property or default value specified for foo value:${foo}
{code}
I can successfully reload the core if I use $\{foo:undefined\} but the value of 
foo will always be undefined then.

When trying to create a new core with a URL like this:
{code}
http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
{code}
the property foo will not appear in core.properties. However, I can use it in 
schema.xml, but only until the servlet container is restarted; after that, the 
property is lost.

Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

  was:
This part has been solved. See comments

When using old-style solr.xml I can define custom properties per core like so:
{code:xml}
<cores adminPath="/admin/cores" defaultCoreName="core1">
  <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
    <property name="foo" value="bar" />
  </core>
</cores>
{code}
I can then use the property foo in schema.xml or solrconfig.xml like this:
{code:xml}
<str name="foo">${foo}</str>
{code}

After switching to the new-style solr.xml with separate core.properties files 
per core this does not work anymore.

I guess the corresponding core.properties file should look like this:
{code}
config=solrconfig.xml
name=core1
schema=schema.xml
foo=bar
{code}
(I also tried property.foo=bar)

With that, I get the following error when reloading the core:
{code}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
system property or default value specified for foo value:${foo}
{code}
I can successfully reload the core if I use $\{foo:undefined\} but the value of 
foo will always be undefined then.

When trying to create a new core with a URL like this:
{code}
http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
{code}
the property foo will not appear in the core.properties file.

Possibly related to [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?


 Support for custom per core properties missing with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Trivial
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 This part has been solved. See comments
 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with a URL like this:
 {code}
 

Re: [VOTE] Release Lucene/Solr 4.5.0 RC1

2013-09-20 Thread Adrien Grand
Hi Chris,

On Fri, Sep 20, 2013 at 2:33 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 I *think* this means that we just need to backport r1522884 to the 4_5
 branch, but i don't think we need a re-spin.

Thanks for reporting this error. I agree this doesn't need a respin,
especially given that the fix is to ignore the javadoc bug on the
checker side. I'll backport the commit to lucene_solr_4_5.

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 4.5.0 RC1

2013-09-20 Thread Adrien Grand
On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote:
 I'll backport the commit to lucene_solr_4_5.

Oh, I see you have already done that, thanks!

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5109) EliasFano value index

2013-09-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772858#comment-13772858
 ] 

Adrien Grand commented on LUCENE-5109:
--

Thanks for the update, this looks interesting!

bq. use this to implement EliasFanoValueIndexedDocIdSet, test, maybe benchmark

This can be useful to test the overhead of the index compared to 
EliasFanoDocIdSet, but given that we are probably going to want an index almost 
every time, maybe we could just make EliasFanoDocIdSet use an index by default, 
potentially giving the ability to disable indexing by passing 
indexInterval=Integer.MAX_VALUE (like the other sets).

bq. add broadword bit selection

I'm looking forward to it!

 EliasFano value index
 -

 Key: LUCENE-5109
 URL: https://issues.apache.org/jira/browse/LUCENE-5109
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Paul Elschot
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5109.patch, LUCENE-5109.patch


 Index upper bits of Elias-Fano sequence.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772862#comment-13772862
 ] 

Han Jiang commented on LUCENE-5123:
---

Nice change! Although PushFieldsConsumer is still using the old API, I like the 
migration of the flush() logic from FreqProxTermsWriterPerField to 
PushFieldsConsumer; the call chain at the codec level is clearer now. :)

Also, I'm quite curious whether StoredFields and TermVectors will get rid of 
'merge()' later.
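
For anyone skimming the thread, a rough sketch of the pull-style shape being discussed (the class name and signature here are hypothetical, not the committed API):
{code}
import java.io.Closeable;
import java.io.IOException;

import org.apache.lucene.index.Fields;

// Hypothetical pull-style consumer: the codec iterates the given Fields itself
// (terms, then postings), possibly in multiple passes, instead of having
// postings pushed through per-term/per-doc callbacks at flush time.
public abstract class PullFieldsConsumer implements Closeable {
  public abstract void write(Fields fields) throws IOException;
}
{code}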


 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Can we use TREC data set in open source?

2013-09-20 Thread Han Jiang
 I read here http://lemurproject.org/clueweb09/ that there is a hosted
 version of ClueWeb09 (the latest is ClueWeb12, for which I don't find a
 hosted version), and to get access to it, someone from the ASF will need
 to sign an Organizational Agreement with them as well as each individual
 in the project will need to sign an Individual Agreement (retained by the
 ASF). Perhaps this can be available only to committers.

This is nice! I'll try to ask ASF about this.

 To this day, I think the only way it will happen is for the community
 to build a completely open system, perhaps based off of Common Crawl or
 our own crawl and host it ourselves and develop judgments, etc.

Yeah, this is what we need in ORP.

 Most people like the idea, but are not sure how to distribute it in an
 open way (ClueWeb comes as 4 1TB disks right now) and I am also not sure
 how they would handle any copyright/redaction claims against it.  There
 is, of course, little incentive for those involved to solve these, either,
 as most people who are interested sign the form and pay the $600 for the
 disks.

Sigh, yes, it is hard to make a data set totally public. Actually, one of
my purposes in asking this question is to see whether it is acceptable in our
community (i.e. Lucene/Solr only) to obtain a data set that is not open to all
people. When expanded to a larger scope, the licensing issue is somewhat
hairy...


And since Shai has found a possible 'free' data set, I think it is possible
for the ASF to obtain an Organizational Agreement for this. I'll try to contact
the ASF & CMU about how they define "person with the authority" in OSS.


On Tue, Sep 17, 2013 at 6:11 AM, Grant Ingersoll gsing...@apache.orgwrote:

 Inline below

 On Sep 9, 2013, at 10:53 PM, Han Jiang jiangha...@gmail.com wrote:

 Back in 2007 Grant contacted with NIST about making TREC collection
 available to our community:

 http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser

 I think a try for this is really important to our project and people who
 use Lucene. All these years the speed performance is mainly tuned on
 Wikipedia, however it's not very 'standard':

 * it doesn't represent how real-world search works;
 * it cannot be used to evaluate the relevance of our scoring models;
 * researchers tend to do experiments on other data sets, and usually it is
   hard to know whether Lucene performs its best performance;

 And personally I agree with this line:

  I think it would encourage Lucene users/developers to think about
  relevance as much as we think about speed.

 There's been much work to make Lucene's scoring models pluggable in 4.0,
 and it'll be great if we can explore more about it. It is very appealing
 to
 see a high-performance library work along with state-of-the-art ranking
 methods.


 And about TREC data set, the problems we met are:

 1. NIST/TREC does not own the original collections, therefore it might be
necessary to have direct contact with those organizations who really
 did,
such as:

http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
http://lemurproject.org/clueweb12/

 2. Currently, there is no open-source license for any of the data sets, so
it won't be as 'open' as Wikipedia is.

As is proposed by Grant, a possibility is to make the data set
 accessible
only to committers instead of all users. It is not very open-source
 then,
but TREC data sets is public and usually available to researchers, so
people can still reproduce performance test.

 I'm quite curious, has anyone explored getting an open-source license for
 one of those data sets? And is our community still interested about this
 issue after all these years?


 It continues to be of interest to me.  I've had various conversations
 throughout the years on it.  Most people like the idea, but are not sure
 how to distribute it in an open way (ClueWeb comes as 4 1TB disks right
 now) and I am also not sure how they would handle any copyright/redaction
 claims against it.  There is, of course, little incentive for those
 involved to solve these, either, as most people who are interested sign the
 form and pay the $600 for the disks.  I've had a number of conversations
 about how I view this to be a significant barrier to open research, esp. in
 under-served countries and to open source.  People sympathize with me, but
 then move on.

 To this day, I think the only way it will happen is for the community to
 build a completely open system, perhaps based off of Common Crawl or our
 own crawl and host it ourselves and develop judgments, etc.  We tried to
 get this off the ground w/ the Open Relevance Project, but there was never
 a sustainable effort, and thus I have little hope at this point for it (but
 I would love to be proven wrong)  For it to succeed, I think we would need
 the backing of a University with students interested in curating such a
 collection, the judgments, etc.  I think we could figure out how to
 distribute the data either 

[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation

2013-09-20 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5215:
---

Attachment: LUCENE-5215.patch

Fixed BasePostingsFormatTestCase to initialize Lucene46Codec (not 45). It was 
the last piece of code which still used the now deprecated Lucene45. All Lucene 
and Solr tests pass, so I think this is ready.

BTW, I noticed that TestBackCompat suppresses Lucene41 and Lucene42. I ran it 
with -Dtests.codec=Lucene45 and it passed, so I'm not sure if I should add the 
now deprecated Lucene45Codec to the suppress list?

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch


 In LUCENE-5189 we've identified few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772939#comment-13772939
 ] 

Michael McCandless commented on LUCENE-5123:


Thanks Han, I do like the new API better ...

I don't think we need go get rid of merge() for stored fields / term vectors, 
at least not yet ...

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
Hi

In an attempt to understand how to do document-level boosting (following
this thread
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%3c51221bbf.8040...@fastmail.fm%3E),
I experimented with the 3 easiest ways that currently exist in Lucene (that
I'm aware of, maybe there are more): two of them use CustomScoreQuery and
the third uses the new Expression module.

I created a simple index with two documents with the field "f" and value
"test doc" (for both). I also added the field "boost" with values 1L
(doc-0) and 2L (doc-1). I then searched using each method and got different
results w.r.t. computed scores:

*CustomScoreProvider*
As far as I understand, you should override
CustomScoreQuery.getCustomScoreProvider if you want to apply a different
function than score*boost (e.g score^boost) to the documents. Nevertheless,
nothing prevents you from giving a CustomScoreProvider which reads from the
'boost' field and does the multiplication (since it receives the
AtomicReaderContext). I wrote one and the result scores are:

search CustomScoreProvider
doc=1, score=0.74316853
doc=0, score=0.37158427
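
(Roughly, a sketch of that kind of provider -- not the exact code; baseQuery here
stands in for the actual query, and "boost" is the NumericDocValues field from above:)

CustomScoreQuery q = new CustomScoreQuery(baseQuery) {
  @Override
  protected CustomScoreProvider getCustomScoreProvider(AtomicReaderContext context) throws IOException {
    final NumericDocValues boosts = context.reader().getNumericDocValues("boost");
    return new CustomScoreProvider(context) {
      @Override
      public float customScore(int doc, float subQueryScore, float valSrcScore) {
        // ignore valSrcScore; multiply the subquery score by the per-document boost
        return subQueryScore * (boosts == null ? 1L : boosts.get(doc));
      }
    };
  }
};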

*FunctionQuery*
I wasn't able to find a ValueSource which reads from an NDV field, so I
wrote a NumericDocValuesFieldSource which returns a LongValues that reads
from the NumericDocValues (if there isn't indeed one, I can open an issue
to add it). The result scores are:

search NumericDocValuesFieldSource
doc=1, score=0.32644913
doc=0, score=0.16322456
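
(Roughly along these lines -- a sketch, not the exact code, reading longs straight
from the NumericDocValues:)

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.LongDocValues;

public class NumericDocValuesFieldSource extends ValueSource {
  private final String field;

  public NumericDocValuesFieldSource(String field) {
    this.field = field;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    final NumericDocValues values = readerContext.reader().getNumericDocValues(field);
    return new LongDocValues(this) {
      @Override
      public long longVal(int doc) {
        // documents without a value simply get 0
        return values == null ? 0L : values.get(doc);
      }
    };
  }

  @Override
  public String description() {
    return "ndv(" + field + ")";
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof NumericDocValuesFieldSource
        && field.equals(((NumericDocValuesFieldSource) o).field);
  }

  @Override
  public int hashCode() {
    return field.hashCode();
  }
}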

*Expression*
I tried the new module, following TestDemoExpression and compiled the
expression using this code:

Expression expr = JavascriptCompiler.compile("_score * boost");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("boost", SortField.Type.LONG));
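
and then used the expression to drive the sort, roughly like this (a sketch --
searcher and query are the ones from the test):

Sort sort = new Sort(expr.getSortField(bindings, true)); // reverse=true: highest expression value first
TopFieldDocs hits = searcher.search(query, 10, sort);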

The result scores are:

search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791

As you can see, both CustomScoreProvider and Expression methods return same
scores for the docs, while the FunctionQuery method returns different
scores. The reason is that when using FunctionQuery, the scores of the
ValueSources are multiplied by queryWeight, which seems correct to me.

Expression is more about sorting than scoring as far as I understand (for
instance, the result FieldDocs.score is NaN), so I'm ok with it not
factoring in queryWeight (maybe we could implement such expression?). What
I like about it is that I didn't have to implement anything (e.g.
NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you
care about is the order of results, it gets the job done.

So between FunctionQuery and CustomScoreProvider, which is the correct way
to boost a document by an NDV field? I think FunctionQuery?

Separately, I think we can improve CSQ.getCSProvider jdocs. They say: The
default implementation returns a default implementation as specified in the
docs of CustomScoreProvider but the jdocs of CSP don't mention it
multiplies.

Shai


Re: [VOTE] Release Lucene/Solr 4.5.0 RC1

2013-09-20 Thread Shai Erera
+1, smoke tester is happy for me (Windows 7, 64-bit).

Shai


On Fri, Sep 20, 2013 at 10:26 AM, Adrien Grand jpou...@gmail.com wrote:

 On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote:
  I'll backport the commit to lucene_solr_4_5.

 Oh, I see you have already done that, thanks!

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
I think that I actually abused CSProvider, and it's not supposed to be used
that way. It really is meant for when you want to apply a different combination
of the two scores. While nothing prevents you from reading the scores from a
different source, it's better to implement that capability through a custom
ValueSource. So maybe we should put such a note in the CSProvider jdocs...

Shai


On Fri, Sep 20, 2013 at 3:01 PM, Shai Erera ser...@gmail.com wrote:

 Hi

 In an attempt to understand how to do document-level boosting (following
 this thread
 http://mail-archives.apache.org/mod_mbox/lucene-java-user/201302.mbox/%3c51221bbf.8040...@fastmail.fm%3E),
 I experimented with the 3 easiest ways that currently exist in Lucene
 (that I'm aware of, maybe there are more): two of them use CustomScoreQuery
 and the third uses the new Expression module.

 I created a simple index with two documents with the field f and value
 test doc (for both). I also added the field boost with values 1L
 (doc-0) and 2L (doc-1). I then searched using each method and got different
 results w.r.t. computed scores:

 *CustomScoreProvider
 *
 As far as I understand, you should override
 CustomScoreQuery.getCustomScoreProvider if you want to apply a different
 function than score*boost (e.g score^boost) to the documents.
 Nevertheless, nothing prevents you from giving a CustomScoreProvider which
 reads from the 'boost' field and does the multiplication (since it receives
 the AtomicReaderContext). I wrote one and the result scores are:

 search CustomScoreProvider
 doc=1, score=0.74316853
 doc=0, score=0.37158427

 *FunctionQuery
 *
 I wasn't able to find a ValueSource which reads from an NDV field, so I
 wrote a NumericDocValuesFieldSource which returns a LongValues that reads
 from the NumericDocValues (if there isn't indeed one, I can open an issue
 to add it). The result scores are:

 search NumericDocValuesFieldSource
 doc=1, score=0.32644913
 doc=0, score=0.16322456

 *Expression
 *
 I tried the new module, following TestDemoExpression and compiled the
 expression using this code:

 Expression expr = JavascriptCompiler.compile("_score * boost");
 SimpleBindings bindings = new SimpleBindings();
 bindings.add(new SortField("_score", SortField.Type.SCORE));
 bindings.add(new SortField("boost", SortField.Type.LONG));

 The result scores are:

 search Expression
 doc=1, score=NaN, field=0.7431685328483582
 doc=0, score=NaN, field=0.3715842664241791

 As you can see, both CustomScoreProvider and Expression methods return
 same scores for the docs, while the FunctionQuery method returns different
 scores. The reason is that when using FunctionQuery, the scores of the
 ValueSources are multiplied by queryWeight, which seems correct to me.

 Expression is more about sorting than scoring as far as I understand (for
 instance, the result FieldDocs.score is NaN), so I'm ok with it not
 factoring in queryWeight (maybe we could implement such expression?). What
 I like about it is that I didn't have to implement anything (e.g.
 NumericDocValuesFieldSource or CSProvider) - it just worked. And if all you
 care about is the order of results, it gets the job done.

 So between FunctionQuery and CustomScoreProvider, which is the correct way
 to boost a document by an NDV field? I think FunctionQuery?

 Separately, I think we can improve CSQ.getCSProvider jdocs. They say: The
 default implementation returns a default implementation as specified in
 the docs of CustomScoreProvider but the jdocs of CSP don't mention it
 multiplies.

 Shai



Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?

2013-09-20 Thread Cassandra Targett
I notice that Noble updated the Collections API page with the
information that was needed - thank you.

Based on that, I updated this page:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Yonik or Noble, if one of you would look over the section on Document
Routing, I would appreciate it. I adapted the content that was
there to fit these new options, but am not entirely sure I have it
right.

Thanks,
Cassandra

On Thu, Sep 19, 2013 at 12:41 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 Yonik / Noble / Shalin in particular:

 we need clarification here on these changes for 4.5...

 https://issues.apache.org/jira/browse/SOLR-4221?focusedCommentId=13769675page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769675


 Cassandra and I were talking on IRC this morning about the state of the ref
 guide -- our opinion is that in terms of changes for 4.5, things look pretty
 good and we could probably go ahead and do an RC in parallel with the code
 RC1 that Adrien is currently re-spinning (which might even allow us to
 release/announce the ref guide in the same email as the code release itself)

 But the one blocker is this change discussed at the end of SOLR-4221
 regarding the routeField param.

 Noble previously updated the ref guide documentation to include
 routerField...

 https://cwiki.apache.org/confluence/display/solr/Collections+API

 ...but it's not currently clear to Cassandra or myself if that documentation
 is still accurate -- should the references to routeField be replaced by
 router.field?  Does the documentation need to be generally improved to
 refer to supporting a generic set of router.* params that are user
 defined?

 Throw us a bone here, guys.  Docs on new features are probably the most
 important part of the user guide updates, and inaccurate docs on new
 features are worse than no docs at all.



 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 386 - Failure

2013-09-20 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/386/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestNumericDocValuesUpdates.testStressMultiThreading

Error Message:
Captured an uncaught exception in thread: Thread[id=3130, name=UpdateThread-1, 
state=RUNNABLE, group=TGRP-TestNumericDocValuesUpdates]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=3130, name=UpdateThread-1, state=RUNNABLE, 
group=TGRP-TestNumericDocValuesUpdates]
Caused by: java.lang.OutOfMemoryError: Java heap space
at __randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0)
at java.util.Arrays.copyOf(Arrays.java:2367)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at java.util.AbstractCollection.toString(AbstractCollection.java:450)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at 
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4239)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2834)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897)
at 
org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957)




Build Log:
[...truncated 1672 lines...]
   [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates
   [junit4]   2 9 20, 0025 3:45:48 ?? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
   [junit4]   2 WARNING: Uncaught exception in thread: 
Thread[UpdateThread-1,5,TGRP-TestNumericDocValuesUpdates]
   [junit4]   2 java.lang.OutOfMemoryError: Java heap space
   [junit4]   2at 
__randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0)
   [junit4]   2at java.util.Arrays.copyOf(Arrays.java:2367)
   [junit4]   2at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
   [junit4]   2at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
   [junit4]   2at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
   [junit4]   2at 
java.lang.StringBuilder.append(StringBuilder.java:132)
   [junit4]   2at 
java.lang.StringBuilder.append(StringBuilder.java:128)
   [junit4]   2at 
java.util.AbstractCollection.toString(AbstractCollection.java:450)
   [junit4]   2at java.lang.String.valueOf(String.java:2854)
   [junit4]   2at 
java.lang.StringBuilder.append(StringBuilder.java:128)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4239)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2834)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897)
   [junit4]   2at 
org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957)
   [junit4]   2 
   [junit4]   2 9 20, 0025 3:47:09 ?? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
   [junit4]   2 WARNING: Uncaught exception in thread: 
Thread[UpdateThread-3,5,TGRP-TestNumericDocValuesUpdates]
   [junit4]   2 java.lang.IllegalStateException: this writer hit an 
OutOfMemoryError; cannot commit
   [junit4]   2at 
__randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2750)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2922)
   [junit4]   2at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2897)
   [junit4]   2at 
org.apache.lucene.index.TestNumericDocValuesUpdates$2.run(TestNumericDocValuesUpdates.java:957)
   [junit4]   2 
   [junit4]   2 9 20, 0025 3:47:32 ?? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
   [junit4]   2 WARNING: Uncaught exception in thread: 
Thread[UpdateThread-8,5,TGRP-TestNumericDocValuesUpdates]
   [junit4]   2 java.lang.IllegalStateException: this writer hit an 
OutOfMemoryError; cannot commit
   [junit4]   2at 
__randomizedtesting.SeedInfo.seed([EA6A02F6820CB4B7]:0)
   [junit4]   2at 

Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Robert Muir
On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:

 Expression
 I tried the new module, following TestDemoExpression and compiled the
 expression using this code:

 Expression expr = JavascriptCompiler.compile("_score * boost");
 SimpleBindings bindings = new SimpleBindings();
 bindings.add(new SortField("_score", SortField.Type.SCORE));
 bindings.add(new SortField("boost", SortField.Type.LONG));

 The result scores are:

 search Expression
 doc=1, score=NaN, field=0.7431685328483582
 doc=0, score=NaN, field=0.3715842664241791

 As you can see, both CustomScoreProvider and Expression methods return same
 scores for the docs, while the FunctionQuery method returns different
 scores. The reason is that when using FunctionQuery, the scores of the
 ValueSources are multiplied by queryWeight, which seems correct to me.

 Expression is more about sorting than scoring as far as I understand (for
 instance, the result FieldDocs.score is NaN)

Why does that come as a surprise to you?  Pass true to indexsearcher
to get the documents score back here.
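
i.e. roughly (a sketch):

TopFieldDocs td = searcher.search(query, 10, sort, true /* doDocScores */, false /* doMaxScore */);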

=== Release 2.9.0 2009-09-23 ===

Changes in backwards compatibility policy

LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
computes a document score for each hit by default.
... (Shai Erera via Mike McCandless)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?

2013-09-20 Thread Yonik Seeley
OK, I was just reviewing some of the router code changes (better late
than never...)
ImplicitDocIdRouter has this:
  if (shard == null) shard = params.get("_shard_"); // deprecated, for back compat
Also, it looks like route.field can be specified for the compositeId
router as well.
I'll update that page.
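
For the docs page, probably an example along these lines for the implicit router
(collection/shard/field names purely illustrative, assuming the renamed router.* params):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&router.name=implicit&shards=shard1,shard2&router.field=shard_s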

-Yonik
http://lucidworks.com


On Fri, Sep 20, 2013 at 8:41 AM, Cassandra Targett
casstarg...@gmail.com wrote:
 I notice that Noble updated the Collections API page with the
 information that was needed - thank you.

 Based on that, I updated this page:
 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

 Yonik or Noble, if you one of you would look the section on Document
 Routing over, I would appreciate it. I adapted the content that was
 there to fit these new options, but am not entirely sure I have it
 right.

 Thanks,
 Cassandra

 On Thu, Sep 19, 2013 at 12:41 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 Yonik / Noble / Shalin in particular:

 we need clarification here on these changes for 4.5...

 https://issues.apache.org/jira/browse/SOLR-4221?focusedCommentId=13769675page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769675


 Cassandra and i were talking on IRC this morning about the satate of the ref
 guide -- our opinion is that in terms of changes for 4.5, things look pretty
 good and we could probably go ahead and do an RC in parallel ith the code
 RC1 that Adrien is currently re-spinning (which might even allow us to
 release/announce the ref guide in the same email as the code release itself)

 But the one blocker is this change discussed at the end of SOLR-4221
 regarding teh routeField param.

 Noble previously updated the ref guide documentation to include
 routerField...

 https://cwiki.apache.org/confluence/display/solr/Collections+API

 ...but it's not currently clear to cassandra or myself if that documentation
 is still accurate -- should the refrences to routeField be replaced by
 router.field ?  does hte documentation need to generally be improved to
 refer to supporting a generic set of router.* params that are user
 defined?

 throw us a bone here guys.  Docs on new features are probably the most
 important part of the user guide updates, and inaccurate docs on new
 features is worse then no doc at all.



 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5228) IndexWriter.addIndexes copies raw files but acquires no locks

2013-09-20 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5228:
---

 Summary: IndexWriter.addIndexes copies raw files but acquires no 
locks
 Key: LUCENE-5228
 URL: https://issues.apache.org/jira/browse/LUCENE-5228
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


I see stuff like "merge problem with lucene 3 and 4 indices" (from the solr users 
list), and cannot even think how to respond to these users because so many 
things can go wrong with IndexWriter.addIndexes(Directory).

it currently has in its javadocs:

NOTE: the index in each Directory must not be changed (opened by a writer) 
while this method is running. This method does not acquire a write lock in each 
input Directory, so it is up to the caller to enforce this. 

This method should be acquiring locks: it's copying *RAW FILES*. Otherwise we 
should remove it. If someone doesn't like that, or is mad because it's 10ns 
slower, they can use NoLockFactory. 
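
For reference, the usage in question is roughly (a sketch; the path is illustrative):
{code}
Directory source = FSDirectory.open(new File("/path/to/other/index"));
// nothing stops another IndexWriter from modifying `source` while addIndexes copies its raw files
writer.addIndexes(source);
writer.commit();
{code}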


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?

2013-09-20 Thread Yonik Seeley
On Fri, Sep 20, 2013 at 10:15 AM, Yonik Seeley yo...@lucidworks.com wrote:
 Also, it looks like route.field can be specified for the compositeId
 router as well.

Actually, I decided to leave that out of the docs since on further
review the implementation looks incorrect (or perhaps I don't
understand the intended API).
We can doc it in a future release once it's nailed down.

-Yonik
http://lucidworks.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
Yes, you're right, but that's unrelated to this thread. I passed
doScore=true and the scores come out the same, meaning Expression didn't
affect the actual score, only the sort-by value (which is ok).

search Expression
doc=1, score=0.37158427, field=0.7431685328483582
doc=0, score=0.37158427, field=0.3715842664241791

Shai


On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:

 On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:
 
  Expression
  I tried the new module, following TestDemoExpression and compiled the
  expression using this code:
 
  Expression expr = JavascriptCompiler.compile("_score * boost");
  SimpleBindings bindings = new SimpleBindings();
  bindings.add(new SortField("_score", SortField.Type.SCORE));
  bindings.add(new SortField("boost", SortField.Type.LONG));
 
  The result scores are:
 
  search Expression
  doc=1, score=NaN, field=0.7431685328483582
  doc=0, score=NaN, field=0.3715842664241791
 
  As you can see, both CustomScoreProvider and Expression methods return
 same
  scores for the docs, while the FunctionQuery method returns different
  scores. The reason is that when using FunctionQuery, the scores of the
  ValueSources are multiplied by queryWeight, which seems correct to me.
 
  Expression is more about sorting than scoring as far as I understand (for
  instance, the result FieldDocs.score is NaN)

 Why does that come as a surprise to you?  Pass true to indexsearcher
 to get the documents score back here.

 === Release 2.9.0 2009-09-23 ===

 Changes in backwards compatibility policy

 LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
 computes a document score for each hit by default.
 ... (Shai Erera via Mike McCandless)

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773028#comment-13773028
 ] 

Robert Muir commented on LUCENE-5123:
-

The only reason merge() exists there is so they can implement some bulk 
merging optimizations?

Can we remove these optimizations? Has there ever been a benchmark showing they 
help at all?

We shouldn't have such scary code in Lucene because it looks faster. Every 
time I look at infostreams from merge, it's completely dominated by postings and 
other things.

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml

2013-09-20 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-5247:


Assignee: Erick Erickson

 Support for custom per core properties missing with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Assignee: Erick Erickson
Priority: Trivial
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 This part has been solved. See comments
 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with a URL like this:
 {code}
 http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
 {code}
 the property foo will not appear in core.properties. However, I can use it 
 in schema.xml, but only until the servlet container is restarted; after that, 
 the property is lost.
 Possibly related to 
 [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5247) Support for custom per core properties missing with new-style solr.xml

2013-09-20 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773093#comment-13773093
 ] 

Erick Erickson commented on SOLR-5247:
--

[~romseygeek] Do you have any insights re: whether this is still an issue in 
4.5? We've both been in this code recently.

I'll assign it to myself to track it, but I don't have many cycles right now, 
so feel free to grab it if you do.



 Support for custom per core properties missing with new-style solr.xml
 --

 Key: SOLR-5247
 URL: https://issues.apache.org/jira/browse/SOLR-5247
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.4
Reporter: Chris F
Priority: Trivial
  Labels: 4.4, core.properties, discovery, new-style, property, 
 solr.xml

 This part has been solved. See comments
 When using old-style solr.xml I can define custom properties per core like so:
 {code:xml}
 <cores adminPath="/admin/cores" defaultCoreName="core1">
   <core name="core1" instanceDir="core1" config="solrconfig.xml" schema="schema.xml">
     <property name="foo" value="bar" />
   </core>
 </cores>
 {code}
 I can then use the property foo in schema.xml or solrconfig.xml like this:
 {code:xml}
 <str name="foo">${foo}</str>
 {code}
 After switching to the new-style solr.xml with separate core.properties files 
 per core this does not work anymore.
 I guess the corresponding core.properties file should look like this:
 {code}
 config=solrconfig.xml
 name=core1
 schema=schema.xml
 foo=bar
 {code}
 (I also tried property.foo=bar)
 With that, I get the following error when reloading the core:
 {code}
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: No 
 system property or default value specified for foo value:${foo}
 {code}
 I can successfully reload the core if I use $\{foo:undefined\} but the value 
 of foo will always be undefined then.
 When trying to create a new core with an url like this:
 {code}
 http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&property.foo=bar&persist=true
 {code}
 the property foo will not appear in core.properties. However, I can use it 
 in schema.xml. But only until restarting the servlet container. After that, 
 the property is lost.
 Possibly related to 
 [SOLR-5208|https://issues.apache.org/jira/browse/SOLR-5208]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's
important that we have one.

Shai


On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote:

 thats what it does. its more like a computed field. and you can sort
 by more than one of them.

 please see the JIRA issue for a description of the differences between
 function queries.

 On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
  Yes, you're right, but that's unrelated to this thread. I passed
  doScore=true and the scores come out the same, meaning Expression didn't
  affect the actual score, only the sort-by value (which is ok).
 
  search Expression
  doc=1, score=0.37158427, field=0.7431685328483582
  doc=0, score=0.37158427, field=0.3715842664241791
 
  Shai
 
 
  On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:
 
  On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:
  
   Expression
   I tried the new module, following TestDemoExpression and compiled the
   expression using this code:
  
   Expression expr = JavascriptCompiler.compile("_score * boost");
   SimpleBindings bindings = new SimpleBindings();
   bindings.add(new SortField("_score", SortField.Type.SCORE));
   bindings.add(new SortField("boost", SortField.Type.LONG));
  
   The result scores are:
  
   search Expression
   doc=1, score=NaN, field=0.7431685328483582
   doc=0, score=NaN, field=0.3715842664241791
  
   As you can see, both CustomScoreProvider and Expression methods return
   same
   scores for the docs, while the FunctionQuery method returns different
   scores. The reason is that when using FunctionQuery, the scores of the
   ValueSources are multiplied by queryWeight, which seems correct to me.
  
   Expression is more about sorting than scoring as far as I understand
   (for
   instance, the result FieldDocs.score is NaN)
 
  Why does that come as a surprise to you?  Pass true to indexsearcher
  to get the documents score back here.
 
  === Release 2.9.0 2009-09-23 ===
 
  Changes in backwards compatibility policy
 
  LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
  computes a document score for each hit by default.
  ... (Shai Erera via Mike McCandless)
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
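
For reference, a minimal self-contained sketch of the pattern discussed in this
thread: sort by a compiled expression and pass doDocScores=true so ScoreDoc.score
is filled in instead of NaN. The "boost" field name and the already-open searcher
and query are assumptions, not taken from the original test.
{code}
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;

public class ExpressionSortDemo {
  static void sortByExpression(IndexSearcher searcher, Query query) throws Exception {
    Expression expr = JavascriptCompiler.compile("_score * boost");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE));
    bindings.add(new SortField("boost", SortField.Type.LONG));

    // Sort by the expression value (reverse=true for descending) and request
    // real per-hit scores so ScoreDoc.score is not NaN.
    Sort sort = new Sort(expr.getSortField(bindings, true));
    TopFieldDocs hits = searcher.search(query, null, 10, sort,
        /* doDocScores = */ true, /* doMaxScore = */ false);
    for (ScoreDoc sd : hits.scoreDocs) {
      FieldDoc fd = (FieldDoc) sd;
      System.out.println("doc=" + sd.doc + " score=" + sd.score
          + " exprValue=" + fd.fields[0]);
    }
  }
}
{code}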




[jira] [Created] (SOLR-5258) router.field support for compositeId router

2013-09-20 Thread Yonik Seeley (JIRA)
Yonik Seeley created SOLR-5258:
--

 Summary: router.field support for compositeId router
 Key: SOLR-5258
 URL: https://issues.apache.org/jira/browse/SOLR-5258
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Priority: Minor


Although there is code to support router.field for CompositeId, it only 
calculates a simple (non-compound) hash, which isn't that useful unless you 
don't use compound ids (this is why I changed the docs to say router.field is 
only supported for the implicit router).  The field value should either
- be used to calculate the full compound hash
- be used to calculate the prefix bits, and the uniqueKey will still be used 
for the lower bits.

For consistency, I'd suggest the former.
If we want to be able to specify a separate field that is only used for the 
prefix bits, then perhaps that should be router.prefixField
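
A conceptual sketch of what the prefix/lower-bit split means (hypothetical hash()
placeholder; Solr's CompositeIdRouter actually uses MurmurHash3, so this illustrates
the idea only, not the real implementation):
{code}
// The route value fills the upper 16 bits of the compound hash and the uniqueKey
// fills the lower 16 bits, so documents that share a route value stay in one
// contiguous hash range while still spreading out within it.
static int compoundHash(String routeValue, String uniqueKey) {
  int prefixBits = hash(routeValue) & 0xFFFF0000; // upper 16 bits from the route value
  int idBits     = hash(uniqueKey)  & 0x0000FFFF; // lower 16 bits from the unique key
  return prefixBits | idBits;
}

// Placeholder so the sketch is self-contained; not the hash Solr uses.
static int hash(String s) {
  return s.hashCode();
}
{code}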

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Robert Muir
That's what it does. It's more like a computed field, and you can sort
by more than one of them.

Please see the JIRA issue for a description of the differences between
this and function queries.

On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
 Yes, you're right, but that's unrelated to this thread. I passed
 doScore=true and the scores come out the same, meaning Expression didn't
 affect the actual score, only the sort-by value (which is ok).

 search Expression
 doc=1, score=0.37158427, field=0.7431685328483582
 doc=0, score=0.37158427, field=0.3715842664241791

 Shai


 On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:

 On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:
 
  Expression
  I tried the new module, following TestDemoExpression and compiled the
  expression using this code:
 
  Expression expr = JavascriptCompiler.compile(_score * boost);
  SimpleBindings bindings = new SimpleBindings();
  bindings.add(new SortField(_score, SortField.Type.SCORE));
  bindings.add(new SortField(boost, SortField.Type.LONG));
 
  The result scores are:
 
  search Expression
  doc=1, score=NaN, field=0.7431685328483582
  doc=0, score=NaN, field=0.3715842664241791
 
  As you can see, both CustomScoreProvider and Expression methods return
  same
  scores for the docs, while the FunctionQuery method returns different
  scores. The reason is that when using FunctionQuery, the scores of the
  ValueSources are multiplied by queryWeight, which seems correct to me.
 
  Expression is more about sorting than scoring as far as I understand
  (for
  instance, the result FieldDocs.score is NaN)

 Why does that come as a surprise to you?  Pass true to indexsearcher
 to get the documents score back here.

 === Release 2.9.0 2009-09-23 ===

 Changes in backwards compatibility policy

 LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
 computes a document score for each hit by default.
 ... (Shai Erera via Mike McCandless)

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5229) remove Collector specializations

2013-09-20 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5229:
---

 Summary: remove Collector specializations
 Key: LUCENE-5229
 URL: https://issues.apache.org/jira/browse/LUCENE-5229
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir


There are too many collector specializations (I think 16 or 18?) and too many 
crazy defaults like returning NaN scores to the user by default in 
IndexSearcher.

This confuses hotspot (I will ignore any benchmarks posted here where only one 
type of sort is running thru the JVM, that's unrealistic), and confuses users 
with stuff like NaN scores coming back by default.

I have two concrete suggestions:

* nuke doMaxScores. It's implicit from doScores. This is just over the top. This 
should also halve the collectors.
* change doScores to true by default in IndexSearcher. Since Shai was confused 
by the NaNs by default, and he added this stuff to Lucene, that says 
*everything* about how wrong this default is. Someone who *does* understand 
what it does can simply pass false.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: need doc assist for 4.5: clarify SOLR-4221 changes regarding routeField vs router.field ?

2013-09-20 Thread Yonik Seeley
On Fri, Sep 20, 2013 at 10:39 AM, Yonik Seeley yo...@lucidworks.com wrote:
 On Fri, Sep 20, 2013 at 10:15 AM, Yonik Seeley yo...@lucidworks.com wrote:
 Also, it looks like route.field can be specified for the compositeId
 rotuer as well.

 Actually, I decided to leave that out of the docs since on further
 review the implementation looks incorrect (or perhaps I don't
 understand the intended API).
 We can doc it in a future release once it's nailed down.

I opened this issue to deal with router.field in compositeId router
https://issues.apache.org/jira/browse/SOLR-5258

-Yonik
http://lucidworks.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Robert Muir
Why don't you look and see how expressions is doing it?

On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote:
 Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think it's
 important that we have one.

 Shai


 On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote:

 thats what it does. its more like a computed field. and you can sort
 by more than one of them.

 please see the JIRA issue for a description of the differences between
 function queries.

 On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
  Yes, you're right, but that's unrelated to this thread. I passed
  doScore=true and the scores come out the same, meaning Expression didn't
  affect the actual score, only the sort-by value (which is ok).
 
  search Expression
  doc=1, score=0.37158427, field=0.7431685328483582
  doc=0, score=0.37158427, field=0.3715842664241791
 
  Shai
 
 
  On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:
 
  On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com wrote:
  
   Expression
   I tried the new module, following TestDemoExpression and compiled the
   expression using this code:
  
   Expression expr = JavascriptCompiler.compile(_score * boost);
   SimpleBindings bindings = new SimpleBindings();
   bindings.add(new SortField(_score, SortField.Type.SCORE));
   bindings.add(new SortField(boost, SortField.Type.LONG));
  
   The result scores are:
  
   search Expression
   doc=1, score=NaN, field=0.7431685328483582
   doc=0, score=NaN, field=0.3715842664241791
  
   As you can see, both CustomScoreProvider and Expression methods
   return
   same
   scores for the docs, while the FunctionQuery method returns different
   scores. The reason is that when using FunctionQuery, the scores of
   the
   ValueSources are multiplied by queryWeight, which seems correct to
   me.
  
   Expression is more about sorting than scoring as far as I understand
   (for
   instance, the result FieldDocs.score is NaN)
 
  Why does that come as a surprise to you?  Pass true to indexsearcher
  to get the documents score back here.
 
  === Release 2.9.0 2009-09-23
  ===
 
  Changes in backwards compatibility policy
 
  LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
  computes a document score for each hit by default.
  ... (Shai Erera via Mike McCandless)
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5229) remove Collector specializations

2013-09-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773097#comment-13773097
 ] 

Shai Erera commented on LUCENE-5229:


bq. nuke doMaxScores. its implicit from doScores

+1, if you ask to compute scores, you might as well get maxScore. I doubt that 
specialization is so important.

bq. change doScores to true by default in indexsearcher

I'm not sure about it. I wasn't confused by the fact that I received NaN, only 
pointed out that when you use Expression, the result is not in the 'score' 
field, but the 'field' field. I think that in most cases, if you sort, you're 
interested in the sort-by value, not the score. Not sure if it buys performance 
or not, but I think it's just redundant work.

 remove Collector specializations
 

 Key: LUCENE-5229
 URL: https://issues.apache.org/jira/browse/LUCENE-5229
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir

 There are too many collector specializations (i think 16 or 18?) and too many 
 crazy defaults like returning NaN scores to the user by default in 
 indexsearcher.
 this confuses hotspot (I will ignore any benchmarks posted here where only 
 one type of sort is running thru the JVM, thats unrealistic), and confuses 
 users with stuff like NaN scores coming back by default.
 I have two concerete suggestions:
 * nuke doMaxScores. its implicit from doScores. This is just over the top. 
 This should also halve the collectors.
 * change doScores to true by default in indexsearcher. since shai was 
 confused by the NaNs by default, and he added this stuff to lucene, that says 
 *everything* about how wrong this default is. Someone who *does* understand 
 what it does can simply pass false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
What do Expressions have to do here? Do they replace CustomScoreQuery?
Maybe they should, I don't know. But today, if you want to use CSQ to
boost a document by an NDV field, you need to write a ValueSource which
reads from the field. And that's the object that I don't see.

Maybe you want to say that Expressions will eventually replace CSQ, and so
it's moot to add a NumericDVFieldSource to Lucene? Or we want to document
on CSQ that you should really consider using Expressions?

Shai


On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote:

 why dont you look and see how expressions is doing it?

 On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote:
  Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think
 it's
  important that we have one.
 
  Shai
 
 
  On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote:
 
  thats what it does. its more like a computed field. and you can sort
  by more than one of them.
 
  please see the JIRA issue for a description of the differences between
  function queries.
 
  On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
   Yes, you're right, but that's unrelated to this thread. I passed
   doScore=true and the scores come out the same, meaning Expression
 didn't
   affect the actual score, only the sort-by value (which is ok).
  
   search Expression
   doc=1, score=0.37158427, field=0.7431685328483582
   doc=0, score=0.37158427, field=0.3715842664241791
  
   Shai
  
  
   On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com
 wrote:
  
   On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com
 wrote:
   
Expression
I tried the new module, following TestDemoExpression and compiled
 the
expression using this code:
   
Expression expr = JavascriptCompiler.compile(_score * boost);
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField(_score, SortField.Type.SCORE));
bindings.add(new SortField(boost, SortField.Type.LONG));
   
The result scores are:
   
search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791
   
As you can see, both CustomScoreProvider and Expression methods
return
same
scores for the docs, while the FunctionQuery method returns
 different
scores. The reason is that when using FunctionQuery, the scores of
the
ValueSources are multiplied by queryWeight, which seems correct to
me.
   
Expression is more about sorting than scoring as far as I
 understand
(for
instance, the result FieldDocs.score is NaN)
  
   Why does that come as a surprise to you?  Pass true to indexsearcher
   to get the documents score back here.
  
   === Release 2.9.0 2009-09-23
   ===
  
   Changes in backwards compatibility policy
  
   LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
   computes a document score for each hit by default.
   ... (Shai Erera via Mike McCandless)
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
   For additional commands, e-mail: dev-h...@lucene.apache.org
  
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
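
For reference, a minimal sketch of the kind of ValueSource being asked about here.
NumericDVFieldSource is a hypothetical name, not an existing Lucene class, and the
4.x function-query APIs are assumed:
{code}
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.LongDocValues;

public class NumericDVFieldSource extends ValueSource {
  private final String field;

  public NumericDVFieldSource(String field) {
    this.field = field;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    // Per-segment lookup of the NumericDocValues for the configured field.
    final NumericDocValues dv = readerContext.reader().getNumericDocValues(field);
    return new LongDocValues(this) {
      @Override
      public long longVal(int doc) {
        return dv == null ? 0L : dv.get(doc);
      }
    };
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof NumericDVFieldSource && field.equals(((NumericDVFieldSource) o).field);
  }

  @Override
  public int hashCode() {
    return field.hashCode();
  }

  @Override
  public String description() {
    return "numericdv(" + field + ")";
  }
}
{code}
Something like this could be handed to a CustomScoreQuery or FunctionQuery until
(or unless) expressions cover the use case.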




[jira] [Commented] (LUCENE-5229) remove Collector specializations

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773108#comment-13773108
 ] 

Robert Muir commented on LUCENE-5229:
-

{quote}
I wasn't confused by the fact that I received NaN, only pointed out that when 
you use Expression, the result is not in the 'score' field, but the 'field' 
field.
{quote}

You invoked IndexSearcher.search(query, filter, n, *Sort*) and you were 
surprised that the result of the sort goes there?

I think this kind of stuff only reinforces my argument that this stuff 
is way too specialized and complicated.


 remove Collector specializations
 

 Key: LUCENE-5229
 URL: https://issues.apache.org/jira/browse/LUCENE-5229
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir

 There are too many collector specializations (i think 16 or 18?) and too many 
 crazy defaults like returning NaN scores to the user by default in 
 indexsearcher.
 this confuses hotspot (I will ignore any benchmarks posted here where only 
 one type of sort is running thru the JVM, thats unrealistic), and confuses 
 users with stuff like NaN scores coming back by default.
 I have two concerete suggestions:
 * nuke doMaxScores. its implicit from doScores. This is just over the top. 
 This should also halve the collectors.
 * change doScores to true by default in indexsearcher. since shai was 
 confused by the NaNs by default, and he added this stuff to lucene, that says 
 *everything* about how wrong this default is. Someone who *does* understand 
 what it does can simply pass false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Robert Muir
You asked about how to access a NumericDocValues field from a
ValueSource in Lucene.

Yet you showed an example where you did just this with expressions, so
I'm just recommending you look at the expressions/ source code (they use
ValueSource under the hood) to see how it's done!

On Fri, Sep 20, 2013 at 11:49 AM, Shai Erera ser...@gmail.com wrote:
 What do Expressions have to do here? Do they replace CustomScoreQuery? Maybe
 they should, I don't know. But today, if you want to use CSQ, to boost a
 document by an NDV field, you need to write a ValueSource which reads from
 the field. And that's the object that I don't see.

 Maybe you want to say that Expressions will eventually replace CSQ, and so
 it's moot to add a NumericDVFieldSource to Lucene? Or we want to document on
 CSQ that you should really consider using Expressions?

 Shai


 On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote:

 why dont you look and see how expressions is doing it?

 On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote:
  Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think
  it's
  important that we have one.
 
  Shai
 
 
  On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com wrote:
 
  thats what it does. its more like a computed field. and you can sort
  by more than one of them.
 
  please see the JIRA issue for a description of the differences between
  function queries.
 
  On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com wrote:
   Yes, you're right, but that's unrelated to this thread. I passed
   doScore=true and the scores come out the same, meaning Expression
   didn't
   affect the actual score, only the sort-by value (which is ok).
  
   search Expression
   doc=1, score=0.37158427, field=0.7431685328483582
   doc=0, score=0.37158427, field=0.3715842664241791
  
   Shai
  
  
   On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com
   wrote:
  
   On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com
   wrote:
   
Expression
I tried the new module, following TestDemoExpression and compiled
the
expression using this code:
   
Expression expr = JavascriptCompiler.compile(_score *
boost);
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField(_score, SortField.Type.SCORE));
bindings.add(new SortField(boost, SortField.Type.LONG));
   
The result scores are:
   
search Expression
doc=1, score=NaN, field=0.7431685328483582
doc=0, score=NaN, field=0.3715842664241791
   
As you can see, both CustomScoreProvider and Expression methods
return
same
scores for the docs, while the FunctionQuery method returns
different
scores. The reason is that when using FunctionQuery, the scores of
the
ValueSources are multiplied by queryWeight, which seems correct to
me.
   
Expression is more about sorting than scoring as far as I
understand
(for
instance, the result FieldDocs.score is NaN)
  
   Why does that come as a surprise to you?  Pass true to indexsearcher
   to get the documents score back here.
  
   === Release 2.9.0 2009-09-23
   ===
  
   Changes in backwards compatibility policy
  
   LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no longer
   computes a document score for each hit by default.
   ... (Shai Erera via Mike McCandless)
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
   For additional commands, e-mail: dev-h...@lucene.apache.org
  
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5230) CJKAnalyzer can't split ;

2013-09-20 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773147#comment-13773147
 ] 

Littlestar commented on LUCENE-5230:


Sorry, I missed reset().
I want to split on ";".

 CJKAnalyzer can't split ;
 ---

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
token.reset(); //here
while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5230) CJKAnalyzer can't split ;

2013-09-20 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated LUCENE-5230:
---

Description: 
@Test
public void test_AlphaNumAnalyzer() throws IOException {
Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
TokenStream token = analyzer.tokenStream(test, new 
StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
while (token.incrementToken()) {
final CharTermAttribute termAtt = 
token.addAttribute(CharTermAttribute.class);

System.out.println(termAtt.toString());
}
analyzer.close();
}

  was:
@Test
public void test_AlphaNumAnalyzer() throws IOException {
Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
TokenStream token = analyzer.tokenStream(test, new 
StringReader(中国));
//TokenStream token = analyzer.tokenStream(test, new 
StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
while (token.incrementToken()) {
final CharTermAttribute termAtt = 
token.addAttribute(CharTermAttribute.class);

System.out.println(termAtt.toString());
}
analyzer.close();
}



java.lang.NullPointerException
at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
at 
org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:171)
at 
org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
at 
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
at 
org.apache.lucene.analysis.cjk.CJKBigramFilter.doNext(CJKBigramFilter.java:240)
at 
org.apache.lucene.analysis.cjk.CJKBigramFilter.incrementToken(CJKBigramFilter.java:169)
at 
org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:81)



Summary: CJKAnalyzer can't split ;  (was: CJKAnalyzer 
java.lang.NullPointerException)

 CJKAnalyzer can't split ;
 ---

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
 while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException

2013-09-20 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated LUCENE-5230:
---

Summary: CJKAnalyzer java.lang.NullPointerException  (was: CJKAnalyzer 
can't split ;)

Fixed, thanks.

I wanted to split a CJK string on ";" and into CJK bigrams, but it failed.

 CJKAnalyzer java.lang.NullPointerException
 --

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
token.reset(); //here
while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5231) better interoperability of expressions/ with valuesource

2013-09-20 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5231:
---

 Summary: better interoperability of expressions/ with valuesource
 Key: LUCENE-5231
 URL: https://issues.apache.org/jira/browse/LUCENE-5231
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-5231.patch

A few things I noticed while trying to work on e.g. integration of this with 
Solr and just playing around:

* No way for a custom Bindings to currently bind the score, as the necessary 
stuff is package private. This adds a simple protected method to Bindings to 
enable this.
* Expression.getValueSource() cannot in general be used easily by other things 
(e.g. to interoperate with function queries and so on), because it expects you to 
pass it this custom cache. This is an impl detail; it's easy to remove this 
restriction and still compute subs only once.
* If you try to bind the score and don't have the scorer set up, you should get 
a clear exception, not an NPE.
* Each binding is looked up per-segment, which is bad. We should minimize the 
lookups to the constructor only.
* This makes validation considerably simpler and less error-prone, so easy that 
I don't think we need it in the base class either; I moved this to just a 
simple helper method on SimpleBindings. It also found a bug in the equals() 
test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5249) ClassNotFoundException due to white-spaces in solrconfig.xml

2013-09-20 Thread Simon Endele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773140#comment-13773140
 ] 

Simon Endele commented on SOLR-5249:


Wow, thanks for your quick and detailed response!

I'm using Eclipse with default settings, so I thought this might bother some 
more people like me.

Eclipse inserts line-breaks and white-spaces at other places in the 
solrconfig.xml, which are ignored, for example in the defaults-section of a 
request handler:
{code}<str name="hl.fl">content title field1 field2 field3
field4
</str>{code}
Ok, this is maybe a bad example as the field list is parsed.

As far as I know, class names are Java identifiers, which cannot contain any 
white-spaces. This particular code fragment only handles class names and not files, 
doesn't it?
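
For reference, a minimal illustration of the kind of fix in the attached patch.
loadConfiguredClass is a hypothetical helper, not the actual SolrResourceLoader
change; the point is only that the configured value is trimmed before resolving it:
{code}
// Strip the whitespace an auto-formatter may wrap around a configured class name
// before attempting to load it; a valid binary class name never contains whitespace.
static Class<?> loadConfiguredClass(String configuredName, ClassLoader loader)
    throws ClassNotFoundException {
  String cname = configuredName.trim();
  return Class.forName(cname, true, loader);
}
{code}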

 ClassNotFoundException due to white-spaces in solrconfig.xml
 

 Key: SOLR-5249
 URL: https://issues.apache.org/jira/browse/SOLR-5249
 Project: Solr
  Issue Type: Bug
Reporter: Simon Endele
Priority: Minor
 Attachments: SolrResourceLoader.java.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Due to auto-formatting by an text editor/IDE there may be line-breaks after 
 class names in the solrconfig.xml, for example:
 {code:xml}<searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
     </str>
     [...]
   </lst>
 </searchComponent>{code}
 This will raise an exception in SolrResourceLoader as the white-spaces are 
 not stripped from the class name:
 {code}Caused by: org.apache.solr.common.SolrException: Error loading class 
 'org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
   '
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:449)
   at 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:471)
   at 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:467)
   at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:102)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623)
   at 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:830)
   ... 13 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
   
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
   at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:264)
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433)
   ... 19 more{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException

2013-09-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5230.
-

Resolution: Not A Problem

You must call reset() (and also your loop should have end(), etc).
See the javadocs of TokenStream.
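
For reference, a minimal sketch of the documented consume pattern (the analyzer and
field name follow this issue; the input string is just an illustrative placeholder):
{code}
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumeTokenStream {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
    TokenStream ts = analyzer.tokenStream("test", new StringReader("aaa;bbb;ccc"));
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();                  // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        System.out.println(termAtt.toString());
      }
      ts.end();                    // record final offset/position state
    } finally {
      ts.close();                  // release the stream so the analyzer can be reused
      analyzer.close();
    }
  }
}
{code}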



 CJKAnalyzer java.lang.NullPointerException
 --

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(中国));
 //TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
 while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }
 
 java.lang.NullPointerException
   at 
 org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
   at 
 org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
   at 
 org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:171)
   at 
 org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
   at 
 org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
   at 
 org.apache.lucene.analysis.cjk.CJKBigramFilter.doNext(CJKBigramFilter.java:240)
   at 
 org.apache.lucene.analysis.cjk.CJKBigramFilter.incrementToken(CJKBigramFilter.java:169)
   at 
 org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:81)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5249) ClassNotFoundException due to white-spaces in solrconfig.xml

2013-09-20 Thread Simon Endele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773140#comment-13773140
 ] 

Simon Endele edited comment on SOLR-5249 at 9/20/13 4:18 PM:
-

Wow, thanks for your quick and detailed response!

I'm using Eclipse with default settings, so I thought this might bother some 
more people like me.

Eclipse inserts line-breaks and white-spaces at other places in the 
solrconfig.xml, which are ignored, for example in the defaults-section of a 
request handler:
{code}<str name="hl.fl">content title field1 field2 field3
field4
</str>{code}
Ok, this is maybe a bad example as the field list is parsed.

As far as I know, class names are Java identifiers, which cannot contain any 
white-spaces. This particular code fragment only handles class names and not files, 
doesn't it?

  was (Author: simon.endele):
Wow, thanks for your quick and detailed response!

I'm using Eclipse with default settings, so I thought this might bother some 
more people like me.

Eclipse inserts line-breaks and white-spaces at other places in the 
solrconfig.xml, which are ignored, for example in the defaults-section of a 
request handler:
{code}str name=hl.flcontent title field1 field2 field3
field4
/str{code}
Ok, this is maybe a bad example as the field list ist parsed.

As far I know class names are Java identifiers, which cannot contain any 
white-spaces. This certain code fragment only handles class names and no files, 
doesn't it?
  
 ClassNotFoundException due to white-spaces in solrconfig.xml
 

 Key: SOLR-5249
 URL: https://issues.apache.org/jira/browse/SOLR-5249
 Project: Solr
  Issue Type: Bug
Reporter: Simon Endele
Priority: Minor
 Attachments: SolrResourceLoader.java.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Due to auto-formatting by an text editor/IDE there may be line-breaks after 
 class names in the solrconfig.xml, for example:
 {code:xml}searchComponent class=solr.SpellCheckComponent name=suggest
   lst name=spellchecker
   str name=namesuggest/str
   str 
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str 
 name=lookupImplorg.apache.solr.spelling.suggest.fst.WFSTLookupFactory
   /str
   [...]
   /lst
 /searchComponent{code}
 This will raise an exception in SolrResourceLoader as the white-spaces are 
 not stripped from the class name:
 {code}Caused by: org.apache.solr.common.SolrException: Error loading class 
 'org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
   '
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:449)
   at 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:471)
   at 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:467)
   at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:102)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623)
   at 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:830)
   ... 13 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
   
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
   at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:264)
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433)
   ... 19 more{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException

2013-09-20 Thread Littlestar (JIRA)
Littlestar created LUCENE-5230:
--

 Summary: CJKAnalyzer java.lang.NullPointerException
 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor


@Test
public void test_AlphaNumAnalyzer() throws IOException {
Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
TokenStream token = analyzer.tokenStream("test", new 
StringReader("中国"));
//TokenStream token = analyzer.tokenStream("test", new 
StringReader("0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441"));
while (token.incrementToken()) {
final CharTermAttribute termAtt = 
token.addAttribute(CharTermAttribute.class);

System.out.println(termAtt.toString());
}
analyzer.close();
}



java.lang.NullPointerException
at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
at 
org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:171)
at 
org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
at 
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
at 
org.apache.lucene.analysis.cjk.CJKBigramFilter.doNext(CJKBigramFilter.java:240)
at 
org.apache.lucene.analysis.cjk.CJKBigramFilter.incrementToken(CJKBigramFilter.java:169)
at 
org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:81)



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5231) better interoperability of expressions/ with valuesource

2013-09-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5231:


Attachment: LUCENE-5231.patch

 better interoperability of expressions/ with valuesource
 

 Key: LUCENE-5231
 URL: https://issues.apache.org/jira/browse/LUCENE-5231
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-5231.patch


 A few things i noticed, while trying to work on e.g. integration of this with 
 solr and just playing around:
 * No way for a custom Bindings to currently bind the score, as the necessary 
 stuff is package private. This adds a simple protected method to Bindings to 
 enable this.
 * Expression.getValueSource() cannot in general be used easily by other 
 things (e.g. interoperate with function queries and so on), because it 
 expects you pass it this custom cache. This is an impl detail, its easy to 
 remove this restriction and still compute subs only once.
 * if you try to bind the score and don't have the scorer setup, you should 
 get a clear exception: not NPE.
 * Each binding is looked up per-segment, which is bad. we should minimize the 
 lookups to only in the CTOR.
 * This makes validation considerably simpler and less error-prone, so easy 
 that I don't think we need it in the base class either, I moved this to just 
 a simple helper method on SimpleBindings. It also found a bug in the equals() 
 test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5229) remove Collector specializations

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773171#comment-13773171
 ] 

Robert Muir commented on LUCENE-5229:
-

{quote}
nuke doMaxScores. its implicit from doScores

+1, if you ask to compute scores, you might as well get maxScore. I doubt that 
specialization is so important.
{quote}

I will split off a subtask for this since I don't think it's controversial. I at 
least want to make some progress on this. Removing confusing booleans from the 
API of IndexSearcher is also huge to me, and this will take care of one.

 remove Collector specializations
 

 Key: LUCENE-5229
 URL: https://issues.apache.org/jira/browse/LUCENE-5229
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir

 There are too many collector specializations (i think 16 or 18?) and too many 
 crazy defaults like returning NaN scores to the user by default in 
 indexsearcher.
 this confuses hotspot (I will ignore any benchmarks posted here where only 
 one type of sort is running thru the JVM, thats unrealistic), and confuses 
 users with stuff like NaN scores coming back by default.
 I have two concerete suggestions:
 * nuke doMaxScores. its implicit from doScores. This is just over the top. 
 This should also halve the collectors.
 * change doScores to true by default in indexsearcher. since shai was 
 confused by the NaNs by default, and he added this stuff to lucene, that says 
 *everything* about how wrong this default is. Someone who *does* understand 
 what it does can simply pass false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5230) CJKAnalyzer can't split ;

2013-09-20 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated LUCENE-5230:
---

Description: 
@Test
public void test_AlphaNumAnalyzer() throws IOException {
Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
TokenStream token = analyzer.tokenStream("test", new 
StringReader("0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441"));
   token.reset(); //here
   while (token.incrementToken()) {
final CharTermAttribute termAtt = 
token.addAttribute(CharTermAttribute.class);

System.out.println(termAtt.toString());
}
analyzer.close();
}

  was:
@Test
public void test_AlphaNumAnalyzer() throws IOException {
Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
TokenStream token = analyzer.tokenStream(test, new 
StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
while (token.incrementToken()) {
final CharTermAttribute termAtt = 
token.addAttribute(CharTermAttribute.class);

System.out.println(termAtt.toString());
}
analyzer.close();
}


 CJKAnalyzer can't split ;
 ---

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
token.reset(); //here
while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5232:
---

 Summary: Remove doMaxScore from indexsearcher, collector 
specializations, etc
 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


I think we should just compute doMaxScore whenever doDocScores = true.

This would remove 4 collector specializations and remove a boolean parameter 
from 4 indexsearcher methods.

We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5230) CJKAnalyzer java.lang.NullPointerException

2013-09-20 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773202#comment-13773202
 ] 

Erick Erickson commented on LUCENE-5230:


Please bring issues like this up on the user's list before raising a JIRA.

 CJKAnalyzer java.lang.NullPointerException
 --

 Key: LUCENE-5230
 URL: https://issues.apache.org/jira/browse/LUCENE-5230
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.4
Reporter: Littlestar
Priority: Minor

 @Test
 public void test_AlphaNumAnalyzer() throws IOException {
 Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_44);
 TokenStream token = analyzer.tokenStream(test, new 
 StringReader(0009bf2d97e9f86a7188002a64a84b351379323870284;0009bf2e97e9f8707188002a64a84b351379323870273;000ae1f0b4390779eed1002a64a8a7950;0001e87997e9f0017188000a64a84b351378869697875;fff205ce319b68ff1a3c002964a820841377769850018;000ae1f0b439077beed1002a64a8a7950;000ae1f1b439077deed1002a64a8a7950;0009bf2d97e9f86c7188002a64a84b351379323870281;0015adfd0c69d870debb000a64a8477c1378809423441));
token.reset(); //here
while (token.incrementToken()) {
 final CharTermAttribute termAtt = 
 token.addAttribute(CharTermAttribute.class);
 System.out.println(termAtt.toString());
 }
 analyzer.close();
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773461#comment-13773461
 ] 

Michael McCandless commented on LUCENE-5232:


I'm not sure we should do this.

Today, when doMaxScore is false and doScores is true, we only score those hits 
that make it into the PQ, which is typically a very small subset of all hits. 
When an app needs scores, I think it often does not need the maxScore.

Can we somehow remove specialization without losing this functionality?  
Decouple the two ...

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 4.5.0 RC1

2013-09-20 Thread Chris Hostetter

: On Fri, Sep 20, 2013 at 9:20 AM, Adrien Grand jpou...@gmail.com wrote:
:  I'll backport the commit to lucene_solr_4_5.
: 
: Oh, I see you have already done that, thanks!

yeah, sorry -- I meant to follow up and forgot to hit send.


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773489#comment-13773489
 ] 

Robert Muir commented on LUCENE-5232:
-

Such users can pass their own collector.

Seriously, who is using search(Query query, Filter filter, int n, Sort sort, 
boolean doDocScores, boolean doMaxScore) -- i.e. using a sort, and asking for 
scores, but not asking for the maximum score?

This sounds to me like someone's very special use case baked into lucene: I 
think we should remove it.
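
A minimal sketch of the "pass your own collector" alternative, assuming the 4.x TopFieldCollector.create() signature and that searcher, query and sort already exist:

{code:java}
// Hedged sketch: an app that wants per-hit scores but not maxScore builds the
// collector itself instead of going through the IndexSearcher booleans.
TopFieldCollector collector = TopFieldCollector.create(
    sort, 10,
    true,    // fillFields: populate FieldDoc.fields
    true,    // trackDocScores: score each collected hit
    false,   // trackMaxScore: skip max-score tracking
    false);  // docsScoredInOrder
searcher.search(query, collector);
TopDocs hits = collector.topDocs();
{code}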

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773508#comment-13773508
 ] 

Robert Muir commented on LUCENE-5232:
-

You know, if we really want to have this crazy specialization, why not move it 
out to a contrib module, and just have a HuperDuperTopFieldCollector.create() 
method that generates bytecode for the exact number of sort fields, and a 
million boolean parameters passed in?

I just don't think it needs to be in IndexSearcher/core lucene.

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773566#comment-13773566
 ] 

Michael McCandless commented on LUCENE-5215:


Could you make the patch with --show-copies-as-adds?  Thanks!

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch


 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have the same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773567#comment-13773567
 ] 

Michael McCandless commented on LUCENE-5232:


How about never computing maxScore when sorting by field (and removing that 
boolean)?  An app can make a custom collector if they really need that, but I 
suspect it's uncommon.

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[SolrCloud] is there a reason Overseer.STATE_UPDATE_DELAY is set so high?

2013-09-20 Thread Jessica Cheng
Hi,

Overseer.STATE_UPDATE_DELAY seems to be the amount of time the state
updater thread goes to sleep if there's no state update queue items to
process, so that it doesn't hammer zookeeper. Is it necessary to set it
that high (1500ms)?

We're using SolrCloud such that collections are created on the fly, and
1500ms becomes a bottleneck for creation for the entire cluster because the
updater is single-threaded and it goes to sleep for 1500ms every time the
outer while loop runs.

Since there's only one thread trying to monitor the queue, I don't think
zookeeper will mind being hit a little more frequently while the queue
remains empty. If people are in general worried about lowering it, can we
at least make it a property?
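
Something as small as this sketch would do (the property name here is invented, just to illustrate):

// Hypothetical: let the sleep interval be overridden via a system property,
// falling back to the current 1500ms default. The property name is made up.
private static final int STATE_UPDATE_DELAY =
    Integer.getInteger("solr.overseer.stateUpdateDelay", 1500);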

Thanks,
Jessica


Re: [VOTE] Release Lucene/Solr 4.5.0 RC1

2013-09-20 Thread Chris Hostetter

: :  
http://people.apache.org/~jpountz/staging_area/lucene-solr-4.5.0-RC1-rev1524755/

Once I fixed the javadoc linter workaround on the 4_5 branch, I 
found no other problems with RC1 other than LUCENE-5233 -- and I certainly 
don't think LUCENE-5233 is significant enough to warrant a re-spin.


So I vote +1 based on the following SHA1 checksums...

407d517272961cc09b5b2a6dc7f414c033c2a842 *lucene-4.5.0-src.tgz
cb55b9fb36296e233d10b4dd0061af32947f1056 *lucene-4.5.0.tgz
82ed448175508792be960d31de05ea7e2815791e *lucene-4.5.0.zip
6db41833bf6763ec3b704cb343f59b779c16a841 *solr-4.5.0-src.tgz
e9150dd7c1f6046f5879196ea266505613f26506 *solr-4.5.0.tgz
0c7d4bcb5c29f67f2722b1255a5da803772c03a5 *solr-4.5.0.zip



-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 843 - Still Failing!

2013-09-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/843/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9752 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=62BF223C722178E6 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII 
-classpath 

Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 843 - Still Failing!

2013-09-20 Thread Robert Muir
jvm crash

On Fri, Sep 20, 2013 at 7:33 PM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/843/
 Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

 All tests passed

 Build Log:
 [...truncated 9752 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line: 
 /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java 
 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC 
 -XX:+HeapDumpOnOutOfMemoryError 
 -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
 -Dtests.prefix=tests -Dtests.seed=62BF223C722178E6 -Xmx512M -Dtests.iters= 
 -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
 -Dtests.postingsformat=random -Dtests.docvaluesformat=random 
 -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
 -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
 -Dtests.cleanthreads=perClass 
 -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
  -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
 -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
 -Djava.io.tmpdir=. 
 -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
  
 -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
  -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
 -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
  -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
 -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII 
 -classpath 
 

[jira] [Commented] (LUCENE-5231) better interoperability of expressions/ with valuesource

2013-09-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773707#comment-13773707
 ] 

ASF subversion and git services commented on LUCENE-5231:
-

Commit 1525192 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1525192 ]

LUCENE-5231: better interoperability of expressions with valuesource

 better interoperability of expressions/ with valuesource
 

 Key: LUCENE-5231
 URL: https://issues.apache.org/jira/browse/LUCENE-5231
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-5231.patch


 A few things i noticed, while trying to work on e.g. integration of this with 
 solr and just playing around:
 * No way for a custom Bindings to currently bind the score, as the necessary 
 stuff is package private. This adds a simple protected method to Bindings to 
 enable this.
 * Expression.getValueSource() cannot in general be used easily by other 
 things (e.g. interoperate with function queries and so on), because it 
 expects you to pass it this custom cache. This is an impl detail; it's easy to 
 remove this restriction and still compute subs only once.
 * if you try to bind the score and don't have the scorer setup, you should 
 get a clear exception: not NPE.
 * Each binding is looked up per-segment, which is bad. We should minimize the 
 lookups so they happen only in the CTOR.
 * This makes validation considerably simpler and less error-prone, so easy 
 that I don't think we need it in the base class either, I moved this to just 
 a simple helper method on SimpleBindings. It also found a bug in the equals() 
 test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773708#comment-13773708
 ] 

Shai Erera commented on LUCENE-5232:


bq. just have a HuperDuperTopFieldCollector.create() 

We have that, it's called TopFieldCollector.create().

bq. How about never computing maxScore when sorting by field

+1. We can even offer such a Collector.

Maybe what we need is to remove that .search() method from IndexSearcher API, 
document that the sort methods never compute scores and that you should use 
TopFieldCollector.create() if you wish to do that?

As for the specialization, I agree with Mike that we should decouple the two. I 
don't know how costly it is, in a real live system, to have a few extra 'ifs' 
(I don't think luceneutil lets you check that?), but I'm sure that computing a 
score is in most cases redundant work when sorting by a field and therefore 
should be avoided. Perhaps we should remove the specializations in favor of the 
added 'ifs' and let someone write his own Collector if he's worried about perf?
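
A rough sketch of that last suggestion, for anyone following along (the "year" field is just an example):

{code:java}
// The relevance score is appended as the last sort-by value, so it comes back
// in FieldDoc.fields without any doDocScores/doMaxScore booleans.
Sort sort = new Sort(new SortField("year", SortField.Type.INT, true),
                     SortField.FIELD_SCORE);
TopFieldDocs hits = searcher.search(query, 10, sort);
FieldDoc first = (FieldDoc) hits.scoreDocs[0];
float relevance = (Float) first.fields[1];  // second sort-by value = the score
{code}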

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5231) better interoperability of expressions/ with valuesource

2013-09-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5231.
-

   Resolution: Fixed
Fix Version/s: 4.6
   5.0

 better interoperability of expressions/ with valuesource
 

 Key: LUCENE-5231
 URL: https://issues.apache.org/jira/browse/LUCENE-5231
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5231.patch


 A few things i noticed, while trying to work on e.g. integration of this with 
 solr and just playing around:
 * No way for a custom Bindings to currently bind the score, as the necessary 
 stuff is package private. This adds a simple protected method to Bindings to 
 enable this.
 * Expression.getValueSource() cannot in general be used easily by other 
 things (e.g. interoperate with function queries and so on), because it 
 expects you to pass it this custom cache. This is an impl detail; it's easy to 
 remove this restriction and still compute subs only once.
 * if you try to bind the score and don't have the scorer setup, you should 
 get a clear exception: not NPE.
 * Each binding is looked up per-segment, which is bad. We should minimize the 
 lookups so they happen only in the CTOR.
 * This makes validation considerably simpler and less error-prone, so easy 
 that I don't think we need it in the base class either, I moved this to just 
 a simple helper method on SimpleBindings. It also found a bug in the equals() 
 test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5231) better interoperability of expressions/ with valuesource

2013-09-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773709#comment-13773709
 ] 

ASF subversion and git services commented on LUCENE-5231:
-

Commit 1525193 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1525193 ]

LUCENE-5231: better interoperability of expressions with valuesource

 better interoperability of expressions/ with valuesource
 

 Key: LUCENE-5231
 URL: https://issues.apache.org/jira/browse/LUCENE-5231
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-5231.patch


 A few things i noticed, while trying to work on e.g. integration of this with 
 solr and just playing around:
 * No way for a custom Bindings to currently bind the score, as the necessary 
 stuff is package private. This adds a simple protected method to Bindings to 
 enable this.
 * Expression.getValueSource() cannot in general be used easily by other 
 things (e.g. interoperate with function queries and so on), because it 
 expects you to pass it this custom cache. This is an impl detail; it's easy to 
 remove this restriction and still compute subs only once.
 * if you try to bind the score and don't have the scorer setup, you should 
 get a clear exception: not NPE.
 * Each binding is looked up per-segment, which is bad. We should minimize the 
 lookups so they happen only in the CTOR.
 * This makes validation considerably simpler and less error-prone, so easy 
 that I don't think we need it in the base class either, I moved this to just 
 a simple helper method on SimpleBindings. It also found a bug in the equals() 
 test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773712#comment-13773712
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1525195 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1525195 ]

Merged revision(s) 1525194 from lucene/dev/trunk:
LUCENE-5207: Add a test that checks if the stack trace of an exception thrown 
from a Javascript function contains the original expression source code as the 
filename.

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.
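
 To make the notes above concrete, typical usage looks roughly like this (a sketch; the "popularity" field name is just an example):

{code:java}
// Compile a javascript expression, bind its variables to fields/score, and
// sort by the compiled expression.
Expression expr = JavascriptCompiler.compile("sqrt(_score) + ln(popularity)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("popularity", SortField.Type.INT));
Sort sort = new Sort(expr.getSortField(bindings, true));  // reverse = descending
TopFieldDocs hits = searcher.search(query, 10, sort);
{code}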

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation

2013-09-20 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5215:
---

Attachment: LUCENE-5215.patch

Patch with --show-copies-as-adds

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, 
 LUCENE-5215.patch


 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have the same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773710#comment-13773710
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1525194 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1525194 ]

LUCENE-5207: Add a test that checks if the stack trace of an exception thrown 
from a Javascript function contains the original expression source code as the 
filename.

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773713#comment-13773713
 ] 

Robert Muir commented on LUCENE-5232:
-

Sorry, I guess I'm against never computing this shit... because you guys 
think returning NaN is ok.

I don't. It's not.

If you want to make these optimizations, fix the APIs so it's intuitive; 
otherwise, no way.

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4882) Restrict SolrResourceLoader to only classloader accessible files and instance dir

2013-09-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773716#comment-13773716
 ] 

Uwe Schindler commented on SOLR-4882:
-

Hi,

nobody commented on this issue, so I think the current patch is fine. I would 
like to commit this for 4.6.

After that is resolved, we can also do SOLR-5234.

 Restrict SolrResourceLoader to only classloader accessible files and instance 
 dir
 -

 Key: SOLR-4882
 URL: https://issues.apache.org/jira/browse/SOLR-4882
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.3
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.5, 5.0

 Attachments: SOLR-4882.patch, SOLR-4882.patch


 SolrResourceLoader currently allows loading files from any 
 absolute/CWD-relative path, which is used as a fallback if the resource 
 cannot be looked up via the class loader.
 We should limit this fallback to sub-dirs below the instanceDir passed into 
 the ctor. The CWD special case should be removed, too (the virtual CWD is 
 the instance's config or root dir).
 The reason for this is security related. Some Solr components allow passing 
 in resource paths via REST parameters (e.g. XSL stylesheets, velocity 
 templates, ...) and load them via the resource loader. Restricting the loader 
 this way makes it impossible to load e.g. /etc/passwd as a stylesheet.
 In 4.4 we should add a solrconfig.xml setting to re-enable the old behaviour 
 (disabled by default) for existing installations that need files from outside 
 the instance dir which are not available via the URLClassLoader used 
 internally. In Lucene 5.0 we should not support this anymore.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Difference between CustomScoreProvider, FunctionQuery and Expression

2013-09-20 Thread Shai Erera
Ok, so after I followed Rob's hints and clues, I found LongFieldSource,
which uses the FieldCache API under the hood, which accesses the NDV field. I
feel sorry for the poor user who will try to figure it out himself though,
because it's not evident anywhere that this is what you should do!
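
For the record, here's roughly what worked for me (a sketch; the long NDV field is assumed to be called "boost", and reader setup is omitted):

// LongFieldSource goes through the FieldCache API, which in turn reads the
// NumericDocValues field if one exists for "boost".
ValueSource vs = new LongFieldSource("boost");
Map context = new HashMap();
for (AtomicReaderContext leaf : reader.leaves()) {
  FunctionValues values = vs.getValues(context, leaf);
  System.out.println(values.longVal(0));  // value of the first doc in this segment
}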

LongFieldSource jdocs are completely erroneous; it seems to be a copy-paste bug
from FloatFieldSource.

Then, if you get past that and read the FieldCache.getLongs jdocs, they say:

   * Checks the internal cache for an appropriate entry, and if none is
   * found, *reads the terms in <code>field</code> as longs* and returns an array
   * of size <code>reader.maxDoc()</code> of the value each document
   * has in the given field.

Nothing about NumericDocValues. I actually looked at FieldCache before
sending the first email, but all I could conclude from the jdocs is that it
parses terms. I didn't bother looking at the FieldCacheImpl implementation, and
users shouldn't be expected to do that. I'll open an issue to clean/clarify
javadocs.

Shai


On Fri, Sep 20, 2013 at 6:53 PM, Robert Muir rcm...@gmail.com wrote:

 You asked about how to access a NumericDocValues field from a
 valuesource in lucene.

 Yet you showed an example where you did just this with expressions, so
 I'm just recommending you look at expressions/ source code (they use
 valuesource under the hood) to see how its done!

 On Fri, Sep 20, 2013 at 11:49 AM, Shai Erera ser...@gmail.com wrote:
  What do Expressions have to do here? Do they replace CustomScoreQuery?
 Maybe
  they should, I don't know. But today, if you want to use CSQ, to boost a
  document by an NDV field, you need to write a ValueSource which reads
 from
  the field. And that's the object that I don't see.
 
  Maybe you want to say that Expressions will eventually replace CSQ, and
 so
  it's moot to add a NumericDVFieldSource to Lucene? Or we want to
 document on
  CSQ that you should really consider using Expressions?
 
  Shai
 
 
  On Fri, Sep 20, 2013 at 6:41 PM, Robert Muir rcm...@gmail.com wrote:
 
  why dont you look and see how expressions is doing it?
 
  On Fri, Sep 20, 2013 at 11:39 AM, Shai Erera ser...@gmail.com wrote:
   Thanks Rob. So is there a NumericDVFieldSource-like in Lucene? I think
   it's
   important that we have one.
  
   Shai
  
  
   On Fri, Sep 20, 2013 at 6:10 PM, Robert Muir rcm...@gmail.com
 wrote:
  
   thats what it does. its more like a computed field. and you can sort
   by more than one of them.
  
   please see the JIRA issue for a description of the differences
 between
   function queries.
  
   On Fri, Sep 20, 2013 at 10:49 AM, Shai Erera ser...@gmail.com
 wrote:
Yes, you're right, but that's unrelated to this thread. I passed
doScore=true and the scores come out the same, meaning Expression
didn't
affect the actual score, only the sort-by value (which is ok).
   
search Expression
doc=1, score=0.37158427, field=0.7431685328483582
doc=0, score=0.37158427, field=0.3715842664241791
   
Shai
   
   
On Fri, Sep 20, 2013 at 5:10 PM, Robert Muir rcm...@gmail.com
wrote:
   
On Fri, Sep 20, 2013 at 8:01 AM, Shai Erera ser...@gmail.com
wrote:

 Expression
 I tried the new module, following TestDemoExpression and
 compiled
 the
 expression using this code:

 Expression expr = JavascriptCompiler.compile("_score * boost");
 SimpleBindings bindings = new SimpleBindings();
 bindings.add(new SortField("_score", SortField.Type.SCORE));
 bindings.add(new SortField("boost", SortField.Type.LONG));

 The result scores are:

 search Expression
 doc=1, score=NaN, field=0.7431685328483582
 doc=0, score=NaN, field=0.3715842664241791

 As you can see, both CustomScoreProvider and Expression methods
 return
 same
 scores for the docs, while the FunctionQuery method returns
 different
 scores. The reason is that when using FunctionQuery, the scores
 of
 the
 ValueSources are multiplied by queryWeight, which seems correct
 to
 me.

 Expression is more about sorting than scoring as far as I
 understand
 (for
 instance, the result FieldDocs.score is NaN)
   
Why does that come as a surprise to you?  Pass true to
 indexsearcher
to get the documents score back here.
   
=== Release 2.9.0 2009-09-23
===
   
Changes in backwards compatibility policy
   
LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
 longer
computes a document score for each hit by default.
... (Shai Erera via Mike McCandless)
   
   
   
 -
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
   
   
  
   -
   To unsubscribe, e-mail: 

[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773721#comment-13773721
 ] 

Shai Erera commented on LUCENE-5232:


Maybe we can fix the API by making maxScore private on TopDocs, and throwing 
IllegalStateException if you call the getter while it's NaN? I think it's overkill 
though; it's enough to document that that's the behavior if you don't ask to 
compute maxScore.
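
Something like this, purely as an illustration of the "fail fast instead of NaN" idea (a hypothetical accessor, not an actual patch):

{code:java}
// Hypothetical TopDocs accessor: refuse to hand back an uncomputed maxScore.
public float getMaxScore() {
  if (Float.isNaN(maxScore)) {
    throw new IllegalStateException(
        "maxScore was not computed; ask for it (or track it in your own collector)");
  }
  return maxScore;
}
{code}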

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5234) Clarify FieldCache API around the use of NumericDocValues fields

2013-09-20 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5234:
--

 Summary: Clarify FieldCache API around the use of NumericDocValues 
fields
 Key: LUCENE-5234
 URL: https://issues.apache.org/jira/browse/LUCENE-5234
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera


Spinoff from this thread: http://lucene.markmail.org/thread/wxs6bzf2ul6go4pg. 
FieldCache (and friends) API javadocs need some improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773726#comment-13773726
 ] 

Robert Muir commented on LUCENE-5232:
-

who the fuck is asking for scores, but not the max score, and why does their 
insanely specialized use case justify all these booleans on a central lucene 
class.


 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5234) Clarify FieldCache API around the use of NumericDocValues fields

2013-09-20 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5234:
---

Attachment: LUCENE-5234.patch

Initial patch improving longs javadocs. I think same improvements can be done 
to the other types as well (float, int etc.), but I'd like to get feedback on 
the wording first.

 Clarify FieldCache API around the use of NumericDocValues fields
 

 Key: LUCENE-5234
 URL: https://issues.apache.org/jira/browse/LUCENE-5234
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5234.patch


 Spinoff from this thread: http://lucene.markmail.org/thread/wxs6bzf2ul6go4pg. 
 FieldCache (and friends) API javadocs need some improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-2844) benchmark geospatial performance based on geonames.org

2013-09-20 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-2844:


Assignee: David Smiley

 benchmark geospatial performance based on geonames.org
 --

 Key: LUCENE-2844
 URL: https://issues.apache.org/jira/browse/LUCENE-2844
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: benchmark-geo.patch, benchmark-geo.patch


 Until now (with this patch), the benchmark contrib module did not include a 
 means to test geospatial data.  This patch includes some new files and 
 changes to existing ones.  Here is a summary of what is being added in this 
 patch per file (all files below are within the benchmark contrib module) 
 along with my notes:
 Changes:
 * build.xml -- Add dependency on Lucene's spatial module and Solr.
 ** It was a real pain to figure out the convoluted ant build system to make 
 this work, and I doubt I did it the proper way.  
 ** Rob Muir thought it would be a good idea to make the benchmark contrib 
 module be top level module (i.e. be alongside analysis) so that it can depend 
 on everything.  
 http://lucene.472066.n3.nabble.com/Re-Geospatial-search-in-Lucene-Solr-tp2157146p2157824.html
   I agree 
 * ReadTask.java -- Added a search.useHitTotal boolean option that will use 
 the total hits number for reporting purposes, instead of the existing 
 behavior.
 ** The existing behavior (i.e. when search.useHitTotal=false) doesn't look 
 very useful since the response integer is the sum of several things instead 
 of just one thing.  I don't see how anyone makes use of it.
 Note that on my local system, I also changed ReportTask & RepSelectByPrefTask 
 to not include the '-' every other line, and also changed Format.java to not 
 use commas in the numbers.  These changes are to make copy-pasting into excel 
 more streamlined.
 New Files:
 * geoname-spatial.alg -- my algorithm file.
 **  Note the :0 trailing the Populate sequence.  This is a trick I use to 
 skip building the index, since it takes a while to build and I'm not 
 interested in benchmarking index construction.  You'll want to set this to :1 
 and then subsequently put it back for further runs as long as you keep the 
 doc.geo.schemaField or any other configuration elements affecting index the 
 same.
 ** In the patch, doc.geo.schemaField=geohash but unless you're tinkering with 
 SOLR-2155, you'll probably want to set this to latlon
 * GeoNamesContentSource.java -- a ContentSource for a geonames.org data file 
 (either a single country like US.txt or allCountries.txt).
 ** Uses a subclass of DocData to store all the fields.  The existing DocData 
 wasn't very applicable to data that is not composed of a title and body.
 ** Doesn't reuse the docdata parameter to getNextDocData(); a new one is 
 created every time.
 ** Only supports content.source.forever=false
 * GeoNamesDocMaker.java -- a subclass of DocMaker that works very differently 
 than the existing DocMaker.
 ** Instead of assuming that each line from geonames.org will correspond to 
 one Lucene document, this implementation supports, via configuration, 
 creating a variable number of documents, each with a variable number of 
 points taken randomly from a GeoNamesContentSource.
 ** doc.geo.docsToGenerate:  The number of documents to generate.  If blank it 
 defaults to the number of rows in GeoNamesContentSource.
 ** doc.geo.avgPlacesPerDoc: The average number of places to be added to a 
 document.  A random number between 0 and one less than twice this amount is 
 chosen on a per document basis.  If this is set to 1, then exactly one is 
 always used.  In order to support a value greater than 1, use the geohash 
 field type and incorporate SOLR-2155 (geohash prefix technique).
 ** doc.geo.oneDocPerPlace: Whether at most one document should use the same 
 place.  In other words, Can more than one document have the same place?  If 
 so, set this to false.
 ** doc.geo.schemaField: references a field name in schema.xml.  The field 
 should implement SpatialQueryable.
 * GeoPerfData.java: This class is a singleton storing data in memory that is 
 shared by GeoNamesDocMaker.java and GeoQueryMaker.java.
 ** content.geo.zeroPopSubst: if a population is encountered that is <= 0, 
 then use this population value instead.  Default is 100.
 ** content.geo.maxPlaces: A limit on the number of rows read in from 
 GeoNamesContentSource.java can be set here.  Defaults to Integer.MAX_VALUE.
 ** GeoPerfData is primarily responsible for reading in data from 
 GeoNamesContentSource into memory to store the lat, lon, and population.  
 When a random place is 

[jira] [Updated] (LUCENE-2844) benchmark geospatial performance based on geonames.org

2013-09-20 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-2844:
-

Fix Version/s: (was: 4.5)
   4.6

 benchmark geospatial performance based on geonames.org
 --

 Key: LUCENE-2844
 URL: https://issues.apache.org/jira/browse/LUCENE-2844
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 5.0, 4.6

 Attachments: benchmark-geo.patch, benchmark-geo.patch


 Until now (with this patch), the benchmark contrib module did not include a 
 means to test geospatial data.  This patch includes some new files and 
 changes to existing ones.  Here is a summary of what is being added in this 
 patch per file (all files below are within the benchmark contrib module) 
 along with my notes:
 Changes:
 * build.xml -- Add dependency on Lucene's spatial module and Solr.
 ** It was a real pain to figure out the convoluted ant build system to make 
 this work, and I doubt I did it the proper way.  
 ** Rob Muir thought it would be a good idea to make the benchmark contrib 
 module be top level module (i.e. be alongside analysis) so that it can depend 
 on everything.  
 http://lucene.472066.n3.nabble.com/Re-Geospatial-search-in-Lucene-Solr-tp2157146p2157824.html
   I agree 
 * ReadTask.java -- Added a search.useHitTotal boolean option that will use 
 the total hits number for reporting purposes, instead of the existing 
 behavior.
 ** The existing behavior (i.e. when search.useHitTotal=false) doesn't look 
 very useful since the response integer is the sum of several things instead 
 of just one thing.  I don't see how anyone makes use of it.
 Note that on my local system, I also changed ReportTask & RepSelectByPrefTask 
 to not include the '-' every other line, and also changed Format.java to not 
 use commas in the numbers.  These changes are to make copy-pasting into excel 
 more streamlined.
 New Files:
 * geoname-spatial.alg -- my algorithm file.
 **  Note the :0 trailing the Populate sequence.  This is a trick I use to 
 skip building the index, since it takes a while to build and I'm not 
 interested in benchmarking index construction.  You'll want to set this to :1 
 and then subsequently put it back for further runs as long as you keep the 
 doc.geo.schemaField or any other configuration elements affecting index the 
 same.
 ** In the patch, doc.geo.schemaField=geohash but unless you're tinkering with 
 SOLR-2155, you'll probably want to set this to latlon
 * GeoNamesContentSource.java -- a ContentSource for a geonames.org data file 
 (either a single country like US.txt or allCountries.txt).
 ** Uses a subclass of DocData to store all the fields.  The existing DocData 
 wasn't very applicable to data that is not composed of a title and body.
 ** Doesn't reuse the docdata parameter to getNextDocData(); a new one is 
 created every time.
 ** Only supports content.source.forever=false
 * GeoNamesDocMaker.java -- a subclass of DocMaker that works very differently 
 than the existing DocMaker.
 ** Instead of assuming that each line from geonames.org will correspond to 
 one Lucene document, this implementation supports, via configuration, 
 creating a variable number of documents, each with a variable number of 
 points taken randomly from a GeoNamesContentSource.
 ** doc.geo.docsToGenerate:  The number of documents to generate.  If blank it 
 defaults to the number of rows in GeoNamesContentSource.
 ** doc.geo.avgPlacesPerDoc: The average number of places to be added to a 
 document.  A random number between 0 and one less than twice this amount is 
 chosen on a per document basis.  If this is set to 1, then exactly one is 
 always used.  In order to support a value greater than 1, use the geohash 
 field type and incorporate SOLR-2155 (geohash prefix technique).
 ** doc.geo.oneDocPerPlace: Whether at most one document should use the same 
 place.  In other words, Can more than one document have the same place?  If 
 so, set this to false.
 ** doc.geo.schemaField: references a field name in schema.xml.  The field 
 should implement SpatialQueryable.
 * GeoPerfData.java: This class is a singleton storing data in memory that is 
 shared by GeoNamesDocMaker.java and GeoQueryMaker.java.
 ** content.geo.zeroPopSubst: if a population is encountered that is <= 0, 
 then use this population value instead.  Default is 100.
 ** content.geo.maxPlaces: A limit on the number of rows read in from 
 GeoNamesContentSource.java can be set here.  Defaults to Integer.MAX_VALUE.
 ** GeoPerfData is primarily responsible for reading in data from 
 GeoNamesContentSource into memory to store the lat, lon, and population.  
 

[jira] [Commented] (LUCENE-5232) Remove doMaxScore from indexsearcher, collector specializations, etc

2013-09-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773730#comment-13773730
 ] 

Shai Erera commented on LUCENE-5232:


Well, maybe start with why we compute maxScore at all, even for a 
TopScoreDocCollector? We use it to normalize document scores when doing some 
form of distributed search. When you use TSDC, it's easy to fill 
TopDocs.maxScore, because it's already known. When you sort by a field, you 
have to score *every* document in order to fill maxScore, as Mike pointed out, 
and not just those that make it into the heap based on their sort value.
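
In other words, roughly (a sketch with made-up shard result variables):

{code:java}
// maxScore lets a coordinator scale per-shard scores into a comparable range
// before merging, e.g. into [0, 1].
float globalMax = Math.max(shard1Hits.getMaxScore(), shard2Hits.getMaxScore());
for (ScoreDoc sd : mergedScoreDocs) {
  sd.score = sd.score / globalMax;
}
{code}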

I think the problematic API here might be TopFieldDocs extending TopDocs. I 
believe that when you ask to sort, you don't need scores. That's the common 
case. So if we e.g. returned a TopFieldDocs which does not extend from TopDocs, 
and FieldDoc only gave you the sort-by values + 'doc', then we can remove 
doScore + doMaxScore entirely from TopFieldCollector. Let the users that need 
to know the score in addition to the sort-by values write a custom Collector. 
Or, they can put a SortField.SCORE as the last sort-by field, and they get the 
scores already in FieldDoc.fields.

 Remove doMaxScore from indexsearcher, collector specializations, etc
 

 Key: LUCENE-5232
 URL: https://issues.apache.org/jira/browse/LUCENE-5232
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Robert Muir
 Fix For: 5.0


 I think we should just compute doMaxScore whenever doDocScores = true.
 This would remove 4 collector specializations and remove a boolean parameter 
 from 4 indexsearcher methods.
 We can just do this in 5.0 I think.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org