[jira] [Closed] (SOLR-712) Support for relative path XML Entity Includes

2012-09-11 Thread Amit Nithian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Nithian closed SOLR-712.
-

   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0

This looks to have been resolved by the various XML parser changes. Awesome!

 Support for relative path XML Entity Includes
 -

 Key: SOLR-712
 URL: https://issues.apache.org/jira/browse/SOLR-712
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.2, 1.3
Reporter: Amit Nithian
Priority: Minor
 Fix For: 4.0

 Attachments: XMLEntityInclude.tgz

   Original Estimate: 1h
  Remaining Estimate: 1h

 This patch modifies Config.java and IndexSchema.java to support XML 
 entity includes with relative paths. When parsing an InputStream, not providing 
 a systemId (i.e. a base path) to the DOM parser forces all entity includes to 
 be resolved relative to the base project directory rather than the directory 
 hosting the document. This patch simply passes in the configuration directory 
 as the systemId, thus making the entity includes relative to the home of 
 solrconfig.xml and schema.xml. 
 IndexSchema.java was modified to ensure objects do NOT process the 
 xml:base attribute. Newer Xerces-J parsers allow for the removal of this 
 attribute in the DOM (by setting the appropriate feature); however, the DOM 
 parser used by Java 5 doesn't support this feature.
 For example:
 Without this patch, if my Solr app was running in C:\solr, then any 
 entity includes would have to be relative to C:\solr regardless of where 
 solrconfig.xml and schema.xml live. This patch allows for includes relative to 
 the conf directory of solr.home (e.g. ../../my_base_schema.xml would be located 
 two directories above conf).
 Please submit improvements or comments on this patch. 
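
 The effect of the systemId described above can be sketched with the standard 
 JAXP API. This is a minimal sketch under stated assumptions, not the attached 
 patch: the class name, the file names, and the temporary directory layout are 
 made up for illustration. DocumentBuilder.parse(InputStream, String) is the JDK 
 overload whose second argument serves as the base for resolving relative URIs.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class RelativeEntityIncludeSketch {

    /** Parses conf/schema.xml from a stream and returns how many field elements it contains. */
    static int countIncludedFields() {
        try {
            // Hypothetical layout: <tmp>/conf/schema.xml includes <tmp>/conf/fields.xml.
            Path conf = Files.createDirectory(Files.createTempDirectory("solr-home").resolve("conf"));
            Files.write(conf.resolve("fields.xml"),
                    "<field name=\"id\"/>".getBytes(StandardCharsets.UTF_8));
            Path schema = conf.resolve("schema.xml");
            Files.write(schema, ("<!DOCTYPE schema [<!ENTITY fields SYSTEM \"fields.xml\">]>"
                    + "<schema>&fields;</schema>").getBytes(StandardCharsets.UTF_8));

            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            try (InputStream in = Files.newInputStream(schema)) {
                // The second argument is the systemId: relative entity URIs such as
                // "fields.xml" are resolved against it instead of the working directory.
                Document doc = builder.parse(in, schema.toUri().toString());
                return doc.getElementsByTagName("field").getLength();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fields found via relative include: " + countIncludedFields());
    }
}
```

 Passing the document's own URI (rather than only its directory) is a choice 
 made for this sketch; either form gives the parser a base to resolve against.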

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3304) Add Solr support for the new Lucene spatial module

2012-09-11 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-3304:
---

Attachment: SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch

Thanks for finding and fixing that bug Andy. Your fix wasn't quite right though 
since the getStrategy() method you refactored synchronized on a parameter 
(pointless) instead of the field. I fixed this.
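
The locking problem mentioned above can be illustrated in isolation. The 
following is a minimal sketch with hypothetical names (StrategyCacheSketch, 
fieldStrategyMap, and a String standing in for the strategy object), not the 
code from the patch. Locking on a method parameter protects nothing because 
each caller can pass a different object, whereas locking on the shared field 
serializes access to the cache.

```java
import java.util.HashMap;
import java.util.Map;

public class StrategyCacheSketch {

    // Shared, non-thread-safe cache; this field is the object every caller must lock on.
    private final Map<String, String> fieldStrategyMap = new HashMap<>();
    private int created = 0;

    /** Returns the cached strategy for a field name, creating it at most once. */
    String getStrategy(String fieldName) {
        // synchronized (fieldName) would be pointless: different threads may pass
        // different String instances and therefore never contend on the same lock.
        synchronized (fieldStrategyMap) {
            String strategy = fieldStrategyMap.get(fieldName);
            if (strategy == null) {
                strategy = "strategy-for-" + fieldName; // stand-in for building a real strategy
                fieldStrategyMap.put(fieldName, strategy);
                created++;
            }
            return strategy;
        }
    }

    /** Number of strategies that were actually constructed. */
    int createdCount() {
        synchronized (fieldStrategyMap) {
            return created;
        }
    }

    /** Calls getStrategy from several threads and reports how many strategies were created. */
    static int demo() {
        StrategyCacheSketch cache = new StrategyCacheSketch();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            // new String(...) forces a distinct instance per thread, as a parameter lock would see.
            threads[i] = new Thread(() -> cache.getStrategy(new String("geo")));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("strategies created: " + demo());
    }
}
```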

This new patch makes that and various other changes:
* synchronized with the latest source tree (e.g. Spatial4j 0.3)
** This means distances are now degrees based (0-180 for circle radius) not 
kilometers
* removed ignoreIncompatibleGeometry option (see LUCENE-4173)
* Use the input string as the stored value that is returned.  So if you give 
lat,lon then that's what you get back, in whatever number of decimal places 
you chose.
* added prefixGridScanLevel performance tuning option to 
SpatialRecursivePrefixTreeFieldType (simply exposed it from the strategy)
* keep distErrPct as a fraction (no change)
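
As a rough illustration of the unit change noted in the list above, a distance 
in kilometers can be converted to degrees of arc on a sphere. This is a sketch 
assuming a spherical Earth with a mean radius of about 6371 km; the class and 
method names are hypothetical and this is not Spatial4j's API.

```java
public class DegreesFromKilometersSketch {

    // Assumed mean Earth radius in kilometers (spherical model).
    static final double EARTH_MEAN_RADIUS_KM = 6371.0;

    /** Converts a great-circle distance in kilometers to degrees of arc. */
    static double kmToDegrees(double km) {
        return Math.toDegrees(km / EARTH_MEAN_RADIUS_KM);
    }

    /** Converts degrees of arc back to kilometers along a great circle. */
    static double degreesToKm(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM;
    }

    public static void main(String[] args) {
        System.out.println("100 km is about " + kmToDegrees(100.0) + " degrees");
    }
}
```

Under this model the largest meaningful circle radius, half the Earth's 
circumference, corresponds to 180 degrees, which matches the 0-180 range 
mentioned above.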

It would be nice to have a kilometer unit option but that isn't easily done 
until Spatial4j's shape reader gets to be more flexible.  That can wait.

That needScore local-param hack (see SOLR-2883) is unfortunate, as Solr can't 
get a Filter from a field type.  I'm tempted to change the default to 'false' 
as leaving it at 'true' triggers large RAM requirements and slow-downs for 
SpatialRecursivePrefixTreeFieldType.  This could be an opportunity to specify 
what the score should be, come to think of it.  Instead of needScore=false, 
maybe score=none (default) or score=distance or score=recipDistance or 
something like that.

The TwoDoubles strategy needs more attention and tests in Lucene spatial, but I 
don't want that to hold up this patch.  Shall I remove the adapter, or let it 
get committed but not advertise it until it's more worthy?

 Add Solr support for the new Lucene spatial module
 --

 Key: SOLR-3304
 URL: https://issues.apache.org/jira/browse/SOLR-3304
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0-ALPHA
Reporter: Bill Bell
Assignee: David Smiley
  Labels: spatial
 Attachments: SOLR-3304_Solr_fields_for_Lucene_spatial_module 
 (fieldName in Strategy) - indexableFields.patch, 
 SOLR-3304_Solr_fields_for_Lucene_spatial_module (fieldName in 
 Strategy).patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, 
 SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, 
 SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, 
 SOLR-3304-strategy-getter-fixed.patch


 Get the Solr spatial module integrated with the lucene spatial module.




[jira] [Commented] (SOLR-712) Support for relative path XML Entity Includes

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452795#comment-13452795
 ] 

Uwe Schindler commented on SOLR-712:


Hi, I did not know about this issue. This has been solved since Solr 3.1 (issue 
SOLR-1656).




[jira] [Updated] (SOLR-3819) Facet count not working when tagging and excluding filters for range facets with group.facet true

2012-09-11 Thread Ricardo Merizalde (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Merizalde updated SOLR-3819:


Summary: Facet count not working when tagging and excluding filters for range 
facets with group.facet true  (was: Facet count not working when tagging and 
excluding filters for range facets with group.facet is true)

 Facet count not working when tagging and excluding filters for range facets 
 with group.facet true
 ---

 Key: SOLR-3819
 URL: https://issues.apache.org/jira/browse/SOLR-3819
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0-BETA
 Environment:  12.0.0 Darwin Kernel Version 12.0.
Reporter: Ricardo Merizalde

 I'm creating a range facet and I want to support multiple selection for it. 
 However, when I turn group.facet on, the tags/exclusions for filters stop 
 working. In other words, I only get the facet values for the filtered 
 documents. The following link works:
 http://localhost:8983/solr/catalogPreview/select?q=*:*&facet=true&wt=xml&rows=0&facet.range={!ex%3DsalePrice}salePrice&f.salePrice.facet.range.gap=75&f.salePrice.facet.range.start=100&f.salePrice.facet.range.end=600&group=true&group.field=productId&f.salePrice.facet.mincount=1&fq={!tag=salePrice}salePrice:[100%20TO%20175]&group.facet=false
 The following doesn't:
 http://localhost:8983/solr/catalogPreview/select?q=*:*&facet=true&wt=xml&rows=0&facet.range={!ex%3DsalePrice}salePrice&f.salePrice.facet.range.gap=75&f.salePrice.facet.range.start=100&f.salePrice.facet.range.end=600&group=true&group.field=productId&f.salePrice.facet.mincount=1&fq={!tag=salePrice}salePrice:[100%20TO%20175]&group.facet=true
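
 For reference, the two requests above differ only in the final group.facet 
 parameter. The following is a minimal sketch of assembling such a request, 
 assuming the parameters are joined with "&" as in an ordinary query string. 
 The helper class and method names are hypothetical; the host, core, and field 
 names are the ones from the report. Values are percent-encoded more strictly 
 here than in the report's links.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class RangeFacetRequestSketch {

    /** Builds the select URL from the report, toggling only group.facet. */
    static String buildUrl(boolean groupFacet) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("wt", "xml");
        params.put("rows", "0");
        // {!ex=salePrice} excludes the filter tagged "salePrice" when computing this facet.
        params.put("facet.range", "{!ex=salePrice}salePrice");
        params.put("f.salePrice.facet.range.gap", "75");
        params.put("f.salePrice.facet.range.start", "100");
        params.put("f.salePrice.facet.range.end", "600");
        params.put("group", "true");
        params.put("group.field", "productId");
        params.put("f.salePrice.facet.mincount", "1");
        // {!tag=salePrice} tags the filter so the facet above can exclude it.
        params.put("fq", "{!tag=salePrice}salePrice:[100 TO 175]");
        params.put("group.facet", Boolean.toString(groupFacet));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return "http://localhost:8983/solr/catalogPreview/select?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(false)); // request reported as working
        System.out.println(buildUrl(true));  // request reported as failing
    }
}
```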




patch attached, what next?!

2012-09-11 Thread Despot Jakimovski
Hi,

I finished attaching the patch to:
https://issues.apache.org/jira/browse/SOLR-3574
The status of the Jira issue is

   -  *Status:* In Progress
   -  *Priority:* Major
   -  *Resolution:* Unresolved

*Is there something else I should do (change some status/resolution and to
what?), before someone inspects the patch?*
I cannot see a log work option, so I can't change the remaining time of
the jira issue. But this might not be so important.

Cheers,
Despot


[jira] [Commented] (SOLR-3820) Solr Admin Query form is missing some edismax request parameters

2012-09-11 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452844#comment-13452844
 ] 

Jan Høydahl commented on SOLR-3820:
---

Good catch

 Solr Admin Query form is missing some edismax request parameters
 

 Key: SOLR-3820
 URL: https://issues.apache.org/jira/browse/SOLR-3820
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
 Fix For: 4.0


 The following edismax parameters are missing from the Solr Admin Query form:
 uf - User Fields
 pf2 - bigram phrase boost fields
 pf3 - trigram phrase boost fields
 ps2 - phrase slop for bigram phrases
 ps3 - phrase slop for trigram phrases
 boost - multiplicative boost function
 stopwords - remove stopwords from mandatory matching component (true/false, 
 defaults to true)
 lowercaseOperators - Enable lower-case "and" and "or" as operators 
 (true/false, defaults to true)
 The ability to set field name aliases is also missing: f.myalias.qf=realfield.




[jira] [Updated] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4355:


Attachment: LUCENE-4355.patch

updated patch: sugar fixed for docsEnum/dpEnum as proposed.

wasn't as bad as I thought :)

 improve AtomicReader sugar apis
 ---

 Key: LUCENE-4355
 URL: https://issues.apache.org/jira/browse/LUCENE-4355
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4355.patch, LUCENE-4355.patch


 I thought about this after looking @ LUCENE-4353:
 AtomicReader has some sugar APIs that are over top of the flex apis (Fields, 
 Terms, ...). But these might be a little trappy/confusing compared to 3.x.
 # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and 
 .termPositionsEnum(Bits, ...). I also don't think we need variants that take 
 flags here. We should simplify these to be less trappy. I think we only need 
 (String, BytesRef) here.
 # This means you need to use the flex apis for more expert usage: but we make 
 this a bit too hard since we only let you get a Terms (which you must null 
 check, then call .iterator() on, then seekExact, ...). I think it could help 
 if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x 
 had a method that let you get a 'positioned termsenum'.
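
 The kind of sugar suggested in the second point can be sketched without 
 Lucene on the classpath. Everything below is a stand-in written for 
 illustration: SimpleReader and SimpleTermsEnum are hypothetical types, not 
 Lucene's AtomicReader or TermsEnum, and their signatures are assumptions. The 
 sketch only shows the shape of the idea: one call that does the null check 
 and the seek, and returns an enum already positioned on the term, or null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class PositionedTermsEnumSketch {

    /** Stand-in for a terms enumerator; NOT Lucene's TermsEnum. */
    static final class SimpleTermsEnum {
        private final SortedSet<String> terms;
        private String current;

        SimpleTermsEnum(SortedSet<String> terms) {
            this.terms = terms;
        }

        /** Positions the enum on the term if it exists; returns false otherwise. */
        boolean seekExact(String term) {
            current = terms.contains(term) ? term : null;
            return current != null;
        }

        String term() {
            return current;
        }
    }

    /** Stand-in for a reader holding terms per field; NOT Lucene's AtomicReader. */
    static final class SimpleReader {
        private final Map<String, SortedSet<String>> fields = new HashMap<>();

        void add(String field, String term) {
            fields.computeIfAbsent(field, f -> new TreeSet<>()).add(term);
        }

        /** Expert path: returns null when the field has no terms, so callers must null check. */
        SortedSet<String> terms(String field) {
            return fields.get(field);
        }

        /** Sugar: does the null check and the seek, returning a positioned enum or null. */
        SimpleTermsEnum termsEnum(String field, String term) {
            SortedSet<String> terms = terms(field);
            if (terms == null) {
                return null;
            }
            SimpleTermsEnum termsEnum = new SimpleTermsEnum(terms);
            return termsEnum.seekExact(term) ? termsEnum : null;
        }
    }

    /** Returns the positioned term, or null when the field or term is missing. */
    static String lookup(String field, String term) {
        SimpleReader reader = new SimpleReader();
        reader.add("title", "lucene");
        reader.add("title", "solr");
        SimpleTermsEnum termsEnum = reader.termsEnum(field, term);
        return termsEnum == null ? null : termsEnum.term();
    }

    public static void main(String[] args) {
        System.out.println(lookup("title", "solr"));
        System.out.println(lookup("title", "missing"));
        System.out.println(lookup("body", "solr"));
    }
}
```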




[jira] [Updated] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4355:


Fix Version/s: 4.0
   5.0
 Assignee: Robert Muir

 improve AtomicReader sugar apis
 ---

 Key: LUCENE-4355
 URL: https://issues.apache.org/jira/browse/LUCENE-4355
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 5.0, 4.0

 Attachments: LUCENE-4355.patch, LUCENE-4355.patch





[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452871#comment-13452871
 ] 

Uwe Schindler commented on LUCENE-4196:
---

Hi Robert,
I wanted to go through the codec code to check this myself; I just have not had 
time to do it. Things like the CompoundFileReader not using hard checks are one 
reason why I want to go through it a second time. What's the issue with keeping 
this issue open as a todo task?

 Turn asserts in I/O related code into hard checks
 -

 Key: LUCENE-4196
 URL: https://issues.apache.org/jira/browse/LUCENE-4196
 Project: Lucene - Core
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0-ALPHA
Reporter: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-4196.patch


 In lots of codecs we only assert that, e.g., some things inside files are 
 correctly in bounds, which leads to security problems. It is not as bad as 
 C-style buffer overflows, but e.g. allocating a large array after reading a 
 VInt from a file header and then hitting an OOM is a security issue. So we have 
 to check all those contracts for files with hard checks, especially as a simple 
 check in most cases doesn't cost anything (and it costs no more than the assert 
 itself, as the assert also takes CPU power, because it needs one check on a 
 static final class field).
 Of course we cannot check values we read when reading postings, but the 
 simple checks that any postings file has a correct header and something like a 
 positive number of elements, or number of elements < file size, ..., that a 
 bit-field only contains valid bits in StoredFieldsReader, or that filenames 
 (CFS) are non-duplicate are very important. We had those checks in 3.x, but in 
 4.0, Mike changed all of those to asserts during the flex development (in my 
 opinion with no real reason).
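
 The distinction drawn above can be sketched with plain java.io streams. This 
 is a minimal sketch, not Lucene code: the header layout (a single int element 
 count followed by one byte per element), the class name, and the use of 
 IOException as a stand-in for a corruption exception are all assumptions made 
 for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class HardCheckSketch {

    /** Reads a count from the header and then that many one-byte elements. */
    static byte[] readElements(byte[] file) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            // An assert would vanish when the JVM runs without -ea, so a corrupt header
            // could trigger a huge allocation. A hard check always runs.
            int remaining = file.length - Integer.BYTES;
            if (count < 0 || count > remaining) {
                throw new IOException("corrupt header: count=" + count + ", remaining bytes=" + remaining);
            }
            byte[] elements = new byte[count];
            in.readFully(elements);
            return elements;
        }
    }

    /** Returns true if the hard check (or a short read) rejects the given file contents. */
    static boolean isRejected(byte[] file) {
        try {
            readElements(file);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {0, 0, 0, 2, 7, 9}; // count = 2, two elements follow
        byte[] corrupt = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff, 7, 9}; // count = Integer.MAX_VALUE
        System.out.println("valid rejected: " + isRejected(valid));
        System.out.println("corrupt rejected: " + isRejected(corrupt));
    }
}
```

 The check compares the declared count against the bytes that are actually 
 left, which is the "number of elements < file size" style of contract 
 mentioned above.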




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452874#comment-13452874
 ] 

Robert Muir commented on LUCENE-4369:
-

Chris: well, there is a lot more to convey than the old Field.Index.NOT_ANALYZED:

# text is treated as if it went through KeywordAnalyzer
# term frequencies and positions are omitted
# length normalization and index-time boosts are disabled

The idea of MatchOnly is to describe that the field is really only useful for 
matching, not searching. The other two things this Field does with respect to 
scoring and index options become important when someone adds multiple instances 
under the same name: I think it's important to convey that it's still only 
'matching' and they won't have real scoring here.

The problem I see with StringField as a name is that it doesn't hint at any 
of this. The current name can lead you to believe you should use it because you 
happen to have your content as a Java String.
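
The first point, that the value is indexed as one unmodified term, can be 
illustrated with a toy comparison in plain Java. This is a sketch, not Lucene 
code: keywordTerms and analyzedTerms are hypothetical helpers, and the "analysis" 
here is just lower-casing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class MatchOnlyVersusAnalyzedSketch {

    /** Keyword-style indexing: the whole value becomes exactly one term, unchanged. */
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    /** Toy analysis: lower-case the value and split it on whitespace. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = "Quick Brown Fox";
        // A query for the single word "quick" only finds the analyzed form.
        System.out.println("keyword terms:  " + keywordTerms(value));
        System.out.println("analyzed terms: " + analyzedTerms(value));
        System.out.println("keyword contains 'quick':  " + keywordTerms(value).contains("quick"));
        System.out.println("analyzed contains 'quick': " + analyzedTerms(value).contains("quick"));
    }
}
```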


 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" as an 
 example.)
 The name we use here is vital; otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.




[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452876#comment-13452876
 ] 

Robert Muir commented on LUCENE-4196:
-

There is no issue except the fix version field: I'm just trying to get things 
with fixVersion=4.0 contained and assigned to people who are actually planning 
on working the issues in the next few days, or moved out of the release.

If there is really more work that's necessary before 4.0 and someone is planning 
on working on it, then I think it should have the fixVersion.

But if it's just a future item that would be nice, then it should be moved out.




[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452877#comment-13452877
 ] 

Uwe Schindler commented on LUCENE-4196:
---

Just remove the fix version altogether.




[jira] [Updated] (LUCENE-4196) Turn asserts in I/O related code into hard checks

2012-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4196:


Fix Version/s: (was: 4.0)

 Turn asserts in I/O related code into hard checks
 -

 Key: LUCENE-4196
 URL: https://issues.apache.org/jira/browse/LUCENE-4196
 Project: Lucene - Core
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0-ALPHA
Reporter: Uwe Schindler
 Attachments: LUCENE-4196.patch





[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452884#comment-13452884
 ] 

Chris Male commented on LUCENE-4369:


As I say, I totally support renaming this field to something.  I think calling 
it anything else will help with distinguishing it from TextField so I'm +1 for 
MatchOnly.  Perhaps that'll encourage people to read the docs about it not 
being analyzed.




[jira] [Updated] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4369:


Attachment: LUCENE-4369.patch

patch: just from an Eclipse rename of 'StringField -> MatchOnlyField' and 
'LuceneTestCase.newStringField -> LuceneTestCase.newMatchOnlyField'

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch





[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452900#comment-13452900
 ] 

Mark Harwood commented on LUCENE-4369:
--

SingleTermField ?

Not sure matching vs searching is a commonly understood differentiation.




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452908#comment-13452908
 ] 

Robert Muir commented on LUCENE-4369:
-

Mark: I don't have strong feelings one way or the other. 

We don't need to rush it. I think it's a fairly contained change, and we don't 
even have to deal with this for 4.0 if we aren't happy: we can deprecate 
StringField and just have it extend XXXField in a future 4.x release too.

But I think the name StringField is not really good at all, so it's good to get 
all the ideas out here.
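
The deprecate-and-extend route mentioned above could look roughly like the 
following. This is a sketch with stand-in classes, not Lucene's actual Field 
hierarchy: MatchOnlyField is used here only as one candidate for the "XXXField" 
placeholder, and the constructors and accessors are hypothetical.

```java
public class DeprecatedAliasSketch {

    /** Stand-in for the renamed field type (the "XXXField" placeholder). */
    static class MatchOnlyField {
        private final String name;
        private final String value;

        MatchOnlyField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String name() {
            return name;
        }

        String stringValue() {
            return value;
        }
    }

    /** Old name kept as a thin deprecated subclass so existing callers still compile. */
    @Deprecated
    static class StringField extends MatchOnlyField {
        StringField(String name, String value) {
            super(name, value);
        }
    }

    /** Simulates legacy code that still constructs the old type. */
    @SuppressWarnings("deprecation")
    static MatchOnlyField legacyField() {
        return new StringField("id", "doc-42");
    }

    public static void main(String[] args) {
        MatchOnlyField field = legacyField();
        System.out.println(field.name() + "=" + field.stringValue());
    }
}
```

Code written against the old name keeps working and gets a compiler warning, 
while new code can use the new name directly.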





[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452914#comment-13452914
 ] 

Mark Harwood commented on LUCENE-4369:
--

Agreed on the need for a change - names are important.

I have a problem with using "match" on its own because the word is often 
associated with partial matching, e.g. "best match" or "fuzzy match".
A quick Google search suggests "match" has more connotations of fuzziness than 
exactness: there are 162m results for "best match" vs. only 45m results for 
"exact match".

So how about ExactMatchField?







[jira] [Updated] (LUCENE-4258) Incremental Field Updates through Stacked Segments

2012-09-11 Thread Sivan Yogev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sivan Yogev updated LUCENE-4258:


Attachment: LUCENE-4258-inner-changes.patch
LUCENE-4258-API-changes.patch
IncrementalFieldUpdates.odp

Adding a design proposal presentation and two patches that follow the 
proposal's concepts. The first patch contains the proposed API changes (it does 
not compile); the other contains the inner changes, for those interested in the 
implementation details. The second patch contains a new test named 
TestFieldsUpdates, which currently fails.

 Incremental Field Updates through Stacked Segments
 --

 Key: LUCENE-4258
 URL: https://issues.apache.org/jira/browse/LUCENE-4258
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Sivan Yogev
 Attachments: IncrementalFieldUpdates.odp, 
 LUCENE-4258-API-changes.patch, LUCENE-4258-inner-changes.patch

   Original Estimate: 2,520h
  Remaining Estimate: 2,520h

 Shai and I would like to start working on the proposal to Incremental Field 
 Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452920#comment-13452920
 ] 

Shai Erera commented on LUCENE-4369:


bq. So how about ExactMatchField?

+1 for that. I was actually going to propose MatchExactField, but I don't 
mind the order of the words.

Also, since a way to search for these terms/fields using the regular query 
syntax would be through a PerFieldAnalyzerWrapper that assigns KeywordAnalyzer 
to that field (are there other ways?), we could also call it KeywordField.

I don't like MatchOnlyField .. i.e. TextField also matches *only* the words 
that are indexed in that field.
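
The per-field idea can be pictured with a small Python sketch. It only models the concept behind PerFieldAnalyzerWrapper and KeywordAnalyzer and does not use their API; `PerFieldAnalyzer`, `standard_analyze`, `keyword_analyze` and the field names are hypothetical.

```python
import re


def standard_analyze(text):
    """Hypothetical default analysis: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def keyword_analyze(text):
    """Keyword-style analysis: the whole input is emitted as a single token."""
    return [text]


class PerFieldAnalyzer:
    """Dispatches to a field-specific analyzer, falling back to a default."""

    def __init__(self, default, per_field):
        self.default = default
        self.per_field = dict(per_field)

    def analyze(self, field, text):
        return self.per_field.get(field, self.default)(text)


# The "sku" field is treated as an exact-match field; everything else is tokenized.
analyzer = PerFieldAnalyzer(standard_analyze, {"sku": keyword_analyze})
```

With this setup, query text for `sku` is passed through untouched, so it can line up with a value that was indexed as-is, while text for any other field is tokenized as usual.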




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452922#comment-13452922
 ] 

Robert Muir commented on LUCENE-4369:
-

I like ExactMatchField too.

I thought about Keyword too, but my concern is that this would get confused 
with 'search keywords', such as the type used in the META section of HTML 
documents. We could argue about the best field type for that :) but
I don't think this is it.




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452924#comment-13452924
 ] 

Chris Male commented on LUCENE-4369:


I like ExactMatchField, good suggestion.




[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments

2012-09-11 Thread Sivan Yogev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452928#comment-13452928
 ] 

Sivan Yogev commented on LUCENE-4258:
-

Forgot to mention that the implementation patch is still missing many components...




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452933#comment-13452933
 ] 

Uwe Schindler commented on LUCENE-4369:
---

ExactMatchField sounds OK, but I don't really like it. On the other hand, we 
already had the Field.KEYWORD(...) static factory in Lucene 1.x (maybe also in 
early 2.x), and that was always fine with me. The term "Keyword" is only 
misleading to me (given my German library background, where it means 
"Schlagwörter"), so I would like to have a good term that tells the user this 
is a field that's taken as-is. In general I don't really like the names 
KeywordTokenizer or KeywordAnalyzer either, but they have been around for a 
long time. So, coming from these names, KeywordTokenizer -> KeywordField might 
be a good idea (like NumericTokenStream -> NumericField), but

The problem with ExactMatchField is: if it is also stored, the name is 
misleading again, so KeywordField is better. If we would 100% differentiate 
between stored and indexed fields while indexing (requiring that the field is 
added 2 times, one time as indexed and one time as stored), I would be 
fine with MatchOnlyField and StoredStringField.




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452941#comment-13452941
 ] 

Uwe Schindler commented on LUCENE-4369:
---

Here is the good old Lucene 1.9.1 API: 
http://memex.dsic.upv.es/pbs/Practicas/Lucene/api-1.9.1/org/apache/lucene/document/Field.html
 (see Field.Keyword, Field.Text, Field.Unstored)




[jira] [Comment Edited] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452945#comment-13452945
 ] 

Uwe Schindler edited comment on LUCENE-4369 at 9/11/12 10:48 PM:
-

bq. We don't need to rush it, I think its fairly contained to change, we don't 
even have to deal with this for 4.0 if we aren't happy: we can deprecate 
StringField just have it extend XXXField in a future 4.x release too.

I am against this; we should change this before Lucene 4.0. We have already 
seen on the user list that many people misunderstand it, so for me this issue 
is a Blocker for 4.0.

  was (Author: thetaphi):
bq. We don't need to rush it, I think its fairly contained to change, we 
don't even have to deal with this for 
4.0 if we aren't happy: we can deprecate StringField just have it extend 
XXXField in a future 4.x release too.

I am against this, we should change this before Lucene 4.0. We have seen 
already on user list that many people understand it wrong, so for me this issue 
is a Blocker for 4.0.
  



[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452945#comment-13452945
 ] 

Uwe Schindler commented on LUCENE-4369:
---

bq. We don't need to rush it, I think its fairly contained to change, we don't 
even have to deal with this for 
4.0 if we aren't happy: we can deprecate StringField just have it extend 
XXXField in a future 4.x release too.

I am against this; we should change this before Lucene 4.0. We have already 
seen on the user list that many people misunderstand it, so for me this issue 
is a Blocker for 4.0.




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452947#comment-13452947
 ] 

Robert Muir commented on LUCENE-4369:
-

{quote}
The problem with ExactMatch field is: If it is also stored, the name is 
misleading again, so KeywordField is better.
{quote}

I don't understand how storing is related. Storing is always the same.

{quote}
If we would 100% differentiate between stored and indexed fields while indexing 
(requiring that the field is also added 2 times, one time as indexed and one 
time as stored), I would be fine with MatchOnlyField and StoredStringField.
{quote}

In my opinion the only thing worse we could do to our .document API than 
StringField would be to require the user to add the field twice.





[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452956#comment-13452956
 ] 

Uwe Schindler commented on LUCENE-4369:
---

The names ExactMatchField and MatchOnlyField both have the problem that 
they only refer to the indexing side. I would be fine with such a name if the 
field were unstored by default, so that you have to turn on storing explicitly. 
If it is automatically stored, people will complain that their index contains 
too much useless garbage, because they expected an ExactMatchField to be used 
only for matching, so storing is wrong.

I would prefer: UntokenizedField or UntokenizedStringField
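
The indexed/stored distinction under discussion can be sketched as two independent flags. This is a toy Python model, not the Lucene document API; `FieldSpec`, `add_document` and the unstored-by-default setting are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical field description with independent index/store flags."""
    name: str
    value: str
    indexed: bool = True
    stored: bool = False  # assumption for this sketch: storing is opt-in


def add_document(index, store, doc_id, fields):
    """Add each field to the search index and/or the stored values, per its flags."""
    for f in fields:
        if f.indexed:
            index.setdefault((f.name, f.value), set()).add(doc_id)
        if f.stored:
            store.setdefault(doc_id, {})[f.name] = f.value


index, store = {}, {}
add_document(index, store, 1, [
    FieldSpec("id", "doc-1", stored=True),  # searchable and retrievable
    FieldSpec("category", "Books"),         # searchable only
])
```

Here both fields can be matched exactly, but only `id` takes up space in the stored values, which is the behavior a user might expect from a field whose name mentions matching only.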




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452957#comment-13452957
 ] 

Robert Muir commented on LUCENE-4369:
-

{quote}
I am against this, we should change this before Lucene 4.0. We have seen 
already on user list that many people understand it wrong, so for me this issue 
is a Blocker for 4.0.
{quote}

I disagree with this. I've watched NOT_ANALYZED pop up on the user list for 
older releases time after time. It's frustrating, but this problem is nothing 
new.
It's not introduced with 4.0: I opened this issue because I thought it was 
useful feedback from someone testing the Lucene 4.0 BETA, and it's really 
trivial to fix once we settle on a name.

I don't think we should try to block releases when nobody can even agree on a 
good name yet.

We should instead focus on picking a good name: we can implement this for 4.1 
or 5.0 or whatever.





[jira] [Created] (SOLR-3824) Velocity: Error messages from search not displayed

2012-09-11 Thread JIRA
Jan Høydahl created SOLR-3824:
-

 Summary: Velocity: Error messages from search not displayed
 Key: SOLR-3824
 URL: https://issues.apache.org/jira/browse/SOLR-3824
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
 Fix For: 4.1, 5.0


Error messages are not displayed in Solritas GUI.

Example: In SolrCloud mode I have two shards, but shut down shard B. Then there 
is an error message:

{code}
<lst name="error">
  <str name="msg">no servers hosting shard:</str>
  <int name="code">503</int>
</lst>
{code}

However this is not displayed by Velocity template, it shows an empty search 
result.
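
As a rough illustration of the check the template is missing, the Python sketch below looks for an error section before rendering results. It is not the Velocity fix itself: `extract_error` is a hypothetical helper, and the `<response>` wrapper around the error section is an assumption about the surrounding XML.

```python
import xml.etree.ElementTree as ET


def extract_error(response_xml):
    """Return (code, message) if the response carries an error section, else None."""
    root = ET.fromstring(response_xml)
    error = root.find("lst[@name='error']")
    if error is None:
        return None
    msg = error.findtext("str[@name='msg']", default="")
    code = error.findtext("int[@name='code']", default="")
    return (int(code) if code else None, msg)


# Sample response; the <response> root element is assumed for this sketch.
sample = """<response>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>"""
```

A front end could call something like `extract_error` first and show the message prominently when it returns a value, instead of falling through to an empty result list.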




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452960#comment-13452960
 ] 

Robert Muir commented on LUCENE-4369:
-

{quote}
The names ExactMatchField or MatchOnlyField both have the problem, that 
they only refer to the indexing side.
{quote}

I don't know, I actually like ExactMatchField the best because it specifies 
exactly what I want it to specify.

MatchOnly is not as good because you can actually do things like sort (the 
javadocs mention this as one reason you would use this field type), but 
ExactMatch just refers to the search behavior, which is what I am really 
concerned about. It doesn't imply you cannot store it; it just tells you how 
search behaves.





[jira] [Commented] (LUCENE-4345) Create a Classification module

2012-09-11 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452980#comment-13452980
 ] 

Tommaso Teofili commented on LUCENE-4345:
-

Thanks Lance for your useful insights, I'll definitely have a look :) .

bq. If you use index data which is already analyzed with the same analyzer as 
your test (unseen) documents, you can use a lot more documents as input. More 
is better. As the training data increases, signal drives out noise.

I agree, we could leverage this for sure.

bq. Once you add the ability to store & load models, training speed becomes 
less important.

Regarding storing and loading models, the base intuition (at least my intuition 
:P) in the case of Lucene is that the index itself plays that role.

 Create a Classification module
 --

 Key: LUCENE-4345
 URL: https://issues.apache.org/jira/browse/LUCENE-4345
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
 Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, 
 SOLR-3700_2.patch, SOLR-3700.patch


 Lucene/Solr can host huge sets of documents containing lots of information in 
 fields so that these can be used as training examples (w/ features) in order 
 to very quickly create classifiers algorithms to use on new documents and / 
 or to provide an additional service.
 So the idea is to create a contrib module (called 'classification') to host a 
 ClassificationComponent that will use already seen data (the indexed 
 documents / fields) to classify new documents / text fragments.
 The first version will contain a (simplistic) Lucene based Naive Bayes 
 classifier but more implementations should be added in the future.
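
For readers unfamiliar with the approach, here is a minimal Naive Bayes sketch in Python. It is not the code in the attached patches: plain counters stand in for the term and document statistics an index would provide, and `train`, `classify` and the sample data are hypothetical. It uses add-one smoothing, which is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Collect per-class document counts and term counts from (tokens, label) pairs."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def classify(tokens, model):
    """Pick the class with the highest log prior plus smoothed log likelihoods."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(term_counts[label].values())
        for t in tokens:
            # Add-one smoothing keeps unseen terms from zeroing out a class.
            score += math.log((term_counts[label][t] + 1) / (class_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    (["lucene", "index", "search"], "tech"),
    (["query", "index", "analyzer"], "tech"),
    (["goal", "match", "team"], "sport"),
])
```

The counts gathered in `train` are exactly the kind of statistics an index already holds for its documents, which is why already indexed data can serve as training data without a separate model file.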




[jira] [Commented] (LUCENE-4345) Create a Classification module

2012-09-11 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452987#comment-13452987
 ] 

Tommaso Teofili commented on LUCENE-4345:
-

By the way, if no one objects, I plan to commit this shortly so that we can 
improve things directly by patching the trunk.




[jira] [Created] (SOLR-3825) Log document IDs when they are retrieved

2012-09-11 Thread Scott Stults (JIRA)
Scott Stults created SOLR-3825:
--

 Summary: Log document IDs when they are retrieved
 Key: SOLR-3825
 URL: https://issues.apache.org/jira/browse/SOLR-3825
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Scott Stults
Priority: Trivial


During relevancy tuning it's important to know exactly which documents the 
client has seen. Right now the only way to get that list is to splice into the 
HTTP traffic. Preferably the IDs could be logged along with the query.
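
A minimal sketch of the requested behavior, in Python rather than as a Solr component: the logger name, the `format_request_log` helper and the log line layout are all assumptions for illustration, not what the attached patch does.

```python
import logging

logger = logging.getLogger("search.requests")  # hypothetical logger name


def format_request_log(query, doc_ids):
    """Build one log line holding the query and the IDs returned to the client."""
    ids = ",".join(str(doc_id) for doc_id in doc_ids)
    return "q=%s hits=%d ids=%s" % (query, len(doc_ids), ids)


def log_search(query, doc_ids):
    """Log the query together with the returned document IDs, in rank order."""
    line = format_request_log(query, doc_ids)
    logger.info(line)
    return line
```

Keeping the IDs in rank order on the same line as the query makes it possible to reconstruct what the client saw from the log alone, without capturing HTTP traffic.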




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452998#comment-13452998
 ] 

Jack Krupansky commented on LUCENE-4369:


I would suggest RawTextField. Or, ExactTextField. Or, 
UnanalyzedTextField. I mean, text is text to an average user. Generally, 
people should use TextField for text, but use StringField when they need the 
exact, raw text as is and without being tokenized or otherwise changed.

KeywordTokenizer is confusing since it really is NoTokenizer or 
ExactTextTokenizer or RawTextTokenizer.

Is there currently a wiki page that describes the distinction between "match" 
and "search"? I would not expect an average user to know the distinction.







[jira] [Updated] (SOLR-3825) Log document IDs when they are retrieved

2012-09-11 Thread Scott Stults (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Stults updated SOLR-3825:
---

Attachment: SOLR-3825.patch




[jira] [Updated] (SOLR-3824) Velocity: Error messages from search not displayed

2012-09-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3824:
--

Attachment: SOLR-3824.patch

First patch, showing any error section in a big red box.

To test, try e.g. 
{noformat}
http://localhost:8983/solr/collection1/browse?defType=lucene&q=%22a
{noformat}

 Velocity: Error messages from search not displayed
 --

 Key: SOLR-3824
 URL: https://issues.apache.org/jira/browse/SOLR-3824
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
 Fix For: 4.1, 5.0

 Attachments: SOLR-3824.patch


 Error messages are not displayed in Solritas GUI.
 Example: In SolrCloud mode I have two shards, but shut down shard B. Then 
 there is an error message:
 {code}
 <lst name="error">
 <str name="msg">no servers hosting shard:</str>
 <int name="code">503</int>
 </lst>
 {code}
 However this is not displayed by Velocity template, it shows an empty search 
 result.




[jira] [Updated] (SOLR-3824) Velocity: Error messages from search not displayed

2012-09-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3824:
--

Fix Version/s: (was: 4.1)
   4.0

Nice to include this in 4.0 as well, since the likelihood of errors in a 
sharded environment is larger.

 Velocity: Error messages from search not displayed
 --

 Key: SOLR-3824
 URL: https://issues.apache.org/jira/browse/SOLR-3824
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
 Fix For: 4.0, 5.0

 Attachments: SOLR-3824.patch


 Error messages are not displayed in Solritas GUI.
 Example: In SolrCloud mode I have two shards, but shut down shard B. Then 
 there is an error message:
 {code}
 <lst name="error">
 <str name="msg">no servers hosting shard:</str>
 <int name="code">503</int>
 </lst>
 {code}
 However this is not displayed by Velocity template, it shows an empty search 
 result.




[jira] [Resolved] (SOLR-3824) Velocity: Error messages from search not displayed

2012-09-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3824.
---

Resolution: Fixed

Committed to trunk r1383405 and branch_4x r1383412

 Velocity: Error messages from search not displayed
 --

 Key: SOLR-3824
 URL: https://issues.apache.org/jira/browse/SOLR-3824
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
 Fix For: 4.0, 5.0

 Attachments: SOLR-3824.patch


 Error messages are not displayed in Solritas GUI.
 Example: In SolrCloud mode I have two shards, but shut down shard B. Then 
 there is an error message:
 {code}
 <lst name="error">
 <str name="msg">no servers hosting shard:</str>
 <int name="code">503</int>
 </lst>
 {code}
 However this is not displayed by Velocity template, it shows an empty search 
 result.




[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query

2012-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453021#comment-13453021
 ] 

Jan Høydahl commented on SOLR-3243:
---

Bill Bell, would you care to test this and comment?

I think there is still a loophole for a bare * query - it gets expanded across 
all fields as well and is less efficient than a MatchAllDocsQuery, which is 
more likely to be the intent when issuing a *. Perhaps we can incorporate that 
in this issue as well?

 eDismax and non-fielded range query
 ---

 Key: SOLR-3243
 URL: https://issues.apache.org/jira/browse/SOLR-3243
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Critical
 Fix For: 5.0

 Attachments: SOLR-3243.patch


 Reported by Bill Bell in SOLR-3085:
 If you enter a non-fielded open-ended range in the search box, like [* TO *], 
 eDismax will expand it to all fields:
 {noformat}
 +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO 
 *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0))
 {noformat}
 This does not make sense, and a side effect is that range queries for strings 
 are very expensive, open-ended even more, and you can totally crash the 
 search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) 
 a few times...




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453022#comment-13453022
 ] 

Uwe Schindler commented on LUCENE-4369:
---

Thanks Jack, that is exactly my opinion too: we just need good names. I like 
yours, too. "Raw" is a good term, too. The MatchOnly or ExactMatch terms are, 
in my opinion, not very good, sorry.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField, StringField 
 screws up scoring and bypasses your Analyzer.
 (see java-user thread Custom Analyzer Not Called When Indexing as an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453036#comment-13453036
 ] 

Robert Muir commented on LUCENE-4369:
-

{quote}
Raw is a good term, too.
{quote}

+1, let's think about that.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField, StringField 
 screws up scoring and bypasses your Analyzer.
 (see java-user thread Custom Analyzer Not Called When Indexing as an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.




[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments

2012-09-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453050#comment-13453050
 ] 

Adrien Grand commented on LUCENE-4258:
--

On slide 4 one of the enumerated operations is field deletion but I am not sure 
how to do it with the proposed API on slide 5?

It is just a thought, but your work plan only mentions Lucene fields. Wouldn't 
it be easier to start working with DocValues? I guess it would help us get 
started with document updates and would already solve most use-cases (I'm 
especially thinking of scoring factors).

 Incremental Field Updates through Stacked Segments
 --

 Key: LUCENE-4258
 URL: https://issues.apache.org/jira/browse/LUCENE-4258
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Sivan Yogev
 Attachments: IncrementalFieldUpdates.odp, 
 LUCENE-4258-API-changes.patch, LUCENE-4258-inner-changes.patch

   Original Estimate: 2,520h
  Remaining Estimate: 2,520h

 Shai and I would like to start working on the proposal to Incremental Field 
 Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453066#comment-13453066
 ] 

Steven Rowe commented on LUCENE-4369:
-

AuNaturelTextField

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField, StringField 
 screws up scoring and bypasses your Analyzer.
 (see java-user thread Custom Analyzer Not Called When Indexing as an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.




Collator-based facet sorting in Solr

2012-09-11 Thread Toke Eskildsen
Claudio Ranieri and I briefly discussed collator-based sorting for
facets in the thread "Problem with accented words sorting" on the
solr-user mailing list. Here's the idea:

Solr faceting supports sorting by either count or index order. Claudio
and I both need the order to be collator-based. My understanding of the
issue is that it is not currently possible.

Collator-based document sorting in Solr uses CollationKeys as field
values. This does not work with faceting on fields with multiple values
as there is no mapping from the key to the human readable value. 

ICU sort keys are always null (00) terminated and when two keys are
compared, the comparison stops as soon as null is reached(?)
http://userguide.icu-project.org/collation/architecture

If we concatenate the keys with the original values:
<key><00><original value><offset of original value>
we get an entity where the ordering is still correct upon comparison and
where the original value can be extracted by using the offset from the
last int (or maybe short, to spare 2 bytes) in the BytesRef.
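
A minimal sketch of the layout described above, in plain Java. Everything named here is hypothetical: `KeyWithOffset` is not a Lucene or Solr class, the sort-key bytes are made up rather than produced by ICU, and the trailing offset is assumed to be a 2-byte big-endian short. It also assumes that sort keys contain no 0 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or Solr.
final class KeyWithOffset {

    // Layout: <key><00><original value as UTF-8><2-byte big-endian offset of original value>
    static byte[] encode(byte[] sortKey, String original) {
        byte[] value = original.getBytes(StandardCharsets.UTF_8);
        int offset = sortKey.length + 1;            // value starts right after the 00 terminator
        if (offset > 0xFFFF) {
            throw new IllegalArgumentException("sort key too long for a 2-byte offset");
        }
        byte[] out = new byte[offset + value.length + 2];
        System.arraycopy(sortKey, 0, out, 0, sortKey.length);
        out[sortKey.length] = 0;                    // terminator
        System.arraycopy(value, 0, out, offset, value.length);
        out[out.length - 2] = (byte) (offset >>> 8);
        out[out.length - 1] = (byte) offset;
        return out;
    }

    // Read the offset from the last two bytes and slice out the original value.
    static String original(byte[] entry) {
        int offset = ((entry[entry.length - 2] & 0xFF) << 8) | (entry[entry.length - 1] & 0xFF);
        return new String(entry, offset, entry.length - 2 - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up keys: a collator would put "apple" before "Zebra",
        // although plain byte order of the values would not.
        byte[] apple = encode(new byte[] {0x2A}, "apple");
        byte[] zebra = encode(new byte[] {0x5C}, "Zebra");
        System.out.println(Arrays.compareUnsigned(apple, zebra) < 0);
        System.out.println(original(apple) + ", " + original(zebra));
    }
}
```

The comparison uses unsigned byte order (`Arrays.compareUnsigned`, Java 9+); since the key comes first, it decides the order whenever two keys differ.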

If the idea is sound, I'll open a JIRA issue. Unfortunately I do not
have time right now for hacking on it.





[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453070#comment-13453070
 ] 

Michael McCandless commented on LUCENE-4355:


+1, looks great!

 improve AtomicReader sugar apis
 ---

 Key: LUCENE-4355
 URL: https://issues.apache.org/jira/browse/LUCENE-4355
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 5.0, 4.0

 Attachments: LUCENE-4355.patch, LUCENE-4355.patch


 I thought about this after looking @ LUCENE-4353:
 AtomicReader has some sugar APIs that are over top of the flex apis (Fields, 
 Terms, ...). But these might be a little trappy/confusing compared to 3.x.
 # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and 
 .termPositionsEnum(Bits, ...). I also don't think we need variants that take 
 flags here. We should simplify these to be less trappy. I think we only need 
 (String, BytesRef) here.
 # This means you need to use the flex apis for more expert usage: but we make 
 this a bit too hard since we only let you get a Terms (which you must null 
 check, then call .iterator() on, then seekExact, ...). I think it could help 
 if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x 
 had a method that let you get a 'positioned termsenum'.




Re: Collator-based facet sorting in Solr

2012-09-11 Thread Robert Muir
On Tue, Sep 11, 2012 at 10:43 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:

 ICU sort keys are always null (00) terminated and when two keys are
 compared, the comparison stops as soon as null is reached(?)
 http://userguide.icu-project.org/collation/architecture

 If we concatenate the keys with the original values:
 <key><00><original value><offset of original value>
 we get an entity where the ordering is still correct upon comparison and
 where the original value can be extracted by using the offset from the
 last int (or maybe short, to spare 2 bytes) in the BytesRef.


I think the idea is sound, but I don't think we need the offset? I'm
fairly positive ICU collation keys explicitly avoid 0 bytes except for
the null terminator. So the original value can be extracted after the
fact just by looking for the terminator... such a thing could even be
done client-side, and I don't think we need the offset for speed either,
because it's something you would do before final display.
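
A minimal sketch of this offset-free variant, in plain Java. The class name `KeyWithTerminator` and the key bytes are hypothetical, and the sketch simply enforces the assumption under discussion (no 0 byte inside the sort key) rather than proving it for ICU.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene or Solr.
final class KeyWithTerminator {

    // Layout: <key><00><original value as UTF-8>; no trailing offset.
    static byte[] encode(byte[] sortKey, String original) {
        for (byte b : sortKey) {
            if (b == 0) {
                throw new IllegalArgumentException("sort key must not contain a 0 byte");
            }
        }
        byte[] value = original.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[sortKey.length + 1 + value.length];
        System.arraycopy(sortKey, 0, out, 0, sortKey.length);
        // out[sortKey.length] is already 0 and acts as the terminator
        System.arraycopy(value, 0, out, sortKey.length + 1, value.length);
        return out;
    }

    // Recover the original value by scanning for the first 0 byte.
    static String original(byte[] entry) {
        int i = 0;
        while (i < entry.length && entry[i] != 0) {
            i++;
        }
        if (i == entry.length) {
            throw new IllegalArgumentException("no terminator found");
        }
        return new String(entry, i + 1, entry.length - i - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] entry = encode(new byte[] {0x2A, 0x30}, "apple");
        System.out.println(original(entry));
    }
}
```

The scan is linear in the key length, which matters little if it only runs on the handful of facet values shown to the user.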

We need to verify that what I'm saying about avoiding 0 bytes is true;
I'll look into it.

Of course such an option is only useful for the new
ICUCollationAnalyzer (solr's ICUCollationField uses that)
because the older deprecated filters are encoded in a different way: I
think we should leave those alone.

-- 
lucidworks.com




Re: Collator-based facet sorting in Solr

2012-09-11 Thread Robert Muir
Just a concern where things could act a little funky:

Today, for example, if I set strength=primary, then it's going to fold
Test and test to the same unique term,
but under this scheme you would have <bytes>Test and <bytes>test as two terms.
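
A small sketch of that concern, in plain Java. Here `java.text.Collator` stands in for ICU, which is an assumption made only to keep the example JDK-only, and the class name is hypothetical. JDK collation keys may contain 0 bytes, so this illustrates the folding problem only, not the terminator scheme.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo; java.text.Collator stands in for ICU here.
final class PrimaryStrengthFolding {

    static Collator primaryCollator() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(Collator.PRIMARY);   // ignore case and accent differences
        return c;
    }

    // <key><00><original value>, as in the proposal
    static byte[] entry(Collator c, String original) {
        byte[] key = c.getCollationKey(original).toByteArray();
        byte[] value = original.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(key, key.length + 1 + value.length); // zero-padded
        System.arraycopy(value, 0, out, key.length + 1, value.length);
        return out;
    }

    public static void main(String[] args) {
        Collator c = primaryCollator();
        System.out.println("collator treats them as equal: "
                + (c.compare("Test", "test") == 0));
        System.out.println("indexed entries are equal:     "
                + Arrays.equals(entry(c, "Test"), entry(c, "test")));
    }
}
```

Because the original value is appended to the key, two values that the collator considers equal still end up as two distinct terms, and therefore as two facet entries.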

this could be undesirable in the typical case that you just want
case-insensitive facets: but we don't provide
any way to preprocess the text to avoid this.

Really a lot of this is because factory-based analysis chains have no
way to specify the AttributeFactory,
e.g. i guess if we really wanted to fix this right we would need to
pass in the AttributeFactory to TokenizerFactory's create() method.

But for now from Solr it would be a little hacky, e.g. someone is
gonna have to fold the case client-side or whatever
if they don't want these problems.


On Tue, Sep 11, 2012 at 10:43 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 Claudio Ranieri and I briefly discussed collator based sorting for
 facets in the thread Problem with accented words sorting on the
 solr-user mailing list. Here's the idea:

 Solr faceting supports sorting by either count or index order. Claudio
 and I both need the order to be collator-based. My understanding of the
 issue is that it is not currently possible.

 Collator-based document sorting in Solr uses CollationKeys as field
 values. This does not work with faceting on fields with multiple values
 as there is no mapping from the key to the human readable value.

 ICU sort keys are always null (00) terminated and when two keys are
 compared, the comparison stops as soon as null is reached(?)
 http://userguide.icu-project.org/collation/architecture

 If we concatenate the keys with the original values:
 <key><00><original value><offset of original value>
 we get an entity where the ordering is still correct upon comparison and
 where the original value can be extracted by using the offset from the
 last int (or maybe short, to spare 2 bytes) in the BytesRef.

 If the idea is sound, I'll open a JIRA issue. Unfortunately I do not
 have time right now for hacking on it.






-- 
lucidworks.com




[jira] [Resolved] (LUCENE-3891) Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3891.


   Resolution: Duplicate
Fix Version/s: (was: 4.1)
   5.0

Fixed in LUCENE-3312.

 Documents loaded at search time (IndexReader.document) should be a different 
 class from the index-time Document
 ---

 Key: LUCENE-3891
 URL: https://issues.apache.org/jira/browse/LUCENE-3891
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 5.0


 The fact that the Document you can load at search time is the same Document 
 class you had indexed is horribly trappy in Lucene, because, the loaded 
 document necessarily loses information like field boost, whether a field was 
 tokenized, etc.  (See LUCENE-3854 for a recent example).
 We should fix this, statically, so that it's an entirely different class at 
 search time vs index time.




[jira] [Resolved] (LUCENE-2915) make CoreCodecProvider convenience class so apps can easily pick per-field codecs

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2915.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0

PerFieldPostingsFormat solved this.

 make CoreCodecProvider convenience class so apps can easily pick per-field 
 codecs
 -

 Key: LUCENE-2915
 URL: https://issues.apache.org/jira/browse/LUCENE-2915
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2915.patch


 We already have DefaultCodecProvider, which simply registers all core codecs 
 and uses Standard for all fields, but it's package private.
 We should make this public, and name it CoreCodecProvider.




[jira] [Resolved] (LUCENE-2807) Improve test debuggability through ant

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2807.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0

Dawid already fixed these issues (thanks!).

 Improve test debuggability through ant
 --

 Key: LUCENE-2807
 URL: https://issues.apache.org/jira/browse/LUCENE-2807
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Reporter: Michael McCandless
 Fix For: 4.0


 Some small improvements would go a long ways...
 When trying to debug an intermittent fail, I usually run w/ 
 -Dtests.verbose=true and w/ many iters.  But because the formatter buffers 
 this can hit OOME, so maybe we make an unbuffered formatter.  Also, it'd be 
 nice if we could have the formatter discard output for a given iter if there 
 was no failure, and I think the iters should stop as soon as a failure is hit.
 Maybe somehow we make a new tests.mode that would switch on these behaviours?
 Unbuffered formatter is also vital when debugging a deadlock...




[jira] [Assigned] (SOLR-3825) Log document IDs when they are retrieved

2012-09-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-3825:
-

Assignee: Grant Ingersoll

 Log document IDs when they are retrieved
 

 Key: SOLR-3825
 URL: https://issues.apache.org/jira/browse/SOLR-3825
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Scott Stults
Assignee: Grant Ingersoll
Priority: Trivial
 Attachments: SOLR-3825.patch


 During relevancy tuning it's important to know exactly which documents the 
 client has seen. Right now the only way to get that list is to splice into 
 the HTTP traffic. Preferably the IDs could be logged along with the query.




[jira] [Commented] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4

2012-09-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453133#comment-13453133
 ] 

David Smiley commented on LUCENE-4197:
--

committed removal of PrefixCellsTokenizer

 Small improvements to Lucene Spatial Module for v4
 --

 Key: LUCENE-4197
 URL: https://issues.apache.org/jira/browse/LUCENE-4197
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0

 Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, 
 LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch,
  SpatialArgs-_remove_unused_min_and_max_params.patch


 This issue is to capture small changes to the Lucene spatial module that 
 don't deserve their own issue.




[jira] [Resolved] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-3823.
--

Resolution: Invalid

The error is caused by the colon character; it has meaning in a query and must 
be escaped. See: http://lucene.apache.org/core/3_6_1/queryparsersyntax.html.
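
As a rough illustration of what escaping means here, the sketch below backslash-escapes the special characters listed on the query syntax page linked above. `QueryEscapeSketch` and `escapeSpecials` are hypothetical names. Lucene's classic `QueryParser` has its own static `escape` method, which is the better choice in real code; this JDK-only version just shows the idea.

```java
// Hypothetical helper, written only to illustrate backslash-escaping.
final class QueryEscapeSketch {

    // Characters with special meaning in the classic query syntax:
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    static String escapeSpecials(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');           // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSpecials("title:foo"));
    }
}
```

Escaping only applies to text that should be matched literally; a clause where the colon is meant as the field separator should not be escaped.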

So I'll close this as invalid, if you disagree please let us know.

BTW, it's better to raise this kind of question on the user's list rather than 
open a JIRA, at least until you're sure it's really a bug.

 Parentheses in a boost query cause errors
 -

 Key: SOLR-3823
 URL: https://issues.apache.org/jira/browse/SOLR-3823
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0-BETA
 Environment: Mac, jdk 1.6, Chrome
Reporter: Mathos Marcer

 When using a boost query (bq) that contains parentheses (like this example 
 from the Relevancy Cookbook section of the wiki):
  ? defType = dismax 
  q = foo bar 
  bq = (*:* -xxx)^999 
 You get the following error:
 org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': 
 Encountered " ")" ") "" at line 1, column 12. Was expecting one of: <EOF> 
 <AND> ... <OR> ... <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... 
 "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> 
 ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ...




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453137#comment-13453137
 ] 

Robert Muir commented on LUCENE-4355:
-

Thanks Mike: I'll give it some time in case anyone else wants to review, but I'd 
like to commit this in a day or two.

 improve AtomicReader sugar apis
 ---

 Key: LUCENE-4355
 URL: https://issues.apache.org/jira/browse/LUCENE-4355
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 5.0, 4.0

 Attachments: LUCENE-4355.patch, LUCENE-4355.patch


 I thought about this after looking @ LUCENE-4353:
 AtomicReader has some sugar APIs that are over top of the flex apis (Fields, 
 Terms, ...). But these might be a little trappy/confusing compared to 3.x.
 # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and 
 .termPositionsEnum(Bits, ...). I also don't think we need variants that take 
 flags here. We should simplify these to be less trappy. I think we only need 
 (String, BytesRef) here.
 # This means you need to use the flex apis for more expert usage: but we make 
 this a bit too hard since we only let you get a Terms (which you must null 
 check, then call .iterator() on, then seekExact, ...). I think it could help 
 if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x 
 had a method that let you get a 'positioned termsenum'.




[jira] [Resolved] (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2723.


Resolution: Won't Fix

We nuked the low level bulk postings API... and BlockPostingsFormat now does 
bulk reads under the hood and gives great performance ...

 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.1

 Attachments: LUCENE-2723-BulkEnumWrapper.patch, 
 LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, 
 LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723_termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723_wastedint.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out on airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address this performance
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but, I
  have yet to cutover positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute).
   * Deleted docs are not filtered out.
   * The doc & freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...
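
To make the "doc deltas, not absolute values" point concrete, here is a minimal sketch of what a consumer of such a buffer would have to do. `DocDeltas` and `toDocIds` are hypothetical names, not part of the patch, and the sketch assumes that the first delta is relative to docID 0.

```java
import java.util.Arrays;

// Hypothetical helper, not Lucene's API.
final class DocDeltas {

    // Turn the first `count` doc deltas into absolute docIDs with a running sum.
    static int[] toDocIds(int[] deltas, int count) {
        int[] docIds = new int[count];
        int doc = 0;                       // assumption: first delta is relative to 0
        for (int i = 0; i < count; i++) {
            doc += deltas[i];
            docIds[i] = doc;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = toDocIds(new int[] {3, 1, 4, 10}, 4);
        System.out.println(Arrays.toString(docIds));   // prints [3, 4, 8, 18]
    }
}
```

Since deleted docs are not filtered out by the API described above, a real consumer would also need to check each resulting docID against the set of live docs.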




[jira] [Resolved] (LUCENE-2301) search for & fix all TODO 4.0 comments before releasing 4.0

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2301.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0
   5.0

I can't find any more TODO 4.0s ... lots of generic TODOs though :)

 search for & fix all TODO 4.0 comments before releasing 4.0
 -

 Key: LUCENE-2301
 URL: https://issues.apache.org/jira/browse/LUCENE-2301
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Michael McCandless
Priority: Minor
 Fix For: 5.0, 4.0


 Let's try to use the specific string?:
 {code}
 TODO 4.0
 {code}
 to mark any place where we must do something for 4.0?




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453151#comment-13453151
 ] 

Erick Erickson commented on LUCENE-4369:


Anything with Raw is good. The problem with Keyword or Untokenized or 
Unanalyzed in the name is that it rather assumes that the user is familiar 
with what those terms mean in Lucene. If they're experienced enough to 
understand _that_, they're less likely to fall into this error in the first 
place.

We could do something that removes it from consideration unless people dig. I 
understand it's a general field, but how about something like Identifier (I'm 
not too keen on that name actually). I'm reaching for something that is 
naturally thought of as a type suitable for uniqueKey fields but requires 
one to dig a bit before using it for other fields.

OK, an idea out of left field, why do we have a string as a type anyway? Does 
it make any sense to just remove it and have people use KeywordTokenizer when 
they want this behavior? I'm ready for _this_ idea to be shot down in flames 
G

I suppose in the Solr world, we could just remove the string type from 
schema.xml and provide an example fieldType that was only KeywordTokenized and 
avoid a world of confusion for many users.

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" as an 
 example.)
 The name we use here is vital, otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.
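
To make the difference concrete, here is a toy model of which terms end up in the index for each kind of field. It is not Lucene code: the class is hypothetical, and the "analyzer" is a stand-in that lowercases and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model, not Lucene code: shows which terms would be indexed.
final class FieldKindSketch {

    /** StringField-like: the whole value becomes a single term, untouched. */
    static List<String> exactTerms(String value) {
        return Collections.singletonList(value);
    }

    /** TextField-like, with a stand-in analyzer: lowercase, split on non-alphanumerics. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

A query for the term "york" would match the analyzed form of "New York" but not the exact form, which is the surprise the issue describes.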




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453159#comment-13453159
 ] 

Robert Muir commented on LUCENE-4369:
-

{quote}
OK, an idea out of left field, why do we have a string as a type anyway? Does 
it make any sense to just remove it and have people use KeywordTokenizer when 
they want this behavior? I'm ready for this idea to be shot down in flames 
G
{quote}

I've said the same thing before, but I figure I won't get consensus for that. 

I'm happy to just get the name to be anything but String for now :)

It's still screwed up that there are things like setBoost() on StringField at 
all when it omits norms etc., and screwed up that it bypasses the Analyzer (the 
classic NOT_ANALYZED problem), but fixing the name would at least help.






[jira] [Updated] (LUCENE-2163) Remove synchronized from DirReader.reopen/clone

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2163:
---

Attachment: LUCENE-2163.patch

We've made awesome progress removing sync'd, now that SR is read-only ... but I 
found a few remaining sync'd in StandardDirectoryReader that I *think* are not 
necessary?

Eg doClose is already protected by IR.close (only one thread will decRef to 
RC=0).

And for doOpenIfChanged/noWriter... why do they need to be sync'd?  If it's 
solely to prevent strange exceptions when one thread is closing while another 
is reopening ... I don't think we need to do that (it's best effort, and I 
think likely you'd get ACE anyway since we'd try to incRef an already-closed 
SR)?

But then again I suppose the sync'd are not really hurting anything (it won't 
block searches since nothing else is sync'd...).  Still it's nice to remove 
them if we can, in case something on the search path does become sync'd at some 
point ...
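
A minimal sketch of why doClose needs no lock of its own when it is guarded by reference counting: only the one thread that takes the count to 0 ever reaches it. The class below is a hypothetical stand-in, not Lucene's IndexReader, and IllegalStateException stands in for the already-closed exception (ACE) mentioned above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted reader; not Lucene's IndexReader.
final class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicInteger closeCalls = new AtomicInteger();

    /** Fails if the last reference has already been released. */
    void incRef() {
        int rc;
        do {
            rc = refCount.get();
            if (rc <= 0) {
                throw new IllegalStateException("already closed");
            }
        } while (!refCount.compareAndSet(rc, rc + 1));
    }

    /** Exactly one caller observes the transition to 0 and runs doClose(). */
    void decRef() {
        int rc = refCount.decrementAndGet();
        if (rc == 0) {
            doClose();
        } else if (rc < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    // No synchronized needed here: only the thread that reached 0 gets in.
    private void doClose() {
        closeCalls.incrementAndGet();
    }

    int closeCalls() {
        return closeCalls.get();
    }
}
```

A reopen that races with close either wins the compare-and-set and keeps the reader alive, or fails with the exception; neither path needs a monitor.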

 Remove synchronized from DirReader.reopen/clone
 ---

 Key: LUCENE-2163
 URL: https://issues.apache.org/jira/browse/LUCENE-2163
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 4.1

 Attachments: LUCENE-2163.patch


 Spinoff from LUCENE-2161, where the fact that DirReader.reopen is
 sync'd was dangerous in the context of NRT (could block all searches
 against that reader when CMS was throttling).  So, with LUCENE-2161,
 we're removing the synchronization when it's an NRT reader that you're
 reopening.
 But... why should we sync even for a normal reopen?  There are
 various sync'd methods on IndexReader/DirReader (we are reducing that,
 with LUCENE-2161 and also LUCENE-2156), but, in general it doesn't
 seem like normal reopen really needs to be sync'd.  Performing a reopen
 shouldn't incur any chance of blocking a search...




[jira] [Resolved] (LUCENE-2143) Understand why NRT performance is affected by flush frequency

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2143.


Resolution: Not A Problem

This is a hotspot issue ... not much we can do about it.

 Understand why NRT performance is affected by flush frequency
 -

 Key: LUCENE-2143
 URL: https://issues.apache.org/jira/browse/LUCENE-2143
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.1

 Attachments: SearchTest.java


 In LUCENE-2061 (perf tests for NRT), I test NRT performance by first
 getting a baseline QPS with only searching, using enough threads to
 saturate.
 Then, I pick an indexing rate (I used 100 docs/sec), and index docs at
 that rate, and I also reopen a NRT reader at different frequencies
 (10/sec, 1/sec, every 5 seconds, etc.), and then again test QPS
 (saturated).
 I think this is a good approach for testing NRT -- apps can see, as a
 function of freshness and at a fixed indexing rate, what the cost is
 to QPS.  You'd expect as index rate goes up, and freshness goes up,
 QPS will go down.
 But I found something very strange: the low frequency reopen rates
 often caused a highish hit to QPS.  When I forced IW to flush every
 100 docs (= once per second), the performance was generally much
 better.
 I actually would've expected the reverse -- flushing in batch ought to
 use fewer resources.
 One theory is something odd about my test env (based on OpenSolaris),
 so I'd like to retest on a more mainstream OS.
 I'm opening this issue to get to the bottom of it...




[jira] [Resolved] (LUCENE-2093) Use query-private scope instead of shared Term-TermInfo cache

2012-09-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2093.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0
   5.0

We've improved queries so they now save their own term state during rewrite and 
re-use it during matching.

 Use query-private scope instead of shared Term-TermInfo cache
 --

 Key: LUCENE-2093
 URL: https://issues.apache.org/jira/browse/LUCENE-2093
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Michael McCandless
Priority: Minor
 Fix For: 5.0, 4.0


 Spinoff of LUCENE-2075.
 We currently use a shared terms cache so multiple resolves of the same term 
 within execution of a single query save CPU.  But this ties up a good amount 
 of long term RAM...
 So, it might be better to instead create a query-private scope, where 
 places in Lucene like the terms dict could store & retrieve results.  The 
 scope would be private to each running query, and would be GCable as soon as 
 the query completes.  Then we'd have a perfect within-query hit rate...
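
A minimal sketch of the query-private scope idea, assuming term state can be represented as a long and the terms dict as a lookup function. All names are hypothetical; this is not how Lucene's term state is actually stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative, not Lucene classes.
final class QueryScopeSketch {
    private final Map<String, Long> termState = new HashMap<>();
    private int lookups = 0;

    /** Resolves a term at most once per query; later calls reuse the saved state. */
    long resolve(String term, Function<String, Long> termsDict) {
        Long cached = termState.get(term);
        if (cached == null) {
            lookups++;                       // only the first resolve hits the terms dict
            cached = termsDict.apply(term);
            termState.put(term, cached);
        }
        return cached;
    }

    int lookups() {
        return lookups;
    }
}
```

One instance would be created per running query and dropped when the query completes, so nothing is held in long-term RAM.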




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453174#comment-13453174
 ] 

Michael McCandless commented on LUCENE-4369:


I think it's useful to have a dedicated sugar field for things like primary 
keys, URLs, enumerated fields (country, state, zip code), entitlements 
fields (ACLs), tags, etc., and when users do this directly today I suspect they 
often forget to disable norms and index with docs-only.

But I agree the name is trappy now.

+1 for ExactTextField.  I don't really like raw: it sounds too ... low level. 
 Like it's not even gonna be indexed or something.





[jira] [Commented] (LUCENE-2163) Remove synchronized from DirReader.reopen/clone

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453180#comment-13453180
 ] 

Uwe Schindler commented on LUCENE-2163:
---

If doClose is protected by close() it is a no-op, so it does not matter if 
there is a sync or not :-)

The other ones I already wanted to remove while refactoring, I just forgot to 
do it. I think this issue is a relic from earlier times... I would just commit 
that removal. If you sync on reopen, you must sync everything.





[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453199#comment-13453199
 ] 

Robert Muir commented on LUCENE-2684:
-

{quote}
It is actually possible to force this, today, by having your collector return 
false from acceptsDocsOutOfOrder...
{quote}

Well, you are using a custom collector anyway if you are doing this, so can't we 
just add a sentence to that method's javadocs indicating that you should return 
false if you want to use the scorer navigation APIs?

 it's not possible to access sub-query's freq information if BooleanScorer is 
 use
 

 Key: LUCENE-2684
 URL: https://issues.apache.org/jira/browse/LUCENE-2684
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Michael McCandless
 Fix For: 4.1


 LUCENE-2590 added an advanced feature, allowing an app to gather all 
 sub-scorers for any Query.
 This is powerful because then, during collection, the app can get some 
 details about how each sub-query participated in the overall match for the 
 given document.
 However, I think this is completely broken if the BooleanQuery uses 
 BooleanScorer, because that scorer is not doc-at-once.  Instead, it batch 
 processes chunks of 2048 sequential docIDs per scorer.  This is a big 
 performance gain, but it means that the sub scorers will all be positioned to 
 the end of the 2048 doc chunk while the docs that matched within that chunk 
 are collected.
 I don't think we can easily fix this... likely the fix is to make it 
 easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)?  It is 
 actually possible to force this, today, by having your collector return false 
 from acceptsDocsOutOfOrder...
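
A toy model of the positioning problem, assuming a sub-scorer can be reduced to a cursor over sorted docIDs. Nothing here is Lucene code and all names are hypothetical. It only shows why a sub-scorer inspected at collect time no longer sits on the hit when hits are gathered a window at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not BooleanScorer or BooleanScorer2.
final class WindowScoringSketch {

    /** Stand-in for a sub-scorer: a cursor over sorted docIDs. */
    static final class Cursor {
        private final int[] docs;
        private int pos = -1;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        /** Current docID: -1 before the first next(), Integer.MAX_VALUE when exhausted. */
        int docID() {
            if (pos < 0) {
                return -1;
            }
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }

        int next() {
            pos++;
            return docID();
        }
    }

    /** Batch style: drain the whole window first, collect afterwards. */
    static List<Integer> positionsSeenBatch(int[] docs, int windowEnd) {
        Cursor sub = new Cursor(docs);
        int buffered = 0;
        while (sub.next() < windowEnd) {
            buffered++;                 // hit is remembered, not yet collected
        }
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < buffered; i++) {
            seen.add(sub.docID());      // cursor already sits past the window
        }
        return seen;
    }

    /** Doc-at-once style: collect each hit while the cursor is still on it. */
    static List<Integer> positionsSeenInOrder(int[] docs, int windowEnd) {
        Cursor sub = new Cursor(docs);
        List<Integer> seen = new ArrayList<>();
        while (sub.next() < windowEnd) {
            seen.add(sub.docID());
        }
        return seen;
    }
}
```

For docs {3, 7, 2500} and a window ending at 2048, the batch variant reports the cursor at 2500 for both hits, while the doc-at-once variant reports 3 and 7.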




[jira] [Commented] (LUCENE-4345) Create a Classification module

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453208#comment-13453208
 ] 

Robert Muir commented on LUCENE-4345:
-

Can we remove the ClassificationException? It only seems to box IOException... 
we can just throw IOException directly instead?

 Create a Classification module
 --

 Key: LUCENE-4345
 URL: https://issues.apache.org/jira/browse/LUCENE-4345
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
 Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, 
 SOLR-3700_2.patch, SOLR-3700.patch


 Lucene/Solr can host huge sets of documents containing lots of information in 
 fields so that these can be used as training examples (w/ features) in order 
 to very quickly create classifier algorithms to use on new documents and / 
 or to provide an additional service.
 So the idea is to create a contrib module (called 'classification') to host a 
 ClassificationComponent that will use already seen data (the indexed 
 documents / fields) to classify new documents / text fragments.
 The first version will contain a (simplistic) Lucene based Naive Bayes 
 classifier but more implementations should be added in the future.




[jira] [Created] (LUCENE-4373) BBoxStrategy should support query shapes of any type

2012-09-11 Thread David Smiley (JIRA)
David Smiley created LUCENE-4373:


 Summary: BBoxStrategy should support query shapes of any type
 Key: LUCENE-4373
 URL: https://issues.apache.org/jira/browse/LUCENE-4373
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Priority: Minor


It's great that BBoxStrategy has sophisticated shape area similarity based on 
bounding box, but I think that doesn't have to preclude having a 
non-rectangular query shape.  The bbox to bbox query implemented already is 
probably pretty fast as it can work by numeric range queries, but I'd like 
this to be the first stage, of which the 2nd is a FieldCache based comparison to 
the query shape if it's not a rectangle.
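
A minimal sketch of the two-stage idea, assuming the query shape is a circle and each indexed shape is represented by its bounding box as {minX, minY, maxX, maxY}. The class is hypothetical; plain arrays stand in for the indexed fields and the FieldCache.

```java
// Illustrative only: not BBoxStrategy code.
final class TwoStageShapeFilterSketch {

    /**
     * @param rect indexed bounding box as {minX, minY, maxX, maxY}
     * @return true if the box intersects the circle centered at (cx, cy) with radius r
     */
    static boolean intersectsCircle(double[] rect, double cx, double cy, double r) {
        // Stage 1: cheap bbox-to-bbox test against the circle's bounding box.
        if (rect[2] < cx - r || rect[0] > cx + r
                || rect[3] < cy - r || rect[1] > cy + r) {
            return false;
        }
        // Stage 2: exact test on survivors. Clamp the center into the box to
        // find the box's closest point, then compare its distance to r.
        double nx = Math.max(rect[0], Math.min(cx, rect[2]));
        double ny = Math.max(rect[1], Math.min(cy, rect[3]));
        double dx = nx - cx;
        double dy = ny - cy;
        return dx * dx + dy * dy <= r * r;
    }
}
```

A box whose corner sits at (0.8, 0.8) passes the first stage for a unit circle at the origin but is rejected by the second, which is the case the second stage exists for.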




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453218#comment-13453218
 ] 

Uwe Schindler commented on LUCENE-2684:
---

I think this issue is fixed already? VisitSubScorers works in 3.6.2 (if it gets 
released, Robert backported) and in 4.0 it's working, too?

As you need a custom collector anyway to make use of Scorer.getChildren(), we 
should maybe make BS1 throw UOE on getChildren() in 4.0 (explaining that you 
need inOrder) and visitSubScorers in 3.6.2?





[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453223#comment-13453223
 ] 

Robert Muir commented on LUCENE-2684:
-

{quote}
As you need a custom collector anyway to make use of Scorer.getChildren(), we 
should maybe make BS1 throw UOE on getChildren() in 4.0 (explaining that you 
need inOrder) and visitSubScorers in 3.6.2?
{quote}

+1, I think for freq() and getChildren() we should throw UOE with text like 
this. But we can also do the javadocs too.

Then I think there would be a lot fewer surprises.





[jira] [Created] (LUCENE-4374) Spatial- rename vector.TwoDoublesStrategy to vector.PointVectorStrategy

2012-09-11 Thread David Smiley (JIRA)
David Smiley created LUCENE-4374:


 Summary: Spatial- rename vector.TwoDoublesStrategy to 
vector.PointVectorStrategy
 Key: LUCENE-4374
 URL: https://issues.apache.org/jira/browse/LUCENE-4374
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
 Fix For: 4.0


TwoDoubles isn't necessarily appropriate since it could be two floats, once it 
is enhanced to make that configurable.  I like PointVector because it's clear 
it indexes points.  Eventually I could imagine a CircleVectorStrategy in the 
same package.

This does suggest BBoxStrategy should be RectVectorStrategy in the vector 
package.




[jira] [Created] (SOLR-3826) Allow unit test classes to specify core name during setup

2012-09-11 Thread Amit Nithian (JIRA)
Amit Nithian created SOLR-3826:
--

 Summary: Allow unit test classes to specify core name during setup
 Key: SOLR-3826
 URL: https://issues.apache.org/jira/browse/SOLR-3826
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0-BETA
Reporter: Amit Nithian
Priority: Minor
 Fix For: 4.0


When creating a unit test extending SolrTestCaseJ4, the corename is forced to 
collection1, which can be problematic if you want to do unit tests relying on 
schema/solrconfig specific to a core. Rather than hard-coding collection1, 
allow this to be specified in the initCore method.




[jira] [Updated] (SOLR-3826) Allow unit test classes to specify core name during setup

2012-09-11 Thread Amit Nithian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Nithian updated SOLR-3826:
---

Attachment: SOLR-3826.patch

Simple patch demonstrating what I am talking about. Hopefully not too 
problematic :-)

 Allow unit test classes to specify core name during setup
 -

 Key: SOLR-3826
 URL: https://issues.apache.org/jira/browse/SOLR-3826
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0-BETA
Reporter: Amit Nithian
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3826.patch


 When creating a unit test extending SolrTestCaseJ4, the corename is forced to 
 collection1, which can be problematic if you want to do unit tests relying 
 on schema/solrconfig specific to a core. Rather than hard-coding 
 collection1, allow this to be specified in the initCore method.




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453238#comment-13453238
 ] 

Michael McCandless commented on LUCENE-2684:


+1

But we should word it as a workaround ... ie, it's sort of strange that 
returning false from this unrelated method means suddenly scorer.freq() works: 
that's really an implementation detail.  EG someday we could make BS1 score 
docs in order (it is possible, just not sure it'd be performant), and then this 
workaround no longer works.






[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453245#comment-13453245
 ] 

Uwe Schindler commented on LUCENE-2684:
---

It does not only affect score(). In my case it was retrieving the subquery 
score...

bq. EG someday we could make BS1 score docs in order (it is possible, just not 
sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we must in all cases use correctly positioned scorers, 
otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we 
fixed recently). So returning false works around the issue currently, but it 
would not hurt if somebody returned false even though our new BS1 could handle 
in-order scoring. On the other hand, if BS1 scored in order but did not position 
sub-scorers correctly, it would clearly be a bug!

 it's not possible to access sub-query's freq information if BooleanScorer is 
 use
 

 Key: LUCENE-2684
 URL: https://issues.apache.org/jira/browse/LUCENE-2684
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Michael McCandless
 Fix For: 4.1


 LUCENE-2590 added an advanced feature, allowing an app to gather all 
 sub-scorers for any Query.
 This is powerful because then, during collection, the app can get some 
 details about how each sub-query participated in the overall match for the 
 given document.
 However, I think this is completely broken if the BooleanQuery uses 
 BooleanScorer, because that scorer is not doc-at-once.  Instead, it batch 
 processes chunks of 2048 sequential docIDs per scorer.  This is a big 
 performance gain, but it means that the sub scorers will all be positioned to 
 the end of the 2048 doc chunk while the docs that matched within that chunk 
 are collected.
 I don't think we can easily fix this... likely the fix is to make it 
 easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)?  It is 
 actually possible to force this, today, by having your collector return false 
 from acceptsDocsOutOfOrder...
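
To make the positioning problem concrete, here is a toy sketch in plain Java with no Lucene classes. SubIterator and collectInWindows are hypothetical names invented for illustration; only the 2048-doc window size comes from the description above. It drains every sub-iterator to the end of a window before collecting, so each collected doc sees a sub-iterator that has already moved on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of window-at-a-time scoring; not Lucene code. */
class WindowedScoringSketch {
    static final int WINDOW = 2048; // chunk size mentioned in the issue

    /** Hypothetical stand-in for a sub-scorer: iterates sorted doc IDs. */
    static final class SubIterator {
        private final int[] docs;
        private int pos = 0;

        SubIterator(int... sortedDocs) { this.docs = sortedDocs; }

        /** Current doc ID, or Integer.MAX_VALUE when exhausted. */
        int docID() { return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; }

        void next() { pos++; }
    }

    /**
     * Drains every sub-iterator up to the end of each window first, and only
     * then "collects" the matches. Returns one {doc, position} pair per
     * collected doc, where position is what the first sub-iterator reported
     * at collection time.
     */
    static List<int[]> collectInWindows(SubIterator... subs) {
        List<int[]> seen = new ArrayList<>();
        for (int windowEnd = WINDOW; ; windowEnd += WINDOW) {
            TreeSet<Integer> matches = new TreeSet<>();
            boolean anyLeft = false;
            for (SubIterator sub : subs) {
                while (sub.docID() < windowEnd) {
                    matches.add(sub.docID());
                    sub.next();
                }
                anyLeft |= sub.docID() != Integer.MAX_VALUE;
            }
            for (int doc : matches) {
                // By now the sub-iterators already sit past this window.
                seen.add(new int[] { doc, subs[0].docID() });
            }
            if (!anyLeft) {
                return seen;
            }
        }
    }

    public static void main(String[] args) {
        List<int[]> seen = collectInWindows(
            new SubIterator(3, 10, 5000), new SubIterator(10, 40));
        for (int[] pair : seen) {
            System.out.println("collected doc " + pair[0]
                + ", first sub-iterator is on doc " + pair[1]);
        }
    }
}
```

In this toy run, docs 3, 10 and 40 are all collected while the first sub-iterator already reports doc 5000, which is the kind of mismatch that makes per-sub-query freq or score lookups unreliable during collection.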

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453246#comment-13453246
 ] 

Robert Muir commented on LUCENE-2684:
-

{quote}
But we should word it as a workaround ... ie, it's sort of strange that 
returning false from this unrelated method means suddenly scorer.freq() works: 
that's really an implementation detail. EG someday we could make BS1 score docs 
in order (it is possible, just not sure it'd be performant), and then this 
workaround no longer works.
{quote}

I don't agree: the strangeness is the two booleans toplevelScorer and 
scoreDocsInOrder. If we wanted to do this in the future, we could just rename 
scoreDocsInOrder to needsNavigation.

Or we could just fold both the booleans into 'BS1 is ok' ... are they used 
anywhere else? :)





[jira] [Comment Edited] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453245#comment-13453245
 ] 

Uwe Schindler edited comment on LUCENE-2684 at 9/12/12 5:20 AM:


It does not only affect freq(). In my case it was retrieving the subquery 
score...

bq. EG someday we could make BS1 score docs in order (it is possible, just not 
sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we always use correctly positioned scorers; 
otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we 
fixed recently). So returning false works around the issue currently, and it 
would not hurt if somebody returned false even though a new BS1 could handle 
in-order scoring. On the other hand, if BS1 scored in order but did not 
position sub-scorers correctly, that would clearly be a bug!

  was (Author: thetaphi):
It does not only affect score(). In my case it was retrieving the subquery 
score...

bq. EG someday we could make BS1 score docs in order (it is possible, just not 
sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we always use correctly positioned scorers; 
otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we 
fixed recently). So returning false works around the issue currently, and it 
would not hurt if somebody returned false even though a new BS1 could handle 
in-order scoring. On the other hand, if BS1 scored in order but did not 
position sub-scorers correctly, that would clearly be a bug!
  



[jira] [Updated] (SOLR-3628) SolrDocument uses user-provided collections unsafely

2012-09-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3628:
---

Attachment: SOLR-3628.patch

updated patch to include a test that the field really is backed by the 
collection (since we now explicitly document it)

will commit as soon as full test run finishes

 SolrDocument uses user-provided collections unsafely
 

 Key: SOLR-3628
 URL: https://issues.apache.org/jira/browse/SOLR-3628
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.6, 4.0-ALPHA
 Environment: Mac OS X 10.7.4, Java 6
Reporter: Tom Switzer
Assignee: Hoss Man
 Fix For: 4.0

 Attachments: SOLR-3628.patch, solrdoc-ro-list-bug-comp.patch, 
 solrdoc-ro-list-bug.patch


 Adding a read-only Collection as the value of a field (i.e. in SolrDocument or 
 SolrInputField) will result in an UnsupportedOperationException later on when 
 adding more values to that field.
 This happens because no defensive copy of the collection is made. Instead, if a 
 collection is given first, then it becomes the backing collection for the 
 field. This can cause problems if the collection is modified after the fact 
 or if a read-only collection is given (e.g. Collections.unmodifiableList(...)).
 It can be reproduced with:
 SolrDocument doc = new SolrDocument();
 doc.addField("v", Collections.unmodifiableList(new ArrayList<Object>()));
 doc.addField("v", "a");
 I've created a patch that includes a fix and a test with, essentially, the 
 above. The patch just ensures that SolrDocument and SolrInputField always use 
 a Collection they created as the value, rather than relying on what was given 
 to them.
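
The defensive-copy idea in the description can be sketched with a toy field holder in plain Java. DefensiveFieldSketch and addValue are hypothetical names, not Solr classes or methods, and the committed patch may well take a different approach; this only illustrates why copying the elements avoids the UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Toy field holder; a hypothetical stand-in, not Solr's SolrInputField. */
class DefensiveFieldSketch {
    // The holder always owns its backing list.
    private final List<Object> values = new ArrayList<>();

    /** Adds one value, or copies every element when given a collection. */
    void addValue(Object value) {
        if (value instanceof Collection) {
            // Copy the elements instead of keeping a reference to the
            // caller's (possibly read-only) collection.
            values.addAll((Collection<?>) value);
        } else {
            values.add(value);
        }
    }

    /** Read-only view of the values added so far. */
    List<Object> getValues() {
        return Collections.unmodifiableList(values);
    }

    public static void main(String[] args) {
        DefensiveFieldSketch field = new DefensiveFieldSketch();
        field.addValue(Collections.unmodifiableList(new ArrayList<Object>()));
        field.addValue("a"); // no UnsupportedOperationException here
        System.out.println(field.getValues());
    }
}
```

The trade-off is that later changes to the caller's collection are no longer visible through the field, which is a behaviour change callers would need to know about.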




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453259#comment-13453259
 ] 

Uwe Schindler commented on LUCENE-2684:
---

An idea (separate issue!) would be:
BS1 completely violates the scorer interface; the only method you can call is 
the one taking a Collector. In my opinion, BS1 should *not* implement the 
Scorer interface, that's the whole bug! It should maybe be some separate class 
like OutOfOrderDocIdReporter (name is just an example) that only implements 
collect(Collector). And the navigation API (advance, next) should be separated 
from score() and freq() - a simple Java interface Scorer. So the current 
in-order scorer would be a simple DocIdSetIterator that additionally implements 
the Scorer interface (to provide score() and freq()), and current out-of-order 
scorers would implement only the OutOfOrderDocIdReporter API and pass an inlined 
Scorer interface (without advance and next) to the setScorer() method (like 
BucketScorer currently).
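
A rough sketch of how such a split could look, in plain Java without any Lucene types. Apart from OutOfOrderDocIdReporter, which is the example name from the comment above, every name here (ScoreView, HitCollector, FixedReporter, collectAll) is hypothetical and only meant to show the shape of the idea: the collector gets a view with score() and freq() but no navigation methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical type split sketched from the comment above; not Lucene API. */
class ScorerSplitSketch {
    /** Per-document values only: deliberately no advance() or nextDoc(). */
    interface ScoreView {
        float score();
        int freq();
    }

    /** Receives hits and may read the ScoreView for the current hit. */
    interface HitCollector {
        void setScorer(ScoreView view);
        void collect(int doc);
    }

    /** Bulk-only API: all that an out-of-order scorer would expose. */
    interface OutOfOrderDocIdReporter {
        void collect(HitCollector collector);
    }

    /** Toy reporter that emits fixed (doc, score) pairs in the given order. */
    static final class FixedReporter implements OutOfOrderDocIdReporter {
        private final int[] docs;
        private final float[] scores;
        private int current = 0;

        FixedReporter(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        @Override
        public void collect(HitCollector collector) {
            // The collector only ever sees this narrow view.
            collector.setScorer(new ScoreView() {
                @Override public float score() { return scores[current]; }
                @Override public int freq() { return 1; }
            });
            for (current = 0; current < docs.length; current++) {
                collector.collect(docs[current]);
            }
        }
    }

    /** Runs a reporter and records "doc:score" for every hit, in hit order. */
    static List<String> collectAll(OutOfOrderDocIdReporter reporter) {
        final List<String> hits = new ArrayList<>();
        reporter.collect(new HitCollector() {
            private ScoreView view;
            @Override public void setScorer(ScoreView view) { this.view = view; }
            @Override public void collect(int doc) { hits.add(doc + ":" + view.score()); }
        });
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(
            new FixedReporter(new int[] {7, 2}, new float[] {1.5f, 0.5f})));
    }
}
```

With this shape, code that needs navigation simply cannot be handed an out-of-order reporter, so the mismatch would show up at compile time rather than as wrong values during collection.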




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453261#comment-13453261
 ] 

Robert Muir commented on LUCENE-2684:
-

Collectible... (not serious)




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453262#comment-13453262
 ] 

Michael McCandless commented on LUCENE-2684:


The problem is that scoreDocsInOrder doesn't really capture what's necessary 
here (yes, it works today, but, not necessarily tomorrow).

I agree Uwe: if we add a Collector.needsNavigation() then even a fixed BS1 
that sorted the docIDs before collection would not be usable since the subs 
will not be on the doc during collect().

And I agree Robert: the current booleans topLevelScorer and 
scoreDocsInOrder, and then a new needsNavigation, will make things rather 
confusing.  Really I think topLevelScorer should be strongly typed: the intent 
is to declare whether you will call Scorer.score(Collector) or whether you will 
call .nextDoc()/.score() ... they really should be different classes.

If we don't think any other future scorer would want to score docs NOT in order 
... then maybe we should simply rename scoreDocsInOrder to needsNavigation?  
(Or scoreDocAtOnce, scoreDocAtATime, something else...).




[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453269#comment-13453269
 ] 

Robert Muir commented on LUCENE-2684:
-

{quote}
If we don't think any other future scorer would want to score docs NOT in order 
... then maybe we should simple rename scoreDocsInOrder to needsNavigation? (Or 
scoreDocAtOnce, scoreDocAtATime, something else...).
{quote}

I actually just remembered that the query-time join, I think, does this too?

But yeah, if we are going to have booleans, I would prefer something more along 
the lines of document-at-a-time, since it's standard IR terminology and less 
confusing than scoreDocsInOrder.





[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453274#comment-13453274
 ] 

Michael McCandless commented on LUCENE-2684:


bq. BS1 completely violates the scorer interface, the only method you can call 
is the one taking a Collector. In my opinion, BS1 should not implement the 
Scorer interface, that the whole bug!

Well let's remember that the "must have doc-at-once scoring, for all subs too" 
is a very rare use-case.

The vast majority of users just need a fast .score(Collector) interface.

But yeah I agree: it should be strongly typed, and BS1 should only implement 
the .score(Collector) interface.  The ScoresDocAtOnce interface can easily 
implement the .score(Collector) interface (as Scorer does today...).




[jira] [Commented] (SOLR-3825) Log document IDs when they are retrieved

2012-09-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453287#comment-13453287
 ] 

Grant Ingersoll commented on SOLR-3825:
---

A few comments on the patch:

# SolrMBeanTest fails with this patch due to the description and source being 
null
# I don't think we want/need member variables for ids and idScores, as it won't 
be thread safe. I'd just loop the DocIterator once, building a StringBuilder 
and then calling addToLog on that StringBuilder. This will also avoid the need 
for clone()
# For the scores, let's just do an output of id:score, id:score, ...   Using a 
Map won't be reliable, as we will want to maintain order in the log.
# For the log key, let's just call it the same thing which should simplify 
parsing, regardless of whether there are scores present or not, so the format 
would be: responseLog: id1[:score1],id2[:score2],...  where [ ] is used to 
indicate it is optional.
# We should follow the normal SearchComponent pattern of being able to turn 
on/off the component via a request parameter.  This component should be OFF by 
default.
{code}
if (!params.getBool(COMPONENT_NAME, false)) {
  return;
}
{code}
# In the ResponseLogComponentTest, do we need the createCore() stuff?  See some 
of the other tests and how they use initCore.
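
The log format suggested in the comments above (id1[:score1],id2[:score2],...) could be built in a single pass roughly as follows. This is a standalone sketch: ResponseLogFormatSketch and format are hypothetical names, and a plain List and float[] stand in for Solr's DocIterator, which is not used here.

```java
import java.util.Arrays;
import java.util.List;

/** Builds "id1[:score1],id2[:score2],..." in result order; hypothetical helper. */
class ResponseLogFormatSketch {
    /**
     * @param ids    document IDs in result order
     * @param scores matching scores, or null when scores were not requested
     */
    static String format(List<String> ids, float[] scores) {
        // A local StringBuilder keeps the method free of shared state.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ids.get(i));
            if (scores != null) {
                sb.append(':').append(scores[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc1", "doc2");
        System.out.println(format(ids, null));
        System.out.println(format(ids, new float[] {1.5f, 0.25f}));
    }
}
```

Because everything lives in local variables, nothing needs to be cloned or synchronized, and the order of the output follows the order of the results.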

 Log document IDs when they are retrieved
 

 Key: SOLR-3825
 URL: https://issues.apache.org/jira/browse/SOLR-3825
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Scott Stults
Assignee: Grant Ingersoll
Priority: Trivial
 Attachments: SOLR-3825.patch


 During relevancy tuning it's important to know exactly which documents the 
 client has seen. Right now the only way to get that list is to splice into 
 the HTTP traffic. Preferably the IDs could be logged along with the query.




[jira] [Resolved] (SOLR-3628) SolrDocument uses user-provided collections unsafely

2012-09-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3628.


   Resolution: Fixed
Fix Version/s: 5.0

Committed revision 1383520. - trunk
Committed revision 1383533. - 4x

Thanks Tom!

 SolrDocument uses user-provided collections unsafely
 

 Key: SOLR-3628
 URL: https://issues.apache.org/jira/browse/SOLR-3628
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.6, 4.0-ALPHA
 Environment: Mac OS X 10.7.4, Java 6
Reporter: Tom Switzer
Assignee: Hoss Man
 Fix For: 4.0, 5.0

 Attachments: SOLR-3628.patch, solrdoc-ro-list-bug-comp.patch, 
 solrdoc-ro-list-bug.patch





[jira] [Updated] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3823:
---

Description: 
When using a boost query (bq) that contains parentheses (like this example 
from the Relevancy Cookbook section of the wiki):

{noformat}
 ? defType = dismax 
 q = foo bar 
 bq = (*:* -xxx)^999 
{noformat}

You get the following error:


org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': 
Encountered  ) )  at line 1, column 12. Was expecting one of: EOF AND 
... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... 
QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... 
REGEXPTERM ... [ ... { ... NUMBER ...


  was:
When using a boost query (bq) that contains a parentheses (like this example 
from the Relevancy Cookbook section of the wiki):

 ? defType = dismax 
 q = foo bar 
 bq = (*:* -xxx)^999 

You get the following error:


org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': 
Encountered  ) )  at line 1, column 12. Was expecting one of: EOF AND 
... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... 
QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... 
REGEXPTERM ... [ ... { ... NUMBER ...



1) editing the issue description to include noformat tags -- i think Erick 
was getting confused by the \*:\* showing up as just :

2) i can't reproduce the described problem.  When i tried using the solr 
example data, this request worked just fine...

http://localhost:8983/solr/select?q=ipod&defType=dismax&bq=%28*:*%20-id:IW-02%29^999

Mathos: please follow up on the solr-user@lucene mailing list with more details 
about the problems you are having and your actual (specific) configs/queries
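
When testing requests like the one above from code rather than a browser, it can help to build the query string with explicit encoding so that parentheses, spaces and the caret survive intact. A small sketch using only the JDK; BoostQueryUrlSketch and selectUrl are hypothetical helper names, and the base URL is simply the example one from this message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a select URL with form-encoded parameters; hypothetical helper. */
class BoostQueryUrlSketch {
    /** Form-encodes one parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Joins the three parameters used in the example request. */
    static String selectUrl(String base, String q, String defType, String bq) {
        return base + "?q=" + encode(q)
            + "&defType=" + encode(defType)
            + "&bq=" + encode(bq);
    }

    public static void main(String[] args) {
        System.out.println(selectUrl(
            "http://localhost:8983/solr/select",
            "ipod", "dismax", "(*:* -id:IW-02)^999"));
    }
}
```

URLEncoder produces form encoding, so the space becomes + and the colon becomes %3A; that differs textually from the hand-written URL above but should decode to the same bq value on the server side.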


 Parentheses in a boost query cause errors
 -

 Key: SOLR-3823
 URL: https://issues.apache.org/jira/browse/SOLR-3823
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0-BETA
 Environment: Mac, jdk 1.6, Chrome
Reporter: Mathos Marcer

 When using a boost query (bq) that contains parentheses (like this example 
 from the Relevancy Cookbook section of the wiki):
 {noformat}
  ? defType = dismax 
  q = foo bar 
  bq = (*:* -xxx)^999 
 {noformat}
 You get the following error:
 org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': 
 Encountered  ) )  at line 1, column 12. Was expecting one of: EOF 
 AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... 
 ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM 
 ... REGEXPTERM ... [ ... { ... NUMBER ...




[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453310#comment-13453310
 ] 

Michael McCandless commented on LUCENE-4371:


+1

I think having II implement slice is much cleaner than Directory having to 
implement createSlicer returning an IndexInputSlicer with only one method.
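
For illustration, the "input implements slice" shape might look like the following toy class over a byte array. SliceableInputSketch is a hypothetical stand-in and not Lucene's IndexInput; the point is only that the slicing method lives on the input itself, so no separate slicer object is needed.

```java
/** Toy read-only input over a byte array with slice(); not Lucene's IndexInput. */
class SliceableInputSketch {
    private final byte[] data;
    private final int start;
    private final int length;
    private int pos = 0;

    SliceableInputSketch(byte[] data) {
        this(data, 0, data.length);
    }

    private SliceableInputSketch(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    /** Returns an independent view of [offset, offset + sliceLength) of this input. */
    SliceableInputSketch slice(int offset, int sliceLength) {
        if (offset < 0 || sliceLength < 0 || offset > length - sliceLength) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        // Shares the bytes but has its own read position.
        return new SliceableInputSketch(data, start + offset, sliceLength);
    }

    int length() {
        return length;
    }

    /** Reads the next byte of this view and advances its position. */
    byte readByte() {
        if (pos >= length) {
            throw new IllegalStateException("read past end");
        }
        return data[start + pos++];
    }

    public static void main(String[] args) {
        SliceableInputSketch in = new SliceableInputSketch(new byte[] {10, 20, 30, 40, 50});
        System.out.println(in.slice(1, 3).slice(1, 2).readByte());
    }
}
```

Slices of slices fall out naturally because each view only tracks its own start offset and length.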

 consider refactoring slicer to indexinput.slice
 ---

 Key: LUCENE-4371
 URL: https://issues.apache.org/jira/browse/LUCENE-4371
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4371.patch


 From LUCENE-4364:
 {quote}
 In my opinion, we should maybe check, if we can remove the whole Slicer in 
 all Indexinputs? Just make the slice(...) method return the current 
 BufferedIndexInput-based one. This could be another issue, once this is in.
 {quote}




[jira] [Updated] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys

2012-09-11 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4173:
-

Attachment: 
LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch

This patch removes ignoreIncompatibleGeometry and modifies the strategies to 
fail when given a shape that isn't the precise shape used -- no coalescing.  
BBox & TwoDoubles were both doing coalescing (e.g. shape.getBoundingBox()).  
PrefixTree can handle anything so no change there.

I'll commit this pending your +1 Chris.

An enum for FAIL, COALESCE, or IGNORE can be done in another issue if desired.
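
If that follow-up enum were ever done, it might look roughly like this. The sketch is purely illustrative: IncompatibleShapePolicySketch, Policy and resolve are hypothetical names, shapes are modelled as plain strings, and the toy strategy is assumed to index only rectangles.

```java
/** Hypothetical policy for shapes a strategy cannot index exactly. */
class IncompatibleShapePolicySketch {
    enum Policy { FAIL, COALESCE, IGNORE }

    /**
     * Toy decision for a strategy that only indexes rectangles.
     * Returns the kind of shape to index, or null when nothing is indexed.
     */
    static String resolve(String shapeKind, Policy policy) {
        if (shapeKind.equals("rect")) {
            return "rect"; // exact shape, nothing to decide
        }
        switch (policy) {
            case COALESCE:
                return "rect"; // e.g. fall back to the shape's bounding box
            case IGNORE:
                return null;   // silently index nothing
            default:
                throw new UnsupportedOperationException(
                    "Cannot index shape: " + shapeKind);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("circle", Policy.COALESCE));
        System.out.println(resolve("circle", Policy.IGNORE));
    }
}
```

Making FAIL the fall-through case matches the intent of this issue: nothing is dropped or approximated unless the caller explicitly asks for it.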

 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
Assignee: David Smiley
 Attachments: LUCENE-4173.patch, 
 LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.




[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Mathos Marcer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453314#comment-13453314
 ] 

Mathos Marcer commented on SOLR-3823:
-

The problem seems to be when I specify defType=edismax, under defType=dismax it 
is working like a champ.

 Parentheses in a boost query cause errors
 -

 Key: SOLR-3823
 URL: https://issues.apache.org/jira/browse/SOLR-3823
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0-BETA
 Environment: Mac, jdk 1.6, Chrome
Reporter: Mathos Marcer

 When using a boost query (bq) that contains a parentheses (like this example 
 from the Relevancy Cookbook section of the wiki):
 {noformat}
  ? defType = dismax 
  q = foo bar 
  bq = (*:* -xxx)^999 
 {noformat}
 You get the following error:
 org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': 
 Encountered  ) )  at line 1, column 12. Was expecting one of: EOF 
 AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... 
 ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM 
 ... REGEXPTERM ... [ ... { ... NUMBER ...




[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-11 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453317#comment-13453317
 ] 

Steven Rowe commented on LUCENE-4369:
-

Serious suggestion: WholeTextField

(Following the raw/cooked food metaphor used in various computational contexts 
- "whole food" means unprocessed.)

I like ExactTextField too, but it's missing the beginning and end anchors: the 
intent is "exactly this search string", but it doesn't necessarily imply "and 
nothing else".  E.g. would a user armed only with the name assume that an 
ExactTextField query string "two three" would not match an indexed string "one 
two three four"?
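
To make the anchoring question concrete, here is a minimal Python sketch. It is not Lucene code: the function names whole_value_match and phrase_match are hypothetical, and whitespace tokenization is assumed purely for illustration. It contrasts an untokenized field, where the query must equal the entire indexed value, with a tokenized field, where the phrase only has to occur somewhere in the value.

```python
def whole_value_match(indexed: str, query: str) -> bool:
    """Untokenized field: the indexed value is a single term, so the
    query matches only if it equals the whole value."""
    return indexed == query


def phrase_match(indexed: str, query: str) -> bool:
    """Tokenized field (whitespace split, an assumption of this sketch):
    the query tokens must appear as a contiguous run of indexed tokens."""
    doc, q = indexed.split(), query.split()
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


indexed = "one two three four"
print(whole_value_match(indexed, "two three"))            # False
print(whole_value_match(indexed, "one two three four"))   # True
print(phrase_match(indexed, "two three"))                 # True
```

Under these assumptions, the name would need to convey the first behaviour (anchored at both ends), not the third.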

 StringFields name is unintuitive and not helpful
 

 Key: LUCENE-4369
 URL: https://issues.apache.org/jira/browse/LUCENE-4369
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4369.patch


 There's a huge difference between TextField and StringField: StringField 
 screws up scoring and bypasses your Analyzer.
 (See the java-user thread "Custom Analyzer Not Called When Indexing" for an 
 example.)
 The name we use here is vital; otherwise people will get bad results.
 I think we should rename StringField to MatchOnlyField.




[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453320#comment-13453320
 ] 

Michael McCandless commented on LUCENE-4371:


I don't think the default impl (SlicedIndexInput) should override BII's 
copyBytes?  Seems ... spooky.

 consider refactoring slicer to indexinput.slice
 ---

 Key: LUCENE-4371
 URL: https://issues.apache.org/jira/browse/LUCENE-4371
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4371.patch


 From LUCENE-4364:
 {quote}
 In my opinion, we should maybe check, if we can remove the whole Slicer in 
 all Indexinputs? Just make the slice(...) method return the current 
 BufferedIndexInput-based one. This could be another issue, once this is in.
 {quote}




[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453322#comment-13453322
 ] 

Robert Muir commented on LUCENE-4371:
-

I agree, Mike. I wanted to remove it... but I'm afraid!

I also don't understand why we have both DataOutput.copyBytes(DataInput) and 
IndexInput.copyBytes(IndexOutput).
Is this all really necessary?




[jira] [Reopened] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reopened SOLR-3823:
--


Thanks, Hoss, you're right...

But I can get this to fail with both BETA and today's trunk using the example 
data:
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq=(name:nonsense -xxx)^999
{noformat}
Interestingly, this works (note the space after bq):
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq =(name:nonsense -xxx)^999
{noformat}
This fails (spaces around the parens; there was an issue with non-space parens 
lately, but apparently it's unrelated):
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq= ( name:nonsense -xxx ) ^999
{noformat}
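
The three requests above differ only in where the spaces fall, which is easy to lose when the URL is typed by hand. As a rough sketch, the following Python snippet builds each variant with the standard library's urllib.parse.urlencode so that the separators and escaping are unambiguous. The host, port and parameter values are the ones from the URLs above; select_url is a hypothetical helper name, and nothing here is meant to reproduce or fix the parser behaviour itself.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # example endpoint from the URLs above


def select_url(params: dict) -> str:
    """Build a select URL with every parameter name and value percent-encoded."""
    return BASE + "?" + urlencode(params)


common = {"q": "foo", "defType": "edismax"}

# Variant 1: no extra spaces.
print(select_url({**common, "bq": "(name:nonsense -xxx)^999"}))
# Variant 2: the space after "bq" becomes part of the parameter name.
print(select_url({**common, "bq ": "(name:nonsense -xxx)^999"}))
# Variant 3: spaces around the parentheses are part of the value.
print(select_url({**common, "bq": " ( name:nonsense -xxx ) ^999"}))
```

Comparing the printed URLs makes it visible that the second variant sends a differently named parameter, while the first and third differ only in the value.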

Stack trace from log:

Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered 
EOF at line 1, column 1.
Was expecting one of:
NOT ...
+ ...
- ...
BAREOPER ...
( ...
* ...
QUOTED ...
TERM ...
PREFIXTERM ...
WILDTERM ...
REGEXPTERM ...
[ ...
{ ...
NUMBER ...
TERM ...
* ...

    at org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:708)
    at org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:590)
    at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:275)
    at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:181)
    at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:261)
    at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:181)
    at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
    at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:120)
    ... 35 more

Sep 11, 2012 12:37:58 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={q=foo&defType=edismax&bq=+(+name:nonsense+-xxx+)+^999} status=400 QTime=2 





[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Mathos Marcer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453349#comment-13453349
 ] 

Mathos Marcer commented on SOLR-3823:
-

I'm glad I'm not just going crazy :-)

I did notice that while adding the space before the equals sign (i.e. 
bq =(name:nonsense -xxx)^999) doesn't produce a parsing error, it also doesn't 
appear to apply the boost when I compare results between 3.6 and 4.0 BETA.  In 
fact, I get the same results as if the bq option weren't there at all.




[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Mathos Marcer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453350#comment-13453350
 ] 

Mathos Marcer commented on SOLR-3823:
-

Actually, looking at it more closely, this is probably because adding the space 
after bq means the parameter is registered not as "bq" but as "bq " (with a 
trailing space), judging by the params section of the response:


<str name="bq ">(*:* -replacement)^9950</str>
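
The same effect can be seen with any query-string parser that keeps whitespace in parameter names. The sketch below uses Python's urllib.parse.parse_qs as a stand-in; it is only an analogy and does not show Solr's own parameter handling. A lookup by the exact name "bq" finds nothing, because the parameter was stored under "bq " with a trailing space.

```python
from urllib.parse import parse_qs

# Query string with a space between the parameter name and the equals sign.
query = "q=foo&defType=edismax&bq =(*:* -replacement)^9950"
params = parse_qs(query)

print(sorted(params))      # parameter names exactly as parsed
print(params.get("bq"))    # lookup by the intended name
print(params.get("bq "))   # lookup by the name that was actually sent
```

A handler that only asks for "bq" would therefore behave as if no boost query had been supplied, which matches the observation that the results are the same as without bq.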





[jira] [Comment Edited] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Mathos Marcer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453350#comment-13453350
 ] 

Mathos Marcer edited comment on SOLR-3823 at 9/12/12 7:06 AM:
--

Actually, looking at it more closely, this is probably because adding the space 
after bq means the parameter is registered not as "bq" but as "bq " (with a 
trailing space), judging by the params section of the response:


<str name="bq ">(*:* -replacement)^9950</str>


  was (Author: mathos):
Actually, looking at it more closely, this is probably because adding the space 
after bq means the parameter is registered not as "bq" but as "bq " (with a 
trailing space), judging by the params section of the response:


<str name="bq ">(*:* -replacement)^9950</str>

  



[jira] [Comment Edited] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Mathos Marcer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453350#comment-13453350
 ] 

Mathos Marcer edited comment on SOLR-3823 at 9/12/12 7:08 AM:
--

Actually, looking at it more closely, this is probably because adding the space 
after bq means the parameter is registered not as "bq" but as "bq " (with a 
trailing space), judging by the params section of the response:


<str name="bq ">(\*:\* -replacement)^9950</str>


  was (Author: mathos):
Actually, looking at it more closely, this is probably because adding the space 
after bq means the parameter is registered not as "bq" but as "bq " (with a 
trailing space), judging by the params section of the response:


<str name="bq ">(*:* -replacement)^9950</str>

  



[jira] [Resolved] (SOLR-2747) Include formatted Changes.html for release

2012-09-11 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2747.
---

Resolution: Fixed

We can add Lucene Changes.html generation in a separate issue.

 Include formatted Changes.html for release
 --

 Key: SOLR-2747
 URL: https://issues.apache.org/jira/browse/SOLR-2747
 Project: Solr
  Issue Type: Improvement
Reporter: Martijn van Groningen
Assignee: Steven Rowe
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-2747_fix.patch, SOLR-2747.patch, SOLR-2747.patch, 
 SOLR-2747.patch, SOLR-2747.patch, SOLR-2747.patch


 Just like when releasing Lucene, Solr should also have an HTML-formatted 
 changes file.
 The Lucene Perl script (lucene/src/site/changes/changes2html.pl) should be 
 reused.




[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors

2012-09-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453356#comment-13453356
 ] 

Erick Erickson commented on SOLR-3823:
--

FWIW, I'm on a Mac (Lion) too, although I doubt that matters.




[jira] [Updated] (LUCENE-4362) ban tab-indented source

2012-09-11 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated LUCENE-4362:
---

Attachment: LUCENE-4362_4x.patch

OK, after waiting 50 minutes for the tests to complete, all tests pass with 
these two patches (trunk and 4x).

So if I check all this in, it'll change the generated java files since they 
were newly generated from the changes to the jflex/jj files. Is this the usual 
procedure?

This doesn't address the tabs introduced by the parser compilers.

If no one objects, I'll check this in probably tonight or tomorrow.

But I'd still like to keep this open even so. Between last week and now more 
tabs have been introduced into source.

Any suggestions about what to do about tabs introduced into generated files?

 ban tab-indented source
 ---

 Key: LUCENE-4362
 URL: https://issues.apache.org/jira/browse/LUCENE-4362
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
Assignee: Erick Erickson
 Attachments: LUCENE-4326_trunk.patch, LUCENE-4362_4x.patch, 
 LUCENE-4362_4x.patch, LUCENE-4362_core.patch, LUCENE-4362.patch, 
 LUCENE-4362.patch


 This makes code really difficult to read and work with.
 It's easy enough to prevent.
 {noformat}
 Index: build.xml
 ===
 --- build.xml (revision 1380979)
 +++ build.xml (working copy)
 @@ -77,11 +77,12 @@
            <or>
              <containsregexp expression="@author\b" casesensitive="yes"/>
              <containsregexp expression="\bno(n|)commit\b" casesensitive="no"/>
 +            <containsregexp expression="\t" casesensitive="no"/>
            </or>
          </fileset>
          <map from="${validate.currDir}${file.separator}" to="*" />
        </pathconvert>
 -      <fail if="validate.patternsFound">The following files contain @author tags or nocommits:${line.separator}${validate.patternsFound}</fail>
 +      <fail if="validate.patternsFound">The following files contain @author tags, tabs or nocommits:${line.separator}${validate.patternsFound}</fail>
      </target>
 {noformat}
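
 For readers who want to try the same check outside of Ant, here is a small Python sketch of the idea. It reuses the three regular expressions from the containsregexp entries in the patch above; the function name find_violations and the pattern labels are hypothetical, and the real build applies the check to a fileset rather than to a single string.

```python
import re

# Patterns mirroring the containsregexp entries in the patch above.
PATTERNS = {
    "@author": re.compile(r"@author\b"),                         # case-sensitive
    "nocommit": re.compile(r"\bno(n|)commit\b", re.IGNORECASE),  # case-insensitive
    "tab": re.compile(r"\t"),
}


def find_violations(text: str) -> list:
    """Return the labels of all patterns that occur in the given source text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]


print(find_violations("int x = 1; // nocommit"))  # ['nocommit']
print(find_violations("\tint x = 1;"))            # ['tab']
print(find_violations("    int x = 1;"))          # []
```

 Any source file whose text yields a non-empty list would be reported by the validation step.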



