date:20110708


[ 
https://issues.apache.org/jira/browse/SOLR-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061787#comment-13061787
 ] 

Simon Willnauer commented on SOLR-1825:
---

+1 - shoot for it

 SolrQuery.addFacetQuery should call setFacet(true)
 --

 Key: SOLR-1825
 URL: https://issues.apache.org/jira/browse/SOLR-1825
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.5
Reporter: David Smiley
Assignee: Chris Male
Priority: Trivial
 Attachments: SOLR-1825.patch, solr1825.patch


 Note that solrQuery.addFacetField(name) does enable faceting automatically 
 but addFacetQuery does not.  This is inconsistent.
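
 The inconsistency can be sketched with a small stand-in class. This is not
 the real SolrQuery: FacetQuerySketch and its internals are hypothetical, and
 only the parameter names facet, facet.field and facet.query follow Solr's
 conventions. It shows the behaviour the issue asks for, where both add
 methods switch faceting on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a query-parameter builder; NOT the real SolrQuery. */
final class FacetQuerySketch {
    private final Map<String, List<String>> params = new LinkedHashMap<>();

    private void add(String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Sets the single-valued "facet" switch, replacing any earlier value. */
    FacetQuerySketch setFacet(boolean on) {
        params.remove("facet");
        add("facet", Boolean.toString(on));
        return this;
    }

    /** Adding a facet field enables faceting, as described in the issue. */
    FacetQuerySketch addFacetField(String field) {
        add("facet.field", field);
        return setFacet(true);
    }

    /** The requested fix: adding a facet query enables faceting as well. */
    FacetQuerySketch addFacetQuery(String query) {
        add("facet.query", query);
        return setFacet(true);
    }

    /** Returns the first value of a parameter, or null if it was never set. */
    String get(String name) {
        List<String> values = params.get(name);
        return values == null ? null : values.get(0);
    }

    public static void main(String[] args) {
        FacetQuerySketch q = new FacetQuerySketch().addFacetQuery("price:[0 TO 100]");
        System.out.println("facet=" + q.get("facet"));
    }
}
```

 With this shape a caller no longer has to remember a separate setFacet(true)
 call after addFacetQuery.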

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file

2011-07-08 Thread Bill Bell (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061791#comment-13061791
 ] 

Bill Bell commented on SOLR-2616:
-

+1 please!!

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 
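
 For readers unfamiliar with JDK (java.util.logging) configuration, here is a
 rough sketch of what a basic configuration contains. It is not the content
 of the attached patch; the property values and the class name JulConfigSketch
 are assumptions chosen for illustration. The configuration is applied from a
 string so the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Illustrative only: applies a minimal java.util.logging configuration. */
final class JulConfigSketch {
    // Assumed example settings: log INFO and above to the console.
    static final String CONFIG =
        "handlers=java.util.logging.ConsoleHandler\n"
        + ".level=INFO\n"
        + "java.util.logging.ConsoleHandler.level=INFO\n"
        + "java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter\n";

    /** Loads CONFIG into the global LogManager. */
    static void apply() {
        try {
            LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        apply();
        Logger.getLogger("example").info("logging configured");
    }
}
```

 In a real deployment the same properties would normally live in a file that
 the JVM is pointed at with -Djava.util.logging.config.file=<path>.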


[jira] [Created] (LUCENE-3291) bugs in memorycodec with lots of docs

bugs in memorycodec with lots of docs
-

 Key: LUCENE-3291
 URL: https://issues.apache.org/jira/browse/LUCENE-3291
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/codecs
Affects Versions: 4.0
Reporter: Robert Muir


While working on LUCENE-3290, I noticed a readVInt that I thought should be a 
readVLong, so I wrote a test (Test2BPostings)
to try to catch things like this... it takes about 5 minutes to run with 
MemoryCodec.

The problem is, it dies on some other bug in FSTs first!


[jira] [Updated] (LUCENE-3291) bugs in memorycodec with lots of docs


 [ 
https://issues.apache.org/jira/browse/LUCENE-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3291:


Attachment: LUCENE-3291_test.patch

here's the test, it indexes 26 terms (a..z) per doc about 80M times to create 
just over Integer.MAX_VALUE t-d pairs.
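
The "just over Integer.MAX_VALUE" figure can be checked with a little 
arithmetic. The sketch below is only an illustration (the class and method 
names are made up and it does not touch Lucene): it computes how many 26-term 
documents are needed before the number of term-document pairs no longer fits 
in an int, and shows what the count looks like if it is narrowed to an int.

```java
/** Illustrative arithmetic only; not part of the attached test. */
final class PostingsCountSketch {
    static final int TERMS_PER_DOC = 26; // a..z, as in the test description

    /** Smallest document count whose term-document pairs exceed Integer.MAX_VALUE. */
    static long minDocsToOverflow() {
        return Integer.MAX_VALUE / TERMS_PER_DOC + 1;
    }

    /** Total pairs, computed in 64-bit arithmetic so it cannot wrap here. */
    static long pairs(long docs) {
        return docs * TERMS_PER_DOC;
    }

    public static void main(String[] args) {
        long docs = minDocsToOverflow();
        long total = pairs(docs);
        System.out.println(docs + " docs give " + total + " pairs");
        // Narrowing to int wraps around to a negative number.
        System.out.println("same count as an int: " + (int) total);
    }
}
```

About 82.6M documents are enough, which matches the "about 80M" above; any 
counter or offset that is stored or read as a 32-bit value goes wrong past 
that point.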

with memorycodec (ant test-core -Dtestcase=Test2BPostings -Dtests.codec=Memory) 
it fails like this:

{noformat}
[junit] Caused by: java.lang.ArrayIndexOutOfBoundsException
[junit] at java.lang.System.arraycopy(Native Method)
[junit] at org.apache.lucene.util.fst.FST$BytesWriter.writeBytes(FST.java:855)
[junit] at org.apache.lucene.util.fst.ByteSequenceOutputs.write(ByteSequenceOutputs.java:113)
[junit] at org.apache.lucene.util.fst.ByteSequenceOutputs.write(ByteSequenceOutputs.java:32)
[junit] at org.apache.lucene.util.fst.FST.addNode(FST.java:401)
[junit] at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:120)
[junit] at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:153)
[junit] at org.apache.lucene.util.fst.Builder.finish(Builder.java:440)
[junit] at org.apache.lucene.index.codecs.memory.MemoryCodec$TermsWriter.finish(MemoryCodec.java:228)
{noformat}

 bugs in memorycodec with lots of docs
 -

 Key: LUCENE-3291
 URL: https://issues.apache.org/jira/browse/LUCENE-3291
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/codecs
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3291_test.patch


 While working on LUCENE-3290, I noticed a readVInt that I thought should be a 
 readVLong, so I wrote a test (Test2BPostings)
 to try to catch things like this... it takes about 5 minutes to run with 
 MemoryCodec.
 The problem is, it dies on some other bug in FSTs first!


[jira] [Commented] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes

2011-07-08 Thread Eks Dev (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061804#comment-13061804
 ] 

Eks Dev commented on LUCENE-3289:
-

bq. The strings are extremely long (more like short documents) and probably 
need to be compressed in some different datastructure, e.g. a word-based one?

That would be indeed cool, e.g. FST with words (ngrams?) as symbols. Ages ago 
we used one trie, for all unique terms to get prefix/edit distance on words and 
one word-trie (symbols were words via symbol table) for documents. I am sure 
this would cut memory requirements significantly for multiword cases when 
compared to char level FST.
e.g. TermDictionary that supports ord() could be used as a symbol table.






 FST should allow controlling how hard builder tries to share suffixes
 -

 Key: LUCENE-3289
 URL: https://issues.apache.org/jira/browse/LUCENE-3289
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3289.patch, LUCENE-3289.patch


 Today we have a boolean option to the FST builder telling it whether
 it should share suffixes.
 If you turn this off, building is much faster, uses much less RAM, and
 the resulting FST is a prefix trie.  But, the FST is larger than it
 needs to be.  When it's on, the builder maintains a node hash holding
 every node seen so far in the FST -- this uses up RAM and slows things
 down.
 On a dataset that Elmer (see java-user thread Autocompletion on large
 index on Jul 6 2011) provided (thank you!), which is 1.32 M titles
 avg 67.3 chars per title, building with suffix sharing on took 22.5
 seconds, required 1.25 GB heap, and produced 91.6 MB FST.  With suffix
 sharing off, it was 8.2 seconds, 450 MB heap and 129 MB FST.
 I think we should allow this boolean to be shade-of-gray instead:
 usually, how well suffixes can share is a function of how far they are
 from the end of the string, so, by adding a tunable N to only share
 when suffix length < N, we can let caller make reasonable tradeoffs. 
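
 The tradeoff can be illustrated with a toy automaton. The sketch below is
 not Lucene's FST Builder and does not build incrementally: it builds a plain
 trie and then merges identical nodes bottom-up, but only nodes whose longest
 remaining suffix is shorter than a tunable N (one reading of the proposal).
 All names (SuffixSharingSketch, countNodes, shareBelow) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "share suffixes only near the end of the string". */
final class SuffixSharingSketch {
    static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>();
        boolean isFinal;
    }

    private final Map<String, Integer> registry = new HashMap<>(); // the "node hash"
    private final int shareBelow;                                   // the tunable N
    private int nodeCount;

    private SuffixSharingSketch(int shareBelow) {
        this.shareBelow = shareBelow;
    }

    static Node buildTrie(List<String> words) {
        Node root = new Node();
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.isFinal = true;
        }
        return root;
    }

    /** Returns {node id, longest suffix length below this node}. */
    private int[] visit(Node n) {
        StringBuilder sig = new StringBuilder(n.isFinal ? "F" : "N");
        int height = 0;
        for (Map.Entry<Character, Node> e : n.next.entrySet()) {
            int[] child = visit(e.getValue());
            sig.append(e.getKey()).append(':').append(child[0]).append(',');
            height = Math.max(height, child[1] + 1);
        }
        boolean share = height < shareBelow; // only hash nodes close to the end
        if (share) {
            Integer existing = registry.get(sig.toString());
            if (existing != null) {
                return new int[] {existing, height};
            }
        }
        int id = nodeCount++;
        if (share) {
            registry.put(sig.toString(), id);
        }
        return new int[] {id, height};
    }

    /** Number of distinct nodes left when only suffixes shorter than shareBelow are shared. */
    static int countNodes(List<String> words, int shareBelow) {
        SuffixSharingSketch s = new SuffixSharingSketch(shareBelow);
        s.visit(buildTrie(words));
        return s.nodeCount;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("walking", "talking", "walked", "talked");
        for (int n : new int[] {0, 1, 3, Integer.MAX_VALUE}) {
            System.out.println("N=" + n + " -> " + countNodes(words, n) + " nodes");
        }
    }
}
```

 With N=0 nothing is hashed and the result is the plain prefix trie; larger N
 keeps more nodes in the hash and yields a smaller automaton, which mirrors
 the RAM/time versus size tradeoff described above.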


[jira] [Commented] (SOLR-2641) Auto Facet Selection component

2011-07-08 Thread Upayavira (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061810#comment-13061810
 ] 

Upayavira commented on SOLR-2641:
-

Same issue with pivot facets (SOLR-792). I'm going to try to work it out (as a 
slow, background task).

 Auto Facet Selection component
 --

 Key: SOLR-2641
 URL: https://issues.apache.org/jira/browse/SOLR-2641
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Attachments: SOLR_2641.patch


 It sure would be nice if you could have Solr automatically select field(s) 
 for faceting based dynamically off the profile of the results.  For example, 
 you're indexing disparate types of products, all with varying attributes 
 (color, size - like for apparel, memory_size - for electronics, subject - for 
 books, etc), and a user searches for "ipod" where most products match 
 products with color and memory_size attributes... let's automatically facet 
 on those fields.


[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2011-07-08 Thread Bill Bell (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061823#comment-13061823
 ] 

Bill Bell commented on SOLR-1725:
-

Is there a reason why this is not committed? It seems pretty awesome!! 

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
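
 The configuration rules above (comma-separated list, lexicographical
 execution order, mandatory extension) can be sketched as follows. This is
 not code from the patch: ScriptOrderSketch and its method names are
 hypothetical, and the sketch only orders and validates file names without
 loading or running any script.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative handling of a "scripts" parameter; names are assumptions. */
final class ScriptOrderSketch {

    /** Splits the comma-separated parameter and returns names in execution order. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<>();
        for (String raw : scriptsParam.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            extensionOf(name); // rejects names without an extension
            names.add(name);
        }
        Collections.sort(names); // lexicographical order of the file names
        return names;
    }

    /** The extension selects the script language, so it must be present. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        for (String name : executionOrder("scriptB.js, scriptA.js")) {
            System.out.println(name + " -> language by extension '" + extensionOf(name) + "'");
        }
    }
}
```

 A real implementation would presumably hand the extension to the JDK
 scripting API to look up an engine; that step is left out here on purpose.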


[jira] [Updated] (SOLR-2452) rewrite solr build system

2011-07-08 Thread Steven Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2452:
--

Attachment: SOLR-2452.diffSource.py.patch.zip

The solr2452 branch is now up-to-date with trunk, and I've committed to the 
branch the work that I was keeping as a script/patch pair.

I think this is ready to commit to trunk.

For review purposes, I'm attaching the zipped output from {{python -u 
diffSource.py trunk branches/solr2452}}, but the patch is huge, so I don't know 
how useful it will be.  (I had to compress it because it exceeds JIRA's 10MB 
threshold.)

I plan on merging the solr2452 branch back to trunk in about 24 hours, and then 
work on backporting the changes to branch_3x.

 rewrite solr build system
 -

 Key: SOLR-2452
 URL: https://issues.apache.org/jira/browse/SOLR-2452
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Steven Rowe
 Fix For: 3.4, 4.0

 Attachments: SOLR-2452-post-reshuffling.patch, 
 SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, 
 SOLR-2452.diffSource.py.patch.zip, SOLR-2452.dir.reshuffle.sh, 
 SOLR-2452.dir.reshuffle.sh


 As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
 think we should rewrite the solr build system.
 It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Created] (LUCENE-3292) IOContext should be part of the SegmentReader cache key

IOContext should be part of the SegmentReader cache key 


 Key: LUCENE-3292
 URL: https://issues.apache.org/jira/browse/LUCENE-3292
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


Once IOContext (LUCENE-2793) has landed, the IOContext should be part of the key 
used to cache a SegmentReader in the reader pool.
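
A minimal sketch of the idea, assuming the pool is a map keyed by segment: 
the key gains a second component so that a reader opened for merging is never 
handed out for searching. Everything here (ReaderPoolSketch, Context, Reader, 
Key) is a hypothetical stand-in, not the IndexWriter reader pool or the real 
IOContext class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy reader pool whose cache key includes the I/O context. */
final class ReaderPoolSketch {
    /** Hypothetical stand-in for IOContext; only the distinction matters here. */
    enum Context { READ, MERGE }

    /** Placeholder for a pooled reader. */
    static final class Reader {
        final String segment;
        final Context context;

        Reader(String segment, Context context) {
            this.segment = segment;
            this.context = context;
        }
    }

    /** Cache key: segment name plus the context the reader was opened with. */
    static final class Key {
        private final String segment;
        private final Context context;

        Key(String segment, Context context) {
            this.segment = Objects.requireNonNull(segment);
            this.context = Objects.requireNonNull(context);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return segment.equals(other.segment) && context == other.context;
        }

        @Override
        public int hashCode() {
            return Objects.hash(segment, context);
        }
    }

    private final Map<Key, Reader> pool = new HashMap<>();

    /** Returns the pooled reader for this segment and context, opening one if needed. */
    Reader get(String segment, Context context) {
        return pool.computeIfAbsent(new Key(segment, context), k -> new Reader(segment, context));
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ReaderPoolSketch pool = new ReaderPoolSketch();
        pool.get("_0", Context.READ);
        pool.get("_0", Context.MERGE);
        System.out.println("pooled readers for one segment: " + pool.size());
    }
}
```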


[jira] [Created] (LUCENE-3293) Use IOContext.READONCE in VarGapTermsIndexReader to load FST

Use IOContext.READONCE in VarGapTermsIndexReader to load FST


 Key: LUCENE-3293
 URL: https://issues.apache.org/jira/browse/LUCENE-3293
 Project: Lucene - Java
  Issue Type: Task
  Components: core/codecs
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


VarGapTermsIndexReader should pass READONCE context down when it
opens/reads the FST. Yet, it should just replace the ctx passed in, ie if we 
are merging vs reading we want to differentiate.



[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061836#comment-13061836
 ] 

Simon Willnauer commented on LUCENE-2793:
-

I fixed the two minor things from above, created two followup issues 
(LUCENE-3292 & LUCENE-3293) for the remaining TODOs, and will go ahead and 
reintegrate the branch now.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793_final.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
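
 As a rough illustration of the proposal, a context object could bundle the
 buffer size with the flags mentioned above. This is a sketch under
 assumptions, not the IOContext class that was committed: the class name,
 field names, factory methods and the buffer sizes are all made up for the
 example.

```java
/** Hypothetical value object in the spirit of the proposal; not Lucene's IOContext. */
final class IoContextSketch {
    enum Usage { SEARCH, MERGE }

    final Usage usage;
    final int readBufferSize;  // bytes; the values used below are illustrative
    final boolean direct;      // bypass the OS buffer cache
    final boolean sequential;  // hint that access is sequential

    private IoContextSketch(Usage usage, int readBufferSize, boolean direct, boolean sequential) {
        this.usage = usage;
        this.readBufferSize = readBufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    /** Searching: small buffer, regular cached random access. */
    static IoContextSketch forSearch() {
        return new IoContextSketch(Usage.SEARCH, 1024, false, false);
    }

    /** Merging: larger buffer, and the only place DIRECT/SEQUENTIAL are used. */
    static IoContextSketch forMerge() {
        return new IoContextSketch(Usage.MERGE, 4096, true, true);
    }

    @Override
    public String toString() {
        return usage + "[buffer=" + readBufferSize + ", direct=" + direct
            + ", sequential=" + sequential + "]";
    }

    public static void main(String[] args) {
        System.out.println(forSearch());
        System.out.println(forMerge());
    }
}
```

 A directory implementation would then take such an object in createOutput
 and openInput instead of a bare buffer size.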


[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory


[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061838#comment-13061838
 ] 

Simon Willnauer commented on SOLR-1725:
---

bq. Is there a reason why this is not committed? It seems pretty awesome!!
Indeed this looks good... somebody should bring it up to date I guess :)


[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field


 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated SOLR-2242:
--

Comment: was deleted

(was: 
I am out of the office on vacation, I will return Monday July 11. I will not be 
checking email.

For urgent Systems Department business, please contact Mercy Anaba, 
man...@jhu.edu,(410) 516-5306.
)

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, 
 SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code} 
 Several people use this to get the group.field count (the # of groups).
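
 Because the parameter separators are easy to lose when such URLs are copied
 around, here is a small sketch that assembles an equivalent request from
 name/value pairs. The class FacetUrlSketch is hypothetical; the parameter
 names are the ones used in this issue (facet.numFacetTerms comes from the
 patch under discussion), and the shards and indent parameters are left out
 for brevity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative URL assembly for the facet request shown in the issue. */
final class FacetUrlSketch {

    /** Joins URL-encoded name=value pairs with '&' after the base URL. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return base + "?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds the example request for a given facet.numFacetTerms value. */
    static String example(int numFacetTerms) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("facet", "true");
        p.put("facet.mincount", "1");
        p.put("facet.numFacetTerms", Integer.toString(numFacetTerms));
        p.put("facet.limit", "-1");
        p.put("facet.field", "price");
        return build("http://localhost:8983/solr/select", p);
    }

    public static void main(String[] args) {
        System.out.println(example(1));
    }
}
```

 Note that the encoder escapes the colon in the match-all query, so q=*:*
 appears as q=*%3A* in the generated URL.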


[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field


[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061839#comment-13061839
 ] 

Simon Willnauer commented on SOLR-2242:
---

bq. Are we ready to commit?
Bill, isn't there still a test failure on this issue related to FC? Yonik 
mentioned BW compat issues here and promised to comment. I will ping him again.

thanks for the patience

simon


[jira] [Resolved] (LUCENE-2793) Directory createOutput and openInput should take an IOContext


 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2793.
-

   Resolution: Fixed
Fix Version/s: IOContext branch
   4.0
Lucene Fields: [New, Patch Available]  (was: [New])

I reintegrated the branch and committed to trunk in revision 1144196. I will 
now go ahead and delete the branch. All further development should happen on 
trunk. @Varun make sure you move your current work in progress to trunk and be 
careful with svn update on the branch since some of your changes might get lost.

Thanks Varun... good job!

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0, IOContext branch

 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793_final.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9414 - Failure

2011-07-08 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9414/

All tests passed

Build Log (for compile errors):
[...truncated 10240 lines...]




[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061872#comment-13061872
 ] 

Simon Willnauer commented on LUCENE-2878:
-

{quote}
I think I agree. The only possible trade-off that goes the other way is in the 
case where you have the positions available already during initial 
search/scoring, and there is not too much turnover in the TopDocs priority 
queue during hit collection. Then a Highlighter might save some time by not 
re-scoring and re-iterating the positions if it accumulated them up front (even 
for docs that were eventually dropped off the queue). I think it should be 
possible to test out both approaches given the right API here though?
{quote}

Yes, I think we should go and provide both possibilities here.

{quote}

The callback idea sounds appealing, but I still think we should also consider 
enabling the top-down approach: especially if this is going to run in two 
passes, why not let the highlighter drive the iteration? Keep in mind that 
positions consumers (like highlighters) may possibly be interested in more than 
just the lowest-level positions (they may want to see phrases, eg, and 
near-clauses - trying to avoid the s-word).
{quote}

I am not sure if I understand this correctly. I think the collector should be 
some kind of a visitor that walks down the query/scorer tree and each scorer 
can ask if it should pass the current positions to the collector something like 
this: 
{code}
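class PositionCollector {

  /**
   * Called once per scorer while walking the query/scorer tree.
   * Returns true if this collector wants positions from that scorer.
   */
  public boolean register(Scorer scorer) {
    if (interestedInScorer(scorer)) {
      // store info about the scorer
      return true;
    }
    return false;
  }

  /*
   * Called by a registered scorer for each position change
   */
  public void nextPosition(Scorer scorer) {
    // collect positions for the current scorer
  }

  // placeholder: decide which scorers (e.g. phrase or near clauses) matter
  protected boolean interestedInScorer(Scorer scorer) {
    return true;
  }
}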
{code}
that way the iteration process is still driven by the top-level consumer but if 
you need information about intermediate positions you can collect them.

{quote}
Another consideration is ordering. I think  that positions are retrieved from 
the index in document order. This could be a natural order for many cases, but 
score order will also be useful. I'm not sure whose responsibility the sorting 
should be. Highlighters will want to be able to optimize their work (esp for 
very large documents) by terminating after considering only the first N 
matches, where the ordering could either be score or document-order.
{quote}

So the order here depends on the first collector, I figure. The usual case is 
that you do your search and retrieve the top N documents (those are also the 
top N you want to highlight, right?), then you pass in your top N and do the 
highlighting collection based on those top N. In that collection you are not 
interested in all matches but only in the top N from the previous collection. 
The simplest, yet maybe not the best, way to do this is using a simple filter 
that is built from the top N docs.
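
The filter idea can be sketched in a few lines. This is only an illustration 
with made-up names (TopDocsFilterSketch, filterOf, docsToHighlight) and plain 
int doc ids; it does not use Lucene's Filter or Collector classes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy second pass: only documents from the first pass's top N are visited. */
final class TopDocsFilterSketch {

    /** Builds a doc-id filter from the top N hits of the first search. */
    static BitSet filterOf(int[] topDocIds) {
        BitSet bits = new BitSet();
        for (int doc : topDocIds) {
            bits.set(doc);
        }
        return bits;
    }

    /** Walks all matches in document order, keeping those the filter accepts. */
    static List<Integer> docsToHighlight(int[] matchesInDocOrder, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : matchesInDocOrder) {
            if (filter.get(doc)) {
                kept.add(doc); // positions would be collected here
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet top = filterOf(new int[] {8, 3});
        System.out.println(docsToHighlight(new int[] {1, 3, 5, 8, 13}, top));
    }
}
```

The second pass sees the surviving documents in document order, not score 
order, which is the ordering question raised in the quote above.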

I will go ahead and create the branch now


 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: Bulk Postings branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries: the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they duplicate a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since scorers don't 
 expose any positional information. All those problems have bugged me for a 
 while now, so I started working on that using the bulkpostings API. I would 
 have done that first cut on trunk, but TermScorer is working on a BlockReader 
 that does not expose positions, while the one in this branch does. I started 
 adding a new Positions class which users can pull from a scorer; to prevent 
 unnecessary positions enums I added ScorerContext#needsPositions and 
 eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet,

Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 9414 - Failure

2011-07-08 Thread Simon Willnauer

my bad - just committed a fix

simon

On Fri, Jul 8, 2011 at 11:41 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9414/

 All tests passed

 Build Log (for compile errors):
 [...truncated 10240 lines...]




[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061873#comment-13061873
 ] 

Simon Willnauer commented on SOLR-2616:
---

+1

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


[jira] [Commented] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq


[ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061876#comment-13061876
 ] 

Michael McCandless commented on LUCENE-3290:


You are right -- nice catch!  Can you change the sumTotalTF to be a readVLong?  
Thanks.
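
For readers unfamiliar with the VLong format: it stores a long in 7-bit groups, low-order group first, with the high bit of each byte signalling that more bytes follow. A sum over all terms can plausibly exceed the int range, which is one reason a VLong would fit here (that motivation is an assumption, not something stated in the comment). The sketch below is a standalone illustration for non-negative values; VLongSketch and its methods are hypothetical names, not the Lucene classes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical standalone sketch of variable-length long encoding (non-negative values). */
final class VLongSketch {

    /** Encodes value in 7-bit groups, low-order first; high bit set means "more bytes follow". */
    static byte[] writeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** Decodes a value written by writeVLong. */
    static long readVLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return value;
    }
}
```

Small values stay small on disk (one byte up to 127), while values beyond Integer.MAX_VALUE still round-trip.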

 add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
 -

 Key: LUCENE-3290
 URL: https://issues.apache.org/jira/browse/LUCENE-3290
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3290.patch


 For scoring systems like lnu.ltc 
 (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
 supply 3 stats:
 * average tf within d
 * # of unique terms within d
 * average number of unique terms across field
 If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
 into your norms/docvalues (once we cut over),
 the average tf within d being length / numUniqueTerms.
 to compute the average across the field, we can just write the sum of all 
 terms' docfreqs into the terms dictionary header,
 and you can then divide this by maxdoc to get the average.
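
A small worked sketch of the arithmetic described above. Each (term, document) pair contributes exactly 1 to that term's docfreq, so the sum of all docfreqs equals the sum of the per-document unique term counts; dividing by maxdoc then gives the average. The class and method names are hypothetical, and the numbers in the usage note are made up for illustration.

```java
/** Hypothetical sketch of the two averages described in the issue. */
final class FieldStatsSketch {

    /** Average tf within a document: field length divided by number of unique terms. */
    static double averageTf(int length, int numUniqueTerms) {
        return numUniqueTerms == 0 ? 0.0 : (double) length / numUniqueTerms;
    }

    /** Average number of unique terms across the field: sum of all docfreqs divided by maxdoc. */
    static double averageUniqueTerms(long sumDocFreq, int maxDoc) {
        return maxDoc == 0 ? 0.0 : (double) sumDocFreq / maxDoc;
    }
}
```

For example, a document with 12 tokens and 4 unique terms has an average tf of 3.0, and an index of 4 documents whose docfreqs sum to 10 has an average of 2.5 unique terms per document.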


Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 9398 - Failure

2011-07-08 Thread Michael McCandless

I committed a fix -- just a test bug.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 7, 2011 at 1:27 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9398/

 1 tests failed.
 REGRESSION:  org.apache.lucene.search.TestSpanQueryFilter.testFilterWorks

 Error Message:
 docIdSet doesn't contain docId 10

 Stack Trace:
 junit.framework.AssertionFailedError: docIdSet doesn't contain docId 10
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1435)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1353)
        at 
 org.apache.lucene.search.TestSpanQueryFilter.assertContainsDocId(TestSpanQueryFilter.java:84)
        at 
 org.apache.lucene.search.TestSpanQueryFilter.testFilterWorks(TestSpanQueryFilter.java:56)




 Build Log (for compile errors):
 [...truncated 1226 lines...]




[jira] [Commented] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq


[ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061887#comment-13061887
 ] 

Michael McCandless commented on LUCENE-3290:


Patch looks awesome!  Nice to add these additional stats.

 add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
 -

 Key: LUCENE-3290
 URL: https://issues.apache.org/jira/browse/LUCENE-3290
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3290.patch


 For scoring systems like lnu.ltc 
 (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
 supply 3 stats:
 * average tf within d
 * # of unique terms within d
 * average number of unique terms across field
 If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
 into your norms/docvalues (once we cut over),
 the average tf within d being length / numUniqueTerms.
 to compute the average across the field, we can just write the sum of all 
 terms' docfreqs into the terms dictionary header,
 and you can then divide this by maxdoc to get the average.


[jira] [Updated] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq


 [ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3290:


Attachment: LUCENE-3290.patch

I committed the fix to memorycodec, synced the patch up to trunk, and renamed 
the confusing 'sumDF' variable in termsconsumer, which actually is not a sumDF 
at all :)

I think this is ready to go

 add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
 -

 Key: LUCENE-3290
 URL: https://issues.apache.org/jira/browse/LUCENE-3290
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3290.patch, LUCENE-3290.patch


 For scoring systems like lnu.ltc 
 (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
 supply 3 stats:
 * average tf within d
 * # of unique terms within d
 * average number of unique terms across field
 If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
 into your norms/docvalues (once we cut over),
 the average tf within d being length / numUniqueTerms.
 to compute the average across the field, we can just write the sum of all 
 terms' docfreqs into the terms dictionary header,
 and you can then divide this by maxdoc to get the average.


[jira] [Updated] (LUCENE-3294) Some code still compares string equality instead using equals


 [ 
https://issues.apache.org/jira/browse/LUCENE-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3294:


Attachment: LUCENE-3294.patch

here is a patch

 Some code still compares string equality instead using equals
 -

 Key: LUCENE-3294
 URL: https://issues.apache.org/jira/browse/LUCENE-3294
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3294.patch


 I found a couple of places where we still use string == otherstring, which 
 doesn't look correct. I will attach a patch soon.


[jira] [Created] (LUCENE-3294) Some code still compares string equality instead using equals

Some code still compares string equality instead using equals
-

 Key: LUCENE-3294
 URL: https://issues.apache.org/jira/browse/LUCENE-3294
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0
 Attachments: LUCENE-3294.patch

I found a couple of places where we still use string == otherstring, which 
doesn't look correct. I will attach a patch soon.
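
To make the difference concrete, here is a small standalone sketch (not taken from the patch; the class and method names are hypothetical). In Java, == on strings compares object references, while equals compares contents, so two strings with the same characters can fail the == check.

```java
/** Hypothetical sketch contrasting reference equality with content equality for strings. */
final class StringEqualitySketch {

    /** True only if both variables point at the very same String object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    /** True if both strings contain the same characters. */
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }
}
```

With a = "field" and b = new String("field"), sameReference returns false while sameContent returns true.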


[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-07-08 Thread Mike Sokolov (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061904#comment-13061904
]

Mike Sokolov commented on LUCENE-2878:
--

bq. I am not sure if I understand this correctly. I think the collector should
be some kind of a visitor that walks down the query/scorer tree and each scorer
can ask if it should pass the current positions to the collector something like
this:

Yes that sounds right

Re: ordering; I was concerned about the order in which the positions are
iterated within each document, not so much the order in which the documents are
returned. I think this is an issue for the highlighter mostly, which can
score position-ranges in the document so as to return the best snippet.
This kind of score may be built up from tfidf scores for each term, proximity,
length of the position-ranges and so on.

Allow Scorer to expose positions and payloads aka. nuke spans
--

Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Affects Versions: Bulk Postings branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch,
PosHighlighter.patch

Currently we have two somewhat separate types of queries, the ones which can
make use of positions (mainly spans) and payloads (spans). Yet Span*Query
doesn't really do scoring comparable to what other queries do, and at the end
of the day they are duplicating a lot of code all over Lucene. Span*Queries are
also limited to other Span*Query instances, such that you cannot use a
TermQuery or a BooleanQuery with SpanNear or anything like that.
Besides the Span*Query limitation, other queries lack a quite interesting
feature: they cannot score based on term proximity, since scorers don't
expose any positional information. All those problems have bugged me for a while
now, so I started working on that using the bulkpostings API. I would have done
that first cut on trunk, but TermScorer is working on a BlockReader that does not
expose positions, while the one in this branch does. I started adding a new
Positions class which users can pull from a scorer. To prevent unnecessary
positions enums I added ScorerContext#needsPositions and eventually
Scorer#needsPayloads to create the corresponding enum on demand. Yet,
currently only TermQuery / TermScorer implements this API and others simply
return null instead.
To show that the API really works and that our BulkPostings work fine with
positions too, I cut over TermSpanQuery to use a TermScorer under the hood and
nuked TermSpans entirely. A nice side effect of this was that the Position
BulkReading implementation got some exercise, which now :) all works with
positions, while Payloads for bulk reading are kind of experimental in the
patch and only work with the Standard codec.
So all spans now work on top of TermScorer (I truly hate spans since today),
including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother
to implement the other codecs yet since I want to get feedback on the API and
on this first cut before I go on with it. I will upload the corresponding
patch in a minute.
I also had to cut over SpanQuery.getSpans(IR) to
SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk
first, but after that pain today I need a break first :).
The patch passes all core tests
(org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't
look into the MemoryIndex BulkPostings API yet)


[jira] [Updated] (SOLR-2633) Make SolrDispatchFilter testable and add tests


 [ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Tosca updated SOLR-2633:


Attachment: SOLR-2633-only-tests.patch

This second cut contains more tests, which cover about 80% of the code 
of the class under test.


 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Updated] (SOLR-2633) Make SolrDispatchFilter testable and add tests


 [ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Tosca updated SOLR-2633:


Attachment: (was: SOLR-2633-only-tests.patch)

 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Updated] (SOLR-2633) Make SolrDispatchFilter testable and add tests


 [ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Tosca updated SOLR-2633:


Attachment: SOLR-2633-tests-only.patch

 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch, SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Commented] (SOLR-2633) Make SolrDispatchFilter testable and add tests


[ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061915#comment-13061915
 ] 

Edoardo Tosca commented on SOLR-2633:
-

I'm still struggling to understand some bits of code in the doFilter 
method.
Does anyone have an example of real usage of the management path?
I'd like to cover that before refactoring.
The piece of code in question is in SolrDispatchFilter, lines 164-168 (pasted 
below):
{code} 
// check for management path
String alternate = cores.getManagementPath();
if (alternate != null && path.startsWith(alternate)) {
   path = path.substring(0, alternate.length());
}
{code} 
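
To make the behaviour of the pasted fragment concrete, here is a standalone sketch that applies the same condition and substring call to sample inputs. The class name, the method name rewrite, and the sample paths are assumptions for illustration only; nothing here is taken from Solr beyond the two lines of logic quoted above.

```java
/** Hypothetical standalone sketch of the management-path check quoted above. */
final class ManagementPathSketch {

    /** Applies the same condition and substring call as the quoted fragment. */
    static String rewrite(String path, String alternate) {
        if (alternate != null && path.startsWith(alternate)) {
            // substring(0, n) keeps the first n characters, i.e. the prefix itself
            return path.substring(0, alternate.length());
        }
        return path;
    }
}
```

With a management path of "/admin/cores", the request path "/admin/cores/status" becomes "/admin/cores", while "/select" is left untouched. In other words, any path that starts with the management path is truncated to the management path itself.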

Thanks


 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch, SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061918#comment-13061918
 ] 

Simon Willnauer commented on LUCENE-2878:
-

Mike, I created a branch here: 
https://svn.apache.org/repos/asf/lucene/dev/branches/positions

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: Bulk Postings branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they are duplicating a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since scorers don't 
 expose any positional information. All those problems have bugged me for a while 
 now, so I started working on that using the bulkpostings API. I would have done 
 that first cut on trunk, but TermScorer is working on a BlockReader that does not 
 expose positions, while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer. To prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorer#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and others simply 
 return null instead. 
 To show that the API really works and that our BulkPostings work fine with 
 positions too, I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice side effect of this was that the Position 
 BulkReading implementation got some exercise, which now :) all works with 
 positions, while Payloads for bulk reading are kind of experimental in the 
 patch and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans since today), 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go on with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)


[jira] [Commented] (LUCENE-3294) Some code still compares string equality instead using equals


[ 
https://issues.apache.org/jira/browse/LUCENE-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061925#comment-13061925
 ] 

Michael McCandless commented on LUCENE-3294:


Nice catch Simon -- looks good!

 Some code still compares string equality instead using equals
 -

 Key: LUCENE-3294
 URL: https://issues.apache.org/jira/browse/LUCENE-3294
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3294.patch


 I found a couple of places where we still use string == otherstring, which 
 doesn't look correct. I will attach a patch soon.


[jira] [Resolved] (LUCENE-3294) Some code still compares string equality instead using equals


 [ 
https://issues.apache.org/jira/browse/LUCENE-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3294.
-

Resolution: Fixed

Committed in revision 1144280


 Some code still compares string equality instead using equals
 -

 Key: LUCENE-3294
 URL: https://issues.apache.org/jira/browse/LUCENE-3294
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3294.patch


 I found a couple of places where we still use string == otherstring, which 
 doesn't look correct. I will attach a patch soon.


omitNorms and omitTermFreqAndPosition

2011-07-08 Thread Gastone Penzo

Hi,
I have a problem with omitTermFreqAndPosition and omitNorms.
In my schema I have some fields with these properties set to true,
for example the field category.

Then I make a query like:
select?q=category:(x OR y or Z)

It returns all docs that have category x, y, or z.

I set debugQuery=on to see the scores, and I see that every doc has a
different score.
Why? The tf is still calculated, and so is the normalization. The docs should
all have the same score,
because it's not a full-text search; I am only searching for docs that are
inside a group.
Thank you very much

-- 
*Gastone Penzo*

[jira] [Created] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps

BitVector never skips fully populated bytes when writing ClearedDgaps
-

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0


When writing cleared DGaps in BitVector we compare a byte against 0xFF (255), 
yet the byte is cast to an int (-1) and the comparison will never succeed. 
We should mask the byte with 0xFF before comparing, or compare against -1.
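
The pitfall can be reproduced in a few lines of standalone Java. This is a sketch of the general language behaviour, not the BitVector code itself, and the class and method names are hypothetical. A Java byte is signed, so a byte with all bits set is widened to the int -1 before the comparison, and -1 is never equal to 255.

```java
/** Hypothetical sketch of the signed-byte comparison pitfall and the two suggested fixes. */
final class ByteCompareSketch {

    /** Always false: b is widened to an int in [-128, 127], which can never equal 255. */
    static boolean naiveIsFull(byte b) {
        return b == 0xFF;
    }

    /** Fix 1: mask to the unsigned value before comparing. */
    static boolean maskedIsFull(byte b) {
        return (b & 0xFF) == 0xFF;
    }

    /** Fix 2: compare against the signed representation of "all bits set". */
    static boolean minusOneIsFull(byte b) {
        return b == -1;
    }
}
```

For the byte (byte) 0xFF, naiveIsFull returns false while both fixed variants return true.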


[jira] [Assigned] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


 [ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3295:
---

Assignee: Simon Willnauer

 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255), 
 yet the byte is cast to an int (-1) and the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing, or compare 
 against -1.


[jira] [Updated] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


 [ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3295:


Attachment: LUCENE-3295.patch

Here is a simple patch and a test that at least exercises the code.

 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255), 
 yet the byte is cast to an int (-1) and the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing, or compare 
 against -1.


FindBugs PMD ?

Developers,
Any thoughts on using FindBugs & PMD to catch more bugs in Lucene/Solr?  
Jenkins could be configured to run FindBugs & PMD analysis nightly.  It would 
have helped find this:

 (LUCENE-3294) Some code still compares string equality instead using 
equals

I am aware there is a high degree of false positives, but there are ways of 
dealing with them, such as @SuppressWarnings("PMD") and //NOPMD, and 
for FindBugs there is @edu.umd.cs.findbugs.annotations.SuppressWarnings(). 
There's also a fairly detailed configuration file for FindBugs to really control 
it and to make exceptions.  I'd also really like to see use of the FindBugs 
concurrency annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.
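
As a rough illustration of the two PMD suppression mechanisms mentioned above, here is a sketch that uses only the standard java.lang.SuppressWarnings annotation and a NOPMD line comment. The class, the method, and the scenario of an intentional identity check are hypothetical; the FindBugs annotation is left out because it requires the FindBugs annotations jar on the classpath.

```java
/** Hypothetical sketch: marking an intentional identity comparison so PMD does not flag it. */
final class SuppressionSketch {

    /** Method-level suppression via the standard annotation with the "PMD" key. */
    @SuppressWarnings("PMD")
    static boolean sameInternedName(String a, String b) {
        return a == b; // NOPMD - identity check is intentional, both names are interned
    }
}
```

Either mechanism alone is enough for a given line; both appear here only to show the syntax side by side.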

~ David Smiley

Re: FindBugs PMD ?

2011-07-08 Thread Robert Muir

On Fri, Jul 8, 2011 at 10:08 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Developers,
 Any thoughts on using FindBugs & PMD to catch more bugs in Lucene/Solr?  
 Jenkins could be configured to run FindBugs & PMD analysis nightly.  It would 
 have helped find this:

         (LUCENE-3294) Some code still compares string equality instead using 
 equals

 I am aware there is a high degree of false positives, but there are ways of 
 dealing with them, such as @SuppressWarnings("PMD") and //NOPMD, 
 and for FindBugs there is 
 @edu.umd.cs.findbugs.annotations.SuppressWarnings(). There's also a fairly 
 detailed configuration file for FindBugs to really control it and to make 
 exceptions.  I'd also really like to see use of the FindBugs concurrency 
 annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.

I think it's a good idea for nightly, but I am strongly against linking
to an LGPL library for these annotations.
I would prefer PMD instead, because of the license.


[jira] [Resolved] (SOLR-2331) Refactor CoreContainer's SolrXML serialization code and improve testing


 [ 
https://issues.apache.org/jira/browse/SOLR-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-2331.
---

Resolution: Fixed

Thanks Steve - calling this one done - SolrCore needs more refactoring, but it 
can come in further issues. Noble has a great one going to factor out zookeeper 
parts as well.

 Refactor CoreContainer's SolrXML serialization code and improve testing
 ---

 Key: SOLR-2331
 URL: https://issues.apache.org/jira/browse/SOLR-2331
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2331-fix-windows-file-deletion-failure.patch, 
 SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331.patch


 CoreContainer has enough code in it - I'd like to factor out the solr.xml 
 serialization code into SolrXMLSerializer or something - which should make 
 testing it much easier and lightweight.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061976#comment-13061976
 ] 

Mark Miller commented on SOLR-2616:
---

So while I'm in favor of including the logging config file for easy use, I'm 
not so sure about wiring it up out of the box. Perhaps I'm just too used to 
things going to the console while starting/developing with Solr :)

I was thinking we just put the file there, and update the docs to point out 
that you can also start Solr with a -D option to use the example logging 
config file.

I could see going either way though.

Thoughts?

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


[jira] [Issue Comment Edited] (SOLR-2331) Refactor CoreContainer's SolrXML serialization code and improve testing


[ 
https://issues.apache.org/jira/browse/SOLR-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061973#comment-13061973
 ] 

Mark Miller edited comment on SOLR-2331 at 7/8/11 2:23 PM:
---

Thanks Steve - calling this one done - CoreContainer needs more refactoring, 
but it can come in further issues. Noble has a great one going to factor out 
zookeeper parts as well.

  was (Author: markrmil...@gmail.com):
Thanks Steve - calling this one done - SolrCore needs more refactoring, but 
it can come in further issues. Noble has a great one going to factor out 
zookeeper parts as well.
  
 Refactor CoreContainer's SolrXML serialization code and improve testing
 ---

 Key: SOLR-2331
 URL: https://issues.apache.org/jira/browse/SOLR-2331
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2331-fix-windows-file-deletion-failure.patch, 
 SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331.patch


 CoreContainer has enough code in it - I'd like to factor out the solr.xml 
 serialization code into SolrXMLSerializer or something - which should make 
 testing it much easier and lightweight.


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061978#comment-13061978
 ] 

Robert Muir commented on SOLR-2616:
---

What will wiring it up out of the box do to tests (e.g. the example tests)?

Will running the tests now cause Jetty to create files outside of the build/ 
folder?

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


Re: FindBugs PMD ?

Rob, there is an ASL 2.0 licensed implementation here:
https://github.com/stephenc/findbugs-annotations

~ David

On Jul 8, 2011, at 10:12 AM, Robert Muir wrote:

 On Fri, Jul 8, 2011 at 10:08 AM, Smiley, David W. <dsmi...@mitre.org> wrote:
 Developers,
 Any thoughts on using FindBugs & PMD to catch more bugs in Lucene/Solr?  
 Jenkins could be configured to run FindBugs & PMD analysis nightly.  It 
 would have helped find this:
 
 (LUCENE-3294) Some code still compares string equality instead using 
 equals
 
 I am aware there is a high degree of false positives, but there are ways of 
 dealing with them, such as @SuppressWarnings("PMD") and // NOPMD for PMD, 
 and for FindBugs there is 
 @edu.umd.cs.findbugs.annotations.SuppressWarnings(), plus a fairly 
 detailed configuration file for FindBugs to really control it and to make 
 exceptions.  I'd also really like to see use of the FindBugs concurrency 
 annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.
 
 I think it's a good idea for nightly, but I am strongly against linking
 to an LGPL library for these annotations.
 I would prefer PMD instead, because of the license.
 

[jira] [Commented] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


[ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061981#comment-13061981
 ] 

Robert Muir commented on LUCENE-3295:
-

Good catch. Just some thoughts from looking at the test:

* we should create a no-arg helper LTC.newIOContext() that uses LTC's random, or
* should tests really need to pass an IOContext explicitly like this? Or should 
  MDW randomize the IOContexts that it passes down to its wrapped Dir?


 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255) 
 yet the byte is casted into an int (-1) and the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing or compare 
 against -1
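
 The comparison problem described above can be reproduced in isolation. This 
 is only a minimal sketch, not the BitVector code itself; the class and method 
 names are made up for illustration.

```java
// Hypothetical demo class; not part of Lucene.
class ByteCompareDemo {
    // Buggy form: the byte is promoted to int with sign extension,
    // so a byte holding 0xFF becomes -1 and never equals 255.
    static boolean naiveIsAllOnes(byte b) {
        return b == 0xFF;
    }

    // Fix 1: mask to the low 8 bits before comparing against 0xFF.
    static boolean maskedIsAllOnes(byte b) {
        return (b & 0xFF) == 0xFF;
    }

    // Fix 2: compare against the signed value the byte actually holds.
    static boolean signedIsAllOnes(byte b) {
        return b == -1;
    }

    public static void main(String[] args) {
        byte full = (byte) 0xFF;
        System.out.println("naive:  " + naiveIsAllOnes(full));
        System.out.println("masked: " + maskedIsAllOnes(full));
        System.out.println("signed: " + signedIsAllOnes(full));
    }
}
```

 Either fix gives the same answer for every byte value; masking is probably 
 the easier one to read next to other 0xFF constants.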


[jira] [Commented] (LUCENE-3292) IOContext should be part of the SegmentReader cache key

2011-07-08 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061982#comment-13061982
 ] 

Varun Thacker commented on LUCENE-3292:
---

I am not quite sure how to start with this. 

In SegmentReader#get, something like this is required:
{noformat}
if (readOnly) {
  assert context != IOContext.DEFAULT;
  //assert context.context == IOContext.Context.READ;
  // Using the second assert checks for both READ and READONCE
}
{noformat}

And what do I need to do in IndexWriter.ReaderPool#get so that context should 
be part of the key used to cache that reader in the pool?
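
One generic way to make a context part of a cache key is a small composite key 
class with equals/hashCode over both parts. The sketch below is only an 
illustration under assumptions: DemoContext, ReaderKey and the pool are 
hypothetical stand-ins, not the real IOContext or IndexWriter.ReaderPool code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these types exist in Lucene.
class ReaderPoolDemo {
    // Stand-in for the real IOContext.
    enum DemoContext { READ, READONCE, MERGE }

    // Composite key: the segment name plus the context the reader was opened with.
    static final class ReaderKey {
        private final String segmentName;
        private final DemoContext context;

        ReaderKey(String segmentName, DemoContext context) {
            this.segmentName = Objects.requireNonNull(segmentName);
            this.context = Objects.requireNonNull(context);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ReaderKey)) return false;
            ReaderKey other = (ReaderKey) o;
            return segmentName.equals(other.segmentName) && context == other.context;
        }

        @Override
        public int hashCode() {
            return Objects.hash(segmentName, context);
        }
    }

    // Plain Object stands in for a pooled reader.
    private final Map<ReaderKey, Object> readers = new HashMap<>();

    // Returns the cached reader for (segment, context), creating it on first use.
    Object get(String segmentName, DemoContext context) {
        return readers.computeIfAbsent(new ReaderKey(segmentName, context), k -> new Object());
    }

    public static void main(String[] args) {
        ReaderPoolDemo pool = new ReaderPoolDemo();
        Object a = pool.get("_0", DemoContext.READ);
        Object b = pool.get("_0", DemoContext.READ);
        Object c = pool.get("_0", DemoContext.MERGE);
        System.out.println("same context shares reader: " + (a == b));
        System.out.println("different context shares reader: " + (a == c));
    }
}
```

With a key like this, two requests for the same segment only share a pooled 
reader when their contexts match.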

 IOContext should be part of the SegmentReader cache key 
 

 Key: LUCENE-3292
 URL: https://issues.apache.org/jira/browse/LUCENE-3292
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


 Once IOContext (LUCENE-2793) is landed the IOContext should be part of the 
 key used to cache that reader in the pool


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061984#comment-13061984
 ] 

Mark Miller commented on SOLR-2616:
---

It will be an issue with tests as is I believe, but nothing we couldn't work 
around.

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


[jira] [Commented] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


[ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061986#comment-13061986
 ] 

Simon Willnauer commented on LUCENE-3295:
-

while those comments are really unrelated, how would you pass a randomized 
IOContext in the MDW? ignore the given one?

I agree we should have a zero arg newIOContext()

 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255) 
 yet the byte is casted into an int (-1) and the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing or compare 
 against -1


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061991#comment-13061991
 ] 

David Smiley commented on SOLR-2616:


The logging configuration file I provided does not log to a file nor does it 
suppress logging to the console.  There is some commented configuration to make 
it easier to log to a file. The net perceived effect of applying this patch 
should be no change.

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


[jira] [Commented] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


[ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061992#comment-13061992
 ] 

Robert Muir commented on LUCENE-3295:
-

we can have a true/false setter on MDW (randomizeIOContexts), so we control if 
it respects the given one (e.g. tests that actually want to test IOContext 
works) or not.

 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255) 
 yet the byte is casted into an int (-1) and the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing or compare 
 against -1


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061999#comment-13061999
 ] 

Robert Muir commented on SOLR-2452:
---

playing around with the branch, the whole situation looks so much better to me.

in my opinion we can then go and make other little improvements, make things 
faster, add new targets, in separate issues... so I think you should just 
commit before the patch goes out of date.

maybe we even encounter some serious grief, but I think we should just work 
thru this in svn.

great work!

 rewrite solr build system
 -

 Key: SOLR-2452
 URL: https://issues.apache.org/jira/browse/SOLR-2452
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Steven Rowe
 Fix For: 3.4, 4.0

 Attachments: SOLR-2452-post-reshuffling.patch, 
 SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, 
 SOLR-2452.diffSource.py.patch.zip, SOLR-2452.dir.reshuffle.sh, 
 SOLR-2452.dir.reshuffle.sh


 As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
 think we should rewrite the solr build system.
 It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file

[
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062000#comment-13062000
]

Mark Miller commented on SOLR-2616:
---

bq. The logging configuration file I provided does not log to a file nor does
it suppress logging to the console.

The question in my mind is not what the patch does, but what should we do.

If we want this as an example that is not hooked up, my preference would be to
let the user know he should use -D to hook up the sample log file - not
configure it in jetty.xml - we should still stay somewhat logging framework
agnostic.

In both cases I would prefer that the default log.properties file use the
FileHandler rather than ConsoleHandler though. We should give something close
to what you actually might want to use - which is not to setup logging to log
to the console.

First I'm gathering feedback from others though.

My current leaning is to doc the wiki and what not to mention the sample log
props and use of -D to put it in action, and to setup the default log props to
log to the ./logs dir.
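
For reference, JDK logging reads its configuration file from the
java.util.logging.config.file system property, which is what the -D option
above would set. The sketch below shows roughly what loading such a file does,
using LogManager.readConfiguration on an in-memory copy instead of a real file.
The class name, logger name and property contents are assumptions for
illustration, not the contents of the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Hypothetical demo class; the logger name below is made up.
class LoggingConfigDemo {
    // What a sample logging properties file might contain:
    // a default level plus a more verbose level for one logger.
    static final String SAMPLE_PROPS =
            ".level=INFO\n"
          + "demo.solr.update.level=FINE\n";

    // Loads the sample configuration and reports the level explicitly
    // configured for the given logger (null means "inherit from parent").
    static Level configuredLevel(String loggerName) {
        try {
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(SAMPLE_PROPS.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Logger logger = Logger.getLogger(loggerName);
        return logger.getLevel();
    }

    public static void main(String[] args) {
        System.out.println("demo.solr.update -> " + configuredLevel("demo.solr.update"));
        System.out.println("demo.solr.other  -> " + configuredLevel("demo.solr.other"));
    }
}
```

Note that calling readConfiguration replaces the logging configuration for the
whole JVM, so this is something to try in a scratch program rather than inside
a running Solr.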

Include jdk14 logging configuration file

Key: SOLR-2616
URL: https://issues.apache.org/jira/browse/SOLR-2616
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
Fix For: 3.4, 4.0

Attachments: SOLR-2616_jdk14logging_setup.patch

The /example/ Jetty Solr configuration should include a basic logging
configuration file. Looking at this wiki page:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup I am creating this
patch.


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file

[
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062002#comment-13062002
]

David Smiley commented on SOLR-2616:

Ok.

The main thing I wanted to accomplish in this patch, was to make it easy for me
to enable debug logging for a particular logger and to actually see the
results. Before this patch, the current state, I could use the logging admin
page to enable debug logging for a known Solr logger but the debug output
wouldn't go anywhere because the default threshold for the console logger is
INFO. This patch includes a commented line to lower the console threshold.
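
The effect described above, where a logger is set to FINE but nothing shows up
because the handler threshold is still INFO, can be reproduced with plain
java.util.logging. This is a minimal sketch: the capturing handler and the
logger names are made up for illustration and stand in for the ConsoleHandler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical demo class; not part of Solr.
class HandlerThresholdDemo {
    // Collects published messages so the effect of the handler level is visible.
    static class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            // isLoggable applies this handler's own level (and filter, if any).
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    // Logs one FINE message through a logger set to FINE and returns
    // what a handler with the given threshold actually let through.
    static List<String> logFineWith(Level handlerLevel) {
        Logger logger = Logger.getLogger("demo.threshold." + handlerLevel.getName());
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.FINE);

        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(handlerLevel);
        logger.addHandler(handler);
        try {
            logger.fine("debug detail");
        } finally {
            logger.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println("handler at INFO: " + logFineWith(Level.INFO));
        System.out.println("handler at FINE: " + logFineWith(Level.FINE));
    }
}
```

Both the logger level and the handler level have to allow a record before it
is published, which is why lowering only the logger level has no visible effect.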

FYI I still *hate* JDK14 logging (aka JUL); but nonetheless it's the default as
provided with Solr.

Include jdk14 logging configuration file

Key: SOLR-2616
URL: https://issues.apache.org/jira/browse/SOLR-2616
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
Fix For: 3.4, 4.0

Attachments: SOLR-2616_jdk14logging_setup.patch

The /example/ Jetty Solr configuration should include a basic logging
configuration file. Looking at this wiki page:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup I am creating this
patch.


[jira] [Commented] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level


[ 
https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062003#comment-13062003
 ] 

Yonik Seeley commented on SOLR-2615:


bq. Yonik, if I instead use a doDebug boolean flag initialized in the 
constructor, would that sufficiently satisfy you to commit this?

Yep, I think so...

 Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE 
 level
 ---

 Key: SOLR-2615
 URL: https://issues.apache.org/jira/browse/SOLR-2615
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: David Smiley
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch


 It would be great if the LogUpdateProcessor logged each command (add, delete, 
 ...) at debug (Fine) level. Presently it only logs a summary of 8 commands 
 and it does so at the very end.
 The attached patch implements this.
 * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the 
 debug level log happens before Solr does anything with it. It should not 
 affect the ordering of the existing summary log which happens at finish(). 
 * I changed UpdateRequestProcessor's static log variable to be an instance 
 variable that uses the current class name. I think this makes much more sense 
 since I want to be able to alter logging levels for a specific processor 
 without doing it for all of them. This change did require me to tweak the 
 factory's detection of the log level which avoids creating the 
 LogUpdateProcessor.
 * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event 
 there is no schema unique field. I fixed that.
 You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) 
 syntax, which is both performant and concise, as there's no point in guarding 
 the debug message with an isDebugEnabled() since debug() will internally 
 check this anyway and there is no string concatenation if debug isn't 
 enabled.


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file


[ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062004#comment-13062004
 ] 

Robert Muir commented on SOLR-2616:
---

{quote}
My current leaning is to doc the wiki and what not to mention the sample log 
props and use of -D to put it in action, and to setup the default log props to 
log to the ./logs dir.
{quote}

Yeah, as long as we don't somehow create test meddling, I'm happy! There are 
already some hacks in the build somehow related to this:
{noformat}
in lucene/common-build.xml: <property name="tests.loggingfile" value="/dev/null"/>
and in the JUnitResultFormatter to reboot logging for each test case: 
try {
  LogManager.getLogManager().readConfiguration();
} catch (Exception e) {}
{noformat}


 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 


[jira] [Commented] (SOLR-2616) Include jdk14 logging configuration file

[
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062006#comment-13062006
]

Yonik Seeley commented on SOLR-2616:

bq. My current leaning is to doc the wiki and what not to mention the sample
log props and use of -D to put it in action, and to setup the default log props
to log to the ./logs dir.

That's a good plan I think. It does seem important for newbies to get the
instant console feedback of "address already in use" or other exceptions. I
actually find it pretty useful myself (when I forget that I already have an
instance running, or just for seeing requests come in by default, etc).

We can also document it right in the example/README.txt!

Include jdk14 logging configuration file

Key: SOLR-2616
URL: https://issues.apache.org/jira/browse/SOLR-2616
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Mark Miller
Priority: Minor
Fix For: 3.4, 4.0

Attachments: SOLR-2616_jdk14logging_setup.patch

The /example/ Jetty Solr configuration should include a basic logging
configuration file. Looking at this wiki page:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup I am creating this
patch.


[jira] [Commented] (SOLR-2633) Make SolrDispatchFilter testable and add tests


[ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062012#comment-13062012
 ] 

Mark Miller commented on SOLR-2633:
---

The heck if I know.

The comment says:

{code}
  /**
   * Sets the alternate path for multicore handling:
   * This is used in case there is a registered unnamed core (aka name is "") to
   * declare an alternate way of accessing named cores.
   * This can also be used in a pseudo single-core environment so admins can prepare
   * a new version before swapping.
   * @param path
   */
{code}

But the code is:

{code}
// check for management path
String alternate = cores.getManagementPath();
if (alternate != null && path.startsWith(alternate)) {
  path = path.substring(0, alternate.length());
}
{code}

This simply checks if the path starts with your management path (say /manage), 
and then sets the path to the management path - I don't see how this triggers 
or does anything later though...

Does anyone out there use this, or know whether it does/did work? Perhaps it 
should just go away.
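
To make the effect of that substring call concrete, here is a small standalone 
sketch. The class and method names are made up; asWritten mirrors the quoted 
logic, while remainder is only a contrasting variant for illustration and not a 
claim about what the code was meant to do.

```java
// Hypothetical demo class; not part of Solr.
class ManagementPathDemo {
    // Mirrors the quoted logic: keeps only the prefix, i.e. the management path itself.
    static String asWritten(String path, String alternate) {
        if (alternate != null && path.startsWith(alternate)) {
            path = path.substring(0, alternate.length());
        }
        return path;
    }

    // Contrasting variant: strips the prefix and keeps the rest of the path.
    static String remainder(String path, String alternate) {
        if (alternate != null && path.startsWith(alternate)) {
            path = path.substring(alternate.length());
        }
        return path;
    }

    public static void main(String[] args) {
        String path = "/manage/core0/select";
        System.out.println("as written: " + asWritten(path, "/manage"));
        System.out.println("remainder:  " + remainder(path, "/manage"));
    }
}
```

As written, everything after the management path is discarded, which matches 
the observation above that the path is simply set to the management path.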

 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch, SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Commented] (SOLR-2588) Make Velocity an optional dependency in SolrCore

[
https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062016#comment-13062016
]

David Smiley commented on SOLR-2588:

I'm surprised velocity became a core dependency, but nonetheless I think it
should be possible to use Solr in an embedded fashion without pulling in
extraneous dependencies like velocity and others. What if these response
writers were initialized on-demand? This would improve startup time and
decrease memory usage just a little, since most people aren't actually going to
use all response writers that Solr supports. I'm willing to put together a
patch.
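
A rough sketch of what on-demand initialization could look like, under the
assumption that writers are registered as factories and only constructed on
first lookup. All names here (the registry, the ResponseWriter interface,
SimpleWriter) are hypothetical and do not reflect the actual SolrCore code; the
sketch is also not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical demo class; not part of Solr.
class LazyWriterRegistryDemo {
    // Stand-in for a response writer.
    interface ResponseWriter {
        String contentType();
    }

    static final class SimpleWriter implements ResponseWriter {
        private final String contentType;

        SimpleWriter(String contentType) {
            this.contentType = contentType;
        }

        @Override
        public String contentType() {
            return contentType;
        }
    }

    private final Map<String, Supplier<ResponseWriter>> factories = new HashMap<>();
    private final Map<String, ResponseWriter> created = new HashMap<>();

    // Registering only stores the factory; nothing is constructed yet.
    void register(String name, Supplier<ResponseWriter> factory) {
        factories.put(name, factory);
    }

    // Constructs the writer on first use and caches it afterwards.
    ResponseWriter get(String name) {
        ResponseWriter writer = created.get(name);
        if (writer == null) {
            Supplier<ResponseWriter> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("unknown writer: " + name);
            }
            writer = factory.get();
            created.put(name, writer);
        }
        return writer;
    }

    int createdCount() {
        return created.size();
    }

    public static void main(String[] args) {
        LazyWriterRegistryDemo registry = new LazyWriterRegistryDemo();
        registry.register("xml", () -> new SimpleWriter("application/xml"));
        // Simulates a writer whose dependency is missing from the classpath.
        registry.register("velocity", () -> {
            throw new NoClassDefFoundError("org/apache/velocity/context/Context");
        });
        System.out.println(registry.get("xml").contentType());
        System.out.println("writers created so far: " + registry.createdCount());
    }
}
```

With this shape, a missing optional dependency would only surface when that
particular writer is requested, rather than when the core loads.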

Make Velocity an optional dependency in SolrCore

Key: SOLR-2588
URL: https://issues.apache.org/jira/browse/SOLR-2588
Project: Solr
Issue Type: Wish
Affects Versions: 3.2
Reporter: Gunnar Wagenknecht
Priority: Minor
Fix For: 3.4, 4.0

In 1.4 it was fine to run Solr without Velocity on the classpath. However,
in 3.2 SolrCore won't load because of a hard reference to the Velocity
response writer in a static initializer.
{noformat}
... ERROR org.apache.solr.core.CoreContainer -
java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
    at org.apache.solr.core.SolrCore.<clinit>(SolrCore.java:1447)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
{noformat}


[jira] [Commented] (SOLR-2633) Make SolrDispatchFilter testable and add tests


[ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062033#comment-13062033
 ] 

David Smiley commented on SOLR-2633:


I think it's great that this issue is going to make it more testable. But why 
is SolrDispatchFilter a filter in the first place instead of a Servlet? This is 
somewhat off-topic, but if perhaps it should be a servlet then this issue is 
majorly disrupted by such a change.

 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch, SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Commented] (SOLR-2633) Make SolrDispatchFilter testable and add tests


[ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062034#comment-13062034
 ] 

Edoardo Tosca commented on SOLR-2633:
-

{quote}
This simply checks if the path starts with your management path (say /manage), 
and then sets the path to the management path - I don't see how this triggers 
or does anything later though...
{quote}

Exactly right. I saw the comment (I forgot to paste it previously) and tried 
adding a managementPath="/manage" attribute to solr.xml to see what it would 
trigger, but I haven't discovered anything :(
Thanks

 Make SolrDispatchFilter testable and add tests
 --

 Key: SOLR-2633
 URL: https://issues.apache.org/jira/browse/SOLR-2633
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1, 3.2, 3.3
Reporter: Edoardo Tosca
Assignee: Mark Miller
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2633-tests-only.patch, SOLR-2633-tests-only.patch


 I have ideas for possible extensions/enhancements to the SolrDispatchFilter. 
 However, as it doesn't have any tests, making safe enhancements is difficult. 
 Given its monolithic nature, it is hard to test. Therefore, I am proposing to 
 refactor it to make it testable, and to provide tests for it.


[jira] [Commented] (SOLR-2633) Make SolrDispatchFilter testable and add tests


[ 
https://issues.apache.org/jira/browse/SOLR-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062037#comment-13062037
 ] 

Mark Miller commented on SOLR-2633:
---

bq. But why is SolrDispatchFilter a filter in the first place instead of a 
Servlet?

It used to be a Servlet once. I cannot remember the history of the change - 
hossman probably does.


[jira] [Commented] (SOLR-2588) Make Velocity an optional dependency in SolrCore

2011-07-08 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062044#comment-13062044
 ] 

Erik Hatcher commented on SOLR-2588:


Sorry - I missed this when it first got posted, and David's comment bumped 
it... It was intentional to make Velocity a core component, the idea being 
that we'd use it for a built-in admin UI.  So far we're only using it for the 
/browse interface though.  

I get the argument that Velocity ideally shouldn't be required to embed Solr 
though.  I'm OK with the Velocity writer creation either being in the try/catch 
as Ryan posted, or pulling it out of the default writers and having it be 
explicitly configured in solrconfig.xml for our example app.
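
For illustration only, here is a minimal sketch of the try/catch idea: register a writer only when its class can actually be loaded. OptionalWriters and registerIfPresent are hypothetical names made up for this example, not Solr's API and not the code Ryan posted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: shows how an optional dependency
// can be probed without breaking class initialization.
class OptionalWriters {
    // Maps a writer name to the class that implements it.
    static final Map<String, Class<?>> WRITERS = new HashMap<String, Class<?>>();

    // Tries to load the class by name. Returns false instead of failing when
    // the class, or something it links against, is missing from the classpath.
    static boolean registerIfPresent(String name, String className) {
        try {
            WRITERS.put(name, Class.forName(className));
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so a missing transitive
            // dependency is covered here as well.
            return false;
        }
    }
}
```

With something like this in a static initializer, a missing optional jar would only leave the corresponding writer unregistered rather than preventing the whole class from loading.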

 Make Velocity an optional dependency in SolrCore
 

 Key: SOLR-2588
 URL: https://issues.apache.org/jira/browse/SOLR-2588
 Project: Solr
  Issue Type: Wish
Affects Versions: 3.2
Reporter: Gunnar Wagenknecht
Priority: Minor
 Fix For: 3.4, 4.0


 In 1.4 it was fine to run Solr without Velocity on the classpath. However, 
 in 3.2 SolrCore won't load because of a hard reference to the Velocity 
 response writer in a static initializer.
 {noformat}
 ... ERROR org.apache.solr.core.CoreContainer - 
 java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
   at org.apache.solr.core.SolrCore.<clinit>(SolrCore.java:1447)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
 {noformat}


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062056#comment-13062056
 ] 

Robert Muir commented on SOLR-2399:
---

It's great to see all this progress here!

I had one suggestion, I felt this way about Version 1980 too... should we 
default the verbose checkbox for analysis to on? I could be in the minority 
here, am I the only one who clicks verbose every time when using this?



 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-110702.patch, SOLR-2399-admin-interface.patch, 
 SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
 SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-08 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062058#comment-13062058
 ] 

Erik Hatcher commented on SOLR-2399:


verbose default to on please, yes: +1 - I always check that myself, and teach 
it that way to others.


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-08 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062061#comment-13062061
 ] 

Uwe Schindler commented on SOLR-2399:
-

verbose on: +1


RE: FindBugs & PMD ?

2011-07-08 Thread Uwe Schindler

Just a stupid question:
Once you add those annotations, wouldn't the JAR file then require this
annotations.jar? Or are none of them retained at runtime?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Smiley, David W. [mailto:dsmi...@mitre.org]
 Sent: Friday, July 08, 2011 4:30 PM
 To: dev@lucene.apache.org
 Subject: Re: FindBugs & PMD ?
 
 Rob, there is an ASL 2.0 licensed implementation here:
 https://github.com/stephenc/findbugs-annotations
 
 ~ David
 
 On Jul 8, 2011, at 10:12 AM, Robert Muir wrote:
 
  On Fri, Jul 8, 2011 at 10:08 AM, Smiley, David W. dsmi...@mitre.org
 wrote:
  Developers,
  Any thoughts on using FindBugs & PMD to catch more bugs in
 Lucene/Solr?  Jenkins could be configured to run FindBugs & PMD analysis
 nightly.  It would have helped find this:
 
  (LUCENE-3294) Some code still compares string equality
  instead using equals
 
  I am aware there is a high degree of false-positives but there are ways
 of dealing with them, such as with @SuppressWarnings("PMD") and with
 //NOPMD   and for Findbugs, there is
 @edu.umd.cs.findbugs.annotations.SuppressWarnings()  and there's a
 fairly detailed configuration file for FindBugs to really control it and
 to make exceptions.  I'd also really like to see use of FindBugs concurrency
 annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.
 
  I think its a good idea for nightly, but I am strongly against linking
  to an LGPL library for these annotations.
  I would prefer PMD instead, because of the license.
 

[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked

2011-07-08 Thread Stefan Matheis (steffkes) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-2399:


Attachment: SOLR-2399-110702.patch

bq. verbose on: +1
I've updated the last Patch, now based on SVN-Rev {{1144392}} -- verbose 
activated by default

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, 
 SOLR-2399-110702.patch, SOLR-2399-110702.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch



[jira] [Resolved] (SOLR-2541) Plugininfo tries to load nodes of type long

2011-07-08 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2541.


   Resolution: Fixed
Fix Version/s: 4.0
   3.4
 Assignee: Hoss Man

Frank: your assumption was spot on, definitely a bug in the way PluginInfo was 
ignoring long.

thank you so much for the test!

Committed revision 1144415. - trunk
Committed revision 1144417. - 3x



 Plugininfo tries to load nodes of type long
 -

 Key: SOLR-2541
 URL: https://issues.apache.org/jira/browse/SOLR-2541
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: all
Reporter: Frank Wesemann
Assignee: Hoss Man
 Fix For: 3.4, 4.0

 Attachments: PlugininfoTest.java, Solr-2541.patch


 As of version 3.1 Plugininfo adds all nodes whose types are not 
 lst,str,int,bool,arr,float or double to the children list.
 The type long is missing in the NL_TAGS set.
 I assume this is a bug because DOMUtil recognizes this type, so I consider it a 
 valid tag in solrconfig.xml
 Maybe it's time for a dtd? Or one may define SolrConfig.nodetypes somewhere.
 I'll add a patch, that extends the NL_TAGS Set.
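
As a rough sketch of the behaviour described above (not the actual PluginInfo source), the set of named-list tags and the check against it might look like this. TagFilter and isChildNode are hypothetical names; only NL_TAGS and the tag names come from the issue text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the filtering logic described in the issue.
class TagFilter {
    // Tags that denote plain named-list values rather than child plugins.
    // "long" is the entry the issue reports as missing.
    static final Set<String> NL_TAGS = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "lst", "str", "int", "bool", "arr", "float", "double", "long")));

    // A node is treated as a child only if its tag is not a named-list tag.
    static boolean isChildNode(String tagName) {
        return !NL_TAGS.contains(tagName);
    }
}
```

Without "long" in the set, a long element in solrconfig.xml would fall through to the children list, which matches the symptom in the title.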


[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9438 - Failure

2011-07-08 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9438/

1 tests failed.
REGRESSION:  
org.apache.lucene.facet.search.CategoryListIteratorTest.testPayloadIntDecodingIterator

Error Message:
expected category not found: 3

Stack Trace:
junit.framework.AssertionFailedError: expected category not found: 3
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
at 
org.apache.lucene.facet.search.CategoryListIteratorTest.testPayloadIntDecodingIterator(CategoryListIteratorTest.java:125)




Build Log (for compile errors):
[...truncated 8788 lines...]




[jira] [Updated] (LUCENE-3295) BitVector never skips fully populated bytes when writing ClearedDgaps


 [ 
https://issues.apache.org/jira/browse/LUCENE-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3295:
---

Attachment: LUCENE-3295.patch

Egads, thanks Simon!  I found a few more crazy problems with BitVector
(patch attached, merged with the first patch), and added some asserts
and a few more test cases.


 BitVector never skips fully populated bytes when writing ClearedDgaps
 -

 Key: LUCENE-3295
 URL: https://issues.apache.org/jira/browse/LUCENE-3295
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/other
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3295.patch, LUCENE-3295.patch


 When writing cleared DGaps in BitVector we compare a byte against 0xFF (255), 
 yet the byte is promoted to an int (-1), so the comparison will never 
 succeed. We should mask the byte with 0xFF before comparing or compare 
 against -1
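
The pitfall is easy to reproduce in isolation. The following is a standalone sketch, not BitVector code; ByteCompare and its method names are invented for the example.

```java
// Standalone demonstration of comparing a signed Java byte against 0xFF.
class ByteCompare {
    // Always false: b is promoted to an int in the range [-128, 127],
    // so it can never equal the int constant 255.
    static boolean naiveIsFull(byte b) {
        return b == 0xFF;
    }

    // Masking keeps only the low 8 bits, so a fully set byte becomes 255.
    static boolean maskedIsFull(byte b) {
        return (b & 0xFF) == 0xFF;
    }

    // Equivalent alternative: a fully set byte is -1 as a signed value.
    static boolean signedIsFull(byte b) {
        return b == -1;
    }
}
```

Both of the last two variants correspond to the two fixes suggested in the description.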


[jira] [Commented] (LUCENE-3292) IOContext should be part of the SegmentReader cache key


[ 
https://issues.apache.org/jira/browse/LUCENE-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062119#comment-13062119
 ] 

Michael McCandless commented on LUCENE-3292:


Right now the ReaderPool.readerMap is a Map<SegmentInfo,SegmentReader>.

I think we just need to change that to 
Map<SegmentInfoAndIOContext,SegmentReader> instead, where 
SegmentInfoAndIOContext is a new struct holding SegmentInfo and 
IOContext.Context and implementing hashCode/equals by delegating to the 
SegmentInfo and IOContext.Context.
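
A minimal sketch of such a composite key, under loud assumptions: a String stands in for SegmentInfo and a small local enum stands in for IOContext.Context, so nothing here is real Lucene code. ReaderCacheSketch is a made-up wrapper class.

```java
import java.util.HashMap;
import java.util.Map;

class ReaderCacheSketch {
    // Stand-in for IOContext.Context; the real enum's values are not assumed.
    enum Context { READ, MERGE }

    // Composite key: two keys are equal only if both parts are equal.
    static final class SegmentInfoAndIOContext {
        final String segmentInfo;   // stand-in for SegmentInfo
        final Context context;

        SegmentInfoAndIOContext(String segmentInfo, Context context) {
            this.segmentInfo = segmentInfo;
            this.context = context;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SegmentInfoAndIOContext)) return false;
            SegmentInfoAndIOContext other = (SegmentInfoAndIOContext) o;
            // Delegate to the parts, as proposed in the comment.
            return segmentInfo.equals(other.segmentInfo) && context == other.context;
        }

        @Override
        public int hashCode() {
            return 31 * segmentInfo.hashCode() + context.hashCode();
        }
    }

    // The reader is represented by a String purely for illustration.
    static final Map<SegmentInfoAndIOContext, String> readerMap =
        new HashMap<SegmentInfoAndIOContext, String>();
}
```

The point of the design is that the same segment opened under two different contexts maps to two separate cache entries.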

 IOContext should be part of the SegmentReader cache key 
 

 Key: LUCENE-3292
 URL: https://issues.apache.org/jira/browse/LUCENE-3292
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


 Once IOContext (LUCENE-2793) is landed the IOContext should be part of the 
 key used to cache that reader in the pool


Re: FindBugs & PMD ?

The annotations defined by FindBugs are marked with CLASS retention, which 
means there shouldn't be a runtime dependency.

However the JCIP (Java Concurrency In Practice, a book) annotations, such as 
@ThreadSafe, are unfortunately marked with RUNTIME retention.  Information I've 
found leads me to believe that in Java 6, there is no runtime or compile time 
dependency for 3rd party libraries using Lucene/Solr if there are annotations 
there, but Java 5 has problems with it: 
https://issues.apache.org/jira/browse/HTTPCLIENT-866

Just now I messaged the maintainer of the ASL licensed cleanroom port of the 
findbugs annotations to see if he'll do the same for the JCIP ones.
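
To make the retention difference concrete, here is a small self-contained sketch. The two annotation types are defined locally for the demo and are not the FindBugs or JCIP ones; RetentionDemo and visibleAtRuntime are invented names.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class RetentionDemo {
    // Recorded in the class file but not exposed through reflection.
    @Retention(RetentionPolicy.CLASS)
    @interface ClassRetained {}

    // Recorded in the class file and visible through reflection at runtime.
    @Retention(RetentionPolicy.RUNTIME)
    @interface RuntimeRetained {}

    @ClassRetained
    @RuntimeRetained
    static class Annotated {}

    // Reports whether reflection can see the given annotation on Annotated.
    static boolean visibleAtRuntime(Class<? extends Annotation> type) {
        return Annotated.class.isAnnotationPresent(type);
    }
}
```

This only shows what reflection sees. Whether a missing annotations jar actually causes trouble on a given JVM is the separate question raised in the linked HTTPCLIENT issue.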

~ David

On Jul 8, 2011, at 1:16 PM, Uwe Schindler wrote:

 Just a stupid question:
 Once you add those annotations, wouldn't the JAR file not require then this
 annotations.jar? Or are all of them not available to runtime?
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 -Original Message-
 From: Smiley, David W. [mailto:dsmi...@mitre.org]
 Sent: Friday, July 08, 2011 4:30 PM
 To: dev@lucene.apache.org
 Subject: Re: FindBugs & PMD ?
 
 Rob, there is an ASL 2.0 licensed implementation here:
 https://github.com/stephenc/findbugs-annotations
 
 ~ David
 
 On Jul 8, 2011, at 10:12 AM, Robert Muir wrote:
 
 On Fri, Jul 8, 2011 at 10:08 AM, Smiley, David W. dsmi...@mitre.org
 wrote:
 Developers,
 Any thoughts on using FindBugs & PMD to catch more bugs in
 Lucene/Solr?  Jenkins could be configured to run FindBugs & PMD analysis
 nightly.  It would have helped find this:
 
 (LUCENE-3294) Some code still compares string equality
 instead using equals
 
 I am aware there is a high degree of false-positives but there are ways
 of dealing with them, such as with @SuppressWarnings("PMD") and with
 //NOPMD   and for Findbugs, there is
 @edu.umd.cs.findbugs.annotations.SuppressWarnings()  and there's a
 fairly detailed configuration file for FindBugs to really control it and
 to make exceptions.  I'd also really like to see use of FindBugs concurrency
 annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.
 
 I think its a good idea for nightly, but I am strongly against linking
 to an LGPL library for these annotations.
 I would prefer PMD instead, because of the license.
 

Re: FindBugs & PMD ?

For build integration, I'd like to do this on the maven side.  It's easier 
there, and it should not matter that it's not the official build since it only 
needs to be run by Jenkins, which is already running the maven build anyway.

~ David

On Jul 8, 2011, at 10:08 AM, Smiley, David W. wrote:

 Developers,
 Any thoughts on using FindBugs & PMD to catch more bugs in Lucene/Solr?  
 Jenkins could be configured to run FindBugs & PMD analysis nightly.  It would 
 have helped find this:
 
 (LUCENE-3294) Some code still compares string equality instead using 
 equals
 
 I am aware there is a high degree of false-positives but there are ways of 
 dealing with them, such as with @SuppressWarnings("PMD") and with //NOPMD   
 and for Findbugs, there is 
 @edu.umd.cs.findbugs.annotations.SuppressWarnings()  and there's a fairly 
 detailed configuration file for FindBugs to really control it and to make 
 exceptions.  I'd also really like to see use of FindBugs concurrency 
 annotations @GuardedBy, @Immutable, @NotThreadSafe, @ThreadSafe.
 
 ~ David Smiley

[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

[
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062151#comment-13062151
]

Michael McCandless commented on LUCENE-3282:

This looks great Shay!

What was the use case for subclassing to translate the filter into OBS? Is it
a custom filter cache? Makes me nervous because the app really should create &
reuse this OBS filter, usually...

On the Collector: we try to keep our Querys IR-state-free... so it makes me
nervous to stick a Collector right on the Query. Can we add a
CollectorProvider that the Query invokes when it makes the Weight/Scorer?

Instead of NoOpCollector can we just check for null?

BlockJoinQuery: Allow to add a custom child collector, and customize the
parent bitset extraction
-

Key: LUCENE-3282
URL: https://issues.apache.org/jira/browse/LUCENE-3282
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
Attachments: LUCENE-3282.patch

It would be nice to allow to add a custom child collector to the
BlockJoinQuery to be called on every matching doc (so we can do things with
it, like counts and such). Also, allow to extend BlockJoinQuery to have a
custom code that converts the filter bitset to an OpenBitSet.


[jira] [Commented] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings

2011-07-08 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062155#comment-13062155
 ] 

Erick Erickson commented on SOLR-2535:
--

OK, I've applied the patch to both 3x and trunk and it looks good. If nobody 
objects I'll commit this Monday.

 REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show 
 directory listings
 -

 Key: SOLR-2535
 URL: https://issues.apache.org/jira/browse/SOLR-2535
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
 Environment: java 1.6, jetty
Reporter: Peter Wolanin
Assignee: Erick Erickson
 Fix For: 3.4, 4.0

 Attachments: SOLR-2535.patch, 
 SOLR-2535_fix_admin_file_handler_for_directory_listings.patch


 In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted 
 listing of the conf directory, like:
 {noformat}
 <response>
 <lst name="responseHeader"><int name="status">0</int><int 
 name="QTime">1</int></lst>
 <lst name="files">
   <lst name="elevate.xml"><long name="size">1274</long><date 
 name="modified">2011-03-06T20:42:54Z</date></lst>
   ...
 </lst>
 </response>
 {noformat}
 I can list the xslt sub-dir using solr/admin/files?file=/xslt
 In Solr 3.1.0, both of these fail with a 500 error:
 {noformat}
 HTTP ERROR 500
 Problem accessing /solr/admin/file/. Reason:
 did not find a CONTENT object
 java.io.IOException: did not find a CONTENT object
 {noformat}
 Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 
 should still handle directory listings if no file name is given, or if the 
 file is a directory, so I am filing this as a bug.


[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-07-08 Thread Shawn Heisey (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062161#comment-13062161
]

Shawn Heisey commented on SOLR-1972:

I tried to add a percentile of 100, so I could see the slowest query, and it
didn't seem to do anything. I'll be changing the following line so it works:

if (percentile >= 0 && percentile < 100) {
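
For reference, a percentile lookup that accepts 100 could be sketched as follows. This is a generic nearest-rank calculation written for this note, not the code from the attached patches; the Percentiles class and its method are hypothetical.

```java
import java.util.Arrays;

class Percentiles {
    // Nearest-rank percentile over a set of request times (in ms).
    // Accepts 0 through 100 inclusive; 100 returns the slowest request.
    static long percentile(long[] times, int percentile) {
        if (times.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in [0, 100]");
        }
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        // Rank is 1-based; clamp to 1 so that percentile 0 returns the minimum.
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With an inclusive upper bound the 100th percentile simply falls on the last element of the sorted samples.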

Need additional query stats in admin interface - median, 95th and 99th
percentile
-

Key: SOLR-1972
URL: https://issues.apache.org/jira/browse/SOLR-1972
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
Attachments: SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch,
SOLR-1972.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch

I would like to see more detailed query statistics from the admin GUI. This
is what you can get now:
requests : 809
errors : 0
timeouts : 0
totalTime : 70053
avgTimePerRequest : 86.59209
avgRequestsPerSecond : 0.8148785
I'd like to see more data on the time per request - median, 95th percentile,
99th percentile, and any other statistical function that makes sense to
include. In my environment, the first bunch of queries after startup tend to
take several seconds each. I find that the average value tends to be useless
until it has several thousand queries under its belt and the caches are
thoroughly warmed. The statistical functions I have mentioned would quickly
eliminate the influence of those initial slow queries.
The system will have to store individual data about each query. I don't know
if this is something Solr does already. It would be nice to have a
configurable count of how many of the most recent data points are kept, to
control the amount of memory the feature uses. The default value could be
something like 1024 or 4096.


[jira] [Commented] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings


[ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062162#comment-13062162
 ] 

Simon Willnauer commented on SOLR-2535:
---

bq. OK, I've applied the patch to both 3x and trunk and it looks good. If 
nobody objects I'll commit this Monday.
don't wait too long, no need to wait until monday.




[jira] [Commented] (SOLR-2541) Plugininfo tries to load nodes of type long

2011-07-08 Thread Frank Wesemann (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062168#comment-13062168
 ] 

Frank Wesemann commented on SOLR-2541:
--

Thanks for taking this issue, Hoss.
Btw: Do you know the reason for this change? I regarded the old rule 
"load/instantiate everything that has a class attribute" as good practice.

 Plugininfo tries to load nodes of type long
 -

 Key: SOLR-2541
 URL: https://issues.apache.org/jira/browse/SOLR-2541
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: all
Reporter: Frank Wesemann
Assignee: Hoss Man
 Fix For: 3.4, 4.0

 Attachments: PlugininfoTest.java, Solr-2541.patch


 As of version 3.1 Plugininfo adds all nodes whose types are not 
 lst,str,int,bool,arr,float or double to the children list.
 The type long is missing in the NL_TAGS set.
 I assume this is a bug because DOMUtil recognizes this type, so I consider it a 
 valid tag in solrconfig.xml
 Maybe it's time for a dtd? Or one may define SolrConfig.nodetypes somewhere.
 I'll add a patch, that extends the NL_TAGS Set.
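
The effect described above can be sketched roughly as follows. This is only an illustration of the rule as the issue text states it, not the actual PluginInfo code: the class name, the method name and the two set names are hypothetical, and only the tag names come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the "not a named-list tag => child node" rule. */
final class NlTagsSketch {
  // Tag names listed in the issue; "long" is the one reported as missing.
  static final Set<String> NL_TAGS_3_1 = new HashSet<>(
      Arrays.asList("lst", "str", "int", "bool", "arr", "float", "double"));

  // The same set with "long" added, as the proposed patch describes.
  static final Set<String> NL_TAGS_PATCHED = new HashSet<>(
      Arrays.asList("lst", "str", "int", "bool", "arr", "float", "double", "long"));

  /** Returns the node names that would be added to the children list. */
  static List<String> childCandidates(List<String> nodeNames, Set<String> nlTags) {
    List<String> children = new ArrayList<>();
    for (String name : nodeNames) {
      if (!nlTags.contains(name)) {
        children.add(name); // anything that is not a known value tag is treated as a child
      }
    }
    return children;
  }
}
```

With the 3.1 set, a `long` element sitting next to a `str` element ends up among the children; with the extended set it is treated as a plain value like the other types.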


[jira] [Updated] (LUCENE-3280) Add new bit set impl for caching filters


 [ 
https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3280:
---

Attachment: LUCENE-3280.patch

New patch, renaming to FixedBitSet, adding test (adapted from TestOBS's), 
adding getBits, hashCode, equals.

I think it's ready to commit!

 Add new bit set impl for caching filters
 

 Key: LUCENE-3280
 URL: https://issues.apache.org/jira/browse/LUCENE-3280
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3280.patch, LUCENE-3280.patch


 I think OpenBitSet is trying to satisfy too many audiences, and it's
 confusing/error-prone as a result.  It has int/long variants of many
 methods.  Some methods require in-bound access, others don't; of those
 others, some methods auto-grow the bits, some don't.  OpenBitSet
 doesn't always know its numBits.
 I'd like to factor out a more focused bit set impl whose primary
 target usage is a cached Lucene Filter, ie a bit set indexed by docID
 (int, not long) whose size is known and fixed up front (backed by
 final long[]) and is always accessed in-bounds.
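
A minimal sketch of the kind of bit set described above, assuming nothing about the attached patch: the class name SketchFixedBitSet and all of its methods are hypothetical, and the explicit bounds check is a choice made for this illustration rather than something the issue prescribes.

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed size, int-indexed, backed by a final long[]. */
final class SketchFixedBitSet {
  private final long[] bits; // 64 bits per word
  private final int numBits; // known and fixed up front

  SketchFixedBitSet(int numBits) {
    if (numBits < 0) {
      throw new IllegalArgumentException("numBits must be >= 0");
    }
    this.numBits = numBits;
    this.bits = new long[(numBits + 63) >>> 6]; // round up to whole words
  }

  int length() {
    return numBits;
  }

  boolean get(int index) {
    checkIndex(index);
    // Java masks the shift distance of a long to its low 6 bits.
    return (bits[index >> 6] & (1L << index)) != 0;
  }

  void set(int index) {
    checkIndex(index);
    bits[index >> 6] |= 1L << index;
  }

  void clear(int index) {
    checkIndex(index);
    bits[index >> 6] &= ~(1L << index);
  }

  int cardinality() {
    int count = 0;
    for (long word : bits) {
      count += Long.bitCount(word);
    }
    return count;
  }

  // No auto-grow: out-of-bounds access is rejected instead of resizing.
  private void checkIndex(int index) {
    if (index < 0 || index >= numBits) {
      throw new IndexOutOfBoundsException("index=" + index + " numBits=" + numBits);
    }
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SketchFixedBitSet)) {
      return false;
    }
    SketchFixedBitSet other = (SketchFixedBitSet) o;
    return numBits == other.numBits && Arrays.equals(bits, other.bits);
  }

  @Override
  public int hashCode() {
    return 31 * numBits + Arrays.hashCode(bits);
  }
}
```

There is a single int-indexed variant of each method, and the size never changes after construction, which are the two properties the description contrasts with OpenBitSet.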


[jira] [Commented] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings


[ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062180#comment-13062180
 ] 

Yonik Seeley commented on SOLR-2535:


bq. don't wait too long, no need to wait until monday.

+1, commit it now!

Esp for a bug fix, unless one thinks there is something likely controversial 
about it, or one is unsure about something and is thus requesting feedback.



I really hate to even mention formatting code, but I don't want to gaffe too badly on my first commit.

2011-07-08 Thread Erick Erickson

This is one of those topics that generates far more passion than it
deserves, all I want to know is what the norms are. Personally, I tend
to reformat the entire file. But then I didn't cut my eye teeth on
code that a zillion other people work with and I fully appreciate that
the diffs get hard to read, you can't easily separate the code changes
from the format changes, so I'm not *proposing* reformatting the whole
file...

So do we take a page from Martin Fowler's Refactoring book and only
reformat the parts that we're working on?

What about reformatting the whole file and noting in the checkin notes
"reformat only, no code changes"? (assuming an egregiously
badly-formatted file that one is working on).

I guess the more I think about it the more sense only reformatting the
bits we're working on makes. I'd guess that someone working on a large
patch would...er...not appreciate merge conflicts because of
reformatting even though it's so easy.

Again, I'm just looking for norms here. I suspect this topic has
been...er...discussed upon occasion in loud tones with much table
pounding...

Thanks,
Erick

P.S. This is relative to my first checkin, SOLR-2535


[jira] [Commented] (SOLR-2452) rewrite solr build system

2011-07-08 Thread Steven Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062184#comment-13062184
 ] 

Steven Rowe commented on SOLR-2452:
---

Merged with trunk, committed in r1144510. (Forgot to include issue number in 
log comment.)

 rewrite solr build system
 -

 Key: SOLR-2452
 URL: https://issues.apache.org/jira/browse/SOLR-2452
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Steven Rowe
 Fix For: 3.4, 4.0

 Attachments: SOLR-2452-post-reshuffling.patch, 
 SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, 
 SOLR-2452.diffSource.py.patch.zip, SOLR-2452.dir.reshuffle.sh, 
 SOLR-2452.dir.reshuffle.sh


 As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
 think we should rewrite the solr build system.
 It's slow, cumbersome, and messy, and makes it hard for us to improve things.


Re: I really hate to even mention formatting code, but I don't want to gaffe too badly on my first commit.

2011-07-08 Thread Robert Muir

On Fri, Jul 8, 2011 at 5:06 PM, Erick Erickson erickerick...@gmail.com wrote:
 This is one of those topics that generates far more passion than it
 deserves, all I want to know is what the norms are. Personally, I tend
 to reformat the entire file. But then I didn't cut my eye teeth on
 code that a zillion other people work with and I fully appreciate that
 the diffs get hard to read, you can't easily separate the code changes
 from the format changes, so I'm not *proposing* reformatting the whole
 file...


My opinion: just get rid of the shitty bad in whatever way is
convenient for you.

-- 
lucidimagination.com


Re: I really hate to even mention formatting code, but I don't want to gaffe too badly on my first commit.

2011-07-08 Thread Michael McCandless

On Fri, Jul 8, 2011 at 5:11 PM, Robert Muir rcm...@gmail.com wrote:
 On Fri, Jul 8, 2011 at 5:06 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 This is one of those topics that generates far more passion than it
 deserves, all I want to know is what the norms are. Personally, I tend
 to reformat the entire file. But then I didn't cut my eye teeth on
 code that a zillion other people work with and I fully appreciate that
 the diffs get hard to read, you can't easily separate the code changes
 from the format changes, so I'm not *proposing* reformatting the whole
 file...


 My opinion: just get rid of the shitty bad in whatever way is
 convenient for you.

+1

Mike McCandless

http://blog.mikemccandless.com


[jira] [Resolved] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq


 [ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3290.
-

   Resolution: Fixed
Fix Version/s: 3.4

The FieldInvertState.numUniqueTerms portion is backported to 3.x (no collection 
level stats are in 3.x in general, seems tricky)

 add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
 -

 Key: LUCENE-3290
 URL: https://issues.apache.org/jira/browse/LUCENE-3290
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3290.patch, LUCENE-3290.patch


 For scoring systems like lnu.ltc 
 (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
 supply 3 stats:
 * average tf within d
 * # of unique terms within d
 * average number of unique terms across field
 If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
 into your norms/docvalues (once we cut over),
 the average tf within d being length / numUniqueTerms.
 to compute the average across the field, we can just write the sum of all 
 terms' docfreqs into the terms dictionary header,
 and you can then divide this by maxdoc to get the average.
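
The arithmetic above can be checked on a toy corpus. The sketch below is not Lucene code: the class and method names are hypothetical, and documents are modelled as plain lists of terms. It relies on the fact that each document adds one to the docfreq of every distinct term it contains, so the sum of all docfreqs equals the total count of unique terms over all documents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the three statistics on an in-memory toy corpus. */
final class UniqueTermStatsSketch {

  /** Average tf within a document: length / numUniqueTerms. */
  static double averageTf(int length, int numUniqueTerms) {
    return (double) length / numUniqueTerms;
  }

  /** Sum of docfreq over all terms of the field. */
  static long sumDocFreq(List<List<String>> docs) {
    Map<String, Integer> docFreq = new HashMap<>();
    for (List<String> doc : docs) {
      // Count each term once per document, however often it occurs there.
      for (String term : new HashSet<>(doc)) {
        docFreq.merge(term, 1, Integer::sum);
      }
    }
    long sum = 0;
    for (int df : docFreq.values()) {
      sum += df;
    }
    return sum;
  }

  /** Average number of unique terms per document: sumDocFreq / maxDoc. */
  static double averageUniqueTerms(long sumDocFreq, int maxDoc) {
    return (double) sumDocFreq / maxDoc;
  }
}
```

For three documents "a b a", "b c" and "a", the docfreqs are a=2, b=2, c=1, which sum to 5; the per-document unique term counts are 2, 2 and 1, which also sum to 5, so the average is 5/3. The first document has length 3 and 2 unique terms, giving an average tf of 1.5.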


heads up: reindex trunk indexes

2011-07-08 Thread Robert Muir

i just committed https://issues.apache.org/jira/browse/LUCENE-3290,
you need to re-index.

-- 
lucidimagination.com


[jira] [Commented] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq


[ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062202#comment-13062202
 ] 

Yonik Seeley commented on LUCENE-3290:
--

Is there currently a way to get the number of documents that have a value in 
the field?
Then one could compute the average length of a (sparse) field via 
sumTotalTermFreq(field)/docsWithField(field)
docsWithField(field) would be useful in other contexts that want to know how 
sparse a field is (automatically selecting faceting algorithms, etc).
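
As a rough illustration of why the denominator matters for a sparse field, here is a small sketch. docsWithField is only the name proposed in this comment, not an existing API as far as this thread shows, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: average length of a sparse field. */
final class SparseFieldStatsSketch {

  /** sumTotalTermFreq(field) / docsWithField(field), or 0 if no document has the field. */
  static double averageFieldLength(long sumTotalTermFreq, int docsWithField) {
    if (docsWithField <= 0) {
      return 0.0;
    }
    return (double) sumTotalTermFreq / docsWithField;
  }

  /** Fraction of documents that have a value in the field. */
  static double density(int docsWithField, int maxDoc) {
    if (maxDoc <= 0) {
      return 0.0;
    }
    return (double) docsWithField / maxDoc;
  }
}
```

If 100 out of 1000 documents have the field and those 100 documents hold 300 term occurrences in total, dividing by the documents that have the field gives an average length of 3.0, whereas dividing by all 1000 documents would give 0.3.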


Re: I really hate to even mention formatting code, but I don't want to gaffe too badly on my first commit.

2011-07-08 Thread Chris Hostetter


: What about reformatting the whole file and noting in the checkin notes
: reformat only, no code changes? (assuming an egregiously
: badly-formatted file that one is working on).

objection to these types of commits has historically been that it may 
make it harder to apply patches that other people submit or have 
submitted (ie: a patch in jira against trunk that hasn't been applied yet, 
or a patch someone makes against the 3.3 release that can no longer apply 
to the 3x branch because of the code reformatting, etc...)

i used to agree with that as part of the general philosophy of "if it ain't 
broke, don't fix it" ... but i think over time my definition of "broke" 
has changed ... code you can't read because it isn't indented consistently 
is (to me) broken code.  so fix away.

but yes: we should at least isolate such changes.  I'd much rather see 
these two commits:

  +2/-2 == fixed race condition in DocWriter
  +425/-410 == fixed consistent formatting in DocWriter

...than see this...

  +427/-412 == fixed race condition and formatting in DocWriter



-Hoss


[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-07-08 Thread Shawn Heisey (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shawn Heisey updated SOLR-1972:
---

Attachment: elyograg-1972-trunk.patch
elyograg-1972-3.2.patch

Of course, adding support for a 100th percentile was NOT as easy as simply
changing < to <=. New patches.

Need additional query stats in admin interface - median, 95th and 99th
percentile
-


[jira] [Commented] (LUCENE-3290) add FieldInvertState.numUniqueTerms, Terms.sumDocFreq