date:20110616

[jira] [Commented] (LUCENE-3209) Memory codec

2011-06-16 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050909#comment-13050909
 ] 

Simon Willnauer commented on LUCENE-3209:
-

This seems to be related to LUCENE-3069 right?

> Memory codec
> 
>
> Key: LUCENE-3209
> URL: https://issues.apache.org/jira/browse/LUCENE-3209
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3209.patch
>
>
> This codec stores all terms/postings in RAM.  It uses an
> FST.  This is useful on a primary key field to ensure
> lookups don't need to hit disk, to keep NRT reopen time fast even
> under IO contention.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

solr expungeDeletes default value?

2011-06-16 Thread Ryan McKinley

on /trunk expungeDeletes=false by default

Is that the most reasonable default?

What are the tradeoffs?

With expungeDeletes=true, how does that relate to optimize?

(sorry if this has already been covered)

thanks
ryan


[jira] [Updated] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Mike Sokolov (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated SOLR-2597:
---

Attachment: SOLR-2597.patch

Updated patch addresses (most of) Robert and Hoss' comments (thanks for the 
speedy review!):

Test now uses the random in the test framework

I added a test for the factory (actually all the tests now use the factory 
since it is now used to create the parser), but I haven't plumbed this all the 
way through to a schema declaration. 

Moved to org.apache.solr.analysis: I don't know if this is the right place for 
this, but at least it should resolve any jar and java 1.6 dependency problems - 
I think? - at least I can compile and run the tests from both eclipse and ant 
command line although I'm not sure what that proves exactly.

The parser is now created in the factory rather than being maintained as a 
static in the reader class.

> XmlCharFilter
> -
>
> Key: SOLR-2597
> URL: https://issues.apache.org/jira/browse/SOLR-2597
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Mike Sokolov
> Attachments: SOLR-2597.patch, SOLR-2597.patch
>
>
> This CharFilter processes incoming XML using the Woodstox parser, stripping 
> all non-text content and remembering offsets, just like HTMLCharFilter, but 
> respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
> also provides the ability to exclude (and include) the content of certain 
> named elements.
> In order to compute character offsets properly when mixed line termination 
> styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
> &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
> currently in solr/lib.  The earlier versions of the parser could not report 
> these entity events, so we couldn't tell the difference between "&lt;" and 
> "<" and the offsets could be wrong.  The upgraded version is in the patch.
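
To illustrate why entity events matter for offsets, here is a minimal sketch in plain Java. It is not the patch's Woodstox-based code: the class EntityOffsetSketch, its decode method, and the three-entry entity table are hypothetical, chosen only for illustration. The point is that one decoded character can consume several input characters, so every later offset shifts unless the filter records where each output character started.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityOffsetSketch {
    // Hypothetical entity table: just enough for the example.
    private static final String[][] ENTITIES = {
        {"&lt;", "<"}, {"&quot;", "\""}, {"&amp;", "&"}
    };

    /** Decoded text plus, for each output char, its start offset in the input. */
    static final class Decoded {
        final String text;
        final int[] inputOffsets;

        Decoded(String text, int[] inputOffsets) {
            this.text = text;
            this.inputOffsets = inputOffsets;
        }
    }

    static Decoded decode(String input) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String[] hit = null;
            for (String[] entity : ENTITIES) {
                if (input.startsWith(entity[0], i)) {
                    hit = entity;
                    break;
                }
            }
            offsets.add(i);                  // this output char begins here in the input
            if (hit != null) {
                out.append(hit[1]);          // one output char ...
                i += hit[0].length();        // ... consumed several input chars
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        int[] map = new int[offsets.size()];
        for (int k = 0; k < map.length; k++) {
            map[k] = offsets.get(k);
        }
        return new Decoded(out.toString(), map);
    }

    public static void main(String[] args) {
        Decoded d = decode("a &lt; b");
        System.out.println(d.text + " / input offset of 'b': " + d.inputOffsets[4]);
    }
}
```

Without the recorded offsets, a highlighter working on the decoded text would place 'b' at position 4 rather than at its true input position.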


Re: Indexing slower in trunk

2011-06-16 Thread Erick Erickson

OK, I tried using separate machines for indexing and running Solr, connected
my work and personal Macs with an ethernet cable. Poor-man's network...

I also changed things a bit to parse the Wiki data and store 1.5M docs in memory
then try to send them to Solr in various-sized batches, thus removing
all the work associated with reading/parsing the XML from the timings.

And the results are...ambiguous. So I re-read some of the blog posts by
Simon and Mark, and think that where I'm missing out is the phrase
"... computers with highly concurrent hardware ...". I don't have that, and
what I'm seeing is that DWPT doesn't seem to make much difference in
this situation. Of course my situation is probably totally irrelevant, since
I've got to believe that people indexing *serious* data will have, shall we
say, more impressive hardware than I have.

Or perhaps I should say that whatever I do, I can get trunk and 3x to
perform pretty equivalently. Do note that what I'm really looking for is
the time until I can search the last document sent to Solr, so included
in here is a "commit" step. If I take that out, I'm seeing very substantial
gains in trunk. So presumably with a run that lasted longer than just
a couple of minutes I'd see impressive speedups.

I suspect that I just don't have enough hardware to consistently
encounter the situations where DWPT really shines.

It's also possible that I'm doing something stupid, but until some kind
person sets me up with sufficient hardware I'm afraid I'll have to drop
it 

Best
Erick

On Thu, Jun 16, 2011 at 12:05 PM, Erick Erickson
 wrote:
> OK, after more tests I'm pretty sure that my personal machine
> that I'm testing on is just resource-constrained, leading to the
> results I mentioned before. After all, I'm running my Solr
> instance, the indexing program, etc on a Macbook
> with 1 CPU and 2 cores. The indexing program is parsing the
> XML.
>
> On a proper setup, where the indexing machine was separate
> from the machine(s) feeding the index process I suspect this would
> be a different story. H, I may try that sometime too
>
> Best
> Erick
>
> On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler  wrote:
>> For simply removing deletes, there is also IW.expungeDeletes(), which is
>> less intensive! Not sure if Solr supports this, too, but as far as I know
>> there is an issue open.
>>
>> Also please note: As soon as one segment is selected for merging (the merge
>> policy may also do this depending on the number of deletes in a segment), it
>> will reclaim all deleted resources - that's what merging does. So expunging
>> deletes once per week is a good idea if your index consists of very old and
>> large segments that are rarely merged anymore and lots of documents are
>> deleted from them.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -Original Message-
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Tuesday, June 14, 2011 3:19 PM
>>> To: dev@lucene.apache.org
>>> Subject: Re: Indexing slower in trunk
>>>
>>> Optimization used to have a very noticeable impact on search speed prior
>>> to some index format changes from quite a while ago.
>>>
>>> At this point the effect is much less noticeable, but the thing optimize
>>> does do is reclaim resources from deleted documents. If you have lots of
>>> deletions, it's a good idea to periodically optimize, but in that case
>>> it's often done pretty infrequently (once a day/week/month) rather than
>>> as part of any ongoing indexing process.
>>>
>>> Best
>>> Erick
>>>
>>> 2011/6/14 Yury Kats :
>>> > On 6/14/2011 4:28 AM, Uwe Schindler wrote:
>>> >> indexing and optimizing was only a
>>> >> good idea pre Lucene-2.9, now it's mostly obsolete)
>>> >
>>> > Could you please elaborate on this? Is optimizing obsolete in general
>>> > or after indexing new documents? Is it obsolete after deletions? And
>>> > what is "mostly"?
>>> >
>>> > Thanks!
>>> >
>
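
The difference between expunging deletes and optimizing, as discussed in the quoted thread, can be pictured with a toy model. This is only a sketch under simplifying assumptions: MergeSketch, Segment, and both methods are hypothetical and do not reproduce Lucene's merge policy (which may, for instance, combine several segments or apply a deletion threshold). It shows only that both operations reclaim deleted documents, while optimize additionally collapses the index into one segment.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch {
    /** Toy segment: total docs written and how many of them are marked deleted. */
    static final class Segment {
        final int docs;
        final int deleted;

        Segment(int docs, int deleted) {
            this.docs = docs;
            this.deleted = deleted;
        }

        int live() {
            return docs - deleted;
        }
    }

    /** Rewrites only segments that carry deletions; the others are left untouched. */
    static List<Segment> expungeDeletes(List<Segment> index) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : index) {
            result.add(s.deleted > 0 ? new Segment(s.live(), 0) : s);
        }
        return result;
    }

    /** Merges everything into a single segment, dropping deleted docs on the way. */
    static List<Segment> optimize(List<Segment> index) {
        int live = 0;
        for (Segment s : index) {
            live += s.live();
        }
        List<Segment> result = new ArrayList<>();
        result.add(new Segment(live, 0));
        return result;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(1000, 400)); // old, large segment with many deletes
        index.add(new Segment(50, 0));     // fresh segment without deletes
        System.out.println("segments after expungeDeletes: " + expungeDeletes(index).size());
        System.out.println("segments after optimize: " + optimize(index).size());
    }
}
```

In this model the fresh segment is never rewritten by expungeDeletes, which is why it is the cheaper of the two operations.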


[JENKINS] Lucene-3.x - Build # 410 - Still Failing

2011-06-16 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-3.x/410/

1 tests failed.
FAILED:  org.apache.lucene.util.fst.TestFSTs.testBigSet

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:791)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:499)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:363)
at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:211)
at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:944)
at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:964)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1271)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




Build Log (for compile errors):
[...truncated 12481 lines...]




[jira] [Commented] (LUCENE-3209) Memory codec

2011-06-16 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050793#comment-13050793
 ] 

Michael McCandless commented on LUCENE-3209:


To clarify: this codec stores postings on disk, but then on read (for 
searching) it loads the full byte[] from disk into RAM.
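
The write-to-disk, load-fully-on-read pattern described in this comment can be sketched in plain Java. This is an illustration only, not the patch: InMemoryLookupSketch and its methods are hypothetical, a byte[] stands in for the on-disk file, and a HashMap stands in for the FST the codec actually uses. After load() returns, lookups touch no I/O at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class InMemoryLookupSketch {
    /** Serializes key -> docID pairs; the returned bytes stand in for the on-disk file. */
    static byte[] write(Map<String, Integer> keyToDoc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(keyToDoc.size());
            for (Map.Entry<String, Integer> entry : keyToDoc.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads the whole "file" once and builds the in-RAM structure used for lookups. */
    static Map<String, Integer> load(byte[] file) {
        Map<String, Integer> inRam = new HashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                int docId = in.readInt();
                inRam.put(key, docId);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return inRam;
    }

    public static void main(String[] args) {
        Map<String, Integer> primaryKeys = new HashMap<>();
        primaryKeys.put("id-7", 42);
        Map<String, Integer> reader = load(write(primaryKeys));
        System.out.println("id-7 -> doc " + reader.get("id-7"));
    }
}
```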


[jira] [Updated] (LUCENE-3209) Memory codec

2011-06-16 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3209:
---

Attachment: LUCENE-3209.patch

Patch; I think it's working and ready to commit.  All tests pass w/ it, and I 
went and disabled the same tests that avoid SimpleText codec.


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-16 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050790#comment-13050790
 ] 

Erick Erickson commented on SOLR-2399:
--

Stefan:

Minor nit. If you refresh the stats page, everything shows up collapsed. Is it 
possible to show the same view as it was when the refresh was hit? The use-case 
here is that I wanted to watch how many documents were in the index as a job 
was running, so I wanted the "search" node expanded just as it was when I hit 
refresh...

Really minor nit, though.

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
> SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
> SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
> SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
> SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin


[jira] [Created] (LUCENE-3209) Memory codec

2011-06-16 Thread Michael McCandless (JIRA)

Memory codec


 Key: LUCENE-3209
 URL: https://issues.apache.org/jira/browse/LUCENE-3209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


This codec stores all terms/postings in RAM.  It uses an
FST.  This is useful on a primary key field to ensure
lookups don't need to hit disk, to keep NRT reopen time fast even
under IO contention.



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-16 Thread noah (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050784#comment-13050784
 ] 

noah commented on SOLR-2399:


The admin interface doesn't load in Safari 5 due to the use of variables and 
properties named 'class'.
Simple patch available here: https://gist.github.com/1030496



[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050764#comment-13050764
 ] 

Robert Muir commented on LUCENE-3208:
-

the backport looks good, and important/scary to also fix this 
IndexSearcher/Searcher bug.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-3x.patch, LUCENE-3208-3x.patch, 
> LUCENE-3208-LTC.patch, LUCENE-3208.patch, LUCENE-3208.patch, LUCENE-3208.patch
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only to 
> be called on top-level queries - and this is done by IndexSearcher. This 
> method is just a utility method that has nothing to do with the query itself 
> (it just combines the createWeight method and calls the normalization 
> afterwards). 
> The problem we have is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), normalization is done two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so the user sees this. In 
> IndexSearcher itself the method will be protected, to be called only by 
> itself or subclasses of IndexSearcher. Delegation for backwards compatibility 
> is no problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)
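
The double normalization described above can be reproduced with a toy model. This is a sketch in plain Java, not Lucene code: Weight, Query, NORM, leaf, correctWrapper and buggyWrapper are hypothetical stand-ins, and only the name createNormalizedWeight is borrowed from the issue text. A wrapper that asks for an already normalized inner weight ends up applying the norm twice once the top-level call normalizes again.

```java
public class DoubleNormalizationSketch {
    /** Toy weight: a single value that normalization scales. */
    static final class Weight {
        float value;

        Weight(float value) {
            this.value = value;
        }

        void normalize(float norm) {
            value *= norm;
        }
    }

    /** Toy query: only knows how to create its raw, unnormalized weight. */
    interface Query {
        Weight createWeight();
    }

    static final float NORM = 0.5f; // illustrative normalization factor

    /** Top-level entry point: create the weight, then normalize exactly once. */
    static Weight createNormalizedWeight(Query query) {
        Weight w = query.createWeight();
        w.normalize(NORM);
        return w;
    }

    static Query leaf(float raw) {
        return () -> new Weight(raw);
    }

    /** Correct wrapper: delegates to the inner query's createWeight(). */
    static Query correctWrapper(Query inner) {
        return inner::createWeight;
    }

    /** Buggy wrapper: asks for a normalized inner weight, so the norm is applied twice. */
    static Query buggyWrapper(Query inner) {
        return () -> createNormalizedWeight(inner);
    }

    public static void main(String[] args) {
        System.out.println("correct: " + createNormalizedWeight(correctWrapper(leaf(4f))).value);
        System.out.println("buggy:   " + createNormalizedWeight(buggyWrapper(leaf(4f))).value);
    }
}
```

Making the normalizing method reachable only from the searcher, as the issue proposes, removes the opportunity for a wrapping query to call it by mistake.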


[jira] [Resolved] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3207.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.3

Fixed through LUCENE-3208.

> CustomScoreQuery calls weight() where it should call createWeight()
> ---
>
> Key: LUCENE-3207
> URL: https://issues.apache.org/jira/browse/LUCENE-3207
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3207.patch
>
>
> Thanks to Uwe for helping me track down this bug after I pulled my hair out 
> for hours on LUCENE-3174.


[jira] [Resolved] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3208.
---

Resolution: Fixed

Committed 3.x revision: 1136702


[jira] [Assigned] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3207:
-

Assignee: Uwe Schindler


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208-3x.patch

Thanks to Robert for helping me to find the bug in TestSimilarity. It was 
caused by a copy-paste error when nuking Searcher. Searcher.java and also 
IndexSearcher.java each had a private similarity field and separate setters. 
Methods implemented by Searcher would therefore not see changes to Similarity 
made on the IndexSearcher.
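
A duplicated private field of this kind is easy to reproduce. The following is a minimal sketch, assuming nothing about the real classes beyond what the comment says: BaseSearcher and DerivedSearcher are hypothetical names, and a String stands in for the Similarity object. The subclass setter updates only its own copy, so a method implemented in the base class keeps using the old value.

```java
public class ShadowedFieldSketch {
    /** Hypothetical base class holding its own private copy of the setting. */
    static class BaseSearcher {
        private String similarity = "default";

        public void setSimilarity(String similarity) {
            this.similarity = similarity;
        }

        public String getSimilarity() {
            return similarity;
        }

        /** Implemented in the base class, so it reads the base class's field. */
        public String explain() {
            return "scored with " + similarity;
        }
    }

    /** Hypothetical subclass that duplicated the field and its accessors. */
    static class DerivedSearcher extends BaseSearcher {
        private String similarity = "default";

        @Override
        public void setSimilarity(String similarity) {
            this.similarity = similarity; // updates only the subclass copy
        }

        @Override
        public String getSimilarity() {
            return similarity;
        }
    }

    public static void main(String[] args) {
        DerivedSearcher searcher = new DerivedSearcher();
        searcher.setSimilarity("custom");
        System.out.println(searcher.getSimilarity());
        System.out.println(searcher.explain());
    }
}
```

Keeping a single field in one class, with the other class going through the accessor, avoids the two copies drifting apart.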


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208-3x.patch

Patch for 3.x branch. To apply, copy the trunk's AssertingIndexSearcher first 
to its target dir and then apply patch.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-3x.patch, LUCENE-3208-LTC.patch, 
> LUCENE-3208.patch, LUCENE-3208.patch, LUCENE-3208.patch
>
>

[jira] [Updated] (SOLR-2506) EOFException from SolrServer.queryAndStreamResponse() in /trunk

2011-06-16 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-2506:


Attachment: index.zip

I am unable to reproduce with a test case, but here is an index that has just 
two docs that will always reproduce.

I don't think it has anything to do with SOLR-1566

> EOFException from SolrServer.queryAndStreamResponse() in /trunk
> ---
>
> Key: SOLR-2506
> URL: https://issues.apache.org/jira/browse/SOLR-2506
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Ryan McKinley
>Priority: Minor
> Attachments: index.zip
>
>
> Ran into this on trunk... don't have time to dig into it now, but will post 
> it here so it is not lost.
> I suspect this is caused by something in SOLR-1566,  need to add some better 
> tests to flush it out
> {code}
> org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: java.io.EOFException
>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
>   at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>   at org.apache.solr.client.solrj.SolrServer.queryAndStreamResponse(SolrServer.java:143)
> ...
> Caused by: java.lang.RuntimeException: java.io.EOFException
>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:211)
>   ... 51 more
> Caused by: java.io.EOFException
>   at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:160)
>   at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:158)
>   at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:401)
>   at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:172)
>   at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:110)
>   at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:174)
>   at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:102)
>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:208)
>   ... 51 more
> {code}


[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I have made the changes suggested by Simon and have added Context to the test 
cases, though I've used DEFAULT in most of them. 

Also, do we need the test TestBufferedIndexInput? I have added an 
IOContext.DEFAULT and fixed it, though. 

> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.
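
To make the idea in the description concrete, here is a minimal sketch of a context object that carries a read buffer size plus flags such as DIRECT and SEQUENTIAL. It is not the IOContext from the attached patch: the class name IoContextSketch, the factory methods, and the buffer sizes are all assumptions chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: not the IOContext class from the attached patch.
final class IoContextSketch {
    // Flags named in the issue description; any further flags are omitted.
    enum Flag { DIRECT, SEQUENTIAL }

    final int readBufferSize;
    final Set<Flag> flags;

    IoContextSketch(int readBufferSize, Set<Flag> flags) {
        this.readBufferSize = readBufferSize;
        this.flags = flags;
    }

    // Buffer sizes below are illustrative assumptions, not Lucene's values.
    static IoContextSketch forSearch() {
        return new IoContextSketch(1024, EnumSet.noneOf(Flag.class));
    }

    static IoContextSketch forMerge() {
        return new IoContextSketch(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
    }

    boolean has(Flag flag) {
        return flags.contains(flag);
    }
}
```

A directory implementation could then look at has(Flag.DIRECT) when opening a file, which is why a reader opened with the merge context should not be reused for searching.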

[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050728#comment-13050728
 ] 

Jan Høydahl commented on SOLR-219:
--

I like your idea @Robert. It's explicit and backwards compat, and would allow 
us to shoot our issues as well as our feet :)

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Gunnar Wagenknecht (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050714#comment-13050714
 ] 

Gunnar Wagenknecht commented on SOLR-219:
-

{quote}
But in no case should we try to magically apply the analysis chain... too 
ambiguous what would happen.
{quote}

Agreed. I just need a way in the schema when configuring fields to say which 
analyzers should run for wildcard and/or prefix queries.

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050708#comment-13050708
 ] 

Robert Muir commented on SOLR-219:
--

a lot of analysis things like stemming are not prepared to deal with wildcard 
characters in the term, and returning multiple tokens (because a tokenizer 
splits on a * or whatever) makes no sense either

in my opinion, a good solution here is to allow you to specify in your schema: 
this is the analysis chain for these multitermqueries, so it would be a 
different chain rather than "query" or "index" (similar to SOLR-2477 where I 
propose allowing you to specify one for "phrase"). The QP would use this chain 
for things like wildcards, and throw an exception if the analyzer returns more 
than one token from a wildcard term.

This way you can use KeywordTokenizer + lowercase/fold characters or whatever, 
but in general doing things like WDF or synonyms makes no sense here.  If you 
want to do things like stemming, that's fine, you can shoot yourself in the foot 
this way and we won't stop you.

But in no case should we try to magically apply the analysis chain... too 
ambiguous what would happen.
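
The proposal above can be sketched roughly as follows. This is a toy model, not Solr or Lucene code: an "analysis chain" is represented by a plain function from text to tokens, and the names MultiTermChainSketch, KEYWORD_LOWERCASE and WHITESPACE_SPLIT are assumptions made up for the example. The sketch rejects any result that is not exactly one token, which also covers the empty case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch: a stand-in "analysis chain" is any function from text to tokens.
final class MultiTermChainSketch {
    // Keyword-style chain: keeps the whole term as one token and lowercases it.
    static final Function<String, List<String>> KEYWORD_LOWERCASE =
        term -> Arrays.asList(term.toLowerCase(Locale.ROOT));

    // Whitespace-splitting chain, used here only to trigger the error case.
    static final Function<String, List<String>> WHITESPACE_SPLIT =
        term -> Arrays.asList(term.trim().split("\\s+"));

    // Runs the chain over a wildcard term and insists on exactly one output token.
    static String normalize(String wildcardTerm, Function<String, List<String>> chain) {
        List<String> tokens = chain.apply(wildcardTerm);
        if (tokens.size() != 1) {
            throw new IllegalArgumentException(
                "multi-term chain returned " + tokens.size() + " tokens for: " + wildcardTerm);
        }
        return tokens.get(0);
    }
}
```

With the keyword-style chain the wildcard character survives untouched, while a chain that splits the term fails loudly instead of silently producing a different query.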


> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050705#comment-13050705
 ] 

Jan Høydahl commented on SOLR-219:
--

Agree with Gunnar that the problem is wider than lowercasing. How hard would it 
be to let each filter choose whether to work on prefix terms or not, and run 
them through analysis?

A use case is for the Nordic characters æøåäö. A Norwegian name "Øyvind" would 
typically be normalized and indexed as "oeyvind", and when a Swede searches for 
"Öyvin*", he'd get a match if at least the mappingCharFilter and LowercaseFilter 
were allowed to run and turn the query into "oeyvin*".
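
The use case above can be illustrated with a small sketch. It is not the Solr filter chain: the class NordicPrefixSketch and its two-entry mapping table are assumptions taken from the example, and for brevity it lowercases before mapping so the table only needs lowercase entries. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the mapping table is an assumption based on the example above.
final class NordicPrefixSketch {
    private static final Map<Character, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put('\u00F8', "oe"); // lowercase o with stroke
        MAPPING.put('\u00F6', "oe"); // lowercase o with diaeresis
    }

    // Lowercases the term, then maps each listed character; '*' passes through untouched.
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            String mapped = MAPPING.get(c);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Both the indexed name and the Swedish prefix query end up with the same leading characters, which is what lets the prefix match.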

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Issue Comment Edited] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050698#comment-13050698
 ] 

Mike Sokolov edited comment on SOLR-2490 at 6/16/11 8:12 PM:
-

I would recommend using entities for this: &#13;&#10; for CRLF, just &#10; 
for LF?

If this is processed by an XML parser, that'll already work for free anyway.

  was (Author: sokolov):
I would recommend using entities for this: 
 for CRLF, just 
 
for LF?

If this is processed by an XML parser, that'll already work for free anyway.
  
> PropertiesRequestHandler; encode line.separator
> ---
>
> Key: SOLR-2490
> URL: https://issues.apache.org/jira/browse/SOLR-2490
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Trivial
>
> Currently, the XML looks like this:
> {code}
> /tmp
> 
> 
> Sun Microsystems Inc.
> {code}
> would be good to have this instead:
> {code}
> /tmp
> \n
> Sun Microsystems Inc.
> {code}
> afterwards we will be able to display the used line separator

[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050698#comment-13050698
 ] 

Mike Sokolov commented on SOLR-2490:


I would recommend using entities for this: &#13;&#10; for CRLF, just &#10; 
for LF?

If this is processed by an XML parser, that'll already work for free anyway.
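
As a rough illustration of the entity suggestion, the sketch below writes CR and LF as numeric character references. It assumes the decimal form (&#13; and &#10;); the class LineSeparatorEntitySketch is hypothetical and is not part of PropertiesRequestHandler or any Solr response writer.

```java
// Hypothetical helper, not part of PropertiesRequestHandler or any Solr response writer.
final class LineSeparatorEntitySketch {
    // Replaces CR and LF with decimal numeric character references; other characters are copied.
    static String encode(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\r') {
                out.append("&#13;");
            } else if (c == '\n') {
                out.append("&#10;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

An XML parser resolves such references back to the original characters, so a client reading the parsed value sees the real separator while the raw XML source shows which one it was.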

> PropertiesRequestHandler; encode line.separator
> ---
>
> Key: SOLR-2490
> URL: https://issues.apache.org/jira/browse/SOLR-2490
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Trivial
>
> Currently, the XML looks like this:
> {code}
> /tmp
> 
> 
> Sun Microsystems Inc.
> {code}
> would be good to have this instead:
> {code}
> /tmp
> \n
> Sun Microsystems Inc.
> {code}
> afterwards we will be able to display the used line separator

[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

2011-06-16 Thread Martijn van Groningen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050677#comment-13050677
 ] 

Martijn van Groningen commented on SOLR-2564:
-

I also did some performance tests with the following query on random data in 
the example schema:
{code}http://localhost:8983/solr/select?q=*:*&sort=_docid_ 
desc&group=true&group.cacheMB=0&group.field=single1000_i{code}
The field single1000_i had 1000 distinct values and the index has in total 
10 documents.

I ran this query on the following Solr setups:
* Last night's nightly build.
* Solr build with this patch as it is.
* Solr build with this patch and the necessary changes in 
AbstractFirstPassGroupingCollector so that pollLast was used in all cases.
During my tests I noticed that the difference between the first and the second 
setups was negligibly small, but the last Solr setup was on average 32% 
faster than the two other setups. So moving to Java 6's pollLast() method 
definitely has a positive impact on performance!

I also think that this patch is ready to be committed and that pollLast 
should be added when Lucene or the grouping module moves to Java 6. (I prefer the 
first option.) I'll commit it in the coming day or so.
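
For readers unfamiliar with pollLast, here is an illustration of the two styles being compared. It is a sketch only and does not reproduce AbstractFirstPassGroupingCollector: the class PollLastSketch is hypothetical, and Integer stands in for a collected group. TreeSet.last(), TreeSet.remove() and TreeSet.pollLast() are standard JDK methods, the last one available since Java 6.

```java
import java.util.TreeSet;

// Hypothetical sketch; Integer stands in for a collected group, ordered best-first.
final class PollLastSketch {
    // Java 5 style: look up the last (worst) element, then remove it in a second call.
    static Integer evictWorstJava5(TreeSet<Integer> groups) {
        Integer worst = groups.last();
        groups.remove(worst);
        return worst;
    }

    // Java 6 style: NavigableSet.pollLast() retrieves and removes in one call.
    static Integer evictWorstJava6(TreeSet<Integer> groups) {
        return groups.pollLast();
    }
}
```

The two differ on an empty set: last() throws NoSuchElementException, while pollLast() returns null. The single call also avoids a second tree lookup, which may be part of the speedup reported above.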

> Integrating grouping module into Solr 4.0
> -
>
> Key: SOLR-2564
> URL: https://issues.apache.org/jira/browse/SOLR-2564
> Project: Solr
>  Issue Type: Improvement
>Reporter: Martijn van Groningen
>Assignee: Martijn van Groningen
>Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
> SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
> SOLR-2564.patch
>
>
> Since work on the grouping module is going well, I think it is time to wire this 
> up in Solr.
> Besides the current grouping features Solr provides, Solr will then also 
> support second pass caching and total count based on groups.

RE: svn commit: r1135956 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ solr/ solr/contrib/dataimporthandler/ solr/co

2011-06-16 Thread Uwe Schindler

Shalin,

I had to comment out your test because the finally block does not compile with 
Java 5 (Solr 3.1). Jenkins is down at the moment, so this was not caught earlier.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: sha...@apache.org [mailto:sha...@apache.org]
> Sent: Wednesday, June 15, 2011 10:36 AM
> To: comm...@lucene.apache.org
> Subject: svn commit: r1135956 - in /lucene/dev/branches/branch_3x: ./
> lucene/ lucene/backwards/ lucene/backwards/src/test-framework/
> lucene/backwards/src/test/ solr/ solr/contrib/dataimporthandler/
> solr/contrib/dataimporthandler/src/main/java/org/apache/solr/ha...
> 
> Author: shalin
> Date: Wed Jun 15 08:36:06 2011
> New Revision: 1135956
> 
> URL: http://svn.apache.org/viewvc?rev=1135956&view=rev
> Log:
> SOLR-2551 -- Check dataimport.properties for write access (if delta-import is
> supported in DIH configuration) before starting an import
> 
> Modified:
> lucene/dev/branches/branch_3x/   (props changed)
> lucene/dev/branches/branch_3x/lucene/   (props changed)
> lucene/dev/branches/branch_3x/lucene/backwards/   (props changed)
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props
> changed)
> lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/
> (props changed)
> lucene/dev/branches/branch_3x/solr/   (props changed)
> 
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
> txt
> 
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
> ava/org/apache/solr/handler/dataimport/DataImporter.java
> 
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
> ava/org/apache/solr/handler/dataimport/SolrWriter.java
> 
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/test/ja
> va/org/apache/solr/handler/dataimport/TestSqlEntityProcessorDelta.java
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
> txt
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/contrib
> /dataimporthandler/CHANGES.txt?rev=1135956&r1=1135955&r2=1135956&vi
> ew=diff
> ==
> 
> ---
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
> txt (original)
> +++
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES
> +++ .txt Wed Jun 15 08:36:06 2011
> @@ -11,7 +11,8 @@ $Id$
> 
>  ==  3.3.0-dev ==
> 
> -(No Changes)
> +* SOLR-2551: Check dataimport.properties for write access (if
> +delta-import is supported
> +  in DIH configuration) before starting an import (C S, shalin)
> 
>  ==  3.2.0 ==
> 
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
> ava/org/apache/solr/handler/dataimport/DataImporter.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/contrib
> /dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/Dat
> aImporter.java?rev=1135956&r1=1135955&r2=1135956&view=diff
> ==
> 
> ---
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
> ava/org/apache/solr/handler/dataimport/DataImporter.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/mai
> +++ n/java/org/apache/solr/handler/dataimport/DataImporter.java Wed
> Jun
> +++ 15 08:36:06 2011
> @@ -39,6 +39,7 @@ import org.apache.commons.io.IOUtils;
> 
>  import javax.xml.parsers.DocumentBuilder;
>  import javax.xml.parsers.DocumentBuilderFactory;
> +import java.io.File;
>  import java.io.StringReader;
>  import java.text.SimpleDateFormat;
>  import java.util.*;
> @@ -85,6 +86,8 @@ public class DataImporter {
> 
>private final Map coreScopeSession;
> 
> +  private boolean isDeltaImportSupported = false;
> +
>/**
> * Only for testing purposes
> */
> @@ -113,7 +116,9 @@ public class DataImporter {
>initEntity(e, fields, false);
>verifyWithSchema(fields);
>identifyPk(e);
> -}
> +  if (e.allAttributes.containsKey(SqlEntityProcessor.DELTA_QUERY))
> +isDeltaImportSupported = true;
> +}
>}
> 
>private void verifyWithSchema(Map fields) { @@
> -350,6 +355,7 @@ public class DataImporter {
> 
>  try {
>docBuilder = new DocBuilder(this, writer, requestParams);
> +  checkWritablePersistFile(writer);
>docBuilder.execute();
>if (!requestParams.debug)
>  cumulativeStatistics.add(docBuilder.importStatistics);
> @@ -364,6 +370,15 @@ public class DataImporter {
> 
>}
> 
> +  private void checkWritablePersistFile(SolrWriter writer) {
> +File persistFile = writer.getPersistFile();
> +boolean isWritable = persistFile.exists() ? persistFile.canWrite() :
> persistFile.getParentFile().canWrite();
> +if (isDeltaI

[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050654#comment-13050654
 ] 

Uwe Schindler commented on LUCENE-3208:
---

Missed a change in the new grouping module: Trunk revision: 1136605

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
> LUCENE-3208.patch, LUCENE-3208.patch
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards reasons in Lucene 
> 2.9 when we changed Weight class. This method is only to be called on 
> top-level queries - and this is done by IndexSearcher. This method is just a 
> utility method, that has nothing to do with the query itself (it just 
> combines the createWeight method and calls the normalization afterwards). 
> The problem we have is that any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so user sees this. In 
> IndexSearcher itself the method will be protected to only be called by 
> itself or subclasses of IndexSearcher. Delegation for backwards is no 
> problem, as protected is accessible by classes in same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)
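
The double-normalization problem described above can be shown with a toy model. None of this is Lucene's Query or Weight API: ToyQuery, the wrapper factories, and the "divide by 2" normalization are assumptions invented for the example. Only the method names createWeight and createNormalizedWeight echo the description.

```java
// Toy model only: these classes are hypothetical and are not Lucene's Query/Weight API.
final class DoubleNormalizationSketch {
    interface ToyQuery {
        // Returns the raw, un-normalized weight.
        double createWeight();
    }

    // Normalization step meant to run once, on the top-level query only.
    static double createNormalizedWeight(ToyQuery query) {
        return query.createWeight() / 2.0; // dividing by 2 stands in for real normalization
    }

    static ToyQuery term(double rawWeight) {
        return () -> rawWeight;
    }

    // Correct wrapper: delegates to the inner query's raw weight.
    static ToyQuery goodWrapper(ToyQuery inner) {
        return () -> inner.createWeight();
    }

    // Buggy wrapper: asks for a normalized inner weight, so normalization runs twice overall.
    static ToyQuery buggyWrapper(ToyQuery inner) {
        return () -> createNormalizedWeight(inner);
    }
}
```

Keeping the normalizing method out of the query class and on the searcher makes the buggy call harder to write by accident, which is the motivation given in the description.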

[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Stefan Matheis (steffkes) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050653#comment-13050653
 ] 

Stefan Matheis (steffkes) commented on SOLR-2490:
-

bq. i don't think we should do this.
okay -- but, then there is no chance to show any difference between {{\n}}, 
{{\r}} or {{\r\n}} in the interface, because it's just a linebreak in the 
xml-source. 

bq. if PropertiesRequestHandler tried to specially encode any (or all) 
properties with whitespace in them ...
what about especially (and only) this one? That's a common problem for 
displaying linebreaks.

> PropertiesRequestHandler; encode line.separator
> ---
>
> Key: SOLR-2490
> URL: https://issues.apache.org/jira/browse/SOLR-2490
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Trivial
>
> Currently, the XML looks like this:
> {code}
> /tmp
> 
> 
> Sun Microsystems Inc.
> {code}
> would be good to have this instead:
> {code}
> /tmp
> \n
> Sun Microsystems Inc.
> {code}
> afterwards we will be able to display the used line separator

[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050648#comment-13050648
 ] 

Hoss Man commented on SOLR-2490:


hmmm...

i don't think we should do this.

the request handler as written is totally agnostic to what the properties are or 
how they are being written out -- it just builds up the response and lets the 
writer take care of it.  As noted the XmlResponseWriter does in fact output the 
newline.

if PropertiesRequestHandler tried to specially encode any (or all) properties 
with whitespace in them, that would screw up clients that were treating the 
whitespace as significant when parsing the xml -- and worse it would royally 
screw up clients using other response writers where whitespace is always 
significant.


> PropertiesRequestHandler; encode line.separator
> ---
>
> Key: SOLR-2490
> URL: https://issues.apache.org/jira/browse/SOLR-2490
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Trivial
>
> Currently, the XML looks like this:
> {code}
> /tmp
> 
> 
> Sun Microsystems Inc.
> {code}
> would be good to have this instead:
> {code}
> /tmp
> \n
> Sun Microsystems Inc.
> {code}
> afterwards we will be able to display the used line separator

[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050645#comment-13050645
 ] 

Mike Sokolov commented on SOLR-219:
---

Is there a reason this issue can't be dealt with by including an appropriate 
MappingCharFilter in the field definition?

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Commented] (SOLR-2477) add analyzer type="phrase"

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050646#comment-13050646
 ] 

Robert Muir commented on SOLR-2477:
---

{quote}
but we should seriously consider whether FieldQParser should also be using 
getPhraseAnalyzer. 
{quote}

Looking at how this is described, it seems to me it should use the phrase 
analyzer... we can document that it does this, and of course the change is 
backwards compatible (because if you don't define it, it's your query analyzer).
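
The backwards-compatibility argument amounts to a simple fallback, sketched below. This is not the code in the attached patch: FieldTypeSketch is a hypothetical class, and a Function<String, String> stands in for an analyzer. Only the name getPhraseAnalyzer comes from the discussion.

```java
import java.util.function.Function;

// Hypothetical sketch; a Function<String, String> stands in for an analyzer.
final class FieldTypeSketch {
    private final Function<String, String> queryAnalyzer;
    private final Function<String, String> phraseAnalyzer; // null when not configured

    FieldTypeSketch(Function<String, String> queryAnalyzer, Function<String, String> phraseAnalyzer) {
        this.queryAnalyzer = queryAnalyzer;
        this.phraseAnalyzer = phraseAnalyzer;
    }

    Function<String, String> getQueryAnalyzer() {
        return queryAnalyzer;
    }

    // Falls back to the query analyzer when no phrase analyzer is defined.
    Function<String, String> getPhraseAnalyzer() {
        return phraseAnalyzer != null ? phraseAnalyzer : queryAnalyzer;
    }
}
```

A schema that never mentions a phrase analyzer would therefore behave exactly as before.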

{quote}
we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.
{quote}

I agree... hopefully this isn't too bad.


> add analyzer type="phrase"
> --
>
> Key: SOLR-2477
> URL: https://issues.apache.org/jira/browse/SOLR-2477
> Project: Solr
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: SOLR-2477.patch
>
>
> This is just exposing LUCENE-2892, so you can easily configure things
> so that if users put things in double quotes they get a more precise search.

[jira] [Issue Comment Edited] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050645#comment-13050645
 ] 

Mike Sokolov edited comment on SOLR-219 at 6/16/11 6:52 PM:


Is there a reason this issue can't be dealt with by including an appropriate 
MappingCharFilter in the field definition?

  was (Author: sokolov):
Is there a reson this issue can't be dealt with by including an appropriate 
MappingCharFilter in the field definition?
  
> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.

[jira] [Commented] (SOLR-2477) add analyzer type="phrase"

2011-06-16 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050642#comment-13050642
 ] 

Hoss Man commented on SOLR-2477:


At first glance this looks great to me ... but we should seriously consider 
whether FieldQParser should also be using getPhraseAnalyzer.  I think given the 
semantics the answer is "yes" -- but either way it should be clearly documented.

we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.



> add analyzer type="phrase"
> --
>
> Key: SOLR-2477
> URL: https://issues.apache.org/jira/browse/SOLR-2477
> Project: Solr
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: SOLR-2477.patch
>
>
> This is just exposing LUCENE-2892, so you can easily configure things
> so that if users put things in double quotes they get a more precise search.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050638#comment-13050638
 ] 

Jason Rutherglen commented on SOLR-1431:


Noble, the Jira issue is HBASE-3529 where much of the code is offline on Git 
because of the different pieces involved.  That being said, I've linked the 
various Lucene and Solr Jira issues that are required to implement Solr in 
HBase, eg LUCENE-2919 and SOLR-2563.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050629#comment-13050629
 ] 

Mark Miller commented on SOLR-1431:
---

Got a 3 day weekend, so I won't likely look at Noble's patch more till next week 
- I def will still take a peek and weigh in, but this is simple enough that I 
don't mind if we just commit and iterate on trunk if necessary in further 
issues.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

RE: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread karl.wright

I created a ticket for it - SOLR-2602.  I'll attach a patch shortly.
Karl

-Original Message-
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Thursday, June 16, 2011 2:00 PM
To: dev@lucene.apache.org
Subject: Re: Related project link to ManifoldCF from Solr site?

a link in the related projects section seems possible, what do others think?

simon

On Thu, Jun 16, 2011 at 7:46 PM,   wrote:
> Hi folks,
>
>
>
> How hard would it be to get a link to ManifoldCF from the Solr site’s 
> related-link section?  I’m seeing a lot of people who know Solr but 
> have no idea ManifoldCF even exists, and I’d like to find some way to 
> correct that problem.
>
>
>
> Karl
>
>

[jira] [Updated] (SOLR-2602) It would be great if the Solr site referred to ManifoldCF as a related product

2011-06-16 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated SOLR-2602:
--

Attachment: SOLR-2602.patch

> It would be great if the Solr site referred to ManifoldCF as a related product
> --
>
> Key: SOLR-2602
> URL: https://issues.apache.org/jira/browse/SOLR-2602
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Karl Wright
>Priority: Minor
> Attachments: SOLR-2602.patch
>
>
> The "Related products" section of the Solr site has just Lucene and Nutch in 
> it.  It would be appropriate to have a link for ManifoldCF as well.  Url 
> would be: http://incubator.apache.org/connectors/


Re: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread Mark Miller


On Jun 16, 2011, at 2:00 PM, Simon Willnauer wrote:

> a link in the related projects section seems possible, what do others think?

Seems fine to me - for open source projects anyway? Apache Open Source projects?

Too lazy to think about it, but lazily, I'm willing to support linking to 
Apache Open Source projects that integrate with Solr without hesitation.

- Mark Miller
lucidimagination.com

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050618#comment-13050618
 ] 

Noble Paul commented on SOLR-1431:
--

Jason, open an issue and I will be glad to pitch in.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050613#comment-13050613
 ] 

Jason Rutherglen commented on SOLR-1431:


@Noble I agree, I don't think committing this patch should hold things up.  
That was just a little note.  

I've been looking at implementing Solr into HBase and am [somewhat] worried 
about the ZK libraries.  HBase + Solr can help with the massive-scale near-realtime 
systems you've described, e.g., HBase implements splitting, partitioning, a fast 
write-ahead log, etc.  Facebook has implemented the index directly in HBase, 
which probably offers degraded indexing and search performance.

bq. We badly need the cloud features now

Right, many users are going with Elastic Search for the reasons mentioned.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050606#comment-13050606
 ] 

Noble Paul commented on SOLR-1431:
--

Jason, yeah, it would be ideal. But we need to get things moving fast enough 
so that users can get the benefit ASAP. We badly need the cloud features now. 
I'm sure there are others too. We have clusters with 1000s of Solr hosts which 
are managed with ad hoc tools.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


Re: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread Simon Willnauer

a link in the related projects section seems possible, what do others think?

simon

On Thu, Jun 16, 2011 at 7:46 PM,   wrote:
> Hi folks,
>
>
>
> How hard would it be to get a link to ManifoldCF from the Solr site’s
> related-link section?  I’m seeing a lot of people who know Solr but have no
> idea ManifoldCF even exists, and I’d like to find some way to correct that
> problem.
>
>
>
> Karl
>
>


[jira] [Created] (SOLR-2602) It would be great if the Solr site referred to ManifoldCF as a related product

2011-06-16 Thread Karl Wright (JIRA)

It would be great if the Solr site referred to ManifoldCF as a related product
--

 Key: SOLR-2602
 URL: https://issues.apache.org/jira/browse/SOLR-2602
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Karl Wright
Priority: Minor


The "Related products" section of the Solr site has just Lucene and Nutch in 
it.  It would be appropriate to have a link for ManifoldCF as well.  Url would 
be: http://incubator.apache.org/connectors/



[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050603#comment-13050603
 ] 

Uwe Schindler commented on LUCENE-3208:
---

Committed trunk revision: 1136568

Now backporting and adding sophisticated backwards compatibility...

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
> LUCENE-3208.patch, LUCENE-3208.patch
>
>
> We have hit this issue several times, most recently in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only 
> meant to be called on top-level queries, and that is done by IndexSearcher. 
> It is just a utility method that has nothing to do with the query itself (it 
> just calls createWeight and then runs the normalization). 
> The problem is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), normalization is done twice, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method, with a big deprecation warning so users see this. In 
> IndexSearcher itself the method will be protected, so it can only be called 
> by IndexSearcher or its subclasses. Delegation for backwards compatibility 
> is no problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)
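
The double normalization described in the issue can be illustrated with a toy model. This is only a sketch and not Lucene code: ToyQuery, ToyWeight, the wrapper helpers and the factor 0.5 are hypothetical stand-ins, and only the name createNormalizedWeight is borrowed from the suggestion above. It shows why a wrapping query has to call createWeight() on the query it wraps rather than the top-level helper.

```java
import java.util.Collections;
import java.util.List;

// Toy model only: these types are hypothetical and are NOT Lucene's Query/Weight API.
final class DoubleNormalizationSketch {

    /** A weight that remembers how often it was normalized. */
    static final class ToyWeight {
        double value;
        int normalizeCalls;
        final List<ToyWeight> children;

        ToyWeight(double value, List<ToyWeight> children) {
            this.value = value;
            this.children = children;
        }

        /** Scales this weight and every wrapped weight by the same factor. */
        void normalize(double norm) {
            value *= norm;
            normalizeCalls++;
            for (ToyWeight child : children) {
                child.normalize(norm);
            }
        }
    }

    /** A query only knows how to build an un-normalized weight. */
    interface ToyQuery {
        ToyWeight createWeight();
    }

    /** Arbitrary normalization factor, chosen only for illustration. */
    static final double NORM = 0.5;

    /** Top-level entry point: create the weight tree, then normalize it exactly once. */
    static ToyWeight createNormalizedWeight(ToyQuery topLevel) {
        ToyWeight weight = topLevel.createWeight();
        weight.normalize(NORM);
        return weight;
    }

    static ToyQuery leaf(final double boost) {
        return () -> new ToyWeight(boost, Collections.<ToyWeight>emptyList());
    }

    /** Correct wrapper: asks the wrapped query for an un-normalized weight. */
    static ToyQuery correctWrapper(final ToyQuery inner) {
        return () -> new ToyWeight(1.0, Collections.singletonList(inner.createWeight()));
    }

    /** Buggy wrapper: uses the top-level helper, so the inner weight gets normalized twice. */
    static ToyQuery buggyWrapper(final ToyQuery inner) {
        return () -> new ToyWeight(1.0, Collections.singletonList(createNormalizedWeight(inner)));
    }

    public static void main(String[] args) {
        ToyWeight ok = createNormalizedWeight(correctWrapper(leaf(2.0)));
        ToyWeight bad = createNormalizedWeight(buggyWrapper(leaf(2.0)));
        System.out.println("correct inner normalize calls: " + ok.children.get(0).normalizeCalls);
        System.out.println("buggy inner normalize calls:   " + bad.children.get(0).normalizeCalls);
    }
}
```

In this toy the buggy wrapper leaves the inner weight scaled by the factor twice, which is the kind of silent score distortion the issue describes.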


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208.patch

New patch:
- Added AssertingIndexReader in test-framework; this one ensures that weights 
are only normalized once when this is done by IndexSearcher. This class can be 
extended to add further checks.
- As suggested by Yonik, changed the key used for fContext in the 
QueryValueSource to be the valuesource itself. The backup code cannot be 
removed; there is a bug somewhere (new issue).

All tests pass. I would like to commit this to trunk soon and then add 
sophisticated backwards compatibility for 3.x :-)

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
> LUCENE-3208.patch, LUCENE-3208.patch
>
>
> We have hit this issue several times, most recently in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only 
> meant to be called on top-level queries, and that is done by IndexSearcher. 
> It is just a utility method that has nothing to do with the query itself (it 
> just calls createWeight and then runs the normalization). 
> The problem is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), normalization is done twice, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method, with a big deprecation warning so users see this. In 
> IndexSearcher itself the method will be protected, so it can only be called 
> by IndexSearcher or its subclasses. Delegation for backwards compatibility 
> is no problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


RE: svn commit: r1136543 - /lucene/dev/branches/branch_3x/lucene/CHANGES.txt

2011-06-16 Thread Steven A Rowe

Thanks Robert, I was the botcher...  TODO: double check CHANGES.txt diff after 
a merge... - Steve

> -Original Message-
> From: rm...@apache.org [mailto:rm...@apache.org]
> Sent: Thursday, June 16, 2011 12:57 PM
> To: comm...@lucene.apache.org
> Subject: svn commit: r1136543 -
> /lucene/dev/branches/branch_3x/lucene/CHANGES.txt
>
> Author: rmuir
> Date: Thu Jun 16 16:56:39 2011
> New Revision: 1136543
>
> URL: http://svn.apache.org/viewvc?rev=1136543&view=rev
> Log:
> LUCENE-3204: fix botched CHANGES merge
>
> Modified:
> lucene/dev/branches/branch_3x/lucene/CHANGES.txt
>
> Modified: lucene/dev/branches/branch_3x/lucene/CHANGES.txt
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/CHANGES
> .txt?rev=1136543&r1=1136542&r2=1136543&view=diff
> =
> =
> --- lucene/dev/branches/branch_3x/lucene/CHANGES.txt (original)
> +++ lucene/dev/branches/branch_3x/lucene/CHANGES.txt Thu Jun 16 16:56:39
> 2011
> @@ -3,468 +3,6 @@ Lucene Change Log
>  For more information on past and future Lucene versions, please see:
>  http://s.apache.org/luceneversions
>
> -=== Trunk (not yet released) ===
> -
> -Changes in backwards compatibility policy
> -
> -* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:
> -
> -  - On upgrading to 3.1, if you do not fully reindex your documents,
> -Lucene will emulate the new flex API on top of the old index,
> -incurring some performance cost (up to ~10% slowdown, typically).
> -To prevent this slowdown, use oal.index.IndexUpgrader
> -to upgrade your indexes to latest file format (LUCENE-3082).
> -
> -Mixed flex/pre-flex indexes are perfectly fine -- the two
> -emulation layers (flex API on pre-flex index, and pre-flex API on
> -flex index) will remap the access as required.  So on upgrading to
> -3.1 you can start indexing new documents into an existing index.
> -To get optimal performance, use oal.index.IndexUpgrader
> -to upgrade your indexes to latest file format (LUCENE-3082).
> -
> -  - The postings APIs (TermEnum, TermDocsEnum, TermPositionsEnum)
> -have been removed in favor of the new flexible
> -indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
> -DocsEnum, DocsAndPositionsEnum). One big difference is that field
> -and terms are now enumerated separately: a TermsEnum provides a
> -BytesRef (wraps a byte[]) per term within a single field, not a
> -Term.  Another is that when asking for a Docs/AndPositionsEnum, you
> -now specify the skipDocs explicitly (typically this will be the
> -deleted docs, but in general you can provide any Bits).
> -
> -  - MultiReader ctor now throws IOException
> -
> -  - Directory.copy/Directory.copyTo now copies all files (not just
> -index files), since what is and isn't and index file is now
> -dependent on the codecs used.
> -
> -  - UnicodeUtil now uses BytesRef for UTF-8 output, and some method
> -signatures have changed to CharSequence.  These are internal APIs
> -and subject to change suddenly.
> -
> -  - Positional queries (PhraseQuery, *SpanQuery) will now throw an
> -exception if use them on a field that omits positions during
> -indexing (previously they silently returned no results).
> -
> -  - FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
> -changed -- each parse method now takes a BytesRef instead of a
> -String.  If you have an existing Parser, a simple way to fix it is
> -invoke BytesRef.utf8ToString, and pass that String to your
> -existing parser.  This will work, but performance would be better
> -if you could fix your parser to instead operate directly on the
> -byte[] in the BytesRef.
> -
> -  - The internal (experimental) API of NumericUtils changed completely
> -from String to BytesRef. Client code should never use this class,
> -so the change would normally not affect you. If you used some of
> -the methods to inspect terms or create TermQueries out of
> -prefix encoded terms, change to use BytesRef. Please note:
> -Do not use TermQueries to search for single numeric terms.
> -The recommended way is to create a corresponding NumericRangeQuery
> -with upper and lower bound equal and included. TermQueries do not
> -score correct, so the constant score mode of NRQ is the only
> -correct way to handle single value queries.
> -
> -  - NumericTokenStream now works directly on byte[] terms. If you
> -plug a TokenFilter on top of this stream, you will likely get
> -an IllegalArgumentException, because the NTS does not support
> -TermAttribute/CharTermAttribute. If you want to further filter
> -or attach Payloads to NTS, use the new NumericTermAttribute.
> -
> -  (Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael
> Busch)
> -
> -* LUCENE-2265: FuzzyQuery and Wild

Re: Indexing slower in trunk

2011-06-16 Thread Martijn v Groningen

@Uwe
Solr does support expunge deletes:
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
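
As a rough illustration of the point quoted below (merging a segment drops its deleted documents, and expunging deletes is lighter than a full optimize), here is a toy model. It is only a sketch under stated assumptions: Segment, the minDeletedRatio threshold and both helper methods are hypothetical and do not reflect how IndexWriter or any merge policy is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical types, NOT Lucene's IndexWriter or MergePolicy.
final class ExpungeDeletesSketch {

    /** A segment is reduced to two counters: live and deleted documents. */
    static final class Segment {
        final int liveDocs;
        final int deletedDocs;

        Segment(int liveDocs, int deletedDocs) {
            this.liveDocs = liveDocs;
            this.deletedDocs = deletedDocs;
        }

        double deletedRatio() {
            int total = liveDocs + deletedDocs;
            return total == 0 ? 0.0 : (double) deletedDocs / total;
        }
    }

    /** Rewrites only segments whose share of deleted docs reaches the threshold. */
    static List<Segment> expungeDeletes(List<Segment> segments, double minDeletedRatio) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.deletedDocs > 0 && s.deletedRatio() >= minDeletedRatio) {
                result.add(new Segment(s.liveDocs, 0)); // rewritten without its deletes
            } else {
                result.add(s); // left untouched, no I/O spent on it
            }
        }
        return result;
    }

    /** Full optimize: every segment is rewritten into a single one. */
    static List<Segment> optimize(List<Segment> segments) {
        int live = 0;
        for (Segment s : segments) {
            live += s.liveDocs;
        }
        List<Segment> result = new ArrayList<>();
        result.add(new Segment(live, 0));
        return result;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(900_000, 300_000)); // old, large, many deletes
        index.add(new Segment(50_000, 100));      // recent, few deletes
        System.out.println("after expunge:  " + expungeDeletes(index, 0.10).size() + " segments");
        System.out.println("after optimize: " + optimize(index).size() + " segments");
    }
}
```

In this toy, expunging rewrites only the old segment with many deletes, while optimize rewrites everything.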

On 16 June 2011 18:05, Erick Erickson  wrote:

> OK, after more tests I'm pretty sure that my personal machine
> that I'm testing on is just resource-constrained, leading to the
> results I mentioned before. After all, I'm running my Solr
> instance, the indexing program, etc on a Macbook
> with 1 CPU and 2 cores. The indexing program is parsing the
> XML.
>
> On a proper setup, where the indexing machine was separate
> from the machine(s) feeding the index process I suspect this would
> be a different story. Hmm, I may try that sometime too
>
> Best
> Erick
>
> On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler  wrote:
> > For simply removing deletes, there is also IW.expungeDeletes(), which is
> > less intensive! Not sure if Solr supports this too, but as far as I know
> > there is an issue open.
> >
> > Also please note: As soon as a segment is selected for merging (the merge
> > policy may also do this depending on the number of deletes in a segment), it
> > will reclaim all deleted resources - that's what merging does. So expunging
> > deletes once per week is a good idea if your index consists of very old and
> > large segments that are rarely merged anymore and lots of documents are
> > deleted from them.
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >> Sent: Tuesday, June 14, 2011 3:19 PM
> >> To: dev@lucene.apache.org
> >> Subject: Re: Indexing slower in trunk
> >>
> >> Optimization used to have a very noticeable impact on search speed prior
> > to
> >> some index format changes from quite a while ago.
> >>
> >> At this point the effect is much less noticeable, but the thing optimize
> > does
> >> do is reclaim resources from deleted documents. If you have lots of
> >> deletions, it's a good idea to periodically optimize, but in that case
> > it's often
> >> done pretty infrequently (once a
> >> day/week/month) rather than as part of any ongoing indexing process.
> >>
> >> Best
> >> Erick
> >>
> >> 2011/6/14 Yury Kats :
> >> > On 6/14/2011 4:28 AM, Uwe Schindler wrote:
> >> >> indexing and optimizing was only a
> >> >> good idea pre Lucene-2.9, now it's mostly obsolete)
> >> >
> >> > Could you please elaborate on this? Is optimizing obsolete in general
> >> > or after indexing new documents? Is it obsolete after deletions? And
> >> > what is "mostly"?
> >> >
> >> > Thanks!
> >> >
> >> > -
> >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> >> > additional commands, e-mail: dev-h...@lucene.apache.org
> >> >
> >> >
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> additional
> >> commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
Met vriendelijke groet,

Martijn van Groningen

[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050544#comment-13050544
 ] 

Mark Miller commented on SOLR-1431:
---

bq. I think it could conflict with other uses of Zookeeper when the library 
versions are different.

Yeah - always a problem with dependencies like this. It's hard to say what 
direction we go right now though - some have argued even non zookeeper mode 
should be single install zookeeper mode instead. It has its advantages and 
disadvantages I think. For me, I can really only take it an issue at a time, 
and while I hope to drive some more things around SolrCloud soon, it's 
obviously been a while. Others have some issues open, but more ideas are always 
good.

I certainly agree that CoreContainer could be modularized better - would help 
for testing too. I have an issue to do this for the persistence code (baby 
steps :) ), but feel free to open further issues.

I somewhat took the easy route in integrating zookeeper - there are certainly 
lots of improvements that could be made overall. And TODO's to finish - I think 
a couple guys have done a few from the wiki in various issues, and I know 
loggly has privately impl'd a couple from their talk at revolution (would be 
cool to see that come back, but I know they are busy guys). I love TODO's - 
minimal effort, but when you put one at a future pain point, your code doesn't 
look so stupid even when it's not perfect yet :)

We should discuss in other issues though.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050521#comment-13050521
 ] 

Jason Rutherglen commented on SOLR-1431:


Seems to be fine.  It'd be great to modularize Zookeeper references into a 
separate abstract interface (like what's done here), and not tie it to 
CoreContainer.  I think it could conflict with other uses of Zookeeper when the 
library versions are different.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


Re: Indexing slower in trunk

2011-06-16 Thread Erick Erickson

OK, after more tests I'm pretty sure that my personal machine
that I'm testing on is just resource-constrained, leading to the
results I mentioned before. After all, I'm running my Solr
instance, the indexing program, etc on a Macbook
with 1 CPU and 2 cores. The indexing program is parsing the
XML.

On a proper setup, where the indexing machine was separate
from the machine(s) feeding the index process I suspect this would
be a different story. Hmm, I may try that sometime too

Best
Erick

On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler  wrote:
> For simply removing deletes, there is also IW.expungeDeletes(), which is
> less intensive! Not sure if Solr supports this too, but as far as I know
> there is an issue open.
>
> Also please note: As soon as a segment is selected for merging (the merge
> policy may also do this depending on the number of deletes in a segment), it
> will reclaim all deleted resources - that's what merging does. So expunging
> deletes once per week is a good idea if your index consists of very old and
> large segments that are rarely merged anymore and lots of documents are
> deleted from them.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Tuesday, June 14, 2011 3:19 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Indexing slower in trunk
>>
>> Optimization used to have a very noticeable impact on search speed prior
> to
>> some index format changes from quite a while ago.
>>
>> At this point the effect is much less noticeable, but the thing optimize
> does
>> do is reclaim resources from deleted documents. If you have lots of
>> deletions, it's a good idea to periodically optimize, but in that case
> it's often
>> done pretty infrequently (once a
>> day/week/month) rather than as part of any ongoing indexing process.
>>
>> Best
>> Erick
>>
>> 2011/6/14 Yury Kats :
>> > On 6/14/2011 4:28 AM, Uwe Schindler wrote:
>> >> indexing and optimizing was only a
>> >> good idea pre Lucene-2.9, now it's mostly obsolete)
>> >
>> > Could you please elaborate on this? Is optimizing obsolete in general
>> > or after indexing new documents? Is it obsolete after deletions? And
>> > what is "mostly"?
>> >
>> > Thanks!
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>> > additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
>> commands, e-mail: dev-h...@lucene.apache.org
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[Lucene.Net] [jira] [Resolved] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual

2011-06-16 Thread Digy (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-426.


   Resolution: Fixed
Fix Version/s: Lucene.Net 2.9.4g
   Lucene.Net 2.9.4

Thanks Itamar.
Fixed in trunk & 2.9.4g branch.

DIGY

> Mark BaseFragmentsBuilder methods as virtual
> 
>
> Key: LUCENENET-426
> URL: https://issues.apache.org/jira/browse/LUCENENET-426
> Project: Lucene.Net
>  Issue Type: Improvement
>  Components: Lucene.Net Contrib
>Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, 
> Lucene.Net 2.9.4g
>Reporter: Itamar Syn-Hershko
>Priority: Minor
> Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
>
> Attachments: fvh.patch
>
>
> Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless 
> to have FragmentsBuilder deriving from a class named "Base", since most of 
> its functionality cannot be overridden. Attached is a patch for marking the 
> important methods virtual.


[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-06-16 Thread ian towey (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050508#comment-13050508
 ] 

ian towey commented on LUCENE-2091:
---

Not sure if I am using this BM25BooleanQuery correctly; I am getting a different 
number of hits than with QueryParser. Are there limitations on the query 
string that BM25BooleanQuery can deal with? E.g. for "gas OR ((oil AND car) NOT 
ship)", the results returned by BM25BooleanQuery seem to be all docs that 
don't contain the term "ship" (comparing BM25BooleanQuery with QueryParser).
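
For reference, the plain set-theoretic reading of the expression above can be sketched as follows. This is only an assumption about the intended semantics: the class and helper names are hypothetical, and how the classic QueryParser or BM25BooleanQuery actually interprets the string may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: evaluates the expression on plain term sets, no Lucene involved.
final class BooleanSemanticsSketch {

    /** gas OR ((oil AND car) NOT ship), read as ordinary boolean logic. */
    static boolean matches(Set<String> terms) {
        boolean oilAndCarNotShip =
                terms.contains("oil") && terms.contains("car") && !terms.contains("ship");
        return terms.contains("gas") || oilAndCarNotShip;
    }

    /** Builds a document as a set of terms. */
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        System.out.println(matches(doc("gas", "ship")));        // matches via "gas"
        System.out.println(matches(doc("oil", "car")));         // matches via the AND/NOT branch
        System.out.println(matches(doc("oil", "car", "ship"))); // excluded by NOT ship
        System.out.println(matches(doc("boat")));               // no "ship", but still no match
    }
}
```

Under this reading, a document that merely lacks "ship" does not match unless it also contains "gas" or both "oil" and "car".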


> Add BM25 Scoring to Lucene
> --
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Yuval Feinstein
>Priority: Minor
> Fix For: 4.0
>
> Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, 
> persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
> Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed 
> boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime 
> somewhat.
> I would like to contribute the code to Lucene under contrib. 
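
For readers unfamiliar with Okapi BM25, the per-term score can be sketched as below. This is the textbook formula, not the code from the attached patch: the class name is hypothetical, the idf variant with the added 1 (which keeps it non-negative) is an assumption, and k1 = 1.2 and b = 0.75 are just commonly used defaults.

```java
// Textbook BM25 sketch; NOT the implementation attached to this issue.
final class Bm25Sketch {

    /** Inverse document frequency; the "1 +" inside the log keeps it non-negative (assumed variant). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Score contribution of one term in one document.
     *
     * @param tf        term frequency in the document
     * @param docLen    length of the document (in terms)
     * @param avgDocLen average document length in the collection
     * @param k1        controls term-frequency saturation
     * @param b         controls how strongly document length is normalized (0 = off)
     */
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;
        // Same term frequency in a short and a long document: the short one scores higher.
        System.out.println(termScore(3, 50, 100, 1000, 10, k1, b));
        System.out.println(termScore(3, 400, 100, 1000, 10, k1, b));
    }
}
```

A document's score for a query is the sum of termScore over the query terms; unlike classic TF-IDF, the contribution of a term saturates as tf grows.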


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050506#comment-13050506
 ] 

Robert Muir commented on LUCENE-3208:
-

Great, Uwe, I'm satisfied.

Sorry for being so vocal about this, but I wasted many hours on this stupid bug 
(I know you did before, too), and the bug is not very friendly to people who 
debug with System.out.println: you don't catch it until you have pulled out enough of 
your hair to start using Thread.dumpStack...

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
> LUCENE-3208.patch
>
>
> We have hit this issue several times, most recently in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only 
> meant to be called on top-level queries, and that is done by IndexSearcher. 
> It is just a utility method that has nothing to do with the query itself (it 
> just calls createWeight and then runs the normalization). 
> The problem is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), normalization is done twice, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method, with a big deprecation warning so users see this. In 
> IndexSearcher itself the method will be protected, so it can only be called 
> by IndexSearcher or its subclasses. Delegation for backwards compatibility 
> is no problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208-LTC.patch

Here is my idea to enforce one-time normalizing and prevent side-effects during 
tests.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
> LUCENE-3208.patch
>
>


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050498#comment-13050498
 ] 

Robert Muir commented on LUCENE-3208:
-

bq. Wrapping every weight just makes things uglier, esp if you want to do 
something with the produced weight.

It doesn't have to be done this way necessarily. Personally I would be happy if 
TermWeight had a boolean 'normalized' (used only for asserting) and an assert.

It doesn't have to be totally perfect, but I refuse to debug this issue again.

If it's not done here, I will open a blocker issue!
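
The 'normalized' flag suggested above could look roughly like the following. This is a minimal sketch, not Lucene's TermWeight: the class name is hypothetical, and it throws an exception where the comment proposes an assert, so that the check also fires when assertions are disabled.

```java
/** Hypothetical weight that remembers whether it was normalized; not Lucene's TermWeight. */
final class GuardedWeight {
    private float value;
    private boolean normalized; // only consulted to detect misuse

    GuardedWeight(float boost) {
        this.value = boost;
    }

    float sumOfSquaredWeights() {
        return value * value;
    }

    void normalize(float norm) {
        // The comment above proposes an assert; an exception is used here
        // so the guard is active even without the -ea JVM flag.
        if (normalized) {
            throw new IllegalStateException("weight normalized more than once");
        }
        normalized = true;
        value *= norm;
    }

    float value() {
        return value;
    }

    public static void main(String[] args) {
        GuardedWeight w = new GuardedWeight(2f);
        w.normalize(0.5f);
        System.out.println(w.value());
        try {
            w.normalize(0.5f); // second pass is rejected
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A guard like this turns a hard-to-trace scoring bug into an immediate failure at the point of the second normalize call.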


> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050488#comment-13050488
 ] 

Uwe Schindler commented on LUCENE-3208:
---

A second idea would be that LuceneTestCase.newSearcher() returns such a 
Searcher, that wraps and disallows this. We have other helper classes like 
MockDirectory asserting similar things.

I am currently thinking about coding this; it's just a few lines.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>


[jira] [Updated] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-06-16 Thread Frank Wesemann (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Wesemann updated SOLR-2186:
-

Attachment: TestTikaEntityProcessor.patch

Adds a test for 

> DataImportHandler multi-threaded option throws exception
> 
>
> Key: SOLR-2186
> URL: https://issues.apache.org/jira/browse/SOLR-2186
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Grant Ingersoll
> Attachments: SOLR-2186.patch, SOLR-2186.patch, Solr-2186.patch, 
> TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch, TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and 
> the entire operation fails. This is true even if only 1 thread is configured 
> via *threads='1'*


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050481#comment-13050481
 ] 

Yonik Seeley commented on LUCENE-3208:
--

I think it's worth doing in tests... but not as part of the public API 
(weighting yourself is expert level only and most people don't do it).  
Wrapping every weight just makes things uglier, esp. if you want to do something 
with the produced weight.


> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>


[jira] [Resolved] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-16 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3191.


Resolution: Fixed

Thanks Uwe!

> Add TopDocs.merge to merge multiple TopDocs
> ---
>
> Key: LUCENE-3191
> URL: https://issues.apache.org/jira/browse/LUCENE-3191
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
> LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch
>
>
> It's not easy today to merge TopDocs, eg produced by multiple shards,
> supporting arbitrary Sort.


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050477#comment-13050477
 ] 

Robert Muir commented on LUCENE-3208:
-

I think it's worth the trouble, if we can do it.

We shouldn't rely upon the fact that getting sumOfSquaredWeights in some of 
these weights currently has *side effects* and sometimes is just wasted 
computation.

Other times it creates wrong scores.


> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050473#comment-13050473
 ] 

Yonik Seeley commented on LUCENE-3208:
--

+1, looks good!  
Doesn't seem like it's worth the trouble to catch Weight being normalized more 
than once.  I'd say just commit this as is.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>


[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050459#comment-13050459
 ] 

Michael McCandless commented on LUCENE-3206:


This new FST API looks *sweet*!  Nice work :)

So with this we no longer need static Util methods right?  (Since each
arc can .follow a sequence of inputs).

I like OutputAlgebra ... it better matches what this class actually does,
and if this means we can avoid creating a new Object for every arc transition
that would be great (this makes FST lookups costly now).

I don't know if this is possible, but, one thing I don't like about
the current API is that the BYTE1/2/4 is an enum and not parameterized
into the Builder/FST.  Ie, Builder/FST should really take the input
type as a type param too, since really an FST acts like a SortedMap.
But I fear this could get scary-hairy w/ the required generics...


> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.


[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

Sorry for messing up the patch again! 

> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice/versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.


[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050444#comment-13050444
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Hey Varun,

the patch looks close!

Here are some comments:

* the assert context == Context.MERGE should be assert context != Context.MERGE 
|| mergeInfo != null;
* can you move that assert into IOContext(Context, MergeInfo) and let the other 
related constructors call this(context, mergeInfo) instead of initializing all 
members themselves?
* I think there should be a public static final IOContext READONCE = new 
IOContext(true); then you can make the corresponding constructor private. I 
think the context should be Context.READ instead of default in that case, right?
* IOContext(MergePolicy.OneMerge) seems to be unnecessary. I think you should 
add a method to OneMerge to get a MergeInfo from it and only have a MergeInfo 
ctor. Then you can move MergeInfo into OneMerge too.
* PerFieldCodecWrapper still seems to be deleted
* In IndexReader, IOContext context=null; should be IOContext context= new 
IOContext(READ); no?
* no commit should be nocommit - we have a script on jenkins that checks this 
:)
* I still see some whitespace problems in SegmentWriteState.java 
* I think IOContext.DEFAULT_IOCONTEXT should be IOContext.DEFAULT since 
IOContext is implicit
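
The constructor layout suggested in these comments might be sketched as follows. This is not the patch's code: the class name IOContextSketch, the MergeInfo field, the enum values other than MERGE and READ, and the readOnce field are assumptions for illustration, and an exception stands in for the assert so the check is active without -ea.

```java
/** Sketch of the constructor layout from the review; names and fields are assumptions. */
final class IOContextSketch {
    enum Context { MERGE, READ, FLUSH, DEFAULT }

    /** Placeholder for merge details; a real class would carry more. */
    static final class MergeInfo {
        final int totalDocCount;

        MergeInfo(int totalDocCount) {
            this.totalDocCount = totalDocCount;
        }
    }

    // Shared constants, named DEFAULT rather than DEFAULT_IOCONTEXT as the review suggests.
    static final IOContextSketch DEFAULT = new IOContextSketch(Context.DEFAULT);
    static final IOContextSketch READONCE = new IOContextSketch(true);

    final Context context;
    final MergeInfo mergeInfo;
    final boolean readOnce;

    IOContextSketch(Context context) {
        this(context, null);
    }

    IOContextSketch(MergeInfo mergeInfo) {
        this(Context.MERGE, mergeInfo);
    }

    IOContextSketch(Context context, MergeInfo mergeInfo) {
        this(context, mergeInfo, false);
    }

    // Private: callers use the READONCE constant; the context is READ in that case.
    private IOContextSketch(boolean readOnce) {
        this(Context.READ, null, readOnce);
    }

    // Single place where the invariant is checked and all members are initialized.
    private IOContextSketch(Context context, MergeInfo mergeInfo, boolean readOnce) {
        if (context == Context.MERGE && mergeInfo == null) {
            throw new IllegalArgumentException("MERGE context requires a MergeInfo");
        }
        this.context = context;
        this.mergeInfo = mergeInfo;
        this.readOnce = readOnce;
    }
}
```

Funnelling every constructor through one canonical constructor keeps the MERGE invariant in a single place, which is the point of the second bullet.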


I am waiting for you to fix the tests before I review further. What is still 
missing, though, is the decision on what buffer size to use down in the 
directories etc.

good work so far!



> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch
>
>


[Lucene.Net] alternatives to FSDirectory for multi-threaded search performance

2011-06-16 Thread Robert Stewart

What are the recommended best practices for using FSDirectory vs. RamDirectory, 
etc. for use in multi-threaded search?

In a previous version of Lucene.Net (1.9) I used a modified FSDirectory 
implementation which used a pool of open FileStream objects for each segment 
file, and handed them out in round-robin fashion from the Clone() method.  That 
way multiple threads could read most segment files in parallel.  It definitely 
increased multithreaded search performance quite a bit.  My indexes are quite 
large (100+ million docs) and I can not load entire segments in to RAM using 
RamDirectory.
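
The round-robin hand-out described above can be sketched generically. The sketch is in Java rather than C#, the class name is hypothetical, and it models only the selection of a handle, not the actual Lucene.Net FSDirectory or FileStream code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical pool that hands out pre-opened handles in round-robin order. */
final class RoundRobinPool<T> {
    private final List<T> handles;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(List<T> handles) {
        if (handles.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.handles = new ArrayList<>(handles); // defensive copy
    }

    /** Returns the next handle; floorMod keeps the index valid after counter overflow. */
    T acquire() {
        int i = Math.floorMod(next.getAndIncrement(), handles.size());
        return handles.get(i);
    }
}
```

Note that with more threads than handles, two threads can still receive the same handle, so seek-and-read on a handle would still need to be synchronized; the pool only reduces contention.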

My question is what is the best practice here?  Is using a pool of descriptors 
as described above the best idea?

Thanks
Bob

[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-16 Thread Martin Grotzke (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050435#comment-13050435
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. Are you sure real floats are actually needed?
In our case score values are e.g. 15887 (one example just taken from one of 
the files). With this sample this test fails:
{noformat}
byte small = SmallFloat.floatToByte315(104626500f);
assertEquals(104626500f, SmallFloat.byte315ToFloat(small), 0f);
-> AssertionError: expected:<1.04626496E8> but was:<1.00663296E8>
{noformat}

This shows that we already have a case where this will produce wrong results, 
and even if we could work around it in our case, there might be someone else 
with the same issue.
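
The size of the error follows from the encoding keeping only 3 mantissa bits. The sketch below is not SmallFloat's code; it is a hypothetical helper that simply zeroes all but the top 3 mantissa bits of a float, which is enough to reproduce the two values in the assertion message above.

```java
/** Illustrates precision loss from keeping only 3 mantissa bits; this is not SmallFloat's code. */
final class MantissaTruncation {

    /** Keeps sign, exponent and the top 3 mantissa bits of a float, zeroing the other 20. */
    static float keepThreeMantissaBits(float f) {
        int bits = Float.floatToRawIntBits(f);
        // 1 sign bit + 8 exponent bits + 3 mantissa bits = the top 12 bits
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        float score = 104626500f;
        System.out.println(score);                        // nearest float to the literal
        System.out.println(keepThreeMantissaBits(score)); // truncated to 3 mantissa bits
    }
}
```

For this input the truncated value is 1.5 * 2^26 = 100663296, so roughly 4% of the score is lost.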


bq. it would also good to measure performance...
I'd not expect that the boxing makes a real difference here, especially in 
relation to the rest of the time spent during a search request.
A time-based performance comparison that has real value would take some time: 
it would have to be put in relation to the rest of a search request (how do you 
do this?), and finally it would require proper interpretation once everything is 
together. Right now I don't think it's worth the effort.


{quote}
bq. that uses a fixed size and an increasing number of puts
I'm not certain how realistic that is, remember behind the scenes 
compactbytearray uses blocks,
and if you touch every one (by putting every K docid or something) then you are 
just testing
the worst case.
{quote}
Do you want to change the test to something that's more realistic?


@Yonik: what do you say regarding the suggestion to use HashMap up to ~5.5% and 
above that using the float[]?
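
The hybrid suggestion could be sketched like this. It is not the patch's code: the class name ScoreStore is hypothetical, the 5.5% threshold is taken from the comment above, and returning 0 for documents without a score is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical score store that picks a representation by fill ratio; not Solr's FileFloatSource. */
final class ScoreStore {
    static final double SPARSE_THRESHOLD = 0.055; // the ~5.5% figure from the comment

    private final float[] dense;              // used when many docs have a score
    private final Map<Integer, Float> sparse; // used when few docs have a score

    ScoreStore(int maxDoc, Map<Integer, Float> scores) {
        if (scores.size() <= SPARSE_THRESHOLD * maxDoc) {
            sparse = new HashMap<>(scores);
            dense = null;
        } else {
            dense = new float[maxDoc];
            for (Map.Entry<Integer, Float> e : scores.entrySet()) {
                dense[e.getKey()] = e.getValue();
            }
            sparse = null;
        }
    }

    boolean isSparse() {
        return sparse != null;
    }

    /** Returns the score for a doc, or 0 when none was loaded (an assumed default). */
    float score(int doc) {
        if (dense != null) {
            return dense[doc];
        }
        Float s = sparse.get(doc);
        return s == null ? 0f : s;
    }
}
```

The map avoids allocating one float per document when the external file is small, while the array avoids boxing and hashing overhead once a larger share of documents has a score.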

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are much 
> less entries in the scoring file than there are number of docs in total the 
> big float array wastes much memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.


Re: Indexing slower in trunk

2011-06-16 Thread Simon Willnauer

On Tue, Jun 14, 2011 at 2:54 PM, Erick Erickson  wrote:
> Thanks, guys. Yes, I am running it all locally and disk seeks
> may well be the culprit. This thread is mainly to be sure that
> the behavior I'm seeing is expected, or at least explainable.
>
> Really, I don't need to pursue this further unless there's
> actually data I can gather to help speed things up. If this
> is just a consequence of DWPT and/or my particular
> setup then that's fine. I'm mostly trying to understand
> the characteristics of indexing/searching on the trunk.
> This started with me exploring memory
> requirements, and is really just something I noticed along
> the way and wanted to get some feedback on.
>
> So, absent the commit step, the times are reasonably
> comparable. Can I impose upon one of you to give a
> two-sentence summary of what DWPT buys us from a
> user perspective? If memory serves it should have
> background merging and other goodies.

Let me try to sum this up in a couple of sentences:

Previously IW wrote small in-memory segments on a per-thread basis and
merged them together on flush. Yet, this means you can't add / update
any documents while we are flushing, and flushing can take a long time.
With DWPT we write segments single-threaded, so each thread gets its
private DWPT. That allows us to flush segments to disk concurrently while
carrying on indexing at the same time. The performance gains are massive
here; our nightly benchmark sees a 269% speedup in indexing throughput.
The downside is that you have to do more merges eventually since you
write more smallish segments (no in-memory merge on flush). Plus, if
you read from the same disk you are writing to, you might see slower
indexing.

To read more about this, I wrote a blog post that explains a big portion of
it: http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

>
> Uwe:
> Yep, I was curious about optimize but understand that it's not required
> in recent code. That said, data is not searchable until a commit
> happens, so just for yucks I changed the optimize to a commit. Stats
> of that run below.
>
> Simon:
> OK, adjusted the ram buffer size to 512M, and it's a bit faster, but
> not all that much, see stats, and the delta could well be sampling
> errors, one run doth not a statistical certainty make. Up until the
> commit step, the admin stats page is showing no documents in
> the index so I think this setting completely avoids intermediate
> committing although that says nothing about the individual writers
> writing lots of segments to disk, that still happens.
>
> Added 188 docs. Took 1437 ms. cumulative interval (seconds) = 284
> Added 189 docs. Took 1285 ms. cumulative interval (seconds) = 285
> Added 190 docs. Took 1182 ms. cumulative interval (seconds) = 286
> Added 191 docs. Took 1675 ms. cumulative interval (seconds) = 288
> About to commit, total time so far: 290
> Total Time Taken-> 395 seconds    ***100 secs for the commit to finish.
> Total documents added-> 1917728
> Docs/sec-> 4855
>
> Thanks, all
> Erick
>
>
> On Tue, Jun 14, 2011 at 4:39 AM, Simon Willnauer
>  wrote:
>> Erick, it seems you need to adjust your settings for 4.0 a little.
>> When you index with DWPT it builds thread private segments which are
>> independently flushed to disk. Yet, when you set your ram buffer IW
>> will accumulate the ram used by all active DWPT and flush the largest
>> once you reach your ram buffer. With 128M you might end up with lots of
>> small segments which need to be merged in the background. Eventually
>> what will happen here is that your disk is so busy that you are not
>> able to flush fast enough and threads might stall.
>>
>> What you can try here is adjust your RAM buffer to be a little higher,
>> lets say 350MB or change the max number of thread states in
>> DocumentsWriterPerThreadPool ie.
>> ThreadAffinityDocumentsWriterThreadPool. The latter is unfortunately
>> not exposed yet in solr so maybe for testing you just want to change
>> the default value in DocumentsWriterPerThreadPool to 4. That will also
>> cause segments to be bigger eventually.
>>
>> simon
>>
>> On Tue, Jun 14, 2011 at 10:28 AM, Uwe Schindler  wrote:
>>> Hi Erick,
>>>
>>> Do you use harddisks or SSDs? I assume harddisks, which may explain what you
>>> see:
>>>
>>> - DWPT writes lots of segments in parallel, which also explains why you are
>>> seeing more files. Writing in parallel to several files, needs more head
>>> movements of your harddisk and this slows down. In the past, only one
>>> segment was written at the same time (sequential), so the harddisk is not so
>>> stressed.
>>> - Optimizing may be slower for the same reason: there are many more files to
>>> merge (but optimize cost should not be counted as a problem here as normally
>>> you won't need to optimize after initial indexing and optimizing was only a
>>> good idea pre Lucene-2.9, now it's mostly obsolete)
>>>
>>> Uwe
>>>
>>> -
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63,

[jira] [Commented] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050406#comment-13050406
 ] 

Robert Muir commented on SOLR-2597:
---

yes, the factories are long-lived and do expensive things up-front to configure 
themselves (parsing files etc)


> XmlCharFilter
> -
>
> Key: SOLR-2597
> URL: https://issues.apache.org/jira/browse/SOLR-2597
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Mike Sokolov
> Attachments: SOLR-2597.patch
>
>
> This CharFilter processes incoming XML using the Woodstox parser, stripping 
> all non-text content and remembering offsets, just like HTMLCharFilter, but 
> respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
> also provides the ability to exclude (and include) the content of certain 
> named elements.
> In order to compute character offsets properly when mixed line termination 
> styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
> &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
> currently in solr/lib.  The earlier versions of the parser could not report 
> these entity events, so we couldn't tell the difference between "<" and 
> "&lt;" and the offsets could be wrong.  The upgraded version is in the patch.
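
To make the offset problem in the description concrete, here is a minimal
sketch. It assumes a toy decoder that knows only the three entities named
above plus \r and \r\n line ends; the class and method names are made up for
illustration and are not taken from the patch or from Woodstox. The point is
only that every decoded character has to remember where it started in the raw
input, because an entity or a \r\n pair consumes more raw characters than it
produces.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy offset tracker; not the XmlCharFilter from the patch. */
final class EntityOffsetSketch {
    private final StringBuilder text = new StringBuilder();
    // rawOffsets.get(i) is the index in the raw input where decoded char i began.
    private final List<Integer> rawOffsets = new ArrayList<Integer>();

    EntityOffsetSketch(String raw) {
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            int consumed = 1;
            if (raw.startsWith("&lt;", i)) {
                c = '<';
                consumed = 4;
            } else if (raw.startsWith("&quot;", i)) {
                c = '"';
                consumed = 6;
            } else if (raw.startsWith("&amp;", i)) {
                c = '&';
                consumed = 5;
            } else if (c == '\r') {
                // XML line-end normalization: \r\n and a lone \r both become \n.
                c = '\n';
                if (i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                    consumed = 2;
                }
            }
            text.append(c);
            rawOffsets.add(i);
            i += consumed;
        }
    }

    String text() {
        return text.toString();
    }

    /** Maps an offset in the decoded text back to the raw input. */
    int correctOffset(int decodedOffset) {
        return rawOffsets.get(decodedOffset);
    }
}
```

If the parser cannot report that an entity was expanded, the decoder has no
way to know how many raw characters were consumed, and every later offset
drifts.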


[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Gunnar Wagenknecht (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050404#comment-13050404
 ] 

Gunnar Wagenknecht commented on SOLR-219:
-

Any progress on the issue? We are also hit by this issue. Ideally, it would be 
nice if I could configure the analyzers to run for wildcard queries. For 
example, I still want to do lowercasing and character normalization (umlauts) 
for wildcard queries.
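
A rough sketch of what such normalization could look like for a wildcard
term, assuming the field's index-time analysis lowercases and strips
diacritics. The class name is hypothetical and this is not Solr's query
parser code; it only shows that lowercasing and accent folding can be applied
while the * and ? wildcard characters pass through untouched. Characters
without a canonical decomposition (for example the German sharp s) are not
folded by this approach.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper; not part of Solr. */
final class WildcardTermNormalizer {
    static String normalize(String wildcardTerm) {
        // Lowercase with a fixed locale so results do not depend on the JVM default.
        String lower = wildcardTerm.toLowerCase(Locale.ROOT);
        // Decompose accented letters into base letter + combining mark ...
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // ... then drop the combining marks. '*' and '?' are unaffected.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Whether this is the right folding depends on the analyzer used at index time;
the query side has to match it, otherwise the wildcard term will miss.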

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208.patch

Here is the patch with IndexSearcher.createWeight renamed to 
createNormalizedWeight() and made public/expert, so Solr and custom search 
code can access it.

I am currently thinking about a way to check that each Weight is only 
normalized one time, possibly using setOnce(). It's not easy to do; maybe wrap 
the Weight returned by the IndexSearcher method in a WrappedWeight that 
throws UOE on normalize.
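
The wrapping idea could look roughly like the following. This is a toy model
under stated assumptions: ToyWeight is a hypothetical stand-in with only the
methods needed here, not Lucene's Weight class, and the normalization formula
is simplified to 1/sqrt(sum of squares).

```java
/** Toy model of "normalize exactly once"; not Lucene code. */
final class NormalizeOnceSketch {
    /** Hypothetical stand-in for a weight. */
    interface ToyWeight {
        double sumOfSquaredWeights();
        void normalize(double norm);
        double value();
    }

    static final class TermToyWeight implements ToyWeight {
        private double value;

        TermToyWeight(double boost) {
            this.value = boost;
        }

        public double sumOfSquaredWeights() {
            return value * value;
        }

        public void normalize(double norm) {
            value *= norm;
        }

        public double value() {
            return value;
        }
    }

    /** Handed out after normalization; any further normalize() call fails fast. */
    static final class WrappedWeight implements ToyWeight {
        private final ToyWeight delegate;

        WrappedWeight(ToyWeight delegate) {
            this.delegate = delegate;
        }

        public double sumOfSquaredWeights() {
            return delegate.sumOfSquaredWeights();
        }

        public void normalize(double norm) {
            throw new UnsupportedOperationException("weight is already normalized");
        }

        public double value() {
            return delegate.value();
        }
    }

    /** Normalizes the raw weight once and returns the guarded wrapper. */
    static ToyWeight createNormalizedWeight(ToyWeight raw) {
        raw.normalize(1.0 / Math.sqrt(raw.sumOfSquaredWeights()));
        return new WrappedWeight(raw);
    }
}
```

The cost is one extra delegation per call; the benefit is that a second
normalization shows up as an exception instead of as slightly wrong scores.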

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch, LUCENE-3208.patch
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards reasons in Lucene 
> 2.9 when we changed Weight class. This method is only to be called on 
> top-level queries - and this is done by IndexSearcher. This method is just a 
> utility method, that has nothing to do with the query itself (it just 
> combines the createWeight method and calls the normalization afterwards). 
> The problem we have is that any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so user sees this. In 
> IndexSearcher itself the method will be protected to only be called by 
> itself or subclasses of IndexSearcher. Delegation for backwards is no 
> problem, as protected is accessible by classes in same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)
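
The double normalization described above can be reproduced with a small toy
model. Everything here is hypothetical (ToyQuery, ToyWeight, the simplified
1/sqrt(sum of squares) norm); it is not Lucene's scoring code. It only shows
why a wrapping query that calls weight() on its inner query, instead of
createWeight(), ends up with different weights than intended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy reproduction of the double-normalization bug; not Lucene code. */
final class DoubleNormalizationSketch {
    interface ToyWeight {
        double sumOfSquares();
        void normalize(double norm);
    }

    abstract static class ToyQuery {
        abstract ToyWeight createWeight();

        // Stand-in for the old Query.weight(): create, then normalize.
        // Only meant to be called on the top-level query.
        final ToyWeight weight() {
            ToyWeight w = createWeight();
            w.normalize(1.0 / Math.sqrt(w.sumOfSquares()));
            return w;
        }
    }

    static final class Term extends ToyQuery implements ToyWeight {
        double value;

        Term(double boost) {
            this.value = boost;
        }

        ToyWeight createWeight() {
            return this;
        }

        public double sumOfSquares() {
            return value * value;
        }

        public void normalize(double norm) {
            value *= norm;
        }
    }

    static final class Wrapper extends ToyQuery {
        private final ToyQuery inner;
        private final boolean buggy;

        Wrapper(ToyQuery inner, boolean buggy) {
            this.inner = inner;
            this.buggy = buggy;
        }

        ToyWeight createWeight() {
            // The bug: weight() already normalizes the inner query on its own.
            return buggy ? inner.weight() : inner.createWeight();
        }
    }

    static final class Both extends ToyQuery {
        private final List<ToyQuery> clauses;

        Both(ToyQuery... clauses) {
            this.clauses = Arrays.asList(clauses);
        }

        ToyWeight createWeight() {
            final List<ToyWeight> weights = new ArrayList<ToyWeight>();
            for (ToyQuery q : clauses) {
                weights.add(q.createWeight());
            }
            return new ToyWeight() {
                public double sumOfSquares() {
                    double sum = 0;
                    for (ToyWeight w : weights) {
                        sum += w.sumOfSquares();
                    }
                    return sum;
                }

                public void normalize(double norm) {
                    for (ToyWeight w : weights) {
                        w.normalize(norm);
                    }
                }
            };
        }
    }
}
```

With boosts 3 and 4, the correct wrapper gives 0.6 and 0.8 after the single
top-level normalization. With the buggy wrapper the inner term is first
normalized alone to 1.0, which changes the top-level norm, and both clauses
end up with different values.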


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050382#comment-13050382
 ] 

Michael McCandless commented on LUCENE-3208:


+1

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards reasons in Lucene 
> 2.9 when we changed Weight class. This method is only to be called on 
> top-level queries - and this is done by IndexSearcher. This method is just a 
> utility method, that has nothing to do with the query itself (it just 
> combines the createWeight method and calls the normalization afterwards). 
> The problem we have is that any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so user sees this. In 
> IndexSearcher itself the method will be protected to only be called by 
> itself or subclasses of IndexSearcher. Delegation for backwards is no 
> problem, as protected is accessible by classes in same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Commented] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050370#comment-13050370
 ] 

Mike Sokolov commented on SOLR-2597:


OK - I can extend LuceneTestCase, use its random, and can certainly add a test for 
the Factory.

I'm not sure what the right package for this code is; working in Eclipse of 
course, all the jars get mushed into one giant classpath.  I guess I should 
build w/ant to see the dependency issues?  But it does sound as if it needs to 
move somewhere where solr/lib contents can be a dependency.

Apparently there is another jar you can get 
(http://woodstox.codehaus.org/stax-api-1.0.1.jar) to provide the 
javax.xml.stream package (StaX) for Java 5, but it doesn't sound as if it would 
be worth the trouble if this moves into solr land - is that right, can we rely 
on Java 6 there? 

I agree that having a static parser is distasteful, but it's a performance 
optimization.  It tends to be expensive to instantiate these parsers.  I'm not 
clear on what the object lifecycle for the XmlCharFilter is exactly - Robert 
are you saying the factory is long-lived, but the filter is not?
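
If the factory does turn out to be long-lived, one way to keep the expensive
setup out of the per-filter path without a static parser is sketched below.
This is only an illustration under assumptions: it uses the JDK's
javax.xml.stream API (available from Java 6) rather than Woodstox directly,
the class name is made up, and it is not the code in the patch. Whether an
XMLInputFactory may be shared between threads depends on the StAX
implementation in use, so that would need checking for the real one.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical long-lived factory; the costly setup happens once, in the constructor. */
final class XmlTextExtractorFactory {
    private final XMLInputFactory inputFactory;

    XmlTextExtractorFactory() {
        // Provider lookup and configuration are the expensive part.
        inputFactory = XMLInputFactory.newInstance();
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    }

    /** Per-document work: a fresh, short-lived stream reader for each input. */
    String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return text.toString();
    }
}
```

The stream readers are cheap compared with the factory, so the filter itself
can stay short-lived.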

> XmlCharFilter
> -
>
> Key: SOLR-2597
> URL: https://issues.apache.org/jira/browse/SOLR-2597
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Mike Sokolov
> Attachments: SOLR-2597.patch
>
>
> This CharFilter processes incoming XML using the Woodstox parser, stripping 
> all non-text content and remembering offsets, just like HTMLCharFilter, but 
> respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
> also provides the ability to exclude (and include) the content of certain 
> named elements.
> In order to compute character offsets properly when mixed line termination 
> styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
> &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
> currently in solr/lib.  The earlier versions of the parser could not report 
> these entity events, so we couldn't tell the difference between "<" and 
> "&lt;" and the offsets could be wrong.  The upgraded version is in the patch.


[jira] [Updated] (SOLR-1331) Support merging multiple cores

2011-06-16 Thread Shalin Shekhar Mangar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1331:


Attachment: SOLR-1331.patch

Adds a srcCore (multi-valued) parameter through which one or more source core 
names can be given.

We use the IW.addIndexes(IndexReader...) method to merge the source core's 
indexes to the target core's index. Even if an IW is open on the source 
indexes, using a reader protects against corruption.

Note - although the indexDir param also ends up calling the 
IW.addIndexes(IndexReader...) method, we cannot protect against open IWs on the 
directory so the caveat of calling commit before using mergeindexes with 
indexDir param still applies.

A commit needs to be called after a merge action to see the changes.
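
For illustration, the two requests could be built like this. The host, port,
admin path and core names are assumptions for the example; only the
mergeindexes action, the srcCore parameter and the need for a commit
afterwards come from the comment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds example request URLs; host and paths are placeholders. */
final class MergeCoresRequestSketch {
    /** srcCore is multi-valued, so it is repeated once per source core. */
    static String mergeUrl(String coreAdminUrl, String targetCore, String... srcCores) {
        StringBuilder url = new StringBuilder(coreAdminUrl)
            .append("?action=mergeindexes&core=").append(encode(targetCore));
        for (String src : srcCores) {
            url.append("&srcCore=").append(encode(src));
        }
        return url.toString();
    }

    /** The merged documents only become visible after a commit on the target core. */
    static String commitUrl(String solrBaseUrl, String targetCore) {
        return solrBaseUrl + "/" + encode(targetCore) + "/update?commit=true";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

Calling mergeUrl("http://localhost:8983/solr/admin/cores", "core0", "core1",
"core2") followed by commitUrl("http://localhost:8983/solr", "core0") gives
the merge request and the follow-up commit, in that order.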

> Support merging multiple cores
> --
>
> Key: SOLR-1331
> URL: https://issues.apache.org/jira/browse/SOLR-1331
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 3.3
>
> Attachments: SOLR-1331.patch
>
>
> There should be a provision to merge one core with another. It should be 
> possible to create a core, add documents to it and then just merge it into 
> the main core which is serving requests. This way, the user will not need to 
> know the filesystem as it is needed for SOLR-1051


[jira] [Created] (SOLR-2601) Create a MessagePackResponseWriter

2011-06-16 Thread Noble Paul (JIRA)

Create a MessagePackResponseWriter
--

 Key: SOLR-2601
 URL: https://issues.apache.org/jira/browse/SOLR-2601
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor


In the past I explored various standard communication formats for Solr. No 
other format was very suitable. MessagePack seems to be a suitable format.


[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208.patch

First patch with some minor hacks and 2 disabled tests.

The problems:
- 2 tests are in the Span package and call the IndexSearcher.createWeight method, 
which is protected. I commented them out for now.
- QueryValueSource is still to be investigated. I don't completely understand 
how it works; it looks fine for now (tests pass) but I have to follow how it 
works. Maybe Yonik can help. There was also a bug in fcontext preventing 
caching the weight. There is a reflection hack in it for now (nocommit).

This patch also fixes:
- Solr's BoostedQuery

I still don't like the name of the protected method in IndexSearcher, 
createWeight(), as it does more than that. It rewrites, creates the weight and 
then normalizes the query. I would like to rename it and maybe make it public, 
but expert only.

For 3.x I would do the rename, too, and use VirtualMethod to fix invocations by 
3rd party code if overridden.

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3208.patch
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards reasons in Lucene 
> 2.9 when we changed Weight class. This method is only to be called on 
> top-level queries - and this is done by IndexSearcher. This method is just a 
> utility method, that has nothing to do with the query itself (it just 
> combines the createWeight method and calls the normalization afterwards). 
> The problem we have is that any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so user sees this. In 
> IndexSearcher itself the method will be protected to only be called by 
> itself or subclasses of IndexSearcher. Delegation for backwards is no 
> problem, as protected is accessible by classes in same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Issue Comment Edited] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050347#comment-13050347
 ] 

Dawid Weiss edited comment on LUCENE-3206 at 6/16/11 11:01 AM:
---

This is my take on the revamped FST API. My changes are mostly aiming at having 
a bit clearer code (especially wrt. loops), but also detach the "algebra" of 
a transition's output from the actual output. This should allow us to create an 
output algebra that would work directly on mutable integers, for example (to 
save on autoboxing). I also just like the way it reads after the changes:
{code}
  FST fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
      .add("abc", 10)
      .add("abc", 5)
      .add("def", 0, 3), 2)
      .build();
{code}
or a loop over all arcs of a state:
{code}
  Arc arc = fst.getRoot();
  for (Arc tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
    int label = tmp.getLabel(); // transition label here.
    Integer output = tmp.getOutput();
  }
{code}

I definitely didn't consider all the use cases that FSTs are used for currently 
(in particular the "stop" bit indicating non-accepted input sequences that are 
also dead ends), but I think these could be integrated... I think :) 

Arcs now also store the pointer to the FST object, which may seem like an 
overhead, but I doubt it really will be (it's a single pointer and we buffer 
arcs whenever we can; a larger waste is having an object on each arc's output, 
even if it can be a primitive type or reused buffer).
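
To illustrate what detaching the output "algebra" from the stored output
could mean, here is a small sketch. The interface and its method names are
hypothetical and are not taken from the attached patch; the operations
(common part, subtraction, addition) are the ones a positive-integer output
needs, expressed on primitive ints so that no Integer objects are allocated.

```java
/** Hypothetical output algebra over primitive ints; not the API in the patch. */
final class IntOutputAlgebraSketch {
    interface IntAlgebra {
        int common(int a, int b);    // shared part of two outputs
        int subtract(int a, int b);  // what remains of a after removing b
        int add(int a, int b);       // accumulate outputs along a path
        int noOutput();              // neutral element
    }

    static final IntAlgebra POSITIVE_INT = new IntAlgebra() {
        public int common(int a, int b) {
            return Math.min(a, b);
        }

        public int subtract(int a, int b) {
            return a - b;
        }

        public int add(int a, int b) {
            return a + b;
        }

        public int noOutput() {
            return 0;
        }
    };

    /** Sums the outputs of the arcs on one path without boxing. */
    static int pathOutput(IntAlgebra algebra, int[] arcOutputs) {
        int total = algebra.noOutput();
        for (int output : arcOutputs) {
            total = algebra.add(total, output);
        }
        return total;
    }
}
```

Because the algebra is a separate object, the arcs only need to carry the raw
int, and a different algebra can be swapped in without touching the arc
representation.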




  
> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.


[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050347#comment-13050347
 ] 

Dawid Weiss commented on LUCENE-3206:
-

This is my take on the revamped FST API. My changes are mostly aiming at having 
a bit clearer code (especially wrt. loops), but also detach the "algebra" of 
a transition's output from the actual output. This should allow us to create an 
output algebra that would work directly on mutable integers, for example (to 
save on autoboxing). I also just like the way it reads after the changes:
{code}
  FST fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
      .add("abc", 10)
      .add("abc", 5)
      .add("def", 0, 3), 2)
      .build();
{code}
or a loop over all arcs of a state:
{code}
  Arc arc = fst.getRoot();
  for (Arc tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
    int label = tmp.getLabel(); // transition label here.
    Integer output = tmp.getOutput(); // FSAs have a constant empty output.
  }
{code}

I definitely didn't consider all the use cases that FSTs are used for currently 
(in particular the "stop" bit indicating non-accepted input sequences that are 
also dead ends), but I think these could be integrated... I think :) 

Arcs now also store the pointer to the FST object, which may seem like an 
overhead, but I doubt it really will be (it's a single pointer and we buffer 
arcs whenever we can; a larger waste is having an object on each arc's output, 
even if it can be a primitive type or reused buffer).




> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.


[jira] [Updated] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3206:


Attachment: LUCENE-3206.patch

An empty (but compiling and consistent) take at the FST/FSA API.

> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050339#comment-13050339
 ] 

Noble Paul commented on SOLR-1431:
--

This might need some more cleanup, but I think it is close to a state where it 
can be checked in. 



> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050340#comment-13050340
 ] 

Mark Miller commented on SOLR-1431:
---

I can look at this latest patch soon Noble. We should also give Jason a fair 
amount of time to weigh in.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Assigned] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1431:


Assignee: Noble Paul  (was: Mark Miller)

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1431:
-

Attachment: SOLR-1431.patch

Even the checkDistributed() method is abstracted out to ShardHandler. The 
current HttpShardHandler (the default) also takes care of ZooKeeper.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.


[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I made the changes. I also fixed test-framework but haven't touched the test 
cases yet. 

> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice/versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.
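
A rough sketch of the kind of value object the description asks for. All
names, fields and buffer sizes here are illustrative assumptions, not the API
in the attached patches. It also shows why pooled readers would need to be
keyed by usage: a reader opened with merge settings should never be handed to
a search.

```java
/** Hypothetical IOContext shape; not the class from the patch. */
final class IOContextSketch {
    enum Usage { SEARCH, MERGE }

    static final class IOContext {
        final Usage usage;
        final int readBufferSize;  // bytes
        final boolean direct;      // bypass the OS buffer cache
        final boolean sequential;  // hint: one front-to-back pass over the file

        private IOContext(Usage usage, int readBufferSize, boolean direct, boolean sequential) {
            this.usage = usage;
            this.readBufferSize = readBufferSize;
            this.direct = direct;
            this.sequential = sequential;
        }

        /** Small buffer, normal cached I/O. Sizes are example values only. */
        static IOContext forSearch() {
            return new IOContext(Usage.SEARCH, 1024, false, false);
        }

        /** Larger buffer, direct and sequential I/O during merging. */
        static IOContext forMerge() {
            return new IOContext(Usage.MERGE, 4096, true, true);
        }
    }

    /** Pool key that keeps merge readers and search readers apart. */
    static String poolKey(String segmentName, IOContext context) {
        return segmentName + "/" + context.usage;
    }
}
```

A Directory method taking such an object instead of a bare buffer size could
then choose its file-opening strategy from the flags.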


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050316#comment-13050316
 ] 

Uwe Schindler commented on LUCENE-3208:
---

I started to rewrite some stuff, very straightforward.

- BufferedDeletesStream has to be changed as it was also calling Query.weight, 
but I replaced the usage here by QueryWrapperFilter and getting the DocIdSet. 
Code gets much easier here.
- QueryWrapperFilter's hack was rewritten, easy.
- In the test framework, QueryUtils was also rewritten; it often uses weight, but 
that's internal only.

The main issue:
IndexSearcher already has a method called createWeight(Query) (which 
currently delegates to the Query). I moved the code over here. I still have to 
complain about the name: it creates a Weight, yes, but it should also note that 
it rewrites and normalizes the weight. So I would like to rename that method, 
too, and deprecate the old one.

For now I leave the name unchanged. Patch comes soon (core only).

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
>
> We had this issue several times, latest in LUCENE-3207.
> The method Query.weight() was left in Query for backwards reasons in Lucene 
> 2.9 when we changed Weight class. This method is only to be called on 
> top-level queries - and this is done by IndexSearcher. This method is just a 
> utility method, that has nothing to do with the query itself (it just 
> combines the createWeight method and calls the normalization afterwards). 
> The problem we have is that any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so user sees this. In 
> IndexSearcher itsself the method will be protected to only be called by 
> itsself or subclasses of IndexSearcher. Delegation for backwards is no 
> problem, as protected is accessible by classes in same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050301#comment-13050301
 ] 

Simon Willnauer commented on LUCENE-3208:
-

+1

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
>
> We had this issue several times, most recently in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only 
> to be called on top-level queries, and this is done by IndexSearcher. This 
> method is just a utility method that has nothing to do with the query itself 
> (it just combines the createWeight method and calls the normalization 
> afterwards). 
> The problem we have is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so the user sees this. In 
> IndexSearcher itself the method will be protected, to be called only by 
> IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
> problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050296#comment-13050296
 ] 

Robert Muir commented on LUCENE-3208:
-

+1

> Move Query.weight() to IndexSearcher as protected method
> 
>
> Key: LUCENE-3208
> URL: https://issues.apache.org/jira/browse/LUCENE-3208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
>
> We had this issue several times, most recently in LUCENE-3207.
> The method Query.weight() was left in Query for backwards-compatibility 
> reasons in Lucene 2.9 when we changed the Weight class. This method is only 
> to be called on top-level queries, and this is done by IndexSearcher. This 
> method is just a utility method that has nothing to do with the query itself 
> (it just combines the createWeight method and calls the normalization 
> afterwards). 
> The problem we have is that if any query that wraps other queries (like 
> CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
> Query.createWeight(), it will do normalization two times, leading to strange 
> bugs.
> For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
> replacement method with a big deprecation warning, so the user sees this. In 
> IndexSearcher itself the method will be protected, to be called only by 
> IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
> problem, as protected is accessible by classes in the same package.
> I would suggest the method name to be 
> IndexSearcher.createNormalizedWeight(Query q)


[jira] [Created] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

Move Query.weight() to IndexSearcher as protected method


 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0


We had this issue several times, most recently in LUCENE-3207.

The method Query.weight() was left in Query for backwards-compatibility reasons 
in Lucene 2.9 when we changed the Weight class. This method is only to be called 
on top-level queries, and this is done by IndexSearcher. This method is just a 
utility method that has nothing to do with the query itself (it just combines 
the createWeight method and calls the normalization afterwards). 

The problem we have is that if any query that wraps other queries (like 
CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
Query.createWeight(), it will do normalization two times, leading to strange 
bugs.

For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
replacement method with a big deprecation warning, so the user sees this. In 
IndexSearcher itself the method will be protected, to be called only by 
IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
problem, as protected is accessible by classes in the same package.

I would suggest the method name to be 
IndexSearcher.createNormalizedWeight(Query q)
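
A rough, self-contained sketch of the proposal above may help. It does not use 
the real Lucene classes: ToySearcher, ToyQuery, ToyWeight and the 
normalizeCalls counter are made-up stand-ins, and only the method names 
createWeight, weight and createNormalizedWeight are taken from the issue. It 
shows the deprecated weight() delegating to a protected method on the searcher, 
and counts how often a wrapped weight gets normalized when the wrapper calls 
weight() instead of createWeight().

```java
public class CreateNormalizedWeightSketch {

    /** Toy weight (hypothetical): only records how often normalize() ran. */
    static class ToyWeight {
        int normalizeCalls = 0;
        float sumOfSquaredWeights() { return 1.0f; }
        void normalize(float norm) { normalizeCalls++; }
    }

    /** Toy searcher (hypothetical) owning the top-level helper. */
    static class ToySearcher {
        /** Protected: meant for the searcher itself and its subclasses. */
        protected ToyWeight createNormalizedWeight(ToyQuery query) {
            ToyWeight w = query.createWeight(this);
            float norm = (float) (1.0 / Math.sqrt(w.sumOfSquaredWeights()));
            w.normalize(norm);
            return w;
        }
    }

    abstract static class ToyQuery {
        abstract ToyWeight createWeight(ToySearcher searcher);

        /** Kept for backwards compatibility only; delegates to the searcher. */
        @Deprecated
        ToyWeight weight(ToySearcher searcher) {
            // Compiles because ToyQuery and ToySearcher share a package
            // (here even a file), so the protected method is accessible.
            return searcher.createNormalizedWeight(this);
        }
    }

    static class ToyTermQuery extends ToyQuery {
        @Override
        ToyWeight createWeight(ToySearcher searcher) { return new ToyWeight(); }
    }

    /** Wrapper that obtains the inner weight the right way or the wrong way. */
    static class ToyWrapperQuery extends ToyQuery {
        final ToyQuery inner;
        final boolean callsWeight;

        ToyWrapperQuery(ToyQuery inner, boolean callsWeight) {
            this.inner = inner;
            this.callsWeight = callsWeight;
        }

        @Override
        ToyWeight createWeight(ToySearcher searcher) {
            // Simplification: the wrapper hands back the inner weight directly.
            return callsWeight ? inner.weight(searcher) : inner.createWeight(searcher);
        }
    }

    /** Runs one top-level search setup and reports the normalize() count. */
    public static int normalizeCalls(boolean wrapperCallsWeight) {
        ToySearcher searcher = new ToySearcher();
        ToyQuery wrapped = new ToyWrapperQuery(new ToyTermQuery(), wrapperCallsWeight);
        return searcher.createNormalizedWeight(wrapped).normalizeCalls;
    }

    public static void main(String[] args) {
        System.out.println("wrapper calls createWeight(): " + normalizeCalls(false));
        System.out.println("wrapper calls weight():       " + normalizeCalls(true));
    }
}
```

In this toy model the wrapper that calls weight() has its inner weight 
normalized once by the wrapper and once more by the top-level call, which is 
the double normalization described above.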


[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term & collection statistics

2011-06-16 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3174:


Attachment: LUCENE-3174.patch

I fixed a few problems: javadoc warnings, and also the fact that I had left an 
assert commented out after the hair-pulling with CustomScoreQuery.


> Similarity.Stats class for term & collection statistics
> ---
>
> Key: LUCENE-3174
> URL: https://issues.apache.org/jira/browse/LUCENE-3174
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174_normalize_boost.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e. 
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.
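
To make the second option in the description concrete, here is a small sketch 
of a hierarchy of Stats classes. Everything in it is an assumption made for 
illustration: the ToySimilarity classes, the computeStats signature and the 
idf formulas are not taken from the patch, and only the names Stats, idf and 
average field length come from the issue text.

```java
public class StatsHierarchySketch {

    /** Toy stand-in for Similarity; Stats is a static nested class of it. */
    public abstract static class ToySimilarity {
        /** Base of the hierarchy; carries no statistics by itself. */
        public static class Stats { }

        /** Hypothetical signature: collects whatever the ranking method needs. */
        public abstract Stats computeStats(long docFreq, long numDocs, double avgFieldLength);
    }

    public static class ToyDefaultSimilarity extends ToySimilarity {
        /** TF-IDF style ranking only needs idf. */
        public static class DefaultStats extends Stats {
            public final double idf;
            DefaultStats(double idf) { this.idf = idf; }
        }

        @Override
        public DefaultStats computeStats(long docFreq, long numDocs, double avgFieldLength) {
            // Illustrative idf formula; avgFieldLength is not needed and is ignored.
            return new DefaultStats(1.0 + Math.log(numDocs / (double) (docFreq + 1)));
        }
    }

    public static class ToyBM25Similarity extends ToySimilarity {
        /** BM25 style ranking needs idf and the average field length. */
        public static class BM25Stats extends Stats {
            public final double idf;
            public final double avgFieldLength;
            BM25Stats(double idf, double avgFieldLength) {
                this.idf = idf;
                this.avgFieldLength = avgFieldLength;
            }
        }

        @Override
        public BM25Stats computeStats(long docFreq, long numDocs, double avgFieldLength) {
            // One common BM25 idf variant, used here purely for illustration.
            double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            return new BM25Stats(idf, avgFieldLength);
        }
    }

    public static void main(String[] args) {
        ToyDefaultSimilarity.DefaultStats d =
            new ToyDefaultSimilarity().computeStats(9, 10, 12.5);
        ToyBM25Similarity.BM25Stats b =
            new ToyBM25Similarity().computeStats(9, 10, 12.5);
        System.out.println("default idf = " + d.idf);
        System.out.println("bm25 idf = " + b.idf + ", avg length = " + b.avgFieldLength);
    }
}
```

With the single-class option, both fields would instead live in one shared 
Stats class and the default similarity would simply leave the average length 
unused.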


[jira] [Commented] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050264#comment-13050264
 ] 

Uwe Schindler commented on LUCENE-3207:
---

This bug is stupid: I had a similar issue during the rewrite of 
ConstantScoreQuery to directly wrap queries, where I copied some code from 
CustomScoreQuery (just removed the custom scoring). I fixed it in Constant*, 
not sure why I left CustomScoreQuery unchanged. Maybe because tests passed.

> CustomScoreQuery calls weight() where it should call createWeight()
> ---
>
> Key: LUCENE-3207
> URL: https://issues.apache.org/jira/browse/LUCENE-3207
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-3207.patch
>
>
> Thanks to Uwe for helping me track down this bug after I pulled my hair out 
> for hours on LUCENE-3174.


[jira] [Commented] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050263#comment-13050263
 ] 

Robert Muir commented on LUCENE-3207:
-

As an explanation, this causes the query to call sumOfSquaredWeights + 
queryNorm + normalize() twice.

The reason it doesn't cause any tests to fail in trunk is this:
in trunk sumOfSquaredWeights is not really a getter, it's also a setter:
{noformat}
@Override
public float sumOfSquaredWeights() {
  queryWeight = idf * getBoost(); // compute query weight
  return queryWeight * queryWeight;   // square it
}
{noformat}

In my patch on LUCENE-3174, my sumOfSquaredWeights returns queryWeight * 
queryWeight, but doesn't reset any state.
So you end up normalizing twice, and that's why the test failed on the branch.
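
The masking effect can be reproduced with a toy model. The sketch below is not 
Lucene code: ToyTermWeight, its resetOnSum flag and the normalizeTogether 
helper are invented for illustration, and the real weights do more than this. 
It normalizes two term weights together, optionally after one of them has 
already been normalized alone (the extra pass a wrapper calling weight() would 
trigger), and compares a sumOfSquaredWeights that resets queryWeight with one 
that does not.

```java
import java.util.Arrays;

public class SumOfSquaredWeightsSketch {

    /** Toy term weight (hypothetical); resetOnSum mimics the trunk behaviour. */
    static class ToyTermWeight {
        final float idf;
        final float boost;
        final boolean resetOnSum;
        float queryWeight;

        ToyTermWeight(float idf, float boost, boolean resetOnSum) {
            this.idf = idf;
            this.boost = boost;
            this.resetOnSum = resetOnSum;
            this.queryWeight = idf * boost;
        }

        float sumOfSquaredWeights() {
            if (resetOnSum) {
                queryWeight = idf * boost; // "getter" that also resets state
            }
            return queryWeight * queryWeight;
        }

        void normalize(float norm) {
            queryWeight *= norm;
        }
    }

    /** One normalization pass over a group of weights. */
    static void normalizeTogether(ToyTermWeight... weights) {
        float sum = 0f;
        for (ToyTermWeight w : weights) {
            sum += w.sumOfSquaredWeights();
        }
        float norm = (float) (1.0 / Math.sqrt(sum));
        for (ToyTermWeight w : weights) {
            w.normalize(norm);
        }
    }

    /** Returns the final query weights of the two terms. */
    public static float[] run(boolean resetOnSum, boolean preNormalizeB) {
        ToyTermWeight a = new ToyTermWeight(3f, 1f, resetOnSum);
        ToyTermWeight b = new ToyTermWeight(4f, 1f, resetOnSum);
        if (preNormalizeB) {
            normalizeTogether(b); // the unwanted extra normalization
        }
        normalizeTogether(a, b);  // the top-level normalization
        return new float[] { a.queryWeight, b.queryWeight };
    }

    public static void main(String[] args) {
        System.out.println("single pass:            " + Arrays.toString(run(false, false)));
        System.out.println("double pass, resetting: " + Arrays.toString(run(true, true)));
        System.out.println("double pass, no reset:  " + Arrays.toString(run(false, true)));
    }
}
```

In this model the resetting variant gives the same weights with or without the 
extra pass, while the non-resetting variant does not, which matches the 
explanation of why the test failed only on the branch.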


> CustomScoreQuery calls weight() where it should call createWeight()
> ---
>
> Key: LUCENE-3207
> URL: https://issues.apache.org/jira/browse/LUCENE-3207
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-3207.patch
>
>
> Thanks to Uwe for helping me track down this bug after I pulled my hair out 
> for hours on LUCENE-3174.


[jira] [Updated] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3207:


Attachment: LUCENE-3207.patch

> CustomScoreQuery calls weight() where it should call createWeight()
> ---
>
> Key: LUCENE-3207
> URL: https://issues.apache.org/jira/browse/LUCENE-3207
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-3207.patch
>
>
> Thanks to Uwe for helping me track down this bug after I pulled my hair out 
> for hours on LUCENE-3174.


[jira] [Created] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)

CustomScoreQuery calls weight() where it should call createWeight()
---

 Key: LUCENE-3207
 URL: https://issues.apache.org/jira/browse/LUCENE-3207
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3207.patch

Thanks to Uwe for helping me track down this bug after I pulled my hair out for 
hours on LUCENE-3174.


[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term & collection statistics

2011-06-16 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3174:


Attachment: LUCENE-3174.patch

Here's the patch with the unrelated bug fixed in CustomScoreQuery.

Now all tests pass.

> Similarity.Stats class for term & collection statistics
> ---
>
> Key: LUCENE-3174
> URL: https://issues.apache.org/jira/browse/LUCENE-3174
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174_normalize_boost.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e. 
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.


[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050245#comment-13050245
 ] 

Noble Paul commented on SOLR-1431:
--

What are the concerns with the latest patch? I can work on them. I guess this 
is the optimal way to resolve SOLR-2592.




> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

