[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-2657:
    Attachment: LUCENE-2657-branch_3x.patch, LUCENE-2657.patch

Patches implementing my proposal to place the Maven POMs in {{dev-tools/maven/}} and add a new top-level Ant target {{get-maven-poms}}, which is invoked by {{generate-maven-artifacts}}. {{generate-maven-artifacts}} remains in the top-level {{build.xml}}, as well as in {{lucene/}}, {{solr/}}, and {{modules/}} (trunk only). I couldn't figure out a way for {{generate-maven-artifacts}} under the child directories {{lucene/}}, {{solr/}}, and {{modules/}} to depend on the top-level {{get-maven-poms}} target, so instead I have {{generate-maven-artifacts}} in the child directories explicitly run the {{get-maven-poms}} target via the {{ant}} task. As a result, running {{generate-maven-artifacts}} from the top level will cause {{get-maven-poms}} to run once for each child directory, but the repeated copy operation doesn't hurt anything, and the process is quick. Unless there are objections, I will commit this tomorrow.

Replace Maven POM templates with full POMs, and change documentation accordingly

    Key: LUCENE-2657
    URL: https://issues.apache.org/jira/browse/LUCENE-2657
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Build
    Affects Versions: 3.1, 4.0
    Reporter: Steven Rowe
    Assignee: Steven Rowe
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657.patch (multiple revisions)

The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository.
The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public Maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory:

{code}
mvn -N -Pbootstrap install
{code}

Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's Surefire plugin and populate your local repository with all artifacts, run, from the top-level directory:

{code}
mvn install
{code}

When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests:

{code}
mvn -DskipTests install
{code}

I almost always include the {{clean}} phase when I do a build, e.g.:

{code}
mvn -DskipTests clean install
{code}

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1996) Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30
[ https://issues.apache.org/jira/browse/SOLR-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984127#action_12984127 ]

Otis Gospodnetic commented on SOLR-1996:

Rafał, was that the case? If so, we can close this.

Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30

    Key: SOLR-1996
    URL: https://issues.apache.org/jira/browse/SOLR-1996
    Project: Solr
    Issue Type: Bug
    Components: search
    Affects Versions: 1.4, 1.4.1
    Environment: Ubuntu 10, Java 5 - 6
    Reporter: Rafał Kuć

I think there is a problem with the edismax query parser. When I try to use the pf parameter with a query parameter defined like this: q=(aaa bbb)+OR+field:(aaa bbb)^100, the pf parameter is not working - with debug turned on I see a strange phrase query as part of the raw Lucene query. Of course, when I set the query parameter to something like q=(aaa bbb), without the 'OR' part, the phrase boost works perfectly.
[jira] Commented: (SOLR-1996) Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30
[ https://issues.apache.org/jira/browse/SOLR-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984131#action_12984131 ]

Rafał Kuć commented on SOLR-1996:

Yes - I forgot about the issue - please close it ;)
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984134#action_12984134 ]

Shay Banon commented on LUCENE-2871:

Strange - I did not get it when running the tests; I will try to find out why it can happen.

Use FileChannel in FSDirectory

    Key: LUCENE-2871
    URL: https://issues.apache.org/jira/browse/LUCENE-2871
    Project: Lucene - Java
    Issue Type: New Feature
    Components: Store
    Reporter: Shay Banon
    Attachments: LUCENE-2871.patch

Explore using FileChannel in FSDirectory to see if it improves the performance of write operations.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984139#action_12984139 ]

Michael McCandless commented on LUCENE-2871:

Yeah, me neither -- tests all pass when I force the dir to e.g. NIOFSDir, and my benchmark runs on the 100K index; it just fails for the 10M index... curious.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984145#action_12984145 ]

Uwe Schindler commented on LUCENE-2871:

Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF.
Re: Query parser contract changes?
On Tue, Jan 18, 2011 at 3:58 AM, karl.wri...@nokia.com wrote:
> This turns out to have indeed been due to a recent, but un-announced, index format change. A rebuilt index worked properly.

This was from LUCENE-2862; I'll send out an email now.
heads up (late)
Five days ago the trunk index format changed with LUCENE-2862; you should re-index any trunk indexes. It's likely that if you open up old trunk indexes you won't get an exception - queries will just return zero results.
Re: Query parser contract changes?
On Thu, Jan 20, 2011 at 6:59 AM, Robert Muir rcm...@gmail.com wrote:
> On Tue, Jan 18, 2011 at 3:58 AM, karl.wri...@nokia.com wrote:
>> This turns out to have indeed been due to a recent, but un-announced, index format change. A rebuilt index worked properly.
>
> This was from LUCENE-2862, i'll send out an email now

Ugh! Mea culpa. Sorry :( I forgot this was an index change!

Mike
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984148#action_12984148 ]

Robert Muir commented on LUCENE-2876:

I'd like to commit this later today if there aren't any objections (it's just boring cleanup). As for 3.1, I rethought the issue, and I think e.g. DisjunctionMaxQuery should really work with LUCENE-2590. So I'll look at trying to pass a non-null weight in 3.x too; I think users will see it as a bug...

Remove Scorer.getSimilarity()

    Key: LUCENE-2876
    URL: https://issues.apache.org/jira/browse/LUCENE-2876
    Project: Lucene - Java
    Issue Type: Task
    Components: Query/Scoring
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2876.patch

Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out as its own issue, since it's really mostly unrelated. I also like it as a separate issue so the deprecation can be applied to branch_3x, to create fewer surprises/migration hassles for 4.0 users.

Currently Scorer has a confusing number of ctors, taking either a Similarity or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and it's not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense in Scorer: it's the parent that created the scorer, and is used by Scorer itself to support LUCENE-2590's features. But I don't think all queries work with this feature correctly right now, because they pass null. Finally, the situation gets confusing once you consider delegators like ScoreCachingWrapperScorer, which aren't really delegating correctly, so I'm unsure features like LUCENE-2590 are working with this.

So I think we should remove getSimilarity(): if your scorer uses a Similarity, it's already coming to you via your ctor from your Weight, and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer.

I set fix-for 3.1 not because I want to backport anything, only to mark getSimilarity() deprecated there.
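The constructor discipline described above (the parent Weight as Scorer's only ctor argument, with any Similarity held privately by the concrete scorer) can be sketched roughly as follows. All class and field names here are illustrative stand-ins, not Lucene's actual classes:

```java
// Illustrative sketch of the proposed ctor shape: Scorer's only ctor takes
// the parent Weight; a concrete scorer that needs Similarity-derived state
// receives it through its own ctor, supplied by the Weight that created it.
// All names are hypothetical, simplified stand-ins for the real classes.
abstract class SketchWeight {
    abstract SketchScorer scorer();
}

abstract class SketchScorer {
    protected final SketchWeight weight;  // parent that created this scorer

    protected SketchScorer(SketchWeight weight) {
        this.weight = weight;             // no Similarity stored in the base class
    }

    abstract float score();
}

// Concrete scorer: its Similarity-derived stats (here just an idf stand-in)
// arrive once, at construction time, and are kept private.
final class SketchTermScorer extends SketchScorer {
    private final float idf;

    SketchTermScorer(SketchWeight parent, float idf) {
        super(parent);
        this.idf = idf;
    }

    @Override
    float score() {
        return idf;  // placeholder scoring using the privately held stat
    }
}

final class SketchTermWeight extends SketchWeight {
    private final float idf = 1.5f;       // computed privately, never exposed

    @Override
    SketchScorer scorer() {
        return new SketchTermScorer(this, idf);
    }
}
```

The point of this shape is that neither the base class nor delegators need a getSimilarity() accessor: whatever a scorer needs arrives through its own constructor from its parent Weight.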
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984171#action_12984171 ]

Doron Cohen commented on LUCENE-2876:

Patch looks good. +1 for this cleanup, which removes calls with a null arg and the comment {{// similarity not in use}}. Some minor comments:
* jdoc: in some of the scorer constructors a Weight param was added, but the existing jdocs for the constructor, which document (some) params, were not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc across scorer implementations. If there were no jdocs at all there I would say nothing, but since there were some, they have now become less complete...
* ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
* Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - the Similarity param can be removed.

One question not related to this patch - I just saw it while reviewing:
* It is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not - is this on purpose? Perhaps related to Mike's recent optimizations in this scorer?
Lucene-Solr-tests-only-trunk - Build # 3949 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3949/

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple

Error Message:
expected:<3> but was:<2>

Stack Trace:
junit.framework.AssertionFailedError: expected:<3> but was:<2>
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059)
    at org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:126)

Build Log (for compile errors):
[...truncated 8226 lines...]
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984177#action_12984177 ]

Robert Muir commented on LUCENE-2876:

Thanks for the review Doron!

bq. jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight.

I'll fix this, thanks!

{quote}
ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed.
{quote}

These still need the Similarity param - they use it in scoring; they just don't pass it to the superclass constructor (Scorer's constructor). It's possible I misunderstood your idea, though. Let's take the TermQuery example: are you suggesting that we should expose TermWeight's Similarity and just pass TermWeight to TermScorer (requiring TermScorer to take a TermWeight in its ctor instead of Weight + Similarity)? Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private.

bq. it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer?

Yes, that's correct. Just at a glance, he might have done this so that ExactPhraseScorer can compute a score cache like TermScorer, along with other similar optimizations, since the tf values are really integers for this exact case. It might be that if we look at splitting calculations and matching out from Scorer, we can make matchers like ExactPhrase/SloppyPhrase simpler, and we could then clean this up... not sure though!
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984186#action_12984186 ]

Doron Cohen commented on LUCENE-2876:

{quote}
These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor). ... Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private.
{quote}

You're right - I was for some reason under the impression that part of the reason for the change was that Weight already exposes Similarity. It does not, and I think it shouldn't, so the current patch is good here.
Adding Myself as Mentor to the Incubator Proposal
Hi all,

I'm in the process of signing up as a mentor on the Incubator wiki and thought I'd better introduce myself, since I don't expect anybody around here to know me.

Like many people, I came to the ASF to scratch a few itches I encountered with an existing project I used at work. In my case it was JServ, in around 1998, but back then I mainly remained a pure user with the occasional bug report. I gradually became more involved over time: in 2000 I was voted in as a committer to Ant, and later that year as a member of the ASF. Today I am still an active committer to Ant, the PMC chairman of Gump, and involved in a few smaller parts of Commons and the remnants of Jakarta. A few years ago I mentored Apache Ivy through incubation, so I already wear my Incubator scars.

During work hours, the .NET platform has been my main development target since 2005. Even though all the ASF projects I'm involved in are Java projects, I'm very familiar with C# and the platform in general. Early last year I coded up a prototype for a customer project (that never took off) using Lucene.NET and recall how wrong it felt, so I fully understand and appreciate the need for an idiomatic API.

It's my goal to keep out of any technical decisions; that's really up to the committers to decide. I may find time to participate in the discussions and even provide a patch or two, but I won't promise anything. I hope I can contribute a small part to a successful reboot of the project. Let's enjoy the ride.

Stefan
--
http://stefan.samaflost.de/
[jira] Updated: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2876:
    Attachment: LUCENE-2876.patch

Updated patch, with the javadoc inconsistencies corrected.
Re: Adding Myself as Mentor to the Incubator Proposal
Welcome =)

- Michael

On Thu, Jan 20, 2011 at 8:50 AM, Stefan Bodewig bode...@apache.org wrote:
> Hi all, I'm in the process of signing up as mentor on the Incubator wiki and thought I'd better introduce myself since I don't expect anybody around here to know me. [...]
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984206#action_12984206 ]

Shay Banon commented on LUCENE-2871:

bq. Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF.

It's a good question; I'm not sure what to do with it. Here is the problem: the channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it might actually make sense to use it even with SimpleFS (i.e. using non-nio to read, but the file channel to write). To support all of them, currently the simplest way is to put it in the base class so the code is shared. On IRC, there was a discussion about externalizing the outputs and inputs so one can more easily pick and choose, but I think that belongs in a different patch.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984222#action_12984222 ]

Earwin Burrfoot commented on LUCENE-2871:

Before arguing where to put this new IndexOutput, I think it's wise to have a benchmark proving we need it at all. I have serious doubts that FileChannel is going to outperform RAF.write() - why should it? And for the purposes of a benchmark, it can live anywhere.
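A micro-benchmark of the kind Earwin asks for might be sketched as follows. This is an illustrative standalone sketch, not code from the patch; the buffer size and iteration count are arbitrary choices:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: time sequential buffered writes via RandomAccessFile.write()
// versus FileChannel.write(), then verify both paths wrote the same bytes.
class WriteBench {
    static final int BUF = 16 * 1024;   // arbitrary buffer size
    static final int ITERS = 1024;      // 16 MB total, arbitrary

    static long rafWrite(File f) throws Exception {
        byte[] buf = new byte[BUF];
        long t0 = System.nanoTime();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            for (int i = 0; i < ITERS; i++) {
                raf.write(buf);
            }
        } finally {
            raf.close();
        }
        return System.nanoTime() - t0;
    }

    static long channelWrite(File f) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(BUF);
        long t0 = System.nanoTime();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        FileChannel ch = raf.getChannel();
        try {
            for (int i = 0; i < ITERS; i++) {
                buf.clear();
                while (buf.hasRemaining()) {
                    ch.write(buf);      // write() may be partial; drain the buffer
                }
            }
        } finally {
            ch.close();
            raf.close();
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws Exception {
        File f1 = File.createTempFile("raf", ".bin");
        File f2 = File.createTempFile("nio", ".bin");
        long t1 = rafWrite(f1);
        long t2 = channelWrite(f2);
        System.out.println("RAF: " + t1 / 1000000 + " ms, FileChannel: " + t2 / 1000000 + " ms");
        if (f1.length() != (long) BUF * ITERS || f2.length() != f1.length()) {
            throw new AssertionError("write paths produced different byte counts");
        }
        f1.delete();
        f2.delete();
    }
}
```

Absolute numbers will vary with OS, JVM, and filesystem, which is exactly why measuring first, before deciding where the new IndexOutput lives, makes sense.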
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984263#action_12984263 ] Shay Banon commented on LUCENE-2871: Agreed, Earwin; let's first see if it makes sense. This is just an experiment and might not make sense for single-threaded writes.
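As a rough illustration of the two write paths being debated above (plain-JDK sketch, not Lucene code; the class and method names here are hypothetical), a benchmark along the lines Earwin asks for would time these two loops over realistic block sizes and thread counts:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch comparing the two write paths discussed in this thread:
// RandomAccessFile.write() (the existing RAF path) vs FileChannel.write().
public class WritePathSketch {

    // Classic RAF path: plain sequential writes.
    static void rafWrite(Path path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.write(data);
        }
    }

    // FileChannel path: the alternative the patch explores.
    static void channelWrite(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may accept fewer bytes than remaining; loop until drained
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }
}
```

Both paths must produce byte-identical files; any difference between them is purely throughput, which is exactly what a benchmark would have to demonstrate before committing either way.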
[jira] Created: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
faceting throws NPE if all q's and fq's excluded Key: SOLR-2325 URL: https://issues.apache.org/jira/browse/SOLR-2325 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Example of a faceting request that produces an NPE: http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity
[jira] Assigned: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-2325: -- Assignee: Yonik Seeley
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2474: --- Fix Version/s: 4.0 3.1 Assignee: Michael McCandless Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2574.patch Allow plugging in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example, with CachingWrapperFilter. FieldCache benefits from being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984280#action_12984280 ] Michael McCandless commented on LUCENE-2474: bq. Actually, I am against the last patch you posted, as it clearly has nothing to do with this issue Woops! Heh. bq. A MultiReader is just a wrapper - you don't reopen it, so it could just start off with an empty listener list, the subs could all retain their listener lists and an addListener() could just delegate to the contained readers. Well, it does have a reopen (it reopens the subs and wraps them in a new MR), but I guess delegation would work for MR. And the same for ParallelReader. And I think the NRT case should work fine, since we don't expose IW.getReader anymore (hmm -- this was never backported to 3.x?) -- if you new IndexReader(IW), it creates a single collection holding all listeners, and then shares it w/ all SRs.
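The shared-listener idea Mike describes for the NRT case can be sketched in plain Java (hypothetical names, heavily simplified; this is not the actual patch API): the writer creates one listener collection, hands it to every segment view, and closing a view fires eviction callbacks keyed by its cache key.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the cache-eviction-listener pattern discussed above.
public class EvictionSketch {

    interface CacheEvictionListener {
        // Called when caches keyed on this reader's cache key should be purged.
        void onClose(Object cacheKey);
    }

    static class SegmentView implements AutoCloseable {
        private final Object cacheKey = new Object(); // stands in for getFieldCacheKey()
        private final List<CacheEvictionListener> listeners;

        // NRT case: the writer creates ONE shared list and hands it to every
        // segment view, so a listener registered once fires for all of them.
        SegmentView(List<CacheEvictionListener> shared) {
            this.listeners = shared;
        }

        Object getCacheKey() {
            return cacheKey;
        }

        @Override
        public void close() {
            for (CacheEvictionListener l : listeners) {
                l.onClose(cacheKey);
            }
        }
    }

    static List<CacheEvictionListener> newSharedListenerList() {
        return new CopyOnWriteArrayList<>();
    }
}
```

Delegation for MultiReader/ParallelReader, as discussed above, would just mean forwarding addListener() calls to each contained view's list instead of sharing one list.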
[jira] Updated: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2691: --- Fix Version/s: 3.1 Should we backport this to 3.1? Consolidate Near Real Time and Reopen API semantics --- Key: LUCENE-2691 URL: https://issues.apache.org/jira/browse/LUCENE-2691 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2691.patch, LUCENE-2691.patch We should consolidate the IndexWriter.getReader and the IndexReader.reopen semantics; since most people are already using the IR.reopen() method, we should simply add: {code} IR.reopen(IndexWriter) {code} Initially, it could just call IW.getReader(), but it probably should switch to just using package-private methods for sharing the internals
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984285#action_12984285 ] Michael McCandless commented on LUCENE-2324: OK, I think Michael's example can be solved with a small change to the delete buffering. When a delete arrives, we should buffer in each DWPT, but also buffer into the global deletes pool (held in DocumentsWriter). Whenever any DWPT is flushed, that global pool is pushed. Then, the buffered deletes against each DWPT are carried (as usual) along w/ the segment that's flushed from that DWPT, but those buffered deletes *only* apply to the docs in that one segment. The pushed deletes from the global pool apply to all prior segments (ie, they coalesce). This way, the deletes that will be applied to the already flushed segments are aggressively pushed. Separately, I think we should relax the error semantics for updateDocument: if an aborting exception occurs (eg disk full while flushing a segment), then it's possible that the delete from an updateDocument will have applied but the add did not. Outside of error cases, of course, updateDocument will continue to be atomic (ie a commit() can never split the delete and the add). Then the updateDocument case is handled as just an [atomic wrt flush] add plus delete. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details.
I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process, and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores, and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO and CPU.
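The two-level delete buffering Mike describes above can be sketched in plain Java (hypothetical and heavily simplified; real DWPTs buffer delete terms/queries together with doc-id "up to" counters, not plain strings):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of per-DWPT delete buffers plus a global deletes pool.
// Deletes buffered in a DWPT apply only to that DWPT's in-RAM segment; the
// global pool applies to all already-flushed (prior) segments.
public class DeleteBufferSketch {

    static class DWPT {
        // applies only to docs buffered in this DWPT's private segment
        final List<String> bufferedDeletes = new ArrayList<>();
    }

    final List<DWPT> threadStates = new ArrayList<>();
    // applies to all segments flushed before now (the "coalesced" deletes)
    final List<String> globalDeletes = new ArrayList<>();

    void delete(String term) {
        // buffer in each DWPT (for its private segment)...
        for (DWPT d : threadStates) {
            d.bufferedDeletes.add(term);
        }
        // ...and in the global pool (for all prior segments)
        globalDeletes.add(term);
    }

    // Flushing any DWPT pushes the global pool, so deletes are applied
    // aggressively to the already-flushed segments.
    List<String> flush(DWPT d) {
        List<String> pushed = new ArrayList<>(globalDeletes);
        globalDeletes.clear();
        threadStates.remove(d);
        return pushed;
    }
}
```

The key invariant in the sketch matches the comment above: a flushed segment carries its own DWPT-local deletes, while the pushed global deletes coalesce against everything to its left.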
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984291#action_12984291 ] Jason Rutherglen commented on LUCENE-2324: -- bq. When a delete arrives, we should buffer in each DWPT, but also buffer into the global deletes pool (held in DocumentsWriter). This'll work; however, it seems like it's going to be a temporary solution if we implement sequence ids properly and/or implement non-sequential merges. In fact, with the shared doc store gone, what's holding up non-sequential merging?
[jira] Updated: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2872: --- Attachment: LUCENE-2872.patch New patch; specializes read* in ByteArrayDataInput (poached from LUCENE-2824). Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch, LUCENE-2872.patch, LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, ie its bytes, metadata, details for postings (frq/prox file pointers), etc. But this is costly when something wants to visit many terms but pull metadata for only a few (eg respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just the terms.
[jira] Updated: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2325: --- Attachment: SOLR-2325.patch Here's a patch that fixes things up. Reviewing the documentation, it looks like this wasn't actually a bug - in the past, only filters could be excluded. Still, it makes sense to be able to exclude the main query too, and that is what this patch implements.
[jira] Assigned: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2691: -- Assignee: Michael McCandless (was: Grant Ingersoll)
[jira] Commented: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984302#action_12984302 ] Michael McCandless commented on LUCENE-2691: I'll backport...
[jira] Commented: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984306#action_12984306 ] Robert Muir commented on LUCENE-2872: - +1 to commit; the last specialization made all the difference in my benchmarks. I think this will pave the way for us to fix the Sep codec in the branch...
[jira] Updated: (SOLR-2325) Allow tagging and exclusion of main query for faceting
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2325: --- Description: Example of a faceting request that produces an NPE because tagging/excluding the main query is not supported. http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity was: Example of a faceting request that produces an NPE http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity Priority: Minor (was: Major) Fix Version/s: 4.0 3.1 Issue Type: Improvement (was: Bug) Summary: Allow tagging and exclusion of main query for faceting (was: faceting throws NPE if all q's and fq's excluded)
[jira] Commented: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984326#action_12984326 ] Michael McCandless commented on LUCENE-2824: I'm seeing excellent gains w/ this patch, on Linux 64bit Java 6 NIOFSDir:
||Query||QPS clean||QPS robspec||Pct diff||
|spanFirst(unit, 5)|16.67|15.62|{color:red}-6.3%{color}|
|unit state|8.04|7.87|{color:red}-2.2%{color}|
|spanNear([unit, state], 10, true)|4.31|4.25|{color:red}-1.2%{color}|
|unit state~3|4.85|5.02|{color:green}3.6%{color}|
|unit state|10.35|10.94|{color:green}5.7%{color}|
|unit~1.0|9.60|10.15|{color:green}5.7%{color}|
|unit~2.0|9.35|9.94|{color:green}6.3%{color}|
|united~2.0|3.30|3.51|{color:green}6.4%{color}|
|+nebraska +state|161.71|174.23|{color:green}7.7%{color}|
|+unit +state|11.20|12.09|{color:green}8.0%{color}|
|doctitle:.*[Uu]nited.*|3.93|4.25|{color:green}8.0%{color}|
|united~1.0|15.12|16.39|{color:green}8.4%{color}|
|un*d|49.33|56.09|{color:green}13.7%{color}|
|u*d|14.85|16.97|{color:green}14.3%{color}|
|state|25.95|30.12|{color:green}16.1%{color}|
|unit*|22.72|26.88|{color:green}18.3%{color}|
|uni*|12.64|15.20|{color:green}20.2%{color}|
|doctimesecnum:[1 TO 6]|8.42|10.73|{color:green}27.4%{color}|
+1 to commit. optimizations for bufferedindexinput Key: LUCENE-2824 URL: https://issues.apache.org/jira/browse/LUCENE-2824 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 4.0 Reporter: Robert Muir Attachments: LUCENE-2824.patch along the same lines as LUCENE-2816: * the readVInt/readVLong/readShort/readInt/readLong methods are not optimal here since they defer to readByte. For example, this means checking the buffer's bounds per byte in readVInt instead of per vint.
* it's an easy win to speed this up, even for the vint case: it's essentially always faster; the only slower case is 1024 single-byte vints in a row, in which case we would do a single extra bounds check (1025 instead of 1024)
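The kind of specialization being proposed can be illustrated with a stdlib-only sketch (not the actual Lucene patch): an unrolled readVInt over a byte[] touches at most five bytes, so a caller that has already verified five bytes remain in the buffer pays one bounds check per vint instead of one per byte.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of an unrolled vint decode (not the actual patch code).
public class VIntSketch {

    // Standard vint encoding: 7 bits per byte, high bit = continuation.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Unrolled decode: at most 5 array reads, so a caller that checks
    // "pos + 5 <= limit" once up front needs no per-byte bounds checking.
    static int readVInt(byte[] buf, int pos) {
        byte b = buf[pos++];
        int i = b & 0x7F;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 7;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 14;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 21;
        if ((b & 0x80) == 0) return i;
        b = buf[pos]; i |= (b & 0x7F) << 28;
        return i;
    }
}
```

This is the "single extra bounds check" trade-off described above: the up-front 5-byte check is pessimistic for runs of single-byte vints, but removes the per-byte check everywhere else.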
[jira] Updated: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2824: Fix Version/s: 4.0 3.1 Assignee: Robert Muir
[jira] Resolved: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2872. Resolution: Fixed
WARNING: re-index all trunk indices
If you are using Lucene's trunk (to be 4.0) builds, read on... I just committed LUCENE-2872, which is a hard break on the index file format. If you are living on Lucene's trunk then you have to remove any previously created indices and re-index after updating. The change cuts over to a faster on-disk terms dictionary format, which block-encodes term data and metadata between indexed terms. Mike
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984331#action_12984331 ] Jason Rutherglen commented on LUCENE-2558: -- If we implement deletes via sequence id across all segments, then the .del file should probably remain the same (a set of bits)? Also, when we load up the BV on IW start, I guess we'll need to init the array appropriately. Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update-document methods, we will enable IndexReader deleted docs over a sequence-id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
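The point-in-time semantics the issue describes can be sketched in plain Java (hypothetical, simplified; the real design must also handle the short[] wrap-around mentioned above): each doc records the sequence id at which it was deleted, and a reader "opened" at sequence S sees a doc as deleted only if it was deleted at or before S.

```java
// Hypothetical sketch of deleted docs tracked via a sequence-id array.
public class SeqIdDeletes {
    // per-doc delete sequence id; 0 means the doc is live
    private final int[] deletedAtSeq;
    private long nextSeq = 1;

    SeqIdDeletes(int maxDoc) {
        deletedAtSeq = new int[maxDoc];
    }

    // Record a delete and return its sequence id.
    long delete(int doc) {
        long seq = nextSeq++;
        // int[] vs short[] is the memory trade-off discussed in the issue;
        // the narrow cast is safe only until the counter wraps.
        deletedAtSeq[doc] = (int) seq;
        return seq;
    }

    // A reader opened at readerSeq ignores deletes that happened after it.
    boolean isDeleted(int doc, long readerSeq) {
        int s = deletedAtSeq[doc];
        return s != 0 && s <= readerSeq;
    }
}
```

Under this scheme the on-disk .del file can indeed stay a plain bit set, as the comment suggests: the sequence ids matter only for in-RAM point-in-time readers, and flushing collapses "deleted at or before the flush point" down to one bit per doc.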
[jira] Resolved: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2691. Resolution: Fixed
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984333#action_12984333 ] Robert Muir commented on LUCENE-2657: - +1, patch looks good. Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657-branch_3x.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public Maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository.
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin and populate your local repository with all artifacts, run from the top-level directory: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code}
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984343#action_12984343 ] Michael McCandless commented on LUCENE-2324: bq. In fact, with shared doc-store gone, what's holding up non-sequential merging? Nothing really! We could/should go do it right now... I think it should be trivial. Then, we should fixup our default MP to behave more like BSMP!! Immense segments are merged only pair wise, and no inadvertent optimizing... I think the buffered deletes will work fine for non-sequential merging -- we'd do the same coalescing we do now, only applying deletes on-demand to the to-be-merged segs, etc. We just have to make sure the merged segment is appended to the end of the index (well, what was the end as of when the merge kicked off); this way I think we can continue w/ the invariant that buffered deletes apply to all segments to their left? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. 
This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
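The invariant Mike describes above (merged segments appended where the index ended when the merge kicked off, so buffered deletes still apply to everything "to their left") can be sketched as a toy model. This is illustrative only, not Lucene code; all names here are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of non-sequential merging: the merged segments are removed
// from their original positions and the merged result is appended at the
// end, so any buffered delete still applies to every segment to its left.
public class NonSequentialMerge {

  // Replace the segments at the given indexes with one merged segment
  // appended at the end; untouched segments keep their relative order.
  static List<String> merge(List<String> segments, List<Integer> toMerge) {
    List<String> result = new ArrayList<>();
    StringBuilder merged = new StringBuilder();
    for (int i = 0; i < segments.size(); i++) {
      if (toMerge.contains(i)) {
        merged.append(segments.get(i)); // fold into the merged segment
      } else {
        result.add(segments.get(i));    // untouched segments keep order
      }
    }
    result.add(merged.toString());      // appended at the (old) end
    return result;
  }

  public static void main(String[] args) {
    List<String> segs = List.of("s0", "s1", "s2", "s3");
    // Merge the non-adjacent pair s0 and s2.
    List<String> after = merge(segs, List.of(0, 2));
    System.out.println(after); // [s1, s3, s0s2]
    assert after.get(after.size() - 1).equals("s0s2");
  }
}
```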
[jira] Created: (LUCENE-2877) BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer
BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer -- Key: LUCENE-2877 URL: https://issues.apache.org/jira/browse/LUCENE-2877 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.0.2 Environment: Windows 7 64bits, Eclipse Helios Reporter: Renan Pedro Terra de Oliveira Priority: Critical Fix For: 3.0.4 One weird bug with this field is that instead of "false", you have to search for "falsee" to get the correct results. The same behavior happens with other fields that are stored in the index and not analyzed. Example of creating fields for indexing: Field field = new Field("situacaoDocumento", "ATIVO", Field.Store.YES, Field.Index.NOT_ANALYZED); or Field field = new Field("copia", "false", Field.Store.YES, Field.Index.NOT_ANALYZED); Example of the search I need to do, but which returns no correct result: IndexSearcher searcher = ...; TopScoreDocCollector collector = ...; Query query = new TermQuery(new Term("copia", "false")); searcher.search(query, collector); ScoreDoc[] hits = collector.topDocs().scoreDocs; if (hits.length > 0) { return searcher.doc(0); } return null; Example of the search that does work: IndexSearcher searcher = ...; TopScoreDocCollector collector = ...; Query query = new TermQuery(new Term("copia", "falsee")); searcher.search(query, collector); ScoreDoc[] hits = collector.topDocs().scoreDocs; if (hits.length > 0) { return searcher.doc(0); } return null; I tested with Luke (Lucene Index Toolbox) and it showed the same behavior.
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984350#action_12984350 ] Michael McCandless commented on LUCENE-2558: We could also [someday] move deletes to a stacked model... where we only write deltas (newly deleted docs in the current session) against the segment, and on open we coalesce these. Merging would also periodically coalesce and write a new full vector... Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update document methods, we will enable IndexReader deleted docs over a sequence id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
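The sequence-id idea in LUCENE-2558 can be sketched with a plain int[] first, as the issue suggests. This is a hypothetical illustration of the visibility rule, not the realtime branch's actual code; the class and method names are made up:

```java
// Illustrative sketch: each document records the sequence id at which it
// was deleted (0 = live), and a reader opened at sequence id S sees a doc
// as deleted iff 0 < deleteSeq[doc] <= S. The issue proposes starting with
// int[] and possibly moving to a wrapping short[] to halve the RAM.
public class SeqIdDeletes {
  private final int[] deleteSeq;

  public SeqIdDeletes(int maxDoc) {
    deleteSeq = new int[maxDoc]; // all zero = nothing deleted yet
  }

  public void delete(int docId, int seqId) {
    deleteSeq[docId] = seqId;
  }

  // Visibility check for a reader snapshotted at readerSeq.
  public boolean isDeleted(int docId, int readerSeq) {
    int s = deleteSeq[docId];
    return s != 0 && s <= readerSeq;
  }

  public static void main(String[] args) {
    SeqIdDeletes d = new SeqIdDeletes(4);
    d.delete(1, 5);
    assert !d.isDeleted(1, 4); // reader opened before the delete
    assert d.isDeleted(1, 5);  // reader opened at/after the delete
    assert !d.isDeleted(0, 9); // never deleted
  }
}
```

Because readers only compare against their own snapshot, no per-reader clone of the array is needed, which is the motivation for this over a cloned BitVector.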
[jira] Commented: (LUCENE-2877) BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984353#action_12984353 ] Robert Muir commented on LUCENE-2877: - Hello, I think the issue is that you are using Field.Index.NOT_ANALYZED. This means the BrazilianAnalyzer is not actually analyzing your text at index time, causing the confusion. BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer -- Key: LUCENE-2877 URL: https://issues.apache.org/jira/browse/LUCENE-2877 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.0.2 Environment: Windows 7 64bits, Eclipse Helios Reporter: Renan Pedro Terra de Oliveira Priority: Critical Fix For: 3.0.4 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2325) Allow tagging and exclusion of main query for faceting
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2325. Resolution: Fixed Allow tagging and exclusion of main query for faceting -- Key: SOLR-2325 URL: https://issues.apache.org/jira/browse/SOLR-2325 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Assignee: Yonik Seeley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2325.patch Example of a faceting request that produces a NPE because tagging/excluding the main query is not supported. http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). -Yonik http://www.lucidimagination.com
RE: Odd Boolean scoring behavior?
I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work. Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result. 0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 
0.0069078947 = coord(21/3040) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) It looks like the PRODUCT_OF and SUM_OF, which represents the Boolean logic, does not actually apply boost? Karl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley Sent: Thursday, January 20, 2011 2:36 PM To: dev@lucene.apache.org Subject: Re: Odd Boolean scoring behavior? On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). 
-Yonik http://www.lucidimagination.com
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984373#action_12984373 ] Michael McCandless commented on LUCENE-2871: OK -- I was able to index 10M docs w/ the new patch. And search results are identical. But the indexing time on trunk vs the patch were nearly identical -- 536.80 sec (trunk) and 536.06 sec (w/ patch). But, this is on a fast machine, lots of RAM (so writes go straight to buffer cache) and an SSD, using 6 indexing threads. Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2871.patch, LUCENE-2871.patch Explore using FileChannel in FSDirectory to see if it improves write operations performance
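For readers unfamiliar with the API being benchmarked above, a minimal FileChannel write path in plain NIO looks like the following. This is a generic sketch, not the LUCENE-2871 patch itself:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal FileChannel-based write path (plain java.nio, not Lucene's
// Directory code): wrap the bytes in a ByteBuffer and drain it fully,
// since a single write() call may write only part of the buffer.
public class ChannelWrite {
  static void write(Path path, byte[] data) throws IOException {
    try (FileChannel ch = FileChannel.open(path,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.wrap(data);
      while (buf.hasRemaining()) {
        ch.write(buf); // loop: write() is not guaranteed to drain the buffer
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("chan", ".bin");
    write(p, "hello".getBytes());
    assert new String(Files.readAllBytes(p)).equals("hello");
    Files.delete(p);
  }
}
```

With a warm buffer cache and an SSD, as in Mike's test, both this path and a RandomAccessFile-backed one mostly measure memory copies, which is consistent with the near-identical timings reported.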
[jira] Updated: (LUCENE-2856) Create IndexWriter event listener, specifically for merges
[ https://issues.apache.org/jira/browse/LUCENE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2856: - Attachment: LUCENE-2856.patch Here's another iteration. Changed the name to IndexEventListener. Added experimental to the Javadocs, and I probably need to add more. There are some nocommits still, eg, for the reason a flush kicked off. Reader events should be in a different issue as reader pool is moving out of IW soon? All tests pass. Create IndexWriter event listener, specifically for merges -- Key: LUCENE-2856 URL: https://issues.apache.org/jira/browse/LUCENE-2856 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Jason Rutherglen Attachments: LUCENE-2856.patch, LUCENE-2856.patch, LUCENE-2856.patch, LUCENE-2856.patch The issue will allow users to monitor merges occurring within IndexWriter using a callback notifier event listener. This can be used by external applications such as Solr to monitor large segment merges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984379#action_12984379 ] Jason Rutherglen commented on LUCENE-2324: -- {quote}We could/should go do it right now{quote} Nice! {quote}I think the buffered deletes will work fine for non-sequential merging - we'd do the same coalescing we do now, only applying deletes on-demand to the to-be-merged segs, etc.{quote} I think this is going to make IW deletes even more hairy and hard to understand! Though if we keep the option of using a BV for deletes then there's probably no choice. If we implemented sequence-id deletes using a short[], then we're only increasing the RAM usage by 16 times, though we then do not need to clone which can generate excessive garbage (in a high flush [N]RT enviro). Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). 
The segments can flush independently, letting us make much better concurrent use of IO CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984381#action_12984381 ] Ahsan Iqbal commented on SOLR-1604: --- Hi Ahmet, I read on the mailing lists (dated Mon, 20 Jul 2009) that you had merged the surround query parser with Solr. I tried that by downloading the jar file for the surround query parser, pasting that jar file in web-inf/lib, and configuring the query parser in solrconfig.xml as <queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/> Then the web page shows the following exception: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin Can you guide me on what I am doing wrong? Regards, Ahsan Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: Next Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[jira] Updated: (LUCENE-2720) IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close
[ https://issues.apache.org/jira/browse/LUCENE-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2720: --- Attachment: LUCENE-2720-trunk.patch Patch against trunk. I need to fix 3x to write the version and produce an index for TestBackCompat before committing this patch (even though the tests pass). IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close Key: LUCENE-2720 URL: https://issues.apache.org/jira/browse/LUCENE-2720 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2720-trunk.patch Spinoff of LUCENE-2618 and also related to the original issue LUCENE-2523... If you open IW on a too-old index, you don't find out until much later that the index is too old. This is because IW does not go and open segment readers on all segments. It only does so when it's time to apply deletes, do merges, open an NRT reader, etc. This is a serious bug because you can in fact succeed in committing with the new major version of Lucene against your too-old index, which is catastrophic because suddenly the old Lucene version will no longer open the index, and so your index becomes unusable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 3:06 PM, karl.wri...@nokia.com wrote: I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work. I see a lot of unexpected zeros - queryNorm has factors if idf and the boost in it - the fact that it's 0 suggests that you used a 0 boost. Why don't you do a toString() on your query and see if it's what you expect. -Yonik http://www.lucidimagination.com Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result. 0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) 
+othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) It looks like the PRODUCT_OF and SUM_OF, which represents the Boolean logic, does not actually apply boost? Karl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley Sent: Thursday, January 20, 2011 2:36 PM To: dev@lucene.apache.org Subject: Re: Odd Boolean scoring behavior? On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). 
-Yonik http://www.lucidimagination.com
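Yonik's diagnosis in the thread above rests on how classic Lucene (DefaultSimilarity) folds boost into scoring: queryWeight = idf * boost * queryNorm, with queryNorm = 1 / sqrt(sum of squared term weights). A small hedged sketch (plain arithmetic, not Lucene's Similarity class) shows why a 0 boost zeroes everything downstream regardless of idf:

```java
// Sketch of the classic-Lucene weight combination: with boost = 0 the
// per-term query weight collapses to 0 no matter what idf is, which
// matches the rows of "0.0 = queryNorm" / "0.0 = queryWeight" in the
// explain output quoted in the thread.
public class QueryNormDemo {
  // queryNorm as in DefaultSimilarity: 1 / sqrt(sumOfSquaredWeights).
  static double queryNorm(double sumOfSquaredWeights) {
    return 1.0 / Math.sqrt(sumOfSquaredWeights);
  }

  public static void main(String[] args) {
    double idf = 1.0;                 // from the explain: idf(...) = 1.0
    double boost = 0.0;               // the suspect zero boost
    double rawWeight = idf * boost;   // per-term raw weight
    double norm = queryNorm(1.0);     // assume the rest of the query sums to 1
    System.out.println(rawWeight * norm); // 0.0 -- the boost wiped the score
    assert rawWeight * norm == 0.0;
  }
}
```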
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984387#action_12984387 ] Robert Muir commented on LUCENE-2876: - Committed revision 1061499. Will work on adding the @deprecated and fixing javadocs and null Weights in branch_3x, but we need to provide Similarity where we were providing it before for backwards compatibility. Remove Scorer.getSimilarity() - Key: LUCENE-2876 URL: https://issues.apache.org/jira/browse/LUCENE-2876 Project: Lucene - Java Issue Type: Task Components: Query/Scoring Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2876.patch, LUCENE-2876.patch Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out here as its own issue as its really mostly unrelated. I also like it as a separate issue to apply the deprecation to branch_3x to just make less surprises/migration hassles for 4.0 users. Currently Scorer takes a confusing number of ctors, either a Similarity, or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and its not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense to be here in Scorer, its the parent that created the scorer, and used by Scorer itself to support LUCENE-2590's features. But I dont think all queries work with this feature correctly right now, because they pass null. Finally the situation gets confusing if you start to consider delegators like ScoreCachingWrapperScorer, which arent really delegating correctly so I'm unsure features like LUCENE-2590 aren't working with this. So I think we should remove the getSimilarity, if your scorer uses a Similarity its already coming to you via your ctor from your Weight and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. 
I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer. I set fix-for 3.1, not because i want to backport anything, only to mark the getSimilarity deprecated there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984389#action_12984389 ] Jason Rutherglen commented on LUCENE-2558: -- In regards to the deltas, when they're in RAM (ie, for norm and DF updates), I'm guessing we'd need to place the updates into a hash map (that hopefully uses primitives instead of objects to save RAM)? We could instantiate a new array when the map reached a certain size? Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update document methods, we will enable IndexReader deleted docs over a sequence id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
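Jason's map-then-array idea can be sketched as follows. This is a hypothetical illustration, not the realtime branch's code; a real implementation would use a primitive int-to-int map to avoid boxing, but HashMap keeps the sketch short:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of buffering sparse per-doc updates in a map and spilling them
// into a dense array once the map grows past a threshold. HashMap boxes
// its keys/values, so this is for illustration only.
public class DeltaBuffer {
  private final Map<Integer, Integer> deltas = new HashMap<>();
  private final int[] dense;
  private final int spillThreshold;

  public DeltaBuffer(int maxDoc, int spillThreshold) {
    this.dense = new int[maxDoc];
    this.spillThreshold = spillThreshold;
  }

  public void update(int docId, int value) {
    deltas.put(docId, value);
    if (deltas.size() >= spillThreshold) spill();
  }

  // Coalesce pending deltas into the dense array and clear the map.
  private void spill() {
    for (Map.Entry<Integer, Integer> e : deltas.entrySet()) {
      dense[e.getKey()] = e.getValue();
    }
    deltas.clear();
  }

  // Reads check the pending map first, then the dense array.
  public int get(int docId) {
    Integer pending = deltas.get(docId);
    return pending != null ? pending : dense[docId];
  }

  public static void main(String[] args) {
    DeltaBuffer b = new DeltaBuffer(8, 3);
    b.update(1, 10);
    b.update(2, 20);
    assert b.get(1) == 10; // still pending in the map
    b.update(3, 30);       // hits the threshold, triggers the spill
    assert b.get(2) == 20; // now served from the dense array
  }
}
```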
[jira] Updated: (LUCENE-2482) Index sorter
[ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Grande updated LUCENE-2482: Attachment: LUCENE-2482-4.0.patch Hi! I'm attaching a patch with an implementation of this feature for Lucene 4.0. I'm not sure if the style is right because I can't download the codestyle.xml file for Eclipse. Index sorter Key: LUCENE-2482 URL: https://issues.apache.org/jira/browse/LUCENE-2482 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.1, 4.0 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 3.1, 4.0 Attachments: indexSorter.patch, LUCENE-2482-4.0.patch A tool to sort index according to a float document weight. Documents with high weight are given low document numbers, which means that they will be first evaluated. When using a strategy of early termination of queries (see TimeLimitedCollector) such sorting significantly improves the quality of partial results. (Originally this tool was created by Doug Cutting in Nutch, and used norms as document weights - thus the ordering was limited by the limited resolution of norms. This is a pure Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
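The reordering at the heart of the index sorter described above can be sketched in a few lines: given one float weight per document, compute the old-to-new doc-number mapping so high-weight docs get the lowest new doc numbers and are therefore visited first under early termination. This is not the attached patch's code, just the core permutation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the index sorter's doc renumbering: sort doc ids
// by descending weight, then invert the order into an old->new mapping.
public class SorterSketch {
  static int[] oldToNew(float[] weights) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < weights.length; i++) order.add(i);
    // Highest weight first.
    order.sort(Comparator.comparingDouble((Integer i) -> -weights[i]));
    int[] map = new int[weights.length];
    for (int newDoc = 0; newDoc < order.size(); newDoc++) {
      map[order.get(newDoc)] = newDoc; // old doc id -> new doc id
    }
    return map;
  }

  public static void main(String[] args) {
    int[] map = oldToNew(new float[] {0.1f, 0.9f, 0.5f});
    // Highest weight (old doc 1) becomes new doc 0.
    assert map[1] == 0 && map[2] == 1 && map[0] == 2;
  }
}
```

The actual tool then rewrites postings, stored fields, and norms through this mapping so the on-disk index reflects the new order.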
[jira] Created: (SOLR-2326) Replication command indexversion fails to return index version
Replication command indexversion fails to return index version -- Key: SOLR-2326 URL: https://issues.apache.org/jira/browse/SOLR-2326 Project: Solr Issue Type: Bug Components: replication (java) Environment: Branch 3x latest Reporter: Eric Pugh Fix For: 3.1 To test this, I took the /example/multicore/core0 solrconfig and added a simple replication handler: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="replicateAfter">commit</str> <str name="replicateAfter">startup</str> <str name="confFiles">schema.xml</str> </lst> </requestHandler> When I query the handler for details I get back the indexVersion that I expect: http://localhost:8983/solr/core0/replication?command=details&wt=json&indent=true But when I ask for just the indexVersion I get back a 0, which prevents the slaves from pulling updates: http://localhost:8983/solr/core0/replication?command=indexversion&wt=json&indent=true
RE: Odd Boolean scoring behavior?
The original query is fine, and has the boost as expected: ((+language:eng +( CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286) ... CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 +value_7:hillmonument~0.8332333)^0.85714287) CutoffQueryWrapper((+value_7:bunker~0.8332333 +othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0) ( CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667) ... )) The rewritten query is odd. Here's a sample: ((+language:eng +( CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) ... 
CutoffQueryWrapper((+() +(()^0.556))^0.85714287)))^3.0) ( CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) ... CutoffQueryWrapper((+() +(()^0.556))^0.85714287) CutoffQueryWrapper(+() +(()^0.667)) CutoffQueryWrapper((+() +(()^0.667))^0.85714287) CutoffQueryWrapper((+() +(()^0.556))^0.85714287) )

As you can see, there are a lot of repeats and a lot of blank matches, but the original boost *is* still there. I really can't interpret this any further - the many blank and repeated matches seem wrong to me, but the scorer explanation seems even more wrong. Any ideas?

Karl

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 3:34 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 3:06 PM, karl.wri...@nokia.com wrote: I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the boost in it - the fact that it's 0 suggests that you used a 0 boost. Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com

Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result.
0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
Lucene-3.x - Build # 249 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/249/ All tests passed Build Log (for compile errors): [...truncated 21087 lines...]
RE: Odd Boolean scoring behavior?
Found the cause of the zero querynorms, and fixed it. But the results are still not as I would expect. The first result has language=ger but scores higher than the second result which has language=eng. And yet, my query is boosting like this: Boolean OR Boolean (boost = 100.0) AND (language:eng) AND (stuff) OR (stuff) ... where (stuff) is the same stuff in both cases. Here's the scoring for two results, the first one out of language, and the second one in language: 0.018082526 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@6cdcb5eb sum of: 0.018059647 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@e2b8f23 sum of: 0.015771711 = (MATCH) weight(language:eng in 52867945), product of: 0.015771711 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.015771711 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0022879362 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 product of: 0.331206 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 sum of: 0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.667), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(othervalue_5:monument othervalue_5:monumento^7.9949305E-4 othervalue_5:monuments^7.9949305E-4))^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+value_5:hill +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.667), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+othervalue_5:hill +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4
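The zero-queryNorm diagnosis in this thread follows directly from how classic TF-IDF query weighting works: each clause's raw weight is idf times boost, and queryNorm divides by the length of that weight vector, so a zero boost zeroes every queryWeight. Here is a toy sketch of that arithmetic (a simplified model for illustration, not Lucene's actual DefaultSimilarity code):

```python
import math

def query_weights(idf_boost_pairs):
    """Simplified classic-TF-IDF query weighting.

    Each clause's raw weight is idf * boost; queryNorm is
    1/sqrt(sum of squared raw weights); the normalized
    queryWeight of a clause is raw_weight * queryNorm.
    """
    raw = [idf * boost for idf, boost in idf_boost_pairs]
    sum_sq = sum(w * w for w in raw)
    # Guard the degenerate all-zero case instead of dividing by zero.
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 0.0
    return [w * query_norm for w in raw]

# Normal boosts: non-zero weights, normalized to unit length.
print(query_weights([(1.0, 1.0), (1.0, 0.5714286)]))

# Zero boosts: every queryWeight collapses to 0, matching the
# all-zero explain output quoted above.
print(query_weights([(1.0, 0.0), (1.0, 0.0)]))
```

The second call mirrors the symptom in the thread: with the boost factored out to zero, every `queryWeight(...)` line in the explain is 0 regardless of idf or fieldNorm.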
Lucene-trunk - Build # 1433 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1433/ All tests passed Build Log (for compile errors): [...truncated 16653 lines...]
[jira] Resolved: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved LUCENE-2657. - Resolution: Fixed Committed to trunk rev. 1061613, branch_3x rev. 1061612 Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657-branch_3x.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory:

{code}
mvn -N -Pbootstrap install
{code}

Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top-level directory, run:

{code}
mvn install
{code}

When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it.

To create all the artifacts without running tests:

{code}
mvn -DskipTests install
{code}

I almost always include the {{clean}} phase when I do a build, e.g.:

{code}
mvn -DskipTests clean install
{code}
[jira] Resolved: (SOLR-1218) maven artifact for webapp
[ https://issues.apache.org/jira/browse/SOLR-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-1218. --- Resolution: Fixed Fix Version/s: 4.0 3.1 Assignee: Steven Rowe Maven artifact for Solr webapp is now generated (fixed in LUCENE-2657). maven artifact for webapp - Key: SOLR-1218 URL: https://issues.apache.org/jira/browse/SOLR-1218 Project: Solr Issue Type: New Feature Affects Versions: 1.3 Reporter: Benson Margulies Assignee: Steven Rowe Fix For: 3.1, 4.0 It would be convenient to have a <packaging>war</packaging> maven project for the webapp, to allow launching solr from maven via jetty.
[jira] Resolved: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2876. - Resolution: Fixed backported to 3.x: revision 1061615. Remove Scorer.getSimilarity() - Key: LUCENE-2876 URL: https://issues.apache.org/jira/browse/LUCENE-2876 Project: Lucene - Java Issue Type: Task Components: Query/Scoring Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2876.patch, LUCENE-2876.patch Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out here as its own issue, since it's really mostly unrelated. I also like it as a separate issue so the deprecation can be applied to branch_3x, to make fewer surprises/migration hassles for 4.0 users. Currently Scorer takes a confusing number of ctors: either a Similarity, or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and it's not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense to be here in Scorer: it's the parent that created the scorer, and is used by Scorer itself to support LUCENE-2590's features. But I don't think all queries work with this feature correctly right now, because they pass null. Finally, the situation gets confusing if you start to consider delegators like ScoreCachingWrapperScorer, which aren't really delegating correctly, so I'm unsure whether features like LUCENE-2590 are working with this. So I think we should remove getSimilarity(): if your scorer uses a Similarity, it's already coming to you via your ctor from your Weight, and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer.
I set fix-for 3.1, not because I want to backport anything, only to mark getSimilarity deprecated there.
[jira] Resolved: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2824. - Resolution: Fixed Committed revisions 1061619, 1061622 optimizations for bufferedindexinput Key: LUCENE-2824 URL: https://issues.apache.org/jira/browse/LUCENE-2824 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2824.patch Along the same lines as LUCENE-2816:

* the readVInt/readVLong/readShort/readInt/readLong methods are not optimal here, since they defer to readByte. For example, this means checking the buffer's bounds per byte in readVInt instead of per vInt.
* it's an easy win to speed this up, even for the vInt case: it's essentially always faster. The only slower case is 1024 single-byte vInts in a row; in that case we would do a single extra bounds check (1025 instead of 1024).
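The optimization described in LUCENE-2824 is about hoisting the bounds check: instead of re-checking the buffer limit on every byte inside readVInt, check once that a maximum-length vInt fits in the buffer, then decode with no per-byte checks. A small sketch of the idea (toy code illustrating the technique, not the actual BufferedIndexInput patch; helper names are made up):

```python
def write_vint(buf, value):
    """Append a variable-length int (7 data bits per byte, high bit
    set on all but the last byte), as Lucene's vInt format does."""
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)

def read_vint_per_byte(buf, pos):
    """Baseline: every byte goes through a helper that re-checks the
    buffer bounds (the defer-to-readByte pattern the issue describes)."""
    def read_byte(p):
        if p >= len(buf):           # bounds check repeated per byte
            raise EOFError("buffer refill needed")
        return buf[p]
    b = read_byte(pos); pos += 1
    value, shift = b & 0x7F, 7
    while b & 0x80:
        b = read_byte(pos); pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
    return value, pos

def read_vint_hoisted(buf, pos):
    """Optimized: one up-front check that a max-length vInt (5 bytes
    for 32 bits) fits, then decode with no per-byte checks."""
    if pos + 5 > len(buf):          # single bounds check per vInt
        return read_vint_per_byte(buf, pos)   # slow path near the end
    b = buf[pos]; pos += 1
    value, shift = b & 0x7F, 7
    while b & 0x80:
        b = buf[pos]; pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
    return value, pos

buf = bytearray()
for v in (0, 1, 127, 128, 16384, 2**31 - 1):
    write_vint(buf, v)
pos, out = 0, []
while pos < len(buf):
    v, pos = read_vint_hoisted(buf, pos)
    out.append(v)
print(out)  # round-trips the encoded values
```

This matches the issue's cost analysis: the fast path does one bounds check per vInt rather than one per byte, and only falls back to per-byte checking within the last few bytes of the buffer.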
[jira] Commented: (LUCENE-2774) ant generate-maven-artifacts target broken for contrib
[ https://issues.apache.org/jira/browse/LUCENE-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984583#action_12984583 ] Steven Rowe commented on LUCENE-2774: - I tested {{ant clean generate-maven-artifacts}} on branch_3x with {{maven-ant-tasks-2.1.1.jar}} and both Ant 1.7.1 and 1.8.1. Everything works. I'll test more combinations tomorrow. ant generate-maven-artifacts target broken for contrib -- Key: LUCENE-2774 URL: https://issues.apache.org/jira/browse/LUCENE-2774 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: 3.1, 4.0 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Attachments: LUCENE-2774.patch When executing 'ant generate-maven-artifacts' from a pristine checkout of branch_3x/lucene or trunk/lucene the following error is encountered:

{code}
dist-maven:
    [copy] Copying 1 file to /home/drew/lucene/branch_3x/lucene/build/contrib/analyzers/common
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2:runtime
[artifact:pom] An error has occurred while processing the Maven artifact tasks.
[artifact:pom]  Diagnosis:
[artifact:pom]
[artifact:pom] Unable to initialize POM pom.xml.template: Cannot find parent: org.apache.lucene:lucene-contrib for project: org.apache.lucene:lucene-analyzers:jar:3.1-SNAPSHOT for project org.apache.lucene:lucene-analyzers:jar:3.1-SNAPSHOT
[artifact:pom] Unable to download the artifact from any repository
{code}

The contrib portion of the ant build is executed in a subant task which does not pick up the pom definitions for lucene-parent and lucene-contrib from the main build.xml, so the lucene-parent and lucene-contrib poms must be loaded explicitly as part of the contrib build using the artifact:pom task.
Solr-3.x - Build # 233 - Still Failing
Build: https://hudson.apache.org/hudson/job/Solr-3.x/233/ No tests ran. Build Log (for compile errors): [...truncated 15582 lines...]
Lucene-Solr-tests-only-3.x - Build # 3954 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3954/ 1 tests failed.

REGRESSION: org.apache.solr.update.AutoCommitTest.testMaxTime

Error Message: should not be there yet query failed XPath: //result[@numFound=0]
xml response was:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">682</int></lst><result name="response" numFound="1" start="0"><doc><int name="id">500</int><int name="intDefault">42</int><arr name="multiDefault"><str>muLti-Default</str></arr><arr name="range_facet_l"><long>500</long></arr><arr name="range_facet_si"><int>500</int></arr><arr name="range_facet_sl"><long>500</long></arr><date name="timestamp">2011-01-21T06:54:36.295Z</date></doc></result>
</response>
request was: start=0&q=id:500&qt=standard&rows=20&version=2.2

Stack Trace: junit.framework.AssertionFailedError: should not be there yet query failed XPath: //result[@numFound=0] [same xml response and request as above] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1007) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:939) at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:246) at org.apache.solr.update.AutoCommitTest.testMaxTime(AutoCommitTest.java:206)

Build Log (for compile errors): [...truncated 9814 lines...]
[jira] Commented: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984589#action_12984589 ] Simon Willnauer commented on LUCENE-2872: - WOW nice mike! do you have benchmark numbers here by any chance? After all those improvements - FST, TermState, BlockCoded TermDict etc. I wonder if we reached the 10k% in the 3.0 vs. 4.0 united~2.0 benchmark... Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch, LUCENE-2872.patch, LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, i.e. its bytes, metadata, details for postings (frq/prox file pointers), etc. But this is costly when something wants to visit many terms but pull metadata for only a few (e.g. respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just the terms.
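The column-stride idea in LUCENE-2872 can be pictured with a toy block layout: prefix-coded term bytes in one column, postings metadata (freq/prox file pointers) in separate columns, so enumerating terms never touches the metadata. A minimal sketch of that layout (an illustrative toy format with made-up helper names, not the actual patch's encoding):

```python
def encode_block(entries):
    """Column-stride block for a terms dict: terms are prefix-coded in
    one column; freq/prox pointers live in their own columns."""
    terms_col, prev = [], b""
    for term, _, _ in entries:
        # Store (shared-prefix length with previous term, suffix bytes).
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        terms_col.append((shared, term[shared:]))
        prev = term
    return {
        "terms": terms_col,
        "freq_ptrs": [f for _, f, _ in entries],
        "prox_ptrs": [p for _, _, p in entries],
    }

def enum_terms(block):
    """Reconstruct all terms by scanning only the term column; the
    metadata columns are never read."""
    out, prev = [], b""
    for shared, suffix in block["terms"]:
        term = prev[:shared] + suffix
        out.append(term)
        prev = term
    return out

def metadata_for(block, i):
    """Fetch metadata by index only for the few terms actually needed."""
    return block["freq_ptrs"][i], block["prox_ptrs"][i]

block = encode_block([(b"bunker", 10, 20), (b"bunkers", 11, 21), (b"hill", 30, 40)])
print(enum_terms(block))       # terms round-trip from the prefix-coded column
print(metadata_for(block, 2))  # (30, 40)
```

This captures the cost argument in the issue: per-term interleaved encoding forces a terms enum to skip over metadata it doesn't want, while the column layout makes "visit many terms, pull metadata for few" cheap.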
[jira] Updated: (LUCENE-2720) IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close
[ https://issues.apache.org/jira/browse/LUCENE-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2720: --- Attachment: LUCENE-2720-3x.patch Patch against 3x:

* Adds FieldsReader.detectCodeVersion - returns 2.x for pre-3.0 indexes and 3.0 for 3.0 indexes. Not called for 3.1+ segments.
* SegmentInfo records its code version (Constants.LUCENE_MAIN_VERSION).
* SegmentInfos bumps up the format number and upgrades old segments (2.x or 3.0) to record their version too.

I'll update the trunk patch to reflect those changes (i.e., now indexes touched by 3.1+ code will have their segments recording their version, whether they are pre-3.0 or not). IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close Key: LUCENE-2720 URL: https://issues.apache.org/jira/browse/LUCENE-2720 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2720-3x.patch, LUCENE-2720-trunk.patch Spinoff of LUCENE-2618 and also related to the original issue LUCENE-2523... If you open IW on a too-old index, you don't find out until much later that the index is too old. This is because IW does not go and open segment readers on all segments. It only does so when it's time to apply deletes, do merges, open an NRT reader, etc. This is a serious bug because you can in fact succeed in committing with the new major version of Lucene against your too-old index, which is catastrophic because suddenly the old Lucene version will no longer open the index, and so your index becomes unusable.