date:20110505


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029150#comment-13029150
 ] 

Doron Cohen commented on LUCENE-3068:
-

This is more complex than I originally thought.

# QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) 
query positions is a multi-term.
# MPQ has an implicit OR behavior - it is used e.g. for wildcarding a phrase 
query.
# The PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
# A PQ with several terms in the same position cannot be created by parsing with 
QP, only manually.
  Created manually, it has AND semantics: only docs with ALL the 
terms in pos N should match.
  In other words, assume the terms and positions of doc D are: 
  a:0 b:1 c:1 d:2
  MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics)
  PQ for (a,b):0 d:1 should not match D, because D does not contain both 'a' and 
'b' in the same position (AND semantics).


Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the 
AND logic assumed when the PQ was created this way with the OR logic assumed by 
MPQ. 

{code:title=TestPositionIncrement.testSetPosition has a test for exactly this case}
// phrase query should fail for a non-existing searched term 
// even if other searched terms exist in the same searched position. 
q = new PhraseQuery();
q.add(new Term("field", "3"), 0);
q.add(new Term("field", "9"), 0);
hits = searcher.search(q, null, 1000).scoreDocs;
assertEquals(0, hits.length);
{code}
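
The difference between the two semantics can be sketched without Lucene at all. The toy model below is only an illustration under stated assumptions: the class and method names (SamePositionSemantics, matches, docD) are hypothetical, it handles exact phrases only (no slop), and it is not how Lucene's scorers are implemented. It uses doc D from the comment above (a:0 b:1 c:1 d:2).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of exact phrase matching with several terms per position; not Lucene code. */
final class SamePositionSemantics {

  /** Doc D from the comment: a:0 b:1 c:1 d:2, stored as position -> terms. */
  static Map<Integer, Set<String>> docD() {
    Map<Integer, Set<String>> doc = new HashMap<>();
    doc.put(0, new HashSet<>(Arrays.asList("a")));
    doc.put(1, new HashSet<>(Arrays.asList("b", "c")));
    doc.put(2, new HashSet<>(Arrays.asList("d")));
    return doc;
  }

  /**
   * query.get(i) holds the terms requested at query position i.
   * requireAll = true  models AND semantics (PQ with several terms in one position).
   * requireAll = false models OR semantics (MPQ).
   */
  static boolean matches(Map<Integer, Set<String>> doc,
                         List<List<String>> query,
                         boolean requireAll) {
    for (int start : doc.keySet()) {
      boolean ok = true;
      for (int i = 0; i < query.size() && ok; i++) {
        Set<String> atPos = doc.get(start + i);
        if (atPos == null) {
          ok = false;
        } else if (requireAll) {
          ok = atPos.containsAll(query.get(i));
        } else {
          ok = false;
          for (String term : query.get(i)) {
            if (atPos.contains(term)) {
              ok = true;
              break;
            }
          }
        }
      }
      if (ok) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // The query (a,b):0 d:1 from the comment.
    List<List<String>> q = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("d"));
    System.out.println("OR  (MPQ-like): " + matches(docD(), q, false)); // true, via b:1 d:2
    System.out.println("AND (PQ-like):  " + matches(docD(), q, true));  // false
  }
}
```

With AND semantics the same document does match (b,c):0 d:1, since both 'b' and 'c' sit at position 1.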

Although QP by default will not create this PQ, I think we need to support it, 
for applications needing to be strict with the search results, with slop. 

So fixing this would need to take place inside SloppyScorer, digging further...

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3073) make compoundfilewriter public


[ 
https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029205#comment-13029205
 ] 

Simon Willnauer commented on LUCENE-3073:
-

+1 - patch looks good

 make compoundfilewriter public
 --

 Key: LUCENE-3073
 URL: https://issues.apache.org/jira/browse/LUCENE-3073
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-3073.patch


 CompoundFileReader is public, but CompoundFileWriter is not.
 I propose we make it public + @lucene.internal instead (just in case someone 
 else finds themselves wanting to manipulate cfs files)


Re: I was accepted in GSoC!!!

2011-05-05 Thread Earwin Burrfoot

By the way, guys. LuSolr SVN repository is mirrored @
git://git.apache.org/lucene-solr.git , which is in turn mirrored @
https://github.com/apache/lucene-solr .
Working with git (maybe with stgit) is easier than juggling patches by hand.

On Wed, May 4, 2011 at 15:00, David Nemeskey nemeskey.da...@sztaki.hu wrote:
 Hi Uwe,

 do you mean one issue per GSoC proposal, or one for every logical unit in
 the project?

 If the second: Robert told me to use the flexscoring branch as a base for my
 project, since preliminary work has already been done in that branch. Should I
 open JIRA issues nevertheless?

 Thanks,
 David

 On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote:
 Hi Vinicius,

 Submitting patches via JIRA is fine! We were just thinking about possibly
 providing some SVN to work with (as additional training), but came to the
 conclusion, that all students should go the standard Apache Lucene way of
 submitting patches to JIRA issues. You can of course still use SVN / GIT
 locally to organize your code. At the end we just need a patch to be
 committed by one of the core committers.

Uwe


-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785


[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Attached patch fixes this bug by excluding from the repeats check those PPs 
that originated from the same offset in the query. 

This allows more strict phrase queries: strict on terms in same position (AND 
logic) but still sloppy.

All tests pass, this is ready to go in (unless there are reservations).
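
As a rough illustration of the idea (not the patch itself), the sketch below models a repeats check that skips PPs sharing the same query offset. Everything here is an assumption made for illustration: the PP class is a simplified stand-in for a phrase position, and the method name conflict and the rule for choosing which PP to return are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a phrase position; not the real Lucene class. */
final class PP {
  final String term;
  final int offset;   // position of the term inside the query
  final int position; // doc token position minus query offset

  PP(String term, int offset, int docPos) {
    this.term = term;
    this.offset = offset;
    this.position = docPos - offset;
  }

  /** The doc token position this PP currently sits on. */
  int docPos() {
    return position + offset;
  }
}

final class RepeatsCheckSketch {

  /**
   * Returns a PP that occupies the same doc token as pp, or null if there is none.
   * PPs that come from the same query offset are skipped: sharing a doc position
   * is exactly what such a query asks for, so it is not a conflict.
   */
  static PP conflict(PP pp, List<PP> repeats) {
    for (PP other : repeats) {
      if (other == pp) {
        continue;
      }
      if (other.offset == pp.offset) {
        continue; // same query position: excluded from the repeats check
      }
      if (other.docPos() == pp.docPos()) {
        return pp.offset > other.offset ? pp : other; // hypothetical tie-break
      }
    }
    return null;
  }

  public static void main(String[] args) {
    PP a = new PP("a", 0, 5);
    PP b = new PP("b", 0, 5);  // same query offset, same doc position
    PP a2 = new PP("a", 1, 5); // later query offset landing on the same doc token
    System.out.println(conflict(a, Arrays.asList(a, b)) == null); // true
    System.out.println(conflict(a, Arrays.asList(a, a2)) == a2);  // true
  }
}
```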

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.


Re: modularization discussion

Hey folks

On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Isn't our end goal here a bunch of well factored search modules?  Ie,
 fast forward a year or two and I think we should have modules like
 these:

I think we have two camps here (10k feet view):

1. wants to move towards modularization and might support all the modules
Mike has listed below
2. wants to stick with Solr's current architecture and remain
monolithic (not meant negatively in this case) as much as possible

I think we can meet somewhere in between and agree on certain modules
that should be available to Lucene users as well. The ones I have in
mind are primary search features like:
- Faceting
- Highlighting
- Suggest
- Function Query (consolidation is needed here!)
- Analyzer factories

Things like distribution and replication should remain in Solr IMO but
might be moved to a more extensible API so that people can add their
own implementation. I am thinking about things like the ZooKeeper
support, which might not be a good solution for everybody, e.g. where folks
already have JGroups infrastructure. So I think we can work towards 2
distinct goals:
1. extract common search features into modules
2. refactor Solr to be more elastic / distributed and extensible
with respect to those goals.

Maybe we can get agreement on such a basis.

Let me know what you think

simon

  * Faceting

  * Highlighting

  * Suggest (good patch is on LUCENE-2995)

  * Schema

  * Query impls

  * Query parsers

  * Analyzers (good progress here already, thanks Robert!),
    incl. factories/XML configuration (still need this)

  * Database import (DIH)

  * Web app

  * Distribution/replication

  * Doc set representations

  * Collapse/grouping

  * Caches

  * Similarity/scoring impls (BM25, etc.)

  * Codecs

  * Joins

  * Lucene core

 In this future, much of this code came from what is now Solr and
 Lucene, but we should freely and aggressively poach from other
 projects when appropriate (and license/provenance is OK).

 I keep seeing all these cool compressed int set projects popping
 up... surely these are useful for us.  Solr poached a doc set impl
 from Nutch; probably there's other stuff to poach from Nutch, Mahout,
 etc.

 Katta's doing something sweet with distribution/replication; let's
 poach & merge w/ Solr's approach.  There are various facet impls out
 there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
 with Solr's.

 Elastic Search has lots of cool stuff, too, under ASL2.

 All these external open-source projects are fair game for poaching and
 refactoring into shared modules, along with what is now Solr and
 Lucene sources.

 In this ideal future, Solr becomes the bundling and default/example
 configuration of the Web App and other modules, much like how the
 various Linux distros bundle different stuff together around the Linux
 kernel.  And if you are an advanced app and don't need the webapp
 part, you can cherry pick the super duper modules you do need and
 directly embed them into your app.

 Isn't this the future we are working towards?

 Mike

 http://blog.mikemccandless.com


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029229#comment-13029229
 ] 

Shai Erera commented on LUCENE-3068:


Patch looks good to me.

One comment about the test - perhaps use the LTC methods that do random tests, 
like newDirectory(), newIndexWriterConfig() etc.? If you don't think it's 
appropriate for this test, that's ok with me.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.


[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec


[ 
https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029233#comment-13029233
 ] 

Simon Willnauer commented on LUCENE-3070:
-

Robert, the patch looks great!

Some comments:
 * the simpletext nocommit should be a TODO IMO
 * for the preflex problem I think we need to add some infrastructure for 
testing 4.0-only features; I will think about this
 * one problem we have here is that our current implementation is 
somewhat wasteful. Currently on flush we pull a FieldsConsumer for every codec 
used in the indexing session (per DWPT), regardless of whether the field is indexed, so 
we create some unneeded files if you use a field for docvalues only. 
The other thing is that we need to somehow reset the FieldInfo#hasDocValues 
flag when we hit a non-aborting exception during indexing 
before we can actually create the corresponding consumer. That is something we 
should address in a spin-off issue, I think.

Overall I think you should commit the current state and we work from here!


 Enable DocValues by default for every Codec
 ---

 Key: LUCENE-3070
 URL: https://issues.apache.org/jira/browse/LUCENE-3070
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Affects Versions: CSF branch
Reporter: Simon Willnauer
 Fix For: CSF branch

 Attachments: LUCENE-3070.patch


 Currently DocValues are enabled with a wrapper Codec, so each codec which needs 
 DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader 
 should be moved to Codec to be enabled by default.


[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec


[ 
https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029237#comment-13029237
 ] 

Simon Willnauer commented on LUCENE-3070:
-

One more thing: I think preflex should throw UOE instead of returning null. At 
some point we should also think about a better name for Source, something like 
InMemoryDocValues or RamResidentDocValues - something more self-explanatory.
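
A minimal sketch of the "throw UOE instead of returning null" suggestion follows. The interface and class names (DocValuesProducerSketch, PreFlexLikeProducer) are hypothetical stand-ins, not the actual types on the branch; the point is only that an explicit UnsupportedOperationException fails at the call site rather than as a later NullPointerException.

```java
/** Hypothetical stand-in for whatever hands out per-field DocValues. */
interface DocValuesProducerSketch {
  Object docValues(String field);
}

/** Models a codec that has no DocValues support at all (as preflex would). */
final class PreFlexLikeProducer implements DocValuesProducerSketch {

  public Object docValues(String field) {
    // Failing loudly here points straight at the unsupported call,
    // whereas returning null would only surface later as an NPE.
    throw new UnsupportedOperationException(
        "DocValues are not supported by this codec (field: " + field + ")");
  }

  public static void main(String[] args) {
    try {
      new PreFlexLikeProducer().docValues("price");
    } catch (UnsupportedOperationException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```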

 Enable DocValues by default for every Codec
 ---

 Key: LUCENE-3070
 URL: https://issues.apache.org/jira/browse/LUCENE-3070
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Affects Versions: CSF branch
Reporter: Simon Willnauer
 Fix For: CSF branch

 Attachments: LUCENE-3070.patch


 Currently DocValues are enabled with a wrapper Codec, so each codec which needs 
 DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader 
 should be moved to Codec to be enabled by default.


[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1064 - Failure

Build: 
https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1064/

No tests ran.

Build Log (for compile errors):
[...truncated 63 lines...]
+ cd 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant clean
Buildfile: build.xml

clean:

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build

clean:

clean:
 [echo] Building analyzers-common...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/common
 [echo] Building analyzers-icu...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/icu
 [echo] Building analyzers-phonetic...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/phonetic
 [echo] Building analyzers-smartcn...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/smartcn
 [echo] Building analyzers-stempel...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/stempel
 [echo] Building benchmark...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/benchmark/build

clean-contrib:

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/analysis-extras/build
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/analysis-extras/lucene-libs

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/clustering/build

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/dataimporthandler/target

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/extraction/build

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/uima/build

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/build

BUILD SUCCESSFUL
Total time: 7 seconds
+ cd 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib
Buildfile: build.xml

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/classes/java
[javac] Compiling 536 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/util/Version.java:80:
 warning: [dep-ann] deprecated name isnt annotated with @Deprecated
[javac]   public boolean onOrAfter(Version other) {
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/index/codecs/DefaultDocValuesConsumer.java:49:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34:
 warning: [dep-ann] deprecated name isnt annotated with @Deprecated
[javac]   int getColumn();
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41:
 warning: [dep-ann] deprecated name isnt annotated with @Deprecated
[javac]   int getLine();
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 error

Re: [JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1064 - Failure

I removed the @Override annotation on that file!

simon
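
For context, a likely explanation of this failure: the build runs with a 1.5 JDK, and javac 1.5 rejects @Override on a method that implements an interface method (Java 6 and later accept it). Whether the method in DefaultDocValuesConsumer came from an interface is an assumption here; the names in the sketch below are hypothetical.

```java
/** Hypothetical interface standing in for the real supertype. */
interface FinishableSketch {
  void finish();
}

final class OverrideSketch implements FinishableSketch {
  private boolean finished;

  // Adding @Override on the next method compiles on Java 6 and later, but
  // javac 1.5 reports "method does not override a method from its superclass",
  // because finish() implements an interface method instead of overriding a
  // class method. Leaving the annotation off keeps the code Java 5 compatible.
  public void finish() {
    finished = true;
  }

  boolean isFinished() {
    return finished;
  }

  public static void main(String[] args) {
    OverrideSketch s = new OverrideSketch();
    s.finish();
    System.out.println(s.isFinished());
  }
}
```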

On Thu, May 5, 2011 at 11:03 AM, Apache Jenkins Server
hud...@hudson.apache.org wrote:
 Build: 
 https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1064/

 No tests ran.

 Build Log (for compile errors):
    [javac] 
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/index/codecs/DefaultDocValuesConsumer.java:49:
  method does not override a method from its superclass
    [javac]   @Override
    [javac]    ^

[jira] [Commented] (LUCENE-3073) make compoundfilewriter public


[ 
https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029251#comment-13029251
 ] 

Michael McCandless commented on LUCENE-3073:


+1

 make compoundfilewriter public
 --

 Key: LUCENE-3073
 URL: https://issues.apache.org/jira/browse/LUCENE-3073
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-3073.patch


 CompoundFileReader is public, but CompoundFileWriter is not.
 I propose we make it public + @lucene.internal instead (just in case someone 
 else finds themselves wanting to manipulate cfs files)


[jira] [Created] (LUCENE-3074) SimpleTextCodec needs SimpleText DocValues impl

SimpleTextCodec needs SimpleText DocValues impl
---

 Key: LUCENE-3074
 URL: https://issues.apache.org/jira/browse/LUCENE-3074
 Project: Lucene - Java
  Issue Type: Task
  Components: Index, Search
Affects Versions: CSF branch
Reporter: Simon Willnauer
Assignee: Michael McCandless
 Fix For: CSF branch


Currently SimpleTextCodec uses binary DocValues; we should move that to a simple 
text impl.


[jira] [Created] (LUCENE-3075) DocValues should be optionally be stored in a PerCodec CFS file to prevent too many files in the index

DocValues should be optionally be stored in a PerCodec CFS file to prevent too 
many files in the index
--

 Key: LUCENE-3075
 URL: https://issues.apache.org/jira/browse/LUCENE-3075
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index, Search
Affects Versions: CSF branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: CSF branch


Currently docvalues create one file per field to store the docvalues. Yet this 
could easily lead to too many open files, so we might need to enable CFS per 
codec to keep the number of files reasonable.


[jira] [Resolved] (LUCENE-3073) make compoundfilewriter public


 [ 
https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3073.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Committed revision 1099745, 1099747 (3x)

 make compoundfilewriter public
 --

 Key: LUCENE-3073
 URL: https://issues.apache.org/jira/browse/LUCENE-3073
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3073.patch


 CompoundFileReader is public, but CompoundFileWriter is not.
 I propose we make it public + @lucene.internal instead (just in case someone 
 else finds themselves wanting to manipulate cfs files)


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029274#comment-13029274
 ] 

Doron Cohen commented on LUCENE-3068:
-

Thanks for reviewing, Shai!
I updated the patch to use the random newDirectory and newICFG - not the focus 
here, but it may improve coverage anyhow.
I also added tests for the combined case - some AND, some OR - that is, using MPQ, 
some add() calls with a single term (AND) and some with an array longer than 1 (OR). 
I also refactored the tests a bit, so that now there's a small test method for 
each test case.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.


[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Patch with more test cases - AND/OR logic for MPQ is combined, and test code 
made simpler.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.


[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-05-05 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2458:
--

Attachment: SOLR-2458.patch

The solution was simple: change the commit() method to send ?commit=true instead 
of posting <commit/>.

Also cleaned up dead code, added a -Doptimize=yes option, and made the tool 
accept -h and --help in addition to -help.
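
A minimal sketch of the query-parameter approach, in plain Java. CommitUrlSketch and withCommit() are hypothetical names, not the actual SimplePostTool code, and the URLs are only example values. It does not handle URL fragments or other edge cases.

```java
class CommitUrlSketch {
    /** Appends commit=true to an update URL, respecting any existing query string. */
    static String withCommit(String updateUrl) {
        String separator = updateUrl.contains("?") ? "&" : "?";
        return updateUrl + separator + "commit=true";
    }

    public static void main(String[] args) {
        // Example URLs only; any update handler URL could be passed in.
        System.out.println(withCommit("http://localhost:8983/solr/update/csv"));
        System.out.println(withCommit("http://localhost:8983/solr/update/csv?separator=%3B"));
    }
}
```

Because the commit travels as a request parameter, it no longer depends on the handler understanding an XML body.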

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar
 Attachments: SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 The problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML request handler, such as CSV.


[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-05-05 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2458:
--

Attachment: SOLR-2458.patch

This new patch bumps version number to 1.4 and adds examples to the help of how 
to post csv, json and pdf

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar
 Attachments: SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 The problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML request handler, such as CSV.


[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.


[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029315#comment-13029315
 ] 

Michael McCandless commented on SOLR-2493:
--

bq. Long term, i would love to see the custom config system we have replaced 
with something standard... like spring, or simply POJOs that are loaded (and 
saved!) via XStream. This is the bigger pile of work I was referring to.

+1

I think XML is a poor configuration language.  It's great for one computer to 
talk to another, but for files that humans may edit, it's bad -- too much stuff 
to type for the computer's benefit, too easy to make a silly mistake.

I think something like Yaml is a better choice... this is what ElasticSearch 
uses, for example.

And, while we're at it, I think we should make Solr's error checking brittle on 
startup: if anything is off about the configuration, the server refuses to 
start (see http://markmail.org/thread/ywkfmxjboyixkrjc).

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I'm putting this as a blocker as I think this is a serious issue that should 
 be addressed asap with a release. With the current code this is nowhere near 
 suitable for production use.
 For each instance created, SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion("luceneMatchVersion", 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there are generally 
 3 query parsers created, and each of them parses the XML node in the config, 
 which involves creating an instance of XPath; behind the scenes the usual 
 factory-finder pattern kicks in within the XML parser and does a loadClass.
 The stack is typically:
     at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
     at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
     at com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
     at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
     at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
     at org.apache.solr.core.Config.getNode(Config.java:230)
     at org.apache.solr.core.Config.getVal(Config.java:256)
     at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
     at org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
     at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I get barely 250 qps with 16 concurrent users with 
 a near-empty index.
 After switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable, beyond 2000 qps.
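
The difference between re-parsing per parser and reading a value parsed once can be sketched in plain Java. MatchVersionSketch, parseVersionFromConfig() and the "LUCENE_31" placeholder are hypothetical stand-ins, not Solr's actual Config code; the counter simply shows how often the simulated expensive lookup runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MatchVersionSketch {
    /** Counts how often the (simulated) expensive config lookup runs. */
    static final AtomicInteger lookups = new AtomicInteger();

    /** Stand-in for the XPath-based lookup; the returned value is a placeholder. */
    static String parseVersionFromConfig() {
        lookups.incrementAndGet();
        return "LUCENE_31";
    }

    /** Parsed once at construction, like a config field that every parser reads. */
    final String luceneMatchVersion = parseVersionFromConfig();

    public static void main(String[] args) {
        MatchVersionSketch config = new MatchVersionSketch();
        // 1000 requests, 3 query parsers each, all reading the cached field.
        for (int request = 0; request < 1000; request++) {
            for (int parser = 0; parser < 3; parser++) {
                String version = config.luceneMatchVersion; // cached read, no lookup
            }
        }
        System.out.println("lookups with cached field: " + lookups.get());
    }
}
```

Calling parseVersionFromConfig() inside the inner loop instead would run the lookup once per parser instance, which is the pattern the stack trace above points at.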


Re: modularization discussion

2011-05-05 Thread Grant Ingersoll


On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:

 Hey folks
 
 On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Isn't our end goal here a bunch of well factored search modules?  Ie,
 fast forward a year or two and I think we should have modules like
 these:
 
 I think we have two camps here (10k feet view):
 

I'd say 3 camps:

 1. wants to move towards modularization might support all the modules
 mike has listed below
 2. wants to stick with Solr's current architecture and remain
 monolithic (not negative in this case) as much as possible

3.  Those who think most should be modularized, but realize it's a ton of work 
for an unproven gain (although most admit it is a highly likely gain) and 
should be handled on a case-by-case basis as people do the work.   I don't have 
anything against modularization, I just know, given my schedule, I won't be 
able to block off weeks of time to do it.  I'm happy to review where/when I can.


 
 I think we can meet somewhere in between and agree on certain module
 that should be available to lucene users as well. The ones I have in
 mind are
 primary search features like:
 - Faceting

Yeah, for instance, Bobo seems to have some interesting faceting 
implementations that are ASL, perhaps we can combine into this new faceting 
module.

 - Highlighting
 - Suggest
 - Function Query (consolidation is needed here!)
 - Analyzer factories

+1.

 
 things like distribution and replication should remain in solr IMO but
 might be moved to a more extensible API so that people can add their
 own implementation.

And, of course, all the web tier stuff (response writers, inputs, etc.)

 I am thinking about things like the ZooKeeper
 support that might not be a good solution for everybody where folks
 have already JGroups infrastructure.

Or other similar solutions.  I wonder about using a ZeroConf implementation 
that can do self-discovery.

 So I think we can work towards 2
 distinct goals.
 1. extract common search features into modules
 2. refactor solr to be more elastic / distributed  and extensible
 with respect to those goals.

3. Make it easier for Solr to be programmatically configured by decoupling the 
reading of schema.xml and solrconfig.xml from the code that actually contains 
the structures for the properties (IndexSchema and SolrConfig)

 
 maybe we can get agreement on such a basis though.
 
 let me know what you think

I think it's reasonable.  At the end of the day, it broadens the appeal of both 
Lucene and Solr.  Solr still exists and is not just a shell and at the end of 
the day, remains the primary choice for people who don't want to stitch 
everything together themselves.  All of it is easier to contribute to b/c 
people can focus in on the core area they know w/o having to know everything 
else per se.  Stuff should be better tested b/c of it as well since it will 
receive broader use.

That being said, and not to be discouraging, but I see it as a ton of work.




 
 simon
 
  * Faceting
 
  * Highlighting
 
  * Suggest (good patch is on LUCENE-2995)
 
  * Schema
 
  * Query impls
 
  * Query parsers
 
  * Analyzers (good progress here already, thanks Robert!),
incl. factories/XML configuration (still need this)
 
  * Database import (DIH)
 
  * Web app
 
  * Distribution/replication
 
  * Doc set representations
 
  * Collapse/grouping
 
  * Caches
 
  * Similarity/scoring impls (BM25, etc.)
 
  * Codecs
 
  * Joins
 
  * Lucene core
 
 In this future, much of this code came from what is now Solr and
 Lucene, but we should freely and aggressively poach from other
 projects when appropriate (and license/provenance is OK).
 
 I keep seeing all these cool compressed int set projects popping
 up... surely these are useful for us.  Solr poached a doc set impl
 from Nutch; probably there's other stuff to poach from Nutch, Mahout,
 etc.
 
 Katta's doing something sweet with distribution/replication; let's
 poach and merge w/ Solr's approach.  There are various facet impls out
 there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach and merge
 with Solr's.
 
 Elastic Search has lots of cool stuff, too, under ASL2.
 
 All these external open-source projects are fair game for poaching and
 refactoring into shared modules, along with what is now Solr and
 Lucene sources.
 
 In this ideal future, Solr becomes the bundling and default/example
 configuration of the Web App and other modules, much like how the
 various Linux distros bundle different stuff together around the Linux
 kernel.  And if you are an advanced app and don't need the webapp
 part, you can cherry pick the huper duper modules you do need and
 directly embed them into your app.
 
 Isn't this the future we are working towards?
 
 Mike
 
 http://blog.mikemccandless.com
 

RE: Improvements to the maven build

2011-05-05 Thread Steven A Rowe

Hi Ryan,

On 5/4/2011 at 7:14 PM, Ryan McKinley wrote:
 As a rule, everything should go through JIRA on its way to svn -- this
 is important so that we have somewhere to point for why we did things.
  Even small things.

Your phrase "As a rule" provides wiggle room that we all use.  "Even small 
things."  Um, I don't think so.  E.g., no one is going to go through JIRA for a 
small typo fix.  This judgment about what's big enough to warrant a JIRA issue 
is one each committer has to make.  As a result, this argument ("David's patch 
should have gone through JIRA because everything should go through JIRA") 
doesn't work for me.

 With patches from contributors it is especially important they are
 added to JIRA because they need to grant the license to ASF.  Also
 attachments are often stripped from mailing list archives, so down the
 road it's really hard to know what happened.

These are both excellent points.  Non-trivial non-committer patches should 
definitely go through JIRA for these reasons.

 I understand the desire to keep maven support low key -- but we should
 do that with a good README in dev-tools.

I agree that the Maven build should be documented - I plan on putting something 
together soon, as suggested by David.  This seems completely orthogonal to me, 
though, to the question of using JIRA issues for Maven build changes.

 Even as officially non-official tools, it still gets into svn so
 we need a trail of where it came from and hopefully a log of why
 we thought it was important.

I agree in principle, but again, I'll continue to use my own judgment about 
whether to use JIRA for small changes, especially to stuff under dev-tools/.

Steve

Re: modularization discussion

2011-05-05 Thread Mark Miller


On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:

 3.  Those who think most should be modularized, but realize it's a ton of 
 work for an unproven gain (although most admit it is a highly likely gain) 
 and should be handled on a case-by-case basis as people do the work.   I 
 don't have anything against modularization, I just know, given my schedule, I 
 won't be able to block off weeks of time to do it.  I'm happy to review 
 where/when I can.

+1. From what I have gathered, Grant and I come down pretty much on the same 
page on most of this stuff. Yeah, that means I'm reevaluating my position :) 
but seems to be the case.

Except I'm more open to IRC discussion :)

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org







[jira] [Resolved] (LUCENE-2897) apply delete-by-Term and docID immediately to newly flushed segments


 [ 
https://issues.apache.org/jira/browse/LUCENE-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2897.


Resolution: Fixed

 apply delete-by-Term and docID immediately to newly flushed segments
 

 Key: LUCENE-2897
 URL: https://issues.apache.org/jira/browse/LUCENE-2897
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2897.patch, LUCENE-2897.patch


 Spinoff from LUCENE-2324.
 When we flush deletes today, we keep them as buffered Term/Query/docIDs that 
 need to be deleted.  But, for a newly flushed segment (ie fresh out of the 
 DWPT), this is silly, because during flush we visit all terms and we know 
 their docIDs.  So it's more efficient to apply the deletes (for this one 
 segment) at that time.
 We still must buffer deletes for all prior segments, but these deletes don't 
 need to map to a docIDUpto anymore; ie we just need a Set.
 This issue should wait until LUCENE-1076 is in since that issue cuts over 
 buffered deletes to a transactional stream.


[jira] [Created] (LUCENE-3076) add -Dtests.codecprovider

add -Dtests.codecprovider
-

 Key: LUCENE-3076
 URL: https://issues.apache.org/jira/browse/LUCENE-3076
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0


Currently to test a codec (or set of codecs) you have to add them to lucene's 
core and edit a couple of arrays here and there...

It would be nice if when using the test-framework you could instead specify a 
codecprovider by classname (possibly containing your own set of huper-duper 
codecs).

For example I made the following little codecprovider in contrib:
{noformat}
public class AppendingCodecProvider extends CodecProvider {
  public AppendingCodecProvider() {
    register(new AppendingCodec());
    register(new SimpleTextCodec());
  }
}
{noformat}

Then, I'm able to run tests with 'ant -lib 
build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core 
-Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider',
 and it always picks from my set of codecs (in this case Appending and 
SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.
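
Selecting a provider by class name from a system property can be sketched in plain Java with reflection. This is only an illustration of the idea, not the attached patch: ProviderLoaderSketch, Provider, DefaultProvider and AppendingProvider are hypothetical stand-ins for the test framework and CodecProvider classes.

```java
class ProviderLoaderSketch {
    /** Hypothetical stand-in for a codec provider. */
    interface Provider {
        String name();
    }

    static class DefaultProvider implements Provider {
        public String name() { return "default"; }
    }

    static class AppendingProvider implements Provider {
        public String name() { return "appending"; }
    }

    /** Instantiates the provider named by className, or the default when none is given. */
    static Provider load(String className) throws ReflectiveOperationException {
        if (className == null || className.isEmpty()) {
            return new DefaultProvider();
        }
        // asSubclass fails fast with ClassCastException if the class is not a Provider.
        Class<? extends Provider> clazz = Class.forName(className).asSubclass(Provider.class);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The property name mirrors -Dtests.codecprovider from the issue text.
        Provider provider = load(System.getProperty("tests.codecprovider"));
        System.out.println("using provider: " + provider.name());
    }
}
```

The only requirement this places on a custom provider is a no-argument constructor, which matches the AppendingCodecProvider example above.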




[jira] [Updated] (LUCENE-3076) add -Dtests.codecprovider


 [ 
https://issues.apache.org/jira/browse/LUCENE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3076:


Attachment: LUCENE-3076.patch

 add -Dtests.codecprovider
 -

 Key: LUCENE-3076
 URL: https://issues.apache.org/jira/browse/LUCENE-3076
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3076.patch


 Currently to test a codec (or set of codecs) you have to add them to lucene's 
 core and edit a couple of arrays here and there...
 It would be nice if when using the test-framework you could instead specify a 
 codecprovider by classname (possibly containing your own set of huper-duper 
 codecs).
 For example I made the following little codecprovider in contrib:
 {noformat}
 public class AppendingCodecProvider extends CodecProvider {
   public AppendingCodecProvider() {
     register(new AppendingCodec());
     register(new SimpleTextCodec());
   }
 }
 {noformat}
 Then, I'm able to run tests with 'ant -lib 
 build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core 
 -Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider',
  and it always picks from my set of codecs (in this case Appending and 
 SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.


Re: modularization discussion

On Thu, May 5, 2011 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote:

 On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:

 3.  Those who think most should be modularized, but realize it's a ton of 
 work for an unproven gain (although most admit it is a highly likely gain) 
 and should be handled on a case-by-case basis as people do the work.   I 
 don't have anything against modularization, I just know, given my schedule, 
 I won't be able to block off weeks of time to do it.  I'm happy to review 
 where/when I can.

 +1. From what I have gathered, Grant and I come down pretty much on the same 
 page on most of this stuff. Yeah, that means I'm reevaluating my position :) 
 but seems to be the case.

so this is one thing I really don't understand. you say you are in the
3rd camp. Guys in that camp have not much time to do the work but
still are not willing to sign up for what we want to modularize.
Nobody asks you to do the work; I only ask you to say "ok, I think this
is good" and NOT sit in the way blocking others. This is really
what the 3rd camp is about to me, but maybe I misunderstand something
here.

Again you are saying you are not in camp 1 but you want to still
fiddle around with long discussion before we get anything done (and
eventually be against it - nothing personal) because you don't have
enough time to fit stuff in your schedule. This makes no sense to me.
That case-by-case stuff makes me sick. Let's put some goals out and say
"ok, this makes sense in a module, this doesn't" and let folks work on it.
We need some agreement here and I think we have written enough emails
to make our points. I think we should agree on a set of things and
once we are there we can talk again. Dreams vs. Babysteps!

Let's settle on something now, today or next week, and stop this waste of
time. I am happy with an agreement that we don't factor anything out and
all remains in Solr, but we need to move here! After all these
discussions I don't have any motivation to work on it anyway. I think I
need to step back for a while along those lines!

simon

 Except I'm more open to IRC discussion :)

 - Mark Miller
 lucidimagination.com

 Lucene/Solr User Conference
 May 25-26, San Francisco
 www.lucenerevolution.org







[jira] [Commented] (LUCENE-1418) QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null

2011-05-05 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029356#comment-13029356
 ] 

David Smiley commented on LUCENE-1418:
--

Ok, thanks for your attention Chris. I could have sworn I cornered the bug with 
my debugger the other day but at the moment I can't seem to reproduce it. It 
very well may be user error :-( -- a typo in AJAX-Solr which used *.* instead 
of *:*... probably it was that, in hindsight.

 QueryParser can throw NullPointerException during parsing of some queries in 
 case if default field passed to constructor is null
 

 Key: LUCENE-1418
 URL: https://issues.apache.org/jira/browse/LUCENE-1418
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.4
 Environment: CentOS 5.2 (probably any applies)
Reporter: Alexei Dets
Priority: Minor

 If QueryParser was constructed using the QueryParser(String f, Analyzer a) 
 constructor and f is null, then QueryParser can fail with a 
 NullPointerException while parsing some queries that _do_ contain a field 
 name but have unbalanced parentheses.
 Example 1:
 Query:  field:(expr1) expr2)
 Result:
 java.lang.NullPointerException
    at org.apache.lucene.index.Term.init(Term.java:50)
    at org.apache.lucene.index.Term.init(Term.java:36)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1324)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
 Example2:
 Query:  field:(expr1) expr2)
 Result:
 java.lang.NullPointerException
    at org.apache.lucene.index.Term.init(Term.java:50)
    at org.apache.lucene.index.Term.init(Term.java:36)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:612)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1459)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
 Workaround: pass an empty string to the constructor as the default field 
 name; in this case the QueryParser.parse method will throw ParseException 
 (the expected result, because the query string is wrong) instead of 
 NullPointerException.
 It is not obvious to me how to fix this, so I'll describe my use case; maybe 
 I'm doing something completely wrong.
 Basically I have a set of per-field queries entered by user and need to 
 programmatically construct (after some preprocessing) one real Lucene query 
 combined from these user-entered per-field subqueries.
 To achieve this I basically do the following (simplified a bit):
 // I'll always provide a field name in the query string, as it is different
 // each time and I don't have any default.
 QueryParser parser = new QueryParser(null, analyzer);
 BooleanQuery query = new BooleanQuery();
 Query subQuery1 = parser.parse(field1 + ":(" + queryString1 + ')');
 // operator = BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT
 // or BooleanClause.Occur.SHOULD
 query.add(subQuery1, operator1);
 Query subQuery2 = parser.parse(field2 + ":(" + queryString2 + ')');
 query.add(subQuery2, operator2);
 Query subQuery3 = parser.parse(field3 + ":(" + queryString3 + ')');
 query.add(subQuery3, operator3);
 ...
 IMHO either the QueryParser constructor should be changed to throw 
 NullPointerException/IllegalArgumentException when a null field is passed 
 (and the API documentation updated), or QueryParser.parse should be fixed 
 to correctly throw ParseException instead of NullPointerException. Also IMHO 
 _public_ setField/getField methods on QueryParser (that set/get the default 
 field) would be of great help in use cases like mine:
 // or add a constructor with analyzer _only_ for such cases
 QueryParser parser = new QueryParser(null, analyzer);
 BooleanQuery query = new BooleanQuery();
 parser.setField(field1);
 Query subQuery1 = parser.parse(queryString1);
 query.add(subQuery1, operator1);
 parser.setField(field2);
 Query subQuery2 = parser.parse(queryString2);
 query.add(subQuery2, operator2); 
 ...


[jira] [Resolved] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running


 [ 
https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2904.


Resolution: Fixed

 non-contiguous LogMergePolicy should be careful to not select merges already 
 running
 

 Key: LUCENE-2904
 URL: https://issues.apache.org/jira/browse/LUCENE-2904
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2904.patch


 Now that LogMP can do non-contiguous merges, the fact that it disregards 
 which segments are already being merged is more problematic since it could 
 result in it returning conflicting merges and thus failing to run multiple 
 merges concurrently.


[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: LUCENE-3071.patch

Proposed patch attached.

Working against Lucene 3.1 (remove the {{path.length()}} last parameter to 
assert call).
But I am having difficulties making the tests work against trunk ({{ant}} and 
{{ant test}} fail, at global scope).

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
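
 The requested behaviour can be sketched in plain Java, independent of the attached patch. PathSplitSketch and split() are hypothetical names, and this is not the PathHierarchyTokenizer implementation: it drops leading and empty segments for simplicity and just reproduces the two examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PathSplitSketch {
    /**
     * Forward mode emits growing prefixes after skipping the first `skip` segments.
     * Reversed mode emits shrinking suffixes after skipping the last `skip` segments.
     */
    static List<String> split(String input, char delimiter, int skip, boolean reversed) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        int n = parts.size();
        int dropped = Math.min(Math.max(skip, 0), n);
        List<String> kept = reversed ? parts.subList(0, n - dropped) : parts.subList(dropped, n);
        String sep = String.valueOf(delimiter);
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= kept.size(); i++) {
            tokens.add(reversed
                    ? String.join(sep, kept.subList(i - 1, kept.size()))
                    : String.join(sep, kept.subList(0, i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("www.site.com", '.', 0, true));
        System.out.println(split("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4, false));
    }
}
```

 The first call yields www.site.com, site.com, com; the second yields INTERESTING and INTERESTING/PART.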


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029369#comment-13029369
 ] 

Robert Muir commented on LUCENE-3071:
-

bq. But I am having difficulties making the tests work against trunk (ant and 
ant test fail, at global scope).

Can you provide more details about this?

If possible, details like the ant version, whether you are using an svn checkout 
(and what the full path is), logs of the error messages, etc. would be great.

Feel free to open a new jira issue for these problems!

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}


[jira] [Created] (SOLR-2498) Stop consolidating boosts for values of multivalued fields

Stop consolidating boosts for values of multivalued fields
--

 Key: SOLR-2498
 URL: https://issues.apache.org/jira/browse/SOLR-2498
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1, 4.0, Next
Reporter: Neil Hooey


Currently, if you boost a value in a multivalued field at index time, the 
boosts are consolidated into one boost for the whole field, and the individual 
values are lost.

So, for example, take a list of photos with a multivalued field "keywords", 
where the boost for a keyword assigned to a photo corresponds to the number of 
times that photo was downloaded after searching for that particular keyword.

{code}
photo1: Photo of a cat by itself:
keywords: [ cat:600 feline:100 ]
= boost total = 700

photo2: Photo of a cat driving a truck:
keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
= boost total = 1270
{code}

If you search for "cat feline", photo2 will rank higher, since the boosts of 
its cat-like words were consolidated with the anomalous "truck" boost for a 
total of 1270, whereas photo1, which has more "cat feline" downloads, only gets 
a score of 700 and ranks lower.

*Intuitively the boosts should be separate, so only the boosts for the terms 
searched will be counted.*
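
The arithmetic of the example can be sketched in plain Java. BoostSketch and its methods are hypothetical names; the sketch follows the additive totals used in the example above and is not Lucene's actual norm or scoring computation, which combines boosts differently.

```java
import java.util.List;
import java.util.Map;

class BoostSketch {
    /** Current behaviour in the example: one total per field, regardless of the query. */
    static int consolidated(Map<String, Integer> boosts) {
        int total = 0;
        for (int boost : boosts.values()) {
            total += boost;
        }
        return total;
    }

    /** Requested behaviour: only boosts of the terms actually searched are counted. */
    static int perTerm(Map<String, Integer> boosts, List<String> queryTerms) {
        int total = 0;
        for (String term : queryTerms) {
            total += boosts.getOrDefault(term, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> photo1 = Map.of("cat", 600, "feline", 100);
        Map<String, Integer> photo2 = Map.of("cat", 100, "feline", 90, "animal", 80, "truck", 1000);
        List<String> query = List.of("cat", "feline");
        System.out.println("consolidated: photo1=" + consolidated(photo1) + " photo2=" + consolidated(photo2));
        System.out.println("per term:     photo1=" + perTerm(photo1, query) + " photo2=" + perTerm(photo2, query));
    }
}
```

With consolidated boosts photo2 wins (1270 vs 700); with per-term boosts photo1 wins (700 vs 190), which is the ranking the issue asks for.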

Given the current behaviour, you are forced to do one of the following:
1. Assemble all of the multi-values into a string, and use payloads in place of 
boosts.
2. Use dynamic fields, such as keyword_*, and boost them independently.

Neither of these solutions is ideal, as using payloads requires writing your 
own BoostingTermQuery, and defining a new dynamic field per multi-value makes 
searching more difficult than with multivalued fields.

There's a blog link that describes the current behaviour:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2


Re: modularization discussion

2011-05-05 Thread Grant Ingersoll


On May 5, 2011, at 11:03 AM, Simon Willnauer wrote:

 On Thu, May 5, 2011 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote:
 
 On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:
 
 3.  Those who think most should be modularized, but realize it's a ton of 
 work for an unproven gain (although most admit it is a highly likely gain) 
 and should be handled on a case-by-case basis as people do the work.   I 
 don't have anything against modularization, I just know, given my schedule, 
 I won't be able to block off weeks of time to do it.  I'm happy to review 
 where/when I can.
 
 +1. From what I have gathered, Grant and I come down pretty much on the same 
 page on most of this stuff. Yeah, that means I'm reevaluating my position 
 :) but seems to be the case.
 
 so this is one thing I really don't understand. you say you are in the
 3rd camp. Guys in that camp have not much time to do the work but
 still are not willing to sign up for what we want to modularize.

I don't follow this leap.  (BTW, I'm actually mostly in camp #1 and a little in 
camp #3, I just want to make sure, based on what I've read that all sides are 
represented.  I like Mike's approach, but I also know it is a ton of work and 
details matter.)  

 Nobody asks you to do the work. I only ask you to say "OK, I think this
 is good" and NOT sit in the way blocking others. This is really
 what the 3rd camp is about to me, but maybe I misunderstand something
 here.
 
 Again you are saying you are not in camp 1 but you want to still
 fiddle around with long discussion before we get anything done (and
 eventually be against it - nothing personal)

I don't think that is what Mark is saying nor is it what camp #3 is saying.  
And I don't think we are fiddling w/ long discussions (it's only been a couple 
of days.)  This is hugely important.  We need consensus to move forward.

 because you don't have
 enough time to fit stuff in your schedule. This makes no sense to me.
 That case by case stuff makes me sick. Let's put some goals out and say
 ok this makes sense in a module this doesn't and let folks work on it.

To me, the third camp is just saying the proof is in the pudding.  If you want 
to refactor, then go for it.  Just make sure everything still works, which of 
course I know people will (but part of that means actually running Solr, IMO).  
Perhaps more importantly, don't get mad if I have only one day a week to 
work on Lucene/Solr and I spend it putting a specific feature in a specific 
place.  Just because something can/should be modularized, doesn't mean that a 
person working in that area must do it before they add whatever they were 
working on.  For instance, if and when function queries are a module, I will 
add to them there and be happy to do so.  In the meantime, I will likely add to 
them in Solr if that is something I happen to be interested in at that time b/c 
I can certainly add a new function in a day, but I can't refactor the whole 
module _and_ add my new function in a day.

In the end, I think we are in agreement (at least you and me), actually.  To 
me, the best place to start on this is:
1. Function queries
2. Spatial
3. Faceting

(In that order)

-Grant




[jira] [Updated] (SOLR-2499) Index-time boosts for multivalue fields are consolidated


 [ 
https://issues.apache.org/jira/browse/SOLR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2499:
-

Description: 
Currently, if you boost a value in a multivalue field during index time, the 
boosts are consolidated for every field, and the individual values are lost.

So, for example, take a list of photos with a multivalue field keywords, where 
the boost for a keyword assigned to a photo corresponds to the number of times 
that photo was downloaded after searching for that particular keyword. We have 
documents like this:

{code}
photo1: Photo of a cat by itself
keywords: [ cat:600 feline:100 ]
= boost total = 700

photo2: Photo of a cat driving a truck
keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
= boost total = 1270
{code}

If you search for cat feline, photo2 will rank higher, since the boost of 
cat-like words was consolidated with the truck boost anomaly. Whereas 
photo1, which has more downloads for cat and feline, ranks lower with a 
lower consolidated boost, even though its total boost for the relevant keywords 
(700) is higher than photo2's (190).

*Intuitively, the boosts should be separate, so only the boosts for the terms 
searched will be counted.*
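
To make the arithmetic concrete, here is a minimal sketch in plain Java. It does not use any Solr or Lucene API; the class and method names are made up for illustration. It follows the simplified additive model of the example above, which is an assumption: as far as I understand, real scoring folds index-time boosts into a per-field norm rather than summing them like this.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or Lucene.
final class BoostConsolidationSketch {

    // Consolidated behaviour: every keyword boost counts, whatever the query is.
    static int consolidatedBoost(Map<String, Integer> keywordBoosts) {
        int total = 0;
        for (int boost : keywordBoosts.values()) {
            total += boost;
        }
        return total;
    }

    // Desired behaviour: only boosts of keywords that appear in the query count.
    static int perTermBoost(Map<String, Integer> keywordBoosts, List<String> queryTerms) {
        int total = 0;
        for (String term : queryTerms) {
            total += keywordBoosts.getOrDefault(term, 0);
        }
        return total;
    }

    // photo1: Photo of a cat by itself
    static Map<String, Integer> photo1() {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("cat", 600);
        boosts.put("feline", 100);
        return boosts;
    }

    // photo2: Photo of a cat driving a truck
    static Map<String, Integer> photo2() {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("cat", 100);
        boosts.put("feline", 90);
        boosts.put("animal", 80);
        boosts.put("truck", 1000);
        return boosts;
    }
}
```

For the query cat feline, the consolidated totals are 700 and 1270, so photo2 wins, while the per-term totals are 700 and 190, so photo1 wins.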

Given the current behaviour, you are forced to do one of the following:
1. Assemble all of the multi-values into a string, and use payloads in place of 
boosts.
2. Use dynamic fields, such as keyword_*, and boost them independently.

Neither of these solutions is ideal, as using payloads requires writing your 
own BoostingTermQuery, and defining a new dynamic field per multi-value makes 
searching more difficult than with multivalue fields.
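
As a rough sketch of what the two workarounds ask of the indexing client, the plain-Java helper below prepares the data for each. It uses no Solr API and all names are hypothetical. The `term|weight` form with a `|` delimiter is an assumption about how the payload string would be tokenized, and `keyword_` is just the dynamic-field prefix from option 2.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not part of Solr or Lucene.
final class BoostWorkaroundSketch {

    // Workaround 1: pack all values into one string, carrying each boost
    // as a payload-style suffix, e.g. "cat|600 feline|100".
    static String toPayloadString(Map<String, Integer> keywordBoosts) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : keywordBoosts.entrySet()) {
            joined.add(entry.getKey() + "|" + entry.getValue());
        }
        return joined.toString();
    }

    // Workaround 2: one dynamic field per value, each with its own boost,
    // e.g. keyword_cat -> 600, keyword_feline -> 100.
    static Map<String, Integer> toDynamicFieldBoosts(Map<String, Integer> keywordBoosts) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : keywordBoosts.entrySet()) {
            fields.put("keyword_" + entry.getKey(), entry.getValue());
        }
        return fields;
    }
}
```

The second form shows why searching gets harder: a query for cat feline has to address keyword_cat and keyword_feline separately instead of a single keywords field.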

There's a blog entry that describes the current behaviour:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2

  was:
Currently, if you boost a value in a multivalue field during index time, the 
boosts are consolidated for every field, and the individual values are lost.

So, for example, given a list of photos with a multivalue field keywords, and 
a boost for a keyword assigned to a photo corresponds to the number of times 
that photo was downloaded after searching for that particular keyword, we have 
documents like this:

{code}
photo1: Photo of a cat by itself
keywords: [ cat:600 feline:100 ]
= boost total = 700

photo2: Photo of a cat driving a truck
keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
= boost total = 1270
{code}

If you search for cat feline, photo2 will rank higher, since the boost of 
cat-like words was consolidated with the truck boost anomaly. Whereas 
photo1, which has more downloads for cat and feline, ranks lower with a 
lower consolidated boost.

*Intuitively, the boosts should be separate, so only the boosts for the terms 
searched will be counted.*

Given the current behaviour, you are forced to do one of the following:
1. Assemble all of the multi-values into a string, and use payloads in place of 
boosts.
2. Use dynamic fields, such as keyword_*, and boost them independently.

Neither of these solutions are ideal, as using payloads requires writing your 
own BoostingTermQuery, and defining a new dynamic field per multi-value makes 
searching more difficult than with multivalue fields.

There's a blog entry that describes the current behaviour:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2


 Index-time boosts for multivalue fields are consolidated
 

 Key: SOLR-2499
 URL: https://issues.apache.org/jira/browse/SOLR-2499
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1, 4.0, Next
Reporter: Neil Hooey
  Labels: boost, multivalue, multivalued

 Currently, if you boost a value in a multivalue field during index time, the 
 boosts are consolidated for every field, and the individual values are lost.
 So, for example, given a list of photos with a multivalue field keywords, 
 and a boost for a keyword assigned to a photo corresponds to the number of 
 times that photo was downloaded after searching for that particular keyword, 
 we have documents like this:
 {code}
 photo1: Photo of a cat by itself
 keywords: [ cat:600 feline:100 ]
 = boost total = 700
 photo2: Photo of a cat driving a truck
 keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
 = boost total = 1270
 {code}
 If you search for cat feline, photo2 will rank higher, since the boost of 
 cat-like words was consolidated with the truck boost anomaly. Whereas 
 photo1, which has more downloads for cat and feline, ranks lower with a 
 lower consolidated boost, even though its total boost for the relevant 
 keywords (700) is higher than photo2's (190).
 *Intuitively, the boosts should be separate, so only the boosts for the terms 
 searched will be counted.*
 Given the current behaviour, you are forced to do one of the following:
 1. Assemble all of the

[jira] [Closed] (SOLR-2498) Stop consolidating boosts for values of multivalued fields


 [ 
https://issues.apache.org/jira/browse/SOLR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey closed SOLR-2498.


Resolution: Duplicate

 Stop consolidating boosts for values of multivalued fields
 --

 Key: SOLR-2498
 URL: https://issues.apache.org/jira/browse/SOLR-2498
 Project: Solr
  Issue Type: Improvement
Reporter: Neil Hooey

 I accidentally double-submitted this bug when my browser crashed.
 Here is the real one:
 https://issues.apache.org/jira/browse/SOLR-2499


[jira] [Updated] (SOLR-2498) Stop consolidating boosts for values of multivalued fields


 [ 
https://issues.apache.org/jira/browse/SOLR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2498:
-

   Labels:   (was: boost multivalue multivalued)
  Description: 
I accidentally double-submitted this bug when my browser crashed.

Here is the real one:
https://issues.apache.org/jira/browse/SOLR-2499

  was:
Currently, if you boost a value in a multivalued field during index time, the 
boosts are consolidated for every field, and the individual values are lost.

So, for example, given a list of photos with a multivalue field keywords, and 
a boost for a keyword assigned to a photo corresponds to the number of times 
that photo was downloaded after searching for that particular keyword.

{code}
photo1: Photo of a cat by itself:
keywords: [ cat:600 feline:100 ]
= boost total = 700

photo2: Photo of a cat driving a truck:
keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
= boost total = 1270
{code}

If you search for cat feline, photo2 will rank higher, since the boost of 
cat-like words was consolidated for the truck boost anomoly to score a 
total of 1270. Whereas photo1, which has more cat feline downloads, only gets 
a score of 700, and ranks lower.

*Intuitively the boosts should be separate, so only the boosts for the terms 
searched will be counted.*

Given the current behaviour, you are forced to do one of the following:
1. Assemble all of the multi-values into a string, and use payloads in place of 
boosts.
2. Use dynamic fields, such as keyword_*, and boost them independently.

Neither of these solutions are ideal, as using payloads requires writing your 
own BoostingTermQuery, and defining a new dynamic field per multi-value makes 
searching more difficult than with mutlivalued fields.

There's a blog link that describes the current behaviour:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2

Affects Version/s: (was: Next)
   (was: 4.0)
   (was: 3.1)

 Stop consolidating boosts for values of multivalued fields
 --

 Key: SOLR-2498
 URL: https://issues.apache.org/jira/browse/SOLR-2498
 Project: Solr
  Issue Type: Improvement
Reporter: Neil Hooey

 I accidentally double-submitted this bug when my browser crashed.
 Here is the real one:
 https://issues.apache.org/jira/browse/SOLR-2499


[jira] [Updated] (SOLR-2499) Index-time boosts for multivalue fields are consolidated


 [ 
https://issues.apache.org/jira/browse/SOLR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2499:
-

Comment: was deleted

(was: Double-submission, oops!
This issue is the canonical one.)

 Index-time boosts for multivalue fields are consolidated
 

 Key: SOLR-2499
 URL: https://issues.apache.org/jira/browse/SOLR-2499
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1, 4.0, Next
Reporter: Neil Hooey
  Labels: boost, multivalue, multivalued

 Currently, if you boost a value in a multivalue field during index time, the 
 boosts are consolidated for every field, and the individual values are lost.
 So, for example, given a list of photos with a multivalue field keywords, 
 and a boost for a keyword assigned to a photo corresponds to the number of 
 times that photo was downloaded after searching for that particular keyword, 
 we have documents like this:
 {code}
 photo1: Photo of a cat by itself
 keywords: [ cat:600 feline:100 ]
 = boost total = 700
 photo2: Photo of a cat driving a truck
 keywords: [ cat:100 feline:90 animal:80 truck:1000 ]
 = boost total = 1270
 {code}
 If you search for cat feline, photo2 will rank higher, since the boost of 
 cat-like words was consolidated with the truck boost anomaly. Whereas 
 photo1, which has more downloads for cat and feline, ranks lower with a 
 lower consolidated boost, even though its total boost for the relevant 
 keywords (700) is higher than photo2's (190).
 *Intuitively, the boosts should be separate, so only the boosts for the terms 
 searched will be counted.*
 Given the current behaviour, you are forced to do one of the following:
 1. Assemble all of the multi-values into a string, and use payloads in place 
 of boosts.
 2. Use dynamic fields, such as keyword_*, and boost them independently.
 Neither of these solutions are ideal, as using payloads requires writing your 
 own BoostingTermQuery, and defining a new dynamic field per multi-value makes 
 searching more difficult than with multivalue fields.
 There's a blog entry that describes the current behaviour:
 http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2


Consolidation of boosts on multivalued fields

2011-05-05 Thread Neil Hooey

Currently when you assign boosts to multivalue fields during
index-time, they are consolidated, and the individual boosts are lost.

There are some relevant cases where the individual boost values are
important, so I'd like to fix this behaviour.

I've created an issue here, which gives some examples:
https://issues.apache.org/jira/browse/SOLR-2499

Do you have any ideas of where to get started with this fix, or have
an idea of how difficult the fix might be?

Thanks,

- Neil


[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: ant.log.tar.bz2

I'm using Ubuntu 10.04.2 LTS.
ant -version
Apache Ant version 1.7.1 compiled on September 8 2010
I followed the wiki: http://wiki.apache.org/lucene-java/HowToContribute
I used svn checkout http://svn.eu.apache.org/repos/asf/lucene/dev/trunk/ 
lucene-trunk.
I'm working under revision 1099843 (yours).
See ant log attached.

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
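
 The requested behaviour can be sketched in plain Java as below. This is not the tokenizer from the attached patch and uses no Lucene API; the class name, method name and parameters are hypothetical, and dropping a leading delimiter is an assumption inferred from the second example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the patch attached to this issue.
final class ReversePathSketch {

    static List<String> split(String input, char delimiter, boolean reverse, int skip) {
        String sep = String.valueOf(delimiter);

        // Break the input into parts, ignoring empty parts such as the one
        // produced by a leading delimiter in "/usr/share".
        List<String> parts = new ArrayList<>();
        for (String part : input.split(Pattern.quote(sep))) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }

        List<String> tokens = new ArrayList<>();
        if (reverse) {
            // Reversed: drop the last `skip` parts, then emit ever shorter suffixes.
            int end = Math.max(0, parts.size() - skip);
            for (int start = 0; start < end; start++) {
                tokens.add(String.join(sep, parts.subList(start, end)));
            }
        } else {
            // Forward: drop the first `skip` parts, then emit ever longer prefixes.
            int start = Math.min(skip, parts.size());
            for (int end = start + 1; end <= parts.size(); end++) {
                tokens.add(String.join(sep, parts.subList(start, end)));
            }
        }
        return tokens;
    }
}
```

 With these assumptions, split("www.site.com", '.', true, 0) gives www.site.com, site.com, com, and split("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4) gives INTERESTING, INTERESTING/PART.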


[jira] [Created] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

TestSolrCoreProperties sometimes fails with no such core: core0
-

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir


[junit] Testsuite: org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Testcase: 
testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
Caused an ERROR
[junit] No such core: core0
[junit] org.apache.solr.common.SolrException: No such core: core0
[junit] at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
[junit] at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
[junit] at 
org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029393#comment-13029393
 ] 

Robert Muir commented on LUCENE-3071:
-

Hi Olivier, thanks for uploading the log. 

This test fails for me sometimes too, somehow we should get to the bottom of 
it. 
I opened an issue: SOLR-2500

As a workaround, perhaps using 'ant clean test' will help... I fought with this
test a little bit the other day and somehow 'clean' seemed to temporarily get 
the test passing...

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}


[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running

2011-05-05 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029403#comment-13029403
 ] 

Earwin Burrfoot commented on LUCENE-2904:
-

I think we should simply change the API for MergePolicy.
Instead of SegmentInfos it should accept a Set<SegmentInfo> with SIs eligible 
for merging (e.g., completely written & not elected for another merge).
IW.getMergingSegments() is a damn cheat, and the "Expert" notice is not an excuse! 
:)
Why should each and every MP do the set subtraction when IW can do it for them 
once and for all? 
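
A minimal sketch of the set subtraction in question, in plain Java. It is generic on purpose and uses no Lucene types; the class and method names are hypothetical and this is not the actual MergePolicy or IndexWriter API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not Lucene's MergePolicy/IndexWriter API.
final class EligibleSegmentsSketch {

    // Done once by the caller (the writer), so that each merge policy
    // receives only segments it is allowed to pick.
    static <T> Set<T> eligible(Set<T> allSegments, Set<T> alreadyMerging) {
        Set<T> result = new LinkedHashSet<>(allSegments); // keep segment order
        result.removeAll(alreadyMerging);
        return result;
    }
}
```

The inputs are left untouched, so the caller can still hand a policy the full list of segments alongside the eligible subset if its calculations need both.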

 non-contiguous LogMergePolicy should be careful to not select merges already 
 running
 

 Key: LUCENE-2904
 URL: https://issues.apache.org/jira/browse/LUCENE-2904
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2904.patch


 Now that LogMP can do non-contiguous merges, the fact that it disregards 
 which segments are already being merged is more problematic since it could 
 result in it returning conflicting merges and thus failing to run multiple 
 merges concurrently.


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

[
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Uwe Schindler updated LUCENE-3065:
--

Attachment: SOLR-2497.patch

MoreLikeThis problem solved; it was as I said. The test included a TrieInt
field in the similarity fields, so it was used to calculate similarity. Because
the TrieField was invisible to MLT in previous Solr versions, this had no effect.
By the way: there is a commented-out part that explicitly lists the MLT field,
but I don't understand it. It seems that it was never understood/supported.
Now, all numeric fields should work with MLT.

Now only the TestDistributedSearch is still failing with a strange date
failure. I'll dig.

NumericField should be stored in binary format in index (matching Solr's
format)

Key: LUCENE-3065
URL: https://issues.apache.org/jira/browse/LUCENE-3065
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
Fix For: 3.2, 4.0

Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch,
LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch,
SOLR-2497.patch

(Spinoff of LUCENE-3001)
Today when writing stored fields we don't record that the field was a
NumericField, and so at IndexReader time you get back an ordinary Field and
your number has turned into a string. See
https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
We have spare bits already in stored fields, so, we should use one to record
that the field is numeric, and then encode the numeric field in Solr's
more-compact binary format.
A nice side-effect is we fix the long standing issue that you don't get a
NumericField back when loading your document.
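
The idea of using spare flag bits can be sketched as follows. The bit layout, constant names and big-endian encoding here are assumptions made for illustration; they are not the constants or the binary format used by FieldsWriter/FieldsReader or by Solr.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; the flag values are NOT Lucene's.
final class NumericFlagSketch {

    static final int NUMERIC_MASK = 0x38;    // three assumed spare bits
    static final int NUMERIC_INT = 1 << 3;   // stored value is a 4-byte int
    static final int NUMERIC_LONG = 2 << 3;  // stored value is an 8-byte long

    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // Reader side: the flag bits say whether the bytes are a number
    // (and which type) or ordinary text, as in older indexes.
    static Object read(int flags, byte[] raw) {
        switch (flags & NUMERIC_MASK) {
            case NUMERIC_INT:
                return ByteBuffer.wrap(raw).getInt();
            case NUMERIC_LONG:
                return ByteBuffer.wrap(raw).getLong();
            default:
                return new String(raw, StandardCharsets.UTF_8);
        }
    }
}
```

A field written without the numeric bits still comes back as a string, which matches the point about old indexes continuing to work.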


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: (was: SOLR-2497.patch)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Comment: was deleted

(was: Ideally this could be done with the schema-like approach of one of the 
GSoC projects?

We already discussed about that: We can use the FieldsReader/FieldsWriter type 
flag (which currently says, binary/text and compressed (unused now)) in the 
index file format to mark a field as NumericField. In that case, 
Document.getField() would return the NumericField instance.

For Lucene backwards we should still support creating text-only fields.

The new binary format would also be compatible with solr, as on getField, Solr 
would get a NumericField and can decide using instanceof what to do. Old Solr 
indexes without the NumericField marker flag would return as byte[], in which 
case, solr would do the decoding.

For storing on index side, Solr could move to NumericField completely (I don't 
like the current approach using NumericTokenStream and to/fromInternal wrappers 
around conventional Field).)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Comment: was deleted

(was: Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the
field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField
back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like
you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr
that must now be fixed to handle the fact that a field can come back
as NumericField?  Anyone know where...?)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.


[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065

[
https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Uwe Schindler updated SOLR-2497:

Attachment: SOLR-2497.patch

Now only the TestDistributedSearch is still failing with a strange date
failure. I'll dig.

Move Solr to new NumericField stored field impl of LUCENE-3065
--

Key: SOLR-2497
URL: https://issues.apache.org/jira/browse/SOLR-2497
Project: Solr
Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.2, 4.0

Attachments: SOLR-2497.patch, SOLR-2497.patch

This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField
& Co would use NumericField for indexing and reading stored fields. To enable
this, some missing changes in Solr's internals (Field -> Fieldable) need to be
done. Also, some backwards-compatible stored fields parsing is needed to read
pre-3.2 indexes without reindexing (as the format changed a little bit and
Document.getFieldable returns NumericField instances now).
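
The backwards-compatible reading could look roughly like the sketch below: decide by the runtime type of what comes back from the index. It is plain Java with hypothetical names, and the big-endian 4/8-byte layout for old values is an assumption, not Solr's actual legacy encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Solr's TrieField code.
final class StoredValueCompatSketch {

    static long toLong(Object stored) {
        if (stored instanceof Number) {
            // New-style index: the stored field already carries a number.
            return ((Number) stored).longValue();
        }
        if (stored instanceof byte[]) {
            // Old-style index: raw bytes that the caller must decode itself.
            byte[] raw = (byte[]) stored;
            if (raw.length == 4) {
                return ByteBuffer.wrap(raw).getInt();
            }
            if (raw.length == 8) {
                return ByteBuffer.wrap(raw).getLong();
            }
            throw new IllegalArgumentException("unexpected length: " + raw.length);
        }
        if (stored instanceof String) {
            // Text-only stored field.
            return Long.parseLong((String) stored);
        }
        throw new IllegalArgumentException("unsupported stored value: " + stored);
    }
}
```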


[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running

2011-05-05 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029408#comment-13029408
 ] 

Earwin Burrfoot commented on LUCENE-2904:
-

Ok, I'm wrong. We need both a list of all SIs and eligible SIs for 
calculations. But that should be handled through API change, not a new public 
method on IW.

 non-contiguous LogMergePolicy should be careful to not select merges already 
 running
 

 Key: LUCENE-2904
 URL: https://issues.apache.org/jira/browse/LUCENE-2904
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2904.patch


 Now that LogMP can do non-contiguous merges, the fact that it disregards 
 which segments are already being merged is more problematic since it could 
 result in it returning conflicting merges and thus failing to run multiple 
 merges concurrently.
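
 A minimal sketch of the idea, not the code in LUCENE-2904.patch: before
 picking candidates, a merge policy could subtract the segments that already
 belong to a running merge. The class and method names are hypothetical, and
 segments are plain names here rather than Lucene's SegmentInfo objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EligibleSegments {
    /**
     * Returns the segments that may still be selected for a new merge:
     * every entry of allSegments that is not part of a running merge.
     * The order of allSegments is preserved.
     */
    public static List<String> eligible(List<String> allSegments, Set<String> merging) {
        List<String> result = new ArrayList<>();
        for (String segment : allSegments) {
            if (!merging.contains(segment)) {
                result.add(segment);
            }
        }
        return result;
    }
}
```

 A policy that still needs the full list for its size calculations can keep
 both lists and only draw merge candidates from the eligible one.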


RE: [jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-05 Thread Uwe Schindler

Sorry, I did not want to delete this one, my huper duper browser got totally 
confused and disturbed...

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Uwe Schindler (JIRA) [mailto:j...@apache.org]
 Sent: Thursday, May 05, 2011 6:13 PM
 To: dev@lucene.apache.org
 Subject: [jira] [Updated] (LUCENE-3065) NumericField should be stored in
 binary format in index (matching Solr's format)
 
 
  [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
 
 Uwe Schindler updated LUCENE-3065:
 --
 
 Comment: was deleted
 
 (was: Ideally this could be done with the schema-like approach of one of the
 GSoC projects?
 
 We already discussed about that: We can use the FieldsReader/FieldsWriter
 type flag (which currently says, binary/text and compressed (unused now))
 in the index file format to mark a field as NumericField. In that case,
 Document.getField() would return the NumericField instance.
 
 For Lucene backwards we should still support creating text-only fields.
 
 The new binary format would also be compatible with solr, as on getField,
 Solr would get a NumericField and can decide using instanceof what to do.
 Old Solr indexes without the NumericField marker flag would return as
 byte[], in which case, solr would do the decoding.
 
 For storing on index side, Solr could move to NumericField completely (I don't
 like the current approach using NumericTokenStream and to/fromInternal
 wrappers around conventional Field).)
 

[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029410#comment-13029410
 ] 

Uwe Schindler commented on LUCENE-3065:
---

Sorry my browser or JIRA deleted wrong comments, so I removed one from me and 
one from Mike :( - Sorry.

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.
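
 A minimal sketch of the "spare bits" idea, assuming a made-up flag layout and
 plain big-endian encoding via ByteBuffer. The constants and the class name
 are hypothetical; this is neither the layout used by the patch nor
 necessarily Solr's TrieField byte format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NumericStoredFieldSketch {
    // Hypothetical layout: three otherwise unused flag bits record the
    // numeric type; 0 in those bits means "ordinary, non-numeric field".
    public static final int NUMERIC_SHIFT = 3;
    public static final int NUMERIC_MASK = 0x07 << NUMERIC_SHIFT;
    public static final int NUMERIC_INT = 1 << NUMERIC_SHIFT;
    public static final int NUMERIC_LONG = 2 << NUMERIC_SHIFT;

    public static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Decodes a stored value according to the numeric bits in flags. */
    public static Object decode(int flags, byte[] stored) {
        switch (flags & NUMERIC_MASK) {
            case NUMERIC_INT:
                return ByteBuffer.wrap(stored).getInt();
            case NUMERIC_LONG:
                return ByteBuffer.wrap(stored).getLong();
            default:
                // No numeric bits set: behave like an old index and hand back text.
                return new String(stored, StandardCharsets.UTF_8);
        }
    }
}
```

 With the type recorded in the flags, a reader can return a number instead of
 a string, while fields written without the bits still decode as text.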


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

[
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Uwe Schindler updated LUCENE-3065:
--

Comment: was deleted

(was: MoreLikeThis problem solved, it was as I said. The test included a
TrieInt field in the similarity fields, so it was used to calculate
similarity. As the TrieField was invisible to MLT in previous Solr, this had
no effect.
By the way: There is a commented-out part explicitly with the MLT field, but I
don't understand it. It seems that it was never understood/supported.
Now, all numeric fields should work with MLT.

Now only the TestDistributedSearch is still failing with a strange date
failure. I'll dig.)

NumericField should be stored in binary format in index (matching Solr's
format)

Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch,
LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029412#comment-13029412
 ] 

Uwe Schindler commented on LUCENE-3065:
---

Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's 
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's 
flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back, 
when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did 
before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that 
must now be fixed to handle the fact that a field can come back as 
NumericField?  Anyone know where...?)



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

[
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029412#comment-13029412
]

Uwe Schindler edited comment on LUCENE-3065 at 5/5/11 4:22 PM:
---

Revert of deletion of Mike's first comment (sorry)

{quote}
Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's
flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back,
when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did
before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that
must now be fixed to handle the fact that a field can come back as
NumericField? Anyone know where...?)
{quote}

was (Author: thetaphi):
Patch against 3.x.

I added a random test case to verify we now get the right NumericField back,
when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did
before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that
must now be fixed to handle the fact that a field can come back as
NumericField? Anyone know where...?)

NumericField should be stored in binary format in index (matching Solr's
format)

Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch,
LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029413#comment-13029413
 ] 

Olivier Favre commented on LUCENE-3071:
---

{{ant clean test}} did it for me, thanks!

As for the failing tests, it is because of the {{finalOffset}} that I set to 
{{path.length()}}.
I'm not sure whether I should use {{path.length()}}, as my tokens don't go up 
to there when using the reverse mode.
When I take a look at the end() function, I think that I should set it to 
the end of the string. But I can't see it in the javadoc.
If the purpose of the {{finalOffset}} parameter in 
{{assertTokenStreamContents()}} is to make sure of the {{endOffset}} of the 
last term, then I should not use {{path.length()}} blindly when using reverse 
and skip.

Can you help me with the purpose of {{finalOffset}}? Or can I simply skip it in 
my tests (they are working if I skip it)?

Thanks

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed 
 way (useful for faceted search against URLs):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
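
 The two behaviours described above can be sketched in plain Java. This is
 only an illustration under simplifying assumptions, not the code in
 LUCENE-3071.patch: the class and method names are hypothetical, empty
 components (such as the one before a leading delimiter) are dropped, and no
 token offsets are tracked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PathSplitSketch {
    /** Splits path on delimiter and drops empty components. */
    private static List<String> components(String path, char delimiter) {
        List<String> parts = new ArrayList<>();
        for (String part : path.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    /** Forward mode: skip the first `skip` components, then emit growing prefixes. */
    public static List<String> forward(String path, char delimiter, int skip) {
        List<String> parts = components(path, delimiter);
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (current.length() > 0) {
                current.append(delimiter);
            }
            current.append(parts.get(i));
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Reverse mode: skip the last `skip` components, then emit shrinking suffixes. */
    public static List<String> reverse(String path, char delimiter, int skip) {
        List<String> parts = components(path, delimiter);
        int end = Math.max(0, parts.size() - skip);
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < end; start++) {
            tokens.add(String.join(String.valueOf(delimiter), parts.subList(start, end)));
        }
        return tokens;
    }
}
```

 Here reverse("www.site.com", '.', 0) yields www.site.com, site.com, com, and
 forward("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4) yields
 INTERESTING, INTERESTING/PART.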


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029416#comment-13029416
 ] 

Robert Muir commented on LUCENE-3071:
-

bq. Can you help me with the purpose of finalOffset? Or can I simply skip it in 
my tests (they are working if I skip it)?

The finalOffset is supposed to be the offset of the entire document, this is 
useful so that offsets are correct on multivalued fields.

Example multivalued field foo with two values:
bar  <-- this one ends with a space
baz

With a whitespace tokenizer, value 1 will have a single token bar with 
startOffset=0, endOffset=3. But, finalOffset needs to be 4 (essentially however 
many chars you read in from the Reader)

This way, the offsets will then accumulate correctly for baz.
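
A small plain-Java sketch of that accumulation, assuming whitespace
tokenization and ignoring any extra offset gap an analyzer may add between
values. The class and method names are hypothetical; the point is only that
the base offset of each value grows by the number of characters read, not by
the endOffset of the last token.

```java
public class FinalOffsetSketch {
    /**
     * Returns, for each value of a multivalued field, the start offset of its
     * first whitespace-separated token, measured across all values.
     */
    public static int[] firstTokenStartOffsets(String[] values) {
        int[] starts = new int[values.length];
        int base = 0;
        for (int i = 0; i < values.length; i++) {
            String value = values[i];
            int firstNonSpace = 0;
            while (firstNonSpace < value.length()
                    && Character.isWhitespace(value.charAt(firstNonSpace))) {
                firstNonSpace++;
            }
            starts[i] = base + firstNonSpace;
            // finalOffset of this value: every char read, trailing space included.
            base += value.length();
        }
        return starts;
    }
}
```

For the values "bar " and "baz" this gives start offsets 0 and 4. Had the
base advanced by the last token's endOffset (3) instead, baz would start at 3.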



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-05 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029421#comment-13029421
 ] 

Earwin Burrfoot commented on LUCENE-3065:
-

It's sad NumericFields are hardbaked into index format.

Eg - I have some fields that are similar to Numeric in that they are 
'stringified' binary structures, and they can't become first-class in the same 
manner as Numeric.


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029427#comment-13029427
 ] 

Uwe Schindler commented on LUCENE-3065:
---

Earwin: The long-term plan for flexible indexing is to also make stored fields 
flexible. For now it's not possible, so NumericFields are handled separately. In 
the future, this might be a stored fields codec.


[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-05 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029426#comment-13029426
 ] 

Jan Høydahl commented on SOLR-2493:
---

+1

@Michael, agree on this. But instead of relying on a monolithic solrconfig.xml 
file or .yml file, isn't it better to re-design configuration to fit a 
path/node concept more fine-grained (like ZK nodes)? It doesn't feel quite 
right to store solrconfig.xml and schema.xml as a huge string in the SolrCloud 
ZK schema. It would be better to have stuff like 
/solr/configs/configA/general/abortOnConfigurationError=false as a separate 
config node. Likewise /solr/configs/configA/schema/types/text_en to define 
fieldType text_en. The config concept won't need to be bound to ZK either. 
There could be pluggable backend implementations, where one could read/write 
the existing XML formats.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I'm putting this as blocker as I think this is a serious issue that should 
 be addressed asap with a release. With the current code this is nowhere near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there are generally 
 3 query parsers created and each of them will parse the xml node in config, 
 which involves creating an instance of XPath, and behind the scenes the usual 
 factory finder pattern kicks in within the xml parser and does a loadClass.
 The stack is typically:
     at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
     at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
     at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
     at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:100)
     at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
     at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
     at org.apache.solr.core.Config.getNode(Config.java:230)
     at org.apache.solr.core.Config.getVal(Config.java:256)
     at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
     at org.apache.solr.search.SolrQueryParser.<init>(SolrQueryParser.java:76)
     at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable beyond 2000 qps.
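
 The difference between the two access paths can be sketched with the JDK's
 own XML APIs. This is a simplified stand-in, not Solr's Config class: the
 class name, the evaluation counter, and the /config/luceneMatchVersion path
 are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CachedConfigSketch {
    private final Document doc;
    private int xpathEvaluations = 0;

    /** Resolved once at construction; the hot path only reads this field. */
    public final String luceneMatchVersion;

    public CachedConfigSketch(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse config", e);
        }
        luceneMatchVersion = evaluate("/config/luceneMatchVersion");
    }

    /** Slow path: creates a new XPath and evaluates it against the DOM on every call. */
    public String evaluate(String path) {
        xpathEvaluations++;
        try {
            return XPathFactory.newInstance().newXPath().evaluate(path, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + path, e);
        }
    }

    public int xpathEvaluations() {
        return xpathEvaluations;
    }
}
```

 Reading the cached field per request costs a field access; calling
 evaluate() per request repeats the factory lookup and XPath evaluation that
 show up in the stack above.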


[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)


[ 
https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029430#comment-13029430
 ] 

Michael McCandless commented on LUCENE-1877:


Uggh, sorry about that Greg.  Somehow this obviously very important note 
was lost in this issue.

Can you describe how you use NFS and Lucene?  Is there a single machine writing 
to the NFS dir, or more than one?

 Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)
 --

 Key: LUCENE-1877
 URL: https://issues.apache.org/jira/browse/LUCENE-1877
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs
Reporter: Mark Miller
Assignee: Uwe Schindler
 Fix For: 2.9

 Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, 
 LUCENE-1877.patch


 A user requested we add a note in IndexWriter alerting users to the availability of 
 NativeFSLockFactory (allowing you to avoid retaining locks on abnormal JVM 
 exit). Seems reasonable to me - we want users to be able to easily stumble 
 upon this class. The below code looks like a good spot to add a note - could 
 also improve what's there a bit - opening an IndexWriter does not necessarily 
 create a lock file - that would depend on the LockFactory used.
 {code}  <p>Opening an <code>IndexWriter</code> creates a lock file for the 
 directory in use. Trying to open
   another <code>IndexWriter</code> on the same directory will lead to a
   {@link LockObtainFailedException}. The {@link LockObtainFailedException}
   is also thrown if an IndexReader on the same directory is used to delete 
 documents
   from the index.</p>{code}
 Anyone remember why NativeFSLockFactory is not the default over 
 SimpleFSLockFactory?
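
 The two locking styles can be contrasted with plain java.nio calls. This is
 a sketch of the general techniques only, with hypothetical class and method
 names; it is not how Lucene's lock factories are implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockSketch {
    /**
     * "Simple" style: the lock is the existence of a file. If the JVM dies
     * without deleting it, the stale file keeps blocking later writers.
     */
    public static boolean obtainSimple(Path lockFile) {
        try {
            Files.createFile(lockFile); // atomic: fails if the file already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * "Native" style: an OS-level lock on the file, which the OS drops when
     * the process exits. Returns null if another process holds the lock.
     * The caller should release the lock and close lock.channel() when done.
     */
    public static FileLock obtainNative(Path lockFile) {
        try {
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

 With the simple style a leftover file must be removed by hand after a crash,
 which is the situation the requested note is meant to help users avoid.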


Re: modularization discussion

2011-05-05 Thread Jason Rutherglen

+1 to Mike's proposal here.  Each of these could easily be
patches/issues.  The top ones would probably be the basics, eg,
faceting and schemas.

As the easiest short term solution for allowing other systems to use
Solr or its features, it would be great if a 'committer' responded to
SOLR-1431.  Eg, it's assigned to someone and they should respond.  The
issue should probably be unassigned or assigned to someone else.

Lucene is a great project that many people rely on.  Refactoring Solr
will help the project by allowing more people to do more things with
Lucene.  That's an overall 'good' thing for everyone.  Also have we
lost the ability to execute distributed queries in Lucene?

Taking a step back I'd ask some of the owners of the projects
mentioned why they do not simply submit patches directly to the Apache
Lucene project as opposed to starting their own external projects?

On Tue, May 3, 2011 at 9:49 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Isn't our end goal here a bunch of well factored search modules?  Ie,
 fast forward a year or two and I think we should have modules like
 these:

  * Faceting

  * Highlighting

  * Suggest (good patch is on LUCENE-2995)

  * Schema

  * Query impls

  * Query parsers

  * Analyzers (good progress here already, thanks Robert!),
    incl. factories/XML configuration (still need this)

  * Database import (DIH)

  * Web app

  * Distribution/replication

  * Doc set representations

  * Collapse/grouping

  * Caches

  * Similarity/scoring impls (BM25, etc.)

  * Codecs

  * Joins

  * Lucene core

 In this future, much of this code came from what is now Solr and
 Lucene, but we should freely and aggressively poach from other
 projects when appropriate (and license/provenance is OK).

 I keep seeing all these cool compressed int set projects popping
 up... surely these are useful for us.  Solr poached a doc set impl
 from Nutch; probably there's other stuff to poach from Nutch, Mahout,
 etc.

 Katta's doing something sweet with distribution/replication; let's
 poach & merge w/ Solr's approach.  There are various facet impls out
 there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
 with Solr's.

 Elastic Search has lots of cool stuff, too, under ASL2.

 All these external open-source projects are fair game for poaching and
 refactoring into shared modules, along with what is now Solr and
 Lucene sources.

 In this ideal future, Solr becomes the bundling and default/example
 configuration of the Web App and other modules, much like how the
 various Linux distros bundle different stuff together around the Linux
 kernel.  And if you are an advanced app and don't need the webapp
 part, you can cherry pick the huper duper modules you do need and
 directly embed them into your app.

 Isn't this the future we are working towards?

 Mike

 http://blog.mikemccandless.com


[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.


[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029433#comment-13029433
 ] 

Michael McCandless commented on SOLR-2493:
--

Jan, I don't have any experience with ZooKeeper, but that sounds neat :)


[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running


[ 
https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029435#comment-13029435
 ] 

Michael McCandless commented on LUCENE-2904:


Earwin, that sounds great (changing current API instead of new IW method), I 
think?  Can you open a new issue?  Thanks.

 non-contiguous LogMergePolicy should be careful to not select merges already 
 running
 

 Key: LUCENE-2904
 URL: https://issues.apache.org/jira/browse/LUCENE-2904
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2904.patch


 Now that LogMP can do non-contiguous merges, the fact that it disregards 
 which segments are already being merged is more problematic since it could 
 result in it returning conflicting merges and thus failing to run multiple 
 merges concurrently.
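
As a rough illustration of the problem described above, a merge selector can filter out segments that already belong to a running merge before it picks candidates. This is only a sketch with hypothetical names (MergeSelectionSketch, pickMerge) and segment names as plain strings; it is not LogMergePolicy's code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; real merge policies also weigh segment sizes and levels.
final class MergeSelectionSketch {

    // Returns mergeFactor segments to merge, never choosing one that is
    // already part of a running merge. Returns an empty list if too few are free.
    static List<String> pickMerge(List<String> segments, Set<String> merging, int mergeFactor) {
        List<String> eligible = new ArrayList<>();
        for (String segment : segments) {
            if (!merging.contains(segment)) {
                eligible.add(segment);
            }
        }
        if (eligible.size() < mergeFactor) {
            return Collections.emptyList();
        }
        // Non-contiguous selection: the chosen segments need not be adjacent.
        return new ArrayList<>(eligible.subList(0, mergeFactor));
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("_0", "_1", "_2", "_3", "_4", "_5");
        Set<String> merging = new HashSet<>(Arrays.asList("_1", "_2"));
        System.out.println(pickMerge(segments, merging, 3));
    }
}
```

Without the filter, a second call could return a merge that overlaps the first one, which is the conflict the issue describes.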


[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: LUCENE-3071.patch

I fixed my code accordingly.
Tests run fine now.

Ready to ship?

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
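
The two behaviours requested above can be sketched with plain string handling. The class and method names below (PathSplitSketch, forward, reversed) are hypothetical and this is not the patch's tokenizer; in particular, the sketch drops a leading delimiter, which the real tokenizer may keep:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the requested splits; not Lucene code.
final class PathSplitSketch {

    // Splits on the delimiter and drops empty parts (e.g. from a leading '/').
    static List<String> parts(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        for (String p : input.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    // Forward mode: skip the first 'skip' parts (assumed non-negative),
    // then emit growing prefixes.
    static List<String> forward(String input, char delimiter, int skip) {
        List<String> parts = parts(input, delimiter);
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (sb.length() > 0) {
                sb.append(delimiter);
            }
            sb.append(parts.get(i));
            tokens.add(sb.toString());
        }
        return tokens;
    }

    // Reversed mode: skip the last 'skip' parts, then emit shrinking suffixes.
    static List<String> reversed(String input, char delimiter, int skip) {
        List<String> parts = parts(input, delimiter);
        int end = Math.max(0, parts.size() - skip);
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < end; start++) {
            tokens.add(String.join(String.valueOf(delimiter), parts.subList(start, end)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(reversed("www.site.com", '.', 0));
        System.out.println(forward("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4));
    }
}
```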


[jira] [Issue Comment Edited] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029444#comment-13029444
 ] 

Olivier Favre edited comment on LUCENE-3071 at 5/5/11 5:19 PM:
---

Fixed patch attached.

Tests run fine now.

Ready to ship?

  was (Author: ofavre):
I fixed my code accordingly.
Tests run fine now.

Ready to ship?
  

Re: Bug in boilerpipe 1.1.0 referenced from solr-cell

2011-05-05 Thread Otis Gospodnetic

Andrew, you can get to the boilerpipe author's email address on 
http://code.google.com/p/boilerpipe/ 


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Andrew Bisson andrew.bis...@gossinteractive.com
To: dev@lucene.apache.org
Sent: Wed, May 4, 2011 7:48:07 AM
Subject: Bug in boilerpipe 1.1.0 referenced from solr-cell

  
Solr-cell references boilerpipe 1.1.0 which contains a modified version of 
nekohtml 1.9.9.
It seems that this version of nekohtml is broken in that it references the 
class 
LostText without including it.
 
The unmodified release of nekohtml 1.9.9 does not reference or include this 
class and the latest release, 1.9.14, both references and includes it.
 
As a result, our application has been broken because it independently uses 
nekohtml and is now finding a broken version of the jar.
 
How should I report this issue as it is not directly a bug in solr?
Andrew Le Couteur Bisson
Senior Software Engineer
GOSS Interactive

t: 0844 880 3637
f:  0844 880 3638
e:  andrew.bis...@gossinteractive.com
w:www.gossinteractive.com
 

[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029471#comment-13029471
 ] 

Robert Muir commented on LUCENE-3071:
-

Hi Olivier, at a glance the patch looks really great to me, thanks!

I took a quick look (not in enough detail) but had these thoughts, neither of 
which I think is really mandatory for this feature, just ideas:
* do you think it would be cleaner if we made it a separate tokenizer? (e.g. 
ReversePath). The main logic of the tokenizer seems to be completely split, 
depending on whether you are reversing or not.
* I think it's possible in the future we could simplify the way finalOffset is 
being tracked, such that we just accumulate it on every read(), and then 
correctOffset a single time in end(). (I don't think this has much to do with 
your patch, just looking at the code in general).



[jira] [Resolved] (LUCENE-2918) IndexWriter should prune 100% deleted segs even in the NRT case


 [ 
https://issues.apache.org/jira/browse/LUCENE-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2918.


Resolution: Fixed

 IndexWriter should prune 100% deleted segs even in the NRT case
 ---

 Key: LUCENE-2918
 URL: https://issues.apache.org/jira/browse/LUCENE-2918
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2918.patch


 We now prune 100% deleted segs on commit from IW or IR (LUCENE-2010),
 but this isn't quite aggressive enough, because in the NRT case you
 rarely call commit.
 Instead, the moment we delete the last doc of a segment, it should be
 pruned from the in-memory segmentInfos.  This way, if you open an NRT
 reader, or a merge kicks off, or commit is called, the 100% deleted
 segment is already gone.
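
A small sketch of the behaviour described above: the in-memory segment list drops a segment at the moment its last live document is deleted, instead of waiting for commit. The names (SegmentPruningSketch, deleteOneDoc) are hypothetical and the bookkeeping is far simpler than IndexWriter's segmentInfos:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; tracks only a live-document count per segment.
final class SegmentPruningSketch {
    private final Map<String, Integer> liveDocCounts = new LinkedHashMap<>();

    void addSegment(String name, int docCount) {
        liveDocCounts.put(name, docCount);
    }

    // Deletes one document from the segment; prunes the segment right away
    // once it has no live documents left.
    void deleteOneDoc(String segment) {
        Integer live = liveDocCounts.get(segment);
        if (live == null) {
            return; // unknown or already pruned
        }
        if (live <= 1) {
            liveDocCounts.remove(segment);
        } else {
            liveDocCounts.put(segment, live - 1);
        }
    }

    // What a newly opened reader, a merge, or a commit would see.
    List<String> segmentNames() {
        return new ArrayList<>(liveDocCounts.keySet());
    }

    public static void main(String[] args) {
        SegmentPruningSketch infos = new SegmentPruningSketch();
        infos.addSegment("_0", 2);
        infos.addSegment("_1", 1);
        infos.deleteOneDoc("_1");
        System.out.println(infos.segmentNames());
    }
}
```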


Re: jira issues falling off the radar -- Next JIRA version

2011-05-05 Thread Chris Hostetter


:  We should definitely kill of Next ... i would suggest just removing it, 
:  and not bulk applying a new version (there is no requirement that issues 
:  have a version)
... 
: Based on that, I think it would be irresponsible to just delete Next 
: because any issues assigned to this version on the basis of that 
: description (like SOLR-2191) is going to be dropped on the floor.

Of course you're right ... i was thinking about it from a what's the 
minimum that must be done in order to eliminate a version, but that 
doesn't mean it would leave those issues in a good state.

Doing a little more reading about Jira version management, I realized that 
Jira allows Versions to be merged

I suggest we marge Next into 3.2 ...

http://confluence.atlassian.com/display/JIRA/Managing+Versions#ManagingVersions-Mergingmultipleversions


...objections?



-Hoss


Re: jira issues falling off the radar -- Next JIRA version

2011-05-05 Thread Michael McCandless

On Thu, May 5, 2011 at 3:24 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 I suggest we marge Next into 3.2 ...

+1

Mike

http://blog.mikemccandless.com


Re: jira issues falling off the radar -- Next JIRA version

2011-05-05 Thread Smiley, David W.

Marge away ;-)

On May 5, 2011, at 3:24 PM, Chris Hostetter wrote:

 I suggest we marge Next into 3.2 ...
 
 http://confluence.atlassian.com/display/JIRA/Managing+Versions#ManagingVersions-Mergingmultipleversions
 
 
   ...objections?



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7757 - Failure

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7757/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness

Error Message:
 flushingQueue: DWDQ: [ generation: 9 ] currentqueue: DWDQ: [ generation: 10 ] 
perThread queue: DWDQ: [ generation: 0 ] numDocsInRam: 3

Stack Trace:
junit.framework.AssertionFailedError:  flushingQueue: DWDQ: [ generation: 9 ] 
currentqueue: DWDQ: [ generation: 10 ] perThread queue: DWDQ: [ generation: 0 ] 
numDocsInRam: 3
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)
at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:326)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:500)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2622)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2599)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1051)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1015)
at org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness(TestFlushByRamOrCountsPolicy.java:276)




Build Log (for compile errors):
[...truncated 3370 lines...]




Re: jira issues falling off the radar -- Next JIRA version

2011-05-05 Thread Mark Miller

+1 - next should be nuked, the issues should simply be plopped into the next 
likely release and dealt with (done, moved, pushed) before release.



- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org







[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Ryan McKinley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029533#comment-13029533
 ] 

Ryan McKinley commented on LUCENE-3071:
---

bq. do you think it would be cleaner if we made it a separate tokenizer?

I think it's a toss-up -- keeping them together makes one less factory in 
solr (not much of an argument) and the other three parameters 
(delimiter,replacement,skip) are nice to keep consistent.


 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, 
 ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h


[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated LUCENE-3071:
--

Attachment: LUCENE-3071.patch

updated patch that includes solr factory.

Robert if this looks ok to you, i will go ahead and commit


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029540#comment-13029540
 ] 

Robert Muir commented on LUCENE-3071:
-

bq. keeping them together makes one less factory in solr (not much of an 
argument) 

I don't understand this?

You can still have one solr factory, if reverse=true it creates ReverseXXX...



Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 7757 - Failure

the actual exception we are tripping here is

 java.lang.RuntimeException: java.lang.AssertionError
[junit] at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:328)
[junit] Caused by: java.lang.AssertionError
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.setFlushPending(DocumentsWriterFlushControl.java:169)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.internalTryCheckOutForFlush(DocumentsWriterFlushControl.java:202)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:333)
[junit] at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:500)
[junit] at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2622)
[junit] at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2599)
[junit] at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2465)
[junit] at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2538)
[junit] at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2520)
[junit] at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2504)
[junit] at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:326)
[junit] *** Thread: Thread-106 ***

I will take care of it tomorrow...





[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors FSDir.open)

2011-05-05 Thread Greg Tarr (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029599#comment-13029599
 ] 

Greg Tarr commented on LUCENE-1877:
---

Instances of lucene run on machines with the indexes hosted remotely on a SAN 
with access through a fileserver. We've now changed our implementation to 
SimpleFSLockFactory in the hope this will lead to the write.lock files behaving 
properly.

 Use NativeFSLockFactory as default for new API (direct ctors  FSDir.open)
 --

 Key: LUCENE-1877
 URL: https://issues.apache.org/jira/browse/LUCENE-1877
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs
Reporter: Mark Miller
Assignee: Uwe Schindler
 Fix For: 2.9

 Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, 
 LUCENE-1877.patch


 A user requested we add a note in IndexWriter alerting users to the availability of 
 NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm 
 exit). Seems reasonable to me - we want users to be able to easily stumble 
 upon this class. The below code looks like a good spot to add a note - could 
 also improve what's there a bit - opening an IndexWriter does not necessarily 
 create a lock file - that would depend on the LockFactory used.
 {code}  <p>Opening an <code>IndexWriter</code> creates a lock file for the 
 directory in use. Trying to open
   another <code>IndexWriter</code> on the same directory will lead to a
   {@link LockObtainFailedException}. The {@link LockObtainFailedException}
   is also thrown if an IndexReader on the same directory is used to delete 
 documents
   from the index.</p>{code}
 Anyone remember why NativeFSLockFactory is not the default over 
 SimpleFSLockFactory?
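
To make the difference between the two locking styles concrete, here is a rough sketch using only JDK calls. LockStyles and its methods are hypothetical names, not Lucene's lock factories. It assumes a marker-file lock is "held" while the file exists, and that an OS-level lock is released when its channel is closed (as it would be on process exit). Behaviour on network filesystems may differ and is not covered here:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not Lucene's SimpleFSLockFactory or NativeFSLockFactory.
final class LockStyles {

    // Marker-file style: succeeds only if the file did not exist yet.
    static boolean tryMarkerLock(Path lockFile) {
        try {
            return lockFile.toFile().createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // OS-level style: returns the open channel holding the lock, or null if
    // another process holds it. Closing the channel releases the lock.
    static FileChannel tryNativeLock(Path lockFile) {
        try {
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
                return null;
            }
            return channel;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {markerFirst, markerAfterCrash, nativeFirst, nativeAfterRelease}.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("lock-sketch");

            Path marker = dir.resolve("marker.lock");
            boolean markerFirst = tryMarkerLock(marker);
            // Simulated abnormal exit: nobody deletes the marker file.
            boolean markerAfterCrash = tryMarkerLock(marker);

            Path nativeFile = dir.resolve("native.lock");
            FileChannel held = tryNativeLock(nativeFile);
            boolean nativeFirst = held != null;
            if (held != null) {
                held.close(); // stands in for the OS releasing the lock on exit
            }
            FileChannel again = tryNativeLock(nativeFile);
            boolean nativeAfterRelease = again != null;
            if (again != null) {
                again.close();
            }
            return new boolean[] {markerFirst, markerAfterCrash, nativeFirst, nativeAfterRelease};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The stale marker file is what blocks the second attempt in the marker-file case; the leftover file in the OS-level case does not, because only the lock on it matters.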


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029607#comment-13029607
 ] 

Hoss Man commented on LUCENE-3071:
--

bq. You can still have one solr factory, if reverse=true it creates 
ReverseXXX...

right ... if it makes the code cleaner to have two distinct Tokenizer impls, 
they can still share one factory.
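
A sketch of the arrangement discussed here: two separate implementations behind one factory that switches on a reverse argument. All names (PathSplitterFactorySketch, ForwardPathSketch, ReversePathSketch) are hypothetical, the delimiters are hard-coded for brevity, and none of this is the Solr factory from the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one factory, two distinct implementations.
final class PathSplitterFactorySketch {

    interface PathSplitter {
        List<String> split(String input);
    }

    // Emits growing prefixes of a '/'-delimited path.
    static final class ForwardPathSketch implements PathSplitter {
        public List<String> split(String path) {
            List<String> out = new ArrayList<>();
            for (int i = path.indexOf('/', 1); i >= 0; i = path.indexOf('/', i + 1)) {
                out.add(path.substring(0, i));
            }
            out.add(path);
            return out;
        }
    }

    // Emits shrinking suffixes of a '.'-delimited host name.
    static final class ReversePathSketch implements PathSplitter {
        public List<String> split(String host) {
            List<String> out = new ArrayList<>();
            out.add(host);
            for (int i = host.indexOf('.'); i >= 0; i = host.indexOf('.', i + 1)) {
                out.add(host.substring(i + 1));
            }
            return out;
        }
    }

    // The single entry point: "reverse=true" selects the reverse implementation.
    static PathSplitter create(Map<String, String> args) {
        boolean reverse = Boolean.parseBoolean(args.get("reverse"));
        return reverse ? new ReversePathSketch() : new ForwardPathSketch();
    }

    public static void main(String[] args) {
        System.out.println(create(Collections.singletonMap("reverse", "true")).split("www.site.com"));
        System.out.println(create(Collections.<String, String>emptyMap()).split("/a/b/c"));
    }
}
```

Each implementation stays free of the other's branching, while configuration still sees a single factory.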


[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7762 - Failure

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7762/

1 tests failed.
FAILED:  org.apache.lucene.util.TestStringIntern.Monitor file 
(/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/1/junitvmwatcher8452078351423411177.properties)
 missing, location not writable, testcase not started or mixing ant versions?

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)




Build Log (for compile errors):
[...truncated 53 lines...]




Re: Improvements to the maven build

2011-05-05 Thread Ryan McKinley


 I agree in principle, but again, I'll continue to use my own judgment ...


This is always good policy!

I was mostly reacting to sending a patch to the mailing list.

ryan


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

Updated patch with some improvements:
- NumericField now lazy inits the NumericTokenStream only when 
tokenStreamValue() is called for the first time. This speeds up stored fields 
reading, as the TokenStream is generally not needed in that case.
- I currently don't like the instanceof chains in FieldsWriter and this lazy 
init code. Maybe NumericField and NumericTokenStream should define an enum type 
for the value so you can call NumericField.getValueType() - does anybody have a 
better idea?
- Improved JavaDocs for NumericField to reflect the new stored fields format
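
The two ideas in this update (lazy creation of the token stream, and an enum describing the value type) could look roughly like the sketch below. LazyNumericFieldSketch, ValueType and storedSizeInBytes are hypothetical names, and a StringBuilder stands in for the token stream; this is not the patch's NumericField:

```java
// Hypothetical sketch; the StringBuilder stands in for a costly token stream.
final class LazyNumericFieldSketch {

    enum ValueType { INT, LONG, FLOAT, DOUBLE }

    private final Number value;
    private final ValueType valueType;
    private StringBuilder tokenStream; // created on first request only
    private int streamsCreated;        // for illustration

    LazyNumericFieldSketch(Number value) {
        this.value = value;
        this.valueType = typeOf(value);
    }

    // The instanceof chain runs once, here, instead of at every call site.
    private static ValueType typeOf(Number n) {
        if (n instanceof Integer) return ValueType.INT;
        if (n instanceof Long) return ValueType.LONG;
        if (n instanceof Float) return ValueType.FLOAT;
        if (n instanceof Double) return ValueType.DOUBLE;
        throw new IllegalArgumentException("unsupported numeric type: " + n.getClass());
    }

    ValueType getValueType() {
        return valueType;
    }

    Number numericValue() {
        return value;
    }

    // Reading stored fields never calls this, so it never pays for the stream.
    Object tokenStreamValue() {
        if (tokenStream == null) {
            tokenStream = new StringBuilder(value.toString());
            streamsCreated++;
        }
        return tokenStream;
    }

    int streamsCreated() {
        return streamsCreated;
    }

    // A writer can switch on the enum rather than chain instanceof checks.
    static int storedSizeInBytes(LazyNumericFieldSketch field) {
        switch (field.getValueType()) {
            case INT:
            case FLOAT:
                return 4;
            case LONG:
            case DOUBLE:
                return 8;
            default:
                throw new AssertionError(field.getValueType());
        }
    }

    public static void main(String[] args) {
        LazyNumericFieldSketch field = new LazyNumericFieldSketch(42L);
        System.out.println(field.getValueType() + ", streams created: " + field.streamsCreated());
    }
}
```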

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.
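
As an illustration of the idea above (a spare flag bit marks the field as numeric, and the value is written in binary rather than as text), here is a minimal sketch. The flag value, the one-byte header and the class name StoredFieldBitsSketch are assumptions made for the example; this is not Lucene's or Solr's actual stored-field format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: 1 flag byte, then either 8 bytes (long) or UTF-8 text.
final class StoredFieldBitsSketch {
    static final int FLAG_NUMERIC_LONG = 0x08; // made-up bit, for illustration

    static byte[] encode(Object value) {
        if (value instanceof Long) {
            return ByteBuffer.allocate(9)
                    .put((byte) FLAG_NUMERIC_LONG)
                    .putLong((Long) value)
                    .array();
        }
        byte[] text = value.toString().getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + text.length).put((byte) 0).put(text).array();
    }

    // With the flag recorded, the reader can hand back a Long, not a String.
    static Object decode(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        int flags = buf.get();
        if ((flags & FLAG_NUMERIC_LONG) != 0) {
            return buf.getLong();
        }
        byte[] text = new byte[buf.remaining()];
        buf.get(text);
        return new String(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asNumber = encode(1234567890123L);
        byte[] asText = encode("1234567890123");
        System.out.println(decode(asNumber).getClass().getSimpleName() + " in " + asNumber.length + " bytes");
        System.out.println(decode(asText).getClass().getSimpleName() + " in " + asText.length + " bytes");
    }
}
```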


[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065


 [ 
https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2497:


Attachment: SOLR-2497.patch

Updated patch (some improvements in TrieField converter methods).

Distributed numeric faceting (TestDistributedSearch) still fails for trie 
dates - I have no idea why!

*I need help!*

 Move Solr to new NumericField stored field impl of LUCENE-3065
 --

 Key: SOLR-2497
 URL: https://issues.apache.org/jira/browse/SOLR-2497
 Project: Solr
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2, 4.0

 Attachments: SOLR-2497.patch, SOLR-2497.patch, SOLR-2497.patch


 This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField 
 & Co would use NumericField for indexing and reading stored fields. To enable 
 this, some missing changes in Solr's internals (Field -> Fieldable) need to be 
 made. Also, some backwards-compatible stored fields parsing is needed to read 
 pre-3.2 indexes without reindexing (as the format changed a little bit and 
 Document.getFieldable returns NumericField instances now).


[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed


[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029645#comment-13029645
 ] 

Robert Muir commented on LUCENE-3071:
-

this looks great!

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, 
 LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed 
 way (useful for faceted search against URLs):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
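
A minimal sketch of the two behaviours described above, written against plain strings rather than the tokenizer API. The class and method names (PathSplitSketch, forward, reversed) are made up for illustration and are not part of the patch. As a simplification, empty parts such as the one before a leading delimiter are dropped, so this does not claim to reproduce the tokenizer's exact output for unskipped paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not the Lucene tokenizer: shows which strings the
// issue description expects for forward and reversed splitting.
final class PathSplitSketch {

    // Forward: drop the first `skip` parts, then emit growing prefixes.
    static List<String> forward(String path, char delimiter, int skip) {
        List<String> parts = parts(path, delimiter);
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (current.length() > 0) {
                current.append(delimiter);
            }
            current.append(parts.get(i));
            out.add(current.toString());
        }
        return out;
    }

    // Reversed: drop the last `skip` parts, then emit shrinking suffixes.
    static List<String> reversed(String path, char delimiter, int skip) {
        List<String> parts = parts(path, delimiter);
        int end = parts.size() - skip;
        List<String> out = new ArrayList<>();
        for (int start = 0; start < end; start++) {
            out.add(String.join(String.valueOf(delimiter), parts.subList(start, end)));
        }
        return out;
    }

    // Split on the delimiter and ignore empty parts (assumption of this sketch).
    private static List<String> parts(String path, char delimiter) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }
}
```

With these definitions, reversed("www.site.com", '.', 0) yields www.site.com, site.com, com, and forward("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4) yields INTERESTING and INTERESTING/PART, matching the two examples in the description.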


[jira] [Resolved] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges


 [ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1076.


Resolution: Fixed

Thanks Shai!

 Allow MergePolicy to select non-contiguous merges
 -

 Key: LUCENE-1076
 URL: https://issues.apache.org/jira/browse/LUCENE-1076
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1076-3x.patch, LUCENE-1076.patch, 
 LUCENE-1076.patch, LUCENE-1076.patch


 I started work on this but with LUCENE-1044 I won't make much progress
 on it for a while, so I want to checkpoint my current state/patch.
 For backwards compatibility we must leave the default MergePolicy as
 selecting contiguous merges.  This is necessary because some
 applications rely on temporal monotonicity of doc IDs, which means
 even though merges can re-number documents, the renumbering will
 always reflect the order in which the documents were added to the
 index.
 Still, for those apps that do not rely on this, we should offer a
 MergePolicy that is free to select the best merges regardless of
 whether they are contiguous.  This requires fixing IndexWriter to
 accept such a merge, and, fixing LogMergePolicy to optionally allow
 it the freedom to do so.


[jira] [Resolved] (LUCENE-2966) SegmentReader.doCommit should be sync'd; norms methods need not be sync'd


 [ 
https://issues.apache.org/jira/browse/LUCENE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2966.


Resolution: Fixed

 SegmentReader.doCommit should be sync'd; norms methods need not be sync'd
 -

 Key: LUCENE-2966
 URL: https://issues.apache.org/jira/browse/LUCENE-2966
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2966.patch


 I fixed the failure in TestNRTThreads, but in the process tripped an assert 
 because SegmentReader.doCommit isn't sync'd.
 So I sync'd it, but I don't think the norms APIs need to be sync'd -- we 
 populate norms up front and then never change them.  Un-sync'ing them is 
 important so that in the NRT case calling IW.commit doesn't block searches 
 trying to pull norms.
 Also some small code refactoring.
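
The locking split described above can be sketched roughly as follows. This is not the SegmentReader code from the patch; the class, fields and methods are hypothetical, and the sketch assumes, as the description says, that norms are populated up front and never changed afterwards.

```java
// Hypothetical stand-in for a reader: commit is synchronized, norms reads are not.
final class SegmentReaderSketch {
    // Populated once in the constructor and never modified, so reads need no lock.
    private final byte[] norms;

    // Mutable state, guarded by the instance monitor.
    private boolean dirty;
    private int commitCount;

    SegmentReaderSketch(byte[] norms) {
        this.norms = norms.clone();
    }

    // Not synchronized: a search thread is never blocked by a running commit.
    byte norm(int docId) {
        return norms[docId];
    }

    synchronized void markDirty() {
        dirty = true;
    }

    // Synchronized: only one thread at a time may flush pending changes.
    synchronized void doCommit() {
        if (dirty) {
            dirty = false;
            commitCount++;
        }
    }

    synchronized int commitCount() {
        return commitCount;
    }
}
```

Because the array is reachable only through a final field set in the constructor and is never written afterwards, threads calling norm() see the populated values without taking the lock that doCommit() holds.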


[jira] [Resolved] (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize

[
https://issues.apache.org/jira/browse/LUCENE-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless resolved LUCENE-854.
---

Resolution: Fixed

Create merge policy that doesn't periodically inadvertently optimize

Key: LUCENE-854
URL: https://issues.apache.org/jira/browse/LUCENE-854
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.2
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0

Attachments: LUCENE-854.patch

The current merge policy, at every maxBufferedDocs *
power-of-mergeFactor docs added, will do a fully cascaded merge, which
is the same as an optimize.
I think this is not good because at that optimization point, the
particular addDocument call is [surprisingly] very expensive. While,
amortized over all addDocument calls, the cost is low, the cost is
paid up front and in a very bunched up manner.
I think of this as "pay it forward": you are paying the full cost of
an optimize right now on the expectation / hope that you will be
adding a great many more docs. But, if you don't add that many more
docs, then, the amortized cost for your index is in fact far higher
than it should have been. Better to "pay as you go" instead.
So we could make a small change to the policy by only merging the
first mergeFactor segments once we hit 2X the merge factor. With
mergeFactor=10, when we have created the 20th level 0 (just flushed)
segment, we merge the first 10 into a level 1 segment. Then on
creating another 10 level 0 segments, we merge the second set of 10
level 0 segments into a level 1 segment, etc.
With this new merge policy, an index that's a bit bigger than a
current optimization point would then have a lower amortized cost
per document. Plus the merge cost is less bunched up and less "pay
it forward": instead you pay for what you are actually using.
We can start by creating this merge policy (probably, combined with
the "by size not by doc count" segment level computation from
LUCENE-845) and then later decide whether we should make it the
default merge policy.
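
To make the proposed rule concrete, here is a small simulation that only tracks how many segments exist at each level. It is a sketch of the idea as described in this issue, not the merge policy that was committed; the class name LazyMergeSketch and its methods are invented for illustration, and real segments, sizes and deletes are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: index i of the list holds the number of level-i segments.
final class LazyMergeSketch {
    private final int mergeFactor;
    private final List<Integer> countsByLevel = new ArrayList<>();

    LazyMergeSketch(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    // Called once per flushed (level 0) segment.
    void flush() {
        increment(0);
        // Merge the oldest mergeFactor segments of a level only once that level
        // holds 2 * mergeFactor segments; the result is one segment a level up.
        for (int level = 0; level < countsByLevel.size(); level++) {
            if (countsByLevel.get(level) >= 2 * mergeFactor) {
                countsByLevel.set(level, countsByLevel.get(level) - mergeFactor);
                increment(level + 1);
            }
        }
    }

    private void increment(int level) {
        while (countsByLevel.size() <= level) {
            countsByLevel.add(0);
        }
        countsByLevel.set(level, countsByLevel.get(level) + 1);
    }

    // Copy of the per-level segment counts, level 0 first.
    List<Integer> counts() {
        return new ArrayList<>(countsByLevel);
    }
}
```

With mergeFactor=10 the simulation holds 19 level 0 segments after 19 flushes, and the 20th flush merges the first 10 into a single level 1 segment, leaving 10 level 0 segments behind. No single flush triggers a merge that cascades all the way into one segment.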


[jira] [Resolved] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments


 [ 
https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3051.


Resolution: Fixed

 don't call SegmentInfo.sizeInBytes for the merging segments
 ---

 Key: LUCENE-3051
 URL: https://issues.apache.org/jira/browse/LUCENE-3051
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3051.patch


 Selckin has been running Lucene's tests on the RT branch, and hit this:
 {noformat}
 [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
 [junit] Testcase: 
 testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter):   FAILED
 [junit] Some threads threw uncaught exceptions!
 [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
 exceptions!
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
 [junit] 
 [junit] 
 [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
 [junit] 
 [junit] - Standard Error -
 [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
 -Dtestmethod=testDeleteAllSlowly 
 -Dtests.seed=-4291771462012978364:4550117847390778918
 [junit] The following exceptions were thrown by threads:
 [junit] *** Thread: Lucene Merge Thread #1 ***
 [junit] org.apache.lucene.index.MergePolicy$MergeException: 
 java.io.FileNotFoundException: _4_1.del
 [junit]   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
 [junit]   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
 [junit] Caused by: java.io.FileNotFoundException: _4_1.del
 [junit]   at 
 org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
 [junit]   at 
 org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
 [junit]   at 
 org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
 [junit]   at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
 [junit]   at 
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
 [junit]   at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
 [junit]   at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
 [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
 f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), 
 f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
 f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
 f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
 termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
 d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
 d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
 c29=MockFixedIntBlock(blockSize=1606), 
 d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
 d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
 d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
 d21=Pulsing(freqCutoff=15), d20=MockSep, 
 c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), 
 c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
 d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
 d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
 e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), 
 e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), 
 e3=Standard, d12=MockVariableIntBlock(baseBlockSize=99), 
 c16=Pulsing(freqCutoff=15), e4=SimpleText, 
 d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, 
 d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), 
 d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, 
 d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, 
 d16=Pulsing(freqCutoff=15), c13=MockRandom, 
 c12=MockVariableIntBlock(baseBlockSize=99), d15=MockSep, d18=SimpleText, 
 c15=MockFixedIntBlock(blockSize=1606), d17=Standard, 
 c14=Pulsing(freqCutoff=15), b3=MockSep, b2=SimpleText, b5=Standard, 
 b4=MockRandom,

[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)


[ 
https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029654#comment-13029654
 ] 

Michael McCandless commented on LUCENE-1877:


But multiple machines are able to write to the same index on the SAN?  (And 
must therefore rely on write.lock to protect the index from two writers at 
once).

What corruption are you seeing...?

 Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)
 --

 Key: LUCENE-1877
 URL: https://issues.apache.org/jira/browse/LUCENE-1877
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs
Reporter: Mark Miller
Assignee: Uwe Schindler
 Fix For: 2.9

 Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, 
 LUCENE-1877.patch


 A user requested we add a note in IndexWriter alerting the availability of 
 NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm 
 exit). Seems reasonable to me - we want users to be able to easily stumble 
 upon this class. The below code looks like a good spot to add a note - could 
 also improve what's there a bit - opening an IndexWriter does not necessarily 
 create a lock file - that would depend on the LockFactory used.
 {code}  <p>Opening an <code>IndexWriter</code> creates a lock file for the 
 directory in use. Trying to open
   another <code>IndexWriter</code> on the same directory will lead to a
   {@link LockObtainFailedException}. The {@link LockObtainFailedException}
   is also thrown if an IndexReader on the same directory is used to delete 
 documents
   from the index.</p>{code}
 Anyone remember why NativeFSLockFactory is not the default over 
 SimpleFSLockFactory?


[jira] [Commented] (LUCENE-3076) add -Dtests.codecprovider


[ 
https://issues.apache.org/jira/browse/LUCENE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029655#comment-13029655
 ] 

Michael McCandless commented on LUCENE-3076:


+1 this is great!

This means a codec writer can easily run all of Lucene/Solr's tests against 
his/her codec(s)...

 add -Dtests.codecprovider
 -

 Key: LUCENE-3076
 URL: https://issues.apache.org/jira/browse/LUCENE-3076
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3076.patch


 Currently to test a codec (or set of codecs) you have to add them to lucene's 
 core and edit a couple of arrays here and there...
 It would be nice if when using the test-framework you could instead specify a 
 codecprovider by classname (possibly containing your own set of huper-duper 
 codecs).
 For example I made the following little codecprovider in contrib:
 {noformat}
 public class AppendingCodecProvider extends CodecProvider {
   public AppendingCodecProvider() {
 register(new AppendingCodec());
 register(new SimpleTextCodec());
   }
 }
 {noformat}
 Then, I'm able to run tests with 'ant -lib 
 build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core 
 -Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider',
  and it always picks from my set of  codecs (in this case Appending and 
 SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.


RE: [Lucene.Net] Minor problem with using code from the trunk with vb.net project.

2011-05-05 Thread Nicholas Paldino [.NET/C# MVP]

David,

Apologies if this is pedantic, but that should be one of the goals, to move
toward .NET naming conventions (which Lucene.NET does not abide by, and it
makes for an odd fit).

- Nick

-Original Message-
From: David Smith [mailto:dav...@nzcity.co.nz] 
Sent: Thursday, May 05, 2011 6:18 PM
To: lucene-net-...@lucene.apache.org
Subject: [Lucene.Net] Minor problem with using code from the trunk with
vb.net project.

Morning,

I checked out and compiled
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk yesterday,
looking to update from 2.0.0.4

To get the library to work with VB.Net I found I had to edit TopDocs.cs
(src/core/Search/TopDocs.cs).

Being case-insensitive, VB.Net can't differentiate between the three public
variables (totalHits, scoreDocs & maxScore) and the three public properties
(TotalHits, ScoreDocs & MaxScore).


David

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7768 - Failure

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7768/

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:327)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1156)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1084)




Build Log (for compile errors):
[...truncated 10752 lines...]




[jira] [Commented] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065

2011-05-05 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029722#comment-13029722
 ] 

Chris Male commented on SOLR-2497:
--

Hey Uwe,

I spent quite some time tracking this down.  The problem is that the dates 
cannot be parsed because they are lacking the compulsory 'Z' on the end (it's 
required by the date parser).

You need to change TrieDateField#indexedToReadable to:

return wrappedField.indexedToReadable(indexedForm) + "Z";

with that change, the test now passes for me.

You can see in DateField#indexedToReadable it does the same thing.
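
For anyone wanting to see the failure mode in isolation, here is a minimal sketch using only the JDK. It does not use Solr's date parser; the pattern string is an assumption standing in for a parser that requires a literal trailing 'Z', and the class name TrailingZSketch is made up.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo: a strict format that requires a literal 'Z' at the end.
final class TrailingZSketch {

    // Returns true if the value parses under the assumed canonical pattern.
    static boolean parses(String value) {
        SimpleDateFormat format =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```

Here parses("1995-12-31T23:59:59Z") returns true, while the same value without the trailing Z is rejected, which is the kind of parse failure the change above avoids.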

 Move Solr to new NumericField stored field impl of LUCENE-3065
 --

 Key: SOLR-2497
 URL: https://issues.apache.org/jira/browse/SOLR-2497
 Project: Solr
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2, 4.0

 Attachments: SOLR-2497.patch, SOLR-2497.patch, SOLR-2497.patch


 This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField 
 & Co would use NumericField for indexing and reading stored fields. To enable 
 this some missing changes in Solr's internals (Field -> Fieldable) need to be 
 done. Also some backwards compatible stored fields parsing is needed to read 
 pre-3.2 indexes without reindexing (as the format changed a little bit and 
 Document.getFieldable returns NumericField instances now).


[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7770 - Still Failing

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7770/

No tests ran.

Build Log (for compile errors):
[...truncated 472 lines...]
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac] ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:50:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac] ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac] ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac] ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path) );
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac] ReversePathHierarchyTokenizer t = new 
ReversePathHierarchyTokenizer( new StringReader(path), 1 );
[javac] ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94:
 cannot find symbol
[javac] symbol  : class ReversePathHierarchyTokenizer
[javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer
[javac]

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7771 - Still Failing

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7771/

No tests ran.

Build Log (for compile errors):
[...truncated 472 lines...]

[JENKINS] Solr-3.x - Build # 346 - Failure