Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Michael McCandless
Wow, another issue caught by random testing!

On Fri, May 14, 2010 at 1:42 AM, Robert Muir rcm...@gmail.com wrote:
 the problem is a logic bug (i.e. i have no clue how to really fix it
 except to switch over to a UTF-8 sort order).

 in converting automaton to utf-8/32, and trying to emulate the utf-16
 term dictionary order, the byte transition ranges (although sorted in
 utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of
 0xe0-0xef is problematic during enumeration since the 0xee-0xef
 component should be sorted last in utf-16 order.
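
A minimal sketch of the ordering mismatch described in the quoted paragraph. The class name and the two sample code points (U+FB01 and U+1D400) are illustrative assumptions, not taken from the failing test. U+FB01 encodes in UTF-8 with lead byte 0xEF, while the supplementary character U+1D400 has lead byte 0xF0 but is stored in a Java String as a surrogate pair starting at 0xD835, so the two encodings disagree on which term comes first.

```java
import java.nio.charset.StandardCharsets;

final class Utf16VsUtf8Order {
    /** Compares the UTF-8 encodings of two strings byte by byte, treating bytes as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmpHigh = "\uFB01";             // U+FB01, UTF-8 bytes EF AC 81
        String supplementary = "\uD835\uDC00"; // U+1D400, UTF-8 bytes F0 9D 90 80
        // UTF-16 code unit order: 0xFB01 > 0xD835, so bmpHigh sorts last (prints 1).
        System.out.println(Integer.signum(bmpHigh.compareTo(supplementary)));
        // UTF-8 byte order: 0xEF < 0xF0, so bmpHigh sorts first (prints -1).
        System.out.println(Integer.signum(compareUtf8(bmpHigh, supplementary)));
    }
}
```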

Ugh.  I suppose we could forcefully split such edges?  (We'd have to
fix reduce to not consolidate them).
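
One way to picture the edge-splitting idea, as a hedged sketch only: the class and method names are hypothetical and this is not the automaton code in Lucene. It cuts a lead-byte range before 0xEE (lead bytes 0xEE-0xEF encode U+E000..U+FFFF) and before 0xF0 (lead bytes for supplementary characters), so that the 0xEE-0xEF piece can be visited last when emulating UTF-16 order.

```java
import java.util.ArrayList;
import java.util.List;

final class LeadByteRangeSplit {
    /** Cut points: 0xEE starts U+E000..U+FFFF, 0xF0 starts the supplementary planes. */
    private static final int[] CUTS = {0xEE, 0xF0};

    /** Splits the inclusive byte range [min, max] into {min, max} pairs at each cut point it spans. */
    static List<int[]> split(int min, int max) {
        List<int[]> out = new ArrayList<>();
        int start = min;
        for (int cut : CUTS) {
            if (start < cut && cut <= max) {
                out.add(new int[] {start, cut - 1});
                start = cut;
            }
        }
        out.add(new int[] {start, max});
        return out;
    }
}
```

For the range from the thread, `split(0xE0, 0xEF)` yields 0xE0-0xED and 0xEE-0xEF. A later minimization step would still have to be kept from merging the pieces back together, as noted above.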

Or just cutover to UTF8 order for trunk.

 i know a workaround until we switch over, but it's gonna cause wasted
 seeks at the least (it's just wrong).

This is the FIXME you committed, right?  I.e. always seek...

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Robert Muir
On Fri, May 14, 2010 at 5:14 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Or just cutover to UTF8 order for trunk.

I would really prefer we go this route, instead of trying to do any
hacks at this point!

 This is the FIXME you committed right?  Ie always seek...

Yeah, i can't even say for sure it's actually a good workaround: since
the ordering is out of whack, there could be other problems...


-- 
Robert Muir
rcm...@gmail.com




Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Yonik Seeley
On Fri, May 14, 2010 at 7:29 AM, Robert Muir rcm...@gmail.com wrote:
 On Fri, May 14, 2010 at 5:14 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Or just cutover to UTF8 order for trunk.

 I would really prefer we go this route, instead of trying to do any
 hacks at this point!

Sounds good...
So it seems like the biggest issue we might have in cutting over would
be the field cache and sorting?  Instead of using String.compareTo we
need one that compares as UTF-32 (or longer term, don't even create
strings of course...)
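
A rough sketch of what a String comparator that "compares as UTF-32" could look like. The class name is hypothetical and this is not an existing Lucene utility. It walks both strings one code point at a time instead of one UTF-16 code unit at a time.

```java
import java.util.Comparator;

final class CodePointOrder {
    /** Comparator view of {@link #compare(String, String)}. */
    static final Comparator<String> COMPARATOR = CodePointOrder::compare;

    /** Compares two strings by Unicode code point rather than by UTF-16 code unit. */
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return (a.length() - i) - (b.length() - j);
    }
}
```

With this comparator, U+FB01 sorts before the supplementary character U+1D400, whereas String.compareTo puts it after.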


-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague







Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Robert Muir
On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 So it seems like the biggest issue we might have in cutting over would
 be the field cache and sorting?  Instead of using String.compareTo we
 need one that compares as UTF-32 (or longer term, don't even create
 strings of course...)

Admittedly I haven't looked at all the places that do
String.compareTo on terms [I am sure there are probably more?], but I
wonder if it's worth the effort up front to just go completely to bytes
for Term and everything.

I worry about using a special String comparator: any code, including
stuff that's not in Lucene/Solr, that puts these things into ordered
collections, for example, could have bugs.

If we instead move all of this to BytesRef where the comparator just
works by default, it seems less scary.
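
To illustrate the ordered-collection concern, here is a small sketch that uses plain byte[] as a stand-in for BytesRef (an assumption made to keep the example free of Lucene dependencies; the class name is hypothetical). A TreeSet of Strings silently uses UTF-16 order, while sorting the UTF-8 bytes as unsigned values gives code point order. It relies on Arrays.compareUnsigned, which requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class OrderedCollectionPitfall {
    /** Order produced by a String-keyed sorted set (natural order, i.e. UTF-16 code units). */
    static List<String> stringOrder(List<String> terms) {
        return new ArrayList<>(new TreeSet<>(terms));
    }

    /** Order produced by comparing the UTF-8 bytes of each term as unsigned values. */
    static List<String> utf8ByteOrder(List<String> terms) {
        Comparator<byte[]> unsignedBytes = (x, y) -> Arrays.compareUnsigned(x, y);
        TreeSet<byte[]> sorted = new TreeSet<>(unsignedBytes);
        for (String term : terms) {
            sorted.add(term.getBytes(StandardCharsets.UTF_8));
        }
        List<String> out = new ArrayList<>();
        for (byte[] bytes : sorted) {
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

For the terms "a", U+FB01 and U+1D400, the String-keyed set places U+1D400 before U+FB01, and the byte-keyed set places it after.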

-- 
Robert Muir
rcm...@gmail.com




Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Yonik Seeley
On Fri, May 14, 2010 at 11:21 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Actually, I think on changing to unicode codepoint order, the
 StringIndex returned by FieldCache would in fact be sorted in
 codepoint order (even though it's still a String[]), because it just
 enums the terms from TermsEnum.

Right... the FieldCache will be ordered correctly... but what about when
the sort code compares values across segments?

-Yonik



Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Robert Muir
On Fri, May 14, 2010 at 11:21 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Actually, I think on changing to unicode codepoint order, the
 StringIndex returned by FieldCache would in fact be sorted in
 codepoint order (even though it's still a String[]), because it just
 enums the terms from TermsEnum.


But what about things like the binary search in FieldCache that uses
String.compareTo?


-- 
Robert Muir
rcm...@gmail.com




Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Yonik Seeley
On Fri, May 14, 2010 at 11:23 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Right... the FieldCache will be ordered correctly... but what about when
 the sort code compares values across segments?

And even worse... when a binary search is done to convert an ord from
one segment to another.
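
A hedged sketch of how such a binary search can go wrong. The class, field, and method names are hypothetical and the array stands in for one segment's sorted terms; this is not the FieldCache code. The array is sorted in code point order, so a lookup that compares with String.compareTo can miss a term that is actually present. It uses Arrays.compare on int arrays, which requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

final class OrdLookup {
    /** Compares strings by Unicode code point (the same order as UTF-32). */
    static final Comparator<String> BY_CODE_POINT =
        (a, b) -> Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray());

    /** Terms of a hypothetical segment, already sorted in code point order. */
    static final String[] SEGMENT_TERMS = {"a", "\uFB01", "\uD835\uDC00"};

    /** Lookup with String.compareTo (UTF-16 order), which does not match the array's order. */
    static int ordWithCompareTo(String term) {
        return Arrays.binarySearch(SEGMENT_TERMS, term);
    }

    /** Lookup with the same order the array was sorted in. */
    static int ordWithCodePointOrder(String term) {
        return Arrays.binarySearch(SEGMENT_TERMS, term, BY_CODE_POINT);
    }
}
```

Looking up U+1D400 returns a negative "not found" value with the first method and ord 2 with the second, even though the term is in the array both times.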
-Yonik




Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-14 Thread Michael McCandless
On Fri, May 14, 2010 at 11:23 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Right... the FieldCache will be ordered correctly... but what about when
 the sort code compares values across segments?

Ahh yes, we'd have to use a comparator based on code points, not
String.compareTo, at that point.

I think we should first fix FieldCache to return BytesRef-based
getStrings/getStringIndex (LUCENE-2380). I'll go take it.

Mike




Build failed in Hudson: Lucene-trunk #1187

2010-05-13 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1187/changes

Changes:

[mikemccand] LUCENE-2393: add total TF tracking to HighFreqTerms tool

[mikemccand] LUCENE-2459: fix FilterIndexReader to (by default) emulate flex 
API on top of pre-flex API

[mikemccand] LUCENE-2449: fix DBLRU cache to clone key when it promotes an 
entry during lookup

--
[...truncated 13128 lines...]
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-memory
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.index.memory...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-memory/stylesheet.css...
  [javadoc] Note: Custom tags that were not seen:  @lucene.experimental, 
@lucene.internal
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/memory/lucene-memory-2010-05-14_02-03-41-javadoc.jar
 [echo] Building misc...

javadocs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-misc
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package org.apache.lucene.misc...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/index/MultiPassIndexSplitter.java:43:
 warning - Tag @link: reference not found: IndexWriter#addIndexes(IndexReader[])
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-misc/stylesheet.css...
  [javadoc] Note: Custom tags that were not seen:  @lucene.internal
  [javadoc] 1 warning
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/misc/lucene-misc-2010-05-14_02-03-41-javadoc.jar
 [echo] Building queries...

javadocs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queries
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package org.apache.lucene.search.regex...
  [javadoc] Loading source files for package org.apache.lucene.search.similar...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queries/stylesheet.css...
  [javadoc] Note: Custom tags that were not seen:  @lucene.experimental, 
@lucene.internal
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/queries/lucene-queries-2010-05-14_02-03-41-javadoc.jar
 [echo] Building queryparser...

javadocs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queryparser
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.analyzing...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.complexPhrase...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.builders...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.config...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.messages...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.nodes...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.parser...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.processors...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.util...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.ext...
  [javadoc] Loading source files for package 

Re: Build failed in Hudson: Lucene-trunk #1187

2010-05-13 Thread Robert Muir
the problem is a logic bug (i.e. i have no clue how to really fix it
except to switch over to a UTF-8 sort order).

in converting automaton to utf-8/32, and trying to emulate the utf-16
term dictionary order, the byte transition ranges (although sorted in
utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of
0xe0-0xef is problematic during enumeration since the 0xee-0xef
component should be sorted last in utf-16 order.

i know a workaround until we switch over, but it's gonna cause wasted
seeks at the least (it's just wrong).

