Re: Build failed in Hudson: Lucene-trunk #1187
Wow, another issue caught by random testing!

On Fri, May 14, 2010 at 1:42 AM, Robert Muir rcm...@gmail.com wrote:
> the problem is a logic bug (e.g. i have no clue how to really fix except to switch over to a UTF-8 sort order).
>
> in converting automaton to utf-8/32, and trying to emulate the utf-16 term dictionary order, the byte transition ranges (although sorted in utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of 0xe0-0xef is problematic during enumeration since the 0xee-0xef component should be sorted last in utf-16 order.

Ugh.

I suppose we could forcefully split such edges? (We'd have to fix reduce to not consolidate them.)

Or just cutover to UTF8 order for trunk.

> i know a workaround until we switch over, but it's gonna cause wasted seeks at the least (it's just wrong).

This is the FIXME you committed, right? I.e. always seek...

Mike

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
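The ordering mismatch described in this message can be reproduced outside Lucene. The following is a minimal Java sketch, not code from the Lucene tree: `SortOrderDemo` and `compareUtf8` are hypothetical names chosen for illustration. It compares U+E000 (UTF-8 lead byte 0xEE) with U+10000 (UTF-8 lead byte 0xF0) under both orders. In UTF-16 the supplementary character is stored as a surrogate pair starting at 0xD800, so it sorts before U+E000, while in UTF-8 byte order it sorts after.

```java
import java.nio.charset.StandardCharsets;

final class SortOrderDemo {
    /** Hypothetical helper: compares two strings by their UTF-8 bytes, treated as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff); // mask so bytes compare as 0..255
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uE000";                                // UTF-8: EE 80 80
        String supp = new String(Character.toChars(0x10000)); // UTF-8: F0 90 80 80, UTF-16: D800 DC00

        // UTF-16 code unit order (String.compareTo): U+E000 sorts AFTER U+10000.
        System.out.println("UTF-16 order, bmp > supp: " + (bmp.compareTo(supp) > 0));

        // UTF-8 byte order (same as code point order): U+E000 sorts BEFORE U+10000.
        System.out.println("UTF-8 order,  bmp < supp: " + (compareUtf8(bmp, supp) < 0));
    }
}
```

Both lines print `true`, which is why a single byte range such as 0xe0-0xef cannot be walked in one pass when the term dictionary is in UTF-16 order.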
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote:
> Or just cutover to UTF8 order for trunk.

I would really prefer we go this route, instead of trying to do any hacks at this point!

> This is the FIXME you committed right? Ie always seek...

Yeah, I can't even say for sure it's actually a good workaround; since the ordering is out of whack there could be other problems...

--
Robert Muir
rcm...@gmail.com
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 7:29 AM, Robert Muir rcm...@gmail.com wrote:
> On Fri, May 14, 2010 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> Or just cutover to UTF8 order for trunk.
>
> I would really prefer we go this route, instead of trying to do any hacks at this point!

Sounds good...

So it seems like the biggest issue we might have in cutting over would be the field cache and sorting? Instead of using String.compareTo we need a comparator that compares as UTF-32 (or longer term, don't even create strings of course...)

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

>> This is the FIXME you committed right? Ie always seek...
>
> Yeah, i can't even say for sure it's actually a good workaround, since the ordering is out of whack there could be other problems...
>
> --
> Robert Muir
> rcm...@gmail.com
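For illustration, a String comparator that "compares as UTF-32" could look roughly like the sketch below. This is an assumption about the general shape of such a comparator, not the code that was later committed to Lucene; the class name `CodePointOrder` is made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CodePointOrder {
    /** Comparator view of compare(), usable with Arrays.sort, TreeMap, and so on. */
    static final Comparator<String> COMPARATOR = CodePointOrder::compare;

    /** Compares two strings code point by code point rather than char by char. */
    static int compare(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Equal code points occupy the same number of chars in both strings,
            // so a single index is enough.
            i += Character.charCount(ca);
        }
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String bmp = "\uE000";
        String supp = new String(Character.toChars(0x10000));

        String[] byCodeUnit = { supp, "a", bmp };
        Arrays.sort(byCodeUnit); // String.compareTo: a, U+10000, U+E000

        String[] byCodePoint = { supp, "a", bmp };
        Arrays.sort(byCodePoint, COMPARATOR); // a, U+E000, U+10000

        System.out.println("same order: " + Arrays.equals(byCodeUnit, byCodePoint));
    }
}
```

The two sorts only disagree when a term contains a supplementary character next to a BMP character at or above U+E000, which is why the difference is easy to miss without randomized tests.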
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> So it seems like the biggest issue we might have in cutting over would be the field cache and sorting? Instead of using String.compareTo we need one that compares as UTF-32 (or longer term, don't even create strings of course...)

Admittedly not having looked at all the places that do String.compareTo on terms [I am sure there are probably more?], I wonder if it's worth the effort up front to just go completely to bytes for Term and everything.

I worry about using a special String comparator: any code, including stuff that's not in Lucene/Solr, that puts these things into ordered collections, for example, could have bugs. If we instead move all of this to BytesRef, where the comparator just works by default, it seems less scary.

--
Robert Muir
rcm...@gmail.com
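The point about ordered collections can be sketched with a small stand-in type. `Utf8Term` below is hypothetical and is not Lucene's BytesRef; it only assumes that a term is held as UTF-8 bytes and that its natural ordering compares those bytes as unsigned values. Under that assumption, any `TreeSet` or `Collections.sort` call gets code point order without the caller having to remember a special comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical stand-in for a byte-based term; not Lucene's actual BytesRef class. */
final class Utf8Term implements Comparable<Utf8Term> {
    private final byte[] bytes;

    Utf8Term(String text) {
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    /** Natural order is unsigned byte order, which for UTF-8 equals code point order. */
    @Override
    public int compareTo(Utf8Term other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int d = (bytes[i] & 0xff) - (other.bytes[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return bytes.length - other.bytes.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Term && Arrays.equals(bytes, ((Utf8Term) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    String utf8ToString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TreeSet<Utf8Term> terms = new TreeSet<>();
        terms.add(new Utf8Term(new String(Character.toChars(0x10000))));
        terms.add(new Utf8Term("\uE000"));
        // No comparator was passed, yet U+E000 comes first (code point order).
        System.out.println(Integer.toHexString(terms.first().utf8ToString().codePointAt(0)));
    }
}
```

With plain Strings, the same `TreeSet` would silently fall back to UTF-16 code unit order, which is the class of bug raised above.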
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 11:21 AM, Michael McCandless luc...@mikemccandless.com wrote:
> On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley yo...@lucidimagination.com wrote:
>> On Fri, May 14, 2010 at 7:29 AM, Robert Muir rcm...@gmail.com wrote:
>>> On Fri, May 14, 2010 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote:
>>>> Or just cutover to UTF8 order for trunk.
>>>
>>> I would really prefer we go this route, instead of trying to do any hacks at this point!
>>
>> Sounds good...
>>
>> So it seems like the biggest issue we might have in cutting over would be the field cache and sorting? Instead of using String.compareTo we need one that compares as UTF-32 (or longer term, don't even create strings of course...)
>
> Actually, I think on changing to unicode codepoint order, the StringIndex returned by FieldCache would in fact be sorted in codepoint order (even though it's still a String[]), because it just enums the terms from TermsEnum.

Right... the FieldCache will be ordered correctly... but what about when the sort code compares values across segments?

-Yonik

> Mike
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 11:21 AM, Michael McCandless luc...@mikemccandless.com wrote:
> Actually, I think on changing to unicode codepoint order, the StringIndex returned by FieldCache would in fact be sorted in codepoint order (even though it's still a String[]), because it just enums the terms from TermsEnum.

But what about things like the binsearch in FieldCache that uses String.compareTo?

--
Robert Muir
rcm...@gmail.com
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 11:23 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Fri, May 14, 2010 at 11:21 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley yo...@lucidimagination.com wrote:
>>> On Fri, May 14, 2010 at 7:29 AM, Robert Muir rcm...@gmail.com wrote:
>>>> On Fri, May 14, 2010 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote:
>>>>> Or just cutover to UTF8 order for trunk.
>>>>
>>>> I would really prefer we go this route, instead of trying to do any hacks at this point!
>>>
>>> Sounds good...
>>>
>>> So it seems like the biggest issue we might have in cutting over would be the field cache and sorting? Instead of using String.compareTo we need one that compares as UTF-32 (or longer term, don't even create strings of course...)
>>
>> Actually, I think on changing to unicode codepoint order, the StringIndex returned by FieldCache would in fact be sorted in codepoint order (even though it's still a String[]), because it just enums the terms from TermsEnum.
>
> Right... the FieldCache will be ordered correctly... but when the sort code compares values across segments?

And even worse... when a binary search is done to convert an ord from one segment to another.

-Yonik
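A small sketch of why the binary search is the worse case: if the per-segment term array is sorted in code point order but the search compares with String.compareTo, the search can miss a term that is actually present. The code below is illustrative only; `OrdLookup`, `findOrd`, and `CODE_POINT_ORDER` are hypothetical names and do not correspond to the FieldCache implementation.

```java
import java.util.Comparator;

final class OrdLookup {
    /** Compares by Unicode code point instead of by UTF-16 code unit. */
    static final Comparator<String> CODE_POINT_ORDER = (a, b) -> {
        int i = 0;
        while (i < a.length() && i < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
        }
        return Integer.compare(a.length(), b.length());
    };

    /** Returns the ord (index) of key in sortedTerms, or -1 if the search does not find it. */
    static int findOrd(String[] sortedTerms, String key, Comparator<String> order) {
        int low = 0;
        int high = sortedTerms.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = order.compare(sortedTerms[mid], key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String bmp = "\uE000";
        String supp1 = new String(Character.toChars(0x10000));
        String supp2 = new String(Character.toChars(0x10001));
        String[] terms = { bmp, supp1, supp2 }; // sorted in code point order

        // Searching with the wrong order walks away from the term and misses it.
        System.out.println(findOrd(terms, bmp, Comparator.naturalOrder())); // -1
        // Searching with the order the array was sorted in finds ord 0.
        System.out.println(findOrd(terms, bmp, CODE_POINT_ORDER));          // 0
    }
}
```

A comparison across segments that uses the wrong order gives a wrong sort position; a binary search that uses the wrong order can report that a term does not exist at all.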
Re: Build failed in Hudson: Lucene-trunk #1187
On Fri, May 14, 2010 at 11:23 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Fri, May 14, 2010 at 11:21 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley yo...@lucidimagination.com wrote:
>>> On Fri, May 14, 2010 at 7:29 AM, Robert Muir rcm...@gmail.com wrote:
>>>> On Fri, May 14, 2010 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote:
>>>>> Or just cutover to UTF8 order for trunk.
>>>>
>>>> I would really prefer we go this route, instead of trying to do any hacks at this point!
>>>
>>> Sounds good...
>>>
>>> So it seems like the biggest issue we might have in cutting over would be the field cache and sorting? Instead of using String.compareTo we need one that compares as UTF-32 (or longer term, don't even create strings of course...)
>>
>> Actually, I think on changing to unicode codepoint order, the StringIndex returned by FieldCache would in fact be sorted in codepoint order (even though it's still a String[]), because it just enums the terms from TermsEnum.
>
> Right... the FieldCache will be ordered correctly... but when the sort code compares values across segments?

Ahh yes, we'd have to use a comparator based on codepoint, not String.compareTo, at that point.

I think we should first fix FieldCache to return BytesRef-based getStrings/getStringIndex (LUCENE-2380). I'll go take it.

Mike
Build failed in Hudson: Lucene-trunk #1187
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1187/changes

Changes:

[mikemccand] LUCENE-2393: add total TF tracking to HighFreqTerms tool

[mikemccand] LUCENE-2459: fix FilterIndexReader to (by default) emulate flex API on top of pre-flex API

[mikemccand] LUCENE-2449: fix DBLRU cache to clone key when it promotes an entry during lookup

--
[...truncated 13128 lines...]
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-memory
[javadoc] Generating Javadoc
[javadoc] Javadoc execution
[javadoc] Loading source files for package org.apache.lucene.index.memory...
[javadoc] Constructing Javadoc information...
[javadoc] Standard Doclet version 1.5.0_22
[javadoc] Building tree for all the packages and classes...
[javadoc] Building index for all the packages and classes...
[javadoc] Building index for all classes...
[javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-memory/stylesheet.css...
[javadoc] Note: Custom tags that were not seen: @lucene.experimental, @lucene.internal
[jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/memory/lucene-memory-2010-05-14_02-03-41-javadoc.jar
[echo] Building misc...

javadocs:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-misc
[javadoc] Generating Javadoc
[javadoc] Javadoc execution
[javadoc] Loading source files for package org.apache.lucene.index...
[javadoc] Loading source files for package org.apache.lucene.misc...
[javadoc] Constructing Javadoc information...
[javadoc] Standard Doclet version 1.5.0_22
[javadoc] Building tree for all the packages and classes...
[javadoc] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/contrib/misc/src/java/org/apache/lucene/index/MultiPassIndexSplitter.java:43: warning - Tag @link: reference not found: IndexWriter#addIndexes(IndexReader[])
[javadoc] Building index for all the packages and classes...
[javadoc] Building index for all classes...
[javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-misc/stylesheet.css...
[javadoc] Note: Custom tags that were not seen: @lucene.internal
[javadoc] 1 warning
[jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/misc/lucene-misc-2010-05-14_02-03-41-javadoc.jar
[echo] Building queries...

javadocs:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queries
[javadoc] Generating Javadoc
[javadoc] Javadoc execution
[javadoc] Loading source files for package org.apache.lucene.search...
[javadoc] Loading source files for package org.apache.lucene.search.regex...
[javadoc] Loading source files for package org.apache.lucene.search.similar...
[javadoc] Constructing Javadoc information...
[javadoc] Standard Doclet version 1.5.0_22
[javadoc] Building tree for all the packages and classes...
[javadoc] Building index for all the packages and classes...
[javadoc] Building index for all classes...
[javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queries/stylesheet.css...
[javadoc] Note: Custom tags that were not seen: @lucene.experimental, @lucene.internal
[jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/queries/lucene-queries-2010-05-14_02-03-41-javadoc.jar
[echo] Building queryparser...

javadocs:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/docs/api/contrib-queryparser
[javadoc] Generating Javadoc
[javadoc] Javadoc execution
[javadoc] Loading source files for package org.apache.lucene.queryParser.analyzing...
[javadoc] Loading source files for package org.apache.lucene.queryParser.complexPhrase...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.builders...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.config...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.messages...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.nodes...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.parser...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.processors...
[javadoc] Loading source files for package org.apache.lucene.queryParser.core.util...
[javadoc] Loading source files for package org.apache.lucene.queryParser.ext...
[javadoc] Loading source files for package
Re: Build failed in Hudson: Lucene-trunk #1187
The problem is a logic bug (e.g. I have no clue how to really fix it except to switch over to a UTF-8 sort order).

In converting the automaton to utf-8/32, and trying to emulate the utf-16 term dictionary order, the byte transition ranges (although sorted in utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of 0xe0-0xef is problematic during enumeration since the 0xee-0xef component should be sorted last in utf-16 order.

I know a workaround until we switch over, but it's gonna cause wasted seeks at the least (it's just wrong).

On Thu, May 13, 2010 at 11:12 PM, Apache Hudson Server hud...@hudson.zones.apache.org wrote:
> See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1187/changes