Need admin help with unsubscribe
Hello all, sorry for posting this, but unsubscribe did not work for me. Please refer to my second attempt below. The first was on 01/12/2010. Thanks, András

--Original message--
Date: Tuesday, 7 December 2010, 02:16:57
From: Imre András ia...@freemail.hu
Subject: unsubscribe
To: pylucene-dev-unsubscr...@lucene.apache.org

unsubscribe
[jira] Commented: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key
[ https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979274#action_12979274 ]

Simon Willnauer commented on LUCENE-2855:
-----------------------------------------

+1 - just put your name after the description in the changes.txt

> Contrib queryparser should not use CharSequence as Map key
> ----------------------------------------------------------
>
>                 Key: LUCENE-2855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2855
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 3.0.3
>            Reporter: Adriano Crestani
>            Assignee: Adriano Crestani
>             Fix For: 3.0.4
>         Attachments: lucene_2855_adriano_crestani_2011_01_08.patch
>
> Today, contrib query parser uses Map<CharSequence, ...> in many different places, which may lead to problems, since the CharSequence interface does not enforce the implementation of the hashCode and equals methods. Today it's causing a problem with the QueryTreeBuilder.setBuilder(CharSequence, QueryBuilder) method, which does not work as expected.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
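[Editor's note] The bug class the issue describes can be shown in a few lines: a StringBuilder used as a CharSequence map key inherits identity hashCode/equals from Object, so an equal-content lookup misses. This is a minimal illustration, not the contrib queryparser code itself.

```java
import java.util.HashMap;
import java.util.Map;

// CharSequence does not require hashCode/equals consistent with the
// character content, so a StringBuilder key can never be found again
// via an equal String (and vice versa).
public class CharSequenceKeyDemo {
    public static boolean lookupWorks() {
        Map<CharSequence, String> builders = new HashMap<>();
        builders.put(new StringBuilder("field"), "someBuilder");
        // StringBuilder uses Object's identity hashCode/equals,
        // so an equal-content String key misses:
        return builders.get("field") != null;
    }
}
```

Keying the map on String (or any type with value-based equals/hashCode) avoids the problem.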
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979276#action_12979276 ]

Earwin Burrfoot commented on LUCENE-2840:
-----------------------------------------

bq. But doesn't that mean that an app w/ rare queries but each query is massive fails to use all available concurrency?

Yes. But that's not my case. And likely not someone else's.

I think if you want to be super-generic, it's better to defer exact threading to the user instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler. While we're at it, we can throw in a sample implementation, which can satisfy some of the users, but not everyone.

> Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2840
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2840
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: Search
>            Reporter: Uwe Schindler
>            Priority: Minor
>             Fix For: 4.0
>
> Spin-off from parent issue:
> {quote}
> We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should get separate threads; all others should be handled sequentially. So maybe add a maxThreads count, then sort the IndexReaders by maxDoc, spawn maxThreads-1 threads for the bigger readers, and then one additional thread for the rest?
> {quote}
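[Editor's note] The slicing scheme quoted in the issue description can be sketched as follows. All names (`SegmentSlicer`, segment sizes as plain ints) are illustrative assumptions, not Lucene APIs: sort segments by size, give each of the largest maxThreads-1 segments its own slice, and batch the remaining small segments into one final slice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the LUCENE-2840 proposal: one thread per
// large segment, one shared thread for all the small ones.
public class SegmentSlicer {
    public static List<List<Integer>> slice(List<Integer> segmentSizes, int maxThreads) {
        List<Integer> sorted = new ArrayList<>(segmentSizes);
        sorted.sort(Collections.reverseOrder());           // biggest first
        List<List<Integer>> slices = new ArrayList<>();
        int big = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < big; i++) {
            slices.add(List.of(sorted.get(i)));            // one slice per large segment
        }
        List<Integer> rest = sorted.subList(big, sorted.size());
        if (!rest.isEmpty()) {
            slices.add(new ArrayList<>(rest));             // small segments share one slice
        }
        return slices;
    }
}
```

Each slice would then be submitted as one task to an executor, bounding concurrency at maxThreads.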
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979277#action_12979277 ]

Earwin Burrfoot commented on LUCENE-2843:
-----------------------------------------

And we're nearing the day when we keep the whole term dictionary in memory (as Sphinx does, for instance). At that point a gazillion term-lookup-related hacks (like the lookup cache) become obsolete :)

The term dictionary itself can also be memory-mapped after this, instead of being read and built from disk, which makes new segment opening near-instantaneous.

> Add variable-gap terms index impl.
> ----------------------------------
>
>                 Key: LUCENE-2843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>         Attachments: LUCENE-2843.patch, LUCENE-2843.patch
>
> PrefixCodedTermsReader/Writer (used by all real core codecs) already supports pluggable terms index impls. The only impl we have now is FixedGapTermsIndexReader/Writer, which picks every Nth (default 32) term and holds it in efficient packed int/byte arrays in RAM. This is already an enormous improvement (RAM reduction, init time) over 3.x.
> This patch adds another impl, VariableGapTermsIndexReader/Writer, which lets you specify an arbitrary IndexTermSelector to pick which terms are indexed, and then uses an FST to hold the indexed terms. This is typically even more memory efficient than packed int/byte arrays, though it does not support ord(), so it's not quite a fair comparison.
> I had to relax the terms index plugin API for PrefixCodedTermsReader/Writer to not assume that the terms index impl supports ord. I also did some cleanup of the FST/FSTEnum APIs and impls, and broke out separate seekCeil and seekFloor in FSTEnum. E.g. we need seekFloor when the FST is used as a terms index, but seekCeil when it's holding all terms in the index (which is what SimpleText uses FSTs for).
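[Editor's note] The seekFloor/seekCeil distinction in the patch description can be illustrated with java.util.TreeMap as an analogy (this is not Lucene's FSTEnum API): floor lands on the largest indexed term <= the target, ceiling on the smallest indexed term >= the target. A sparse terms *index* needs floor semantics (start scanning from the preceding indexed term), while a structure holding *all* terms wants ceiling semantics.

```java
import java.util.TreeMap;

// TreeMap analogy for FSTEnum's seekFloor vs seekCeil; the indexed
// terms and their ordinals here are made up for illustration.
public class SeekDemo {
    static final TreeMap<String, Integer> INDEXED = new TreeMap<>();
    static {
        INDEXED.put("apple", 0);
        INDEXED.put("melon", 1);
        INDEXED.put("zebra", 2);
    }
    // largest indexed term <= target (terms-index use case)
    public static String seekFloor(String term) { return INDEXED.floorKey(term); }
    // smallest indexed term >= target (all-terms use case)
    public static String seekCeil(String term)  { return INDEXED.ceilingKey(term); }
}
```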
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979284#action_12979284 ]

Doron Cohen commented on LUCENE-2840:
-------------------------------------

Is it possible that, with this, searching a large optimized index (single segment) might be slower than searching an un-optimized index of the same size, since the latter enjoys concurrency? If so, is it too wild for more than one thread to handle that single segment?
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979292#action_12979292 ]

Michael McCandless commented on LUCENE-2843:
--------------------------------------------

An in-memory terms dict would be great. I agree it'd fundamentally change how we execute e.g. the automaton queries (suddenly we can just intersect against the terms dict instead of doing the seek/next thing); FuzzyQuery might be a direct search through the terms dict instead of first building the LevN DFA; respelling similarly...

But, I suspect we'll always have to support the on-disk-only option, because some apps seem to have an insane number of terms.
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979293#action_12979293 ]

Michael McCandless commented on LUCENE-2840:
--------------------------------------------

bq. I think if you want to be super-generic, it's better to defer exact threading to the user, instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler.

I think something like CMS (basically a custom ES w/ proper thread prio/scheduling) will be necessary here. Until Java can schedule threads the way an OS schedules processes, we'll need to emulate it ourselves. You want long-running queries (or merges) to be gracefully down-prioritized so that new/fast queries (merges) finish quickly. And you want searches (merges) to use the allowed concurrency fully.
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979295#action_12979295 ]

Simon Willnauer commented on LUCENE-1260:
-----------------------------------------

bq. For trunk, here is what i suggest:

I didn't follow the entire thread here, but is it worth all the effort Robert is suggesting, or should we simply land the docvalues branch and make norms a DocValues field? The infrastructure is already there, it's integrated into codec, and it gives users the freedom to use any Type they want.

> Norm codec strategy in Similarity
> ---------------------------------
>
>                 Key: LUCENE-1260
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1260
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Karl Wettin
>            Assignee: Michael McCandless
>             Fix For: 4.0
>         Attachments: Lucene-1260-1.patch, Lucene-1260-2.patch, Lucene-1260.patch, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260_defaultsim.patch
>
> The static span and resolution of the 8 bit norms codec might not fit with all applications. My use case requires that 100f-250f is discretized in 60 bags instead of the default.. 10?
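[Editor's note] The kind of per-application norm codec the issue asks for can be sketched as a simple linear quantizer: map the range [100f, 250f] into 60 evenly spaced bags stored in one byte. This is an illustrative assumption, not Lucene's actual Similarity encode/decode implementation.

```java
// Linear byte codec discretizing [100f, 250f] into 60 bags, as in the
// reporter's use case. Clamps out-of-range values to the endpoints.
public class LinearNormCodec {
    static final float MIN = 100f, MAX = 250f;
    static final int BAGS = 60;

    public static byte encode(float norm) {
        float clamped = Math.max(MIN, Math.min(MAX, norm));
        int bag = Math.round((clamped - MIN) / (MAX - MIN) * (BAGS - 1));
        return (byte) bag;
    }

    public static float decode(byte b) {
        return MIN + (b & 0xFF) * (MAX - MIN) / (BAGS - 1);
    }
}
```

With 60 bags over a span of 150, the quantization step is about 2.5f, versus the coarse resolution of a one-size-fits-all 8-bit float codec.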
Lucene-Solr-tests-only-trunk - Build # 3570 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3570/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
    at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
    at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)

Build Log (for compile errors):
[...truncated 3068 lines...]
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979303#action_12979303 ]

Robert Muir commented on LUCENE-1260:
-------------------------------------

bq. I didn't follow the entire thread here but is it worth all the effort what robert is suggesting or should we simply land docvalues branch and make norms a DocValues field? The infrastructure is already there, its integrated into codec and gives users the freedom to use any Type they want.

Simon, the problem is that encode/decode is in Similarity (instead of somewhere else). So you would have the same problem with DocValues!
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979305#action_12979305 ]

Earwin Burrfoot commented on LUCENE-2843:
-----------------------------------------

As I said, there's already a search server with a strictly in-memory (in the mmap sense; it can theoretically be paged out) terms dict AND widespread adoption. Their users somehow manage. My guess is that's because people with an insane number of terms store various crap like unique timestamps as terms.

With CSF (attributes in Sphinx lingo), and some nice filters that can work over CSF, there's no longer any need to stuff your timestamps in the same place you stuff your texts. That can be reflected in documentation, and then, suddenly, we can drop on-disk-only support.
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979306#action_12979306 ]

Earwin Burrfoot commented on LUCENE-2840:
-----------------------------------------

A lot of fork-join type frameworks don't even care, even though scheduling threads is something people supposedly use them for. Why? I guess that's due to a low yield/cost ratio.

You frequently quote "progress, not perfection" in relation to the code, but why don't we apply this same principle to our threading guarantees? I don't want to use the allowed concurrency fully. That's not realistic. I want 85% of it. That's already a huge leap ahead of single-threaded searches.
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979308#action_12979308 ]

Michael McCandless commented on LUCENE-2324:
--------------------------------------------

{quote}
I think with B we're saying even if the calling thread is bound to DWPT #1, if DWPT #2 is greater in size and the aggregate RAM usage exceeds the max, using the calling thread, we take DWPT #2 out of production, flush, and return it?
{quote}

Right -- the thread affinity has nothing to do with which thread gets to flush which DWPT. Once flush is triggered, the thread doing the flushing is free to flush any DWPT.

{quote}
Maybe we can simply throw out the DWPT and put recycling byte[]s and/or pooling DWPTs back in later if it's necessary?
{quote}

OK, let's start there and put back re-use only if we see a real perf issue?

bq. What I meant was the following situation: Suppose we have two DWPTs and IW.commit() is called. The first DWPT finishes flushing successfully, is returned to the pool and idle again. The second DWPT flush fails with an aborting exception.

Hmm, tricky. I think I'd lean towards keeping segment 1. Discarding it would be inconsistent w/ aborts hit during the flush-by-RAM case? E.g. if seg 1 was flushed due to RAM usage, succeeds, and then later seg 2 is flushed due to RAM usage, but aborts. In this case we would still keep seg 1? I think aborting a flush should only lose the docs in that one DWPT (as it is today).

Remember, a call to commit may succeed in flushing seg 1 to disk, and updating the in-memory segment infos, but on hitting the aborting exc in seg 2, will throw that to the caller, not having committed *any* change to the index. Exceptions thrown during the prepareCommit (phase 1) part of commit mean nothing is changed in the index.

Alternatively... we could abort the entire IW session (as e.g. we handle OOME today) if ever an aborting exception was hit? This might be cleaner? But it's really a nuke-the-world option, which scares me. E.g. it could be a looong indexing session (app doesn't call commit() until the end) and we could be throwing away *a lot* of progress.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process, and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores, and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (e.g. maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO and CPU.
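[Editor's note] The flush-trigger rule discussed above ("option B": the calling thread flushes the largest DWPT, regardless of which DWPT it is bound to for indexing) can be sketched as a pure selection function. All names here are illustrative, not the actual IndexWriter internals.

```java
// Sketch: when aggregate RAM across all DWPTs exceeds the budget,
// pick the largest DWPT to flush; thread affinity plays no role.
public class FlushPolicySketch {
    /** Returns the index of the DWPT to flush, or -1 if under budget. */
    public static int pickDwptToFlush(long[] dwptRamBytes, long ramBudgetBytes) {
        long total = 0;
        int biggest = 0;
        for (int i = 0; i < dwptRamBytes.length; i++) {
            total += dwptRamBytes[i];
            if (dwptRamBytes[i] > dwptRamBytes[biggest]) biggest = i;
        }
        return total > ramBudgetBytes ? biggest : -1;
    }
}
```

The calling thread would take the returned DWPT out of production, flush it, and return it to the pool, while other threads keep indexing into the remaining DWPTs.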
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979313#action_12979313 ]

Robert Muir commented on LUCENE-2843:
-------------------------------------

bq. As I said, there's already a search server with strictly in-memory (in mmap sense. it can theoretically be paged out) terms dict AND widespread adoption. Their users somehow manage

I don't like the reasoning that, just because Sphinx does it and their 'users manage', that makes it OK. Sphinx also requires MySQL, which only recently started supporting *real* UTF-8?! (not that 3-byte crap they tried to pass off instead). I don't think we should really be looking there for inspiration.
[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:
--------------------------------

    Attachment: LUCENE-2846.patch

Here's an initial patch hacked up by Mike and me... also removed the MultiReader norms method that takes a byte[]+offset from IndexReader. One oddity is that MultiNorms.norms() always returns a filled byte[] here for non-atomic readers (never null). But I think this is OK for MultiNorms; it's not used in searching (only for SlowMultiReaderWrapper etc).

I think it would be good to have more tests that exercise 'doesn't have field' versus 'omits norms', and also (likely not in this issue) we should think about IR's norm-setting methods. I don't like that these use Similarity.getDefault(): it seems we could require you to pass in the Sim for the float case. I also don't like that we expose a public setNorm that takes a byte value either!

Long-term we should look at pulling this norm-encoding stuff out of Sim... the Sim should just be dealing with floats; this encoding stuff belongs somewhere else.

> omitTF is viral, but omitNorms is anti-viral.
> ---------------------------------------------
>
>                 Key: LUCENE-2846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>         Attachments: LUCENE-2846.patch
>
> omitTF is viral: if you add document 1 with field foo as omitTF, then document 2 has field foo without omitTF, they are both treated as omitTF. But omitNorms is the opposite: if you have a million documents with field foo with omitNorms, then you add just one document without omitting norms, you suddenly have a million 'real norms'.
> I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]'s. But another option is to make omitTF anti-viral, which is more schemaless I guess.
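[Editor's note] The virality asymmetry described in LUCENE-2846 reduces to two boolean combining rules when segments carrying the same field are merged: a viral flag survives if either side has it (logical OR), while an anti-viral flag survives only if both sides have it (logical AND). Method names here are illustrative, not Lucene code.

```java
// omitTF vs omitNorms as merge rules for a per-field flag.
public class FlagMerge {
    // omitTF today: one omitting doc infects the whole field
    public static boolean mergeViral(boolean a, boolean b) { return a || b; }
    // omitNorms today: one non-omitting doc resurrects norms for everyone
    public static boolean mergeAntiViral(boolean a, boolean b) { return a && b; }
}
```

The issue's proposal is to make omitNorms use the OR rule as well, so a single stray document can't force a huge norms byte[] back into existence.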
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979317#action_12979317 ]

Simon Willnauer commented on LUCENE-1260:
-----------------------------------------

bq. So, you would have the same problem with DocValues!

Hmm, not sure if I understand this correctly. How values are encoded/decoded depends on the DocValues implementation, which can be customized since it is exposed via codec. That means that users of the API always operate on float, and the encoding and decoding happens inside codec and per field. So encode/decode in Sim would be obsolete, right?
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979319#action_12979319 ]

Robert Muir commented on LUCENE-1260:
-------------------------------------

{quote}
hmm, not sure if I understand this correctly. how values are encoded / decoded depends on the DocValues implementation which can be customized since it is exposed via codec. That means that users of the API always operate on float and the encoding and decoding happens inside codec and per field. So encode/decode in Sim would be obsolet, right?
{quote}

The issues remaining here involve mostly fake norms, for the omitNorms case (also empty norms, I think). So the stuff I listed must be fixed regardless, to clean up the fake-norms case; it does not matter if real norms are encoded with CSF or not.

Doing things like cleaning up how we deal with fake norms, and removing Similarity.get/setDefault, is completely unrelated to DocValues... it's just stuff we must fix. As long as we have these statics like Similarity.get/setDefault, it's not even useful to think about things like flexible scoring or per-field Similarity...!
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979326#action_12979326 ]

Michael McCandless commented on LUCENE-1260:
--------------------------------------------

I think we need to stop faking norms, independent of whether/when we cut over to CSF to store norms / index stats? I.e. the two issues are orthogonal (and both are important!).
[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979328#action_12979328 ]

Yonik Seeley commented on LUCENE-1260:

bq. I think we need to stop faking norms, independent of whether/when we cutover to CSF to store norms / index stats?

+1, it was only intended to be a short-term thing for back compat (see way back to LUCENE-448).
[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979331#action_12979331 ]

Robert Muir commented on LUCENE-2846:

An alternative that would totally clear up the faking here, which Mike thought of: if we can somehow differentiate between omitNorms (null) and 'doesn't have field' (say, an exception), we wouldn't need to fake. In MultiNorms we could then safely return null if any reader returns null, but throw an exception if all readers throw an exception.

omitTF is viral, but omitNorms is anti-viral.
---------------------------------------------
Key: LUCENE-2846
URL: https://issues.apache.org/jira/browse/LUCENE-2846
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.0
Attachments: LUCENE-2846.patch

omitTF is viral: if you add document 1 with field foo as omitTF, then document 2 has field foo without omitTF, they are both treated as omitTF. But omitNorms is the opposite: if you have a million documents with field foo with omitNorms, then you add just one document without omitting norms, you suddenly have a million 'real norms'. I think it would be good for omitNorms to be viral too, just for consistency, and also to prevent huge byte[]s. But another option is to make omitTF anti-viral, which is more schemaless I guess.
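The null-vs-exception distinction proposed in the comment above can be sketched with plain maps standing in for per-segment norms readers. This is only an illustration of the merge rule (return null if any reader omits norms, throw only if no reader has the field at all); MultiNormsSketch and its norms method are hypothetical names, not Lucene code.

```java
import java.util.List;
import java.util.Map;

public class MultiNormsSketch {
    // Per-reader view: key absent = field not present in that reader;
    // key mapped to null = field present with omitNorms.
    public static byte[] norms(List<Map<String, byte[]>> readers, String field) {
        boolean seen = false;
        for (Map<String, byte[]> r : readers) {
            if (r.containsKey(field)) {
                seen = true;
                // Any reader with omitted norms makes the merged norms null,
                // so no fake all-default byte[] ever needs to be allocated.
                if (r.get(field) == null) {
                    return null;
                }
            }
        }
        if (!seen) {
            // All readers "throw": the field simply does not exist.
            throw new IllegalArgumentException("field does not exist: " + field);
        }
        // Real code would concatenate the per-reader byte[]s here.
        return new byte[0];
    }
}
```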
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979334#action_12979334 ]

Michael McCandless commented on LUCENE-2843:

Yes, doc values should cut back on these large term dicts. But I'm not a fan of a pure disk-based terms dict. Expecting the OS to make good decisions on what gets swapped out is risky -- Lucene is better informed than the OS on which data structures are worth spending RAM on (norms, terms index, field cache, del docs). If indeed the terms dict (thanks to FSTs) becomes small enough to fit in RAM, then we should load it into RAM (and do away w/ the terms index).

Add variable-gap terms index impl.
----------------------------------
Key: LUCENE-2843
URL: https://issues.apache.org/jira/browse/LUCENE-2843
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2843.patch, LUCENE-2843.patch

PrefixCodedTermsReader/Writer (used by all real core codecs) already supports pluggable terms index impls. The only impl we have now is FixedGapTermsIndexReader/Writer, which picks every Nth (default 32) term and holds it in efficient packed int/byte arrays in RAM. This is already an enormous improvement (RAM reduction, init time) over 3.x. This patch adds another impl, VariableGapTermsIndexReader/Writer, which lets you specify an arbitrary IndexTermSelector to pick which terms are indexed, and then uses an FST to hold the indexed terms. This is typically even more memory efficient than packed int/byte arrays, though it does not support ord() so it's not quite a fair comparison. I had to relax the terms index plugin API for PrefixCodedTermsReader/Writer to not assume that the terms index impl supports ord. I also did some cleanup of the FST/FSTEnum APIs and impls, and broke out separate seekCeil and seekFloor in FSTEnum. Eg we need seekFloor when the FST is used as a terms index but seekCeil when it's holding all terms in the index (ie what SimpleText uses FSTs for).
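The seekCeil/seekFloor distinction in the description above can be illustrated with any ordered term set; here a TreeSet stands in for the FST (an analogy only, not FSTEnum code).

```java
import java.util.TreeSet;

public class SeekDemo {
    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add("apple");
        terms.add("banana");
        terms.add("cherry");

        // seekCeil: smallest term >= target. This is what you want when the
        // FST holds *every* term (as SimpleText does): exact-or-next lookup.
        System.out.println(terms.ceiling("b"));  // banana

        // seekFloor: largest term <= target. This is what a terms *index*
        // needs: land on the index entry at or before the target, then scan
        // forward in the main dictionary from there.
        System.out.println(terms.floor("b"));    // apple
    }
}
```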
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979337#action_12979337 ]

Michael McCandless commented on LUCENE-2840:

bq. You frequently quote progress, not perfection in relation to the code, but why don't we apply this same principle to our threading guarantees?

Oh we should definitely apply progress not perfection here -- in fact we already are: for starters (today), we bind concurrency to segments (so eg an optimized index has no concurrency), and we just use an ES (punting this thread-scheduling problem to the caller). This is better than nothing, but not good enough -- we can do better.

There's another quote that applies here: big dreams, small steps. My comment above is dreaming, but when it comes time to actually get the real work done / make progress towards that dream, of course we take baby steps / progress not perfection. Design discussions should start w/ the big dreams, but once you've got a rough sense of where you want to get to in the future, you shift back to the baby steps you take today, in the direction of that future goal. Maybe I should wrap my comments in /dream tags and /babysteps tags!

Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
-------------------------------------------------------------------------------------------
Key: LUCENE-2840
URL: https://issues.apache.org/jira/browse/LUCENE-2840
Project: Lucene - Java
Issue Type: Sub-task
Components: Search
Reporter: Uwe Schindler
Priority: Minor
Fix For: 4.0

Spin-off from parent issue:

{quote}
We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads; all others should be handled sequentially. So maybe add a maxThreads count, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and one additional thread for the rest?
{quote}
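The partitioning proposed in the quoted spin-off (sort readers by maxDoc, give the largest maxThreads-1 segments a thread each, batch the small rest into one task submitted to an ExecutorService, the "ES" above) can be sketched as follows. Segment, partition, and the task bodies are stand-ins, not IndexSearcher internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerSegmentSearch {
    // Stand-in for a segment reader; maxDoc approximates its size.
    record Segment(String name, int maxDoc) {}

    // Sort descending by maxDoc; the biggest (maxThreads - 1) segments each
    // become their own task, and all remaining small segments share one task.
    static List<List<Segment>> partition(List<Segment> segments, int maxThreads) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt(Segment::maxDoc).reversed());
        List<List<Segment>> tasks = new ArrayList<>();
        int big = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < big; i++) {
            tasks.add(List.of(sorted.get(i)));              // one task per big segment
        }
        if (big < sorted.size()) {
            tasks.add(sorted.subList(big, sorted.size()));  // small rest, handled sequentially
        }
        return tasks;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Segment> segs = List.of(new Segment("a", 1000), new Segment("b", 10),
                new Segment("c", 500), new Segment("d", 5));
        List<List<Segment>> tasks = partition(segs, 3);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (List<Segment> task : tasks) {
            pool.submit(() -> { /* search each segment in this task sequentially */ });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(tasks.size()); // 3: two big segments + one batch of small ones
    }
}
```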
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979345#action_12979345 ]

Yonik Seeley commented on LUCENE-2843:

bq. Their users somehow manage.

That neglects to count those who are not users because they could not manage with the limitations ;-) Anyway, being able to optionally keep the term dict in memory, per field, if it's below certain limits (terms/memory or whatever) would be very cool!
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979346#action_12979346 ]

Earwin Burrfoot commented on LUCENE-2843:

bq. I don't like the reasoning that, just because sphinx does it and their 'users manage', that makes it ok.

I'm in no way advocating it as an all-round better solution. It has its wrinkles just like anything else. My reasoning is merely that the alternative exists, and it is viable, as proven by pretty high-profile users. They have a memory-resident term dictionary, and it works; I have heard no complaints regarding this, ever.

bq. sphinx also requires mysql

Have you read anything at all? It has an integration ready, for the layman user who just wants to stick fulltext search into their little app, but it is in no way reliant on it. Sphinx is a direct alternative to Solr.

{quote}
But, I'm not a fan of pure disk-based terms dict. Expecting the OS to make good decisions on what gets swapped out is risky - Lucene is better informed than the OS on which data structures are worth spending RAM on (norms, terms index, field cache, del docs). If indeed the terms dict (thanks to FSTs) becomes small enough to fit in RAM, then we should load it into RAM (and do away w/ the terms index).
{quote}

That's a bit delusional. If a system is forced to swap out, it'll swap your explicitly managed RAM just as readily as memory-mapped files; I've seen this countless times. But then you get a number of benefits - like sharing the filesystem cache when opening the same file multiple times, offloading things from the Java heap (which is almost always a good thing), and the fastest load-into-memory times possible. Sorry if I sound offending at times, but, damn, there's a whole world of simple and efficient code lying ahead in that direction :)
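The memory-mapped approach argued for above can be shown in a few lines of standard java.nio: the file is mapped read-only, so reads go through the OS page cache rather than the Java heap, and multiple opens of the same file share pages. A minimal sketch with a throwaway temp file, not Lucene's Directory code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for an on-disk term dictionary file.
        Path p = Files.createTempFile("terms", ".bin");
        Files.write(p, new byte[] {1, 2, 3, 4});
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // Map the whole file; no heap allocation proportional to its size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(buf.get(2)); // 3 -- served from the page cache
        }
        Files.delete(p);
    }
}
```

The counterpoint raised later in this thread still applies: mapping requires address space, which is exactly what 32-bit JVMs lack for large term dictionaries.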
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979347#action_12979347 ]

Robert Muir commented on LUCENE-2843:

bq. Have you read anything at all?

Nope, haven't looked at their code... I think I stopped at the documentation when I saw how they analyzed text!

bq. Sorry, if I sound offending at times, but, damn, there's a whole world of simple and efficient code lying ahead in that direction

So where is the problem? You can make your own all-on-disk impl, or all-in-RAM impl, and contribute it. And you don't have to implement a terms dict cache; that's contained in the implementation. My problem is that we shouldn't assume all users can fit all their terms in RAM. I think it's great to offer alternative impls that work all in RAM, and maybe, if the terms dict is below X, where X is some configurable value, even consider using these automatically in the standard codec... but I don't see any benefit of 'forcing' this when we have this whole flexible indexing thing!
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979348#action_12979348 ]

Yonik Seeley commented on LUCENE-2843:

bq. My reasoning is merely that alternative exists, and it is viable. As proven by pretty high-profile users.

Actually, I sort of agree. I read "in memory" too fast and didn't realize you were talking about memory mapping. There are other parts of sphinx that are kept directly in memory (not memory mapped) and do limit its single-node scalability too much, IMO. Unfortunately, Java has additional overhead wrt mmap, and you also can't do some stuff that you could do in C. All this means is that trade-offs that made sense for C/C++ solutions may or may not make sense for Java solutions.
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979353#action_12979353 ]

Robert Muir commented on LUCENE-2843:

bq. Unfortunately, Java has additional overhead wrt mmap

It's not just that: you can't assume mmap even works (32-bit platforms, even some trouble on 64-bit windows). Because this is a search engine library, not just a server on 64-bit linux only, we need to support other situations, like 32-bit users doing desktop search. In other words, Test2BTerms in src/test should pass on my 32-bit windows machine with whatever we default to.
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979366#action_12979366 ]

Earwin Burrfoot commented on LUCENE-2843:

bq. Nope, havent looked at their code... i think i stopped at the documentation when i saw how they analyzed text!

All my points are contained within their documentation; no need to look at the code (it's as shady as Lucene's). In the same manner, Lucene had crappy analysis for years, until you took hold of the (unicode) police baton. So let's not allow color differences between our analyzers to affect our judgement on other parts of ours :)

bq. In other words, Test2BTerms in src/test should pass on my 32-bit windows machine with whatever we default to.

I'm questioning whether there is any legal, adequate reason to have that many terms. I agree on the mmap+32bit/mmap+windows point for a reasonable number of terms, though :/ A hybrid solution, with the term dict being loaded completely into memory (either via mmap, or into arrays) on a per-field basis, is probably best in the end, however sad that may be.
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979372#action_12979372 ]

Robert Muir commented on LUCENE-2843:

bq. A hybrid solution, with term-dict being loaded completely into memory (either via mmap, or into arrays) on per-field basis, is probably best in the end, however sad it may be.

What's the sad part again? Why does it bother you that there is another alternative codec setup or terms dict implementation if you aren't using it? Should we also only have RAMDirectory and MMapDirectory, and is it sad that we have NIOFSDirectory?
[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:

Attachment: LUCENE-2846.patch

Here's an updated patch:
* IR.setNorm(float) is also removed, forcing the user to use the correct similarity instead of us using the wrong one (the static).
* MultiNorms doesn't fake norms anymore; instead it handles the case of a non-existent field versus omitted norms.
* When a document doesn't have a field, its (undefined) norms are written as zero bytes instead of Similarity.getDefault().encodeNorm(1f).
* All uses of Similarity.get/setDefault are now gone in lucene core, except in IndexSearcher and IndexWriterConfig.
[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2846:

Attachment: LUCENE-2846.patch

Sorry, I had a piece of backwards logic in MultiNorms. Of course all tests pass either way, which is why we need a good mixed-schema test (with RIW) for this issue before it can go in (no matter what we do).
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979382#action_12979382 ]

Jason Rutherglen commented on LUCENE-2324:

bq. Once flush is triggered, the thread doing the flushing is free to flush any DWPT.

OK.

bq. OK let's start there and put back re-use only if we see a real perf issue?

I think that's best. Balancing RAM isn't implemented in the branch; we can't predict the future usage of DWPTs (which could languish consuming RAM with byte[]s well after they're flushed, due to a sudden drop in the number of calling threads external to IW).

{quote}But it's really a nuke the world option which scares me. EG it could be a looong indexing session (app doesn't call commit() until the end) and we could be throwing away a lot of progress.{quote}

Right. Another option is, on commit, to try to flush all segments, meaning even if one DWPT/segment aborts, continue on with the other DWPTs (ie, a best effort). Then perhaps throw an exception with a report of which segment flushes succeeded, or simply return a report object detailing what happened during commit (somewhat expert usage though). Either way I think we need to give the user a few options, then choose a default and see if it sticks. The default should probably be best effort.

Per thread DocumentsWriters that write their own private segments
-----------------------------------------------------------------
Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out

See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process, and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores, and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
[jira] Updated: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key
[ https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-2855: - Attachment: lucene_2855_adriano_crestani_2011_01_09.patch Thanks for pointing out the problems; here is the new patch. Contrib queryparser should not use CharSequence as Map key -- Key: LUCENE-2855 URL: https://issues.apache.org/jira/browse/LUCENE-2855 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 3.0.3 Reporter: Adriano Crestani Assignee: Adriano Crestani Fix For: 3.0.4 Attachments: lucene_2855_adriano_crestani_2011_01_08.patch, lucene_2855_adriano_crestani_2011_01_09.patch Today, contrib query parser uses Map<CharSequence,...> in many different places, which may lead to problems, since the CharSequence interface does not enforce the implementation of hashCode and equals methods. Today, it's causing a problem with the QueryTreeBuilder.setBuilder(CharSequence,QueryBuilder) method, which does not work as expected.
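The hashCode/equals pitfall this issue describes is easy to reproduce with plain collections; a minimal sketch (not the queryparser code itself, and the map contents are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
    public static void main(String[] args) {
        Map<CharSequence, String> builders = new HashMap<>();
        builders.put("field", "someBuilder"); // key stored as a String

        // StringBuilder implements CharSequence but inherits identity-based
        // hashCode()/equals() from Object, so a content-equal key misses:
        CharSequence sameChars = new StringBuilder("field");
        System.out.println(builders.get(sameChars));            // null

        // Normalizing keys to String restores value-based lookup:
        System.out.println(builders.get(sameChars.toString())); // someBuilder
    }
}
```

This is why keying a map on the CharSequence interface is fragile: nothing in the interface contract guarantees two content-equal implementations compare equal.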
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979390#action_12979390 ] David Smiley commented on LUCENE-2611: -- Steven, I don't know if another issue should be created, but there are some extra additions to the IntelliJ setup that would be nice. In vcs.xml, add this:
{code:xml}
<component name="IssueNavigationConfiguration">
  <option name="links">
    <list>
      <IssueNavigationLink>
        <option name="issueRegexp" value="[A-Z]+\-\d+" />
        <option name="linkRegexp" value="http://issues.apache.org/jira/browse/$0" />
      </IssueNavigationLink>
    </list>
  </option>
</component>
{code}
And in workspace.xml, under /project/compone...@name=ChangeListManager]/ add:
{code:xml}
<ignored path=".idea/" />
<ignored mask="*.iml" />
{code}
And perhaps the copyright setup should be set up for ASL. IntelliJ IDEA and Eclipse setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x-part2.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming. The attached patches add a new top level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. 
This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included. The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar. For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh). If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations. Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979395#action_12979395 ] Jason Rutherglen commented on LUCENE-2186: -- Out of curiosity, re: LUCENE-2312, are we planning on putting CSF into Lucene 4.x? What's left to be done? First cut at column-stride fields (index values storage) Key: LUCENE-2186 URL: https://issues.apache.org/jira/browse/LUCENE-2186 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, mem.py I created an initial basic impl for storing index values (ie column-stride value storage). This is still a work in progress... but the approach looks compelling. I'm posting my current status/patch here to get feedback/iterate, etc. The code is standalone now, and lives under new package oal.index.values (plus some util changes, refactorings) -- I have yet to integrate into Lucene so eg you can mark that a given Field's value should be stored into the index values, sorting will use these values instead of field cache, etc. It handles 3 types of values: * Six variants of byte[] per doc, all combinations of fixed vs variable length, and stored either straight (good for eg a title field), deref (good when many docs share the same value, but you won't do any sorting) or sorted. * Integers (variable bit precision used as necessary, ie this can store byte/short/int/long, and all precisions in between) * Floats (4 or 8 byte precision) String fields are stored as the UTF8 byte[]. This patch adds a BytesRef, which does the same thing as flex's TermRef (we should merge them). This patch also adds basic initial impl of PackedInts (LUCENE-1990); we can swap that out if/when we get a better impl. 
This storage is dense (like field cache), so it's appropriate when the field occurs in all/most docs. It's just like field cache, except the reading API is a get() method invocation, per document. Next step is to do basic integration with Lucene, and then compare sort performance of this vs field cache. For the sort-by-String-value case, I think RAM usage and GC load of this index values API should be much better than field cache, since it does not create an object per document (instead it shares big long[] and byte[] across all docs), and because the values are stored in RAM as their UTF8 bytes. There are abstract Writer/Reader classes. The current reader impls are entirely RAM resident (like field cache), but the API is (I think) agnostic, ie, one could make an MMAP impl instead. I think this is the first baby step towards LUCENE-1231. Ie, it cannot yet update values, and the reading API is fully random-access by docID (like field cache), not like a posting list, though I do think we should add an iterator() api (to return flex's DocsEnum) -- eg I think this would be a good way to track avg doc/field length for BM25/lnu.ltc scoring.
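The dense, docID-indexed, get()-per-document reading API described above can be sketched like this (a hypothetical class, not the patch's actual API):

```java
// Dense per-document value storage, field-cache-like: one slot per docID,
// shared across all documents, random access via get(docID).
final class DenseIntValues {
    private final long[] values; // docID-indexed; dense, so maxDoc entries

    DenseIntValues(int maxDoc) {
        this.values = new long[maxDoc];
    }

    void set(int docID, long value) {
        values[docID] = value;
    }

    // The reading API is just a get() per document. No object is created
    // per doc, which is where the RAM/GC advantage over object-per-document
    // caches comes from: everything lives in one shared long[].
    long get(int docID) {
        return values[docID];
    }
}
```

An MMAP-backed reader would keep the same get(docID) contract while swapping the long[] for a file-backed buffer, which is the API-agnosticism the comment refers to.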
Lucene-Solr-tests-only-trunk - Build # 3586 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3586/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple Error Message: expected:<3> but was:<2> Stack Trace: junit.framework.AssertionFailedError: expected:<3> but was:<2> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:126) Build Log (for compile errors): [...truncated 8211 lines...]
[jira] Assigned: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong
[ https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2839: - Assignee: Uwe Schindler Visibility of Scorer.score(Collector, int, int) is wrong Key: LUCENE-2839 URL: https://issues.apache.org/jira/browse/LUCENE-2839 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 The method for scoring subsets in Scorer has the wrong visibility: it's marked protected, but protected methods should not be called from other classes. Protected methods are intended for methods that should be overridden by subclasses and are called by (often) final methods of the same class. They should never be called from foreign classes. This method is called from another class out of scope: BooleanScorer(2) - so it must be public, but it's protected. This does not lead to a compiler error because BS(2) is in the same package, but it may lead to problems if subclasses from other packages override it. When implementing LUCENE-2838 I hit a trap, as I thought this method should only be called from the class or Scorer itself, but in fact it's called from outside, leading to bugs, because I had not overridden it. As ConstantScorer did not use it I overrode it to throw UOE, and suddenly BooleanQuery was broken, which made it clear that it's called from outside (which is not the intention of protected methods). We cannot fix this in 3.x, as it would break backwards compatibility for classes that override this method, but we can fix the visibility in trunk.
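The trap described above can be reproduced in miniature. A single-package sketch with hypothetical names (not the real Scorer/BooleanScorer classes): a protected method that is secretly part of the public calling contract compiles fine within the package, but a subclass author who treats it as an internal hook breaks the caller.

```java
// Declares score() protected, suggesting it is only an override hook...
abstract class MiniScorer {
    protected String score() { return "scored"; }
}

// ...but a sibling class in the same package calls it directly, which the
// compiler allows, hiding the fact that score() is effectively public API.
final class MiniBooleanScorer {
    static String drive(MiniScorer scorer) { return scorer.score(); }
}

// A subclass author who believes score() is never called externally
// overrides it to throw, and the boolean-query path suddenly breaks.
final class MiniConstantScorer extends MiniScorer {
    @Override
    protected String score() {
        throw new UnsupportedOperationException("thought this was internal");
    }
}

public class VisibilityTrapDemo {
    public static void main(String[] args) {
        System.out.println(MiniBooleanScorer.drive(new MiniScorer() {}));
        try {
            MiniBooleanScorer.drive(new MiniConstantScorer());
        } catch (UnsupportedOperationException e) {
            System.out.println("broken: " + e.getMessage());
        }
    }
}
```

From a different package the call in MiniBooleanScorer.drive would not even compile against a protected method, which is why making the method public is the honest declaration of its contract.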
[jira] Updated: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong
[ https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2839: -- Attachment: LUCENE-2839-3.x.patch LUCENE-2839.patch Here is the patch for trunk and 3.x; I will commit soon. In 3.x I simply added a note to Scorer's javadocs telling the user that subclasses in user code should declare the method as public, to ease the transition to 4.0.
[jira] Resolved: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong
[ https://issues.apache.org/jira/browse/LUCENE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2839. --- Resolution: Fixed Committed trunk revision: 1057010, Committed javadoc updates revision: 1057011
[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979404#action_12979404 ] Simon Willnauer commented on LUCENE-2186: - bq. Out of curiosity, re: LUCENE-2312, are we planning on putting CSF into Lucene 4.x? What's left to be done? We are very close - to land on trunk there is about an evening of work left. JDoc is missing here and there, plus some tests for FieldComparators - that's it!
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979405#action_12979405 ] Steven Rowe commented on LUCENE-2611: - Hi David, Thanks for the input. I don't think another issue is necessary. I added the {{.idea/vcs.xml}} change to auto-linkify issues in log comments. I didn't know this option existed. Where does it do the auto-linkification? I don't see it in the log comment editor, and I also don't see it when I browse an individual file's log messages (using the popup from the svnbar plugin toolbar icon). But I did not add the {{.idea/workspace.xml}} change you propose (ignoring {{.idea/}} and {{.iml}} files), because those files are already ignored via {{svn:ignore}} properties. When I added them, nothing changed for me - the files still show up greyed out in the project tree view, just as they did before I added the option. I'm not sure it's a good idea to add copyright setup for ASL - I don't know enough about what this plugin does.
[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979407#action_12979407 ] Jason Rutherglen commented on LUCENE-2186: -- bq. we are very close - to land on trunk there is about an evening of work left. JDoc is missing here and there plus some tests for FieldComparators - thats it! Nice! Once it's in I'll try to get started on the RT field cache/doc values, which can likely be implemented and tested somewhat independently of the RT inverted index.
Lucene-Solr-tests-only-trunk - Build # 3590 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3590/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137) at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049) Build Log (for compile errors): [...truncated 3101 lines...]
[jira] Updated: (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2272: - Component/s: search Join Key: SOLR-2272 URL: https://issues.apache.org/jira/browse/SOLR-2272 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-2272.patch Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query
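The from/to ID mapping in the example can be illustrated with plain collections (a hypothetical data model, not Solr's actual implementation): docs matching the inner query contribute their from-field values, and the join selects every doc whose to-field carries one of those values.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class JoinSketch {
    // Join: gather the "from" field values of the matching docs, then
    // select every doc whose "to" field holds one of those values.
    static Set<String> join(Collection<String> fromValuesOfMatches,
                            Map<String, String> toFieldByDocId) {
        Set<String> keys = new HashSet<>(fromValuesOfMatches);
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, String> e : toFieldByDocId.entrySet()) {
            if (keys.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Child docs matching the query point at parents p1 and p3...
        Collection<String> parentPtrs = java.util.List.of("p1", "p3");
        // ...and each candidate doc carries its own parent_id-like value.
        Map<String, String> parentIds =
            Map.of("docA", "p1", "docB", "p2", "docC", "p3");
        System.out.println(join(parentPtrs, parentIds));
    }
}
```

The real implementation works on indexed tokens rather than in-memory maps, but the set semantics are the same.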
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979454#action_12979454 ] Chris Male commented on LUCENE-2611: bq. I'm not sure it's a good idea to add copyright setup for ASL - I don't know enough about what this plugin does. I've used the copyright plugin a lot, and it's a great way to ensure that the ASL is added to any new files. It might be useful to add it to reduce the hassle for new contributors.
[jira] Issue Comment Edited: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979454#action_12979454 ] Chris Male edited comment on LUCENE-2611 at 1/9/11 8:51 PM: bq. I'm not sure it's a good idea to add copyright setup for ASL - I don't know enough about what this plugin does. I've used the copyright plugin a lot and its a great way to ensure that the ASL is added to any new files. Might be useful to add it to reduce the hassle for new contributors. was (Author: cmale): .bq I'm not sure it's a good idea to add copyright setup for ASL - I don't know enough about what this plugin does. I've used the copyright plugin a lot and its a great way to ensure that the ASL is added to any new files. Might be useful to add it to reduce the hassle for new contributors. IntelliJ IDEA and Eclipse setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x-part2.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming. The attached patches add a new top level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. 
This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included. The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar. For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh). If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations. Iam Jambour has written up a detailed set of instructions on the Lucene wiki for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Lucene-trunk - Build # 1421 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137) at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049) Build Log (for compile errors): [...truncated 7055 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2310) DocBuilder's getTimeElapsedSince Error
DocBuilder's getTimeElapsedSince Error -- Key: SOLR-2310 URL: https://issues.apache.org/jira/browse/SOLR-2310 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: JDK1.6 Reporter: tom liu I have a job which runs for about 65 hours, but the dataimport?command=status HTTP request reports 5 hours. In the getTimeElapsedSince method of DocBuilder: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (60000 * 60)) % 60 + ":" + (l / 60000) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} the hours computation is wrong; it should be: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (60000 * 60)) + ":" + (l / 60000) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
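The mod-60 truncation in the report above is easy to see in isolation. Below is a self-contained sketch of the two variants; to keep it deterministic it takes the elapsed milliseconds directly instead of subtracting from System.currentTimeMillis(), and the class and method names are illustrative, not from the Solr source:

```java
public class ElapsedDemo {
    // Buggy variant: the hours term is reduced mod 60, so 65 hours prints as 5.
    static String buggy(long elapsedMs) {
        long l = elapsedMs;
        return (l / (60000L * 60)) % 60 + ":" + (l / 60000) % 60 + ":"
                + (l / 1000) % 60 + "." + l % 1000;
    }

    // Fixed variant: drop the % 60 on the hours term so hours accumulate past 59.
    static String fixed(long elapsedMs) {
        long l = elapsedMs;
        return (l / (60000L * 60)) + ":" + (l / 60000) % 60 + ":"
                + (l / 1000) % 60 + "." + l % 1000;
    }

    public static void main(String[] args) {
        long sixtyFiveHours = 65L * 60 * 60 * 1000; // 234,000,000 ms
        System.out.println(buggy(sixtyFiveHours)); // prints 5:0:0.0
        System.out.println(fixed(sixtyFiveHours)); // prints 65:0:0.0
    }
}
```

Running it reproduces the symptom in the report: 65 elapsed hours displayed as 5, because 65 % 60 = 5.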
Re: Lucene-trunk - Build # 1421 - Failure
On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed Maybe this is specific to pulsing? I noticed it's failed 3 times with this identical pulsing stacktrace: Lucene-trunk/1421, tests-only/3590, tests-only/3570. However, this time it failed in a nightly build (perhaps the indexes are still available on the hudson machine if we salvage them before the next nightly build?). They should be under lucene/build/test/N/jrecrashXXtmp/ All 3 times the stacktrace is: test: terms, freq, prox...ERROR [Java heap space] java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189) at org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137) at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2657: Attachment: LUCENE-2657.patch Added profiles to populate internal repositories at {{lucene/dist/maven/}} and {{solr/dist/maven/}} with generated artifacts. To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc artifacts, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy cd lucene mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy cd ../modules mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy {code} To populate {{lucene/dist/solr/}}, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap cd solr mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar deploy {code} Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. 
A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
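The artifact-based inter-module dependency described above means a depending module declares Maven coordinates, not a path into the other module's {{build/}} directory. A hypothetical fragment of a depending module's POM is sketched below; the coordinates are illustrative assumptions, not taken from the attached patch:

```xml
<!-- Hypothetical fragment: a Solr module depending on the lucene-core
     *artifact*, resolved from the local repository populated by
     "mvn install", not from Lucene's un-jarred build/ output. -->
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>4.0-SNAPSHOT</version>
</dependency>
```

This is why a change in one module is invisible to its dependents until {{mvn install}} has deposited a fresh artifact in the local repository.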
[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979496#action_12979496 ] Steven Rowe edited comment on LUCENE-2657 at 1/10/11 2:38 AM: -- Added profiles to populate internal repositories at {{lucene/dist/maven/}} and {{solr/dist/maven/}} with generated artifacts. To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc artifacts, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy cd lucene mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy cd ../modules mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy {code} To populate {{lucene/dist/solr/}}, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap install cd solr mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar deploy {code} was (Author: steve_rowe): Added profiles to populate internal repositories at {{lucene/dist/maven/}} and {{solr/dist/maven/}} with generated artifacts. 
To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc artifacts, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy cd lucene mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy cd ../modules mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy {code} To populate {{lucene/dist/solr/}}, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap cd solr mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar deploy {code} Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org