[jira] [Created] (SOLR-2452) rewrite solr build system
rewrite solr build system

Key: SOLR-2452
URL: https://issues.apache.org/jira/browse/SOLR-2452
Project: Solr
Issue Type: Task
Components: Build
Reporter: Robert Muir
Fix For: 3.2, 4.0

As discussed in SOLR-2002 (but that issue is long and hard to follow), I think we should rewrite the Solr build system. It's slow, cumbersome, and messy, and makes it hard for us to improve things.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1301#comment-1301 ] Robert Muir commented on SOLR-2452:

I brought my previous patch up to date and committed it to https://svn.apache.org/repos/asf/lucene/dev/branches/solr2452

I ripped all the existing stuff out of the Solr build: we can always add things back, but I wanted to start lean and mean. compile/test/clean/dependencies/etc. should work, and are extended from Lucene's build. I'd appreciate anyone who wants to spend time trying to help.
Does solr support secure enterprise search?
Hello, Does Solr support secure enterprise search? That is to say, each person can only see the information they are authorized to access. If I want to achieve this, what should I do? Thanks for your help. 2011-04-01 Best wishes Zhenpeng Fang 方 振鹏 Dept. Software Engineering Xiamen University
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014472#comment-13014472 ] David Mark Nemeskey commented on LUCENE-2959:

Robert, as for the problems with BM25F:

{quote} * for any field, Lucene has a per-field terms dictionary that contains that term's docFreq. To compute BM25F's IDF method would be challenging, because it wants a docFreq across all the fields. * the same issue applies to length normalization: Lucene has a field length but really no concept of document length. {quote}

One thing that is not clear to me is why these limitations would not be a problem for BM25. As I see it, the difference between the two methods is that BM25 simply computes tfs, idfs and document length from the whole document -- which, according to what you said, is not available in Lucene. That's why I figured that a variant of BM25F would actually be more straightforward to implement.

{quote} (its not clear to me at a glance either from the original paper, if this should be across only the fields in the query, across all the fields in the document, and if a static schema is implied in this scoring system (in lucene document 1 can have 3 fields and document 2 can have 40 different ones, even with different properties). {quote}

Actually I am not sure there is a consensus on what BM25F actually is. :) For example, the BM25 formula can be applied to the weighted sum of field tfs, or alternatively, the per-field BM25 scores can be summed after normalization. I've seen both called (maybe incorrectly) BM25F. If I understand correctly, the current scoring algorithm takes into account only the fields explicitly specified in the query. Is that right? If so, I see no reason why BM25 should behave otherwise. Which of course also means that we probably won't be able to save the summarized doc length and idf. Robert, would you be so kind as to have a look at my proposal?
It can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1. It's basically the same as what I sent to the mailing list. I wrote that I want to implement BM25, BM25F and DFR (the framework, I mean, with one or two smoothing models), as well as to convert the original scoring to the new framework. In light of the thread here, I guess it would be better to modify these goals, perhaps by:
* deleting the conversion part?
* committing myself to BM25/BM25F only?
* explicitly stating that I want a higher-level API based on the low-level one?
As for the last item, that applies only if I continue / join the work in 2392. Since I guess nobody wants two ranking frameworks, of course I will, but then in this part of the proposal should I just concentrate on the higher-level API? Thanks!

[GSoC] Implementing State of the Art Ranking for Lucene
Key: LUCENE-2959
URL: https://issues.apache.org/jira/browse/LUCENE-2959
Project: Lucene - Java
Issue Type: New Feature
Components: Examples, Javadocs, Query/Scoring
Reporter: David Mark Nemeskey
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf

Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions.
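For reference in the discussion above, a single-term textbook BM25 score can be sketched as follows. This is a generic illustration under stated assumptions, not Lucene code: the class name, parameter names, and the particular smoothed IDF variant are all choices made for this sketch.

```java
public class Bm25Sketch {
    // Textbook BM25 score contribution of one query term in one document.
    // k1 controls term-frequency saturation; b controls length normalization.
    static double score(int tf, int docFreq, int numDocs,
                        double docLen, double avgDocLen,
                        double k1, double b) {
        // Robertson/Sparck Jones style IDF, smoothed to stay non-negative
        double idf = Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Document length normalization of the term frequency
        double norm = k1 * ((1 - b) + b * docLen / avgDocLen);
        return idf * (tf * (k1 + 1)) / (tf + norm);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in an average-length document
        System.out.println(Bm25Sketch.score(3, 100, 100000, 250.0, 250.0, 1.2, 0.75));
    }
}
```

Note how the formula needs docLen and avgDocLen over the whole document and a docFreq over all fields -- exactly the cross-field statistics the comments above say Lucene only tracks per field.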
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6591 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6591/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange

Error Message: null

Stack Trace:
junit.framework.AssertionFailedError:
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1076)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1008)
    at org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:236)
    at org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:168)

Build Log (for compile errors): [...truncated 9184 lines...]
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-2378:

Description: Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher; we will handle infixes and other types of input matches gradually. Impl. phases:
- write a DFA-based suggester effectively identical to the ternary-tree-based solution right now,
- baseline benchmark against the ternary tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code),
- modify the DFA to encode term weights directly in the automaton (optimize for the onlyMostPopular case),
- benchmark again,
- add infix suggestion support with prefix matches boosted higher (?),
- benchmark again,
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

was: Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher; we will handle infixes and other types of input matches gradually.

FST-based Lookup (suggestions) for prefix matches.
Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
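As a rough mental model of what a prefix Lookup must return, here is a sorted-array baseline sketch -- deliberately not the FST implementation the issue describes, just the simplest structure that produces the same suggestions (class and method names are hypothetical):

```java
import java.util.Arrays;

public class PrefixLookupDemo {
    // Return up to max terms starting with prefix, from a sorted term list.
    // An FST/DFA suggester must match this behavior, only faster and smaller.
    static String[] suggest(String[] sortedTerms, String prefix, int max) {
        int lo = Arrays.binarySearch(sortedTerms, prefix);
        if (lo < 0) lo = -lo - 1;  // negative result encodes the insertion point
        int hi = lo;
        while (hi < sortedTerms.length && hi - lo < max
               && sortedTerms[hi].startsWith(prefix)) {
            hi++;
        }
        return Arrays.copyOfRange(sortedTerms, lo, hi);
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "append", "apple", "lucene", "solr"};
        System.out.println(Arrays.toString(suggest(terms, "ap", 10)));
    }
}
```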
Re: Does solr support secure enterprise search?
You should really be asking this on the user list solr-u...@lucene.apache.org. Solr does not provide any security features - you would be expected to implement security within your application that you put in front of Solr. LucidWorks Enterprise (a commercial package based upon Solr) does offer security features. Upayavira On Fri, 01 Apr 2011 16:37 +0800, michong900617 michong900...@xmu.edu.cn wrote: Hello, Does solr support secure enterprise search? That's to say, person can only visit to the concerns of the information within their authorities. If I wanna meet the goal, what can I do? Thanks for your help. 2011-04-01 Best wishes Zhenpeng Fang 方 振鹏 Dept. Software Engineering Xiamen University --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Does solr support secure enterprise search?
The user list solr-u...@lucene.apache.org may be the best place to ask. But if I understand correctly, there are multiple questions in your request:
- one is related to secure access, and for that I would suggest using HTTPS
- user login: for this I suggest using a portal that manages user logins (out of Solr's scope) and integrates Solr
- access restriction for each user: this is possible but not provided out of the box with Solr.
That's my own answer, but others may provide more up-to-date information. 2011/4/1 michong900617 michong900...@xmu.edu.cn Hello, Does solr support secure enterprise search? That's to say, person can only visit to the concerns of the information within their authorities. If I wanna meet the goal, what can I do? Thanks for your help. 2011-04-01 -- Best wishes Zhenpeng Fang 方 振鹏 Dept. Software Engineering Xiamen University -- Gérard Dupont Information Processing Control and Cognition (IPCC) CASSIDIAN - an EADS company Document Learning team - LITIS Laboratory
add(CharSequence) in automaton builder
Mike, can you remember what ordering is required for add(CharSequence)? I see it requires INPUT_TYPE.BYTE4: assert fst.getInputType() == FST.INPUT_TYPE.BYTE4; but this would imply an ordering by full Unicode code points on the input? Is this what String comparators do by default? (I doubt it, but wanted to check if you know first.) Dawid
Re: add(CharSequence) in automaton builder
On Fri, Apr 1, 2011 at 7:58 AM, Dawid Weiss dawid.we...@gmail.com wrote: Mike, can you remember what ordering is required for add(CharSequence)? I see it requires INPUT_TYPE.BYTE4 assert fst.getInputType() == FST.INPUT_TYPE.BYTE4; but this would imply the order of full unicode codepoints on the input? Is this what String comparators do by default (I doubt, but wanted to check if you know first). (sorry, not Mike, but) you are right, String.compareTo() compares in UTF-16 order by default. This is not consistent with the order the FST builder expects (UTF-8/UTF-32 order). So if you are going to order the terms before passing them to Builder, you should use a UTF-16-in-UTF-8-order comparator* (or simply use codePointAt and friends and compare those ints, probably slower...). Different ways of implementing the comparator below:
* http://icu-project.org/docs/papers/utf16_code_point_order.html
* http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf
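A small sketch of the discrepancy Robert describes: String.compareTo() orders by UTF-16 code units, which disagrees with code point (UTF-8/UTF-32) order as soon as supplementary characters are involved. The comparator below is an illustrative stand-in written for this sketch, not the ICU or Lucene one:

```java
public class CodePointOrderDemo {
    // Compare two strings by Unicode code point (UTF-8/UTF-32) order,
    // rather than Java's default UTF-16 code unit order.
    static int compareByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";         // U+FF61, a BMP character
        String supp = "\uD800\uDC00";  // U+10000, a surrogate pair
        // UTF-16 order: code unit 0xFF61 sorts AFTER high surrogate 0xD800
        System.out.println(bmp.compareTo(supp) > 0);
        // Code point order: U+FF61 sorts BEFORE U+10000
        System.out.println(compareByCodePoint(bmp, supp) < 0);
    }
}
```

The two orders disagree exactly on the range U+E000..U+FFFF versus supplementary characters, which is why a dedicated comparator (or byte-level comparison, see below in the thread) is needed before feeding terms to the FST builder.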
Re: add(CharSequence) in automaton builder
> (sorry not mike, but) you are right, String.compareTo() compares in

He, he, thanks Robert. We have these anti-child-abuse commercials on TV right now: you never know who's on the other side... how appropriate for this situation.

> utf-16 order by default. this is not consistent with the order the FST builder expects (utf8/utf32 order)

Yes, this is what I also figured out. The Unicode code point order is also implemented in BytesRef.getUTF8SortedAsUnicodeComparator, correct? For what I need I'll use raw UTF-8 byte order; it doesn't matter as long as it's consistent. Dawid
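To illustrate why raw UTF-8 byte order is safe here: comparing UTF-8 encodings bytewise (as unsigned values) yields the same ordering as comparing code points. This sketch mimics what a UTF-8-sorted comparator does, without using the Lucene BytesRef class:

```java
import java.nio.charset.StandardCharsets;

public class Utf8OrderDemo {
    // Unsigned byte-wise comparison of UTF-8 encodings. Because UTF-8 is
    // designed so that byte order equals code point order, this agrees with
    // CodePointOrderDemo-style comparison, not with String.compareTo().
    static int compareUtf8(String a, String b) {
        byte[] ba = a.getBytes(StandardCharsets.UTF_8);
        byte[] bb = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(ba.length, bb.length);
        for (int i = 0; i < n; i++) {
            int d = (ba[i] & 0xff) - (bb[i] & 0xff);  // unsigned compare
            if (d != 0) return d;
        }
        return ba.length - bb.length;
    }

    public static void main(String[] args) {
        // U+FF61 encodes as EF BD A1; U+10000 encodes as F0 90 80 80.
        // 0xEF < 0xF0, so byte order matches code point order here:
        System.out.println(compareUtf8("\uFF61", "\uD800\uDC00") < 0);
    }
}
```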
Re: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 6591 - Failure
I committed a fix -- this was in the backwards tests... Mike http://blog.mikemccandless.com On Fri, Apr 1, 2011 at 6:35 AM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6591/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange
Re: add(CharSequence) in automaton builder
On Fri, Apr 1, 2011 at 8:25 AM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Yes, this is what I also figured out. The unicode code point order is also impl. in BytesRef.getUTF8SortedAsUnicodeComparator, correct? For what I need I'll use raw utf8 byte order, it doesn't matter as long as it's consistent. Yes, if you are already working with bytes, definitely just stay with binary order (UTF-8 and UTF-32 are the same order; it's only UTF-16/String/chars that are the wackos). Sorry, since you were talking about the CharSequence API to Builder, I assumed for a second you were working with chars/Strings, and forgot about how this is confusingly mixed with, yet distinct from, the whole BYTE1/BYTE4 selection in Builder :)
Questions about 3.1.0 release, SVN and common-build.xml
Hi I noticed that 3.1.0's tag in svn is http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1. Should it not be http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1_0? At least, that's what's specified under Publishing on ReleaseTodo wiki. Also, the common-build.xml under the tag and the downloaded sources specifies version to be 3.1-SNAPSHOT. On the ReleaseTodo I found this: ... and the default version in lucene/common-build.xml on the branch to X.Y (remove the -SNAPSHOT suffix), so I guess 'SNAPSHOT' should have been removed, but also version should be set to 3.1.0. I apologize for finding these *after* the release has been created. I don't think it's critical that we fix the common-build.xml, but perhaps update the ReleaseTodo accordingly, so we do it right on 3.2? Can we rename the tag? Is it critical? Shai
Re: add(CharSequence) in automaton builder
On Fri, Apr 1, 2011 at 8:29 AM, Robert Muir rcm...@gmail.com wrote: sorry, since you were talking about the charsequence api to builder, i assumed for a second you were working with chars/Strings, and forgot about how this is confusingly mixed with, yet distinct from, the whole BYTE1/BYTE4 selection in builder :) It IS really confusing! Really, the Builder/FST needs to be parameterized on the input type too (it's already parameterized on the output type), but confronting the generics required to accomplish this was... scary. Mike http://blog.mikemccandless.com
[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014538#comment-13014538 ] Simon Willnauer commented on LUCENE-2573:

bq. Thanks, Simon, for running the benchmarks! Good results overall, even though it's puzzling why flushing would be CPU intensive.

Well, during flush we are encoding lots of VInts; that's making it CPU intensive. I actually ran the benchmark through a profiler and found out what the problem was with my benchmarks. When I indexed with DWPT, my HDD was so busy flushing segments concurrently that the read performance suffered, and my indexing threads blocked on the line-doc file where I read the records from. This explains the large number of spikes towards 0 docs/sec. The profiler also showed that we are constantly waiting on ThreadState#lock() with at least 3 threads. I changed the current behavior of the threadpool to not clear the thread bindings when I replace a DWPT for flushing, and voila! We have a comparable peak ingest rate. !http://people.apache.org/~simonw/DocumentsWriterPerThread_dps_01.png! Note the difference: DWPT indexes the documents in 6 min 15 seconds! !http://people.apache.org/~simonw/Trunk_dps_01.png! Here we have 13 min 40 seconds! NICE! !http://people.apache.org/~simonw/DocumentsWriterPerThread_flush_01.png!

Tiered flushing of DWPTs by RAM with low/high water marks
Key: LUCENE-2573
URL: https://issues.apache.org/jira/browse/LUCENE-2573
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch

Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps between the low and high water mark: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep, for simplicity, the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
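The linear-step idea in the description can be sketched as a tiny threshold calculation. This is a hypothetical helper written for illustration, not Lucene's actual flush policy API:

```java
import java.util.Arrays;

public class TieredFlushDemo {
    // Evenly spaced flush thresholds between a low and a high water mark:
    // the i-th DWPT flushes once total RAM use crosses thresholds[i].
    static double[] flushThresholds(double lowPct, double highPct, int numDWPTs) {
        double[] t = new double[numDWPTs];
        double step = (numDWPTs > 1) ? (highPct - lowPct) / (numDWPTs - 1) : 0.0;
        for (int i = 0; i < numDWPTs; i++) {
            t[i] = lowPct + i * step;
        }
        return t;
    }

    public static void main(String[] args) {
        // 5 DWPTs between 90% and 110%, as in the issue description:
        System.out.println(Arrays.toString(flushThresholds(90, 110, 5)));
        // -> [90.0, 95.0, 100.0, 105.0, 110.0]
    }
}
```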
Re: Questions about 3.1.0 release, SVN and common-build.xml
On Fri, Apr 1, 2011 at 8:31 AM, Shai Erera ser...@gmail.com wrote: Hi I noticed that 3.1.0's tag in svn is http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1. Should it not be http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1_0? At least, that's what's specified under Publishing on ReleaseTodo wiki. Yes, I did this intentionally to try to discourage a 3.1.1. Is it really necessary to have confusing 3-part bugfix releases when branch_3x itself is a stable branch?! Shouldn't we just work on 3.2 now?
Re: Questions about 3.1.0 release, SVN and common-build.xml
On Fri, Apr 1, 2011 at 8:42 AM, Robert Muir rcm...@gmail.com wrote: Yes, I did this intentionally to try to discourage a 3.1.1. Is it really necessary to have confusing 3-part bugfix releases when branch_3x itself is a stable branch?! Shouldn't we just work on 3.2 now? (sorry, I refer to the branch, not the tag here, but I think it still makes sense).
Re: Questions about 3.1.0 release, SVN and common-build.xml
On Apr 1, 2011, at 8:31 AM, Shai Erera wrote: Hi I noticed that 3.1.0's tag in svn is http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1. Should it not be http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1_0? At least, that's what's specified under Publishing on ReleaseTodo wiki. Yeah, we can move it.
Re: Questions about 3.1.0 release, SVN and common-build.xml
The branch is OK -- the 3_1 branch is indeed intended for future 3.1.x releases. If we can commit to releasing 3.2 instead of 3.1.1 in case only bug fixes are present, then I'm OK with it. We'd also need to commit, in general, to releasing more often. So if we decide to release, say, every 3 months, then 3.2 can include all the bug fixes for 3.1. If that's the case (and I support it wholeheartedly), why create a branch for 3.1 at all - we could just tag branches_3x? Also, the release artifacts are named 3.1.0, suggesting there will be a 3.1.1 -- hence this email. But again, +1 on:
* Not releasing 3.1.1, but instead 3.2
* Not branching 3x, but instead only tagging it
* Naming the artifacts of future releases x.y only.
Shai On Fri, Apr 1, 2011 at 2:43 PM, Robert Muir rcm...@gmail.com wrote: (sorry, I refer to the branch, not the tag here, but I think it still makes sense).
Re: add(CharSequence) in automaton builder
> sorry, since you were talking about the charsequence api to builder, i assumed for a second you were working with chars/Strings, and forgot about how this is confusingly mixed with, yet distinct from, the whole BYTE1/BYTE4 selection in builder :)

I am working with Strings because that's what the Lookup API is providing... which I think should change, but that's something for another round of patches. The BYTE1/BYTE4 selection is confusing, and I believe at least some documentation should be added there to clarify what it's for and how it should be used. Again -- something to clarify as part of another task. I should have that Lookup impl. ready tomorrow; I had to reiterate over certain things first and it took me longer than expected. Dawid
Re: Questions about 3.1.0 release, SVN and common-build.xml
On Fri, Apr 1, 2011 at 8:49 AM, Shai Erera ser...@gmail.com wrote: The branch is ok -- 3_1 branch is intended for 3.1.x future releases indeed. If we can commit to releasing 3.2 instead of 3.1.1 in case only bug fixes are present, then I'm ok with it. We'd also need to commit, in general, to release more often. So if we decide to release say every 3 months, then 3.2 can include all the bug fixes for 3.1. I don't think we have to commit to anything explicitly - maybe we should just see how things go? Releasing Lucene and Solr is a heavy-duty job, so why make bugfix-only branches (a lot of merging and other work for committers) when we can issue releases with bug fixes and also a couple of stable improvements? Personally, I decided today to stop putting bugs in my code in the first place :)
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014547#comment-13014547 ] Robert Muir commented on LUCENE-2959:

{quote} One thing that is not clear for me is why these limitations would not be a problem for BM25. As I see it, the difference between the two methods is that BM25 simply computes tfs, idfs and document length from the whole document -- which, according to what you said, is not available Lucene. That's why I figured that a variant of BM25F would actually be more straightforward to implement. {quote}

A variant sounds really interesting? I think you know better than me here; I just looked at the original paper and thought to myself that implementing this by the book might not be feasible for a while.

{quote} Robert, would you be so kind to have a look at my proposal? It can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1. It's basically the same as what I sent to the mailing list. I wrote that I want to implement BM25, BM25F and DFR (the framework, I meant with one or two smoothing models), as well as to convert the original scoring to the new framework. In light of the thread here, I guess it would be better to modify these goals, perhaps by: deleting the conversion part? committing myself to BM25/BM25F only? explicitly stating that I want a higher level API based on the low-level one? {quote}

I think you can decide what you want to do? Obviously I would love to see all of it done :) But it's your choice; I could see you going a couple of different ways:
* closer to your original proposal, you could still develop a flexible scoring API on top of Similarity. Hey, all I did was move stuff from Scorer to Similarity really, which does give flexibility, but it's probably not what an IR researcher would want (it's low-level and confusing).
So you could make a SimpleSimilarity or EasySimilarity or something that presents a much simpler API (something closer to what Terrier/Indri present) on top of this, for easily implementing ranking functions? I think this would be extremely valuable long-term: who cares if we have a low-level flexible scoring API that only speed demons like, but IR practitioners find confusing and hideous? Someone who is trying to experiment with an enhancement to relevance likely doesn't care if their TREC run takes 30 seconds instead of 20 seconds if the API is really easy and they aren't wasting time fighting with Lucene? If you go this route, you could implement BM25, DFR, etc. as you suggested, as examples of how to use this API, and there would be more of a focus on API quality and simplicity instead of performance.
* or alternatively, you could refine your proposal to implement a really production-strength version of one of these scoring systems on top of the low-level API, one that would ideally have competitive performance/documentation/etc. with Lucene's default scoring today. If you decide to do this, then yes, I would definitely suggest picking only one, because I think it's a ton of work as I listed above, and I think there would be more focus on practical things (some probably being nuances of Lucene) and performance.
RE: Questions about 3.1.0 release, SVN and common-build.xml
Hi Shai, On 4/1/2011 at 8:32 AM, Shai Erera wrote: Also, the common-build.xml under the tag and the downloaded sources specifies version to be 3.1-SNAPSHOT. On the ReleaseTodo I found this: ... and the default version in lucene/common-build.xml on the branch to X.Y (remove the -SNAPSHOT suffix), so I guess 'SNAPSHOT' should have been removed, but also version should be set to 3.1.0. I'm pretty sure the ReleaseTodo page is wrong on this. Building from a source distribution should *not* produce artifacts that have the same version in their names as the binary release. We don't want same-version-but-different binary artifacts being accidentally produced. There's nothing stopping people from changing this themselves, of course, so leaving the pre-release version name, including the -SNAPSHOT suffix, in the source release is just a passive defense against this kind of mistake. I'll change the ReleaseTodo page if there are no objections. Steve
[Lucene.Net] Incubator PMC/Board report for April 2011 (lucene-net-...@lucene.apache.org)
Dear Lucene.NET Developers, This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report. The board meeting is scheduled for Wed, 20 April 2011, 10 am Pacific. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted one week before the board meeting, to allow sufficient time for review. Please submit your report with sufficient time to allow the Incubator PMC, and subsequently board members, to review and digest it. Again, the very latest you should submit your report is one week prior to the board meeting. Thanks, The Apache Incubator PMC

Submitting your Report
--
Your report should contain the following:
* Your project name
* A brief description of your project, which assumes no knowledge of the project or necessarily of its field
* A list of the three most important issues to address in the move towards graduation
* Any issues that the Incubator PMC or ASF Board might wish/need to be aware of
* How the community has developed since the last report
* How the project has developed since the last report
This should be appended to the Incubator Wiki page at: http://wiki.apache.org/incubator/April2011 Note: this page is manually populated. You may need to wait a little before this page is created from a template.

Mentors
---
Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project - projects that are not signed off may raise alarms for the Incubator PMC. Incubator PMC
Re: Unsupported encoding GB18030
On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl jan@cominvent.com wrote: Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23. When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error: % java -jar post.jar gb18030-example.xml SimplePostTool: version 1.3 SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file gb18030-example.xml SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030 From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030 The same works on my MacBook with Java 1.6.0_24. Interesting - things seem fine for me on Win7 Java 1.6.0_24, but I don't have XP around any longer to see if that's the factor somehow. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Unsupported encoding GB18030
On Fri, Apr 1, 2011 at 10:00 AM, Yonik Seeley yo...@lucidimagination.com wrote: Interesting - things seem fine for me on Win7 Java 1.6.0_24, but I don't have XP around any longer to see if that's the factor somehow. It's worth mentioning, there is no guarantee the JRE will support the GB18030 encoding. There are only 6 charsets guaranteed to exist: http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
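Robert's point about guaranteed charsets is easy to check programmatically. A minimal sketch (class and method names are mine, not from this thread), using only java.nio.charset:

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    // The six charsets every conforming JVM must provide, per the
    // java.nio.charset.Charset javadoc.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns true if this JVM can decode/encode the given charset name.
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println(name + " -> " + supports(name));
        }
        // GB18030 is optional: present in most full JDK installs,
        // but a JVM is not required to ship it.
        System.out.println("GB18030 -> " + supports("GB18030"));
    }
}
```

On a JRE that lacks the optional charsets, supports("GB18030") returns false, which is presumably the condition that surfaces as the WstxIOException in the report above.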
RE: Unsupported encoding GB18030
On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl jan@cominvent.com wrote: Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23 ... SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030 It seems that the JVM used on this Windows machine does not support the particular encoding. This is not Solr's fault; maybe it's some stripped-down foreign JDK like IBM's or whatever. Even Sun only guarantees certain encodings to be present in any JVM, and GB18030 is for sure very optional. As far as I remember, in early JDK days there were extra eastern JDKs around that had extra charsets - maybe that's still the case for Win XP? Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Unsupported encoding GB18030
On Fri, Apr 1, 2011 at 10:07 AM, Robert Muir rcm...@gmail.com wrote: Its worth mentioning, there is no guarantee the JRE will support GB18030 encoding. There are only 6 charsets guaranteed to exist: http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html Indexing *.xml is a very common thing for new users to do. If this is likely to fail for enough users, we should move, remove, or at least change the filename to something like gb18030-example.xml.gb18030 so it won't get picked up by accident. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #76: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/76/ No tests ran. Build Log (for compile errors): [...truncated 9507 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014619#comment-13014619 ] David Mark Nemeskey commented on LUCENE-2959: - {quote} I think you can decide what you want to do? {quote} Fair enough. :) I guess I'll stick with my original proposal then, though I might change a few things here and there; maybe change the focus from flexibility (as it seems to be already underway) to simplicity. [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: Examples, Javadocs, Query/Scoring Reporter: David Mark Nemeskey Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
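For readers unfamiliar with the ranking function named in the proposal, a common textbook form of the BM25 per-term weight looks like this. This is a reference sketch, not Lucene code or the GSoC patch; parameter names k1 and b follow the usual convention:

```java
public class Bm25 {
    // BM25 weight of one query term for one document.
    // tf: term frequency in the doc; docLen: doc length in tokens;
    // avgDocLen: average doc length in the collection;
    // docFreq: number of docs containing the term; numDocs: total docs;
    // k1, b: free parameters (typical values ~1.2 and ~0.75).
    static double score(double tf, double docLen, double avgDocLen,
                        long docFreq, long numDocs, double k1, double b) {
        // Smoothed IDF component (kept non-negative by the +1 inside the log).
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Length normalization: longer-than-average docs are penalized.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // tf saturation: the weight grows sublinearly in tf.
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        System.out.println(score(3, 120, 100, 50, 10000, 1.2, 0.75));
    }
}
```

The saturation in tf and the explicit length normalization are the two properties the proposal contrasts with Lucene's VSM-tailored scoring.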
[jira] [Updated] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2061: -- Attachment: SOLR-2061.patch This patch includes a new Test Framework Javadoc link from the Solr website's index.html. Committing shortly. Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Questions about 3.1.0 release, SVN and common-build.xml
Ok, we can keep SNAPSHOT. I was just thinking it'll be nice if the tag sets the right version, for convenience. It's not so hard to do. BTW, I don't build the code, just run jar-src so I can attach the source to the jars for debugging purposes. If we had packaged them already (not that I propose that we do that), I wouldn't be downloading the source at all. Thanks, Shai On Friday, April 1, 2011, Uwe Schindler u...@thetaphi.de wrote: +1. In all previous releases we were leaving the -dev in the common-build.xml, it's simply now -SNAPSHOT. Whenever somebody compiles the code himself, there might be changes in it so the reproduced JAR files are not identical to the released ones. This was the same for all releases before (at least since 2.9.0 where we had the discussion, too). Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Unsupported encoding GB18030
Hi Yonik, I started my virtual box with a fresh Windows XP snapshot, downloaded JDK 1.6.0_24 and Solr 3.1.0, started Solr and then ran java -jar post.jar *.xml - success. Before we start fixing something that may not be an issue, we should ask this person exactly which JDK he uses and where he downloaded it. Is it maybe not an Oracle one? (This GB encoding is very common - if a JVM does not support it (it is not required to), it can only be some western-European one like I mentioned in my mail.) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Questions about 3.1.0 release, SVN and common-build.xml
Shai, the source jars are available from the maven central repo, e.g.: http://repo2.maven.org/maven2/org/apache/lucene/lucene-core/3.1.0/lucene-core-3.1.0-sources.jar - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014703#comment-13014703 ] Yonik Seeley commented on LUCENE-3003: -- bq. Attached: 32-bit results Ah, bummer. It's every 8 bytes, but with a 4 byte offset! I guess we could make it based on if we detect 32 vs 64 bit jvm... but maybe first see if anyone has any ideas about how to use something like pagedbytes instead. Move UnInvertedField into Lucene core - Key: LUCENE-3003 URL: https://issues.apache.org/jira/browse/LUCENE-3003 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3003.patch, LUCENE-3003.patch, byte_size_32-bit-openjdk6.txt Solr's UnInvertedField lets you quickly lookup all terms ords for a given doc/field. Like, FieldCache, it inverts the index to produce this, and creates a RAM-resident data structure holding the bits; but, unlike FieldCache, it can handle multiple values per doc, and, it does not hold the term bytes in RAM. Rather, it holds only term ords, and then uses TermsEnum to resolve ord - term. This is great eg for faceting, where you want to use int ords for all of your counting, and then only at the end you need to resolve the top N ords to their text. I think this is a useful core functionality, and we should move most of it into Lucene's core. It's a good complement to FieldCache. For this first baby step, I just move it into core and refactor Solr's usage of it. After this, as separate issues, I think there are some things we could explore/improve: * The first-pass that allocates lots of tiny byte[] looks like it could be inefficient. Maybe we could use the byte slices from the indexer for this... * We can improve the RAM efficiency of the TermIndex: if the codec supports ords, and we are operating on one segment, we should just use it. 
If not, we can use a more RAM-efficient data structure, eg an FST mapping to the ord. * We may be able to improve on the main byte[] representation by using packed ints instead of delta-vInt? * Eventually we should fold this ability into docvalues, ie we'd write the byte[] image at indexing time, and then loading would be fast, instead of uninverting -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent
[ https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tylerw updated SOLR-2052: - Attachment: SOLR-2052-4.patch Updated patch that applies cleanly against trunk and works with groups/field collapsing features. Allow for a list of filter queries and a single docset filter in QueryComponent --- Key: SOLR-2052 URL: https://issues.apache.org/jira/browse/SOLR-2052 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Environment: Mac OS X, Java 1.6 Reporter: Stephen Green Priority: Minor Fix For: Next Attachments: SOLR-2052-2.patch, SOLR-2052-3.patch, SOLR-2052-4.patch, SOLR-2052.patch SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries or a single filter (as a DocSet), but not both. This restriction seems arbitrary, and there are cases where we can have both a list of filter queries and a DocSet generated by some other non-query process (e.g., filtering documents according to IDs pulled from some other source like a database.) Fixing this requires a few small changes to SolrIndexSearcher to allow both of these to be set for a QueryCommand and to take both into account when evaluating the query. It also requires a modification to ResponseBuilder to allow setting the single filter at query time. I've run into this against 1.4, but the same holds true for the trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Questions about 3.1.0 release, SVN and common-build.xml
Thanks ! Shai On Friday, April 1, 2011, Steven A Rowe sar...@syr.edu wrote: Shai, the source jars are available from the maven central repo, e.g.: http://repo2.maven.org/maven2/org/apache/lucene/lucene-core/3.1.0/lucene-core-3.1.0-sources.jar - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Questions about 3.1.0 release, SVN and common-build.xml
Yeah, have noticed this shortly ago, too. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014723#comment-13014723 ] Michael McCandless commented on LUCENE-3003: bq. It is inefficient - but I never saw a way around it since the lists are all being built in parallel (due to the fact that we are uninverting). Lucene's indexer (TermsHashPerField) has precisely this same problem -- every unique term must point to two (well, one if omitTFAP) growable byte arrays. We use slices into a single big (paged) byte[], where first slice is tiny and can only hold like 5 bytes, but then points to the next slice which is a bit bigger, etc. We could look @ refactoring that for this use too... Though this is just the one-time startup cost. bq. Another small easy optimization I hadn't gotten around to yet was to lower the indexIntervalBits and make it configurable. I did make it configurable to the Lucene class (you can pass it in to ctor), but for Solr I left it using every 128th term. {quote} Another small optimization would be to store an array of offsets to length-prefixed byte arrays, rather than a BytesRef[]. At least the values are already in packed byte arrays via PagedBytes. {quote} Both FieldCache and docvalues (branch) store an array-of-terms like this (the array of offsets is packed ints). We should also look at using an FST, which'd be the most compact but the ord - term lookup cost goes up. Anyway I think we can pursue these cool ideas on new [future] issues... Move UnInvertedField into Lucene core - Key: LUCENE-3003 URL: https://issues.apache.org/jira/browse/LUCENE-3003 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3003.patch, LUCENE-3003.patch, byte_size_32-bit-openjdk6.txt Solr's UnInvertedField lets you quickly lookup all terms ords for a given doc/field. 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
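The slice scheme Mike describes (a tiny first slice per term that points to progressively larger ones) can be illustrated with a toy model. The real TermsHashPerField/ByteBlockPool logic differs in its details (it interleaves forwarding addresses inside a single paged byte[]), so treat this purely as a sketch of the growth pattern; the class and constants below are mine:

```java
public class SliceSizes {
    // Toy growth rule: each subsequent slice roughly doubles in size,
    // capped at a maximum, so rare terms cost only a few bytes while
    // frequent terms amortize the per-slice overhead.
    static int nextSliceSize(int current, int max) {
        return Math.min(current * 2, max);
    }

    public static void main(String[] args) {
        int size = 5;   // first slice is tiny, as in the indexer
        int total = 0;
        for (int level = 0; level < 6; level++) {
            System.out.println("level " + level + ": " + size + " bytes");
            total += size;
            size = nextSliceSize(size, 200);
        }
        System.out.println("capacity after 6 slices: " + total + " bytes");
    }
}
```

The point of the pattern is the one Mike makes above: per-term cost starts near zero, instead of allocating a growable byte[] object per unique term up front.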
[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014724#comment-13014724 ] Michael Busch commented on LUCENE-2573: --- Awesome speedup! Finally all this work shows great results!! What's surprising is that the merge time is lower with DWPT. How can that be, considering we're doing more merges? Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
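The linear-steps idea in the issue description (5 DWPTs flushing at 90%, 95%, 100%, 105%, 110% of the RAM budget) amounts to simple interpolation between the two water marks. A hypothetical helper illustrating the arithmetic - not code from the patch:

```java
public class FlushThresholds {
    // Given low/high water marks (as fractions of the RAM budget) and
    // the number of DWPTs, return per-DWPT flush thresholds linearly
    // spaced from low to high, so DWPTs flush one at a time as total
    // RAM use rises, and all flush by the high water mark.
    static double[] thresholds(double low, double high, int numDwpts) {
        double[] t = new double[numDwpts];
        if (numDwpts == 1) {
            t[0] = low;
            return t;
        }
        double step = (high - low) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            t[i] = low + i * step;
        }
        return t;
    }

    public static void main(String[] args) {
        // The example from the issue: 5 DWPTs between 90% and 110%.
        for (double t : thresholds(0.90, 1.10, 5)) {
            System.out.printf("%.2f%n", t);
        }
    }
}
```

Whether the marks come from explicit totals (120MB/140MB) or from fixed percentages of setRAMBufferSizeMB() only changes how low and high are computed, not this interpolation.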
[jira] [Updated] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges
[ https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1076: --- Attachment: LUCENE-1076.patch Phew, this patch almost fell below the event horizon of my TODO list... I'm attaching a new modernized one. I also mod'd the policy to not select two max-sized merges at once. I think it's ready to commit... Allow MergePolicy to select non-contiguous merges - Key: LUCENE-1076 URL: https://issues.apache.org/jira/browse/LUCENE-1076 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-1076.patch, LUCENE-1076.patch, LUCENE-1076.patch I started work on this but with LUCENE-1044 I won't make much progress on it for a while, so I want to checkpoint my current state/patch. For backwards compatibility we must leave the default MergePolicy as selecting contiguous merges. This is necessary because some applications rely on temporal monotonicity of doc IDs, which means even though merges can re-number documents, the renumbering will always reflect the order in which the documents were added to the index. Still, for those apps that do not rely on this, we should offer a MergePolicy that is free to select the best merges regardless of whether they are contiguous. This requires fixing IndexWriter to accept such a merge, and fixing LogMergePolicy to optionally allow it the freedom to do so. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3006) Javadocs warnings should fail the build
[ https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3006: Attachment: LUCENE-3006-modules-javadoc-warning-cleanup.patch Patch annihilating modules/ javadoc warnings (in analysis/icu/ and benchmark/). Committing shortly. Javadocs warnings should fail the build --- Key: LUCENE-3006 URL: https://issues.apache.org/jira/browse/LUCENE-3006 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.2, 4.0 Reporter: Grant Ingersoll Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, LUCENE-3006-modules-javadoc-warning-cleanup.patch, LUCENE-3006.patch, LUCENE-3006.patch, LUCENE-3006.patch We should fail the build when there are javadocs warnings, as this should not be the Release Manager's job to fix all at once right before the release. See http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3006) Javadocs warnings should fail the build
[ https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014767#comment-13014767 ] Steven Rowe commented on LUCENE-3006: - bq. Patch annihilating modules/ javadoc warnings (in analysis/icu/ and benchmark/). Committed on trunk r1087830. Javadocs warnings should fail the build --- Key: LUCENE-3006 URL: https://issues.apache.org/jira/browse/LUCENE-3006 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.2, 4.0 Reporter: Grant Ingersoll Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, LUCENE-3006-modules-javadoc-warning-cleanup.patch, LUCENE-3006.patch, LUCENE-3006.patch, LUCENE-3006.patch We should fail the build when there are javadocs warnings, as this should not be the Release Manager's job to fix all at once right before the release. See http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #76: POMs out of sync
This build failed because of javadocs warnings under modules/. I committed fixes under LUCENE-3006. I guess the nightly Ant Lucene trunk build doesn't build modules/ javadocs? Steve -Original Message- From: Apache Hudson Server [mailto:hud...@hudson.apache.org] Sent: Friday, April 01, 2011 10:32 AM To: dev@lucene.apache.org Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #76: POMs out of sync Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/76/ No tests ran. Build Log (for compile errors): [...truncated 9507 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-2061. --- Resolution: Fixed Committed: - trunk: r1087722, r1087723, r1087834 - branch_3x: r1087833 Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014773#comment-13014773 ] Steven Rowe edited comment on SOLR-2061 at 4/1/11 6:02 PM: --- Committed: - trunk: r1087722, r1087723, r1087834 - branch_3x: r1087833, r1087834 was (Author: steve_rowe): Committed: - trunk: r1087722, r1087723, r1087834 - branch_3x: r1087833 Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014773#comment-13014773 ] Steven Rowe edited comment on SOLR-2061 at 4/1/11 6:04 PM: --- Committed: - trunk: r1087722, r1087723, r1087834 - branch_3x: r1087833 was (Author: steve_rowe): Committed: - trunk: r1087722, r1087723, r1087834 - branch_3x: r1087833, r1087834 Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6613 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6613/ 1 tests failed. FAILED: org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2746) at java.util.ArrayList.ensureCapacity(ArrayList.java:187) at java.util.ArrayList.add(ArrayList.java:378) at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:60) at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) at org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) at org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:222) at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:188) at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:140) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3195) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2828) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1747) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1742) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1738) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2457) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180) at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2688) Build Log (for compile errors): [...truncated 3159 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6615 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6615/ 1 tests failed. REGRESSION: org.apache.solr.spelling.suggest.SuggesterTest.testBenchmark Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.IdentityHashMap.resize(IdentityHashMap.java:469) at java.util.IdentityHashMap.put(IdentityHashMap.java:445) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:128) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at 
org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) Build Log (for compile errors): [...truncated 8750 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6616 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6616/ 1 tests failed. FAILED: org.apache.solr.spelling.suggest.SuggesterTest.testBenchmark Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.IdentityHashMap.resize(IdentityHashMap.java:469) at java.util.IdentityHashMap.put(IdentityHashMap.java:445) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:128) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at 
org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) Build Log (for compile errors): [...truncated 8744 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2444) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
[ https://issues.apache.org/jira/browse/SOLR-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2444: Summary: Update fl syntax to support: pseudo fields, AS, transformers, and wildcards (was: support wildcards in fl parameter, improve DocTransformer parsing) I just started a new branch and implemented some of the things we have suggested. Check: https://svn.apache.org/repos/asf/lucene/dev/branches/pseudo/ This implements: h3. SQL style AS {code} ?fl=id,field AS display {code} will display 'field' with the name 'display' h3. Pseudo Fields You can define pseudo fields with ?fl.pseudo=key:value Any key that matches something in the fl param gets replaced with value. For example: {code} ?fl=id,price&fl.pseudo=price:real_price_field {code} is the same as {code} ?fl=id,real_price_field AS price {code} h3. Transformer Syntax [name] The previous underscore syntax is replaced with brackets. {code} ?fl=id,[value:10] AS 10 {code} Hopefully this will make it more clear that it is calling a function. Update fl syntax to support: pseudo fields, AS, transformers, and wildcards --- Key: SOLR-2444 URL: https://issues.apache.org/jira/browse/SOLR-2444 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Attachments: SOLR-2444-fl-parsing.patch, SOLR-2444-fl-parsing.patch The ReturnFields parsing needs to be improved. It should also support wildcards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2444) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
[ https://issues.apache.org/jira/browse/SOLR-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014838#comment-13014838 ] Ryan McKinley commented on SOLR-2444: - Just committed the changes -- yonik, i replaced your fancy parsing with something i can understand (StringTokenizer and indexOf) I figure we should agree on a syntax first, and then optimize the fl parsing (out of my league) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards --- Key: SOLR-2444 URL: https://issues.apache.org/jira/browse/SOLR-2444 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Attachments: SOLR-2444-fl-parsing.patch, SOLR-2444-fl-parsing.patch The ReturnFields parsing needs to be improved. It should also support wildcards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
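A minimal version of that StringTokenizer-plus-indexOf approach might look like the sketch below (invented names, not the actual ReturnFields code; it ignores transformers, quoting, pseudo fields, and wildcards):

```java
import java.util.*;

public class FlParseSketch {
    // Parse "id,field AS display" into field -> display-name pairs.
    static Map<String, String> parse(String fl) {
        Map<String, String> out = new LinkedHashMap<>();
        for (StringTokenizer tok = new StringTokenizer(fl, ","); tok.hasMoreTokens(); ) {
            String part = tok.nextToken().trim();
            int as = part.indexOf(" AS ");
            if (as >= 0)
                out.put(part.substring(0, as).trim(), part.substring(as + 4).trim());
            else
                out.put(part, part);  // no alias: display under the field's own name
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("id,real_price_field AS price"));
    }
}
```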
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014841#comment-13014841 ] David Smiley commented on SOLR-2155: To anyone listening: I'll continue to support my latest patch here with any bug fixes or basic things. As of today I'll principally be working directly with Ryan McKinley on his lucene-spatial-playground code-base. He ported my patch to this framework as the predominant means of searching for points (single or multi-value) and I'm going to finish what he started. This new framework is superior to the geospatial mess in Lucene/Solr right now (no offense to any involved). It won't be long before it's ready for broad use as a replacement for anything existing. I look forward to exploring new indexing techniques with this framework, and for it to eventually become part of Lucene/Solr. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. 
The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
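The prefix filter works because geohashes nest: every longer geohash of a point starts with each shorter geohash of the same point, so a grid square's prefix selects exactly the indexed terms inside it. The textbook base-32 encoder below (a sketch, not code from the patch) makes the property easy to check:

```java
public class GeohashSketch {
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Standard geohash: interleave longitude/latitude bisection bits
    // (longitude first) and emit one base-32 character per 5 bits.
    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder sb = new StringBuilder();
        boolean evenBit = true;  // even bit positions split longitude
        int bit = 0, ch = 0;
        while (sb.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { sb.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fine = encode(57.64911, 10.40744, 11);
        String coarse = encode(57.64911, 10.40744, 5);
        // The fine hash always starts with the coarse hash of the same point.
        System.out.println(fine + " startsWith " + coarse + ": " + fine.startsWith(coarse));
    }
}
```

Each extra character narrows the box by a factor of 32, which is where the 4x8 / 8x4 subdivision in the description comes from.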
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6619 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6619/ 1 tests failed. REGRESSION: org.apache.solr.spelling.suggest.SuggesterTest.testBenchmark Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.IdentityHashMap.resize(IdentityHashMap.java:469) at java.util.IdentityHashMap.put(IdentityHashMap.java:445) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:128) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at 
org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132) at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153) at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178) Build Log (for compile errors): [...truncated 8764 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2433) Make FieldProperties bit masks protected
[ https://issues.apache.org/jira/browse/SOLR-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley resolved SOLR-2433. - Resolution: Fixed Fix Version/s: 3.2 Assignee: Ryan McKinley Make FieldProperties bit masks protected Key: SOLR-2433 URL: https://issues.apache.org/jira/browse/SOLR-2433 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Trivial Fix For: 3.2, 4.0 Attachments: SOLR-2433-ProtectedFieldProperties.patch bit mask values are now package protected, so we have to duplicate: {code:java} final static int INDEXED = 0x0001; final static int TOKENIZED = 0x0002; final static int STORED = 0x0004; final static int BINARY = 0x0008; final static int OMIT_NORMS = 0x0010; ... {code} to set these fields explicitly. This is important for complex fields like LatLonType and poly fields in general -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
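For illustration, a custom FieldType combines and tests these masks in the usual bitwise way (a sketch with the constants copied from the issue text, not Solr's FieldProperties class):

```java
public class FieldPropsSketch {
    // Bit masks as listed in the issue description.
    static final int INDEXED    = 0x0001;
    static final int TOKENIZED  = 0x0002;
    static final int STORED     = 0x0004;
    static final int BINARY     = 0x0008;
    static final int OMIT_NORMS = 0x0010;

    public static void main(String[] args) {
        // A complex type like LatLonType might set its sub-fields explicitly:
        int props = INDEXED | STORED | OMIT_NORMS;
        System.out.println("stored:    " + ((props & STORED) != 0));
        System.out.println("tokenized: " + ((props & TOKENIZED) != 0));
    }
}
```

Keeping the constants protected (rather than package-private) is exactly what lets subclasses outside org.apache.solr.schema write this without duplicating the values.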
[jira] [Resolved] (SOLR-332) Visibility of static int fields in FieldProperties should be increased to allow custom FieldTypes to use them
[ https://issues.apache.org/jira/browse/SOLR-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley resolved SOLR-332. Resolution: Fixed made protected in SOLR-2433 Visibility of static int fields in FieldProperties should be increased to allow custom FieldTypes to use them - Key: SOLR-332 URL: https://issues.apache.org/jira/browse/SOLR-332 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Reporter: Jonathan Woods Priority: Minor Constants in org.apache.solr.schema aren't visible to classes outside that package, yet they're useful e.g. for custom FieldTypes. Could their visibility be increased? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014873#comment-13014873 ] Lance Norskog commented on SOLR-2155: - Excellent! Geo is a complex topic, too big for a one-man project. Lance Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... 
and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Solr Wiki] Update of Troubleshooting by YonikSeeley
I'm confused ... this isn't a troubleshooting page, it's a request for help diagnosing an error -- there's no tips/tricks/advice here, just someone getting confused between solr.xml and tomcat context files. shouldn't we just delete this? : The Troubleshooting page has been changed by YonikSeeley. : The comment on this change is: add troubleshooting page. : http://wiki.apache.org/solr/Troubleshooting : : -- : : New page: : : * [[Troubleshooting HTTP Status 404 - missing core name in path]] : -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2335) FunctionQParser can't handle fieldnames containing whitespace
[ https://issues.apache.org/jira/browse/SOLR-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2335: --- Description: FunctionQParser has some simplistic assumptions about what types of field names it will deal with, in particular it can't deal with field names containing whitespace. was: We use an external file field configured as dynamic field. The dynamic field name (and so the name of the provided file) may contain spaces. But currently it is not possible to query for such fields. The following query results in a ParseException: q=_val_:(experience_foo\ bar) org.apache.lucene.queryParser.ParseException: Cannot parse '_val_:(experience_foo\ bar)': Expected ',' at position 15 in 'experience_foo bar' We use following configuration for the externalFileField: types ... fieldType name=experienceRankFile keyField=id defVal=0 stored=false indexed=false class=solr.ExternalFileField valType=float/ /types fields dynamicField name=experience_* type=experienceRankFile / ... /field Summary: FunctionQParser can't handle fieldnames containing whitespace (was: External file field name containing whitespace not supported) Updating summary/description based on root of problem. Description from original bug reporter... {quote} We use an external file field configured as dynamic field. The dynamic field name (and so the name of the provided file) may contain spaces. But currently it is not possible to query for such fields. The following query results in a ParseException: q=_val_:(experience_foo\ bar) org.apache.lucene.queryParser.ParseException: Cannot parse '_val_:(experience_foo\ bar)': Expected ',' at position 15 in 'experience_foo bar' We use following configuration for the externalFileField: types ... fieldType name=experienceRankFile keyField=id defVal=0 stored=false indexed=false class=solr.ExternalFileField valType=float/ /types fields dynamicField name=experience_* type=experienceRankFile / ... 
/field {quote} The original reasons for these assumptions in FunctionQParser are still generally good: it helps keep the syntax and the parsing simpler then they would otherwise need to be. I think an easy improvement we could make is to leave the current parsing logic the way it is, but provide a new FieldValueSourceParaser that expects a single (quoted) string as input, and just returns the FieldValueSource for that field. So these two would be equivilent... {code} {!func}myFieldName {!func}field(myFieldName) {code} ...but it would also be possible to write... {code} {!func}field(1 my wacky Field*Name) {code} FunctionQParser can't handle fieldnames containing whitespace - Key: SOLR-2335 URL: https://issues.apache.org/jira/browse/SOLR-2335 Project: Solr Issue Type: Bug Affects Versions: 1.4.1 Reporter: Miriam Doelle Priority: Minor FunctionQParser has some simplistic assumptions about what types of field names it will deal with, in particular it can't deal with field names containing whitespaces. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
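The quoted-argument handling such a value source parser would need can be sketched in isolation. This is a minimal standalone illustration, not Solr's actual FunctionQParser code; the class QuotedFieldNameParser and its method are invented for the example, which simply extracts a quoted field name (with backslash escapes) from input like field("my wacky Field*Name"):

```java
// Hypothetical sketch: pull a quoted field name out of a function-style
// argument such as field("my wacky Field*Name"). Not Solr's real parser.
public class QuotedFieldNameParser {

    /** Extracts the quoted argument from input of the form field("..."). */
    public static String parseFieldArgument(String input) {
        String prefix = "field(";
        if (!input.startsWith(prefix) || !input.endsWith(")")) {
            throw new IllegalArgumentException("expected field(...): " + input);
        }
        String arg = input.substring(prefix.length(), input.length() - 1).trim();
        if (arg.length() < 2 || arg.charAt(0) != '"' || arg.charAt(arg.length() - 1) != '"') {
            throw new IllegalArgumentException("expected a quoted string: " + arg);
        }
        // Unescape \" and \\ inside the surrounding quotes.
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < arg.length() - 1; i++) {
            char c = arg.charAt(i);
            if (c == '\\' && i + 1 < arg.length() - 1) {
                out.append(arg.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Whitespace and wildcard characters survive because they sit inside quotes.
        System.out.println(parseFieldArgument("field(\"my wacky Field*Name\")"));
    }
}
```

Because the field name is a single quoted token, the surrounding function syntax stays simple while arbitrary characters (spaces, `*`, etc.) remain legal inside the name.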
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6624 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6624/

1 tests failed.

REGRESSION: org.apache.solr.spelling.suggest.SuggesterTest.testBenchmark

Error Message: Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
    at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:128)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)

Build Log (for compile errors): [...truncated 8754 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Solr Wiki] Update of Troubleshooting by YonikSeeley
On Fri, Apr 1, 2011 at 8:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I'm confused ... this isn't a troubleshooting page, it's a request for help diagnosing an error -- there are no tips/tricks/advice here, just someone getting confused between solr.xml and tomcat context files. Shouldn't we just delete this? Heh - I only scanned it quickly enough to realize it shouldn't be a top-level link. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2335) FunctionQParser can't handle fieldnames containing whitespace
[ https://issues.apache.org/jira/browse/SOLR-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014935#comment-13014935 ] Yonik Seeley commented on SOLR-2335: oh, that's clever. I like it! FunctionQParser can't handle fieldnames containing whitespace - Key: SOLR-2335 URL: https://issues.apache.org/jira/browse/SOLR-2335 Project: Solr Issue Type: Bug Affects Versions: 1.4.1 Reporter: Miriam Doelle Priority: Minor Attachments: SOLR-2335.patch FunctionQParser has some simplistic assumptions about what types of field names it will deal with, in particular it can't deal with field names containing whitespaces. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2335) FunctionQParser can't handle fieldnames containing whitespace
[ https://issues.apache.org/jira/browse/SOLR-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014936#comment-13014936 ] Hoss Man commented on SOLR-2335: the other thing this should make possible is sorting on fields that historically haven't been sortable... {code} sort=field("my wacky Field*Name") desc {code} ... the sort parsing code *could* even be optimized to detect when a function sort results in a FieldValueSource and swap it out for a regular sort ... but I'm not sure if there are any gotchas there. FunctionQParser can't handle fieldnames containing whitespace - Key: SOLR-2335 URL: https://issues.apache.org/jira/browse/SOLR-2335 Project: Solr Issue Type: Bug Affects Versions: 1.4.1 Reporter: Miriam Doelle Priority: Minor Attachments: SOLR-2335.patch FunctionQParser has some simplistic assumptions about what types of field names it will deal with, in particular it can't deal with field names containing whitespace. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
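The optimization suggested above — noticing that a function sort is really just a field reference and substituting a plain field sort — could look roughly like the following. All types here (ValueSource, FieldValueSource, SortSpec, SortOptimizer) are simplified stand-ins invented for illustration; Solr/Lucene's real ValueSource and SortField APIs differ:

```java
// Illustrative sketch only: swap a function sort for a regular field sort
// when the function's value source is just a field reference.
// These classes are stand-ins, not the actual Solr/Lucene API.
interface ValueSource {}

class FieldValueSource implements ValueSource {
    final String fieldName;
    FieldValueSource(String fieldName) { this.fieldName = fieldName; }
}

class SortSpec {
    final String description;
    SortSpec(String description) { this.description = description; }
}

public class SortOptimizer {
    /** If the parsed function is just a field reference, sort on the field directly. */
    static SortSpec sortFor(ValueSource vs, boolean descending) {
        String dir = descending ? "desc" : "asc";
        if (vs instanceof FieldValueSource) {
            // A plain field sort avoids evaluating a function per document.
            return new SortSpec("field:" + ((FieldValueSource) vs).fieldName + " " + dir);
        }
        return new SortSpec("function " + dir);
    }

    public static void main(String[] args) {
        System.out.println(sortFor(new FieldValueSource("popularity"), true).description);
    }
}
```

The "gotchas" the comment worries about would live in the instanceof check: a real implementation would also have to confirm the field's type actually supports ordinary sorting before making the swap.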
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6613 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6613/

1 tests failed.

REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message: Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
    at java.lang.StringBuffer.append(StringBuffer.java:337)
    at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
    at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
    at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
    at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)

Build Log (for compile errors): [...truncated 5264 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6625 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6625/

1 tests failed.

FAILED: org.apache.solr.spelling.suggest.SuggesterTest.testBenchmark

Error Message: Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
    at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:128)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:132)
    at org.apache.lucene.util.RamUsageEstimator.size(RamUsageEstimator.java:153)
    at org.apache.lucene.util.RamUsageEstimator.sizeOfArray(RamUsageEstimator.java:178)

Build Log (for compile errors): [...truncated 8761 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org