RE: issues.apache.org compromised: please update your passwords
: I disabled the account by assigning a dummy eMail and gave it a random
: password.
:
: I was not able to unassign the issues, as most issues were Closed,
: where no modifications can be done anymore. Reopening and changing

Uwe: it may be too late (depending on whether you remember the dummy
password) but an alternate course of action would have been to change the
email address to the PMC list (priv...@lucene) which is not publicly
archived.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Changing the subject for a JIRA-issue (Was: [jira] Created: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into f
: Is it possible to change it? If not, what is the policy here? To open a
: new issue and close the old one?
...
: In this case, that would mean either closing this issue and opening a new
: one, or taking the discussion to the mailing list where subject headers
: may be modified as the conversation evolves.

Anyone who can edit an issue (ie: all the committers, and anyone in the
developer group) can change the summary (which changes the email subjects).

It's not clear to me what the summary of LUCENE-2335 should be, but
McCandless opened the issue, he can certainly fix the summary as the issue
evolves.

-Hoss
Re: Build failed in Hudson: Lucene-trunk #1144
: No, no, no, Lucene still has no need for maven or ivy for dependency
: management. We can just hack around all issues with ant scripts.

it doesn't really matter if it's ant scripts, or ivy declarations, or
maven pom entries -- the point is the same. We can't distribute the jars,
but we can distribute programmatic means for users to fetch the jars
themselves.

(even if we magically switched to ivy or maven for dependency management,
problems like this build failure would still exist if a/the dependency
repo was down at build time, and we'd still likely distribute fat binary
tarballs containing all the dependency jars whose licenses are compatible
with the ASL.)

(users who download the binary artifacts shouldn't *have* to know
ivy/maven to use Lucene, any more than they have to know ant right now)

-Hoss
Re: Build failed in Hudson: Lucene-trunk #1144
: I was wondering yesterday why aren't the required libs checked in to SVN? We

Licensing issues. we can't redistribute them (but we can provide the
build.xml code to fetch them)

-Hoss
Re: Contrib tests fail if core jar is not up to date
: In addition to what Shai mentioned, I wanted to say that there are
: other oddities about how the contrib tests run in ant. For example,
: I'm not sure why we create the junitfailed.flag files (I think it has
: something to do with detecting top-level that a single contrib
: failed).

Correct ... even if one contrib fails, test-contrib attempts to run the
tests for all the other contribs, and then fails if any junitfailed.flag
files are found in any contribs.

The assumption was if you were specifically testing a single contrib you'd
be using the contrib specific build from its own directory, and it would
still fail fast -- it's only if you run test-contrib from the top level
that it ignores when "ant test" fails for individual contribs, and then
reports the failure at the end.

It's a hack, but it's a useful hack for getting nightly builds that can
report on the tests for all contribs, even if the first one fails (it's
less useful when one contrib depends on another, but that's a more complex
issue)

-Hoss
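To make the mechanism concrete, here is a small illustrative Java sketch of the top-level scan Hoss describes (this is not the actual ant code; `ContribFlagScan` and the contrib names are hypothetical): each contrib's test run drops a `junitfailed.flag` marker on failure, and only at the end does the top level collect the markers and fail the build, so every contrib still gets tested.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the test-contrib flag-file logic: run everything,
// then scan each contrib directory for a junitfailed.flag marker and report
// all failures at the end instead of aborting on the first one.
public class ContribFlagScan {

    public static List<String> findFailedContribs(File contribRoot) {
        List<String> failed = new ArrayList<>();
        File[] contribs = contribRoot.listFiles(File::isDirectory);
        if (contribs == null) return failed;
        for (File contrib : contribs) {
            // the marker a failing contrib test run leaves behind
            if (new File(contrib, "junitfailed.flag").exists()) {
                failed.add(contrib.getName());
            }
        }
        return failed;
    }

    public static void main(String[] args) throws Exception {
        // fake contrib layout: "analyzers" passes, "highlighter" fails
        File root = Files.createTempDirectory("contrib-demo").toFile();
        new File(root, "analyzers").mkdirs();
        File bad = new File(root, "highlighter");
        bad.mkdirs();
        new File(bad, "junitfailed.flag").createNewFile();

        System.out.println("failed contribs: " + findFailedContribs(root));
    }
}
```

The design choice is deliberate: deferring the failure to the scan step is what lets a single nightly run report on every contrib.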
Re: lucene and solr trunk
: build and nicely gets all dependencies to Lucene and Tika whenever I build
: or release, no problem there and certainly no need to have it merged into
: Lucene's svn!

The key distinction is that Solr is already in Lucene's svn -- The
question is how to reorg things in a way that makes it easier to build
Solr and Lucene-Java all at once, while still making it easy to build just
Lucene-Java.

: Professionally i work on a (world-class) geocoder that also nicely depends
: on Lucene by using maven, no problems there at all and no need to merge
: that code in Lucene's svn!

Unless maven has some features i'm not aware of, your "nicely depends"
works by pulling Lucene jars from a repository -- changing Solr to do that
(instead of having committed jars) would be fairly simple (with or w/o
maven), but that's not the goal. The goal is to make it easy to build both
at once, have patches that update both, and (make it easy to) have atomic
svn commits that touch both.

-Hoss
Re: #lucene IRC log [was: RE: lucene and solr trunk]
: with, if it didn't happen on the lists, it didn't happen. Its the same as +1

But as the IRC channel gets used more and more, it would *also* be nice if
there was an archive of the IRC channel so that there is a place to go
look to understand the back story behind an idea once it's synthesized and
posted to the lists/jira.

That's the huge advantage IRC has over informal conversations at
hackathons, apachecon, and meetups -- there can in fact be easily
archivable/parsable/searchable records of the communication.

-Hoss
Re: lucene and solr trunk
: prime-time as the new solr trunk! Lucene and Solr need to move to a
: common trunk for a host of reasons, including single patches that can
: cover both, shared tags and branches, and shared test code w/o a test
: jar.

Without a clearer picture of how people envision development overhead
working as we move forward, it's really hard to understand how any of
these ideas make sense...

1) how should the automated build process(es) work?
2) how are we going to do branching/tagging for releases? particularly in
   situations where one product is ready for a release and the other isn't?
3) how are we going to deal with minor bug fix release tagging?
4) should it be possible for people to check out Lucene-Java w/o checking
   out Solr? (i suspect a whole lot of people who only care about the core
   library are going to really adamantly not want to have to check out all
   of Solr just to work on the core)

: Both projects move to a new trunk:
: /something/trunk/java, /something/trunk/solr

my gut says something like this will make the most sense, assuming
/something/trunk == /java/trunk and "java" actually means "core" ... ie:
this discussion should really be part and parcel with how contribs should
be reorged.

-Hoss
Re: [DISCUSS] Do away with Contrib Committers and make core committers
: Subject: [DISCUSS] Do away with Contrib Committers and make core committers

+1

-Hoss
Re: [Lucene-java Wiki] Update of ReleaseTodo by RobertMuir
: nice of the wiki software to change every single line!

this type of thing seems to happen anytime you edit in GUI mode for the
first time since the MoinMoin upgrade a few months back -- it's
normalizing all the whitespace.

-Hoss
RE: back_compat folders in tags when I SVN update
: I prefer to see tags used for what it is, a place to park an actual
: release; it shouldn't be used for testing or its content changed
: dynamically.

I have no opinion about the rest of this thread (changing the back compat
testing to use a specific revision on the previous release branch) but as
for this specific comment: it's really a mistake to think of tags as only
being for releases.

the TTB convention in svn (trunk, tags, branches) stems from what's
considered a best practice when migrating from CVS: trunk corresponds to
MAIN in cvs, the branches directory corresponds to the list of branching
tags in CVS, and the tags directory corresponds to the list of tags in
CVS.

there is nothing special about the concept of a CVS/SVN tag that should
make it synonymous in people's minds with a release ... yes we tag every
release, but there are lots of other reasons to tag things in both CVS and
SVN: release candidates are frequently tagged, many other projects tag
stable builds from their continuous integration system ... a developer
could create an arbitrary checkpoint tag to denote when there was a
dramatic shift in development in a project in case people wanted to easily
find when that shift happened so they could go back and fork a branch at
that point if that approach was deemed unsuccessful.

bottom line: not a good idea to assume all tags are releases.

(that said: the TTB convention is nothing more than a convention ...
there's nothing to stop us from using a more verbose directory hierarchy
to isolate release tags in a single place...

  ./trunk
  ./branches/branch_a
  ./...
  ./tags
  ./tags/releases
  ./tags/releases/2_9_0
  ./...
  ./tags/some_misc_tag
)

-Hoss
Re: back_compat folders in tags when I SVN update
: Why do I see \java\tags\lucene_*_back_compat_tests_2009*\ directories (well
: over 100 so far) when I SVN update?

Are you saying you have "http://svn.apache.org/repos/asf/lucene/java/"
checked out in its entirety? That seems ... problematic.

New tags/branches could be created at any time -- it's even possible to
have Hudson autotag every build if we wanted. Server side these tags are
essentially free but if you checkout at the top level you pay the price of
local storage on update.

I would rethink your checkout strategy.

-Hoss
Re: Lucene Java 2.9.2
: https://issues.apache.org/jira/browse/LUCENENET-331). This begs the
: question, if Lucene.Net takes just this one patch, then Lucene.Net 2.9.1 is
: now 2.9.1.1 (which I personally don't like to see happening as I prefer to
: see a 1-to-1 release match).

As a general comment on this topic: I would suggest that if the goal of
Lucene.Net is to be a 1-to-1 port (which seems like a good goal, but is
certainly not mandatory if the Lucene.Net community has other ambitions)
then the cleanest thing for users would be to keep the version numbers in
sync 1-to-1.

it raises some questions about what to do if a bug is discovered in the
*porting*. ie: if after Lucene.Net 2.9.2 is released, it's discovered that
there was a glitch, and it doesn't actually match the behavior of
Lucene-Java 2.9.2 what should be done? ... Lucene.Net 2.9.3 and Lucene.Net
2.9.2.1 could all conceivably conflict with version numbers Lucene-Java
*might* someday release. Having an annotation strategy that doesn't extend
the dot notation used by Lucene-Java might make sense (ie: Lucene.Net
2.9.2-a)

-Hoss
Re: nightly.sh
: I configured hudson to simply run the hudson.sh from the nightly checkout.

+1

-Hoss
Re: (NAG) Push fast-vector-highlighter mvn artifacts for 3.0 and 2.9
: What to do now, any votes on adding the missing maven artifacts for
: fast-vector-highlighter to 2.9.1 and 3.0.0 on the apache maven repository?

It's not even clear to me that anything special needs to be done before
publishing those jars to maven. 2.9.1 and 3.0.0 were already voted on and
released -- including all of the source code in them.

The safest bet least likely to anger the process gods is just to call a
vote (new thread with VOTE in the subject) and cast a vote ... considering
the sources have already been reviewed it should go pretty quick.

: I rebuilt the maven-dir for 2.9.1 and 3.0.0, merged them (3.0.0 is
: top-level version) and extracted only fast-vector-highlighter:
:
: http://people.apache.org/~uschindler/staging-area/
:
: I will copy this dir to the maven folder on people.a.o, when I got votes
: (how many)? At least someone should check the signatures.
:
: By the way, we have a small error in our ant build.xml that inserts
: svnversion into the manifest file. This version is not the version of the
: last changed item (would be "svnversion -c") but the current svn version,
: even that I checked out the corresponding tags. It's no problem at all,
: but not very nice.
:
: Maybe we should change build.xml to call "svnversion -c" in future, to
: get the real number.
:
: Uwe
:
: -
: Uwe Schindler
: H.-H.-Meier-Allee 63, D-28213 Bremen
: http://www.thetaphi.de
: eMail: u...@thetaphi.de
:
: -----Original Message-----
: From: Grant Ingersoll [mailto:gsing...@apache.org]
: Sent: Saturday, December 05, 2009 10:26 PM
: To: java-dev@lucene.apache.org
: Subject: Re: Push fast-vector-highlighter mvn artifacts for 3.0 and 2.9
:
: I suppose we could put up the artifacts on a dev site and then we could
: vote to release both of them pretty quickly. I think that should be easy
: to do, since it pretty much only involves verifying the jar and the
: signatures.
:
: On Dec 5, 2009, at 1:03 PM, Simon Willnauer wrote:
:
: hi folks,
: The maven artifacts for fast-vector-highlighter have never been pushed
: since it was released because there were no pom.xml.template inside
: the module. I added a pom file a day ago in the context of
: LUCENE-2107. I already talked to uwe and grant how to deal with this
: issues and if we should push the artifact for Lucene 2.9 / 3.0. Since
: this is only a metadata file we could consider rebuilding the
: artefacts and publish them for those releases. I can not remember that
: anything like that happened before, so we should discuss how to deal
: with this situation and if we should wait until 3.1.
:
: simon

-Hoss
Re: Jira emails via Gmail
: I signed up for a login, and voted for this issue. If others did the same,
: that might help.

if you read the comments in the issue, there's really nothing that can be
fixed in Jira to make this work better -- jira already puts an In-Reply-To
header on all of the messages so that mail clients who do threading
correctly can use them -- the problem is that Gmail isn't looking at those
headers, and is instead focusing on subject.

voting for JRA-12640 probably won't accomplish much, since the bug is in
GMail, not Jira -- but voting for JRA-3609 might help since then Jira
could be customized to keep the subject consistent for all types of
messages related to a single issue...

http://jira.atlassian.com/browse/JRA-3609

...or you could file a bug with GMail asking them to implement the de
facto standard algorithm for email message threading, using all of the
various headers that exist for this purpose...

http://www.jwz.org/doc/threading.html

-Hoss
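To make the contrast concrete, here is a minimal Java sketch in the spirit of the header-based threading Hoss links to (the `Message` type, the IDs, and `threadRoots` are hypothetical illustrations, not Jira or GMail internals): a reply is threaded by walking In-Reply-To back to the root message, so a changed subject line is irrelevant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of threading by mail headers rather than by subject:
// group messages by following In-Reply-To back to the thread root.
public class HeaderThreading {

    public static class Message {
        public final String id;        // Message-ID header
        public final String inReplyTo; // In-Reply-To header, or null for a new thread
        public Message(String id, String inReplyTo) {
            this.id = id;
            this.inReplyTo = inReplyTo;
        }
    }

    // Map each Message-ID to the Message-ID of its thread root.
    public static Map<String, String> threadRoots(List<Message> msgs) {
        Map<String, Message> byId = new HashMap<>();
        for (Message m : msgs) byId.put(m.id, m);
        Map<String, String> roots = new HashMap<>();
        for (Message m : msgs) {
            Message cur = m;
            while (cur.inReplyTo != null && byId.containsKey(cur.inReplyTo)) {
                cur = byId.get(cur.inReplyTo); // walk toward the root
            }
            roots.put(m.id, cur.id);
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Message> msgs = Arrays.asList(
            new Message("<a1>", null),   // "[jira] Created: (LUCENE-...) ..."
            new Message("<a2>", "<a1>"), // "[jira] Commented: ..." -- new subject, same thread
            new Message("<b1>", null));  // unrelated issue
        // <a2> threads with <a1> despite the changed subject line
        System.out.println(threadRoots(msgs).get("<a2>"));
    }
}
```

A subject-based grouper would put the "Created" and "Commented" mails in different conversations; the header walk keeps them together, which is exactly why the fix belongs in GMail rather than Jira.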
Re: lucene.zones.apache.org dead?
: Hudson says, the lucene node is dead, so builds are stuck since 2 days.
: Does anybody knows more?

Uwe: i didn't find any evidence that you opened an infra bug about this,
so i went ahead and created one...

https://issues.apache.org/jira/browse/INFRA-2351

-Hoss
Re: Junit4
: putting too many irons in the fire, especially non-critical ones. I don't
: see a way to assign it to myself, either I'm missing something or I'm just
: underprivileged <g>, so if someone would go ahead and assign it to me I'll
: work on it post 3.0.

Jira's ACLs prevent issues from being assigned to people who aren't listed
in the Contributors group. The policy has been to add people to that list
(for issue assignment) on request, so i hooked you up.

(NOTE: if anyone else has issues they're actively working on and would
like to be flagged as a Contributor in Jira so that the issues can be
assigned directly to you for tracking purpose, please speak up)

-Hoss
Re: [jira] Commented: (LUCENE-1974) BooleanQuery can not find all matches in special condition
: I think the other tests do not catch it because the error only happens
: if the docID is over 8192 (the chunk size that BooleanScorer uses).
: Most of our tests work on smaller sets of docs.

I don't have time to try this out right now, but i wonder if just
modifying the QueryUtils wrap* functions to create bigger empty indexes
(with thousands of deleted docs instead of just a handful) would have
triggered this bug ... might be worth testing against 2.9.0 to make sure
there aren't any other weird edge cases before cutting 2.9.1.

-Hoss
Re: svn commit: r820115 - /lucene/java/trunk/common-build.xml
: -    <property name="javac.source" value="1.4"/>
: -    <property name="javac.target" value="1.4"/>
: +    <property name="javac.source" value="1.5"/>
: +    <property name="javac.target" value="1.5"/>

Isn't that one of the signs of the apocalypse?

-Hoss
Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: However, there may be something with the fact that Lucene's Analyzers
: automatically close the reader when its done analyzing. I think this
: encourages people not to explicitly close them, and creates the potential
: of having open fd's if an exception is thrown in the middle of the
: analysis or before addDocument/updateDocument is called.

It's always been the case that users should close their own Readers --
lucene's docs have never indicated that they will close the reader for
you, it's just a helpful side effect that once IndexWriter has consumed
all the chars from a Reader it calls close() -- the caller should still
close() explicitly for precisely the reasons you listed, but there's
really no downside to multiple close calls.

even if we weren't worried about breaking existing client code (where
people never call close themselves) it would still be a good idea to leave
the close() calls in because the sooner the Readers are closed the sooner
the descriptor can be released -- no reason to wait (ie: during a
serialized merge for example) until addDocument is done if the Reader has
been completely exhausted.

-Hoss
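That contract can be sketched without any Lucene dependency; in this illustrative example `consume` is a made-up stand-in for IndexWriter.addDocument, not the real API. The consumer closes the Reader once it has drained it, the caller *also* closes in a finally block (which covers the exception path the consumer never reaches), and since Reader.close() must tolerate repeated calls the double close is harmless.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Sketch of the "caller still closes" pattern: the consumer closes the
// drained Reader as a convenience, the caller closes again in finally,
// and the second close is a harmless no-op.
public class CallerClosesReader {

    // Stand-in for IndexWriter.addDocument(): drains the Reader, then
    // closes it as a convenience. Returns the number of chars read.
    public static int consume(Reader r) throws IOException {
        int chars = 0;
        while (r.read() != -1) chars++;
        r.close(); // the consumer's convenience close
        return chars;
    }

    public static void main(String[] args) throws IOException {
        final int[] closeCount = {0};
        Reader r = new StringReader("some field contents") {
            @Override public void close() {
                closeCount[0]++; // count closes to show the double close
                super.close();
            }
        };
        try {
            System.out.println("chars read: " + consume(r));
        } finally {
            r.close(); // the caller's close -- also covers the exception path
        }
        System.out.println("close calls: " + closeCount[0]); // 2, and that's fine
    }
}
```

If `consume` threw midway, the finally block would still release the descriptor, which is exactly the open-fd leak the quoted mail worries about.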
RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: So in 2.9, the Reader is correctly closed, if the TokenStream chain is
: correctly set up, passing all close() calls to the delegate.

Thanks for digging into that Uwe.

So Daniel: The ball is in your court here: what analyzer /
tokenizer+tokenfilters is your app using in the cases where you see
Readers not getting closed by Lucene -- if they involve your own custom
Tokenizers then that may be where the problem is, but if all the Analysis
pieces you are using come out of the box with Lucene please let us know so
we can check them.

-Hoss
RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: That is my opinion, too. Closing the readers should be done by the caller in

I don't disagree with either of you, but...

: a finally block and not automatically by the IW. I only wanted to confirm,
: that the behaviour of 2.9 did not change. Closing readers two times is not a

...i wanted to try and confirm that as well. if we consciously decide that
IndexWriter is going to *stop* closing all Readers that's fine with me,
but in the absence of a specific statement like that in the release notes
we should strive for no surprises.

(that doesn't have to come in the form of code changes, it can simply be
an announcement on java-user and a documented caveat in the applicable
code ... but as yet we don't have confirmation that any behavior change
exists.)

-Hoss
Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: Thanks Mark for the pointer, I thought somehow that lucene closed them as
: a convenience, I don't know if it did that in previous releases (aka
: 2.4.1) but I'll close them myself from now on.

FWIW: As far as i know, Lucene has always closed the Reader for you when
calling addDocument/updateDocument -- BUT -- the docs never promised that
Lucene would close any Readers used in Fields. In fact the Field
constructor docs say "you may not close the Reader until addDocument has
been called" suggesting that you should close it yourself.
(Reader.close() is very clear that there should be no effect on closing a
Reader multiple times, so this is safe no matter what Lucene does)

That said: If the behavior has changed in 2.9, this could easily bite lots
of people in the ass if they haven't been closing their readers and now
they run out of file handles.

I wrote a quick test to try and reproduce the problem you're describing,
but as far as i can tell 2.9.0 (final) still seems to close the Reader for
you. Can anyone else reproduce this problem of Readers in Fields not
getting closed? (my test is below)

--BEGIN--
package org.apache.lucene;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.store.RAMDirectory;
import java.io.*;

public class TestFieldWithReaderClosing extends LuceneTestCase {
  IndexWriter writer = null;
  Document d = null;
  CloseStateReader reader;

  public void setUp() throws Exception {
    writer = new IndexWriter(new RAMDirectory(), new KeywordAnalyzer(),
                             true, IndexWriter.MaxFieldLength.LIMITED);
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
  }

  public void tearDown() throws Exception {
    writer.close();
    writer = null;
    reader.close();
    reader = null;
  }

  public void testAdd() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  public void testEmptyUpdate() throws Exception {
    writer.updateDocument(new Term("id", "x"), d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  public void testAddAndUpdate() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
    writer.updateDocument(new Term("id", "x"), d);
    assertEquals("new close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("new close count should still be 1", 1, reader.getCloseCount());
  }

  static class CloseStateReader extends StringReader {
    private int closeCount = 0;
    public CloseStateReader(String s) { super(s); }
    public synchronized void close() {
      closeCount++;
      super.close();
    }
    public int getCloseCount() { return closeCount; }
  }
}
RE: [VOTE] Release Lucene 2.9.0
: - db/bdb fails to compile with 1.4 because of a ClassFormatError in one of
: the bundled libs, so this contrib is in reality 1.5 only.

there's not much we can do about that, no one can blame us if the
dependency requires 1.5

: - Tests of contrib/misc use String.contains(), which is 1.5 only. As it
: just searches for a whitespace, it can be replaced by indexOf(' ') >= 0
: - contrib/regex fails to build, because the JavaRegExpCapability defines
: an (unused) constant based on the value in Pattern.LITERAL, which does
: not exist in 1.4. Removing this constant fixes the problem.

I'm willing to publicly say "oh well" on these changes. we've always said
that contribs don't make the same back compat commitments as core...

- contrib/misc still works until 1.4, it's only the test that doesn't work
  so "oh well" it's not worth cutting a new release (if someone is using
  contrib/misc w/1.4 and wants to run the tests, i don't think it's an
  undue burden to suggest that they can change that one line and get 1.4
  compat)

- as for contrib/regex -- this change was made to add functionality, if at
  the time of the change people had said "this means making contrib/regex
  require 1.5" i don't think anyone would have objected.

-Hoss
Re: [VOTE] Release Lucene 2.9.0
: http://people.apache.org/~markrmiller/staging-area/lucene2.9/

+1

-Hoss
Re: Conflict discovered in 'whoweare.html'
: And I done it. Then I noticed this:
:
: http://wiki.apache.org/lucene-java/TopLevelProject

That's about the TLP site (http://lucene.apache.org/) anything in a
subdirectory is handled by the individual project site directories.
according to HowToUpdateTheWebsite, both the versioned and unversioned
portions of the site are handled by grant's crontab using "svn export"

: How can I solve the conflict?

I don't think you need to worry about it ... once upon a time, the site
was updated by people using "svn co" anytime there was a change, so
there's still svn metadata there, but since it's updated via "svn export"
now, that metadata is irrelevant.

...that's my hunch anyway, it's assuming everything on
HowToUpdateTheWebsite is correct.

-Hoss
Re: ReleaseTodo steps
: They are there just not replicated or shown in mirrors?
: http://www.apache.org/dist/lucene/java/
:
: Its pretty odd they don't go out to the mirrors - I mean, whats the
: point? Users can't use them to verify anything anyway if they don't have
: them. Anyone know anything about this?

It's intentional: you always want to get the hash from the authoritative
source (and not a mirror) so you can actually verify the checksum.
(particularly if you don't have gpg to check the signature files).

http://tomcat.apache.org/download-connectors.cgi says...

"Alternatively, you can verify the MD5 signature (hash value) on the
files. Make sure you get these files from the main site, rather than from
a mirror. The above [MD5] links automatically retrieve the signature files
from the main site."

-Hoss
Re: Build failed in Hudson: Lucene-trunk #955
: but it says the tests only ran for 12 minutes, so it took a day to compile?

The JUnit report on total testing time is just the sum of the timing
reported for each test, and as the testIndexWriter report notes...

: <duration>0.0030</duration>
...
: <errorDetails>Forked Java VM exited abnormally. Please note
: the time in the report does not reflect the time until the VM
: exit.</errorDetails>

-Hoss
Re: ReleaseTodo steps
: md5sum generates a hash line like this:
:
: a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz
:
: Then when you do a check, it knows what file to check against.
:
: The Maven artifacts just list the hash though. So it seems proper to
: remove the second part and just put the hash?

Some background on the macro...

https://issues.apache.org/jira/browse/LUCENE-904

And some info about what maven creates/expects in the MD5 files (i only
skimmed this)...

http://www.nabble.com/Checksum-Format-for-.md5-and-.sha1-Files-td21249817.html

-Hoss
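For illustration, here is a hedged Java sketch of the verification step itself (the class, the file name, and the byte payload are made up for the example; the digest is the well-known MD5 test vector for "The quick brown fox jumps over the lazy dog", not any Lucene artifact's hash): hash the download locally, take the first whitespace-separated token of the published .md5 contents, and compare.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: verify a (possibly mirrored) download against the
// digest published on the authoritative site.
public class Md5Check {

    // Hex MD5 of everything readable from 'in'.
    public static String md5Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // An md5sum-style line is "<digest> *<filename>"; the Maven .md5 files
    // carry just the bare digest. Taking the first whitespace-separated
    // token handles both forms.
    public static String digestFromLine(String line) {
        return line.trim().split("\\s+")[0];
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the real tarball bytes and the real .md5 contents.
        byte[] download =
            "The quick brown fox jumps over the lazy dog".getBytes("US-ASCII");
        String md5File = "9e107d9d372bb6826bd81d3542a419d6 *example.tar.gz";

        String trusted = digestFromLine(md5File);
        String actual = md5Hex(new ByteArrayInputStream(download));
        System.out.println(actual.equals(trusted) ? "checksum OK"
                                                  : "checksum MISMATCH");
    }
}
```

Note this only checks integrity against whatever digest you fetched; the point of Hoss's reply is that `trusted` must come from www.apache.org/dist/ directly, since a compromised mirror could serve a matching hash alongside a tampered file.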
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
: Could a git branch make things easier for mega-features like this? why not just start a subversion branch? : : Further steps towards flexible indexing : --- : : Key: LUCENE-1458 : URL: https://issues.apache.org/jira/browse/LUCENE-1458 : Project: Lucene - Java : Issue Type: New Feature : Components: Index : Affects Versions: 2.9 : Reporter: Michael McCandless : Assignee: Michael McCandless : Priority: Minor : Attachments: LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2 : : : I attached a very rough checkpoint of my current patch, to get early : feedback. All tests pass, though back compat tests don't pass due to : changes to package-private APIs plus certain bugs in tests that : happened to work (eg call TermPostions.nextPosition() too many times, : which the new API asserts against). : [Aside: I think, when we commit changes to package-private APIs such : that back-compat tests don't pass, we could go back, make a branch on : the back-compat tag, commit changes to the tests to use the new : package private APIs on that branch, then fix nightly build to use the : tip of that branch?o] : There's still plenty to do before this is committable! This is a : rather large change: :* Switches to a new more efficient terms dict format. This still : uses tii/tis files, but the tii only stores term long offset : (not a TermInfo). At seek points, tis encodes term freq/prox : offsets absolutely instead of with deltas delta. Also, tis/tii : are structured by field, so we don't have to record field number : in every term. : . : On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB : - 0.64 MB) and tis file is 9% smaller (75.5 MB - 68.5 MB). : . : RAM usage when loading terms dict index is significantly less : since we only load an array of offsets and an array of String (no : more TermInfo array). It should be faster to init too. 
:
:     This part is basically done.
:
:   * Introduces modular reader codec that strongly decouples terms dict
:     from docs/positions readers. EG there is no more TermInfo used
:     when reading the new format.
:
:     There's nice symmetry now between reading and writing in the codec
:     chain -- the current docs/prox format is captured in:
: {code}
: FormatPostingsTermsDictWriter/Reader
: FormatPostingsDocsWriter/Reader (.frq file) and
: FormatPostingsPositionsWriter/Reader (.prx file).
: {code}
:     This part is basically done.
:
:   * Introduces a new flex API for iterating through the fields,
:     terms, docs and positions:
: {code}
: FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
: {code}
:     This replaces TermEnum/Docs/Positions. SegmentReader emulates the
:     old API on top of the new API to keep back-compat.
:
: Next steps:
:   * Plug in new codecs (pulsing, pfor) to exercise the modularity /
:     fix any hidden assumptions.
:   * Expose new API out of IndexReader, deprecate old API but emulate
:     old API on top of new one, switch all core/contrib users to the
:     new API.
:   * Maybe switch to AttributeSources as the base class for TermsEnum,
:     DocsEnum, PostingsEnum -- this would give readers API flexibility
:     (not just index-file-format flexibility). EG if someone wanted
:     to store payload at the term-doc level instead of
:     term-doc-position level, you could just add a new attribute.
:   * Test performance and iterate.
:
: --
: This message is automatically generated by JIRA.
: -
: You can reply to this email to add a comment to the issue online.
:
: -
: To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: java-dev-h...@lucene.apache.org

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: NumericRange Field and LuceneUtils?
: Subject: NumericRange Field and LuceneUtils? : References: 9ac0c6aa090932s69804fa5vbf5590ea6181e...@mail.gmail.com : In-Reply-To: 9ac0c6aa090932s69804fa5vbf5590ea6181e...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Efficiently running a single test class' tests?
: which I assume is in seconds. So the great bulk of the ant test
: seems to be spent in various ant housecleaning tasks, trying to verify
: that everything is indeed built, and/or looking for test classes that
: might match the name ShingleFilterTest.

Bear in mind, each contrib is built/tested separately, so it's not just looking for every test that might match the pattern, it's iterating over each contrib and checking them all for a test that matches.

: I tried running
:
: ant test-contrib -Dtestcase=ShingleFilterTest
:
: to see if limiting to contrib would be any faster. That came back in 5
: minutes, 27 seconds. Which is better, but still in the same ballpark.

what kind of machine are you using? ... because on my box that only takes about 40 seconds.

if you are working on a contrib, and want to just run tests in that contrib, switching to that working directory and running the targets there is always going to be faster...

hoss...@brunner:~/lucene/java$ time ant test-contrib -Dtestcase=ShingleFilterTest > tmp.out
real    0m32.142s
user    0m17.744s
sys     0m8.074s
hoss...@brunner:~/lucene/java$ cd contrib/analyzers/
hoss...@brunner:~/lucene/java/contrib/analyzers$ time ant test -Dtestcase=ShingleFilterTest > ../../tmp-contrib.out
real    0m2.450s
user    0m1.644s
sys     0m0.664s

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Updating Lucene index from two different threads in a web application
http://people.apache.org/~hossman/#java-dev Please Use java-u...@lucene Not java-...@lucene Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. : Date: Mon, 31 Aug 2009 15:15:06 -0700 (PDT) : From: mitu2009 musicfrea...@gmail.com : Reply-To: java-dev@lucene.apache.org : To: java-dev@lucene.apache.org : Subject: Updating Lucene index from two different threads in a web application : : : Hi, : : I've a web application which uses Lucene for company search functionality. : When registered users add a new company,it is saved to database and also : gets indexed in Lucene based company search index in real time. : : When adding company in Lucene index, how do I handle use case of two or more : logged-in users posting a new company at the same time?Also, will both these : companies get indexed without any file lock, lock time out, etc. related : issues? : : Would appreciate if i could help with code as well. : : Thanks. : -- : View this message in context: http://www.nabble.com/Updating-Lucene-index-from-two-different-threads-in-a-web-application-tp25231264p25231264.html : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : - : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: java-dev-h...@lucene.apache.org : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Porting Java Lucene 2.9 to Lucene.Net (was: RE: Lucene 2.9 RC2 now available for testing)
: My question is, I would prefer to track SVN commits to keep track of : changes, vs. what I'm doing now. This will allow us to stay weeks : behind a Java release vs. months or years as it is now. However, while : I'm subscribed to SVN's commits mailing list, I'm not getting all those : commits! For example, a commit made this past Friday, I never got an : email for, while other commits I do. Any idea what maybe going on? i suggest you track things based on a combination of svn base url (ie: trunk vs a branch) and the specific svn revision number at the moment of your latest checkout -- that way you don't even need to subscribe to the commit list, just do an svn diff -r whenever you have some time to work on it and see what's been committed since the last time you worked on it. Hell: you could probably script all of this and have hudson do it for you. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Back-Compat on Contribs
: releases 2.9. Robert raised a question if we should mark smartcn as
: experimental so that we can change interfaces and public methods etc.
: during the refactoring. Would that make sense for 2.9 or is there no
: such thing as a back compat policy for modules like that.

http://wiki.apache.org/lucene-java/BackwardsCompatibility
...
Contrib Packages

All contribs are not created equal. The compatibility commitments of a contrib package can vary based on its maturity and intended usage. The README.txt file for each contrib should identify its approach to compatibility. If the README.txt file for a contrib package does not address its backwards compatibility commitments, users should assume it does not make any compatibility commitments.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: Thanks for the help finishing up the javadoc cleanup Hoss - we almost : have a clean javadoc run - which is fantastic, because I didn't think it : was going to be possible. I think its just this and 1863 and the run is : clean. you obviously haven't tried ant javadocs -Djavadoc.access=private lately ... i'm working on cleaning that up at the moment. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: you obviously haven't tried ant javadocs -Djavadoc.access=private lately
: ... i'm working on cleaning that up at the moment.
: tried it? I'm not even aware of it. Not mentioned in the release todo.

yeah ... it's admittedly esoteric, but it helps surface bugs in docs on private level methods (which are useful for long term maintenance)

i'm thinking we should change the nightly build to set -Djavadoc.access=private so we at least expose more errors earlier. (assuming we also set up hudson to report stats on javadoc warnings ... i've seen it in other instances but don't know if it requires a special plugin)

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: i'm thinking we should change the nightly build to set
: -Djavadoc.access=private so we at least expose more errors earlier.
: (assuming we also setup the hudson to report stats on javadoc
: warnings ... i've seen it in other instances but don't know if it requires
: a special plugin)
: If it gives more errors, shouldn't it be set always and everywhere? Why
: not ...

it doesn't just change the level of error checking -- it changes which methods get generated docs

access refers to the java access level (public, protected, package, private) that should be exposed ... for releases we only want protected (the default in our build file) so we only advertise classes/methods/fields we expect consumers to use/override -- but as a side effect the javadoc tool never checks the docs on package/private members for correctness.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene website - benchmarks page
pulling a crap doc from the release seems sound to me.

alternately: couldn't we just replace it with the output from the contrib/benchmarker on some of the bigger tests (the full wikipedia ones) comparing 2.4 with 2.9? then just make it a pre-release TODO item for the future: update that page to reflect the benchmarks of the current release.

: I would suggest we move it to the wiki (I think we can simply remove
: the 1.2 and 1.3 benchmarks) and try to get a more recent benchmark
: soon. In other words a benchmark page on the wiki could be maintained
: by all users and committers and would encourage people to publish their
: results as the hurdle is not as high as it is if you wanna get
: something on the official website.
: I'm happy to add the page and encourage people on the user list to add
: their benchmarks and performance experiences.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene website - benchmarks page
: Prob want to run it on decent hardware as well (eg maybe I shouldn't do
: it with my 5200 rpm laptop drives).

as long as both are run on the same hardware, and the page lists the hardware, it's the relative numbers that matter the most.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RAT on just src/java ???
I noticed that the Release TODO recommends running ant rat-sources to look for possible errors ... but the rat-sources target is set up to only analyze the src/java directory -- not any of the other source files included in the release (contrib, tests, demo, etc...) let alone the full release artifacts.

I thought the whole point of RAT is to make sure you aren't releasing something you shouldn't be?

I'm currently running rat on the dist zip/tgz products ... but does anyone know of a reason why it was set up this way?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: reason why I did only src/java. I agree we should have it cover all
: sources.

Hmmm... rat is a memory hog, but the rat ant task is ridiculous (probably because it only supports being passed filesets containing actual files to analyze, i can't figure out a way to just give it a directory). (FYI: what we currently have is a fileset anchored at src/java, and ant then gives rat all the files it finds under that.)

I vote we scrap the rat-sources target altogether and script this; it's not something most people need to run so i'm less worried if it doesn't have robust support on multiple platforms.

lemme see what i can whip up real fast...

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: from the commandline i'm seeing about what you're seeing, from the ant

correction ... even calling RAT directly (via ant's java task) on contrib takes a few minutes -- but it doesn't chew up RAM (it was the uncompressed dist artifacts that were really fast on the command line i think)

: I wonder if you are hitting the temp bench files - those are nasty -
: eclipse hates those sometimes too ...

hmmm ... yeah, the work files are showing up in the contrib report ... alright, i think i've got the right idea how to make this work well now

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: How much RAM is it taking for you? I've got it scanning

I didn't look into it that hard.

: demo/test/src/contrib and it takes 6 seconds - the mem does appear to
: pop to like 160MB from 70 real quick - what are you seeing for RAM reqs?

are you running from the commandline, or from ant? if you're running from ant, what does your target look like?

from the commandline i'm seeing about what you're seeing; from the ant task, using something like the target below (if i remember correctly), it was hosing me bad...

<target name="rat-sources" depends="rat-sources-typedef"
        description="runs the tasks over src/java">
  <rat:report xmlns:rat="antlib:org.apache.rat.anttasks">
    <fileset dir=".">
      <include name="src/**" />
      <include name="contrib/**" />
    </fileset>
  </rat:report>
</target>

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Lucene 2.9 release size
: This prompts the question (in my mind anyway): should source releases include third-party binary jars?

if i remember correctly, the historical argument has been that this way the source release contains everything you need to compile the source. except that if i remember correctly (and i'm very tired at the moment) there are some contribs that won't compile without downloading additional jars (bdb?) so really the jars included in the source release artifacts just represent the jars that *can* be included in the source release.

Not to dredge up maven/ivy dependency management arguments -- but even if we wanted to be certain we were compiling specific versions, without depending on any special dependency management system/repo we could just have the source releases download the jars from our own site so people who don't care about compiling those contribs can get smaller source distributions.

...but i doubt it's worth trying to tackle before 2.9.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
ICU license info in NOTICE.txt ?
i notice this file has the full licensing info for ICU...

contrib/collation/lib/ICU-LICENSE.txt

...but isn't there also supposed to be at least a one line mention of this in the top level NOTICE.txt file?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
competing license info for snowball code?
can someone explain this to me... http://svn.apache.org/viewvc/lucene/java/trunk/contrib/snowball/LICENSE.txt?view=co http://svn.apache.org/viewvc/lucene/java/trunk/contrib/snowball/SNOWBALL-LICENSE.txt?view=co ...that first one seems like a (very old) mistake. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
: FWIW, committers can get Hudson accounts. See
: http://wiki.apache.org/general/Hudson. Committers can also get Lucene Zone

are you sure about that? I never understood the reason, but the wiki has always said...

if you are a member of an ASF PMC, get in touch and we'll set you up with an account.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: competing license info for snowball code?
: There is a discussion about this at: : :http://issues.apache.org/jira/browse/LUCENE-740 Hmmm... ok. even with that in mind, I don't understand why we need ./contrib/snowball/LICENSE.txt -- all of (lucene) source code is already covered by ./LICENSE.txt right? -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: ApacheCon US - Lucene Meetup?!
: I'm curious if there is a meetup this year @ ApacheCon US similar to
: the one at ApacheCon Europe earlier this year?

There's one on the schedule for tuesday night...

http://wiki.apache.org/apachecon/ApacheMeetupsUs09

I've updated the Lucene wiki page about apachecon (originally created for planning) to reflect the current state of affairs and summary of recent discussions (on gene...@lucene) about the apachecon gameplan...

http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
: Grant does the cutover to hudson.zones still invoke the nightly.sh? I
: thought it did? (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run; there's a complicated set of shell commands configured in hudson that gets run instead. (why it's not just exec'ing a shellscript in svn isn't clear to me ... but it starts with set -x so the build log should make it clear exactly what's running.)

you can see from that log: the nightly ant target is still used.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Cleaning up Javadoc warnings in contribs
As a general rule: if the javadoc command generates a warning, it's a pretty good indication that the resulting javadocs aren't going to look the way you expect. (there may be lots of places where the javadocs look wrong and no warning is logged -- but the reverse is almost never true)

The other day, I went through all of the warnings produced by ant javadocs-core and fixed the offending javadoc comments. It would be great if each of the various de facto contrib maintainers (you know who you are) could take a look at the warnings produced by each of the contribs. They're pretty easy to spot if you grep the raw console output from the nightly builds for [javadoc] and warning ...

hoss...@coaster:~$ curl -s http://hudson.zones.apache.org/hudson/job/Lucene-trunk/922/consoleText | grep '\[javadoc\]' | grep warning | perl -nle 'print $1 if m{contrib/([^/]*)/}' | sort | uniq -c
     96 analyzers
     32 benchmark
     52 collation
      8 db
     32 fast-vector-highlighter
     32 highlighter
     24 memory
     40 queryparser
      8 regex
     52 remote
      8 snowball
      8 xml-query-parser

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Sorting cleanup and FieldCacheImpl.Entry confusion
: I don't know why Entry has int type and String locale, either. I
: agree it'd be cleaner for FieldSortedHitQueue to store these on its
: own, privately.
:
: Note that FieldSortedHitQueue is deprecated in favor of
: FieldValueHitQueue, and that FieldValueHitQueue doesn't cache
: comparators anymore.

yeah ... but i'm hesitant to try and refactor that code at this point, especially if FieldSortedHitQueue is going to be removed in 3.0.

I'm thinking that for the time being, it's probably simpler to just comment those properties as being removable once FieldSortedHitQueue is removed, and leave them out of the CacheEntry (debugging/sanity) API, since there's no code path that will cause them to be set in FieldCacheImpl.

is that cool with people?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Sorting cleanup and FieldCacheImpl.Entry confusion
Hey everybody, over in LUCENE-1749 i'm trying to make sanity checking of the FieldCache possible, and i'm banging my head into a few walls, and hoping people can help me fill in the gaps about how sorting w/FieldCache is *supposed* to work.

For starters: i was getting confused why some debugging code wasn't showing the Locale specified when getting the String[] cache for Locale.US. Looking at FieldSortedHitQueue.comparatorStringLocale, i see that it calls FieldCache.DEFAULT.getStrings(reader, field) and doesn't pass the Locale at all -- which makes me wonder why FieldCacheImpl.Entry bothers having a locale member at all? ... it seems like the only purpose is so FieldSortedHitQueue can abuse the Entry object as a key for its own static final FieldCacheImpl.Cache Comparators ... but couldn't it just use its own key object and keep FieldCacheImpl.Entry simpler?

Ditto for the int type property of FieldCacheImpl.Entry, which has the comment // which SortField type ... it's used by FieldSortedHitQueue in its Comparators cache (and getCachedComparator) but FieldCacheImpl never uses it; by the time the FieldCache is accessed, the type has already been translated into the appropriate method (getInts, getBytes, etc...)

if FieldSortedHitQueue used its own private inner class for its comparator cache, the FieldCacheImpl.Entry code could eliminate a lot of cruft, and the class would get much simpler.

Does anyone know a good reason *why* it's implemented the way it currently is? or is this simply the end result of code gradually being refactored out of FieldCacheImpl and into FieldSortedHitQueue?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
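[Editorial aside: the refactor Hoss suggests above (FieldSortedHitQueue keeping its own private comparator-cache key instead of overloading FieldCacheImpl.Entry) boils down to a small value class with proper equals/hashCode. This is a hypothetical sketch with made-up names, not the real Lucene classes:]

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical composite key for a comparator cache: it carries exactly
// the fields the cache needs (field name, SortField type, optional Locale),
// so a class like FieldCacheImpl.Entry would not need 'type' or 'locale'
// members at all.
final class ComparatorCacheKey {
    final String field;
    final int sortType;   // which SortField type
    final Locale locale;  // may be null when no locale applies

    ComparatorCacheKey(String field, int sortType, Locale locale) {
        this.field = field;
        this.sortType = sortType;
        this.locale = locale;
    }

    // Value equality is what makes the key usable in a HashMap-style cache.
    @Override public boolean equals(Object o) {
        if (!(o instanceof ComparatorCacheKey)) return false;
        ComparatorCacheKey k = (ComparatorCacheKey) o;
        return sortType == k.sortType
            && field.equals(k.field)
            && Objects.equals(locale, k.locale);
    }

    @Override public int hashCode() {
        return Objects.hash(field, sortType, locale);
    }
}
```

Two logically identical keys compare equal, so repeated lookups hit the same cache slot instead of accumulating entries.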
Re: backwards compat tests
: I wonder: if we run an svn commit . tags/lucene_2_4.../src whether
: svn will do this as a single transaction? Because . (the trunk
: checkout) and tags/lucene_2_4... are two separate svn checkouts. (I
: haven't tested). If it does, then I think this approach is cleanest?

you can't have an atomic commit across independent checkouts -- the common root dir needs to be a valid svn working copy. but you can have a common root dir that is a valid svn working copy (without checking out the entire svn hierarchy) by using non-recursive checkouts (-N). you don't even need the full subdir hierarchy, just check out any descendant directory into that initial working directory

hoss...@coaster:~/svn-test$ svn ls https://my.work.svn/svn-demo/
branches/
tags/
trunk/
hoss...@coaster:~/svn-test$ svn co -N https://my.work.svn/svn-demo/ demo
Checked out revision 332746.
hoss...@coaster:~/svn-test$ cd demo
hoss...@coaster:~/svn-test/demo$ svn co https://my.work.svn/svn-demo/trunk/a-direcory/ trunk-a
A    trunk-a/one_line_file.txt
Checked out revision 332746.
hoss...@coaster:~/svn-test/demo$ svn co https://my.work.svn/svn-demo/branches/BRANCH_DEMO_3/a-direcory branch-a
A    branch-a/one_line_file.txt
Checked out revision 332746.
hoss...@coaster:~/svn-test/demo$ svn status
?      trunk-a
?      branch-a
hoss...@coaster:~/svn-test/demo$ svn status trunk-a branch-a/
hoss...@coaster:~/svn-test/demo$ echo foo > trunk-a/one_line_file.txt
hoss...@coaster:~/svn-test/demo$ echo bar > branch-a/one_line_file.txt
hoss...@coaster:~/svn-test/demo$ svn commit -m "cross checkout commit" trunk-a branch-a
Sending        branch-a/one_line_file.txt
Sending        trunk-a/one_line_file.txt
Transmitting file data ..
Committed revision 332747.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738721#action_12738721 ]

Chris Hostetter commented on LUCENE-1749:
-

: I've got one more draft here with the smallest of tweaks - javadoc
: spelling errors, and one perhaps one or two other tiny things - stuff I
: just would toss out rather than merge - but are you doing anything here
: right now Hoss? I think not at the moment, so if that's the case I'll put
: up one more patch before you grab the conch back. Otherwise I'll hold
: off on anything till you put something up.

you have the conch ... i haven't worked on anything related to this issue since my last patch. i'll try to look at it again tomorrow.

-Hoss

FieldCache introspection API

Key: LUCENE-1749
URL: https://issues.apache.org/jira/browse/LUCENE-1749
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Hoss Man
Priority: Minor
Fix For: 2.9
Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch

FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Updated: (LUCENE-1749) FieldCache introspection API
: changes to just go per reader for each doc - and a couple other unrelated tiny tweaks.

FWIW: now that this issue has uncovered a few genuine bugs in code (as opposed to just tests being odd) it would probably be better to track those bugs and their patches in separate issues that can be individually referred to in CHANGES.txt (and reopened as needed) ... committing those bug fixes can be done independently of committing the sanity checker.

(PS: i'm making this suggestion based purely on skimming the jira email stream from the last day or so ... i haven't looked at the patches but the descriptions seem to suggest they contain actual bug fixes, not just test modifications)

:
: FieldCache introspection API
:
: Key: LUCENE-1749
: URL: https://issues.apache.org/jira/browse/LUCENE-1749
: Project: Lucene - Java
: Issue Type: Improvement
: Components: Search
: Reporter: Hoss Man
: Priority: Minor
: Fix For: 2.9
:
: Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
:
: FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
: * entries for the same reader/field with different types/parsers
: * entries for the same field/type/parser in a reader and its subreader(s)
: * etc...
:
: --
: This message is automatically generated by JIRA.
: -
: You can reply to this email to add a comment to the issue online.
:
: -
: To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: java-dev-h...@lucene.apache.org

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1769) Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.4.3 or better
: I didn't realize the nightly build runs the tests twice (with and w/o
: clover); I agree, running only with clover seems fine?

i'm not caught up on this issue, but i happen to notice this comment in email. the reason the tests are run twice is because in between the two runs we package up the jars. clover instruments all the classes, so if we only ran the tests once (w/clover), and then packaged the jars, the nightly builds would include clover instrumented bytecode.

if you look at the old Jira issues about clover this is discussed there.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1749) FieldCache introspection API
: In the insanity check, when you drop into the sequential subreaders - I : think its got to be recursive - you might have a multi at the top with : other subs, or any combo thereof. I can add to next patch. i don't have the code in front of me, but i thought i was adding the sub readers to the list it's iterating over, so it will eventually recurse all the way to the bottom. : : FieldCache introspection API : : : Key: LUCENE-1749 : URL: https://issues.apache.org/jira/browse/LUCENE-1749 : Project: Lucene - Java : Issue Type: Improvement : Components: Search : Reporter: Hoss Man : Priority: Minor : Fix For: 2.9 : : Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch : : : FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd... : * entries for the same reader/field with different types/parsers : * entries for the same field/type/parser in a reader and it's subreader(s) : * etc... : : -- : This message is automatically generated by JIRA. : - : You can reply to this email to add a comment to the issue online. : : : - : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: java-dev-h...@lucene.apache.org : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
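[Editorial aside: the pattern Hoss describes above -- appending sub-readers to the very list being iterated so the scan eventually reaches every level of nesting -- can be sketched as a simple worklist traversal. This uses a hypothetical stand-in class, not the real Lucene IndexReader API:]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexReader: a composite reader has sub-readers,
// a leaf (segment) reader has none. allReaders() shows why appending subs
// to the list being scanned covers arbitrarily nested multi-readers
// without any explicit recursive call.
class DemoReader {
    final String name;
    final DemoReader[] subs; // null for a leaf reader

    DemoReader(String name, DemoReader... subs) {
        this.name = name;
        this.subs = (subs.length == 0) ? null : subs;
    }

    static List<String> allReaders(DemoReader root) {
        List<DemoReader> work = new ArrayList<>();
        work.add(root);
        List<String> seen = new ArrayList<>();
        // index-based loop: it is safe to append to 'work' mid-iteration,
        // and work.size() re-checks the growing bound each pass
        for (int i = 0; i < work.size(); i++) {
            DemoReader r = work.get(i);
            seen.add(r.name);
            if (r.subs != null) {
                for (DemoReader s : r.subs) {
                    work.add(s); // nested composites get visited later
                }
            }
        }
        return seen;
    }
}
```

A multi-reader nested inside another multi-reader is pushed onto the worklist when its parent is visited, so every leaf is eventually reached -- the effect Hoss expects from his insanity-check loop.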
RE: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests
: SortField.equals() and hashCode() contain a hint:
:
: /** Returns true if <code>o</code> is equal to this. If a
:  *  {@link SortComparatorSource} (deprecated) or {@link
:  *  FieldCache.Parser} was provided, it must properly
:  *  implement equals (unless a singleton is always used). */
:
: Maybe we should make this more visible, contain all different SortField
: comparator/parsers and place it in the setter methods for parser and
: comparators.

SortField doesn't seem like the right place at all -- people constructing instances of SortField, or calling setter methods of SortField, shouldn't have to care about this at all -- it's people who extend SortComparatorSource or FieldCache.Parser who need to be aware of these issues, so shouldn't the class level javadocs for those packages spell it out?

(ideally those abstract classes would declare hashCode and equals as abstract to *force* people to implement them ... but that ship has sailed)

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
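[Editorial aside: the javadoc hint quoted above means a custom parser must either be a singleton or implement value equality, or every new instance becomes a distinct cache key and old entries leak. A minimal sketch of what "properly implement equals" looks like, using a hypothetical interface rather than the real FieldCache.Parser:]

```java
// Hypothetical parser interface (stand-in for FieldCache.Parser).
interface IntParser {
    int parse(String value);
}

// A stateless parser: every instance behaves identically, so every
// instance compares equal. Without these overrides, two instances
// constructed at different times would be distinct cache keys.
final class TrimmingIntParser implements IntParser {
    @Override public int parse(String value) {
        return Integer.parseInt(value.trim());
    }

    @Override public boolean equals(Object o) {
        return o instanceof TrimmingIntParser;
    }

    @Override public int hashCode() {
        return TrimmingIntParser.class.hashCode();
    }
}
```

If the parser carried configuration (a radix, a format string), equals/hashCode would have to compare those fields too; the singleton shortcut only works for stateless parsers.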
Re: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests
: We prob want a javadoc warning of some kind too though right? Its not : immediately obvious that when you switch to using remote, you better : have implemented some form of equals/hashcode or you will have a memory : leak. Hmmm, now i'm confused. Uwe's comment in the issue said This is noted in the docs. and i believed him and figured the problem was exclusive to the SampleComparable in the test ... but now that i'm *looking* at the docs, i don't see any red flags (in SortField, RemoteSearchable, ScoreComparator, etc...) Uwe? -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [ApacheCon US] Travel Assistance
: Is the assistance restricted to people presenting and committers? nope... http://www.apache.org/travel/index.html -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene 2.9 Again
: LUCENE-1749 FieldCache introspection API Unassigned 16/Jul/09 : : You have time to work on this Hoss? i'd have more time if there weren't so many darn solr-user questions that no one else answers. The meat of the patch (adding an API to inspect the cache) could be committed as is today -- i just don't know if the API makes sense (needs more eyeballs), and the real value add will be getting the sanity testing utilities in place ... those are only about half done. i'll try to work on it more this week(end) but if there isn't any progress from me, someone else (ahem: Miller?) should probably prune it down to the core function, add whatever javadocs are missing, and commit. (better to have a release with a simple inspection API than to delay releasing while fancy inspection methods get hashed out) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Documentation Suggestion
: OK, I agree this makes sense and would be good for major features. : : Btw: For the new TokenStream API I wrote in the original patch (JIRA-1422) a : quite elaborate section in the package.html of the analysis package. Yeah ... whenever javadocs make sense, they're probably better than wiki docs ... in the case of Solr the userbase is rarely Java users, so it's good to have holistic documentation somewhere other than javadocs. To me, the key is to make sure all functionality is documented *somewhere* before it gets committed. if it makes sense in javadocs great, if it's too widespread to fit neatly into the javadoc method/class/package structure, a wiki tying everything together is handy. That said: even with simple javadocs, having them on the wiki makes it a lot easier to read than needing to download/apply the patch *then* generate javadocs to read the cross linked info. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
back compat policy changes?
(Please remain calm, this is just a request for clarification/summation) As I slowly catch up on the 9000+ Lucene related emails that I accumulated during my 2 month hiatus, I notice several rather large threads (i think totaling ~400 messages) on the subject of our back compat policy (where it works, where it's failing us; where it hurts users because it works as designed, where it hurts users because it doesn't work as designed; how we could change it to be better, why we shouldn't change it; etc...) I won't pretend that i've read all of those messages ... i won't even pretend that I've skimmed all those messages, but i did skim *some* of those messages, and in some of the later threads there seemed to be a lot of consensus about ideas that (as far as i can tell) were not just leave things alone. With that in mind, i was kind of surprised to see that neither of the two wiki pages (that i know of) related to backwards compatibility has been updated since *well* before all of the recent threads... http://wiki.apache.org/lucene-java/BackwardsCompatibility?action=info http://wiki.apache.org/lucene-java/Java_1%2e5_Migration?action=info My request is that someone who was involved in the previous discussions take a stab at updating one or both of those docs to reflect what the consensus of the community was. Other people can then review the diff for those documentation changes and spot check whether they feel it reflects the consensus as they understand it. But until the written policy has been changed, our policy (by definition) hasn't really been changed. In short: Patches Welcome! -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Deleting old javadoc files on Hudson
: Done. Thanks for testing! I hate to be a buzz kill, but all this really does is replace the outdated javadoc generated index.html file with a new one that points at the subdirs we've created ... I don't see how this solves the root problem: Hudson doesn't delete the old files https://hudson.dev.java.net/issues/show_bug.cgi?id=1000 The Publish JavaDoc feature copies a configured path for javadocs into an existing archive directory -- any file that existed in a previous build of the javadocs and isn't in the current javadocs will still be there. All we've done is stop linking to the old flattened doc hierarchy, but any caches, bookmarks, or search engines linking to them will still find valid pages. In addition to my previous suggestion... http://www.gossamer-threads.com/lists/lucene/java-dev/70655#70655 ...another config option we could try is Retain javadoc for each successful build. There is a warning that this causes it to take up more disk (because it keeps the javadocs for each build) but I *think* if we use that option, it will create a brand new javadoc directory for each build. (it looks like our uncompressed javadocs are about 5 times as big as our binary artifacts ... but we currently keep the last 30 builds, which seems excessive. If we cut the number of archived builds we keep to 5 we'd wind up using less disk) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Common Bottlenecks
On Tue, 9 Jun 2009, Vico Marziale wrote: : highly-multicore processors to speed computer forensics tools. For the : moment I am trying to figure out what the most common performance bottleneck : inside of Lucene itself is. I will then take a crack at porting some (small) : portion of Lucene to CUDA (http://www.nvidia.com/object/cuda_what_is.html) : and see what kind of speedups are achievable. ... : appears to be a likely candidate. I've run the demo code through a profiler, : but it was less than helpful, especially in light of the fact bottlenecks : are going to be dependent on the way the Lucene API is used. In : general, what is the most computationally expensive part of the process? Vico: it doesn't look like you got any replies to your question. performance isn't something i generally focus on when working on lucene, but my suggestion for finding hot spots that could be improved is to look at the benchmark tests in the contrib/benchmark directory. Running some of those in a profiler should help you spot the likely candidates for improvements when dealing with non-trivial use cases. one thing to keep in mind is that search performance tends to be completely separate from indexing performance ... you may want to tackle just one of those types of code paths. search tends to be the type of task that people are most concerned with optimizing for speed. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: The javadocs state clearly it must be Map<String,String>. Plus, the : type checking is in fact enforced (you hit an exception if you violate : it), dynamically (like Python). : : And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly : 2X) we'd statically type it (change Map to Map<String,String>). the other option i've seen in similar situations is to document that Map<Object,Object> is allowed, but that the Object will be toString()ed and the resulting value is what will be used. In the common case of Strings, the functionality is the same without requiring any explicit casting or instanceof error checking. the added bonuses are: 1) people can pass other simple objects (Integers, Floats, Booleans) and 99% of the time get what they want. 2) people can pass wrapper objects that implement toString() in a non trivial way and have the string produced for them lazily when the time comes to use the String. (ie: if my string value is expensive to produce, i can defer that cost until needed in case the commit fails for some other reason before my string is even used) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
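The lazy-wrapper idea in point 2 might look something like this sketch (class and method names here are made up for illustration, and it uses a Java 8 Supplier for brevity; in the 1.4-era code under discussion it would be a plain one-method interface). The expensive String is only produced if and when the map value actually gets toString()ed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: a value whose String form is computed only on demand.
// If the commit fails before the metadata is written, the cost of
// producing the String is never paid.
public class LazyValue {
    private final Supplier<String> producer;

    public LazyValue(Supplier<String> producer) {
        this.producer = producer;
    }

    @Override
    public String toString() { return producer.get(); }

    public static void main(String[] args) {
        Map<Object, Object> commitData = new HashMap<>();
        commitData.put("snapshot", new LazyValue(LazyValue::expensiveSummary));
        // Nothing expensive has happened yet; only at write time would the
        // consumer call String.valueOf(...) on each entry's value.
        System.out.println(String.valueOf(commitData.get("snapshot")));
    }

    static String expensiveSummary() { return "computed-on-demand"; }
}
```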
Re: bulk fixing svn eol-style?
: We have a number of sources that don't have eol-style set to native... This should also serve as a reminder for all committers to make sure they have sane auto-prop configs for their svn client when svn adding files -- SVN doesn't have any way to configure these on the server side, so you're responsible for setting them. The solr wiki has some recommended config options (which should probably get copied to the lucene-java wiki)... http://wiki.apache.org/solr/CommitterInfo#head-849f78497222f424339b79417056f4e510349fcb -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
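For reference, the sort of auto-props setup being recommended looks roughly like this in the svn client's `config` file (the exact property list on the Solr wiki may differ; treat these entries as illustrative):

```ini
[miscellany]
enable-auto-props = yes

[auto-props]
*.java = svn:eol-style=native
*.xml  = svn:eol-style=native
*.txt  = svn:eol-style=native
*.html = svn:eol-style=native
*.sh   = svn:eol-style=native;svn:executable
*.png  = svn:mime-type=image/png
*.jpg  = svn:mime-type=image/jpeg
```

These apply only at `svn add` time on each committer's machine, which is exactly why the server can't enforce them and everyone needs to configure their own client.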
Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: But then when you retrieve your metadata it's converted to String - String. Correct ... the documentation should make it clear that what gets persisted is a String, but the method of giving the String to the API is by passing an Object that will be toString()ed. (Aside: it would be really nice if Java had a Stringable interface) It's not the prettiest API in the world, in a pure Java 1.5 code base i wouldn't even suggest it, but in 1.4 code bases it tends to be a lot more friendly than documenting that people must pass a collection of Strings and cast them all. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: If the user serializes object, opens the index on another machine where : different versions of these classes are installed and he did not use : serialVersionId to create a version info in index. As long as you only : serialize standard Java classes like String, HashMap,... you will have no : problem with that, but with own classes a lot of care must be taken that : they can be serialized in different versions. In my case with the stored : document Field it was just a LinkedHashSet of String or something like that : (very easy for serialization). : : An the second problem is, that if you want to open such an index e.g. with : PyLucene? Should PyLucene just ignore the binary serialization data? Right ... i wouldn't advocate using Java serialization here for all of those reasons (especially since so many people have worked so hard to move towards dealing with pure byte[]s on disk instead of java serialized Strings) So to be clear: I wasn't in any way advocating that we do arbitrary serialization, or do anything different with the String values once we get them from the caller -- i was just suggesting an alternate API for getting String values from the caller in a way that didn't involve casting. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Weird problem w/ JIRA
: I'm back to getting duplicate emails. Every email sent on LUCENE-1708 was : sent to my email, and java-dev. So this really looks like it's a JIRA : project setting, since I only get these duplicates on issues I open. Am I : the only one? That's the way Jira works by default... it sends an email to everyone involved with an Issue (the reporter, the assignee, the watchers, etc...) we then have the project configured to *also* send notification of every change to java-dev. : Is it possible to change the settings of the project on JIRA? Or at least : allow me to say I don't want to get updates on this issue? not once you've opened it. the simplest solution is to make your jira email account something that gets filtered away separately from mailing list account info (ie: directly into the trash) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Tests fail to compile on JDK 1.4?
: We had some discussions about it, the easiest is, to set the bootclasspath : in the javac task to an older rt.jar during compilation. Because this : needs updates for e.g. Hudson (rt.jar missing) we said, that the one, who : releases the final version should simply check this before on the : compilation computer in the release process. there are ways to automate this sanity check in ant, i took a stab at this a while back... https://issues.apache.org/jira/browse/LUCENE-718 ...but i never moved forward with it because most people didn't seem that concerned. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
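One way to automate that kind of check (distinct from the LUCENE-718 approach) is an ant target that fails fast when the build isn't running on the expected JDK; `ant.java.version` is a built-in ant property, but the target name and expected version below are made up for illustration:

```xml
<!-- Sketch: abort the release build unless it runs on the expected JDK. -->
<target name="check-jdk">
  <fail message="Release builds must use Java 1.4 (found ${ant.java.version})">
    <condition>
      <not>
        <equals arg1="${ant.java.version}" arg2="1.4"/>
      </not>
    </condition>
  </fail>
</target>
```

Making the compile targets depend on `check-jdk` would move the burden off the release manager's memory and onto the build itself.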
Re: Modularization
: Then during build we can package up certain combinations. I think : there should be sub-kitchen-sink jars by area, eg a jar that contains : all analyzers/tokenstreams/filters, all queries/filters, etc. Or just make it trivial to get all jars that fit a given profile w/o actually merging those jars into an uber-jar ... does maven's dependency management have anything like bundles or virtual packages, so we could publish a lucene-all-analyzers POM that didn't have an actual lucene-all-analyzers.jar but listed dependencies on all of the individual jars? (FYI: Perl's CPAN has the concept of a Bundle that's just an empty distribution that depends on other distributions so you have a single reference point for installing them) : So, how would you refactor the various sources of : analyzers/tokenstream/tokenfilters we have today : (src/java/org/apache/lucene/analysis/*, contrib/snowball/*, : contrib/collation/* and contrib/analyzers/*)? (Even contrib/memory : has a neat PatternAnalyzer, that operates on a string using a regexp : to get tokens out, that only now am I just discovering). I think ideally the existing contrib/analysis would be broken up by language -- even if that means only 2 or 3 classes per jar -- but i don't deal with multilingual stuff much so i don't have much of an opinion ... perhaps the majority of our users that deal with non-english tend to deal with *lots* of languages, so having a single multilingual-analysis module would be suitable. : We also need to think about how this impacts our back-compat policy. : EG when are we allowed to split up modules into sub-modules, or merge : them. splitting a module should always be fair game as long as the new module(s) maintain the same back compat policy ... 
it's not a burden to ask people to start using 2 jars instead of 1 jar (especially if we're already going to have an easy way to bundle jars up into uber-jars) in theory merging modules should require that the new module adopt the most restrictive back-compat policy of the previous modules. : Assuming there's general consensus on this break core into modules : approach, I think the next step is to take an inventory of all of : Lucene's classes and roughly divide them into proposed modules, and : iterate on that? Hoss do you want to take a first stab at that? Heh. i'm not sure i could even answer the want question in the affirmative. This is essentially a question of refactoring, and I think approaching this incrementally would be the best strategy ... either by first finding some low hanging fruit in core that could be extracted into a contrib easily (spans, query parser) or by restructuring the build system to put contribs and the demo on equal footing with core as modules and reassess as progress is made. on a personal note: even if i wanted to lead this charge, i really can't right now ... folks may have noticed my involvement with lucene has been markedly lower in the last few months, i expect it to get even lower over the next 2 months before it will (hopefully) get higher. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
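On the bundles/virtual-packages question above: Maven does support something close, a POM-only artifact (`<packaging>pom</packaging>`) with no jar of its own, just dependencies. A hedged sketch; these artifactIds and versions are hypothetical, not real published artifacts:

```xml
<!-- Hypothetical "virtual package": depending on this POM pulls in all the
     individual analyzer jars, but no lucene-all-analyzers.jar ever exists. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-all-analyzers</artifactId>
  <version>3.0</version>
  <packaging>pom</packaging>
  <dependencies>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-snowball</artifactId>
      <version>3.0</version>
    </dependency>
  </dependencies>
</project>
```

Declaring a dependency on such a POM (with `<type>pom</type>`) gives users the single reference point the CPAN Bundle analogy describes.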
Re: Modularization
: We've been doing this using just one source tree (like in Lucene), and : instead ensuring the separation using the build system. We did not, like you I think you are misunderstanding my previous comment ... Lucene-Java does not currently have one source tree in the sense that someone else suggested (i forget who) and i was commenting on ... at the moment Lucene has several source trees (src/java, src/demo, and each dir matching contrib/*/src). Based on your examples, i believe we are suggesting the same thing: building separate modules from separate base directories (in your case foo/A and foo/B) with well defined dependencies. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Modularization
: If there are any serious moves to reorganize things, we should at least : consider the benefits of maven. +1 we can certainly do a lot to improve things just by refactoring stuff from core into contrib, and improving the visibility of contribs and documentation about contribs -- but if we're going to make massive changes to how things are built or how the source code is organized, then utilizing maven as the build system seems like an obvious choice to me. (and i don't even like maven that much) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Modularization
After stirring things up, and then being off-list for ~10 days, I'm in an interesting position coming back to this thread and seeing the discussion *after* it essentially ended, with a lot of semi-consensus but no clear sense of hard and fast resolution or plan of action. FWIW, here are the notes i made based on reading the thread about the various sentiments i noticed expressed (whether i agree with them or not) in order to try and get a handle on what had been discussed. some of these were the opinion of a single person and i've paraphrased, others are my generalization of similar comments made by various people... - contrib has a bad rap - widely varying degrees of quality/stability in contrib code, hard to get people to rely on the good ones because of the less good ones - many people want a good, out of the box, kitchen sink experience (ie: one monolithic jar containing all the essentials) - need easy discoverability of all things of a given type (ie: all queries, all filters, all analyzers, etc...) .. ie: combined javadocs. - need easy installation of all things of a given type (ie: a jar containing all types of queries, a jar containing all types of analyzers, etc...) - still need to deal with contribs that have external dependencies - still need to deal with contribs that require future versions of the language (Java 1.7 when core is still 1.5 compat) - users need better guidance about why something is a contrib (additional functionality, alternate functionality, example of use, tool, etc...) - while we should maintain/increase modularization, documentation should make features of contribs more prominent without stressing the isolation resulting from code modularization. 
- we should merge all contrib core code into a unified src/ tree, and make the packaging independent of the physical location in svn (ie: jars based on java package, not directory) While I'm mostly in favor of all of these sentiments, and think it's really just a question of how to go about it, the last one is actually something i'm pretty strongly opposed to -- I think the best way forward is to have lots of small, well isolated source trees. code isolation (by directory hierarchy) is the best way i've seen to ensure modularization, and protect against inadvertent dependency bleeding. If we want to be able to produce small jars targeted at specific goals, and we want o.l.a.foo.FooClass to be in foo.jar and o.l.a.bar.BarClass to be in bar.jar then we shouldn't have src/java/o/l/a/foo/FooClass.java and src/java/o/l/a/bar/BarClass.java -- doing so makes it way too easy for inadvertent dependencies to crop up that make FooClass depend on BarClass, and thus make it impossible to use foo.jar without also using bar.jar at runtime. it's certainly possible to have all source code in a single directory hierarchy, and then rely on the build system to ensure you don't introduce unwarranted dependencies, but that requires you to express rules in the build system about what exactly the acceptable dependencies are, and it relies on everyone using the build system correctly (misguided users of hand-holding IDEs could get very frustrated when the patches they submit violate rules of an overly complicated set of ant build files) FWIW: having lots/more of very small, isolated hierarchies also wouldn't hinder any attempts at having kitchen-sink or essential jars -- combining the classes from lots of little isolated code trees is a lot easier than extracting a few classes from one big code tree. 
One underlying assumption that seems to have permeated the existing discussion (without ever being explicitly stated) is the idea that most of what currently lives in src/java is the core and would be a single module ... personally i'd like to challenge that assumption. I'd like to suggest that besides obvious things that could be refactored out into other modules (span queries, queryparser) there are lots of additional ways that src/java could be sliced... - interfaces and abstract classes and concrete classes for reading an index in one index-api.jar (ie: Directory but no FSDirectory; IndexReader but not MultiReader) - ditto for creating/updating an index in one index-update.jar (ie: IndexWriter, TokenStream, Tokenizer, TokenFilter, Analyzer but not any impls of the last 3) - ditto for searching in index-search.jar (ie: Searcher, Searchable, HitCollector, Query ... but not any concrete subclasses) - simple-analysis.jar (SimpleAnalyzer, WhitespaceAnalyzer, LetterTokenizer, LowercaseFilter, etc...) - english-analysis.jar (StandardAnalyzer, etc...) - primitive-queries.jar (TermQuery, BooleanQuery, MatchAllDocsQuery, MultiTermQuery, etc...) - range-queries.jar (RangeQuery, RangeFilter, ConstantScoreRangeQuery) ...etc... The crux of my point being that what we think of today as the lucene core is actually kind of big and bloated, and already has *a* kitchen sink thrown in -- it's just not necessarily
Re: List Moderators
: Every now and again, someone emails me off list asking to be removed from the : list and I always forward them to Erik, b/c I know he is a moderator. : However, I was wondering who else is besides Erik, since, AIUI, there needs to : be at least 3 in ASF-land, right? : : So, if you're a list moderator for dev/user, please stand up. the committer docs have instructions for checking the moderators for any list, however the process seems to no longer work (probably because mail handling got moved onto a different box)... http://www.apache.org/dev/committers.html#mailing-list-moderators https://svn.apache.org/repos/private/committers/docs/resources.txt ...might be worth following up with INFRA to sanity check the list of moderators on all lucene lists, make sure we have three *active* moderators on each list. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: New flexible query parser
: My vote for contrib would depend on the state of the code - if it passes all : the tests and is truly back compat, and is not crazy slower, I don't see why : we don't move it in right away depending on confidence levels. That would : ensure use and attention that contrib often misses. The old parser could hang : around in deprecation. FWIW: It's always bugged me that the existing queryParser is in the core anyway ... as i've mentioned before: I'd love to see us move towards putting more features and add-on functionality in contribs and keeping the core as lean as possible: just the core functionality for indexing and searching ... when things are split up, it's easy for people who want every lucene feature to include a bunch of jars; it's harder for people who want to run lucene in a small footprint (embedded apps?) to extract classes from a big jar. so my vote would be to make it a contrib ... even if we do deprecate the current query parser, because this can be 100% back compatible -- it just makes it a great opportunity to get query parsing out of the core. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Is TopDocCollector's collect() implementation correct?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) -- Forwarded message -- Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT) : TopDocCollector's (TDC) implementation of collect() seems a bit problematic : to me. This code isn't an area i'm very familiar with, but your assessment seems correct ... it looks like when LUCENE-1356 introduced the ability to provide a PriorityQueue to the constructor, the existing optimization when the score was obviously too low was overlooked. It looks like this same bug got propagated to TopScoreDocCollector when it was introduced as well. : Introduce in TDC a private boolean which signals whether the default PQ is : used or not. If it's not used, don't do the 'else if' at all. If it is used, : then the 'else if' is safe. Then code could look like: my vote would just be to change the = comparison to a hq.lessThan call ... but i can understand how your proposal might be more efficient -- I'll let the performance experts fight it out ... but i definitely think you should file a bug. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
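The shape of the optimization under discussion -- skip the insert entirely when the queue is full and the score can't beat the current worst entry -- can be sketched with a plain `java.util.PriorityQueue` standing in for Lucene's HitQueue. None of this is the actual TopDocCollector code; in the real class the full-queue test would delegate to the queue's own lessThan() so a custom PriorityQueue's ordering is respected:

```java
import java.util.PriorityQueue;

public class CollectSketch {
    private final int numHits;
    // Min-heap of scores: the head is the worst score currently kept.
    private final PriorityQueue<Float> hq = new PriorityQueue<>();

    public CollectSketch(int numHits) { this.numHits = numHits; }

    public void collect(float score) {
        if (hq.size() < numHits) {
            hq.add(score);              // queue not yet full: always keep it
        } else if (score > hq.peek()) { // only a beat-the-worst score gets in;
            hq.poll();                  // a custom queue should make this test
            hq.add(score);              // via its lessThan(), not a raw compare
        }                               // otherwise: skip the insert entirely
    }

    public PriorityQueue<Float> queue() { return hq; }

    public static void main(String[] args) {
        CollectSketch c = new CollectSketch(2);
        for (float s : new float[] {0.1f, 0.9f, 0.5f, 0.2f}) c.collect(s);
        System.out.println(c.queue().peek()); // worst of the kept top-2: 0.5
    }
}
```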
Re: Using Highlighter for highlighting Phrase query
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:27 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : : Date: Tue, 17 Mar 2009 07:38:08 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using Highlighter for highlighting Phrase query : : : : : : Am using this version of Lucene highlighter.net API. I want to get a phrase : : highlighted only when ALL of its words are present in the search : : results..But,am not able to do sofor example, if my input search string : : is Leading telecom company, then the API only highlights telecom in the : : results if the result does not contain the words leading and company... 
: : : : Here is the code i'm using: : : : : SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter(); : : : : var appData = : : (string)AppDomain.CurrentDomain.GetData(DataDirectory); : : var folderpath = System.IO.Path.Combine(appData, MyFolder); : : : : indexReader = IndexReader.Open(folderpath); : : : : Highlighter highlighter = new Highlighter(htmlFormatter, new : : QueryScorer(finalQuery.Rewrite(indexReader))); : : : : : : highlighter.SetTextFragmenter(new SimpleFragmenter(800)); : : : : int maxNumFragmentsRequired = 5; : : : : string highlightedText = string.Empty; : : : : TokenStream tokenStream = this._analyzer.TokenStream(fieldName, : : new System.IO.StringReader(fieldText)); : : : : highlightedText = highlighter.GetBestFragments(tokenStream, : : fieldText, maxNumFragmentsRequired, ...); : : : : return highlightedText; : : : : -- : : View this message in context: http://www.nabble.com/Using-Highlighter-for-highlighting-Phrase-query-tp22560334p22560334.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Using MultiFieldQueryParser
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:59 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : Date: Tue, 17 Mar 2009 08:47:05 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using MultiFieldQueryParser : : : : : : Hi, : : : : Am working on a book search api using Lucene.User can search for a book : : whose title or description field contains C.F.A.. : : Am using Lucene's MultiFieldQueryParser..But after parsing, its removing the : : dots in the string. : : : : What am i missing here? : : : : Thanks. : : : : -- : : View this message in context: http://www.nabble.com/Using-MultiFieldQueryParser-tp22562134p22562134.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: move TrieRange* to core?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 16:51:05 -0700 (PDT) : : : I think we should move TrieRange* into core before 2.9? : : -0 : : I think we should try to move more things *out* of the core in 3.0 (as : i've mentioned in other threads) ... but i certainly understand the : arguments for going the other direction. : : : It's received a lot of attention, from both developers (Uwe and Yonik did : : lots of iterations, and Solr is folding it in) and user interest. : : it's a chicken/egg problem that we move things into the core because they : are very useful and we want to give them more visibility, but if we had : less things in the core and more things in contribs (query parser, spans, : standard analyzer, non-primitive Query impls, etc...) then contribs as a : whole would be more visible. ... I'm getting a sense of deja-vu, ah : yes, here it is ... : : http://www.nabble.com/Moving-SweetSpotSimilarity-out-of-contrib-to19267437.html#a19320894 : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Getting tokens from search results. Simple concept
: What I would LOVE is if I could do it in a standard Lucene search like I : mentioned earlier. : Hit.doc[0].getHitTokenList() :confused: : Something like this... The Query/Scorer APIs don't provide any mechanism for information like that to be conveyed back up the call chain -- mainly because it's more heavyweight than most people need. If you have custom Query/Scorer implementations, you can keep track of whatever state you want when executing a Query -- in fact the SpanQuery family of queries does keep track of exactly the type of info you seem to want, and after executing a query you can ask it for the Spans of any matching document -- the downside is a loss in query execution performance (because it takes time/memory to keep track of all the matches) -Hoss
Re: Use of Unicode data in Lucene
: I can implement the functionality just using the data tables from the Unicode : Consortium, including http://www.unicode.org/reports/tr39, but there's still : the issue of the Unicode data license and its compatibility with Apache 2.0. : : Does anybody know whether http://www.unicode.org/copyright.html creates an : issue? What's the process for vetting a license? Or is this something I should : be posting to a different list? The authoritative docs to be familiar with are... http://www.apache.org/legal/3party.html and http://www.apache.org/legal/resolved.html ..but it's not clear to me exactly where the Unicode copyright/licensing rules fall into the spectrum. The best place to ask questions about license compatibility issues is legal-disc...@apache (i'm pretty sure Ken already found that out since he posted there, just mentioning it for anyone else who might be interested) -Hoss
Re: Sorting and multi-term fields again
: TrieRange fields is needed), I again thought about the issue. Maybe we could : change FieldCache to only put the very first term from a field of the : document into the cache, enabling sorting against this field. If possible, : this would be very nice and in my opinion better than the idea proposed in : the issue. in the fairly common case of tokenized fields, the first term found during enumeration isn't necessarily (or even frequently) the first term in the pre-tokenized string ... so this doesn't help people very much. the recommended solution in the tokenized case is to have a duplicate non-tokenized field -- that seems like the best solution in the non-tokenized case as well (where the caller is consciously choosing to add multiple Field instances with the same fieldName to a Document)... pick which Field Value represents the value you want used during sorting, and add that value to the document using an alternate fieldName. I've never encountered any serious objection to this approach. -Hoss
Re: sort lucene results
: but i need the result by the word place in the sentence like this: : : bbb text 4 , text 2 bbb text , text 1 ok ok ok bbb .. 1) SpanFirstQuery should work, it scores higher the closer the nested query is to the start -- just use a really high limit. if you are only dealing with simple Term/Phrase queries it's easy to switch to using SpanTerm and SpanNear queries inside of a SpanFirstQuery. 2) Please Use java-u...@lucene Not java-...@lucene http://people.apache.org/~hossman/#java-dev Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. -Hoss
Re: Jukka's not on Who We Are yet
: Subject: Jukka's not on Who We Are yet : : Jukka's not on http://lucene.apache.org/java/docs/whoweare.html That list is specifically the Lucene-Java committers. Jukka is listed on the PMC list... http://lucene.apache.org/who.html -Hoss
Re: [jira] Commented: (LUCENE-1398) Add ReverseStringFilter
: I don't know how others feel, but I'd personally like to stop the : practice of making more Analyzer classes whenever a new TokenFilter is : added. +1 -Hoss
Re: LIA2 on l.a.o/java OK?
: I'm OK with LIA2 on the front page - as Erik suggests it does help lend : credibility to a project. +1 to more visibility to books focused on lucene on official www site pages (not just the wiki) +1 to prominent display via a section on the main page like wicket currently has, with links to more info on each book (those links could easily go to wiki pages if we don't want to have to maintain the full detail pages in forrest) .. the News section is too long/outdated anyway, so shortening it up to help make books more visible is a good thing. : So the test-case for this statement would be - what if there was a : terrible book published? I can't see it happening myself but you have to : ask if there is some inferred recommendation of quality on any links we : provide. site changes are commits, they (will) happen because someone submits a patch and someone (possibly the same person) commits. if no one thinks a book is worth promoting, no one will submit a patch. if someone does submit a patch, but the community consensus is that a book is bad and shouldn't be mentioned on the site, then the patch won't get committed (or will get rolled back if the consensus is retroactive). If each book links to a wiki page then people can be free to write whatever comments/opinions about books they want, even if there isn't a clear community consensus to withhold a book from the site. : It's the only book dedicated exclusively to Lucene that I'm aware of, and all of ... What about the JP and DE books listed on the Resource page? from what i can tell, they seem to be focused entirely on Lucene. (if the goal is to promote Lucene books to promote Lucene adoption we shouldn't be exclusive to English language books just because english is currently the LCD of the community) : Personal bias noted - I support putting it on the home page, and also news blurbs when there is activity, like when it goes to print and is available in hardcopy. 
(FWIW: i have no bias here, but i still concur) -Hoss
Re: IndexWriter.rollback() logic
: Also in the futuer please post your questions to java-dev@lucene.apache.org I believe jason meant to type java-u...@lucene... http://people.apache.org/~hossman/#java-dev Please Use java-u...@lucene Not java-...@lucene Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. -Hoss
Re: failure in TestTrieRangeQuery
: By allowing Random to randomly seed itself, we effectively test a much : much larger space, ie every time we all run the test, it's different. We can : potentially cast a much larger net than a fixed seed. i guess i'm just in favor of less randomness and more iterations. : Fixing the bug is the easy part; discovering a bug is present is where : we need all the help we can get ;) yes, but knowing a bug is there w/o having any idea what it is or how to trigger it can be very frustrating. it would be enough for tests to pick a random number, log it, and then use it as the seed ... that way if you get a failure you at least know what seed was used and you can then hardcode it temporarily to reproduce/debug -Hoss
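The log-the-seed idea in that last paragraph can be sketched in plain Java (a minimal illustration using only java.util.Random; the class and method names here are made up, not part of Lucene's actual test infrastructure):

```java
import java.util.Arrays;
import java.util.Random;

public class SeedLoggingDemo {

    // Pick a seed and log it, so a failing run can be reproduced later
    // by hardcoding the printed value in place of this call.
    public static long newSeed() {
        long seed = System.nanoTime(); // any entropy source will do
        System.err.println("NOTE: random test seed = " + seed);
        return seed;
    }

    // A stand-in for "randomized test data": the same seed always
    // yields the same sequence, which is what makes failures debuggable.
    public static int[] randomInts(long seed, int n) {
        Random r = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextInt();
        }
        return out;
    }

    public static void main(String[] args) {
        long seed = newSeed();
        // re-running with the logged seed reproduces the exact same data
        System.out.println(Arrays.equals(
            randomInts(seed, 5), randomInts(seed, 5))); // true
    }
}
```

The point is simply that randomness and reproducibility aren't mutually exclusive: you keep the wide net of random seeding, but a failure report always carries enough information to replay it.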
RE: RE: Hudson Java Docs?
: I think, the outdated docs should be removed from the server to also : disappear from search engines. : : +1 that may be easier said than done. Each build is done in a clean workspace, and then a config option in hudson tells it what to copy to the main javadoc URL... http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/javadoc/ right now we've got that configured to be trunk/build/docs/api -- which is the right thing to do, and as you can see copies all of the correct stuff, but apparently hudson isn't cleaning up old files... https://hudson.dev.java.net/issues/show_bug.cgi?id=1000 A workaround would be the idea i remember someone suggesting earlier in this thread: create a splash page at trunk/build/docs/api/index.html that points to the other directories. (anyone want to crank out a patch for this?) Alternatively, we could turn off the Publish Javadoc feature, and instead add trunk/build/docs/api to the list of files to Archive and then start referring to a URL like this (doesn't work at the moment) for all the javadocs... http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/artifact/trunk/build/docs/api/ turning that Javadoc feature off should eliminate the existing Javadoc links in the hudson navigation, but I suspect the old files would still be there (and in search engine caches) -Hoss
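The splash-page workaround could be as simple as a hand-maintained file like the following (a sketch only; the title and the directory names in the links are guesses, not the actual javadoc layout):

```html
<!-- hypothetical trunk/build/docs/api/index.html splash page -->
<html>
<head><title>Lucene-trunk javadocs</title></head>
<body>
<h1>Lucene-trunk nightly javadocs</h1>
<ul>
  <!-- one link per javadoc directory; names below are examples -->
  <li><a href="core/index.html">core API</a></li>
  <li><a href="contrib-analyzers/index.html">contrib: analyzers</a></li>
  <li><a href="contrib-highlighter/index.html">contrib: highlighter</a></li>
</ul>
</body>
</html>
```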
Re: how often is site updated?
: Wiki is updated w/ the info. Basically, it runs nightly. If you want it done : more often, I can change it. doesn't matter to me ... just wasn't sure if there was a problem since i didn't know when to expect it. it all looks fine. -Hoss
how often is site updated?
According to this doc... http://wiki.apache.org/lucene-java/HowToUpdateTheWebsite ...Grant's crontab is used to update /www/lucene.apache.org/java/docs from... http://svn.apache.org/repos/asf/lucene/java/site/docs ...but the wiki page isn't very explicit about how often that cron script runs. I committed some changes a little over 3 hours ago, but i'm not seeing them on people.apache.org yet. Grant: can you add some clarification to the wiki page with the frequency of the cronjob? (and if it should have updated by now, check and see if there's a problem.) -Hoss
Re: stored fields / unicode compression
Catching up on my holiday email, I don't think there were any replies to this question yet. The low level file formats used by Lucene are an area I don't have time/expertise to follow carefully, but if i'm remembering correctly the consensus is/was to move more towards pure (byte[] data, int start, int end) based APIs for efficiency, with String based APIs provided as syntactic sugar via a facade, and deprecating the existing internal gzip compression in favor of similar external compression facades. So something like you describe could be done as-is using the byte[] interfaces *and* be generally useful to others. Taking a step back to look at the broader picture, this is the kind of thing that in Solr could be implemented as a new FieldType : Date: Fri, 26 Dec 2008 19:00:11 -0500 : From: Robert Muir : Subject: stored fields / unicode compression : : Has there been any thoughts of using SCSU or BOCU-1 instead of UTF-8 for : stored fields? : Personally I don't put huge amounts of text in stored fields but these : encodings/compression work extremely well on short strings like titles, etc. : Removing the unicode penalty for non-latin text (i.e. cut in half) is : nothing to sneeze at since with lots of docs my stored fields still become : pretty huge, biggest part of the index. : : I know I could use one of these schemes right now and store everything as : bytes... but just thinking it might be something of more general use. The : GZIP compression that is supported isn't very useful as it typically makes : short snippets bigger... : : Performance compared to UTF-8 is here... seems like a general win to me (but : maybe I am missing something) : http://unicode.org/notes/tn6/#Performance -Hoss
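Robert's aside that gzip "typically makes short snippets bigger" is easy to check with the stdlib (a standalone sketch; the class name is made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipShortString {

    // Compress a string with gzip and return the raw compressed bytes.
    public static byte[] gzip(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(s.getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory I/O, shouldn't happen
        }
    }

    public static void main(String[] args) {
        String title = "C.F.A."; // a short stored-field value
        int raw = title.getBytes(StandardCharsets.UTF_8).length;
        int packed = gzip(title).length;
        // gzip's fixed header/trailer overhead (~18 bytes) dwarfs the
        // payload, so the "compressed" form is larger than the original
        System.out.println(packed + " > " + raw + " : " + (packed > raw));
    }
}
```

Which is exactly why block/external compression facades (or encodings like SCSU/BOCU-1 that have no per-value overhead) are more attractive than per-field gzip for short stored values.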
Re: running ant test with multiple threads/processes?
: Has anyone explored ways to have ant test take advantage of concurrency? : Since each JUnit test source (TestXXX.java) is independent, this should be : possible. : I'd love to have ant test test-tag run faster on an N-core machine. I've seen some attempts at a generalized solution to this in the past, but none of them ever seemed to be successful. manually splitting tests up into buckets and running parallel junit tasks for each bucket tends to be the approach many projects take. in our case the first quick win might be to just add a new attribute to the contrib-crawl macro that says whether it can be parallelized or not, and then replace the sequential task with a parallel threadCount=... task (use a threadCount=1 for contrib-crawls that can't be parallelized) test-contrib and javadocs-contrib should be parallelizable, but build-contrib won't be (since some contribs depend on other contribs) that should help some ... but if you really want to parallelize test-core, we would need to hardcode some N junit calls each containing a fileset (although with some creativity we could probably dynamically divide the tests up into N filesets using things like the sort, first and restrict resource collections) -Hoss
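For reference, the parallel-task idea might look roughly like this in a build file (a rough sketch, not the actual contrib-crawl macro; the contrib directory names and the threadCount value are arbitrary):

```xml
<!-- sketch: a parallelizable contrib-crawl replaces <sequential>
     with ant's <parallel> task; threadCount caps concurrency -->
<parallel threadCount="4">
  <ant dir="contrib/analyzers"    target="test"/>
  <ant dir="contrib/highlighter"  target="test"/>
  <ant dir="contrib/spellchecker" target="test"/>
</parallel>
<!-- non-parallelizable crawls (e.g. build-contrib, where contribs
     depend on each other) would keep threadCount="1" or <sequential> -->
```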
ANNOUNCE: Welcome Ryan McKinley as Contrib/Documentation Committer
I'm happy to announce that in recognition of his efforts in moving forward with creating a spatial searching contrib (and his ongoing experience as both a Solr committer and PMC member), the PMC has voted to make Ryan McKinley a Lucene-Java Contrib and Documentation committer. Congrats Ryan, please make sure to add yourself to the contrib committers list. -Hoss
Re: Searching in same position across multiple fields
: 1) Use a modified SpanNearQuery. If we assume that country + phone will always : be one token, we can rely on the fact that the positions of 'au' and '5678' in : Fred's document will be different. : :SpanQuery q1 = new SpanTermQuery(new Term(addresscountry, au)); :SpanQuery q2 = new SpanTermQuery(new Term(addressphone, 5678)); :SpanQuery snq = new SpanNearQuery(new SpanQuery[]{q1, q2}, 0, false); : : the slop of 0 means that we'll only return those where the two terms are in : the same position in their respective fields. This works brilliantly, BUT : requires a change to SpanNearQuery's constructor (which checks that all the : clauses are against the same field). Are people amenable to perhaps adding : another constructor to SNQ which doesn't do the check, or subclassing it to do : the same (give it a protected non-checking constructor for the subclass to : call)? this has actually come up a couple of times over the years (i think Doug was the first person i ever heard suggest it) in the context of PhraseQuery ... the initial thought was that just removing the term1.field=term2.field assertion would allow something like this to work, but i don't think anyone ever tried creating a patch w/tests to verify it. I think it would be a great idea. : 2) It gets slightly more complicated in the case of variable-length terms. For ... : getPositionIncrementGap -- if we knew that 'address' would be, at most, 20 : tokens, we might use a position increment gap of 100, and make the slop factor : 50; this works fine for the simple case (yay!), but with a great many : addresses-per-user starts to get more complicated, as the gap counts from the : last term (so the position sequence for a single value field might be 0, 100, : 200, but for the address field it might be 0, 1, 2, 3, 103, 104, 105, 106, : 206, 207... so it's going to get out of sync). 
: The simplest option here seems ... couldn't this be solved by an Analyzer that counts the tokens per fieldname and implements getPositionIncrementGap as.. int result = SOME_BIG_NUM - tokensSeenMap.get(fieldname); tokensSeenMap.put(fieldname, 0); return result; ? -Hoss
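Spelled out as a standalone class, that token-counting state might look like this (plain Java only, not an actual Lucene Analyzer; SOME_BIG_NUM and the method names just follow the sketch in the email):

```java
import java.util.HashMap;
import java.util.Map;

public class GapCountingState {

    static final int SOME_BIG_NUM = 100;

    // tokens emitted for each field since the last value boundary
    private final Map<String, Integer> tokensSeen = new HashMap<>();

    // Call once per token the analyzer emits for fieldName.
    public void tokenSeen(String fieldName) {
        tokensSeen.merge(fieldName, 1, Integer::sum);
    }

    // Gap = SOME_BIG_NUM minus the tokens already seen, then reset,
    // so every value starts near the next multiple of SOME_BIG_NUM
    // regardless of how many tokens the previous value produced.
    public int getPositionIncrementGap(String fieldName) {
        int seen = tokensSeen.getOrDefault(fieldName, 0);
        tokensSeen.put(fieldName, 0);
        return SOME_BIG_NUM - seen;
    }

    public static void main(String[] args) {
        GapCountingState s = new GapCountingState();
        // first address value produced 4 tokens (positions 0..3)
        for (int i = 0; i < 4; i++) s.tokenSeen("address");
        System.out.println(s.getPositionIncrementGap("address")); // 96
        // with no tokens seen since the reset, the full gap is returned
        System.out.println(s.getPositionIncrementGap("address")); // 100
    }
}
```

This keeps the multi-valued position sequence aligned (0..3, then the next value around 100, the next around 200, ...) instead of drifting with the token count of each value, which is exactly the out-of-sync problem described above.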