RE: issues.apache.org compromised: please update your passwords
: I disabled the account by assigning a dummy eMail and gave it a random
: password.
:
: I was not able to unassign the issues, as most issues were Closed,
: where no modifications can be done anymore. Reopening and changing

Uwe: it may be too late (depending on whether you remember the dummy
password) but an alternate course of action would have been to change the
email address to the PMC list (priv...@lucene) which is not publicly
archived.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Changing the subject for a JIRA-issue (Was: [jira] Created: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into f
: Is it possible to change it? If not, what is the policy here? To open a
: new issue and close the old one?
...
: In this case, that would mean either closing this issue and opening a new
: one, or taking the discussion to the mailing list where subject headers
: may be modified as the conversation evolves.

Anyone who can edit an issue (ie: all the committers, and anyone in the
developer group) can change the summary (which changes the email subjects).

It's not clear to me what the summary of LUCENE-2335 should be, but
McCandless opened the issue, he can certainly fix the summary as the issue
evolves.

-Hoss
Re: Build failed in Hudson: Lucene-trunk #1144
: No, no, no, Lucene still has no need for maven or ivy for dependency
: management. We can just hack around all issues with ant scripts.

it doesn't really matter if it's ant scripts, or ivy declarations, or
maven pom entries -- the point is the same. We can't distribute the jars,
but we can distribute programmatic means for users to fetch the jars
themselves.

(even if we magically switched to ivy or maven for dependency management,
problems like this build failure would still exist if a/the dependency
repo was down at build time, and we'd still likely distribute fat binary
tarballs containing all the dependency jars whose licenses are compatible
with the ASL.)

(users who download the binary artifacts shouldn't *have* to know
ivy/maven to use Lucene, any more than they have to know ant right now)

-Hoss
Re: Build failed in Hudson: Lucene-trunk #1144
: I was wondering yesterday why aren't the required libs checked in to SVN? We

Licensing issues. we can't redistribute them (but we can provide the
build.xml code to fetch them)

-Hoss
Re: Contrib tests fail if core jar is not up to date
: In addition to what Shai mentioned, I wanted to say that there are
: other oddities about how the contrib tests run in ant. For example,
: I'm not sure why we create the junitfailed.flag files (I think it has
: something to do with detecting top-level that a single contrib
: failed).

Correct ... even if one contrib fails, test-contrib attempts to run the
tests for all the other contribs, and then fails if any junitfailed.flag
files are found in any contribs.

The assumption was if you were specifically testing a single contrib you'd
be using the contrib specific build from its own directory, and it would
still fail fast -- it's only if you run test-contrib from the top level
that it ignores when "ant test" fails for individual contribs, and then
reports the failure at the end.

It's a hack, but it's a useful hack for getting nightly builds that can
report on the tests for all contribs, even if the first one fails (it's
less useful when one contrib depends on another, but that's a more complex
issue)

-Hoss
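To make the mechanism concrete, here is a small illustrative Java sketch of the top-level scan Hoss describes (this is not the actual ant code; `ContribFlagScan` and the contrib names are hypothetical): each contrib's test run drops a `junitfailed.flag` marker on failure, and only at the end does the top level collect the markers and fail the build, so every contrib still gets tested.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the test-contrib flag-file logic: run everything,
// then scan each contrib directory for a junitfailed.flag marker and report
// all failures at the end instead of aborting on the first one.
public class ContribFlagScan {

    public static List<String> findFailedContribs(File contribRoot) {
        List<String> failed = new ArrayList<>();
        File[] contribs = contribRoot.listFiles(File::isDirectory);
        if (contribs == null) return failed;
        for (File contrib : contribs) {
            // the marker a failing contrib test run leaves behind
            if (new File(contrib, "junitfailed.flag").exists()) {
                failed.add(contrib.getName());
            }
        }
        return failed;
    }

    public static void main(String[] args) throws Exception {
        // fake contrib layout: "analyzers" passes, "highlighter" fails
        File root = Files.createTempDirectory("contrib-demo").toFile();
        new File(root, "analyzers").mkdirs();
        File bad = new File(root, "highlighter");
        bad.mkdirs();
        new File(bad, "junitfailed.flag").createNewFile();

        System.out.println("failed contribs: " + findFailedContribs(root));
    }
}
```

The design choice is deliberate: deferring the failure to the scan step is what lets a single nightly run report on every contrib.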
Re: lucene and solr trunk
: build and nicely gets all dependencies to Lucene and Tika whenever I build
: or release, no problem there and certainly no need to have it merged into
: Lucene's svn!

The key distinction is that Solr is already in Lucene's svn -- The
question is how to reorg things in a way that makes it easier to build
Solr and Lucene-Java all at once, while still making it easy to build just
Lucene-Java.

: Professionally i work on a (world-class) geocoder that also nicely depends
: on Lucene by using maven, no problems there at all and no need to merge
: that code in Lucene's svn!

Unless maven has some features i'm not aware of, your "nicely depends"
works by pulling Lucene jars from a repository -- changing Solr to do that
(instead of having committed jars) would be fairly simple (with or w/o
maven), but that's not the goal. The goal is to make it easy to build both
at once, have patches that update both, and (make it easy to) have atomic
svn commits that touch both.

-Hoss
Re: #lucene IRC log [was: RE: lucene and solr trunk]
: with, if it didn't happen on the lists, it didn't happen. Its the same as +1

But as the IRC channel gets used more and more, it would *also* be nice if
there was an archive of the IRC channel so that there is a place to go
look to understand the back story behind an idea once it's synthesized and
posted to the lists/jira.

That's the huge advantage IRC has over informal conversations at
hackathons, apachecon, and meetups -- there can in fact be easily
archivable/parsable/searchable records of the communication.

-Hoss
Re: lucene and solr trunk
: prime-time as the new solr trunk! Lucene and Solr need to move to a
: common trunk for a host of reasons, including single patches that can
: cover both, shared tags and branches, and shared test code w/o a test
: jar.

Without a clearer picture of how people envision development overhead
working as we move forward, it's really hard to understand how any of
these ideas make sense...

1) how should the automated build process(es) work?
2) how are we going to do branching/tagging for releases? particularly in
   situations where one product is ready for a release and the other isn't?
3) how are we going to deal with minor bug fix release tagging?
4) should it be possible for people to check out Lucene-Java w/o checking
   out Solr? (i suspect a whole lot of people who only care about the core
   library are going to really adamantly not want to have to check out all
   of Solr just to work on the core)

: Both projects move to a new trunk:
: /something/trunk/java, /something/trunk/solr

my gut says something like this will make the most sense, assuming
/something/trunk == /java/trunk and "java" actually means "core" ... ie:
this discussion should really be part and parcel with how contribs should
be reorged.

-Hoss
Re: [DISCUSS] Do away with Contrib Committers and make core committers
: Subject: [DISCUSS] Do away with Contrib Committers and make core committers

+1

-Hoss
Re: [Lucene-java Wiki] Update of ReleaseTodo by RobertMuir
: nice of the wiki software to change every single line!

this type of thing seems to happen anytime you edit in GUI mode for the
first time since the MoinMoin upgrade a few months back -- it's
normalizing all the whitespace.

-Hoss
RE: back_compat folders in tags when I SVN update
: I prefer to see tags used for what it is, a place to park an actual
: release; it shouldn't be used for testing or its content changed
: dynamically.

I have no opinion about the rest of this thread (changing the back compat
testing to use a specific revision on the previous release branch) but as
for this specific comment: it's really a mistake to think of tags as only
being for releases.

the TTB convention in svn (trunk, tags, branches) stems from what's
considered a best practice when migrating from CVS: trunk corresponds to
MAIN in cvs, the branches directory corresponds to the list of branching
tags in CVS, and the tags directory corresponds to the list of tags in
CVS.

there is nothing special about the concept of a CVS/SVN tag that should
make it synonymous in people's minds with a release ... yes we tag every
release, but there are lots of other reasons to tag things in both CVS and
SVN: release candidates are frequently tagged, many other projects tag
stable builds from their continuous integration system ... a developer
could create an arbitrary checkpoint tag to denote when there was a
dramatic shift in development in a project in case people wanted to easily
find when that shift happened so they could go back and fork a branch at
that point if that approach was deemed unsuccessful.

bottom line: not a good idea to assume all tags are releases.

(that said: the TTB convention is nothing more than a convention ...
there's nothing to stop us from using a more verbose directory hierarchy
to isolate release tags in a single place...

  ./trunk
  ./branches/branch_a
  ./...
  ./tags
  ./tags/releases
  ./tags/releases/2_9_0
  ./...
  ./tags/some_misc_tag
)

-Hoss
Re: back_compat folders in tags when I SVN update
: Why do I see \java\tags\lucene_*_back_compat_tests_2009*\ directories (well
: over 100 so far) when I SVN update?

Are you saying you have "http://svn.apache.org/repos/asf/lucene/java/"
checked out in its entirety? That seems ... problematic.

New tags/branches could be created at any time -- it's even possible to
have Hudson autotag every build if we wanted. Server side these tags are
essentially free but if you checkout at the top level you pay the price of
local storage on update.

I would rethink your checkout strategy.

-Hoss
Re: Lucene Java 2.9.2
: https://issues.apache.org/jira/browse/LUCENENET-331). This begs the
: question, if Lucene.Net takes just this one patch, then Lucene.Net 2.9.1 is
: now 2.9.1.1 (which I personally don't like to see happening as I prefer to
: see a 1-to-1 release match).

As a general comment on this topic: I would suggest that if the goal of
Lucene.Net is to be a 1-to-1 port (which seems like a good goal, but is
certainly not mandatory if the Lucene.Net community has other ambitions)
then the cleanest thing for users would be to keep the version numbers in
sync 1-to-1.

it raises some questions about what to do if a bug is discovered in the
*porting*. ie: if after Lucene.Net 2.9.2 is released, it's discovered that
there was a glitch, and it doesn't actually match the behavior of
Lucene-Java 2.9.2 what should be done? ... Lucene.Net 2.9.3 and Lucene.Net
2.9.2.1 could all conceivably conflict with version numbers Lucene-Java
*might* someday release. Having an annotation strategy that doesn't extend
the dot notation used by Lucene-Java might make sense (ie: Lucene.Net
2.9.2-a)

-Hoss
Re: nightly.sh
: I configured hudson to simply run the hudson.sh from the nightly checkout.

+1

-Hoss
Re: (NAG) Push fast-vector-highlighter mvn artifacts for 3.0 and 2.9
: What to do now, any votes on adding the missing maven artifacts for
: fast-vector-highlighter to 2.9.1 and 3.0.0 on the apache maven repository?

It's not even clear to me that anything special needs to be done before
publishing those jars to maven. 2.9.1 and 3.0.0 were already voted on and
released -- including all of the source code in them.

The safest bet least likely to anger the process gods is just to call a
vote (new thread with VOTE in the subject) and cast a vote ... considering
the sources have already been reviewed it should go pretty quick.

: I rebuilt the maven-dir for 2.9.1 and 3.0.0, merged them (3.0.0 is
: top-level version) and extracted only fast-vector-highlighter:
:
: http://people.apache.org/~uschindler/staging-area/
:
: I will copy this dir to the maven folder on people.a.o, when I got votes
: (how many)? At least someone should check the signatures.
:
: By the way, we have a small error in our ant build.xml that inserts
: svnversion into the manifest file. This version is not the version of the
: last changed item (would be "svnversion -c") but the current svn version,
: even that I checked out the corresponding tags. It's no problem at all,
: but not very nice.
:
: Maybe we should change build.xml to call "svnversion -c" in future, to
: get the real number.
:
: Uwe
:
: -
: Uwe Schindler
: H.-H.-Meier-Allee 63, D-28213 Bremen
: http://www.thetaphi.de
: eMail: u...@thetaphi.de
:
: -----Original Message-----
: From: Grant Ingersoll [mailto:gsing...@apache.org]
: Sent: Saturday, December 05, 2009 10:26 PM
: To: java-dev@lucene.apache.org
: Subject: Re: Push fast-vector-highlighter mvn artifacts for 3.0 and 2.9
:
: I suppose we could put up the artifacts on a dev site and then we could
: vote to release both of them pretty quickly. I think that should be easy
: to do, since it pretty much only involves verifying the jar and the
: signatures.
:
: On Dec 5, 2009, at 1:03 PM, Simon Willnauer wrote:
:
: hi folks,
: The maven artifacts for fast-vector-highlighter have never been pushed
: since it was released because there were no pom.xml.template inside
: the module. I added a pom file a day ago in the context of
: LUCENE-2107. I already talked to uwe and grant how to deal with this
: issues and if we should push the artifact for Lucene 2.9 / 3.0. Since
: this is only a metadata file we could consider rebuilding the
: artefacts and publish them for those releases. I can not remember that
: anything like that happened before, so we should discuss how to deal
: with this situation and if we should wait until 3.1.
:
: simon

-Hoss
Re: Jira emails via Gmail
: I signed up for a login, and voted for this issue. If others did the same,
: that might help.

if you read the comments in the issue, there's really nothing that can be
fixed in Jira to make this work better -- jira already puts an In-Reply-To
header on all of the messages so that mail clients who do threading
correctly can use them -- the problem is that Gmail isn't looking at those
headers, and is instead focusing on subject.

voting for JRA-12640 probably won't accomplish much, since the bug is in
GMail, not Jira -- but voting for JRA-3609 might help since then Jira
could be customized to keep the subject consistent for all types of
messages related to a single issue...

http://jira.atlassian.com/browse/JRA-3609

...or you could file a bug with GMail asking them to implement the de
facto standard algorithm for email message threading, using all of the
various headers that exist for this purpose...

http://www.jwz.org/doc/threading.html

-Hoss
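To make the contrast concrete, here is a minimal Java sketch in the spirit of the header-based threading Hoss links to (the `Message` type, the IDs, and `threadRoots` are hypothetical illustrations, not Jira or GMail internals): a reply is threaded by walking In-Reply-To back to the root message, so a changed subject line is irrelevant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of threading by mail headers rather than by subject:
// group messages by following In-Reply-To back to the thread root.
public class HeaderThreading {

    public static class Message {
        public final String id;        // Message-ID header
        public final String inReplyTo; // In-Reply-To header, or null for a new thread
        public Message(String id, String inReplyTo) {
            this.id = id;
            this.inReplyTo = inReplyTo;
        }
    }

    // Map each Message-ID to the Message-ID of its thread root.
    public static Map<String, String> threadRoots(List<Message> msgs) {
        Map<String, Message> byId = new HashMap<>();
        for (Message m : msgs) byId.put(m.id, m);
        Map<String, String> roots = new HashMap<>();
        for (Message m : msgs) {
            Message cur = m;
            while (cur.inReplyTo != null && byId.containsKey(cur.inReplyTo)) {
                cur = byId.get(cur.inReplyTo); // walk toward the root
            }
            roots.put(m.id, cur.id);
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Message> msgs = Arrays.asList(
            new Message("<a1>", null),   // "[jira] Created: (LUCENE-...) ..."
            new Message("<a2>", "<a1>"), // "[jira] Commented: ..." -- new subject, same thread
            new Message("<b1>", null));  // unrelated issue
        // <a2> threads with <a1> despite the changed subject line
        System.out.println(threadRoots(msgs).get("<a2>"));
    }
}
```

A subject-based grouper would put the "Created" and "Commented" mails in different conversations; the header walk keeps them together, which is exactly why the fix belongs in GMail rather than Jira.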
Re: lucene.zones.apache.org dead?
: Hudson says, the lucene node is dead, so builds are stuck since 2 days.
: Does anybody knows more?

Uwe: i didn't find any evidence that you opened an infra bug about this,
so i went ahead and created one...

https://issues.apache.org/jira/browse/INFRA-2351

-Hoss
Re: Junit4
: putting too many irons in the fire, especially non-critical ones. I don't
: see a way to assign it to myself, either I'm missing something or I'm just
: underprivileged <g>, so if someone would go ahead and assign it to me I'll
: work on it post 3.0.

Jira's ACLs prevent issues from being assigned to people who aren't listed
in the Contributors group. The policy has been to add people to that list
(for issue assignment) on request, so i hooked you up.

(NOTE: if anyone else has issues they're actively working on and would
like to be flagged as a Contributor in Jira so that the issues can be
assigned directly to you for tracking purpose, please speak up)

-Hoss
Re: [jira] Commented: (LUCENE-1974) BooleanQuery can not find all matches in special condition
: I think the other tests do not catch it because the error only happens
: if the docID is over 8192 (the chunk size that BooleanScorer uses).
: Most of our tests work on smaller sets of docs.

I don't have time to try this out right now, but i wonder if just
modifying the QueryUtils wrap* functions to create bigger empty indexes
(with thousands of deleted docs instead of just a handful) would have
triggered this bug ... might be worth testing against 2.9.0 to make sure
there aren't any other weird edge cases before cutting 2.9.1.

-Hoss
Re: svn commit: r820115 - /lucene/java/trunk/common-build.xml
: -    <property name="javac.source" value="1.4"/>
: -    <property name="javac.target" value="1.4"/>
: +    <property name="javac.source" value="1.5"/>
: +    <property name="javac.target" value="1.5"/>

Isn't that one of the signs of the apocalypse?

-Hoss
Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: However, there may be something with the fact that Lucene's Analyzers
: automatically close the reader when its done analyzing. I think this
: encourages people not to explicitly close them, and creates the potential
: of having open fd's if an exception is thrown in the middle of the
: analysis or before addDocument/updateDocument is called.

It's always been the case that users should close their own Readers --
lucene's docs have never indicated that they will close the reader for
you, it's just a helpful side effect that once IndexWriter has consumed
all the chars from a Reader it calls close() -- the caller should still
close() explicitly for precisely the reasons you listed, but there's
really no downside to multiple close calls.

even if we weren't worried about breaking existing client code (where
people never call close themselves) it would still be a good idea to leave
the close() calls in because the sooner the Readers are closed the sooner
the descriptor can be released -- no reason to wait (ie: during a
serialized merge for example) until addDocument is done if the Reader has
been completely exhausted.

-Hoss
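That contract can be sketched without any Lucene dependency; in this illustrative example `consume` is a made-up stand-in for IndexWriter.addDocument, not the real API. The consumer closes the Reader once it has drained it, the caller *also* closes in a finally block (which covers the exception path the consumer never reaches), and since Reader.close() must tolerate repeated calls the double close is harmless.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Sketch of the "caller still closes" pattern: the consumer closes the
// drained Reader as a convenience, the caller closes again in finally,
// and the second close is a harmless no-op.
public class CallerClosesReader {

    // Stand-in for IndexWriter.addDocument(): drains the Reader, then
    // closes it as a convenience. Returns the number of chars read.
    public static int consume(Reader r) throws IOException {
        int chars = 0;
        while (r.read() != -1) chars++;
        r.close(); // the consumer's convenience close
        return chars;
    }

    public static void main(String[] args) throws IOException {
        final int[] closeCount = {0};
        Reader r = new StringReader("some field contents") {
            @Override public void close() {
                closeCount[0]++; // count closes to show the double close
                super.close();
            }
        };
        try {
            System.out.println("chars read: " + consume(r));
        } finally {
            r.close(); // the caller's close -- also covers the exception path
        }
        System.out.println("close calls: " + closeCount[0]); // 2, and that's fine
    }
}
```

If `consume` threw midway, the finally block would still release the descriptor, which is exactly the open-fd leak the quoted mail worries about.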
RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: So in 2.9, the Reader is correctly closed, if the TokenStream chain is
: correctly set up, passing all close() calls to the delegate.

Thanks for digging into that Uwe.

So Daniel: The ball is in your court here: what analyzer /
tokenizer+tokenfilters is your app using in the cases where you see
Readers not getting closed by Lucene -- if they involve your own custom
Tokenizers then that may be where the problem is, but if all the Analysis
pieces you are using come out of the box with Lucene please let us know so
we can check them.

-Hoss
RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: That is my opinion, too. Closing the readers should be done by the caller in

I don't disagree with either of you, but...

: a finally block and not automatically by the IW. I only wanted to confirm,
: that the behaviour of 2.9 did not change. Closing readers two times is not a

...i wanted to try and confirm that as well. if we consciously decide that
IndexWriter is going to *stop* closing all Readers that's fine with me,
but in the absence of a specific statement like that in the release notes
we should strive for no surprises.

(that doesn't have to come in the form of code changes, it can simply be
an announcement on java-user and a documented caveat in the applicable
code ... but as yet we don't have confirmation that any behavior change
exists.)

-Hoss
Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
: Thanks Mark for the pointer, I thought somehow that lucene closed them as
: a convenience, I don't know if it did that in previous releases (aka
: 2.4.1) but I'll close them myself from now on.

FWIW: As far as i know, Lucene has always closed the Reader for you when
calling addDocument/updateDocument -- BUT -- the docs never promised that
Lucene would close any Readers used in Fields. In fact the Field
constructor docs say "you may not close the Reader until addDocument has
been called" suggesting that you should close it yourself.
(Reader.close() is very clear that there should be no effect on closing a
Reader multiple times, so this is safe no matter what Lucene does)

That said: If the behavior has changed in 2.9, this could easily bite lots
of people in the ass if they haven't been closing their readers and now
they run out of file handles.

I wrote a quick test to try and reproduce the problem you're describing,
but as far as i can tell 2.9.0 (final) still seems to close the Reader for
you. Can anyone else reproduce this problem of Readers in Fields not
getting closed? (my test is below)

--BEGIN--
package org.apache.lucene;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.store.RAMDirectory;
import java.io.*;

public class TestFieldWithReaderClosing extends LuceneTestCase {
  IndexWriter writer = null;
  Document d = null;
  CloseStateReader reader;

  public void setUp() throws Exception {
    writer = new IndexWriter(new RAMDirectory(), new KeywordAnalyzer(),
                             true, IndexWriter.MaxFieldLength.LIMITED);
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
  }

  public void tearDown() throws Exception {
    writer.close();
    writer = null;
    reader.close();
    reader = null;
  }

  public void testAdd() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  public void testEmptyUpdate() throws Exception {
    writer.updateDocument(new Term("id", "x"), d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  public void testAddAndUpdate() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
    writer.updateDocument(new Term("id", "x"), d);
    assertEquals("new close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("new close count should still be 1", 1, reader.getCloseCount());
  }

  static class CloseStateReader extends StringReader {
    private int closeCount = 0;
    public CloseStateReader(String s) { super(s); }
    public synchronized void close() {
      closeCount++;
      super.close();
    }
    public int getCloseCount() { return closeCount; }
  }
}
RE: [VOTE] Release Lucene 2.9.0
: - db/bdb fails to compile with 1.4 because of a ClassFormatError in one of
: the bundled libs, so this contrib is in reality 1.5 only.

there's not much we can do about that, no one can blame us if the
dependency requires 1.5

: - Tests of contrib/misc use String.contains(), which is 1.5 only. As it
: just searches for a whitespace, it can be replaced by indexOf(' ') >= 0
: - contrib/regex fails to build, because the JavaRegExpCapability defines
: an (unused) constant based on the value in Pattern.LITERAL, which does
: not exist in 1.4. Removing this constant fixes the problem.

I'm willing to publicly say "oh well" on these changes. we've always said
that contribs don't make the same back compat commitments as core...

- contrib/misc still works until 1.4, it's only the test that doesn't work
  so "oh well" it's not worth cutting a new release (if someone is using
  contrib/misc w/1.4 and wants to run the tests, i don't think it's an
  undue burden to suggest that they can change that one line and get 1.4
  compat)

- as for contrib/regex -- this change was made to add functionality, if at
  the time of the change people had said "this means making contrib/regex
  require 1.5" i don't think anyone would have objected.

-Hoss
Re: [VOTE] Release Lucene 2.9.0
: http://people.apache.org/~markrmiller/staging-area/lucene2.9/

+1

-Hoss
Re: Conflict discovered in 'whoweare.html'
: And I done it. Then I noticed this:
:
: http://wiki.apache.org/lucene-java/TopLevelProject

That's about the TLP site (http://lucene.apache.org/) anything in a
subdirectory is handled by the individual project site directories.
according to HowToUpdateTheWebsite, both the versioned and unversioned
portions of the site are handled by grant's crontab using "svn export"

: How can I solve the conflict?

I don't think you need to worry about it ... once upon a time, the site
was updated by people using "svn co" anytime there was a change, so
there's still svn metadata there, but since it's updated via "svn export"
now, that metadata is irrelevant.

...that's my hunch anyway, it's assuming everything on
HowToUpdateTheWebsite is correct.

-Hoss
Re: ReleaseTodo steps
: They are there just not replicated or shown in mirrors?
: http://www.apache.org/dist/lucene/java/
:
: Its pretty odd they don't go out to the mirrors - I mean, whats the
: point? Users can't use them to verify anything anyway if they don't have
: them. Anyone know anything about this?

It's intentional: you always want to get the hash from the authoritative
source (and not a mirror) so you can actually verify the checksum.
(particularly if you don't have gpg to check the signature files).

http://tomcat.apache.org/download-connectors.cgi says...

"Alternatively, you can verify the MD5 signature (hash value) on the
files. Make sure you get these files from the main site, rather than from
a mirror. The above [MD5] links automatically retrieve the signature files
from the main site."

-Hoss
Re: Build failed in Hudson: Lucene-trunk #955
: but it says the tests only ran for 12 minutes, so it took a day to compile?

The JUnit report on total testing time is just the sum of the timing
reported for each test, and as the testIndexWriter report notes...

: <duration>0.0030</duration>
...
: <errorDetails>Forked Java VM exited abnormally. Please note
: the time in the report does not reflect the time until the VM
: exit.</errorDetails>

-Hoss
Re: ReleaseTodo steps
: md5sum generates a hash line like this:
:
: a21f40c4f4fb1c54903e761caf43e1d7 *lucene-2.9.0.tar.gz
:
: Then when you do a check, it knows what file to check against.
:
: The Maven artifacts just list the hash though. So it seems proper to
: remove the second part and just put the hash?

Some background on the macro...

https://issues.apache.org/jira/browse/LUCENE-904

And some info about what maven creates/expects in the MD5 files (i only
skimmed this)...

http://www.nabble.com/Checksum-Format-for-.md5-and-.sha1-Files-td21249817.html

-Hoss
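For illustration, here is a hedged Java sketch of the verification step itself (the class, the file name, and the byte payload are made up for the example; the digest is the well-known MD5 test vector for "The quick brown fox jumps over the lazy dog", not any Lucene artifact's hash): hash the download locally, take the first whitespace-separated token of the published .md5 contents, and compare.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: verify a (possibly mirrored) download against the
// digest published on the authoritative site.
public class Md5Check {

    // Hex MD5 of everything readable from 'in'.
    public static String md5Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // An md5sum-style line is "<digest> *<filename>"; the Maven .md5 files
    // carry just the bare digest. Taking the first whitespace-separated
    // token handles both forms.
    public static String digestFromLine(String line) {
        return line.trim().split("\\s+")[0];
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the real tarball bytes and the real .md5 contents.
        byte[] download =
            "The quick brown fox jumps over the lazy dog".getBytes("US-ASCII");
        String md5File = "9e107d9d372bb6826bd81d3542a419d6 *example.tar.gz";

        String trusted = digestFromLine(md5File);
        String actual = md5Hex(new ByteArrayInputStream(download));
        System.out.println(actual.equals(trusted) ? "checksum OK"
                                                  : "checksum MISMATCH");
    }
}
```

Note this only checks integrity against whatever digest you fetched; the point of Hoss's reply is that `trusted` must come from www.apache.org/dist/ directly, since a compromised mirror could serve a matching hash alongside a tampered file.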
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
: Could a git branch make things easier for mega-features like this? why not just start a subversion branch? : : Further steps towards flexible indexing : --- : : Key: LUCENE-1458 : URL: https://issues.apache.org/jira/browse/LUCENE-1458 : Project: Lucene - Java : Issue Type: New Feature : Components: Index : Affects Versions: 2.9 : Reporter: Michael McCandless : Assignee: Michael McCandless : Priority: Minor : Attachments: LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2 : : : I attached a very rough checkpoint of my current patch, to get early : feedback. All tests pass, though back compat tests don't pass due to : changes to package-private APIs plus certain bugs in tests that : happened to work (eg call TermPostions.nextPosition() too many times, : which the new API asserts against). : [Aside: I think, when we commit changes to package-private APIs such : that back-compat tests don't pass, we could go back, make a branch on : the back-compat tag, commit changes to the tests to use the new : package private APIs on that branch, then fix nightly build to use the : tip of that branch?o] : There's still plenty to do before this is committable! This is a : rather large change: :* Switches to a new more efficient terms dict format. This still : uses tii/tis files, but the tii only stores term long offset : (not a TermInfo). At seek points, tis encodes term freq/prox : offsets absolutely instead of with deltas delta. Also, tis/tii : are structured by field, so we don't have to record field number : in every term. : . : On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB : - 0.64 MB) and tis file is 9% smaller (75.5 MB - 68.5 MB). : . : RAM usage when loading terms dict index is significantly less : since we only load an array of offsets and an array of String (no : more TermInfo array). It should be faster to init too. 
:
:     This part is basically done.
:
:   * Introduces modular reader codec that strongly decouples terms dict
:     from docs/positions readers. EG there is no more TermInfo used
:     when reading the new format.
:
:     There's nice symmetry now between reading and writing in the codec
:     chain -- the current docs/prox format is captured in:
: {code}
: FormatPostingsTermsDictWriter/Reader
: FormatPostingsDocsWriter/Reader (.frq file) and
: FormatPostingsPositionsWriter/Reader (.prx file).
: {code}
:     This part is basically done.
:
:   * Introduces a new flex API for iterating through the fields,
:     terms, docs and positions:
: {code}
: FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
: {code}
:     This replaces TermEnum/Docs/Positions. SegmentReader emulates the
:     old API on top of the new API to keep back-compat.
:
: Next steps:
:   * Plug in new codecs (pulsing, pfor) to exercise the modularity /
:     fix any hidden assumptions.
:   * Expose new API out of IndexReader, deprecate old API but emulate
:     old API on top of new one, switch all core/contrib users to the
:     new API.
:   * Maybe switch to AttributeSources as the base class for TermsEnum,
:     DocsEnum, PostingsEnum -- this would give readers API flexibility
:     (not just index-file-format flexibility). EG if someone wanted
:     to store payload at the term-doc level instead of
:     term-doc-position level, you could just add a new attribute.
:   * Test performance and iterate.
:
: --
: This message is automatically generated by JIRA.
: -
: You can reply to this email to add a comment to the issue online.
:
: -
: To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: java-dev-h...@lucene.apache.org

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: NumericRange Field and LuceneUtils?
: Subject: NumericRange Field and LuceneUtils? : References: 9ac0c6aa090932s69804fa5vbf5590ea6181e...@mail.gmail.com : In-Reply-To: 9ac0c6aa090932s69804fa5vbf5590ea6181e...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Efficiently running a single test class' tests?
: which I assume is in seconds. So the great bulk of the ant test
: seems to be spent in various ant housecleaning tasks, trying to verify
: that everything is indeed built, and/or looking for test classes that
: might match the name ShingleFilterTest.

Bear in mind, each contrib is built/tested separately, so it's not just looking for every test that might match the pattern, it's iterating over each contrib and checking them all for a test that matches.

: I tried running
:
: ant test-contrib -Dtestcase=ShingleFilterTest
:
: to see if limiting to contrib would be any faster. That came back in 5
: minutes, 27 seconds. Which is better, but still in the same ballpark.

what kind of machine are you using? ... because on my box that only takes about 40 seconds.

if you are working on a contrib, and want to just run tests in that contrib, switching to that working directory and running the targets there is always going to be faster...

hoss...@brunner:~/lucene/java$ time ant test-contrib -Dtestcase=ShingleFilterTest > tmp.out
real    0m32.142s
user    0m17.744s
sys     0m8.074s
hoss...@brunner:~/lucene/java$ cd contrib/analyzers/
hoss...@brunner:~/lucene/java/contrib/analyzers$ time ant test -Dtestcase=ShingleFilterTest > ../../tmp-contrib.out
real    0m2.450s
user    0m1.644s
sys     0m0.664s

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Updating Lucene index from two different threads in a web application
http://people.apache.org/~hossman/#java-dev Please Use java-u...@lucene Not java-...@lucene Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. : Date: Mon, 31 Aug 2009 15:15:06 -0700 (PDT) : From: mitu2009 musicfrea...@gmail.com : Reply-To: java-dev@lucene.apache.org : To: java-dev@lucene.apache.org : Subject: Updating Lucene index from two different threads in a web application : : : Hi, : : I've a web application which uses Lucene for company search functionality. : When registered users add a new company,it is saved to database and also : gets indexed in Lucene based company search index in real time. : : When adding company in Lucene index, how do I handle use case of two or more : logged-in users posting a new company at the same time?Also, will both these : companies get indexed without any file lock, lock time out, etc. related : issues? : : Would appreciate if i could help with code as well. : : Thanks. : -- : View this message in context: http://www.nabble.com/Updating-Lucene-index-from-two-different-threads-in-a-web-application-tp25231264p25231264.html : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : - : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: java-dev-h...@lucene.apache.org : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Porting Java Lucene 2.9 to Lucene.Net (was: RE: Lucene 2.9 RC2 now available for testing)
: My question is, I would prefer to track SVN commits to keep track of : changes, vs. what I'm doing now. This will allow us to stay weeks : behind a Java release vs. months or years as it is now. However, while : I'm subscribed to SVN's commits mailing list, I'm not getting all those : commits! For example, a commit made this past Friday, I never got an : email for, while other commits I do. Any idea what maybe going on? i suggest you track things based on a combination of svn base url (ie: trunk vs a branch) and the specific svn revision number at the moment of your latest checkout -- that way you don't even need to subscribe to the commit list, just do an svn diff -r whenever you have some time to work on it and see what's been committed since the last time you worked on it. Hell: you could probably script all of this and have hudson do it for you. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Back-Compat on Contribs
: releases 2.9. Robert raised a question if we should mark smartcn as
: experimental so that we can change interfaces and public methods etc.
: during the refactoring. Would that make sense for 2.9 or is there no
: such thing as a back compat policy for modules like that.

http://wiki.apache.org/lucene-java/BackwardsCompatibility
...
Contrib Packages

All contribs are not created equal. The compatibility commitments of a contrib package can vary based on its maturity and intended usage. The README.txt file for each contrib should identify its approach to compatibility. If the README.txt file for a contrib package does not address its backwards compatibility commitments, users should assume it does not make any compatibility commitments.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: Thanks for the help finishing up the javadoc cleanup Hoss - we almost : have a clean javadoc run - which is fantastic, because I didn't think it : was going to be possible. I think its just this and 1863 and the run is : clean. you obviously haven't tried ant javadocs -Djavadoc.access=private lately ... i'm working on cleaning that up at the moment. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: you obviously haven't tried ant javadocs -Djavadoc.access=private lately
: ... i'm working on cleaning that up at the moment.
: tried it? I'm not even aware of it. Not mentioned in the release todo.

yeah ... it's admittedly esoteric, but it helps surface bugs in docs on private level methods (which are useful for long term maintenance)

i'm thinking we should change the nightly build to set -Djavadoc.access=private so we at least expose more errors earlier. (assuming we also set up hudson to report stats on javadoc warnings ... i've seen it in other instances but don't know if it requires a special plugin)

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Created: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
: i'm thinking we should change the nightly build to set
: -Djavadoc.access=private so we at least expose more errors earlier.
: (assuming we also setup the hudson to report stats on javadoc
: warnings ... i've seen it in other instances but don't know if it requires
: a special plugin)
: If it gives more errors, shouldn't it be set always and everywhere? Why
: not ...

it doesn't just change the level of error checking -- it changes which methods get generated docs

access refers to the java access level (public, protected, package, private) that should be exposed ... for releases we only want protected (the default in our build file) so we only advertise classes/methods/fields we expect consumers to use/override -- but as a side effect the javadoc tool never checks the docs on package/private members for correctness.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene website - benchmarks page
pulling a crap doc from the release seems sound to me.

alternately: couldn't we just replace it with the output from the contrib/benchmarker on some of the bigger tests (the full wikipedia ones) comparing 2.4 with 2.9? then just make it a pre-release TODO item for the future: update that page to reflect the benchmarks of the current release.

: I would suggest we move it to the wiki (I think we can simply remove
: the 1.2 and 1.3 benchmarks) and try to get a more recent benchmark
: soon. In other words a benchmark page on the wiki could be maintained
: by all users and committers and would encourage people to publish their
: results as the hurdle is not as high as it is if you wanna get
: something on the official website.
: I'm happy to add the page and encourage people on the user list to add
: their benchmarks and performance experiences.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene website - benchmarks page
: Prob want to run it on decent hardware as well (eg maybe I shouldn't do
: it with my 5200 rpm laptop drives).

as long as both are run on the same hardware, and the page lists the hardware, it's the relative numbers that matter the most.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RAT on just src/java ???
I noticed that the Release TODO recommends running ant rat-sources to look for possible errors ... but the rat-sources target is set up to only analyze the src/java directory -- not any of the other source files included in the release (contrib, tests, demo, etc...) let alone the full release artifacts.

I thought the whole point of RAT is to make sure you aren't releasing something you shouldn't be?

I'm currently running rat on the dist zip/tgz products ... but does anyone know of a reason why it was set up this way?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: reason why I did only src/java. I agree we should have it cover all
: sources.

Hmmm... rat is a memory hog, but the rat ant task is ridiculous (probably because it only supports being passed filesets containing actual files to analyze, i can't figure out a way to just give it a directory). (FYI: what we currently have is a fileset anchored at src/java, and ant then gives rat all the files it finds under that.)

I vote we scrap the rat-sources target altogether and script this; it's not something most people need to run so i'm less worried if it doesn't have robust support on multiple platforms.

lemme see what i can whip up real fast...

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: from the commandline i'm seeing about what you're seeing, from the ant

correction ... even calling RAT directly (via ant's java task) on contrib takes a few minutes -- but it doesn't chew up RAM (it was the uncompressed dist artifacts that were really fast on the command line i think)

: I wonder if you are hitting the temp bench files - those are nasty -
: eclipse hates those sometimes too ...

hmmm ... yeah, the work files are showing up in the contrib report ... alright, i think i've got the right idea how to make this work well now

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: RAT on just src/java ???
: How much RAM is it taking for you? I've got it scanning

I didn't look into it that hard.

: demo/test/src/contrib and it takes 6 seconds - the mem does appear to
: pop to like 160MB from 70 real quick - what are you seeing for RAM reqs?

are you running from the commandline, or from ant? if you're running from ant, what does your target look like?

from the commandline i'm seeing about what you're seeing; from the ant task, using something like the target below (if i remember correctly), it was hosing me bad...

<target name="rat-sources" depends="rat-sources-typedef"
        description="runs the tasks over src/java">
  <rat:report xmlns:rat="antlib:org.apache.rat.anttasks">
    <fileset dir=".">
      <include name="src/**" />
      <include name="contrib/**" />
    </fileset>
  </rat:report>
</target>

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Lucene 2.9 release size
: This prompts the question (in my mind anyway): should source releases include third-party binary jars?

if i remember correctly, the historical argument has been that this way the source release contains everything you need to compile the source. except that if i remember correctly (and i'm very tired at the moment) there are some contribs that won't compile without downloading additional jars (bdb?) so really the jars included in the source release artifacts just represent the jars that *can* be included in the source release.

Not to dredge up maven/ivy dependency management arguments -- but even if we wanted to be certain we were compiling specific versions, without depending on any special dependency management system/repo we could just have the source releases download the jars from our own site so people who don't care about compiling those contribs can get smaller source distributions.

...but i doubt it's worth trying to tackle before 2.9.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
ICU license info in NOTICE.txt ?
i notice this file has the full licensing info for ICU...

contrib/collation/lib/ICU-LICENSE.txt

...but isn't there also supposed to be at least a one line mention of this in the top level NOTICE.txt file?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
competing license info for snowball code?
can someone explain this to me... http://svn.apache.org/viewvc/lucene/java/trunk/contrib/snowball/LICENSE.txt?view=co http://svn.apache.org/viewvc/lucene/java/trunk/contrib/snowball/SNOWBALL-LICENSE.txt?view=co ...that first one seems like a (very old) mistake. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
: FWIW, committers can get Hudson accounts. See
: http://wiki.apache.org/general/Hudson. Committers can also get Lucene Zone

are you sure about that? I never understood the reason, but the wiki has always said...

if you are a member of an ASF PMC, get in touch and we'll set you up with an account.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: competing license info for snowball code?
: There is a discussion about this at: : :http://issues.apache.org/jira/browse/LUCENE-740 Hmmm... ok. even with that in mind, I don't understand why we need ./contrib/snowball/LICENSE.txt -- all of (lucene) source code is already covered by ./LICENSE.txt right? -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: ApacheCon US - Lucene Meetup?!
: I'm curious if there is a meetup this year @ ApacheCon US similar to
: the one at ApacheCon Europe earlier this year?

There's one on the schedule for tuesday night...

http://wiki.apache.org/apachecon/ApacheMeetupsUs09

I've updated the Lucene wiki page about apachecon (originally created for planning) to reflect the current state of affairs and summary of recent discussions (on gene...@lucene) about the apachecon gameplan...

http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r807763 - /lucene/java/trunk/build.xml
: Grant does the cutover to hudson.zones still invoke the nightly.sh? I
: thought it did? (But then looking at the console output from the
: build, I can't correlate it..).

nightly.sh is not run; there's a complicated set of shell commands configured in hudson that gets run instead. (why it's not just exec'ing a shellscript in svn isn't clear to me ... but it starts with set -x so the build log should make it clear exactly what's running.)

you can see from that log: the nightly ant target is still used.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Cleaning up Javadoc warnings in contribs
As a general rule: if the javadoc command generates a warning, it's a pretty good indication that the resulting javadocs aren't going to look the way you expect. (there may be lots of places where the javadocs look wrong and no warning is logged -- but the reverse is almost never true)

The other day, I went through all of the warnings produced by ant javadocs-core and fixed the offending javadoc comments. It would be great if each of the various de facto contrib maintainers (you know who you are) could take a look at the warnings produced by each of the contribs. They're pretty easy to spot if you grep the raw console output from the nightly builds for [javadoc] and warning ...

hoss...@coaster:~$ curl -s http://hudson.zones.apache.org/hudson/job/Lucene-trunk/922/consoleText | grep '\[javadoc\]' | grep warning | perl -nle 'print $1 if m{contrib/([^/]*)/}' | sort | uniq -c
     96 analyzers
     32 benchmark
     52 collation
      8 db
     32 fast-vector-highlighter
     32 highlighter
     24 memory
     40 queryparser
      8 regex
     52 remote
      8 snowball
      8 xml-query-parser

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Sorting cleanup and FieldCacheImpl.Entry confusion
: I don't know why Entry has int type and String locale, either. I
: agree it'd be cleaner for FieldSortedHitQueue to store these on its
: own, privately.
:
: Note that FieldSortedHitQueue is deprecated in favor of
: FieldValueHitQueue, and that FieldValueHitQueue doesn't cache
: comparators anymore.

yeah ... but i'm hesitant to try and refactor that code at this point, especially if FieldSortedHitQueue is going to be removed in 3.0.

I'm thinking that for the time being, it's probably simpler to just comment those properties as being removable once FieldSortedHitQueue is removed, and leave them out of the CacheEntry (debugging/sanity) API, since there's no code path that will cause them to be set in FieldCacheImpl.

is that cool with people?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Sorting cleanup and FieldCacheImpl.Entry confusion
Hey everybody, over in LUCENE-1749 i'm trying to make sanity checking of the FieldCache possible, and i'm banging my head into a few walls, and hoping people can help me fill in the gaps about how sorting w/FieldCache is *supposed* to work.

For starters: i was getting confused why some debugging code wasn't showing the Locale specified when getting the String[] cache for Locale.US. Looking at FieldSortedHitQueue.comparatorStringLocale, i see that it calls FieldCache.DEFAULT.getStrings(reader, field) and doesn't pass the Locale at all -- which makes me wonder why FieldCacheImpl.Entry bothers having a locale member at all? ... it seems like the only purpose is so FieldSortedHitQueue can abuse the Entry object as a key for its own static final FieldCacheImpl.Cache Comparators ... but couldn't it just use its own key object and keep FieldCacheImpl.Entry simpler?

Ditto for the int type property of FieldCacheImpl.Entry, which has the comment // which SortField type ... it's used by FieldSortedHitQueue in its Comparators cache (and getCachedComparator) but FieldCacheImpl never uses it; by the time the FieldCache is accessed, the type has already been translated into the appropriate method (getInts, getBytes, etc...)

if FieldSortedHitQueue used its own private inner class for its comparator cache, the FieldCacheImpl.Entry code could eliminate a lot of cruft, and the class would get much simpler.

Does anyone know a good reason *why* it's implemented the way it currently is? or is this simply the end result of code gradually being refactored out of FieldCacheImpl and into FieldSortedHitQueue?

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
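[Editorial aside: the refactor Hoss suggests above (FieldSortedHitQueue keeping its own private comparator-cache key instead of overloading FieldCacheImpl.Entry) boils down to a small value class with proper equals/hashCode. This is a hypothetical sketch with made-up names, not the real Lucene classes:]

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical composite key for a comparator cache: it carries exactly
// the fields the cache needs (field name, SortField type, optional Locale),
// so a class like FieldCacheImpl.Entry would not need 'type' or 'locale'
// members at all.
final class ComparatorCacheKey {
    final String field;
    final int sortType;   // which SortField type
    final Locale locale;  // may be null when no locale applies

    ComparatorCacheKey(String field, int sortType, Locale locale) {
        this.field = field;
        this.sortType = sortType;
        this.locale = locale;
    }

    // Value equality is what makes the key usable in a HashMap-style cache.
    @Override public boolean equals(Object o) {
        if (!(o instanceof ComparatorCacheKey)) return false;
        ComparatorCacheKey k = (ComparatorCacheKey) o;
        return sortType == k.sortType
            && field.equals(k.field)
            && Objects.equals(locale, k.locale);
    }

    @Override public int hashCode() {
        return Objects.hash(field, sortType, locale);
    }
}
```

Two logically identical keys compare equal, so repeated lookups hit the same cache slot instead of accumulating entries.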
Re: backwards compat tests
: I wonder: if we run an svn commit . tags/lucene_2_4.../src whether
: svn will do this as a single transaction? Because . (the trunk
: checkout) and tags/lucene_2_4... are two separate svn checkouts. (I
: haven't tested). If it does, then I think this approach is cleanest?

you can't have an atomic commit across independent checkouts -- the common root dir needs to be a valid svn working copy. but you can have a common root dir that is a valid svn working copy (without checking out the entire svn hierarchy) by using non-recursive checkouts (-N). you don't even need the full subdir hierarchy, just check out any descendant directory into that initial working directory

hoss...@coaster:~/svn-test$ svn ls https://my.work.svn/svn-demo/
branches/
tags/
trunk/
hoss...@coaster:~/svn-test$ svn co -N https://my.work.svn/svn-demo/ demo
Checked out revision 332746.
hoss...@coaster:~/svn-test$ cd demo
hoss...@coaster:~/svn-test/demo$ svn co https://my.work.svn/svn-demo/trunk/a-direcory/ trunk-a
A    trunk-a/one_line_file.txt
Checked out revision 332746.
hoss...@coaster:~/svn-test/demo$ svn co https://my.work.svn/svn-demo/branches/BRANCH_DEMO_3/a-direcory branch-a
A    branch-a/one_line_file.txt
Checked out revision 332746.
hoss...@coaster:~/svn-test/demo$ svn status
?      trunk-a
?      branch-a
hoss...@coaster:~/svn-test/demo$ svn status trunk-a branch-a/
hoss...@coaster:~/svn-test/demo$ echo foo > trunk-a/one_line_file.txt
hoss...@coaster:~/svn-test/demo$ echo bar > branch-a/one_line_file.txt
hoss...@coaster:~/svn-test/demo$ svn commit -m "cross checkout commit" trunk-a branch-a
Sending        branch-a/one_line_file.txt
Sending        trunk-a/one_line_file.txt
Transmitting file data ..
Committed revision 332747.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738721#action_12738721 ]

Chris Hostetter commented on LUCENE-1749:
-

: I've got one more draft here with the smallest of tweaks - javadoc
: spelling errors, and one perhaps one or two other tiny things - stuff I
: just would toss out rather than merge - but are you doing anything here
: right now Hoss? I think not at the moment, so if that's the case I'll put
: up one more patch before you grab the conch back. Otherwise I'll hold
: off on anything till you put something up.

you have the conch ... i haven't worked on anything related to this issue since my last patch. i'll try to look at it again tomorrow.

-Hoss

FieldCache introspection API

Key: LUCENE-1749
URL: https://issues.apache.org/jira/browse/LUCENE-1749
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Hoss Man
Priority: Minor
Fix For: 2.9
Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch

FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Updated: (LUCENE-1749) FieldCache introspection API
: changes to just go per reader for each doc - and a couple other unrelated tiny tweaks.

FWIW: now that this issue has uncovered a few genuine bugs in code (as opposed to just tests being odd) it would probably be better to track those bugs and their patches in separate issues that can be individually referred to in CHANGES.txt (and reopened as needed) ... committing those bug fixes can be done independently of committing the sanity checker.

(PS: i'm making this suggestion based purely on skimming the jira email stream from the last day or so ... i haven't looked at the patches but the descriptions seem to suggest they contain actual bug fixes, not just test modifications)

:
: FieldCache introspection API
:
: Key: LUCENE-1749
: URL: https://issues.apache.org/jira/browse/LUCENE-1749
: Project: Lucene - Java
: Issue Type: Improvement
: Components: Search
: Reporter: Hoss Man
: Priority: Minor
: Fix For: 2.9
:
: Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
:
: FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
: * entries for the same reader/field with different types/parsers
: * entries for the same field/type/parser in a reader and its subreader(s)
: * etc...
:
: --
: This message is automatically generated by JIRA.
: -
: You can reply to this email to add a comment to the issue online.
:
: -
: To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: java-dev-h...@lucene.apache.org

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1769) Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.4.3 or better
: I didn't realize the nightly build runs the tests twice (with and w/o
: clover); I agree, running only with clover seems fine?

i'm not caught up on this issue, but i happen to notice this comment in email. the reason the tests are run twice is because in between the two runs we package up the jars. clover instruments all the classes, so if we only ran the tests once (w/clover), and then packaged the jars, the nightly builds would include clover instrumented bytecode.

if you look at the old Jira issues about clover this is discussed there.

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1749) FieldCache introspection API
: In the insanity check, when you drop into the sequential subreaders - I : think its got to be recursive - you might have a multi at the top with : other subs, or any combo thereof. I can add to next patch. i don't have the code in front of me, but i thought i was adding the sub readers to the list it's iterating over, so it will eventually recurse all the way to the bottom. : : FieldCache introspection API : : : Key: LUCENE-1749 : URL: https://issues.apache.org/jira/browse/LUCENE-1749 : Project: Lucene - Java : Issue Type: Improvement : Components: Search : Reporter: Hoss Man : Priority: Minor : Fix For: 2.9 : : Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch : : : FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd... : * entries for the same reader/field with different types/parsers : * entries for the same field/type/parser in a reader and it's subreader(s) : * etc... : : -- : This message is automatically generated by JIRA. : - : You can reply to this email to add a comment to the issue online. : : : - : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: java-dev-h...@lucene.apache.org : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
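[Editorial aside: the pattern Hoss describes above -- appending sub-readers to the very list being iterated so the scan eventually reaches every level of nesting -- can be sketched as a simple worklist traversal. This uses a hypothetical stand-in class, not the real Lucene IndexReader API:]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexReader: a composite reader has sub-readers,
// a leaf (segment) reader has none. allReaders() shows why appending subs
// to the list being scanned covers arbitrarily nested multi-readers
// without any explicit recursive call.
class DemoReader {
    final String name;
    final DemoReader[] subs; // null for a leaf reader

    DemoReader(String name, DemoReader... subs) {
        this.name = name;
        this.subs = (subs.length == 0) ? null : subs;
    }

    static List<String> allReaders(DemoReader root) {
        List<DemoReader> work = new ArrayList<>();
        work.add(root);
        List<String> seen = new ArrayList<>();
        // index-based loop: it is safe to append to 'work' mid-iteration,
        // and work.size() re-checks the growing bound each pass
        for (int i = 0; i < work.size(); i++) {
            DemoReader r = work.get(i);
            seen.add(r.name);
            if (r.subs != null) {
                for (DemoReader s : r.subs) {
                    work.add(s); // nested composites get visited later
                }
            }
        }
        return seen;
    }
}
```

A multi-reader nested inside another multi-reader is pushed onto the worklist when its parent is visited, so every leaf is eventually reached -- the effect Hoss expects from his insanity-check loop.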
RE: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests
: SortField.equals() and hashCode() contain a hint:
:
: /** Returns true if <code>o</code> is equal to this. If a
:  *  {@link SortComparatorSource} (deprecated) or {@link
:  *  FieldCache.Parser} was provided, it must properly
:  *  implement equals (unless a singleton is always used). */
:
: Maybe we should make this more visible, contain all different SortField
: comparator/parsers and place it in the setter methods for parser and
: comparators.

SortField doesn't seem like the right place at all -- people constructing instances of SortField, or calling setter methods of SortField, shouldn't have to care about this at all -- it's people who extend SortComparatorSource or FieldCache.Parser who need to be aware of these issues, so shouldn't the class level javadocs for those packages spell it out?

(ideally those abstract classes would declare hashCode and equals as abstract to *force* people to implement them ... but that ship has sailed)

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
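[Editorial aside: the javadoc hint quoted above means a custom parser must either be a singleton or implement value equality, or every new instance becomes a distinct cache key and old entries leak. A minimal sketch of what "properly implement equals" looks like, using a hypothetical interface rather than the real FieldCache.Parser:]

```java
// Hypothetical parser interface (stand-in for FieldCache.Parser).
interface IntParser {
    int parse(String value);
}

// A stateless parser: every instance behaves identically, so every
// instance compares equal. Without these overrides, two instances
// constructed at different times would be distinct cache keys.
final class TrimmingIntParser implements IntParser {
    @Override public int parse(String value) {
        return Integer.parseInt(value.trim());
    }

    @Override public boolean equals(Object o) {
        return o instanceof TrimmingIntParser;
    }

    @Override public int hashCode() {
        return TrimmingIntParser.class.hashCode();
    }
}
```

If the parser carried configuration (a radix, a format string), equals/hashCode would have to compare those fields too; the singleton shortcut only works for stateless parsers.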
Re: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests
: We prob want a javadoc warning of some kind too though right? Its not : immediately obvious that when you switch to using remote, you better : have implemented some form of equals/hashcode or you will have a memory : leak. Hmmm, now i'm confused. Uwe's comment in the issue said This is noted in the docs. and i believed him and figured the problem was exclusive to the SampleComparable in the test ... but now that i'm *looking* at the docs, i don't see any red flags (in SortField, RemoteSearchable, ScoreComparator, etc...) Uwe? -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [ApacheCon US] Travel Assistance
: Is the assistance restricted to people presenting and committers? nope... http://www.apache.org/travel/index.html -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Lucene 2.9 Again
: LUCENE-1749 FieldCache introspection API Unassigned 16/Jul/09 : : You have time to work on this Hoss? i'd have more time if there weren't so many darn solr-user questions that no one else answers. The meat of the patch (adding an API to inspect the cache) could be committed as is today -- i just don't know if the API makes sense (needs more eyeballs), and the real value add will be getting the sanity testing utilities in place ... those are only about half done. i'll try to work on it more this week(end) but if there isn't any progress from me, someone else (ahem: Miller?) should probably prune it down to the core function, add whatever javadocs are missing, and commit. (better to have a release with a simple inspection API than to delay releasing while fancy inspection methods get hashed out) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Documentation Suggestion
: OK, I agree this makes sense and would be good for major features. : : Btw: For the new TokenStream API I wrote in the original patch (JIRA-1422) a : quite elaborate section in the package.html of the analysis package. Yeah ... whenever javadocs make sense, they're probably better than wiki docs ... in the case of Solr the userbase is rarely Java users, so it's good to have holistic documentation somewhere other than javadocs. To me, the key is to make sure all functionality is documented *somewhere* before it gets committed. if it makes sense in javadocs great, if it's too widespread to fit neatly into the javadoc method/class/package structure, a wiki tying everything together is handy. That said: even with simple javadocs, having them on the wiki makes it a lot easier to read than needing to download/apply the patch *then* generate javadocs to read the cross linked info. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
back compat policy changes?
(Please remain calm, this is just a request for clarification/summation) As I slowly catch up on the 9000+ Lucene related emails that I accumulated during my 2 month hiatus, I notice several rather large threads (i think totaling ~400 messages) on the subject of our back compat policy (where it works, where it's failing us; where it hurts users because it works as designed, where it hurts users because it doesn't work as designed; how we could change it to be better, why we shouldn't change it; etc...) I won't pretend that i've read all of those messages ... i won't even pretend that I've skimmed all those messages, but i did skim *some* of those messages, and in some of the later threads there seemed to be a lot of consensus about ideas that (as far as i can tell) were not just leave things alone. With that in mind, i was kind of surprised to see that neither of the two wiki pages (that i know of) related to backwards compatibility has been updated since *well* before all of the recent threads... http://wiki.apache.org/lucene-java/BackwardsCompatibility?action=info http://wiki.apache.org/lucene-java/Java_1%2e5_Migration?action=info My request is that someone who was involved in the previous discussions take a stab at updating one or both of those docs to reflect what the consensus of the community was. Other people can then review the diff for those documentation changes and spot check whether they feel it reflects the consensus as they understand it. But until the written policy has been changed, our policy (by definition) hasn't really been changed. In short: Patches Welcome! -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Deleting old javadoc files on Hudson
: Done. Thanks for testing! I hate to be a buzz kill, but all this really does is replace the outdated javadoc generated index.html file with a new one that points at the subdirs we've created ... I don't see how this solves the root problem: Hudson doesn't delete the old files https://hudson.dev.java.net/issues/show_bug.cgi?id=1000 The Publish JavaDoc feature copies a configured path for javadocs into an existing archive directory -- any file that existed in a previous build of the javadocs and isn't in the current javadocs will still be there. All we've done is stop linking to the old flattened doc hierarchy, but any caches, bookmarks, or search engines linking to them will still find valid pages. In addition to my previous suggestion... http://www.gossamer-threads.com/lists/lucene/java-dev/70655#70655 ...another config option we could try is Retain javadoc for each successful build. There is a warning that this causes it to take up more disk (because it keeps the javadocs for each build) but I *think* if we use that option, it will create a brand new javadoc directory for each build. (it looks like our uncompressed javadocs are about 5 times as big as our binary artifacts ... but we currently keep the last 30 builds, which seems excessive. If we cut the number of archived builds we keep to 5 we'd wind up using less disk) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Common Bottlenecks
On Tue, 9 Jun 2009, Vico Marziale wrote: : highly-multicore processors to speed computer forensics tools. For the : moment I am trying to figure out what the most common performance bottleneck : inside of Lucene itself is. I will then take a crack at porting some (small) : portion of Lucene to CUDA (http://www.nvidia.com/object/cuda_what_is.html) : and see what kind of speedups are achievable. ... : appears to be a likely candidate. I've run the demo code through a profiler, : but it was less than helpful, especially in light of the fact bottlenecks : are going to be dependent on the way the Lucene API is used. In : general, what is the most computationally expensive part of the process? Vico: it doesn't look like you got any replies to your question. performance isn't something i generally focus on when working on lucene, but my suggestion for finding hot spots that could be improved is to look at the benchmark tests in the contrib/benchmark directory. Running some of those in a profiler should help you spot the likely candidates for improvements when dealing with non-trivial use cases. one thing to keep in mind is that search performance tends to be completely separate from indexing performance ... you may want to tackle just one of those types of code paths. search tends to be the type of task that people are most concerned with optimizing for speed. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: The javadocs state clearly it must be Map<String,String>. Plus, the : type checking is in fact enforced (you hit an exception if you violate : it), dynamically (like Python). : : And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly : 2X) we'd statically type it (change Map to Map<String,String>). the other option i've seen in similar situations is to document that Map<Object,Object> is allowed, but that the Object will be toString()ed and the resulting value is what will be used. In the common case of Strings, the functionality is the same without requiring any explicit casting or instanceof error checking. the added bonuses are: 1) people can pass other simple objects (Integers, Floats, Booleans) and 99% of the time get what they want. 2) people can pass wrapper objects that implement toString() in a non trivial way and have the string produced for them lazily when the time comes to use the String. (ie: if my string value is expensive to produce, i can defer that cost until needed in case the commit fails for some other reason before my string is even used) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
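The lazy-wrapper idea in point 2 might look something like this sketch (class and method names here are made up for illustration, and it uses a Java 8 Supplier for brevity; in the 1.4-era code under discussion it would be a plain one-method interface). The expensive String is only produced if and when the map value actually gets toString()ed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: a value whose String form is computed only on demand.
// If the commit fails before the metadata is written, the cost of
// producing the String is never paid.
public class LazyValue {
    private final Supplier<String> producer;

    public LazyValue(Supplier<String> producer) {
        this.producer = producer;
    }

    @Override
    public String toString() { return producer.get(); }

    public static void main(String[] args) {
        Map<Object, Object> commitData = new HashMap<>();
        commitData.put("snapshot", new LazyValue(LazyValue::expensiveSummary));
        // Nothing expensive has happened yet; only at write time would the
        // consumer call String.valueOf(...) on each entry's value.
        System.out.println(String.valueOf(commitData.get("snapshot")));
    }

    static String expensiveSummary() { return "computed-on-demand"; }
}
```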
Re: bulk fixing svn eol-style?
: We have a number of sources that don't have eol-style set to native... This should also serve as a reminder for all committers to make sure they have sane auto-prop configs for their svn client when svn adding files -- SVN doesn't have any way to configure these on the server side, so you're responsible for setting them. The solr wiki has some recommended config options (which should probably get copied to the lucene-java wiki)... http://wiki.apache.org/solr/CommitterInfo#head-849f78497222f424339b79417056f4e510349fcb -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
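For reference, the sort of auto-props setup being recommended looks roughly like this in the svn client's `config` file (the exact property list on the Solr wiki may differ; treat these entries as illustrative):

```ini
[miscellany]
enable-auto-props = yes

[auto-props]
*.java = svn:eol-style=native
*.xml  = svn:eol-style=native
*.txt  = svn:eol-style=native
*.html = svn:eol-style=native
*.sh   = svn:eol-style=native;svn:executable
*.png  = svn:mime-type=image/png
*.jpg  = svn:mime-type=image/jpeg
```

These apply only at `svn add` time on each committer's machine, which is exactly why the server can't enforce them and everyone needs to configure their own client.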
Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: But then when you retrieve your metadata it's converted to String - String. Correct ... the documentation should make it clear that what gets persisted is a String, but the method of giving the String to the API is by passing an Object that will be toString()ed. (Aside: it would be really nice if Java had a Stringable interface) It's not the prettiest API in the world, in a pure Java 1.5 code base i wouldn't even suggest it, but in 1.4 code bases it tends to be a lot more friendly than documenting that people must pass a collection of Strings and cast them all. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Shouldn't IndexWriter.commit(Map) accept Properties instead?
: If the user serializes object, opens the index on another machine where : different versions of these classes are installed and he did not use : serialVersionId to create a version info in index. As long as you only : serialize standard Java classes like String, HashMap,... you will have no : problem with that, but with own classes a lot of care must be taken that : they can be serialized in different versions. In my case with the stored : document Field it was just a LinkedHashSet of String or something like that : (very easy for serialization). : : An the second problem is, that if you want to open such an index e.g. with : PyLucene? Should PyLucene just ignore the binary serialization data? Right ... i wouldn't advocate using Java serialization here for all of those reasons (especially since so many people have worked so hard to move towards dealing with pure byte[]s on disk instead of java serialized Strings) So to be clear: I wasn't in any way advocating that we do arbitrary serialization, or do anything different with the String values once we get them from the caller -- i was just suggesting an alternate API for getting String values from the caller in a way that didn't involve casting. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Weird problem w/ JIRA
: I'm back to getting duplicate emails. Every email sent on LUCENE-1708 was : sent to my email, and java-dev. So this really looks like it's a JIRA : project setting, since I only get these duplicates on issues I open. Am I : the only one? That's the way Jira works by default... it sends an email to everyone involved with an Issue (the reporter, the assignee, the watchers, etc...) we then have the project configured to *also* send notification of every change to java-dev. : Is it possible to change the settings of the project on JIRA? Or at least : allow me to say I don't want to get updates on this issue? not once you've opened it. the simplest solution is to make your jira email account something that gets filtered away separately from mailing list account info (ie: directly into the trash) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Tests fail to compile on JDK 1.4?
: We had some discussions about it, the easiest is, to set the bootclasspath : in the javac task to an older rt.jar during compilation. Because this : needs updates for e.g. Hudson (rt.jar missing) we said, that the one, who : releases the final version should simply check this before on the : compilation computer in the release process. there are ways to automate this sanity check in ant, i took a stab at this a while back... https://issues.apache.org/jira/browse/LUCENE-718 ...but i never moved forward with it because most people didn't seem that concerned. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
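One way to automate that kind of check (distinct from the LUCENE-718 approach) is an ant target that fails fast when the build isn't running on the expected JDK; `ant.java.version` is a built-in ant property, but the target name and expected version below are made up for illustration:

```xml
<!-- Sketch: abort the release build unless it runs on the expected JDK. -->
<target name="check-jdk">
  <fail message="Release builds must use Java 1.4 (found ${ant.java.version})">
    <condition>
      <not>
        <equals arg1="${ant.java.version}" arg2="1.4"/>
      </not>
    </condition>
  </fail>
</target>
```

Making the compile targets depend on `check-jdk` would move the burden off the release manager's memory and onto the build itself.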
Re: Modularization
: Then during build we can package up certain combinations. I think : there should be sub-kitchen-sink jars by area, eg a jar that contains : all analyzers/tokenstreams/filters, all queries/filters, etc. Or just make it trivial to get all jars that fit a given profile w/o actually merging those jars into an uber-jar ... does maven's dependency management have anything like bundles or virtual packages, so we could publish a lucene-all-analyzers POM that didn't have an actual lucene-all-analyzers.jar but listed dependencies on all of the individual jars? (FYI: Perl's CPAN has the concept of a Bundle that's just an empty distribution that depends on other distributions so you have a single reference point for installing them) : So, how would you refactor the various sources of : analyzers/tokenstream/tokenfilters we have today : (src/java/org/apache/lucene/analysis/*, contrib/snowball/*, : contrib/collation/* and contrib/analyzers/*)? (Even contrib/memory : has a neat PatternAnalyzer, that operates on a string using a regexp : to get tokens out, that only now am I just discovering). I think ideally the existing contrib/analysis would be broken up by language -- even if that means only 2 or 3 classes per jar -- but i don't deal with multilingual stuff much so i don't have much of an opinion ... perhaps the majority of our users that deal with non-english tend to deal with *lots* of languages, so having a single multilingual-analysis module would be suitable. : We also need to think about how this impacts our back-compat policy. : EG when are we allowed to split up modules into sub-modules, or merge : them. splitting a module should always be fair game as long as the new module(s) maintain the same back compat policy ... 
it's not a burden to ask people to start using 2 jars instead of 1 jar (especially if we're already going to have an easy way to bundle jars up into uber-jars) in theory merging modules should require that the new module adopt the most restrictive back-compat policy of the previous modules. : Assuming there's general consensus on this break core into modules : approach, I think the next step is to take an inventory of all of : Lucene's classes and roughly divide them into proposed modules, and : iterate on that? Hoss do you want to take a first stab at that? Heh. i'm not sure i could even answer the want question in the affirmative. This is essentially a question of refactoring, and I think approaching this incrementally would be the best strategy ... either by first finding some low hanging fruit in core that could be extracted into a contrib easily (spans, query parser) or by restructuring the build system to put contribs and the demo on equal footing with core as modules and reassess as progress is made. on a personal note: even if i wanted to lead this charge, i really can't right now ... folks may have noticed my involvement with lucene has been markedly lower in the last few months, i expect it to get even lower over the next 2 months before it will (hopefully) get higher. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
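On the bundles/virtual-packages question above: Maven does support something close, a POM-only artifact (`<packaging>pom</packaging>`) with no jar of its own, just dependencies. A hedged sketch; these artifactIds and versions are hypothetical, not real published artifacts:

```xml
<!-- Hypothetical "virtual package": depending on this POM pulls in all the
     individual analyzer jars, but no lucene-all-analyzers.jar ever exists. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-all-analyzers</artifactId>
  <version>3.0</version>
  <packaging>pom</packaging>
  <dependencies>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-snowball</artifactId>
      <version>3.0</version>
    </dependency>
  </dependencies>
</project>
```

Declaring a dependency on such a POM (with `<type>pom</type>`) gives users the single reference point the CPAN Bundle analogy describes.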
Re: Modularization
: We've been doing this using just one source tree (like in Lucene), and : instead ensuring the separation using the build system. We did not, like you I think you are misunderstanding my previous comment ... Lucene-Java does not currently have one source tree in the sense that someone else suggested (i forget who) and i was commenting on ... at the moment Lucene has several source trees (src/java, src/demo, and each dir matching contrib/*/src). Based on your examples, i believe we are suggesting the same thing: building separate modules from separate base directories (in your case foo/A and foo/B) with well defined dependencies. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Modularization
: If there are any serious moves to reorganize things, we should at least : consider the benefits of maven. +1 we can certainly do a lot to improve things just by refactoring stuff from core into contrib, and improving the visibility of contribs and documentation about contribs -- but if we're going to make massive changes to how things are built or how the source code is organized, then utilizing maven as the build system seems like an obvious choice to me. (and i don't even like maven that much) -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Modularization
After stirring things up, and then being off-list for ~10 days, I'm in an interesting position coming back to this thread and seeing the discussion *after* it essentially ended, with a lot of semi-consensus but no clear sense of hard and fast resolution or plan of action. FWIW, here are the notes i made based on reading the thread about the various sentiments i noticed expressed (whether i agree with them or not) in order to try and get a handle on what had been discussed. some of these were the opinion of a single person and i've paraphrased, others are my generalization of similar comments made by various people... - contrib has a bad rap - widely varying degrees of quality/stability in contrib code, hard to get people to rely on the good ones because of the less good ones - many people want a good, out of the box, kitchen sink experience (ie: one monolithic jar containing all the essentials) - need easy discoverability of all things of a given type (ie: all queries, all filters, all analyzers, etc...) .. ie: combined javadocs. - need easy installation of all things of a given type (ie: a jar containing all types of queries, a jar containing all types of analyzers, etc...) - still need to deal with contribs that have external dependencies - still need to deal with contribs that require future versions of the language (Java 1.7 when core is still 1.5 compat) - users need better guidance about why something is a contrib (additional functionality, alternate functionality, example of use, tool, etc...) - while we should maintain/increase modularization, documentation should make features of contribs more prominent without stressing the isolation resulting from code modularization. 
- we should merge all contrib core code into a unified src/ tree, and make the packaging independent of the physical location in svn (ie: jars based on java package, not directory) While I'm mostly in favor of all of these sentiments, and think it's really just a question of how to go about it, the last one is actually something i'm pretty strongly opposed to -- I think the best way forward is to have lots of small, well isolated source trees. code isolation (by directory hierarchy) is the best way i've seen to ensure modularization, and protect against inadvertent dependency bleeding. If we want to be able to produce small jars targeted at specific goals, and we want o.l.a.foo.FooClass to be in foo.jar and o.l.a.bar.BarClass to be in bar.jar then we shouldn't have src/java/o/l/a/foo/FooClass.java and src/java/o/l/a/bar/BarClass.java -- doing so makes it way too easy for inadvertent dependencies to crop up that make FooClass depend on BarClass, and thus make it impossible to use foo.jar without also using bar.jar at runtime. it's certainly possible to have all source code in a single directory hierarchy, and then rely on the build system to ensure you don't introduce unwarranted dependencies, but that requires you to express rules in the build system about what exactly the acceptable dependencies are, and it relies on everyone using the build system correctly (misguided users of hand-holding IDEs could get very frustrated when the patches they submit violate rules of an overly complicated set of ant build files) FWIW: having lots/more of very small, isolated hierarchies also wouldn't hinder any attempts at having kitchen-sink or essential jars -- combining the classes from lots of little isolated code trees is a lot easier than extracting a few classes from one big code tree. 
One underlying assumption that seems to have permeated the existing discussion (without ever being explicitly stated) is the idea that most of what currently lives in src/java is the core and would be a single module ... personally i'd like to challenge that assumption. I'd like to suggest that besides obvious things that could be refactored out into other modules (span queries, queryparser) there are lots of additional ways that src/java could be sliced... - interfaces and abstract classes and concrete classes for reading an index in one index-api.jar (ie: Directory but no FSDirectory; IndexReader but not MultiReader) - ditto for creating/updating an index in one index-update.jar (ie: IndexWriter, TokenStream, Tokenizer, TokenFilter, Analyzer but not any impls of the last 3) - ditto for searching in index-search.jar (ie: Searcher, Searchable, HitCollector, Query ... but not any concrete subclasses) - simple-analysis.jar (SimpleAnalyzer, WhitespaceAnalyzer, LetterTokenizer, LowercaseFilter, etc...) - english-analysis.jar (StandardAnalyzer, etc...) - primitive-queries.jar (TermQuery, BooleanQuery, MatchAllDocsQuery, MultiTermQuery, etc...) - range-queries.jar (RangeQuery, RangeFilter, ConstantScoreRangeQuery) ...etc... The crux of my point being that what we think of today as the lucene core is actually kind of big and bloated, and already has *a* kitchen sink thrown in -- it's just not necessarily
Re: List Moderators
: Every now and again, someone emails me off list asking to be removed from the : list and I always forward them to Erik, b/c I know he is a moderator. : However, I was wondering who else is besides Erik, since, AIUI, there needs to : be at least 3 in ASF-land, right? : : So, if you're a list moderator for dev/user, please stand up. the committer docs have instructions for checking the moderators for any list, however the process seems to no longer work (probably because mail handling got moved onto a different box)... http://www.apache.org/dev/committers.html#mailing-list-moderators https://svn.apache.org/repos/private/committers/docs/resources.txt ...might be worth following up with INFRA to sanity check the list of moderators on all lucene lists, make sure we have three *active* moderators on each list. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: New flexible query parser
: My vote for contrib would depend on the state of the code - if it passes all : the tests and is truly back compat, and is not crazy slower, I don't see why : we don't move it in right away depending on confidence levels. That would : ensure use and attention that contrib often misses. The old parser could hang : around in deprecation. FWIW: It's always bugged me that the existing queryParser is in the core anyway ... as i've mentioned before: I'd love to see us move towards putting more features and add-on functionality in contribs and keeping the core as lean as possible: just the core functionality for indexing and searching ... when things are split up, it's easy for people who want every lucene feature to include a bunch of jars; it's harder for people who want to run lucene in a small footprint (embedded apps?) to extract classes from a big jar. so my vote would be to make it a contrib ... even if we do deprecate the current query parser, because this can be 100% back compatible -- it just makes it a great opportunity to get query parsing out of the core. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Is TopDocCollector's collect() implementation correct?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) -- Forwarded message -- Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT) : TopDocCollector's (TDC) implementation of collect() seems a bit problematic : to me. This code isn't an area i'm very familiar with, but your assessment seems correct ... it looks like when LUCENE-1356 introduced the ability to provide a PriorityQueue to the constructor, the existing optimization when the score was obviously too low was overlooked. It looks like this same bug got propagated to TopScoreDocCollector when it was introduced as well. : Introduce in TDC a private boolean which signals whether the default PQ is : used or not. If it's not used, don't do the 'else if' at all. If it is used, : then the 'else if' is safe. Then code could look like: my vote would just be to change the = comparison to a hq.lessThan call ... but i can understand how your proposal might be more efficient -- I'll let the performance experts fight it out ... but i definitely think you should file a bug. -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
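The shape of the optimization under discussion -- skip the insert entirely when the queue is full and the score can't beat the current worst entry -- can be sketched with a plain `java.util.PriorityQueue` standing in for Lucene's HitQueue. None of this is the actual TopDocCollector code; in the real class the full-queue test would delegate to the queue's own lessThan() so a custom PriorityQueue's ordering is respected:

```java
import java.util.PriorityQueue;

public class CollectSketch {
    private final int numHits;
    // Min-heap of scores: the head is the worst score currently kept.
    private final PriorityQueue<Float> hq = new PriorityQueue<>();

    public CollectSketch(int numHits) { this.numHits = numHits; }

    public void collect(float score) {
        if (hq.size() < numHits) {
            hq.add(score);              // queue not yet full: always keep it
        } else if (score > hq.peek()) { // only a beat-the-worst score gets in;
            hq.poll();                  // a custom queue should make this test
            hq.add(score);              // via its lessThan(), not a raw compare
        }                               // otherwise: skip the insert entirely
    }

    public PriorityQueue<Float> queue() { return hq; }

    public static void main(String[] args) {
        CollectSketch c = new CollectSketch(2);
        for (float s : new float[] {0.1f, 0.9f, 0.5f, 0.2f}) c.collect(s);
        System.out.println(c.queue().peek()); // worst of the kept top-2: 0.5
    }
}
```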
Re: Using Highlighter for highlighting Phrase query
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:27 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : : Date: Tue, 17 Mar 2009 07:38:08 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using Highlighter for highlighting Phrase query : : : : : : Am using this version of Lucene highlighter.net API. I want to get a phrase : : highlighted only when ALL of its words are present in the search : : results..But,am not able to do sofor example, if my input search string : : is Leading telecom company, then the API only highlights telecom in the : : results if the result does not contain the words leading and company... 
: : : : Here is the code i'm using: : : : : SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter(); : : : : var appData = : : (string)AppDomain.CurrentDomain.GetData(DataDirectory); : : var folderpath = System.IO.Path.Combine(appData, MyFolder); : : : : indexReader = IndexReader.Open(folderpath); : : : : Highlighter highlighter = new Highlighter(htmlFormatter, new : : QueryScorer(finalQuery.Rewrite(indexReader))); : : : : : : highlighter.SetTextFragmenter(new SimpleFragmenter(800)); : : : : int maxNumFragmentsRequired = 5; : : : : string highlightedText = string.Empty; : : : : TokenStream tokenStream = this._analyzer.TokenStream(fieldName, : : new System.IO.StringReader(fieldText)); : : : : highlightedText = highlighter.GetBestFragments(tokenStream, : : fieldText, maxNumFragmentsRequired, ...); : : : : return highlightedText; : : : : -- : : View this message in context: http://www.nabble.com/Using-Highlighter-for-highlighting-Phrase-query-tp22560334p22560334.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Using MultiFieldQueryParser
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:59 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : Date: Tue, 17 Mar 2009 08:47:05 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using MultiFieldQueryParser : : : : : : Hi, : : : : Am working on a book search api using Lucene.User can search for a book : : whose title or description field contains C.F.A.. : : Am using Lucene's MultiFieldQueryParser..But after parsing, its removing the : : dots in the string. : : : : What am i missing here? : : : : Thanks. : : : : -- : : View this message in context: http://www.nabble.com/Using-MultiFieldQueryParser-tp22562134p22562134.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: move TrieRange* to core?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 16:51:05 -0700 (PDT) : : : I think we should move TrieRange* into core before 2.9? : : -0 : : I think we should try to move more things *out* of the core in 3.0 (as : i've mentioned in other threads) ... but i certainly understand the : arguments for going the other direction. : : : It's received a lot of attention, from both developers (Uwe and Yonik did : : lots of iterations, and Solr is folding it in) and user interest. : : it's a chicken/egg problem that we move things into the core because they : are very useful and we want to give them more visibility, but if we had : less things in the core and more things in contribs (query parser, spans, : standard analyzer, non-primitive Query impls, etc...) then contribs as a : whole would be more visible. ... I'm getting a sense of deja-vu, ah : yes, here it is ... : : http://www.nabble.com/Moving-SweetSpotSimilarity-out-of-contrib-to19267437.html#a19320894 : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Getting tokens from search results. Simple concept
: What I would LOVE is if I could do it in a standard Lucene search like I : mentioned earlier. : Hit.doc[0].getHitTokenList() :confused: : Something like this... The Query/Scorer APIs don't provide any mechanism for information like that to be conveyed back up the call chain -- mainly because it's more heavyweight than most people need. If you have custom Query/Scorer implementations, you can keep track of whatever state you want when executing a Query -- in fact the SpanQuery family of queries does keep track of exactly the type of info you seem to want, and after executing a query you can ask it for the Spans of any matching document -- the downside is a loss in query execution performance (because it takes time/memory to keep track of all the matches) -Hoss
Re: Use of Unicode data in Lucene
: I can implement the functionality just using the data tables from the Unicode : Consortium, including http://www.unicode.org/reports/tr39, but there's still : the issue of the Unicode data license and its compatibility with Apache 2.0. : : Does anybody know whether http://www.unicode.org/copyright.html creates an : issue? What's the process for vetting a license? Or is this something I should : be posting to a different list? The authoritative docs to be familiar with are... http://www.apache.org/legal/3party.html and http://www.apache.org/legal/resolved.html ..but it's not clear to me exactly where the Unicode copyright/licensing rules fall into the spectrum. The best place to ask questions about license compatibility issues is legal-disc...@apache (i'm pretty sure Ken already found that out since he posted there, just mentioning it for anyone else who might be interested) -Hoss
Re: Sorting and multi-term fields again
: TrieRange fields is needed), I again thought about the issue. Maybe we could : change FieldCache to only put the very first term from a field of the : document into the cache, enabling sorting against this field. If possible, : this would be very nice and in my opinion better than the idea proposed in : the issue. in the fairly common case of tokenized fields, the first term found during enumeration isn't necessarily (or even frequently) the first term in the pre-tokenized string ... so this doesn't help people very much. the recommended solution in the tokenized case is to have a duplicate non-tokenized field -- that seems like the best solution in the non-tokenized case as well (where the caller is consciously choosing to add multiple Field instances with the same fieldName to a Document)... pick which Field Value represents the value you want used during sorting, and add that value to the document using an alternate fieldName. I've never encountered any serious objection to this approach. -Hoss
Re: sort lucene results
: but i need the result by the word place in the sentence like this: : : bbb text 4 , text 2 bbb text , text 1 ok ok ok bbb .. 1) SpanFirstQuery should work, it scores higher the closer the nested query is to the start -- just use a really high limit. if you are only dealing with simple Term/Phrase queries it's easy to switch to using SpanTerm and SpanNear queries inside of a SpanFirstQuery. 2) Please Use java-u...@lucene Not java-...@lucene http://people.apache.org/~hossman/#java-dev Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. -Hoss
Re: Jukka's not on Who We Are yet
: Subject: Jukka's not on Who We Are yet : : Jukka's not on http://lucene.apache.org/java/docs/whoweare.html That list is specifically the Lucene-Java committers. Jukka is listed on the PMC list... http://lucene.apache.org/who.html -Hoss
Re: [jira] Commented: (LUCENE-1398) Add ReverseStringFilter
: I don't know how others feel, but I'd personally like to stop the : practice of making more Analyzer classes whenever a new TokenFilter is : added. +1 -Hoss
Re: LIA2 on l.a.o/java OK?
: I'm OK with LIA2 on the front page - as Erik suggests it does help lend : credibility to a project. +1 to more visibility to books focused on lucene on official www site pages (not just the wiki) +1 to prominent display via a section on the main page like wicket currently has, with links to more info on each book (those links could easily go to wiki pages if we don't want to have to maintain the full detail pages in forrest) .. the News section is too long/outdated anyway, so shortening it up to help make books more visible is a good thing. : So the test-case for this statement would be - what if there was a : terrible book published? I can't see it happening myself but you have to : ask if there is some inferred recommendation of quality on any links we : provide. site changes are commits, they (will) happen because someone submits a patch and someone (possibly the same person) commits. if no one thinks a book is worth promoting, no one will submit a patch. if someone does submit a patch, but the community consensus is that a book is bad and shouldn't be mentioned on the site, then the patch won't get committed (or will get rolled back if the consensus is retroactive). If each book links to a wiki page then people can be free to write whatever comments/opinions about books they want, even if there isn't a clear community consensus to withhold a book from the site. : It's the only book dedicated exclusively to Lucene that I'm aware of, and all of ... What about the JP and DE books listed on the Resource page? from what i can tell, they seem to be focused entirely on Lucene. (if the goal is to promote Lucene books to promote Lucene adoption we shouldn't be exclusive to English language books just because english is currently the LCD of the community) : Personal bias noted - I support putting it on the home page, and also news blurbs when there is activity, like when it goes to print and is available in hardcopy. 
(FWIW: i have no bias here, but i still concur) -Hoss
Re: IndexWriter.rollback() logic
: Also in the futuer please post your questions to java-dev@lucene.apache.org I believe jason meant to type java-u...@lucene... http://people.apache.org/~hossman/#java-dev Please Use java-u...@lucene Not java-...@lucene Your question is better suited for the java-u...@lucene mailing list ... not the java-...@lucene list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. Please resend your message to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. -Hoss
Re: failure in TestTrieRangeQuery
: By allowing Random to randomly seed itself, we effectively test a much : much larger space, ie every time we all run the test, it's different. We can : potentially cast a much larger net than a fixed seed. i guess i'm just in favor of less randomness and more iterations. : Fixing the bug is the easy part; discovering a bug is present is where : we need all the help we can get ;) yes, but knowing a bug is there w/o having any idea what it is or how to trigger it can be very frustrating. it would be enough for tests to pick a random number, log it, and then use it as the seed ... that way if you get a failure you at least know what seed was used and you can then hardcode it temporarily to reproduce/debug -Hoss
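The log-the-seed idea in that last paragraph can be sketched in plain Java (a minimal illustration using only java.util.Random; the class and method names here are made up, not part of Lucene's actual test infrastructure):

```java
import java.util.Arrays;
import java.util.Random;

public class SeedLoggingDemo {

    // Pick a seed and log it, so a failing run can be reproduced later
    // by hardcoding the printed value in place of this call.
    public static long newSeed() {
        long seed = System.nanoTime(); // any entropy source will do
        System.err.println("NOTE: random test seed = " + seed);
        return seed;
    }

    // A stand-in for "randomized test data": the same seed always
    // yields the same sequence, which is what makes failures debuggable.
    public static int[] randomInts(long seed, int n) {
        Random r = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextInt();
        }
        return out;
    }

    public static void main(String[] args) {
        long seed = newSeed();
        // re-running with the logged seed reproduces the exact same data
        System.out.println(Arrays.equals(
            randomInts(seed, 5), randomInts(seed, 5))); // true
    }
}
```

The point is simply that randomness and reproducibility aren't mutually exclusive: you keep the wide net of random seeding, but a failure report always carries enough information to replay it.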
RE: RE: Hudson Java Docs?
: I think, the outdated docs should be removed from the server to also : disappear from search engines. : : +1 that may be easier said than done. Each build is done in a clean workspace, and then a config option in hudson tells it what to copy to the main javadoc URL... http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/javadoc/ right now we've got that configured to be trunk/build/docs/api -- which is the right thing to do, and as you can see copies all of the correct stuff, but apparently hudson isn't cleaning up old files... https://hudson.dev.java.net/issues/show_bug.cgi?id=1000 A workaround would be the idea i remember someone suggesting earlier in this thread: create a splash page at trunk/build/docs/api/index.html that points to the other directories. (anyone want to crank out a patch for this?) Alternatively, we could turn off the Publish Javadoc feature, and instead add trunk/build/docs/api to the list of files to Archive and then start referring to a URL like this (doesn't work at the moment) for all the javadocs... http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/artifact/trunk/build/docs/api/ turning that Javadoc feature off should eliminate the existing Javadoc links in the hudson navigation, but I suspect the old files would still be there (and in search engine caches) -Hoss
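The splash-page workaround could be as simple as a hand-maintained file like the following (a sketch only; the title and the directory names in the links are guesses, not the actual javadoc layout):

```html
<!-- hypothetical trunk/build/docs/api/index.html splash page -->
<html>
<head><title>Lucene-trunk javadocs</title></head>
<body>
<h1>Lucene-trunk nightly javadocs</h1>
<ul>
  <!-- one link per javadoc directory; names below are examples -->
  <li><a href="core/index.html">core API</a></li>
  <li><a href="contrib-analyzers/index.html">contrib: analyzers</a></li>
  <li><a href="contrib-highlighter/index.html">contrib: highlighter</a></li>
</ul>
</body>
</html>
```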
Re: how often is site updated?
: Wiki is updated w/ the info. Basically, it runs nightly. If you want it done : more often, I can change it. doesn't matter to me ... just wasn't sure if there was a problem since i didn't know when to expect it. it all looks fine. -Hoss
how often is site updated?
According to this doc... http://wiki.apache.org/lucene-java/HowToUpdateTheWebsite ...Grant's crontab is used to update /www/lucene.apache.org/java/docs from... http://svn.apache.org/repos/asf/lucene/java/site/docs ...but the wiki page isn't very explicit about how often that cron script runs. I committed some changes a little over 3 hours ago, but i'm not seeing them on people.apache.org yet. Grant: can you add some clarification to the wiki page with the frequency of the cronjob? (and if it should have updated by now, check and see if there's a problem.) -Hoss
Re: stored fields / unicode compression
Catching up on my holiday email, I don't think there were any replies to this question yet. The low level file formats used by Lucene are an area I don't have time/expertise to follow carefully, but if i'm remembering correctly the consensus is/was to move more towards pure (byte[] data, int start, int end) based APIs for efficiency, with String based APIs provided as syntactic sugar via a facade, and deprecating the existing internal gzip compression in favor of similar external compression facades. So something like you describe could be done as-is using the byte[] interfaces *and* be generally useful to others. Taking a step back to look at the broader picture, this is the kind of thing that in Solr could be implemented as a new FieldType : Date: Fri, 26 Dec 2008 19:00:11 -0500 : From: Robert Muir : Subject: stored fields / unicode compression : : Has there been any thoughts of using SCSU or BOCU-1 instead of UTF-8 for : stored fields? : Personally I don't put huge amounts of text in stored fields but these : encodings/compression work extremely well on short strings like titles, etc. : Removing the unicode penalty for non-latin text (i.e. cut in half) is : nothing to sneeze at since with lots of docs my stored fields still become : pretty huge, biggest part of the index. : : I know I could use one of these schemes right now and store everything as : bytes... but just thinking it might be something of more general use. The : GZIP compression that is supported isn't very useful as it typically makes : short snippets bigger... : : Performance compared to UTF-8 is here... seems like a general win to me (but : maybe I am missing something) : http://unicode.org/notes/tn6/#Performance -Hoss
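Robert's aside that gzip "typically makes short snippets bigger" is easy to check with the stdlib (a standalone sketch; the class name is made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipShortString {

    // Compress a string with gzip and return the raw compressed bytes.
    public static byte[] gzip(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(s.getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory I/O, shouldn't happen
        }
    }

    public static void main(String[] args) {
        String title = "C.F.A."; // a short stored-field value
        int raw = title.getBytes(StandardCharsets.UTF_8).length;
        int packed = gzip(title).length;
        // gzip's fixed header/trailer overhead (~18 bytes) dwarfs the
        // payload, so the "compressed" form is larger than the original
        System.out.println(packed + " > " + raw + " : " + (packed > raw));
    }
}
```

Which is exactly why block/external compression facades (or encodings like SCSU/BOCU-1 that have no per-value overhead) are more attractive than per-field gzip for short stored values.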
Re: running ant test with multiple threads/processes?
: Has anyone explored ways to have ant test take advantage of concurrency? : Since each JUnit test source (TestXXX.java) is independent, this should be : possible. : I'd love to have ant test test-tag run faster on an N-core machine. I've seen some attempts at a generalized solution to this in the past, but none of them ever seemed to be successful. manually splitting tests up into buckets and running parallel junit tasks for each bucket tends to be the approach many projects take. in our case the first quick win might be to just add a new attribute to the contrib-crawl macro that says whether it can be parallelized or not, and then replace the sequential task with a parallel threadCount=... task (use a threadCount=1 for contrib-crawls that can't be parallelized) test-contrib and javadocs-contrib should be parallelizable, but build-contrib won't be (since some contribs depend on other contribs) that should help some ... but if you really want to parallelize test-core, we would need to hardcode some N junit calls each containing a fileset (although with some creativity we could probably dynamically divide the tests up into N filesets using things like the sort, first and restrict resource collections) -Hoss
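For reference, the parallel-task idea might look roughly like this in a build file (a rough sketch, not the actual contrib-crawl macro; the contrib directory names and the threadCount value are arbitrary):

```xml
<!-- sketch: a parallelizable contrib-crawl replaces <sequential>
     with ant's <parallel> task; threadCount caps concurrency -->
<parallel threadCount="4">
  <ant dir="contrib/analyzers"    target="test"/>
  <ant dir="contrib/highlighter"  target="test"/>
  <ant dir="contrib/spellchecker" target="test"/>
</parallel>
<!-- non-parallelizable crawls (e.g. build-contrib, where contribs
     depend on each other) would keep threadCount="1" or <sequential> -->
```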
ANNOUNCE: Welcome Ryan McKinley as Contrib/Documentation Committer
I'm happy to announce that in recognition of his efforts in moving forward with creating a spatial searching contrib (and his ongoing experience as both a Solr committer and PMC member), the PMC has voted to make Ryan McKinley a Lucene-Java Contrib and Documentation committer. Congrats Ryan, please make sure to add yourself to the contrib committers list. -Hoss
Re: Searching in same position across multiple fields
: 1) Use a modified SpanNearQuery. If we assume that country + phone will always : be one token, we can rely on the fact that the positions of 'au' and '5678' in : Fred's document will be different. : :SpanQuery q1 = new SpanTermQuery(new Term(addresscountry, au)); :SpanQuery q2 = new SpanTermQuery(new Term(addressphone, 5678)); :SpanQuery snq = new SpanNearQuery(new SpanQuery[]{q1, q2}, 0, false); : : the slop of 0 means that we'll only return those where the two terms are in : the same position in their respective fields. This works brilliantly, BUT : requires a change to SpanNearQuery's constructor (which checks that all the : clauses are against the same field). Are people amenable to perhaps adding : another constructor to SNQ which doesn't do the check, or subclassing it to do : the same (give it a protected non-checking constructor for the subclass to : call)? this has actually come up a couple of times over the years (i think Doug was the first person i ever heard suggest it) in the context of PhraseQuery ... the initial thought was that just removing the term1.field=term2.field assertion would allow something like this to work, but i don't think anyone ever tried creating a patch w/tests to verify it. I think it would be a great idea. : 2) It gets slightly more complicated in the case of variable-length terms. For ... : getPositionIncrementGap -- if we knew that 'address' would be, at most, 20 : tokens, we might use a position increment gap of 100, and make the slop factor : 50; this works fine for the simple case (yay!), but with a great many : addresses-per-user starts to get more complicated, as the gap counts from the : last term (so the position sequence for a single value field might be 0, 100, : 200, but for the address field it might be 0, 1, 2, 3, 103, 104, 105, 106, : 206, 207... so it's going to get out of sync). 
: The simplest option here seems ... couldn't this be solved by an Analyzer that counts the tokens per fieldname and implements getPositionIncrementGap as.. int result = SOME_BIG_NUM - tokensSeenMap.get(fieldname); tokensSeenMap.put(fieldname, 0); return result; ? -Hoss
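Spelled out as a standalone class, that token-counting state might look like this (plain Java only, not an actual Lucene Analyzer; SOME_BIG_NUM and the method names just follow the sketch in the email):

```java
import java.util.HashMap;
import java.util.Map;

public class GapCountingState {

    static final int SOME_BIG_NUM = 100;

    // tokens emitted for each field since the last value boundary
    private final Map<String, Integer> tokensSeen = new HashMap<>();

    // Call once per token the analyzer emits for fieldName.
    public void tokenSeen(String fieldName) {
        tokensSeen.merge(fieldName, 1, Integer::sum);
    }

    // Gap = SOME_BIG_NUM minus the tokens already seen, then reset,
    // so every value starts near the next multiple of SOME_BIG_NUM
    // regardless of how many tokens the previous value produced.
    public int getPositionIncrementGap(String fieldName) {
        int seen = tokensSeen.getOrDefault(fieldName, 0);
        tokensSeen.put(fieldName, 0);
        return SOME_BIG_NUM - seen;
    }

    public static void main(String[] args) {
        GapCountingState s = new GapCountingState();
        // first address value produced 4 tokens (positions 0..3)
        for (int i = 0; i < 4; i++) s.tokenSeen("address");
        System.out.println(s.getPositionIncrementGap("address")); // 96
        // with no tokens seen since the reset, the full gap is returned
        System.out.println(s.getPositionIncrementGap("address")); // 100
    }
}
```

This keeps the multi-valued position sequence aligned (0..3, then the next value around 100, the next around 200, ...) instead of drifting with the token count of each value, which is exactly the out-of-sync problem described above.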