from:"Otis Gospodnetic"

Re: updating jakarta site

2005-02-28 Thread Otis Gospodnetic

I recall somebody mentioning jakarta-site2 being locked... and moved to
SVN, if I recall correctly.  Should I be checking out 
http://svn.apache.org/repos/asf/jakarta/site/ and using that to
generate the new Lucene site docs, or the old CVS version of
jakarta-site2?

Thanks,
Otis


--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> I'd like to help.  So are we still in jakarta-site2 land, Toto?
> 
> Otis
> 
> --- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> 
> > I'm gun-shy about forging ahead without community consensus, based
> > on me pushing too fast earlier.  I'm +1 on forging ahead with all
> > of what Henri brings up.
> > 
> > Doug?  Others?
> > 
> > As Henri mentions, we need a mail and download page added to the
> > Lucene website since this will be removed from the Jakarta side of
> > things.  Look at ant.apache.org for an example.  Volunteers to do
> > this?
> > 
> > Erik
> > 
> > 
> > On Feb 27, 2005, at 1:37 AM, Henri Yandell wrote:
> > 
> > > Would anyone mind me changing jakarta.apache.org to switch Lucene
> > > to TLP there?
> > >
> > > Mainly this would involve:
> > >
> > > Addition of news item concerning promotion
> > > Movement of Lucene from Subprojects to Ex-Jakarta
> > > Movement of Lucene on [EMAIL PROTECTED] page
> > > Removal of Lucene from front-page table (main content)
> > > Removal of Lucene from CVS/SVN page
> > > Removal of Lucene from Download pages
> > > Removal of Lucene from FAQ page (amongst other cleanup)
> > > Modification of http://wiki.apache.org/general/FrontPage
> > > Redirect of jakarta.apache.org/lucene to 
> > > lucene.apache.org/java/docs/index.html
> > >
> > > When you have a separate mail page, we can then:
> > >
> > > Strike-through of Lucene from mail-index
> > >
> > > Obvious other changes to the Lucene site that I assume you're
> > > planning are:
> > >
> > > Separate mail page
> > > Change Jakarta logo in top left to Apache
> > > Remove paragraph concerning Jakarta on front page
> > > Remove 'Bugs' link
> > > Change welcome to say Apache where it says Apache Jakarta.
> > >
> > > Hen
> > >
> > >
> >
> -
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail:
> [EMAIL PROTECTED]

Re: updating jakarta site

2005-02-28 Thread Otis Gospodnetic

I'd like to help.  So are we still in jakarta-site2 land, Toto?

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> I'm gun-shy about forging ahead without community consensus, based on
> me pushing too fast earlier.  I'm +1 on forging ahead with all of what
> Henri brings up.
> 
> Doug?  Others?
> 
> As Henri mentions, we need a mail and download page added to the
> Lucene website since this will be removed from the Jakarta side of
> things.  Look at ant.apache.org for an example.  Volunteers to do this?
> 
>   Erik
> 
> 
> On Feb 27, 2005, at 1:37 AM, Henri Yandell wrote:
> 
> > Would anyone mind me changing jakarta.apache.org to switch Lucene
> > to TLP there?
> >
> > Mainly this would involve:
> >
> > Addition of news item concerning promotion
> > Movement of Lucene from Subprojects to Ex-Jakarta
> > Movement of Lucene on [EMAIL PROTECTED] page
> > Removal of Lucene from front-page table (main content)
> > Removal of Lucene from CVS/SVN page
> > Removal of Lucene from Download pages
> > Removal of Lucene from FAQ page (amongst other cleanup)
> > Modification of http://wiki.apache.org/general/FrontPage
> > Redirect of jakarta.apache.org/lucene to 
> > lucene.apache.org/java/docs/index.html
> >
> > When you have a separate mail page, we can then:
> >
> > Strike-through of Lucene from mail-index
> >
> > Obvious other changes to the Lucene site that I assume you're
> > planning are:
> >
> > Separate mail page
> > Change Jakarta logo in top left to Apache
> > Remove paragraph concerning Jakarta on front page
> > Remove 'Bugs' link
> > Change welcome to say Apache where it says Apache Jakarta.
> >
> > Hen
> >
> >

Re: SVN error

2005-02-26 Thread Otis Gospodnetic

Thanks Garrett & Erik, the svn switch relocate worked flawlessly.  Back
to flippin' crepes...

Otis

--- Garrett Rooney <[EMAIL PROTECTED]> wrote:

> Erik Hatcher wrote:
> > You have to use https for commits.  Perhaps re-checkout completely
> > with https first, though an "svn switch" might do the trick also.
> 
> If you're in your working copy you can run the command
> 
> $ svn switch --relocate http:// https://
> 
> That'll switch your working copy from http to https.  Note that when
> you're doing a switch between branches in a repository (the usual
> use) you don't want the --relocate flag; that's only used when the
> base URL of the repository has changed for some reason, like when
> switching between http and https.
> 
> -garrett
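
Garrett's advice above amounts to changing only the scheme of the repository root while leaving host and path alone. A minimal Python sketch of that step follows. The helper name `relocate_command` is hypothetical, and the root URL `http://svn.apache.org/repos/asf` is an assumption inferred from the 403 error reported in this thread. The sketch only builds the argument list; it does not run svn.

```python
from urllib.parse import urlsplit, urlunsplit


def relocate_command(repo_root):
    """Build the `svn switch --relocate` arguments for an http -> https move.

    `relocate_command` is an illustrative helper, not part of Subversion.
    """
    parts = urlsplit(repo_root)
    if parts.scheme != "http":
        raise ValueError("expected an http:// repository root: " + repo_root)
    # Only the scheme changes; host and path stay identical, which is the
    # situation --relocate is meant for (not a switch between branches).
    https_root = urlunsplit(parts._replace(scheme="https"))
    return ["svn", "switch", "--relocate", repo_root, https_root]


# Assumed repository root, taken from the error message in this thread.
command = relocate_command("http://svn.apache.org/repos/asf")
print(" ".join(command))
```

The resulting list could be handed to `subprocess.run(command, cwd=working_copy, check=True)`, where `working_copy` is a placeholder for the checkout directory. Later Subversion releases also offer a dedicated `svn relocate` subcommand for the same job.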

SVN error

2005-02-26 Thread Otis Gospodnetic

Hello,

Just tried committing Paul's Javadoc changes and got this error:

svn: Commit failed (details follow):
svn: MKACTIVITY of '/repos/asf/!svn/act/edc733fe-07f1-0310-9206-f01fb80005d4': 403 Forbidden (http://svn.apache.org)
svn: Your commit message was left in a temporary file:
svn:    '/home/otis/dev/repos/lucene/java/trunk/src/java/org/apache/lucene/svn-commit.tmp'


I haven't made any commits since the move to SVN, so I'm not sure if
this is an error on my end or on the ASF end.  I did ssh into
svn.apache.org where I used svnpasswd to set my SVN password.  Is there
something else I need to do?

Thanks,
Otis



RE: Incubating Lucene.Net

2005-02-23 Thread Otis Gospodnetic

Yes, that looks like the one to fill out and fax.

Otis

--- George Aroush <[EMAIL PROTECTED]> wrote:

> Hi Erik,
> 
> I don't have a CLA.  Is this the one:
> http://www.apache.org/licenses/icla.txt ?  I will read it through and
> fax it in the next day or so.
> 
> Regards,
> 
> -- George 
> 
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, February 23, 2005 2:04 PM
> To: Lucene Developers List
> Subject: Re: Incubating Lucene.Net
> 
> 
> On Feb 23, 2005, at 10:55 AM, George Aroush wrote:
> 
> > Hi folks,
> > 1) Have all the required votes come in?  Are we ready for the next
> > step?  Is there anything more that I have to do?
> 
> We're done with the votes and ready to move on.  Sorry I let that
> slip.  lucene4c is at least "in progress" in the incubator now - it
> is now waiting on some infrastructure work to get the repository and
> access set up.
> 
> George - do you have a CLA on file with Apache?  If not, that would
> be a necessary next step to get you as a committer on the incubator
> repository.
> 
> > 2) One outstanding subject to vote/agree on is the package name.
> > Will it be dotLucene or Lucene.Net?  My pick is Lucene.Net
> 
> Lucene.Net I believe was the consensus.
> 
>   Erik
> 
> 

Re: About the license in code...

2005-02-17 Thread Otis Gospodnetic

Mario,

I don't know whether there are some legal requirements that dictate
where the license should go.  Apache projects typically include the
license in the code.

On a somewhat related note - if you would like your Lucene port to
Delphi to join the Lucene project in the future, please keep in mind
that the Apache Software Foundation will accept only projects released
under the ASF license.

Otis

--- "Mario Alejandro M." <[EMAIL PROTECTED]> wrote:

> I'm porting Lucene to Delphi, based on DotLucene. I have set up the
> project at http://sourceforge.net/projects/mutis/.
> 
> I don't fully understand what I can do about the license, or what the
> limitations are.
> 
> Also, I want to know if I can do this in the code:
> 
> unit PhraseScorer;
> 
> //Read the license in License.txt
> 
> and not put the whole license in each unit in the project.
> -- 
> Mario Alejandro Montoya
> MCP
> www.solucionesvulcano.com
> !Obtenga su sitio Web dinámico!
> 

Invitations for Plucene, CLucene, PyLucene

2005-02-17 Thread Otis Gospodnetic

Hello,

Just wanted to let you know that I sent email to Plucene, CLucene, and
PyLucene developers and invited them to follow the steps of dotLucene
and Lucene4C and join Lucene at ASF.

Hopefully we'll see their emails on this list soon.  I will also email
Lupy developers and see if they are still interested.

Otis



RE: Incubating Lucene.Net

2005-02-17 Thread Otis Gospodnetic

I prefer dotLucene, because it will be less confusing for people new to
the project.  In Lucene in Action I had to explicitly mention a dead
Lucene.NET project on SourceForge, so readers wouldn't mix it with the
other one called... ah, see, I don't know which one was dead and
which one was alive.  Doesn't matter, they are both dead.  Anyhow,
dotLucene sounds better to me for this reason.

Otis

> Any thoughts on Lucene.Net/dotLucene package name are welcome.
> 
> Regards,
> 
> -- George 



Re: [VOTE] Re: Incubating Lucene.Net

2005-02-17 Thread Otis Gospodnetic

+1

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Lucene.Net has my +1.
> 
> Other PMC members please cast your vote also.
> 
> As for Garrett's concerns, it is my understanding that dotLucene is
> not based on the previous Lucene.NET codebase.  Though George mentions
> Lookout, Beagle, and some other projects - are these projects using
> the dotLucene codebase?  I thought that Lookout used the previous
> Lucene.NET project.
> 
> George - could you clarify the lineage of your project and list what
> projects are using it specifically?  Also, perhaps we should stick
> with calling this dotLucene for now to avoid confusion with the other
> codebase.
> 
>   Erik
> 
> On Feb 17, 2005, at 11:14 AM, George Aroush wrote:
> 
> > Proposal for new project Lucene.Net (aka dotLucene)
> >
> > George Aroush -- [EMAIL PROTECTED]
> >
> >
> >
> > (0) rationale
> >
> > Lucene.Net (aka dotLucene) is a source code port of Jakarta Lucene
> > from Java to C#.  The port is a one-to-one port of Lucene's high and
> > low level APIs, public and internal APIs, and the underlying
> > algorithms of Lucene as well as the index format.  Every Java file
> > released with Jakarta Lucene is ported to Lucene.Net C#.  In
> > addition, any index file generated with Lucene.Net is 100% cross
> > compatible with Jakarta Lucene and vice versa.  Finally, Lucene.Net
> > preserves the look-and-feel of C#'s naming convention for packages,
> > classes, methods and documentation.
> >
> > Lucene.Net 1.4.3 is currently a six-month-old open source project,
> > and is now hosted at SourceForge.net and is backed by its own
> > non-profit organization.  Since Lucene.Net is already based on
> > Jakarta Lucene and thus uses the Apache 2.0 license, it is therefore
> > an appropriate candidate to be moved to the Apache foundation.
> >
> > I anticipate that Lucene.Net will join the recently proposed
> > search.apache.org top-level project, with Lucene and its various
> > ports.
> >
> > (0.1) criteria
> >
> > Community:
> >
> > Lucene.Net has an established user community.  However, the
> > development community currently consists primarily of George Aroush,
> > the submitter of this proposal.
> >
> > Core Developers:
> >
> > Currently, Lucene.Net has one active committer, George Aroush.
> >
> > Alignment:
> >
> > Lucene.Net currently uses Visual Studio.Net 2003.  In addition, it
> > is being used by Mono.
> >
> > (0.2) warning signs
> >
> > Orphaned products:
> >
> > Lucene.Net is not an orphan.
> >
> > Inexperience with open source:
> >
> > Lucene.Net's committers are experienced with open source.
> >
> > Homogenous developers:
> >
> > Lucene.Net's committers do not all share an employer or nation. All
> > decisions are made openly on public mailing lists.
> >
> > Reliance on salaried developers:
> >
> > Lucene.Net has no salaried developers.
> >
> > No ties to other Apache products:
> >
> > Lucene.Net has strong ties to Lucene.
> >
> > A fascination with the Apache brand:
> >
> > Lucene.Net has a strong brand already.  It has followers and
> > projects based on it such as Lookout, .Text, Beagle and Ascirum.
> >
> > (1) scope of the subprojects
> >
> > All code is currently licensed under the same license as Jakarta
> > Lucene, which is the Apache 2.0 license.  I have not yet signed the
> > Contributor License Agreements but I look forward to it.
> >
> > (3) identify the ASF resources to be created
> >
> > (3.1) mailing list(s)
> >
> > Same as Jakarta Lucene
> >
> > (3.2) Subversion or CVS repositories
> >
> > TBD
> >
> > (3.3) Jira
> >
> > TBD
> >
> > (4) identify the initial set of committers
> >
> > Same as Jakarta Lucene.
> >
> > (5) identify apache sponsoring individual
> >
> > Erik Hatcher, Doug Cutting, and Otis Gospodnetic.
> >
> >
> >

Re: [VOTE] Incubate lucene4c?

2005-02-17 Thread Otis Gospodnetic

+1

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> The Incubator requires the Lucene PMC vote on whether to accept the 
> lucene4c codebase.
> 
> +1 from me.
> 
> Other Lucene PMC members - please cast your vote on this thread.
> 
>   Erik
> 
> 
> Begin forwarded message:
> 
> > From: "Cliff Schmidt" <[EMAIL PROTECTED]>
> > Date: February 17, 2005 5:12:36 AM EST
> > To: 
> > Subject: RE: [PROPOSAL] Lucene4c
> > Reply-To: general@incubator.apache.org
> >
> > Garrett,
> >
> > You're right that all new code bases should come through the
> > Incubator.  However, the appropriate PMC to vote on whether it
> > should be accepted into the incubator is the sponsoring PMC, which
> > in this case appears to be the Lucene PMC.  (The Incubator PMC does
> > sponsor some projects, usually the ones that are expected to
> > eventually be their own TLP.)
> >
> > The Incubator PMC is responsible for supporting what comes in and
> > voting whether it is ready to graduate.  So, once the Lucene PMC
> > votes to incubate it, the Incubator PMC will help you figure out
> > exactly what needs to be done and will then vote on graduation.
> >
> > Hope that helps.
> >
> > Cliff
> >
> > On Monday, February 14, 2005 10:58 AM, Erik Hatcher wrote:
> >
> >> On Feb 14, 2005, at 12:36 PM, Geir Magnusson Jr wrote:
> >>>
> >>> On Feb 14, 2005, at 10:23 AM, Jim Jagielski wrote:
> >>>
> >>>> All donated code should really go through the Incubator, even if
> >>>> only to do the required IP checklist.
> >>>
> >>> Right.
> >>>
> >>> But my question is why doesn't this go through the Lucene project?
> >>> The Lucene PMC could bring the codebase into their project and
> >>> register the IP stuff here w/ the incubator.
> >>
> >> Is there some precedent for this?  I'm not sure what is meant by
> >> "register the IP stuff here".  Could you elaborate on what this
> >> entails.
> >>
> >> I'd gladly bring the codebase into Lucene's repository if that is
> >> the consensus.  It was created entirely by Garrett and he's agreed
> >> to donate it, so the IP should be pretty clear cut.
> >>
> >>Erik
> >>
> >>
> >>>
> >>> I hope I'm just misunderstanding, but this appears to be a
> >>> proposal to create a new project at the ASF called "Lucene4c"
> >>>
> >>> geir
> >>>
> 
>  On Feb 14, 2005, at 8:59 AM, Erik Hatcher wrote:
> 
> > I presume this codebase is substantial enough that it requires
> > incubation?  Or because it was a single developer, could he
> > contribute it directly to the Lucene project and bypass
> > incubation?
> >
> > Erik
> >
> >
> > On Feb 14, 2005, at 8:35 AM, Garrett Rooney wrote:
> >
> >> I'd like to propose the Lucene4c project for incubation.
> >>
> >> Lucene4c is a port of the Lucene search engine from Java to C,
> >> using the Apache Portable Runtime library for portability.
> >> The project is far from complete, and code to date is
> >> primarily concerned with reading an existing Lucene index,
> >> which must be created with another Lucene implementation
> >> (currently only Java Lucene has been tested).  The plan is to
> >> complete support for the rest of the index format and then
> >> move on to implementing search functionality (beyond the
> >> current proof of concept code anyway). Once we've reached
> >> that point work will begin on actual indexing functionality
> >> so that Lucene4c can stand alone, without the use of another
> >> Lucene implementation for bootstrapping.
> >>
> >> The project would be part of the new Lucene top level project,
> >> and Erik Hatcher has offered to serve as a sponsor.
> >>
> >> While I have yet to expand the community of developers
> >> further than myself, I am anxious to do so, and I expect to
> >> be able to draw both from people as of yet unassociated with
> >> Lucene who have expressed interest in such a project and from
> >> existing Lucene developers who have expressed interest in
> >> establishing cross-language compatibility tests for the
> >> various Lucene ports.
> >>
> >> Lucene4c already has ties to existing ASF projects,
> >> particularly Lucene itself and APR.  Bringing it into the ASF
> >> would only strengthen those ties.
> >>
> >> More details, including where to get the current release or
> >> development versions of the code can be found at the Lucene4c
> >> web site at http://electricjellyfish.net/garrett/lucene4c/
> >>
> >> -garrett
> >>
> >>

Re: lucene.apache.org

2005-02-14 Thread Otis Gospodnetic

I'm with Garrett.  I think we do need a top level dev@ list for
discussion of cross-port issues, like index format and compatibility,
etc.  We also need per-port and per-app lists.

Otis


--- Garrett Rooney <[EMAIL PROTECTED]> wrote:

> Doug Cutting wrote:
> > Erik Hatcher wrote:
> > 
> >> I've amended my request for e-mail lists here with Doug's
> >> preference:
> >>
> >> http://issues.apache.org/jira/browse/INFRA-195
> > 
> > 
> > Do others agree this is the best approach?  I don't mean to be
> > autocratic.  Do we imagine different pools of users and developers
> > for different Lucene sub-projects, or one big pool for all of them?
> > I assume they'll be mostly disjoint.
> 
> Personally I'm in favor of separate lists for subprojects, with a
> separate list for the top level for discussing things like changing
> file formats and other stuff that more than one port would need to be
> concerned about.  I didn't bring it up before since I'm the new kid
> on the block and I didn't want to seem presumptuous.
> 
> -garrett
> 

Re: [ANNOUNCE] lucene4c 0.02

2005-02-13 Thread Otis Gospodnetic

I think Garrett will have to push lucene4c through the Incubator to get
to Lucene, as you've already discussed.

I was going to wait a bit with inviting various Lucene ports to Lucene
until we have the mailing lists set up and at lucene.apache.org, to
make things a bit more tangible for people.

I'll see how it goes with Garrett and Lucene, so I can then point
people to look at lucene4c as the guiding example.

Otis


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Garrett,
> 
> Now that we have Subversion, are you interested in us creating an
> area for lucene4c at Apache?
> 
> Does your code warrant incubation?  Or could it go right into
> Lucene's repository?
> 
>   Erik
> 
> 
> On Feb 13, 2005, at 11:14 AM, Garrett Rooney wrote:
> 
> > I'm happy to announce the release of version 0.02 of Lucene4c, a
> > port of Lucene to the C language using the Apache Portable Runtime.
> >
> > The primary new feature in this release is support for compressed
> > file stream directories, and along with that comes a directory
> > abstraction similar to that found in Java Lucene and a large number
> > of bug fixes.
> >
> > This release also brings with it an actual web site, where this and
> > previous versions of the code can be downloaded.
> >
> > http://electricjellyfish.net/garrett/lucene4c/
> >
> > Now that compressed file streams are supported I will likely turn
> > my attention to writing code to read the remaining parts of the
> > index format, and then continue onward towards actual useful
> > searching functionality.
> >
> > Again, any feedback, questions, comments, or patches implementing 
> > parts of the missing functionality should be directed to me ;-)
> >
> > -garrett
> >
> >

Re: SearchBean?

2005-02-07 Thread Otis Gospodnetic

I never used SearchBean myself, but I believe people used it just for
sorting before Tim Jones added sorting to the core.  I haven't heard
anyone asking any SearchBean question since then, so I think it can
go...

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Is the SearchBean code in the Sandbox still useful now that we have 
> sorting in Lucene 1.4?  If so, what does it offer that the core does 
> not provide now?
> 
> As I'm cleaning up the sandbox and migrating it to a "contrib" area,
> I'm evaluating the pieces to decide whether each one makes sense to
> keep, is no longer useful, or should be reorganized in some way.
> 
>   Erik
> 
> 

Re: Study Group (WAS Re: Normalized Scoring)

2005-02-07 Thread Otis Gospodnetic

I think I see what you are after.  I'm after the same knowledge. :)

The only things that I can recommend are books:
  Modern Information Retrieval
  Managing Gigabytes

And online resources like:
  http://finance.groups.yahoo.com/group/mg/ (note the weird host name)
  http://www.sims.berkeley.edu/~hearst/irbook/

There is a pile of stuff in Citeseer, but those papers never really dig
into the details and always require solid previous knowledge of the
field.  They are no replacement for a class or a textbook.

If you find a good forum for IR, please share.

Otis


--- Kelvin Tan <[EMAIL PROTECTED]> wrote:

> Wouldn't it be great if we could form a study group of Lucene folks who
> want to take the "next step"? I feel uneasy posting non-Lucene-specific
> questions to dev or user even if it's related to IR.
> 
> Feels to me like there could be a couple like us, who didn't do a
> dissertation in IR, but would like more in-depth knowledge for
> practical purposes. Basically, the end result is that we are able to
> tune or extend Lucene by using the Expert API (classes marked as
> Expert). Perhaps a possible outcome is a tuning tutorial for advanced
> users who already know how to use Lucene.
> 
> What do you think?
> 
> k
> 
> On Sat, 5 Feb 2005 22:10:26 -0800 (PST), Otis Gospodnetic wrote:
> > Exactly.  Luckily, since then I've learned a bit from lucene-dev
> > discussions and side IR readings, so some of the topics are making
> > more sense now.
> >
> > Otis
> >
> > --- Kelvin Tan <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Otis, I was re-reading this whole theoretical thread about
> >> idf, scoring, normalization, etc from last Oct and couldn't help
> >> laughing out loud when I read your post, coz it summed up what I
> >> was thinking the whole time. I think it's really great to have
> >> people like Chuck and Paul (Eshlot) around. I'm learning so much.
> >>
> >> k
> >>
> >> On Thu, 21 Oct 2004 10:05:51 -0700 (PDT), Otis Gospodnetic wrote:
> >>
> >>> Hi Chuck,
> >>>
> >>> The relative lack of responses is not because there is no
> >>> interest, but probably because there are only a few people on
> >>> lucene-dev who can follow/understand every detail of your
> >>> proposal.  I understand and hear you, but I have a hard time
> >>> 'visualizing' some of the formulas in your proposal.  What you
> >>> are saying sounds right to me, but I don't have enough
> >>> theoretical knowledge to go one way or the other.
> >>>
> >>> Otis
> >>>
> >>>
> >>> --- Chuck Williams <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Hi everybody,
> >>>>
> >>>> Although there doesn't seem to be much interest in this I
> >>>> have one significant improvement to the below and a couple
> >>>> observations that clarify the situation.
> >>>>
> >>>> To illustrate the problem that better normalization is intended
> >>>> to address: in my current application, for BooleanQuery's of two
> >>>> terms, I frequently get a top score of 1.0 when only one of the
> >>>> terms is matched.  1.0 should indicate a "perfect match".  I'd
> >>>> like to set my UI up to present the results differently depending
> >>>> on how good the different results are (e.g., showing a visual
> >>>> indication of result quality, dropping the really bad results
> >>>> entirely, and segregating the good results from other only
> >>>> vaguely relevant results).  To build this kind of "intelligence"
> >>>> into the UI, I certainly need to know whether my top result
> >>>> matched all query terms, or only half of them.  I'd like to have
> >>>> the score tell me directly how good the matches are.  The current
> >>>> normalization does not achieve this.
> >>>>
> >>>> The intent of a new normalization scheme is to preserve the
> >>>> current scoring in the following sense:  the ratio of the nth
> >>>> result's score to the best result's score remains the same.
> >>>> Therefore, the only question is what normalization factor to
> >>>> apply to all scores.  This reduces to a very speci

Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2005-02-05 Thread Otis Gospodnetic

Exactly.  Luckily, since then I've learned a bit from lucene-dev
discussions and side IR readings, so some of the topics are making more
sense now.

Otis

--- Kelvin Tan <[EMAIL PROTECTED]> wrote:

> Hi Otis, I was re-reading this whole theoretical thread about idf,
> scoring, normalization, etc from last Oct and couldn't help laughing
> out loud when I read your post, coz it summed up what I was thinking
> the whole time. I think it's really great to have people like Chuck
> and Paul (Eshlot) around. I'm learning so much.
> 
> k
> 
> On Thu, 21 Oct 2004 10:05:51 -0700 (PDT), Otis Gospodnetic wrote:
> > Hi Chuck,
> >
> > The relative lack of responses is not because there is no interest,
> > but probably because there are only a few people on lucene-dev who
> > can follow/understand every detail of your proposal.  I understand
> > and hear you, but I have a hard time 'visualizing' some of the
> > formulas in your proposal.  What you are saying sounds right to me,
> > but I don't have enough theoretical knowledge to go one way or the
> > other.
> >
> > Otis
> >
> >
> > --- Chuck Williams <[EMAIL PROTECTED]> wrote:
> >
> >> Hi everybody,
> >>
> >> Although there doesn't seem to be much interest in this I have
> >> one significant improvement to the below and a couple
> >> observations that clarify the situation.
> >>
> >> To illustrate the problem that better normalization is intended
> >> to address: in my current application, for BooleanQuery's of two
> >> terms, I frequently get a top score of 1.0 when only one of the
> >> terms is matched.  1.0 should indicate a "perfect match".  I'd
> >> like to set my UI up to present the results differently depending
> >> on how good the different results are (e.g., showing a visual
> >> indication of result quality, dropping the really bad results
> >> entirely, and segregating the good results from other only
> >> vaguely relevant results).  To build this kind of "intelligence"
> >> into the UI, I certainly need to know whether my top result
> >> matched all query terms, or only half of them.  I'd like to have
> >> the score tell me directly how good the matches are.  The current
> >> normalization does not achieve this.
> >>
> >> The intent of a new normalization scheme is to preserve the
> >> current scoring in the following sense:  the ratio of the nth
> >> result's score to the best result's score remains the same.
> >> Therefore, the only question is what normalization factor to
> >> apply to all scores.  This reduces to a very specific question
> >> that determines the entire normalization -- what should the score
> >> of the best matching result be?
> >>
> >> The mechanism below has this property, i.e. it keeps the current
> >> score ratios, except that I removed one idf term for reasons
> >> covered earlier (this better reflects the current empirically best
> >> scoring algorithms).  However, removing an idf is a completely
> >> separate issue.  The improved normalization is independent of
> >> whether or not that change is made.
> >>
> >> For the central question of what the top score should be, upon
> >> reflection, I don't like the definition below.  It defined the
> >> top score as (approximately) the percentage of query terms matched
> >> by the top scoring result.  A better conceptual definition is to
> >> use a weighted average based on the boosts.  I.e., downward
> >> propagate all boosts to the underlying terms (or phrases).
> >> Specifically, the "net boost" of a term is the product of the
> >> direct boost of the term and all boosts applied to encompassing
> >> clauses.  Then the score of the top result becomes the sum of the
> >> net boosts of its matching terms divided by the sum of the net
> >> boosts of all query terms.
> >>
> >> This definition is a refinement of the original proposal below,
> >> and it
> >> could probably be further refined if some impact of the tf, idf
> >> and/or
> >> lengthNorm was desired in determining the top score.  These other
> >> factors seem to be harder to normalize for, although I've
> >> thought of some simple a
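
A minimal sketch of the net-boost definition quoted above, in Java. The Node type and the term()/clause() helpers are hypothetical stand-ins for illustration, not Lucene classes, and the sample query and boosts are made up. It pushes boosts down to the terms, then divides the net boosts of the matched terms by the net boosts of all query terms:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these types are Lucene classes. */
class NetBoostSketch {

    /** Simplified query node: a leaf term, or a clause grouping child nodes. */
    static final class Node {
        final String term;      // non-null only for leaf terms
        final double boost;     // direct boost on this term or clause
        final List<Node> children = new ArrayList<>();

        Node(String term, double boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    static Node term(String text, double boost) {
        return new Node(text, boost);
    }

    static Node clause(double boost, Node... children) {
        Node n = new Node(null, boost);
        n.children.addAll(Arrays.asList(children));
        return n;
    }

    /** Net boost of each term: its own boost times the boosts of all enclosing clauses. */
    static Map<String, Double> netBoosts(Node root) {
        Map<String, Double> out = new LinkedHashMap<>();
        collect(root, 1.0, out);
        return out;
    }

    private static void collect(Node n, double inherited, Map<String, Double> out) {
        double net = inherited * n.boost;
        if (n.term != null) {
            out.merge(n.term, net, Double::sum);  // a repeated term accumulates
        } else {
            for (Node child : n.children) {
                collect(child, net, out);
            }
        }
    }

    /** Proposed top score: net boosts of matched terms over net boosts of all query terms. */
    static double topScore(Node root, Set<String> matchedTerms) {
        double matched = 0.0;
        double total = 0.0;
        for (Map.Entry<String, Double> e : netBoosts(root).entrySet()) {
            total += e.getValue();
            if (matchedTerms.contains(e.getKey())) {
                matched += e.getValue();
            }
        }
        return total == 0.0 ? 0.0 : matched / total;
    }

    public static void main(String[] args) {
        // (lucene scoring^1.5)^2 idf
        Node query = clause(1.0,
                clause(2.0, term("lucene", 1.0), term("scoring", 1.5)),
                term("idf", 1.0));
        Set<String> matched = new HashSet<>(Arrays.asList("lucene", "idf"));
        System.out.println(netBoosts(query));
        System.out.println(topScore(query, matched));
    }
}
```

For the sample query the net boosts are lucene=2.0, scoring=3.0 and idf=1.0, so a top result matching only "lucene" and "idf" would score (2.0 + 1.0) / (2.0 + 3.0 + 1.0) = 0.5. The tf, idf and lengthNorm factors mentioned in the quote are deliberately left out of this sketch.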

Re: whither sandbox

2005-02-04 Thread Otis Gospodnetic

Sounds good.  2. will force us to keep Sandbox pieces in sync with the
core.

Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> So, now that we've got the sandbox in the same source tree, let's
> decide what we want to do with it.  I have previously argued that
> sandbox code should be tagged and released in parallel with core code
> (http://tinyurl.com/5d6tx).  Now that should be easy.  But how should
> we do it?
> 
> Here's my proposal:
>   1. Move sandbox/contributions to src/contrib;
>   2. Change build.xml to build, test & package sandbox packages too.
>   3. Change sandbox build.xml's to build in a top-level build/contrib
>      directory, and package into a top-level dist/contrib directory, so
>      that no files are written in src/contrib.
> 
> Once this is done, then:
> 
>   "ant compile" will compile all core and contributed code, building 
> something like:
> 
>    build/
>      classes/        -- core classes
>      contrib/
>        highlighter/  -- highlighter classes
>        ...
> 
>   "ant test" will test all core and contributed code
> 
>   "ant dist" will create something like:
> 
>    dist/
>      lucene-XX.tar.gz
>      lucene-XX.zip
>      contrib/
>        highlighter-XX.tar.gz
>        ...
> 
> And so on.
> 
> Does this sound like a good plan?
> 
> Doug
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



RE: [PROPOSAL] Lucene to search.apache.org

2005-02-01 Thread Otis Gospodnetic

--- George Aroush <[EMAIL PROTECTED]> wrote:

> 1) When do you think we can move dotLucene from SourceForge.net to
> lucene.apache.org?

Let's wait for Lucene itself to move first, so we don't complicate
things.  My guess is the Lucene move will take a few more weeks.

Otis


Re: SPANQUERY for phrase proximity search

2005-01-31 Thread Otis Gospodnetic

Hi Joaquin,

Check this:
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg05504.html

Otis

--- Joaquin Delgado <[EMAIL PROTECTED]> wrote:

> Is there any proposal to add a proper NEAR (proximity) operator to
> the
> default query language that can handle phrase proximity, implemented
> as
> SpanNearQuery?
> 
> With all the conversations about density queries and searching for
> "concepts" that appear in different fields, it just seems logical to
> treat exact phrases as single terms when users explicitly decide to
> use quotes along with unquoted terms.
> 
> J.D.
> 
> -Original Message-
> From: Chuck Williams [mailto:[EMAIL PROTECTED] 
> Sent: Monday, January 31, 2005 6:20 PM
> To: Lucene Developers List
> Subject: RE: URL to compare 2 Similarity's ready-- Re: Scoring
> benchmark
> evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> problems with Similarity.docFreq() ?
> 
> Doug Cutting wrote:
>   > What did you think of my DensityPhraseQuery proposal?
> 
> It is a step in the direction of what I have in mind, but I'd like to
> go
> further.  How about a query class with these properties:
>   1.  Inputs are:
>   a.  F = list of fields
>   b.  B = list of field boosts (1:1 correspondence with F)
>   c.  T = list of terms or phrases, each either optional or
> required
>   d.  P = proximity-sloping window
>   2.  Generate matches that contain every required T in some F, and,
> if there are no required T's, at least one optional T in some F.
>   3.  Score matches based on these considerations:
>   a.  Normal TermQuery and PhraseQuery scores for individual
> matches
> in individual fields.
>   b.  Boost scores for proximity of TermQuery and PhraseQuery
> matches in individual fields, based on some function of P (term
> proximity).
>   c.  Boost scores based on number of optional T's matched in at
> least one F (term diversity).
> 
> I think that meets all the objectives of my earlier posts.  I'd like
> to
> have it, and would be happy to contribute it if it sounds like the
> right
> thing.
> 
> Is there a better way?
> 
>   > If field boosting needs to then trump idf, we should be able to
> deal
>   > with that when we subsequently tune field boosting, no?  We can,
> e.g.,
>   > square the field boosts if we need.
> 
> Perhaps, but that seems to me to be a hack on top of a hack.  Current
> literature seems to consistently not square idf -- I found one
> reference
> that specifically says even Salton removed the squaring after he
> first
> proposed it a long time ago.  The simpler solution is just to remove
> the
> squaring.
> 
> Chuck
> 
>   > -Original Message-
>   > From: Doug Cutting [mailto:[EMAIL PROTECTED]
>   > Sent: Monday, January 31, 2005 3:04 PM
>   > To: Lucene Developers List
>   > Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring
> benchmark
>   > evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
>   > problems with Similarity.docFreq() ?
>   > 
>   > Chuck Williams wrote:
>   > > That expansion is scalable, but it only accounts for proximity
> of
> all
>   > > query terms together.  E.g., it does not favor a match where t1
> and t2
>   > > are close together while t3 is distant over a match where all 3
> terms
>   > > are distant.  Worse, it would not favor a match with t1 and t2
> in
> a
>   > > short title, and t2 and t3 proximal in the content (with no
> occurrence
>   > > of t1 in the content) vs. a match with t1 and t2 in the title
> and
> t2
>   > and
>   > > t3 distant in the content.
>   > 
>   > Right.  I just mentioned this same weakness in a message replying
> to
>   > David.
>   > 
>   > >   > Is that distinct from my goal to develop an improved
>   > >   > MultiFieldQueryParser for Lucene 2.0?
>   > >
>   > > Not distinct, but I think the first step is to decide on the
> expansion
>   > > we want.  Unless somebody has a better idea, I think the best
> solution
>   > > is a new Query class that simultaneously supports multiple
> fields,
>   > term
>   > > diversity and term proximity.  It would be similar to
> SpansQuery,
> but
>   > > generalized.  It would be like BooleanQuery in the sense that
>   > individual
>   > > query clauses could be required or not.  Then, default AND
> could
> be
>   > > achieved by expanding queries to all-required.
>   > >
>   > > With this new Query class, revised versions of QueryParser and
>   > > MultiFieldQueryParser would generate it.
>   > >
>   > > Am I way off-base somewhere and/or is there a simpler approach
> to
> the
>   > > same end?
>   > 
>   > It just sounds like a lot to bite off at once.
>   > 
>   > What did you think of my DensityPhraseQuery proposal?  We could
> use
> this
>   > in place of a PhraseQuery w/ slop=infinity.  We'd need just one
> per
>   > field.
>   > 
>   > The straight boolean clauses are required for two reasons:
>   >1. To make sure that every query term appears in some field;
> and
>   >2. To reward a

Re: Indexing speed

2005-01-30 Thread Otis Gospodnetic

I believe most of the time is being spent in the Analyzer.  It should
be easy to empirically test this claim by using Field.Keyword instead
of Field.Text (Field.Keyword fields are not analyzed).  If that turns
out to be correct, then you could play with writing a custom and
optimal Analyzer.
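
To show the shape of such a measurement without pulling in Lucene, here is a stand-in sketch in plain Java. The tokenize() method is my own approximation of what SimpleAnalyzer does (split on non-letters, lowercase); it is not Lucene code, and the class and method names are hypothetical. Timing it over the log lines alone gives a rough lower bound on the analysis cost, independent of the index:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for timing analysis cost; does not use Lucene. */
class AnalysisCostSketch {

    /** Splits on non-letters and lowercases, roughly what SimpleAnalyzer produces. */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Nanoseconds spent tokenizing every line once. */
    static long timeTokenizing(List<String> lines) {
        long start = System.nanoTime();
        long tokenCount = 0;
        for (String line : lines) {
            tokenCount += tokenize(line).size();
        }
        long elapsed = System.nanoTime() - start;
        if (tokenCount < 0) {
            throw new IllegalStateException();  // only here to keep the loop's result in use
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // One line taken from the sample log in the quoted message, repeated.
        String sample = "Finished processing [mail box=stagingfax][MsgCount=0]";
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 400000; i++) {
            lines.add(sample);
        }
        System.out.println(tokenize(sample));
        System.out.println("tokenizing took " + timeTokenizing(lines) / 1000000 + " ms");
    }
}
```

With Lucene itself the real experiment is still the Field.Keyword swap: if indexing the same lines unanalyzed is dramatically faster, the Analyzer is the bottleneck.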

Otis

--- Paul Smith <[EMAIL PROTECTED]> wrote:

> This relates to a previous post of mine regarding Context of 'lines'
> of 
> text (log4j events in my case):
> 
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg11869.html
> 
> I'm going through the process of writing quick and dirty 
> test-case/test-bed classes to validate whether my ideas are going to 
> work or not. 
> 
> For my first test, I thought I would write a quick indexer that
> indexed 
> a traditional log file by lines, with each line being a Document, so 
> that I could then search for matching lines and then do a context 
> search.   Yes this is exactly what 'grep' does and does very well,
> but I 
> thought if one was doing a lot of analysis of a log file (typical
> when 
> mentally analysing log files) it might be best to index it once, and 
> then search quickly many times.
> 
> Turns out that even using JUST a RAMDirectory (which surprised me),
> writing a Document for every line of text isn't as fast as I was
> hoping; it is taking significantly longer than I expected.  I played
> around with the mergeFactor settings etc., but nothing really made
> much difference to the indexing speed, other than NOT adding the
> Document to the index.
> I have tried this out on my Mac laptop, as well as a test Linux
> server 
> with no noticeable difference.  (Both scenarios have the reading log 
> file, and new index on the same physical drive, which I know is not
> the 
> _best_ setup, but still).
> 
> This could well be my own stupidness, so here's what I'm doing.
> 
> Statistics on the Log File
> =
> 
> The log file is 28meg, consisting of 409566 lines, of the form:
> 
> [2004-12-21 00:00:00,935 INFO 
> ][ommand.ProcessFaxCmd][http-80-Processor9][192.168.0.220][] 
> Finished 
> processing [mail box=stagingfax][MsgCount=0]
> [2004-12-21 00:00:00,986 INFO 
> ][ommand.ProcessFaxCmd][http-80-Processor9][192.168.0.220][] 
> Finished 
> processing [mail box=aconexnz9000][MsgCount=0]
> [2004-12-21 00:00:01,126 INFO ][ 
> monitor][http-80-Processor9][192.168.0.220][] Controller duration:
> 212ms 
> url=/Fax, fowardDuration=-1, total=212
> [2004-12-21 00:00:03,668 ERROR][essFaxDeliveryAction][Thread-157][][]
> 
> Could not connect to mail server! 
> [EMAIL PROTECTED]
> javax.mail.AuthenticationFailedException: Login failed: authentication failure
>     at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:330)
>     at javax.mail.Service.connect(Service.java:233)
>     at javax.mail.Service.connect(Service.java:134)
>     at com.aconex.fax.action.ProcessFaxDeliveryAction.perform(ProcessFaxDeliveryAction.java:68)
>     at com.aconex.scheduler.automatedTasks.FaxOutDeliveryMessageProcessorAT.run(FaxOutDeliveryMessageProcessorAT.java:62)
> 
> 
> ==
> Source code for test-bed:
> ==
> 
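> import java.io.BufferedReader;
> import java.io.File;
> import java.io.FileReader;
> import java.text.NumberFormat;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.store.RAMDirectory;
>
> public class TestBed1 {
>
>     public static void main(String[] args) throws Exception {
>         if (args.length < 1) {
>             throw new IllegalArgumentException("not enough args");
>         }
>         String filename = args[0];
>
>         File file = new File(filename);
>         Analyzer a = new SimpleAnalyzer();
>
>         String indexLoc = "/tmp/testbed1/";
>
>         // On-disk variant, disabled for this in-memory test:
>         //IndexWriter writer = new IndexWriter(indexLoc, a, true);
>
>         RAMDirectory ramDir = new RAMDirectory();
>         IndexWriter ramWriter = new IndexWriter(ramDir, a, true);
>
>         long length = file.length();
>
>         BufferedReader fileReader = new BufferedReader(new FileReader(file));
>
>         String line = "";
>         double processed = 0;  // characters read so far (approximates bytes)
>         NumberFormat nf = NumberFormat.getPercentInstance();
>         nf.setMaximumFractionDigits(0);
>
>         String percent = "";
>         String lastPercent = " ";
>         long lines = 0;
>         while ((line = fileReader.readLine()) != null) {
>             // One Document per log line; the text is indexed but not stored.
>             Document doc = new Document();
>             doc.add(Field.UnStored("Line", line));
>             ramWriter.addDocument(doc);
>             processed += line.length();
>             lines++;
>             // Print progress only when the rounded percentage changes.
>             percent = nf.format(processed / length);
>             if (!percent.equals(lastPercent)) {
>                 lastPercent = percent;
>                 System.out.println(percent + "(lines=" + lines + ")");
>             }
>         }
>         // Release the reader and the in-memory writer when done.
>         fileReader.close();
>         ramWriter.close();
>         //writer.optimize();
>         //writer.close();
>     }
> }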
> 
> ===
> 
> I did other simple tests by testing exactly how long it takes Java to
> 
> just read the lines of the file, and that is mega quick in
> comparison.  
> It's actually the "r

RE: -> Grouping Search Results by Clustering Snippets:

2005-01-28 Thread Otis Gospodnetic

This is very much of interest to me.  Although it's not in the UI, I
did integrate Lucene and Carrot2 in Simpy ( http://www.simpy.com ). 
Clustering is currently triggered only by a search.  Although you may
not be able to tell (again, sucky UI) Simpy is designed in a way that
will let me hook in a recommender system, much like you describe it. 
Users store links into their Simpy accounts, they tag them, perform
searches, find other users, add them to their Topics (Simpy-specific
thing), and so on, so there is a lot of knowledge about a user that can
be derived from all that.  Currently, the only quasi-smart thing that
goes beyond a simple search is 'More users like this', and even that
has a small bug that I need to fix for the next release, but what you
are describing sounds very much like one of the directions in which I
want to take Simpy and its users. :)

Otis


--- Adam Saltiel <[EMAIL PROTECTED]> wrote:

> This has been implemented in open source, but not with lucene?
> http://www.cs.put.poznan.pl/dweiss/carrot/
> and
> http://carrot2.sourceforge.net/
> Dawid Weiss is a Polish academic at Poznan University, Poland. He and
> others have implemented a servlet-based web app that uses pipelined
> components that communicate using HTTP and implement a couple of
> clustering algorithms.
> Clustering, of course, can go way beyond search result presentation
> and
> there are some very suggestive examples at
> http://www.sics.se/humle/socialcomputing/
> Where the encore project (Martin Svennson) is based on orthogonal
> transformations of a large sparse matrix (a possible method for
> matrix
> dimension reduction). I think it would be interesting to hook a
> recommender system into lucene, thus clustering would take place on
> the
> basis of user profile which may be built up automatically by
> accumulating clicks and comparing to other visitors, with some
> intelligent weighting to node inputs.
> This calls into question what really a search is, does it have to be
> instigated by the user or might their context and history suggest
> enough
> to pull in additional material? So this would be on top of snippets
> and
> also influence what snippets are returned as well as their
> presentation.
> Cooler still would be to be able to recognise the user without a
> login.
> This might be implemented with cookies, but to deal with the user in
> terms of types of interests, a series of faceted profiles, so that
> portals could become fluidly dynamic. Sounds far flung, but I
> actually
> think it is just round the corner.
> Let me know if this is of interest.
> 
> Adam
> 
> > -Original Message-
> > From: integer [daniel prawdzik] [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, January 26, 2005 5:17 PM
> > To: lucene-dev@jakarta.apache.org
> > Subject: -> Grouping Search Results by Clustering Snippets:
> >
> > Grouping Search Results by Clustering Snippets:
> >
> > Search engines typically present their results as long unsorted
> > lists. Finding the page you're looking for is often time-consuming
> > and unsatisfying.
> > Showing the results in groups of similar topics is a much more
> > suitable way to give a user a quick overview of the results.
> > This can be done with a technique called cluster analysis. I'm
> > currently working on my diploma (master's) thesis on this topic. In
> > my view, it's too nice to be left to the archive, so I want to
> > implement this feature in open-source software. The coding of this
> > program has already come pretty far; I've got some tests done and the
> > results are impressive and might still get better [you can see some
> > results at http://www.trist.de/CV/Text-Mining/ -> sorry, only in
> > German]
> >
> > To make a long story short:
> > I'm wondering whether this is an attractive feature for the Lucene
> > community?
> >
> > regards,
> > integer
> >
> >
> >
> 
> 
> 
> 
> 



Re: -> Grouping Search Results by Clustering Snippets:

2005-01-26 Thread Otis Gospodnetic

Herr Integer ;)

Yes it is - very interesting!
We are working on establishing lucene.apache.org - Lucene as a top
level Apache project, which could serve as a good home for projects
like yours.

If you could remind us after we make the lucene.apache.org move, we
could try to get your project in there.

Otis


--- "integer [daniel prawdzik]" <[EMAIL PROTECTED]> wrote:

> Grouping Search Results by Clustering Snippets:
> 
> Search engines typically present their results as long unsorted
> lists. Finding the page you're looking for is often time-consuming
> and unsatisfying.
> Showing the results in groups of similar topics is a much more
> suitable way to give a user a quick overview of the results.
> This can be done with a technique called cluster analysis. I'm
> currently working on my diploma (master's) thesis on this topic. In
> my view, it's too nice to be left to the archive, so I want to
> implement this feature in open-source software. The coding of this
> program has already come pretty far; I've got some tests done and the
> results are impressive and might still get better [you can see some
> results at http://www.trist.de/CV/Text-Mining/ -> sorry, only in
> German]
> 
> To make a long story short:
> I'm wondering whether this is an attractive feature for the Lucene
> community?
> 
> regards,
> integer
> 
> 
> 
> 



Re: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Otis Gospodnetic

I was imagining non-Lucene projects under ir.apache.org.  For instance,
a project that clusters documents or a project that does language
recognition, or maybe a POS tagger, etc.  All this is related to IR and
to Lucene, but it doesn't necessarily use Lucene.

Lucene is a nice name and has a good brand already, so we could go with
it, if we think it will make sense to host projects like the ones above
under Lucene.  Personally I'd like to see a bit of separation, so we
can avoid confusion between Lucene the java searching library, and
Lucene the ASF project.

Otis


--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Maybe we should just call it lucene.apache.org, and move the current 
> Lucene project to lucene.apache.org/java?  The other projects we
> imagine 
> adding (Nutch, DotLucene, CLucene, etc.) are all Lucene-related, no? 
> Lucene has a pretty good brand name...
> 
> Doug
> 
> Otis Gospodnetic wrote:
> > ir.apache.org is what I was thinking, too.  +1 for IR from me. 
> It's
> > broad enough to serve as a home for other related projects, not
> just
> > the initial group of them.
> > 
> > Otis
> > 
> > --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > 
> > 
> >>Scott Ganyo wrote:
> >>
> >>>Not especially creative, but "index.apache.org" looks to be
> >>
> >>available.
> >>
> >>>S
> >>>
> >>>On Jan 17, 2005, at 3:29 AM, Erik Hatcher wrote:
> >>>
> >>>
> >>>>Looks like we should consider alternate names.  Suggestions??
> >>
> >>ir.apache.org
> >>
> >>(not Infra-Red, but Information Retrieval)
> >>
> >>-- 
> >>Best regards,
> >>Andrzej Bialecki
> >>  ___. ___ ___ ___ _ _   __
> >>[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> >>___|||__||  \|  ||  |  Embedded Unix, System Integration
> >>http://www.sigram.com  Contact: info at sigram dot com
> >>
> >>
>
> >>
> >>
> > 
> > 
> > 
> >
> > 
> 
> 
> 



Re: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Otis Gospodnetic

ir.apache.org is what I was thinking, too.  +1 for IR from me.  It's
broad enough to serve as a home for other related projects, not just
the initial group of them.

Otis

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Scott Ganyo wrote:
> > Not especially creative, but "index.apache.org" looks to be
> available.
> > 
> > S
> > 
> > On Jan 17, 2005, at 3:29 AM, Erik Hatcher wrote:
> > 
> >> Looks like we should consider alternate names.  Suggestions??
> 
> ir.apache.org
> 
> (not Infra-Red, but Information Retrieval)
> 
> -- 
> Best regards,
> Andrzej Bialecki
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 
> 



Re: Fwd: [PROPOSAL] Lucene to search.apache.org

2005-01-14 Thread Otis Gospodnetic

I didn't see this email until now.  I replied to the PMC list earlier
instead.  There are several people listed on
http://jakarta.apache.org/lucene/docs/whoweare.html who, I think,
didn't actually contribute to Lucene:

# Eugene Gluzberg (drag0n at apache.org) (but I worked with him and
remember him fixing some bugs ... one had to do with indexing the very
last character in a line of text or some such)

# Matt Tucker (mtucker at apache.org) (but I thought he contributed a
piece of code that had to do with retrying deletes of one of the index
files under Winblows)

# Cory Hubert (clhubert at apache.org)
# Dave Kor (davekor at apache.org)

# Tal Dayan (zapta at apache.org)


Otis


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Can anyone comment on Joshua Bloch's involvement with Lucene's
> codebase 
> and the issue Sam brings up below??
> 
> This came up as a side topic to my proposal to the Jakarta PMC to
> bring Lucene to the top level.
> 
>   Erik
> 
> 
> Begin forwarded message:
> 
> > From: Sam Ruby <[EMAIL PROTECTED]>
> >
> 
> > ... here's a license issue for the proposed new PMC to grapple
> with: 
> > apparently Joshua Bloch is listed as a contributor.  He originally 
> > worked for Sun, but now works for Google.  According to him, he
> "did 
> > not knowingly contribute any code to Lucene".  Nor can I find him 
> > participating in either the user or developer mailing lists for 
> > Lucene.
> >
> > The code in question apparently is related to "array utilities",
> and 
> > he apparently did contribute to java.util.Arrays, but that code 
> > belongs to Sun and while the source is published, it is neither
> open 
> > source nor compatible with the Apache Software License.
> >
> > I'm still trying to track down more details, but if anybody can 
> > provide any insight into how the contribution was actually made, I 
> > would appreciate it.
> >
> > - Sam Ruby
> 
> 
> 
> 



Re: what if the IndexReader crashes, after delete, before close.

2005-01-11 Thread Otis Gospodnetic

+1!

Daniel's suggestion for dealing with inevitable complaints and
pushbacks sounds like a reasonable template answer to use. :)

Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Sigh.  This stuff would get a lot simpler if we were able to use Java
> 
> 1.4's FileLock.  Then locks would be automatically cleared by the OS
> if 
> the JVM crashes.
> 
> Should we upgrade the JVM requirements to 1.4 for Lucene's 1.9/2.0 
> releases and update the locking code?
> 
> Doug
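
For reference, a small sketch of the Java 1.4 FileLock mechanism Doug mentions, using only java.nio. This is not Lucene's locking code; the class name, the temporary lock file and the demo() helper are made up for illustration, and the try-with-resources syntax needs a newer Java than 1.4. The point is that the lock is held by the operating system on behalf of the process, so no stale lock file has to be cleaned up after a crash:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical sketch of OS-level locking with java.nio; not Lucene's lock code. */
class OsFileLockSketch {

    /** Tries to take an exclusive OS-level lock; returns null if it is already held. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock();   // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null;                // this JVM already holds a lock on the file
        }
    }

    /** Locks a temporary file, checks a second attempt is refused, then releases. */
    static boolean demo() {
        try {
            File lockFile = File.createTempFile("write", ".lock");
            lockFile.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
                 FileChannel channel = raf.getChannel()) {
                FileLock lock = tryAcquire(channel);
                if (lock == null) {
                    return false;
                }
                boolean held = lock.isValid();
                boolean secondAttemptRefused = (tryAcquire(channel) == null);
                lock.release();
                return held && secondAttemptRefused;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lock acquired and released: " + demo());
    }
}
```

The trade-off raised in the thread still applies: this requires Java 1.4 or later, and file locking behaviour can vary by platform and file system.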
> 
> Luke Shannon wrote:
> > Here is how I handle it.
> > 
> > The Indexer is a Runnable. All the members it uses are static. The
> > run() method calls a synchronized method called go(). This kicks off
> > the indexing.
> > 
> > Before you even get to here, the method in the CMS code that created
> > the thread object and instantiated the index is also synchronized.
> > 
> > Here is the code that handles the potential lock file that may be
> left
> > behind from a Reader or Writer.
> > 
> > Note: I found I had to check if the index existed before checking
> if it was
> > locked. If I checked if it was locked and the index had not been
> created yet
> > I got an error.
> > 
> > // If we have gotten to here, this is the only indexer running.
> > // The index should not be locked; if it is, the lock is "stale"
> > // and must be released before we can continue.
> > try {
> >     if (index.exists() && IndexReader.isLocked(indexFileLocation)) {
> >         Trace.ERROR("INDEX INFO: Had to clear a stale index lock");
> >         IndexReader.unlock(FSDirectory.getDirectory(index, false));
> >     }
> > } catch (IOException e3) {
> >     Trace.ERROR("INDEX ERROR: IMPORTANT. Was unable to clear a stale index lock: " + e3);
> > }
> > 
> > HTH
> > 
> > Luke
> > 
> > - Original Message - 
> > From: "Peter Veentjer - Anchor Men" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" 
> > Sent: Tuesday, January 11, 2005 3:24 AM
> > Subject: RE: what if the IndexReader crashes, after delete, before
> close.
> > 
> > 
> > 
> > 
> > -Oorspronkelijk bericht-
> > Van: Luke Shannon [mailto:[EMAIL PROTECTED]
> > Verzonden: maandag 10 januari 2005 15:46
> > Aan: Lucene Users List
> > Onderwerp: Re: what if the IndexReader crashes, after delete,
> before
> > close.
> > 
> > 
> > 
> >>>One thing that will happen is the lock file
> >>>will get left behind. This means when you start
> >>>back up and try to create another Reader you will
> >>>get a file lock error.
> > 
> > 
> > I have figured out that part the hard way ;) Why can`t I access my
> index
> > anymore?? Ahh.. The lock file
> > 
> > 
> >>>Our system is threaded and synchronized.
> >>>Thus when a Reader is being created I know
> >>>it is the only one (the Writer comes after
> >>>the reader has been closed). Before creating
> >>>it I check if the Index is locked. If it is,
> >>>I forcefully clear it. This prevents the above
> >>>problem from happening.
> > 
> > 
> > You can have more than one reader open at any time, even while a
> > delete or add is in progress. But you can't use a reader to delete
> > documents (IndexReader) and a writer to add them (IndexWriter) at the
> > same time. If you don't have other threads doing deletes/adds, you
> > won't have to synchronize anything.
> > 
> > And how do you synchronize on it? I applied the ReadWriteLock from
> > Doug Lea's concurrency library after I had built my own
> > synchronization brick and somebody pointed out that I was
> > reimplementing the ReadWriteLock. But at the moment I don't do any
> > synchronization.
> > 
> > And I want to have a component that is executed when the system is
> > started and knows what to do if there is rubbish in the index
> > directory. I want that component to restore my index to a usable
> > version (even a small loss of information is acceptable, because
> > everything is checked once in a while, and user-added information is
> > going to be stored in the database). So nothing gets lost. The index
> > can be rebuilt.
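
A minimal sketch of the read/write-lock arrangement Peter describes, using java.util.concurrent.locks.ReentrantReadWriteLock (the standard-library descendant of Doug Lea's library, available from Java 5). The IndexGuardSketch class and its search()/modify() methods are hypothetical names; the actual index operations are passed in as callbacks and are not shown:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical guard: many concurrent searches, one exclusive modification at a time. */
class IndexGuardSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs a search under the shared read lock; several searches may overlap. */
    <T> T search(Supplier<T> action) {
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a delete/add batch under the exclusive write lock. */
    void modify(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of read locks currently held. */
    int activeReaders() {
        return lock.getReadLockCount();
    }

    /** True while some thread holds the write lock. */
    boolean writerActive() {
        return lock.isWriteLocked();
    }

    public static void main(String[] args) {
        IndexGuardSketch guard = new IndexGuardSketch();
        int readers = guard.search(() -> guard.activeReaders());
        System.out.println("readers during search: " + readers);
        guard.modify(() -> System.out.println("writer active: " + guard.writerActive()));
    }
}
```

This only coordinates threads inside one JVM; it does nothing about the on-disk lock file left behind by a crash, which is the separate problem discussed earlier in the thread.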
> > 
> > 
> > 
> > 
> > Luke
> > 
> > - Original Message -
> > From: "Peter Veentjer - Anchor Men" <[EMAIL PROTECTED]>
> > To: 
> > Sent: Saturday, January 08, 2005 4:08 AM
> > Subject: what if the IndexReader crashes, after delete, before
> close.
> > 
> > 
> > What happens to the Index if the IndexReader crashes, after I have
> > deleted
> > documents, and before I have called close. Are the deletes ignored?
> Is
> > the
> > Index screwed up? Is the filesystem screwed up (if a document is
> deleted
> > new
> > delete-files appear) so are the delete-files still there (and can
> these
> > be
> > ignored the next time?). Can I restore the index to the previous
> state,
> > just
> > by removing those delete-files?
> > 
> > 
> > 
> >
> > 
> > 
> > 
> >
> -
> > To unsubscribe, e-m

Re: Lucene site generation

2005-01-02 Thread Otis Gospodnetic

Probably, yes.  How do other TLPs generate their web sites?  Maven? 
Forrest?  Custom scripts?  Is it all up to each individual TLP?  Are
there any published, standardized, tried-and-tested templates, scripts,
setups, or practices that you know of?

Thanks,
Otis

--- Henri Yandell <[EMAIL PROTECTED]> wrote:

> The Lucene site generation appears to be dependent on jakarta-site2.
> I
> assume you'll want to fix this when you move to TLP (or before), as
> it
> seems an unnecessary tie.
> 
> Hen
> 
> 
> 


Re: cvs commit: jakarta-lucene/xdocs/stylesheets project.xml

2004-12-31 Thread Otis Gospodnetic

I see.  I think I didn't use to have an account on www.apache.org (as
opposed to cvs.apache.org), but it looks like I can ssh in now.

However, we have permission problems there (and I remember seeing
related messages on lucene-dev a few times before).

It looks like we are all members of the 'apcvs' group, so we should make
sure to always run chmod -R g+w *

Erik & Doug: could you chmod -R g+w * from the top level?

Otis


-bash-2.05b$ cvs -q up -dP
? lucene.eps
? docs/api
? docs/lucene-sandbox/snowball
cvs [update aborted]: cannot make directory
src/java/org/apache/lucene/analysis/ru: No such file or directory

-bash-2.05b$ ll src/java/org/apache/lucene/analysis/ | head -4
total 66
-rw-r--r--  1 ehatcher  jakarta   2031 Mar 29  2004 Analyzer.java
drwxr-xr-x  2 ehatcher  jakarta512 Dec 30 23:47 CVS/
-rw-r--r--  1 ehatcher  jakarta   2748 Nov  9 07:12 CharTokenizer.java

-bash-2.05b$ groups ehatcher
ehatcher apcvs jakarta apmember apsite ant

-bash-2.05b$ groups otis
otis apcvs jakarta



--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 30, 2004, at 11:27 PM, Otis Gospodnetic wrote:
> > Could somebody with the right privileges update the site?  It looks
> > like the ASF infrastructure is changing a bit, so I had to
> > update/remove links to Nagoya.
> 
> Done.  I don't think you need any special privileges.  ssh into 
> www.apache.org, cd /www/jakarta.apache.org/lucene and cvs -q up -dP 
> (the repo is checked out anonymously there).
> 
>   Erik
> 
> 
> 
> 



Lucene Wiki auth problems

2004-12-30 Thread Otis Gospodnetic

Hello,

Can somebody share the 'How to log into Lucene Wiki' secret?  I don't
get that Wiki's authentication and authorization logic.  When I log in,
I still see some pages as 'Immutable' (bottom left, where the Edit link
is supposed to be).  Is it me, or is that Wiki buggy?

Otis



Re: Lucene jGuru FAQ

2004-12-30 Thread Otis Gospodnetic

I agree with Erik regarding jGuru.  It wouldn't even be right of us/me
to ask them to remove that content.  I also mentioned that we should
give credit to jGuru out of courtesy, and since I don't see it on the
FAQ page anywhere, I'll add it.

And I agree with Daniel regarding that ancient FAQ.

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> On Tuesday 21 December 2004 21:09, Otis Gospodnetic wrote:
> 
> > Hm, mailing list sw doesn't like messages >100K in size nor ZIP
> > attachments.  I'm sending the FAQ to Daniel directly.
> 
> As all FAQ items from the jGuru FAQ are now copied to the new FAQ, is
> it 
> possible to replace the FAQ at jGuru with a link to the Wiki FAQ? I
> fear 
> that if we don't remove the old FAQ people will still find it and
> consider 
> it up-to-date.
> 
> Similar for the FAQ at sourceforge: Doug, could you delete it and add
> a 
> link to the new FAQ instead? (Or make me an administrator for the
> "lucene" 
> project at sourceforge and I'll do so. My Sourceforge user name is 
> dnaber.)
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: cvs commit: jakarta-lucene/xdocs/stylesheets project.xml

2004-12-30 Thread Otis Gospodnetic

Could somebody with the right privileges update the site?  It looks
like the ASF infrastructure is changing a bit, so I had to
update/remove links to Nagoya.

Otis

--- [EMAIL PROTECTED] wrote:

> otis        2004/12/30 20:24:57
> 
>   Modified:    docs benchmarks.html contributions.html demo.html
>                         demo2.html demo3.html demo4.html
>                         fileformats.html
>                         gettingstarted.html index.html
>                         luceneplan.html
>                         queryparsersyntax.html resources.html
>                         whoweare.html
>                docs/lucene-sandbox index.html
>                xdocs/stylesheets project.xml
>   Log:
>   - s/nagoya/mail-archives/
>   
>   Revision  Changes    Path
>   1.24      +2 -2      jakarta-lucene/docs/benchmarks.html
>   
>   Index: benchmarks.html
>   ===
>   RCS file: /home/cvs/jakarta-lucene/docs/benchmarks.html,v
>   retrieving revision 1.23
>   retrieving revision 1.24
>   diff -u -r1.23 -r1.24
>   --- benchmarks.html 30 Dec 2004 21:50:42 -  1.23
>   +++ benchmarks.html 31 Dec 2004 04:24:55 -  1.24
>   @@ -97,9 +97,9 @@
>
>
href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=Importance";>Lucene
> Bugs
>
>   -
href="http://nagoya.apache.org/eyebrowse/SummarizeList?listId=30";>Lucene-user
>   +
href="http://mail-archives.apache.org/eyebrowse/SummarizeList?listId=30";>Lucene-user
>
>   -
href="http://nagoya.apache.org/eyebrowse/SummarizeList?listId=29";>Lucene-dev
>   +
href="http://mail-archives.apache.org/eyebrowse/SummarizeList?listId=29";>Lucene-dev
>
>Lucene
> Sandbox
>
>   
>   
>   
>   1.47  +2 -2  jakarta-lucene/docs/contributions.html
>   
>   Index: contributions.html
>   ===
>   RCS file: /home/cvs/jakarta-lucene/docs/contributions.html,v
>   retrieving revision 1.46
>   retrieving revision 1.47
>   diff -u -r1.46 -r1.47
>   --- contributions.html  30 Dec 2004 21:50:42 -  1.46
>   +++ contributions.html  31 Dec 2004 04:24:55 -  1.47
>   @@ -101,9 +101,9 @@
>
>
href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=Importance";>Lucene
> Bugs
>
>   -
href="http://nagoya.apache.org/eyebrowse/SummarizeList?listId=30";>Lucene-user
>   +
href="http://mail-archives.apache.org/eyebrowse/SummarizeList?listId=30";>Lucene-user
>
>   -
href="http://nagoya.apache.org/eyebrowse/SummarizeList?listId=29";>Lucene-dev
>   +
href="http://mail-archives.apache.org/eyebrowse/SummarizeList?listId=29";>Lucene-dev
>
>Lucene
> Sandbox
>
>   
>   
>   
>   1.31  +2 -2  jakarta-lucene/docs/demo.html
>   
>   Index: demo.html
>   ===
>   RCS file: /home/cvs/jakarta-lucene/docs/demo.html,v
>   retrieving revision 1.30
>   retrieving revision 1.31
>   diff -u -r1.30 -r1.31
>   --- demo.html   30 Dec 2004 21:50:42 -  1.30
>   +++ demo.html   31 Dec 2004 04:24:55 -  1.31
>   @@ -97,9 +97,9 @@
>
>
href="http://issues.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=Importance";>Lucene
> Bugs
>
>   -
href="http://nagoya.apache.org/eyebrowse/Summ

Re: CFS file and file formats

2004-12-30 Thread Otis Gospodnetic

Hello,

I understand the technical reason for main() there, but logically this
belongs to an external utility class, I think.

Otis


--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> hi,
> 
> i already had a look at Garrett's implementation. I made some smaller
> 
> changes to improve the performance when extracting the files from the
> 
> compound. All tests work fine and the index is usable after
> extraction. 
> The new functionality is added as a public static void main () to 
> CompoundFileReader because of the reduced visibility (package) of 
> CompoundFileReader itself. It will be committed this afternoon.
> 
> Bernhard
> 
> > Doug Cutting wrote:
> >
> >> It would be useful to have a command-line utility (i.e., a static 
> >> main(String[]) method somewhere) that lists the files and sizes 
> >> contained inside a CFS file, and perhaps even an option to unpack
> it. 
> >> Anyone care to contribute this method?
> >
> >
> > Here's a diff to add this functionality to CompoundFileReader.  
> > Comments are of course welcome, as I'm not that fantastic a Java
> hacker.
> >
> > -garrett
> >
>
>
> >
> >Index: src/java/org/apache/lucene/index/CompoundFileReader.java
> >===
> >RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v
> >retrieving revision 1.14
> >diff -u -r1.14 CompoundFileReader.java
> >--- src/java/org/apache/lucene/index/CompoundFileReader.java 28 Sep 2004 18:15:52 -   1.14
> >+++ src/java/org/apache/lucene/index/CompoundFileReader.java 25 Dec 2004 04:25:11 -
> >@@ -17,12 +17,14 @@
> >  */
> > 
> > import org.apache.lucene.store.Directory;
> >+import org.apache.lucene.store.FSDirectory;
> > import org.apache.lucene.store.IndexInput;
> > import org.apache.lucene.store.BufferedIndexInput;
> > import org.apache.lucene.store.IndexOutput;
> > import org.apache.lucene.store.Lock;
> > import java.util.HashMap;
> > import java.io.IOException;
> >+import java.io.FileOutputStream;
> > 
> > 
> > /**
> >@@ -233,5 +235,61 @@
> >         }
> > 
> > 
> >+    }
> >+
> >+    public static void main(String [] args) {
> >+        String dirname = null, filename = null;
> >+        boolean extract = false;
> >+
> >+        for (int i = 0; i < args.length; ++i) {
> >+            if (args[i].equals("-extract")) {
> >+                extract = true;
> >+            } else if (dirname == null) {
> >+                dirname = args[i];
> >+            } else if (filename == null) {
> >+                filename = args[i];
> >+            }
> >+        }
> >+
> >+        if (dirname == null || filename == null) {
> >+            System.out.println("Usage: CompoundFileReader directory cfsfile");
> >+            System.out.println("");
> >+            System.out.println("Prints the filename and size of each file "
> >+                               + "within cfsfile.");
> >+            System.out.println("");
> >+            System.out.println("Add the -extract flag to extract files to the "
> >+                               + "current working directory.");
> >+
> >+            return;
> >+        }
> >+
> >+        try {
> >+            Directory dir = FSDirectory.getDirectory(dirname, false);
> >+
> >+            CompoundFileReader cfr = new CompoundFileReader(dir, filename);
> >+
> >+            String [] files = cfr.list();
> >+
> >+            for (int i = 0; i < files.length; ++i) {
> >+                long len = cfr.fileLength(files[i]);
> >+
> >+                System.out.println(files[i] + "\t: " + len + " bytes");
> >+
> >+                if (extract) {
> >+                    IndexInput ii = cfr.openInput(files[i]);
> >+
> >+                    FileOutputStream f = new FileOutputStream(files[i]);
> >+
> >+                    while (len-- != 0) {
> >+                        byte b = ii.readByte();
> >+                        f.write(b);
> >+                    }
> >+
> >+                    f.close();
> >+                }
> >+            }
> >+        } catch (IOException ioe) {
> >+            ioe.printStackTrace();
> >+        }
> >     }
> > }
> >
> >  
> >
>
>
> >
>
>-
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
> >
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
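
Editorial sketch: the extraction loop in the patch quoted above copies one
byte per readByte()/write() call, and Bernhard mentions small changes to
speed up extraction without saying what they were.  One common approach,
shown here only as an assumption about what such a change could look like,
is to copy through a fixed-size buffer.  The class name ChunkedCopy and the
4096-byte buffer size are made up for illustration, and plain java.io
streams stand in for Lucene's IndexInput (whose readBytes(byte[], int, int)
method would play the role of InputStream.read here).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {

    /** Copies exactly len bytes from in to out, one buffer-full at a time. */
    public static void copy(InputStream in, OutputStream out, long len)
            throws IOException {
        byte[] buffer = new byte[4096];   // hypothetical buffer size
        long remaining = len;
        while (remaining > 0) {
            // Never ask for more than the buffer holds or than is left to copy.
            int chunk = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, chunk);
            if (read < 0) {
                throw new EOFException(
                    "stream ended with " + remaining + " bytes left to copy");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out, data.length);
        System.out.println(out.size() + " bytes copied");
    }
}
```

A helper like this could also live in a separate utility class, which is
the placement Otis argues for, although CompoundFileReader's package
visibility means such a class would still have to sit in the same package.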

Re: ANN: Lucene benchmark tool

2004-12-21 Thread Otis Gospodnetic

Hi Andrzej,

What version of commons-compress is that?  There's no version
information in the Manifest.  I just want to include the version in the
Jar file name before putting it in CVS.

Thanks,
Otis

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Hi there,
> 
> After recent discussions on the speed of indexing/searching using 
> different parameters it became even clearer that we need a
> comprehensive 
> and repeatable benchmark.
> 
> I created a class which represents my first hack at benchmarking
> various 
> aspects of Lucene, using a range of different parameters. Since it
> uses 
> a standard, well-defined document collection, I hope that its results
> 
> should be more or less meaningful across different OS/hardware
> combinations.
> 
> I had a look at JUnitPerf, but found the API to be too limited for 
> collecting complex time-series data, so I basically rolled my own 
> benchmarking framework... If you know a better way to do it, I'm all
> ears.
> 
> I'm going to package it into a self-running application (WebStart?),
> but 
> for now you can try to compile and run it yourself. You can get it
> here:
> 
>   http://www.getopt.org/lb/LuceneBenchmark.java
> 
> It depends on the commons-compress.jar, specifically on the Tar 
> functionality. This JAR is in commons-sandbox, so it may not be
> readily 
> available - in that case you can get it here:
> 
>   http://www.getopt.org/lb/commons-compress.jar
> 
> (I will put an index page there, but for now use these direct links).
> 
> CAVEAT: please NOTE WELL that this benchmark runs at 100% CPU and
> 100% 
> disk I/O for SEVERAL HOURS even on a modern equipment (partial
> results 
> are printed on System.out from time to time). You have been warned -
> so 
> don't send me any fried mobo's or melted drives for repairs, ok?
> 
> You can cut down the number of input parameters to reduce the overall
> 
> time, or use the mini* document collection (but this reduces the
> number 
> of documents in index). See the comments in source.
> 
> Comments and patches are welcome!
> 
> -- 
> Best regards,
> Andrzej Bialecki
> 
> -
> Software Architect, System Integration Specialist
> CEN/ISSS EC Workshop, ECIMF project chair
> EU FP6 E-Commerce Expert/Evaluator
> -
> FreeBSD developer (http://www.freebsd.org)
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Lucene jGuru FAQ

2004-12-21 Thread Otis Gospodnetic

Hm, mailing list sw doesn't like messages >100K in size nor ZIP
attachments.  I'm sending the FAQ to Daniel directly.

Otis

---
Hello Daniel & others,

Here is the Lucene jGuru FAQ in XML, zipped.  I checked with Terence
Parr and Tom Burns (jGuru) and we are free to use this to get our own
FAQ
started.  It would be nice to give jGuru credit in the new FAQ, though.

Otis
P.S.
If the attachment gets stripped by the mailing list software, I'll send
it to Daniel directly.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [PROPOSAL] search.apache.org

2004-12-21 Thread Otis Gospodnetic

You are right, no need to add Nutch to the proposal then.

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> 
> On Dec 20, 2004, at 9:43 PM, Otis Gospodnetic wrote:
> > Sounds ok to me.  There is no mention of Nutch, though.  If Nutch
> is
> > going through Incubator with its own proposal, maybe we need to say
> > that.
> > Let's give developers in other time zones time to add themselves to
> > the list.
> 
> Sam Ruby recommended that we not emphasize making search.apache.org
> an 
> umbrella project:
> 
> Sam Ruby had the following advice:
> 
> >> The board has a bias against self-referential definitions (the
> 'foo' 
> >> project is for managing software related to the 'foo' project - 
> >> believe it or not, this happens all too often).  So naming the 
> >> proposed project something like search is a good idea.
> >>
> >> The board tends to prefer projects whose scopes don't overlap with
> 
> >> other projects.  That does not appear to be a problem here.  In
> other 
> >> cases, this involves highlighting differences in technical
> approaches 
> >> taken by two "competing" projects.
> >>
> >> Finally, some members of the board have a strong bias against 
> >> umbrella projects.  My advice here is to not emphasize this aspect
> of 
> >> the proposal.  Overall, the fact that this reduces the size of the
> 
> >> Jakarta project umbrella, those with this bias will be happy.
> 
> While I'm confident that Nutch will be accepted for incubation and
> then 
> migrate out of incubation, this is not a done deal.  We have a 
> compelling reason to bring Lucene to TLP without Nutch.  Nutch has a 
> compelling reason to incubate without worrying about its future home 
> (which theoretically could be under Jakarta or Lucene, though of
> course 
> search.apache.org is where it's aimed).
> 
>   Erik
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [PROPOSAL] search.apache.org

2004-12-20 Thread Otis Gospodnetic

Sounds ok to me.  There is no mention of Nutch, though.  If Nutch is
going through Incubator with its own proposal, maybe we need to say
that.
Let's give developers in other time zones time to add themselves to the
list.

Otis


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> I created a draft proposal here:
> 
>   http://wiki.apache.org/jakarta-lucene/TopLevelProposal
> 
> I placed Doug as Chair and the committers I could see handy in my 
> Lucene e-mail folder as initial PMC members.  Committers - feel free
> to 
> add or remove yourself from that list.  Henri - do you want to be on 
> the PMC also?   The PMC is defined as:
> 
> "Each Project Management Committee shall be responsible for the
> active 
> management of one or more projects identified by resolution of the 
> Board of Directors which may include, without limitation, the
> creation 
> or maintenance of "open-source" software for distribution to the
> public 
> at no charge. Subject to the direction of the Board of Directors, the
> 
> chairman of each Project Management Committee shall be primarily 
> responsible for project(s) managed by such committee, and he or she 
> shall establish rules and procedures for the day to day management of
> 
> project(s) for which the committee is responsible."
> 
> Let's discuss and modify this proposal soon.  Once this has been
> agreed 
> upon I'll send this to the Jakarta PMC.
> 
>   Erik
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: do we need two FAQs?

2004-12-20 Thread Otis Gospodnetic

You are correct about the commercial aspect of jGuru - I didn't think
of that.  No, there is no conflict of interest (adding FAQ entries is
not particularly exciting :)).  No, I'm not the only one thinking Wiki
is not the best choice.  However, it looks like people are leaning a
bit more towards the Wiki, so Wiki it is.  I'll try to get the XML dump
from jGuru, so we don't lose the effort.

Otis
P.S.
I have a premium jGuru account, so I can search (Daniel mentioned one
can't search the FAQ) and I don't see the ads.  Again, I forgot the
regular vs. premium differences.


--- Steven Rowe <[EMAIL PROTECTED]> wrote:

> I find it odd that the "official" FAQ for this important open source 
> project is maintained at an external commercial website.
> 
> When I first visited the jGuru FAQ a couple of years ago, I was put 
> off by the "PREMIUM" notice next to Otis's name, and (for whatever 
> reason) got the impression that only paying members could access the 
> content.  I later figured out this wasn't true, but only after having
> 
> abandoned the effort the first time (and this is exactly when a 
> project can least afford to put people off).
> 
> Otis, check yourself.  It's true that you are the FAQ maintainer, but
> 
> only you are agreeing to keep things as they are, in the face of
> vocal 
> opposition.  There is the *appearance* of a conflict of interest in 
> your stance (whether or not this is the case).
> 
> In my opinion, the features jGuru provides do not win over those 
> available via the Wiki (especially when you consider the blaring 
> commercial messages you have to endure at jGuru).  And the benefit of
> 
> having a single place to go for info about Lucene puts the Wiki on
> top 
> for me.
> 
> +1 Wiki
> -1 jGuru
> 
> Steve Rowe
> 
> Daniel Naber wrote:
> > On Monday 20 December 2004 23:23, Otis Gospodnetic wrote:
> > 
> > 
> >> I'd remove the older,
> >>unmaintained one. jGuru one is more up to date.
> > 
> > 
> > Unfortunately the start page of it contains three fat ads, that's
> three too 
> > much for an Open Source project. Also it cannot be searched, and
> it's 
> > difficult to read because not only the question is in bold but also
> the 
> > category and the name. Almost all of this is better at both the 
> > sourceforge FAQ and a Wiki FAQ.
> > 
> > Regards
> >  Daniel
> > 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: do we need two FAQs?

2004-12-20 Thread Otis Gospodnetic

To me this all sounds like creation of extra work.
We have 2 FAQs, and that IS lame.  Remove one.  I'd remove the older,
unmaintained one. jGuru one is more up to date.

I'd use Wiki for community info, which is how we are already using Wiki.
No need for 1) 2) 3) options at all.

Done deal.  No discussion, no new work.

Otis (simplify is my new motto ;))


--- Henri Yandell <[EMAIL PROTECTED]> wrote:

> As I unintentionally kicked off the Wiki suggestion, thought I'd
> offer
> an opinion.
> 
> My original thought was that a Wiki wouldn't be as nice as the
> current
> ones, but it would be a nice thing if somehow we could have a wiki
> back-end with a nicer front-end. I wasn't trying to push for the wiki
> though, I mainly wanted to just see if there's a reason for us to try
> and get Infra to install some kind of FAQ software at the ASF.
> 
> It seems there are three options to try and have the content at the
> ASF:
> 
> 1) Use the Wiki. Good for open community maintained FAQs. Looks poor.
> 2) Put it on the site. I imagine there's something that can take a
> FAQ
> in XML and output a nice website etc. This is good for project
> maintained official-FAQ stuff.
> 3) Get dynamic FAQ software for the ASF (or code it :) ). I'm not
> sure
> if there are good use-cases for such a thing, ie) how many advantages
> do the current ones have over 1) and 2).
> 
> Blue-sky thoughts,
> 
> Hen
> 
> On Mon, 20 Dec 2004 11:46:57 -0500, Erik Hatcher
> <[EMAIL PROTECTED]> wrote:
> > I still view the wiki as the best place for the "FAQ", however it
> is
> > described.  As a committer with far too many "commitments", I'd
> prefer
> > to see a community maintained FAQ rather than one that requires us
> > committers to maintain it in source code control.
> > 
> > Erik
> > 
> > On Dec 20, 2004, at 11:23 AM, Otis Gospodnetic wrote:
> > 
> > > Wikis have their place as information repositories.  It sounds
> like
> > > what you are describing is not a FAQ, though, no?  I'm all for
> using
> > > Lucene Wiki as we are currently using - scratch pad, whiteboard,
> > > community tips, etc.  I think the FAQ is a slightly different
> beast,
> > > hence my double-checking with lucene-dev people.
> > >
> > > I'm still unsure about the best setup, so I'll wait a little
> longer to
> > > hear some more opinions.
> > >
> > > Otis
> > >
> > > --- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > >
> > >> On Dec 19, 2004, at 8:37 PM, Otis Gospodnetic wrote:
> > >>> OK.
> > >>> Is this _really_ what everyone's okay with?  I'm asking because
> > >> once we
> > >>> put the FAQ onto Wiki we:
> > >>>
> > >>> 1. no longer have authoritative FAQ - anyone can write to the
> FAQ
> > >>> 2. we have to monitor the Wiki FAQ for correctness
> > >>> 3. we have no user-friendly GUI/webapp for Q/A formatting,
> which
> > >> makes
> > >>> it more of a pain in the ass to contribute
> > >>>
> > >>> If anyone has any strong opinions, please share them.  If I
> don't
> > >> hear
> > >>> anything, I'll (try to) get the jGuru FAQ XML dump and give it
> to
> > >>> Daniel for 'wikification'.
> > >>
> > >> I prefer to leverage the wiki for this type of information for a
> > >> number
> > >> of reasons.  It is self-maintaining, not requiring a committer
> to
> > >> take
> > >> time out to commit changes and update the website.  We have
> e-mail
> > >> notification of wiki modifications and thus we already have
> > >> monitoring.
> > >>   There are many cases when using Lucene when the answer is "it
> > >> depends"
> > >> (such as the filtering question that just came up).  Having a
> wiki
> > >> "whiteboard" for these scenarios allows for a broader
> perspective.
> > >>
> > >>  Erik
> > >
> > >
> > >
> > >
> -
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail:
> [EMAIL PROTECTED]
> > 
> >
> -
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > 
> >
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: do we need two FAQs?

2004-12-20 Thread Otis Gospodnetic

Wikis have their place as information repositories.  It sounds like
what you are describing is not a FAQ, though, no?  I'm all for using
Lucene Wiki as we are currently using - scratch pad, whiteboard,
community tips, etc.  I think the FAQ is a slightly different beast,
hence my double-checking with lucene-dev people.

I'm still unsure about the best setup, so I'll wait a little longer to
hear some more opinions.

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 19, 2004, at 8:37 PM, Otis Gospodnetic wrote:
> > OK.
> > Is this _really_ what everyone's okay with?  I'm asking because
> once we
> > put the FAQ onto Wiki we:
> >
> > 1. no longer have authoritative FAQ - anyone can write to the FAQ
> > 2. we have to monitor the Wiki FAQ for correctness
> > 3. we have no user-friendly GUI/webapp for Q/A formatting, which
> makes
> > it more of a pain in the ass to contribute
> >
> > If anyone has any strong opinions, please share them.  If I don't
> hear
> > anything, I'll (try to) get the jGuru FAQ XML dump and give it to
> > Daniel for 'wikification'.
> 
> I prefer to leverage the wiki for this type of information for a
> number 
> of reasons.  It is self-maintaining, not requiring a committer to
> take 
> time out to commit changes and update the website.  We have e-mail 
> notification of wiki modifications and thus we already have
> monitoring. 
>   There are many cases when using Lucene when the answer is "it
> depends" 
> (such as the filtering question that just came up).  Having a wiki 
> "whiteboard" for these scenarios allows for a broader perspective.
> 
>   Erik



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Migration to SVN?

2004-12-20 Thread Otis Gospodnetic

I don't know how exactly SVN works, but one of the things we have
talked about in the past is that we currently do not tag, branch, and
release sandbox components, and we all agree that we should.
Furthermore, because of API changes, keeping core and sandbox in sync
is important.  Perhaps we can make our lives simpler by tagging,
branching, and releasing sandbox WHENEVER we tag, branch, or release
the core, and we could do it in a way that keeps tag, branch, and
release names in sync.
Something like:
  lucene_core_1.4.3 -- lucene_sandbox_1.4.3
  lucene_core_2.0.0 -- lucene_sandbox_2.0.0

I'm not sure which directory structure would be best for this, but this
is an issue we've repeatedly talked about in the past, so we should
strongly consider whichever structure permits this with the least effort.

SVN users: which of the structures proposed so far satisfy this?

Thanks,
Otis

--- Garrett Rooney <[EMAIL PROTECTED]> wrote:

> Giulio Cesare Solaroli wrote:
> > I am not a committer (just a happy user!), but I would suggest the
> > following layout:
> > 
> > asf-repo/
> >   jakarta/
> > lucene/
> >   trunk/
> >  core/
> >  sandbox/
> >   branches/
> >   tags/
> > 
> > This layout will allow tags and branches of both core and sandbox
> to
> > be kept in sync without any effort.
> > 
> > Does this make any sense?
> 
> Six of one, half dozen of the other.  Honestly, I prefer to split the
> 
> projects out and give them their own trunk/branches/tags directories,
> 
> just because it seems cleaner to me and additionally because it's the
> 
> way other projects I've known at the ASF run things.  I like to be
> able 
> to look in the branches and tags directory and know what the 
> tags/branches correspond to immediately, as opposed to this kind of 
> layout where a branch/tag could be just the core, just the sandbox, 
> both, etc.  You wouldn't want to always branch/tag both, but
> sometimes 
> you might, the ambiguity bothers me.
> 
> -garrett
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: do we need two FAQs?

2004-12-19 Thread Otis Gospodnetic

OK.
Is this _really_ what everyone's okay with?  I'm asking because once we
put the FAQ onto Wiki we:

1. no longer have authoritative FAQ - anyone can write to the FAQ
2. we have to monitor the Wiki FAQ for correctness
3. we have no user-friendly GUI/webapp for Q/A formatting, which makes
it more of a pain in the ass to contribute

If anyone has any strong opinions, please share them.  If I don't hear
anything, I'll (try to) get the jGuru FAQ XML dump and give it to
Daniel for 'wikification'.

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> On Tuesday 07 December 2004 02:23, Otis Gospodnetic wrote:
> 
> > jGuru can provide XML dump of a FAQ, and I believe I can obtain it,
> if
> > you want to use that to seed the Wiki FAQ.
> 
> Could you try to get that XML for me? I'll then semi-automatically
> import 
> it to the existing "FAQ" page in our wiki.
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: ANN: Lucene benchmark tool

2004-12-18 Thread Otis Gospodnetic

Improvements we can add to it later.  I just want to get it in the
(Sandbox) repository quickly, because it frustrates me to see people
put effort into contributing (patches) and then just having them rot in
Bugzilla.

Otis

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Otis Gospodnetic wrote:
> > Hi Andrzej,
> > 
> > Can we slap ASL 2.0 on top of this and put it in the Sandbox?
> 
> Yes, I'd appreciate it.
> 
> This is just the very first version, which certainly could use some 
> improvements...
> 
> -- 
> Best regards,
> Andrzej Bialecki
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: ANN: Lucene benchmark tool

2004-12-17 Thread Otis Gospodnetic

Hi Andrzej,

Can we slap ASL 2.0 on top of this and put it in the Sandbox?

Otis

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Hi there,
> 
> After recent discussions on the speed of indexing/searching using 
> different parameters it became even clearer that we need a
> comprehensive 
> and repeatable benchmark.
> 
> I created a class which represents my first hack at benchmarking
> various 
> aspects of Lucene, using a range of different parameters. Since it
> uses 
> a standard, well-defined document collection, I hope that its results
> 
> should be more or less meaningful across different OS/hardware
> combinations.
> 
> I had a look at JUnitPerf, but found the API to be too limited for 
> collecting complex time-series data, so I basically rolled my own 
> benchmarking framework... If you know a better way to do it, I'm all
> ears.
> 
> I'm going to package it into a self-running application (WebStart?),
> but 
> for now you can try to compile and run it yourself. You can get it
> here:
> 
>   http://www.getopt.org/lb/LuceneBenchmark.java
> 
> It depends on the commons-compress.jar, specifically on the Tar 
> functionality. This JAR is in commons-sandbox, so it may not be
> readily 
> available - in that case you can get it here:
> 
>   http://www.getopt.org/lb/commons-compress.jar
> 
> (I will put an index page there, but for now use these direct links).
> 
> CAVEAT: please NOTE WELL that this benchmark runs at 100% CPU and
> 100% 
> disk I/O for SEVERAL HOURS even on modern equipment (partial
> results 
> are printed on System.out from time to time). You have been warned -
> so 
> don't send me any fried mobo's or melted drives for repairs, ok?
> 
> You can cut down the number of input parameters to reduce the overall
> 
> time, or use the mini* document collection (but this reduces the
> number 
> of documents in index). See the comments in source.
> 
> Comments and patches are welcome!
> 
> -- 
> Best regards,
> Andrzej Bialecki
> 
> -
> Software Architect, System Integration Specialist
> CEN/ISSS EC Workshop, ECIMF project chair
> EU FP6 E-Commerce Expert/Evaluator
> -
> FreeBSD developer (http://www.freebsd.org)
> 
> 

Re: Deleting document in IndexWriter

2004-12-16 Thread Otis Gospodnetic

Hm, what happens if one of the other index-modifying operations (e.g.
optimize, addDocument, addIndexes) is invoked at the same time by
another thread that uses the same IndexWriter instance?

// close IndexWriter
// open IndexReader
// delete via IndexReader
// close IndexReader
// open and return new IndexWriter
public IndexWriter delete(int docNum) throws IOException {
  this.close();
  IndexReader ireader = IndexReader.open(directory);
  ireader.deleteIngoreLock(docNum);
  ireader.close();
  // reuse the same directory and analyzer; create=false keeps the existing index
  return new IndexWriter(directory, analyzer, false);
}

One ugly API perhaps (returning IndexWriter like this), but it doesn't
mess with (no)locks.  Oh, I didn't synchronize on anything.  Maybe on
directory?
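
A rough sketch of the "synchronize on directory" idea follows. It does not
use the Lucene API at all: GuardedIndex, its list of strings and the
writerOpen flag are hypothetical stand-ins, and the only point is that every
index-modifying call takes the same monitor, so a delete that closes and
reopens the writer can never interleave with an addDocument from another
thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index; NOT a Lucene class.
class GuardedIndex {
    // Plays the role of "synchronize on directory".
    private final Object directoryLock = new Object();
    private final List<String> docs = new ArrayList<String>();
    private boolean writerOpen = true;

    void addDocument(String doc) {
        synchronized (directoryLock) {
            if (!writerOpen) {
                throw new IllegalStateException("writer is closed");
            }
            docs.add(doc);
        }
    }

    void delete(int docNum) {
        synchronized (directoryLock) {
            writerOpen = false;             // close the writer
            try {
                docs.set(docNum, null);     // delete via the "reader"; numbering stays stable
            } finally {
                writerOpen = true;          // open a new writer
            }
        }
    }

    int liveDocs() {
        synchronized (directoryLock) {
            int live = 0;
            for (String doc : docs) {
                if (doc != null) {
                    live++;
                }
            }
            return live;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final GuardedIndex index = new GuardedIndex();
        index.addDocument("doc-0");
        Thread adder = new Thread(() -> {
            for (int i = 1; i <= 1000; i++) {
                index.addDocument("doc-" + i);
            }
        });
        Thread deleter = new Thread(() -> index.delete(0));
        adder.start();
        deleter.start();
        adder.join();
        deleter.join();
        System.out.println("live documents: " + index.liveDocs());
    }
}
```

Because the flag is only ever flipped while the monitor is held, the adder
thread never observes a closed writer.  The price is that every caller is
serialized behind the delete.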

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> the request to delete documents in IndexWriter instead of IndexReader
> comes 
> up regularly. What if we implement a delete() method in IndexWriter
> like 
> this:
> 
>   public synchronized void delete(int docNum) throws IOException {
>     IndexReader ireader = IndexReader.open(directory);
>     ireader.deleteIngoreLock(docNum);
>     ireader.close();
>   }
> 
> deleteIngoreLock would be a new method just like delete, but that
> doesn't 
> create a lock -- it uses the IndexWriter's lock which exists all the
> time. 
> Would this work and would this be safe?
> 
> I'm well aware that this can be slow, but it makes deleting documents
> so 
> much easier for many people who don't have huge indices. We could
> document 
> the fact that it's slow and use an array of document IDs to encourage
> 
> people to delete more than one document at once, so the overhead of 
> opening the reader becomes less.
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 

Re: potential new Lucene logo

2004-12-14 Thread Otis Gospodnetic

Hello Murray,

Thanks for doing this and contributing.  Changing the logo is not
urgent or even needed, but it looks like there are issues with it, so
it would be nice to fix it up the way you started.

The 3rd version with green oval looks nice to me.  I'd still close the
'e's and clean up the pale green colour between letters.  The black
font version seems too boring (I'm wearing all black today and I think
I made a boring choice this morning).  The green is better, but maybe
the oval ruins the 60s look (not sure, I was still just a concept in
the 60s).  It's sunny in NYC today, so I'm feeling bright, happy,
energetic and all that.  Maybe some stronger, happy colour that looks
good on white?

Thanks,
Otis



--- Murray Altheim <[EMAIL PROTECTED]> wrote:

> Erik Hatcher wrote:
> > A further logo request would be to chop it into "L", "u", and
> "cene" 
> > with some matching left and right arrows so we can put it on a web
> page 
> > like this:
> > 
> > < L u u u u u u cene >
> > 
> > for search results.
> 
> Erik et al,
> 
> Okay. I got terrifically bored fixing bugs and whipped up a new
> version that is based on Magneto Bold rather than Magneto Bold
> Extended. It's narrower and I've changed the way the letters run
> into each other, used the same hue as the existing logo for an
> underlining but changed the base letter colour to black, and also
> provided a meatball version, good for caps and T-shirts. The master
> is in cleaner-than-last-time SVG, there's a huge PNG built from
> that as a master (with no cleanup this time, it's just a better
> SVG to begin with), plus a sample:
> 
>http://www.altheim.com/murray/img/lucene-master-10pct.png
> 
> with the zip (400K) at
> 
>http://www.altheim.com/murray/img/lucene-logo.zip
> 
> With the SVG available it's easy to change sizes and/or colours,
> so it can be resized, turned into a negative, whatever.
> 
> And again, I give up rights to the group (just to make that public).
> 
> Murray
> 
>
..
> Murray Altheim   
> http://kmi.open.ac.uk/people/murray/
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK  
> .
> 
>   Empty handed, holding a hoe,
>   Walking, riding a water buffalo,
>   A man is crossing over a bridge;
>   The bridge, not the water, flows.
> 
>  -- Mahasattva Fu, The Blue Cliff Record [96]
> 

Re: potential new Lucene logo

2004-12-14 Thread Otis Gospodnetic

I agree with that.  I noticed the same with the very first logo
alternative.

Otis

--- mark harwood <[EMAIL PROTECTED]> wrote:

> I just tried closing the loop on the "e"s in the new
> logo and I think it looks a lot better for it - it
> looks a lot less like the "c"



Re: kick-start: Lucene to top-level project

2004-12-13 Thread Otis Gospodnetic


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 12, 2004, at 6:36 PM, Kevin A. Burton wrote:
> > Erik Hatcher wrote:
> >
> >>
> >> Nutch is a full-web crawler built, by Doug, using Lucene indexes 
> >> under the covers. Organizationally, Nutch is under a Apache 
> >> compatible license. I don't believe any committers, other than
> Doug, 
> >> overlap though. We would be bringing in a new set of committers on
> 
> >> the Nutch side of things.
> >
> > Not just Nutch but Heretrix too... sponsored by the Internet
> Archive...
> 
> I'm missing what your point is, Kevin.  Are you saying Heretrix is 
> interested in ASL'ing and incubating at Apache?  Or are you noting
> that 
> Heretrix is like Nutch?  Or...???

I'm wondering, too.  Has the Heretrix developer community expressed
interest in joining the ASF?

Otis



Re: Null Pointer Exception in clone method of TermVectorsReader class!

2004-12-13 Thread Otis Gospodnetic

Samir,

Can you reproduce this with the very latest Lucene from CVS?
If you can, could you write a little class that reproduces this
exception and stick it in Bugzilla, please?

I'm using Term Vectors on simpy.com, and I haven't had any issues with
them.
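
For anyone following along, here is a minimal sketch of the failure Samir
describes and one possible guard.  FakeInput and VectorsReaderSketch are
hypothetical stand-ins, not the Lucene classes, and returning null from
clone() is only one way to handle a segment without term vector files.

```java
// Hypothetical stand-ins; these are NOT the Lucene classes.
class FakeInput implements Cloneable {
    final String name;

    FakeInput(String name) {
        this.name = name;
    }

    @Override
    public FakeInput clone() {
        try {
            return (FakeInput) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e.toString());
        }
    }
}

class VectorsReaderSketch implements Cloneable {
    FakeInput tvx; // stays null when the segment has no term vector files

    VectorsReaderSketch(boolean filesExist) {
        if (filesExist) {
            tvx = new FakeInput("segment.tvx");
        }
    }

    @Override
    public VectorsReaderSketch clone() {
        if (tvx == null) {
            return null; // nothing was opened, so there is nothing to clone
        }
        try {
            VectorsReaderSketch copy = (VectorsReaderSketch) super.clone();
            copy.tvx = tvx.clone(); // safe: tvx is known to be non-null here
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(new VectorsReaderSketch(false).clone());
        System.out.println(new VectorsReaderSketch(true).clone().tvx.name);
    }
}
```

Without the null check, the first call in main would fail with the same kind
of NullPointerException as in the report.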

Thanks,
Otis


--- ABDOU Samir <[EMAIL PROTECTED]> wrote:

> 
> Hello everybody,
> 
> I got the null pointer exception in the method clone of the
> TermVectorsReader class. Exactly in the given statement:
> clone.tvx = (IndexInput) tvx.clone();
> 
> This happens when trying to index.
> 
> I think that is due to the constructor of that class. 
> 
>   TermVectorsReader(Directory d, String segment, FieldInfos fieldInfos)
>       throws IOException {
>     if (d.fileExists(segment + TermVectorsWriter.TVX_EXTENSION)) {
>       tvx = d.openInput(segment + TermVectorsWriter.TVX_EXTENSION);
>       checkValidFormat(tvx);
>       tvd = d.openInput(segment + TermVectorsWriter.TVD_EXTENSION);
>       tvdFormat = checkValidFormat(tvd);
>       tvf = d.openInput(segment + TermVectorsWriter.TVF_EXTENSION);
>       tvfFormat = checkValidFormat(tvf);
>       size = (int) tvx.length() / 8;
>     }
> ->  else {
> ->    System.out.println("The file " + segment + " doesn't exist!");
> ->  }
> 
>     this.fieldInfos = fieldInfos;
>   }
>   
> So my question: what happens when a segment doesn't exist!? I think
> that
> the exception is due to this because tvx may be null in this case!
> 
> Thanks
> Samir
> 
> 
> 
> 

Re: kick-start: Lucene to top-level project

2004-12-13 Thread Otis Gospodnetic

Regarding PMC members - I think I'm all alone there :(

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> 
> On Dec 11, 2004, at 9:54 AM, Bernhard Messer wrote:
> > Currently i'm not able to see how much administrative work this will
> 
> > raise. Are there any known apache projects already moved from a 
> > subproject to TLP ? Maybe they could give us some important hints
> what 
> > effort will come over the community.
> 
> I've started a thread on the Jakarta PMC list, and have gotten
> several 
> very helpful and supportive replies.
> 
> What other Lucene committers are PMC members?  Otis is.  Others?
> 
> Sam Ruby had the following advice:
> 
> >> The board has a bias against self-referential definitions (the
> 'foo' 
> >> project is for managing software related to the 'foo' project - 
> >> believe it or not, this happens all too often).  So naming the 
> >> proposed project something like search is a good idea.
> >>
> >> The board tends to prefer projects whose scopes don't overlap with
> 
> >> other projects.  That does not appear to be a problem here.  In
> other 
> >> cases, this involves highlighting differences in technical
> approaches 
> >> taken by two "competing" projects.
> >>
> >> Finally, some members of the board have a strong bias against 
> >> umbrella projects.  My advice here is to not emphasize this aspect
> of 
> >> the proposal.  Overall, the fact that this reduces the size of the
> 
> >> Jakarta project umbrella, those with this bias will be happy.
> 
> In other words, I guess we should de-emphasize the idea of bringing
> Nutch 
> under the umbrella, but focus on how Lucene itself belongs outside 
> Jakarta because it is its own community and does not depend on any 
> Jakarta projects, and that we will bring in non-Java ports over time.
> 
> Next step is to create the proposal, following in the footsteps of 
> Jetspeed or Struts for example.  If no one else volunteers for it,
> I'll 
> craft it sometime this week hopefully.
> 
>   Erik
> 
> 

Re: kick-start: Lucene to top-level project

2004-12-10 Thread Otis Gospodnetic

Yes, it uses the normal Lucene packages as a Jar file.

There are not a lot of places where Nutch uses Lucene in interesting
new ways, but there are some.  Actually, the Nutch case study in Lucene
in Action may reveal some.  For instance, Nutch has a simpler
QueryParser.  It also has a neat way of dealing with common terms and
terms that frequently come together.  But you really ought to browse
the Nutch source for that.

Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:

> Erik/Otis,
> 
> Your answers to the organizational question makes a lot of sense, and
> I therefore, for one, support what you're proposing.
> 
> Regarding Nutch, I gather it uses the standard Lucene library?  I
> keep seeing references on this (or maybe it's the users') list
> suggesting people might find some answers to some of their questions
> by examining the Nutch code.  Might there be some way to formally
> cross-reference relevant code in the part of Nutch that uses Lucene
> in clever/useful ways?
> 
> Regards,
> 
> Terry
>   - Original Message - 
>   From: Erik Hatcher 
>   To: Lucene Developers List 
>   Sent: Friday, December 10, 2004 3:49 PM
>   Subject: Re: kick-start: Lucene to top-level project
> 
> 
>   On Dec 10, 2004, at 3:27 PM, Terry Steichen wrote:
>   > Forgive me for asking a stupid question, but why?
> 
>   Ah, excellent question that I should have addressed in my first
> message.
> 
>   Bringing Lucene under a top-level project would allow us to
> eventually 
>   bring the other Lucene ports that choose to Apache Software License
> 
>   their code under the same umbrella.  This would allow us to more
> easily 
>   create compatibility test suites to ensure version compatibility
> (or at 
>   least identify incompatibilities clearly).  With the new projects 
>   coming in, we could bring in new committers and partition the 
>   repositories in finer-grained ways.
> 
>   >   Sounds like a fair amount of work is involved.
> 
>   It will be a fair amount of work, mostly administrative.
> 
>   > PS: Heck, just to prove it's probably a dumb question, I still
> don't 
>   > even understand either the technical or organizational
> relationship(s) 
>   > between Lucene and Nutch.
> 
>   Nutch is a full-web crawler built, by Doug, using Lucene indexes
> under 
>   the covers.  Organizationally, Nutch is under a Apache compatible 
>   license.  I don't believe any committers, other than Doug, overlap 
>   though.  We would be bringing in a new set of committers on the
> Nutch 
>   side of things.
> 
>   Erik
> 
> 
>  

Re: kick-start: Lucene to top-level project

2004-12-10 Thread Otis Gospodnetic

Terry (and anyone else who's wondering).

Erik didn't mention it, but we would also like to bring in various
Lucene ports under a single roof.  This has many advantages, and I
won't get into those now.

Regarding Nutch and Lucene - Nutch is a whole 'solution' for the 'how
do I crawl the web/intranet/set of web sites and make it searchable a
la Google or any other search engine' problem.  Lucene is just one of
the components there - just the piece that handles indexing and then
searching.  There are also pieces that crawl the web, schedule links
for (re)crawling, do link analysis and page ranking, parse different
document types, etc.

It would be nice to see all these closely related projects together.
 Lucene
 Nutch
 dotLucene
 CLucene
 PyLucene
 Plucene
 Lupy

We'll have to talk to communities and developers behind each of the
Lucene ports above and see if they are willing to participate.  Some
already expressed their interest on lucene-*, as well as directly with
me.

Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:

> Forgive me for asking a stupid question, but why?  Sounds like a fair
> amount of work is involved.  What is the benefit of making Lucene a
> "top-level Apache project"?  
> 
> Regards,
> 
> Terry
> 
> PS: Heck, just to prove it's probably a dumb question, I still don't
> even understand either the technical or organizational
> relationship(s) between Lucene and Nutch.
> 
>   - Original Message - 
>   From: Erik Hatcher 
>   To: Lucene List 
>   Sent: Friday, December 10, 2004 3:18 PM
>   Subject: kick-start: Lucene to top-level project
> 
> 
>   The idea of creating a new top-level Apache project, namely 
>   search.apache.org, has been floating around for a while.  The idea
> is 
>   to bring Lucene under this new umbrella, along with incubating
> Nutch 
>   through the standard Apache incubation process with it aimed at
> coming 
>   under this same top-level project too.
> 
>   This message is to kick-start this process in a more formal manner.
> 
>   Creating a new top-level project to house Lucene, and the effort to
> 
>   incubate Nutch can occur in parallel, I believe.  Nutch will follow
> the 
>   path outlined here:
> 
>   http://incubator.apache.org/howtoparticipate.html
> 
>   The creation of the top-level project has been somewhat outlined
> here:
> 
>   http://wiki.apache.org/jakarta/JakartaPMCTopLevelProjectApplication
> 
>   Most of the effort is administrative, such as selecting a Chair,
> PMC 
>   members, and putting together a plan of action.  Along with this
> move, 
>   we should also consider switching to Subversion and migrating from 
>   Bugzilla to JIRA.
> 
>   Are there any folks opposed to these plans?
> 
>   I personally am happy to champion this effort and I will do what I
> can 
>   within the time constraints of the rest of my life.  What we need
> are 
>   other volunteers willing to assist with these efforts.  First we
> need 
>   to write the proposals, choose the initial PMC members and Chair
> (who I 
>   assume should be his majesty Doug, if he's willing to be in that
> role), 
>   and deal with the flurry of e-mails on the Jakarta PMC e-mail list.
> 
>   More technical effort will be needed to manage the publishing of
> the 
>   website - it would be nice to overhaul the Lucene site - manage the
> 
>   code repositories, and automate the bulk of the release process so
> that 
>   we can simply click and release.
> 
>   What tasks have I overlooked?
> 
>   Discussion?  Suggestions?
> 
>   Erik
> 
> 
>  

Re: two versioning problems with Lucene

2004-12-09 Thread Otis Gospodnetic

Hello,

--- Bill Janssen <[EMAIL PROTECTED]> wrote:

> > To address the issue Bill just brought up, I refer you to the 
> > documentation of the Ant <jar> task.  Check out the filesetmanifest
> 
> > attribute options:
> > 
> > http://ant.apache.org/manual/CoreTasks/jar.html
> > 
> > I have not yet tried this relatively new (as of Ant 1.6, since we 
> > didn't write about it in Java Development with Ant), but it looks
> like 
> > it addresses the concern of repackaging and keeping the manifest 
> > version information from being lost.
> 
> That's fine if you're using Ant to build, but lots of folks don't.
> 
> Bill

Somebody told my wife the other day that she could work as a freelance
editor.  Editing what, I asked.  Editing university students' papers,
the person said.  It pays alright.  People do it, but both my wife and
I agree it's completely wrong, so she's not even considering that.

Maybe this wasn't as good an analogy as I had initially thought, but
just because people use Makefiles to repackage Jars and do it
incorrectly, or in a way that makes them lose provided information,
doesn't mean that we have to account for them.  In my mind this is
similar to supporting broken HTML, when there are clearly defined and
well-known standards.

Of course, this is only my opinion, and I don't expect everyone to
agree - how boring would that be! :)

Otis


Re: two versioning problems with Lucene

2004-12-07 Thread Otis Gospodnetic

I don't think we need to support cases where people open a Jar - I've
never seen a case where this was needed.  Expanded WARs and EARs I've
seen, but not Jars.  I'm for relying on the information in MANIFEST.MF,
and there must be tools for parsing that.  If I needed Lucene's version
number, I'd get such a tool and use it, instead of trying to come up
with a custom scheme.  Any such custom hack requires maintenance, even
if it's auto-generated.
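
To show that no custom scheme is needed, here is a small sketch that uses
only java.util.jar from the JDK.  It builds a manifest in memory and reads
the Implementation-Version back; the title "example-library" and the version
string are made-up values, and for a real jar the Manifest would come from
JarFile.getManifest() instead of a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestVersionSketch {
    // Serializes a manifest that carries the given Implementation-Version.
    static byte[] buildManifest(String version) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise write() emits no attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.IMPLEMENTATION_TITLE, "example-library");
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parses manifest bytes and returns the Implementation-Version, or null if absent.
    static String readVersion(byte[] manifestBytes) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(manifestBytes));
            return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readVersion(buildManifest("1.4.2")));
    }
}
```

When the classes are loaded from the original jar,
SomeClass.class.getPackage().getImplementationVersion() reads the same
attribute without any parsing code.  That lookup is exactly what breaks if
somebody repackages the classes and drops the manifest.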

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 7, 2004, at 7:42 PM, Bill Janssen wrote:
> > So if we keep the Lucene version in only the packaging of the jar
> > file, we have a source of end-user error and fragility in two ways:
> > (1) the manifest file may not be available (the class files may be
> > re-packaged in another app which didn't know to copy the Lucene
> > manifest stuff, or unpacked)
> 
> I'd like to hear others weigh in on this repackaging issue.  Is this
> a 
> common practice?
> 
> Supporting users that repackage the JAR and potentially introduce 
> incompatibilities will not be fun, and if someone reports they are 
> running Lucene 1.5.3 I'd like to be sure I know exactly what that 
> means.  Having a Java class that contains the version information
> seems 
> brittle to me, in that someone could repackage improperly.
> 
> JAR manifests, while certainly not leveraged this way by most, were 
> designed to contain versioning information.
> 
> > package org.apache.lucene;
> > public class VERSION {
> >  // ..
> > }
> >
> > Why make life tough on users?
> 
> I'm merely discussing the options.  We've had the version information
> 
> in the manifest already and was wondering why that isn't good enough.
>  
> You've certainly given some reasons why you feel it is not good
> enough.
> 
> What do others think?
> 
>   Erik
> 
> 

Re: do we need two FAQs?

2004-12-06 Thread Otis Gospodnetic

Less is more.  One FAQ is better than two.  Although I maintain the
jGuru FAQ, I don't have any issues with taking the Wiki approach
(although I'm not a big fan of Wikis, because they require me to learn
yet another made-up markup language) and going with a community-supplied
FAQ.

jGuru can provide an XML dump of a FAQ, and I believe I can obtain it, if
you want to use that to seed the Wiki FAQ.  But let's hear some more
opinions before doing anything.  Maybe you should ask users on
lucene-user, since that's where the people who depend on the Lucene FAQ
are.

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> there are currently two FAQs for Lucene:
> 
> http://www.jguru.com/faq/Lucene
> http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
> 
> To my mind that leads to redundancy and decreases the motivation to
> update 
> at least one of them. As the jguru FAQ is full of ads and more
> difficult 
> to navigate my suggestion is:
> 
> -Don't link the jguru FAQ anymore and then take it offline completely
> (so 
> that people don't find it via Google and think it's up-to-date)
> -Copy missing items to the FAQ at sourceforge 
> -Make sure that all committers get write access to that FAQ
> -Clean up and update the FAQ
> 
> Does anybody see a problem with that?
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 

Re: missing values in systemproperties.html

2004-12-02 Thread Otis Gospodnetic

I actually wouldn't document those system properties.  I have never
heard of anyone using them, and I am not even sure if using them would
work, because of the class cast on line 116 in FSDirectory for example.

Hm, but I see this is a recent addition from Doug... oh, it must be
related to his GCJ-based FSDirectory implementation, which I'm sure he
tested... so scratch the above paragraph and document all those
properties.
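
The pattern behind these properties, as far as the quoted getProperty calls
show, is "read a class name from a system property, fall back to the default
class, instantiate it and cast".  The sketch below shows that pattern with
made-up names: Store, DefaultStore, AltStore and the property
example.Store.class are all hypothetical, and it checks the type before
casting so a wrong class name fails with a readable message instead of a
ClassCastException.

```java
class PropertyDrivenFactory {
    // Hypothetical stand-in for the type being made pluggable.
    interface Store {
        String describe();
    }

    static class DefaultStore implements Store {
        public String describe() {
            return "default";
        }
    }

    static class AltStore implements Store {
        public String describe() {
            return "alternative";
        }
    }

    // Hypothetical property name, used only for this sketch.
    static final String PROP = "example.Store.class";

    static Store create() {
        // Fall back to the default implementation when the property is not set.
        String name = System.getProperty(PROP, DefaultStore.class.getName());
        try {
            Object instance = Class.forName(name).getDeclaredConstructor().newInstance();
            if (!(instance instanceof Store)) {
                throw new IllegalArgumentException(name + " is not a Store");
            }
            return (Store) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + name + ": " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create().describe());
        System.setProperty(PROP, AltStore.class.getName());
        System.out.println(create().describe());
        System.clearProperty(PROP);
    }
}
```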

http://wiki.apache.org/jakarta-lucene/ModifyingExecutionParameters

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Otis,
> 
> >I'm not sure which properties you are talking about.  As far as I
> can
> >tell, systemproperties.html covers all properties.  Please add
> anything
> >that you see missing. 
> >
> for example:
> System.getProperty("org.apache.lucene.FSDirectory.class", 
> FSDirectory.class.getName()); in FSDirectory
> System.getProperty("org.apache.lucene.SegmentReader.class", 
> SegmentReader.class.getName()); in SegmentReader
> 
> i will add the missing one to systemproperties.html
> 
> >Actually, I think there is a Wiki page with
> >system properties, so we should probably update just that page, and
> >remove the one under xdocs.
> >  
> >
> do you know where i can find the wiki page. i searched for it but 
> without success.
> 
> thanks
> Bernhard
> 
> 
> 

Re: missing values in systemproperties.html

2004-12-01 Thread Otis Gospodnetic

Hello Bernhard (2+ weeks old email).

I'm not sure which properties you are talking about.  As far as I can
tell, systemproperties.html covers all properties.  Please add anything
that you see missing.  Actually, I think there is a Wiki page with
system properties, so we should probably update just that page, and
remove the one under xdocs.

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> hi,
> 
> is there any reason why the following 3 properties are not documented
> in 
> systemproperties.html, or is it just a lack in the documentation ?
> 
> org.apache.lucene.SegmentReader.class (SegmentReader.java)
> org.apache.lucene.FSDirectory.class (FSDirectory.java)
> line.separator (QueryParser.java)
> 
> thanks
> Bernhard
> 
> 
> 

Re: cvs commit: jakarta-lucene/xdocs systemproperties.xml

2004-11-29 Thread Otis Gospodnetic

Wouldn't it be better to change the logdir system property in
IndexWriter to logDir, so that its casing follows the existing casing
pattern used in other Lucene system properties?  Having this one
exception (logdir vs. logDir) will make a lot of people double-check the
docs or mistakenly choose logDir, when we changed it to logdir.
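
The reason casing matters: system property keys are case-sensitive, so a
value set under one spelling is simply invisible under the other.  A tiny
demonstration, using the two spellings from the quoted diff and a made-up
directory value:

```java
class PropertyCaseSketch {
    static boolean isSet(String key) {
        return System.getProperty(key) != null;
    }

    public static void main(String[] args) {
        // The directory value is made up for the example.
        System.setProperty("org.apache.lucene.lockdir", "/tmp/lucene-locks");
        System.out.println(isSet("org.apache.lucene.lockdir")); // same spelling as set above
        System.out.println(isSet("org.apache.lucene.lockDir")); // different casing, different key
        System.clearProperty("org.apache.lucene.lockdir");
    }
}
```

A user who passes -D with the wrong casing gets no error at all, just the
default value, which is why one odd property out is easy to trip over.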

Otis

--- [EMAIL PROTECTED] wrote:

> bmesser 2004/11/29 13:09:41
> 
>   Modified:docs systemproperties.html
>xdocssystemproperties.xml
>   Log:
>   small typo fix
>   PR:32432
>   Reviewed by:Bernhard Messer
>   
>   Revision  ChangesPath
>   1.8   +2 -2  jakarta-lucene/docs/systemproperties.html
>   
>   Index: systemproperties.html
>   ===
>   RCS file: /home/cvs/jakarta-lucene/docs/systemproperties.html,v
>   retrieving revision 1.7
>   retrieving revision 1.8
>   diff -u -r1.7 -r1.8
>   --- systemproperties.html   29 Nov 2004 13:34:56 -  1.7
>   +++ systemproperties.html   29 Nov 2004 21:09:41 -  1.8
>   @@ -244,10 +244,10 @@
>
>
>
>   -
href="api/org/apache/lucene/store/FSDirectory.html#lockDir">lockDir
>   +
href="api/org/apache/lucene/store/FSDirectory.html#lockdir">lockdir
>
>
>   -org.apache.lucene.lockDir
>   +org.apache.lucene.lockdir
>
>
>the value of
> java.io.tmpdir system property
>   
>   
>   
>   1.3   +2 -2  jakarta-lucene/xdocs/systemproperties.xml
>   
>   Index: systemproperties.xml
>   ===
>   RCS file: /home/cvs/jakarta-lucene/xdocs/systemproperties.xml,v
>   retrieving revision 1.2
>   retrieving revision 1.3
>   diff -u -r1.2 -r1.3
>   --- systemproperties.xml21 May 2004 10:46:13 -  1.2
>   +++ systemproperties.xml29 Nov 2004 21:09:41 -  1.3
>   @@ -103,10 +103,10 @@
>
>
>
>   -
href="api/org/apache/lucene/store/FSDirectory.html#lockDir">lockDir
>   +
href="api/org/apache/lucene/store/FSDirectory.html#lockdir">lockdir
>
>
>   -org.apache.lucene.lockDir
>   +org.apache.lucene.lockdir
>
>
>the value of
> java.io.tmpdir system property
>   
>   
>   
> 

Lucene in Action reviewer needed for TheServerSide.com

2004-11-24 Thread Otis Gospodnetic

Hello,

Lucene in Action is about to be released, with the ebook version
expected next week and the print version in mid-December (lots of happy
grandmas).

Erik and I are looking for somebody interested in reading the book and
writing a 1-2 page review for TheServerSide.com.  You can look for
"book review" using TSS's search (guess what they use for searching) to
find previously published book reviews.

If you are interested, please reply directly.

Thanks,
Otis




Re: jdk 1.3 versus jdk 1.4

2004-11-16 Thread Otis Gospodnetic

Go ahead and do it - we've made the same changes in other places in the
code for the same reason before.
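
For reference, the change looks roughly like the sketch below.  The class and
method names are made up; the point is only that the
RuntimeException(Throwable) constructor arrived with JDK 1.4, while the
String constructor has always been there.  The sketch itself is written for a
current compiler (it uses generics), so only the throw statement is the
1.3-friendly part.

```java
class Jdk13CompatibleRethrow {
    // Hypothetical helper: loads a class and converts the checked exception.
    static Class<?> load(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // new RuntimeException(e) needs JDK 1.4 or later;
            // the String constructor works on older JDKs as well.
            throw new RuntimeException(e.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(load("java.lang.String").getName());
        try {
            load("no.such.Class");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off is that the original stack trace is no longer chained as the
cause; only the exception's class name and message survive in the text.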

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> since the last changes in lucene, we are no longer backward
> compatible 
> with jdk 1.3. All the pure guys, running IBM WebSphere 4.x with IBM
> JDK 
> 1.3, lost their chances to run lucene newer than version 1.4.2. 
> Especially in huge companies, where it is not so trivial to upgrade
> to a 
> new java version, this could reduce the acceptance for lucene
> 
> There are two major reasons for losing the compatibility:
>  - the new MMapDirectory class
>  - several code parts like:
> ...
> catch (ClassNotFoundException e) {
>   throw new RuntimeException(e);
> }
> ...
> 
> I think we simply can ignore the first one because MMapDirectory is 
> optional anyway. This is an acceptable price for using outdated
> software ;-)
> The second problem could be solved easily using the string
> constructor 
> in java.lang.RuntimeException which is available since 1.0.
> At least we have to document it somehow. There is a chapter "What are
> 
> Lucene system requirements" in the faq. Is this an ideal place to 
> document it.
> 
> I'd like to make the changes for being backward compatible as far as 
> possible. Does anybody disagree ?
> 
> Bernhard
> 
> 
> 

Re: the future of MultiFieldQueryParser

2004-11-14 Thread Otis Gospodnetic

I would like that change, as I don't always like queries that MFQP
creates.
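
To make the difference concrete, here is a sketch that only builds query
strings; it does not call MultiFieldQueryParser or any other Lucene class,
and the field and term names are made up.  perField mirrors the behaviour
Daniel describes (every term required in every field), perTerm the behaviour
one usually wants (every term required in at least one field).

```java
class MultiFieldExpansionSketch {
    // Every term must match in every field.
    static String perField(String[] fields, String[] terms) {
        StringBuilder sb = new StringBuilder();
        for (int f = 0; f < fields.length; f++) {
            if (f > 0) {
                sb.append(" AND ");
            }
            sb.append('(');
            for (int t = 0; t < terms.length; t++) {
                if (t > 0) {
                    sb.append(" AND ");
                }
                sb.append(fields[f]).append(':').append(terms[t]);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    // Every term must match in at least one of the fields.
    static String perTerm(String[] fields, String[] terms) {
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < terms.length; t++) {
            if (t > 0) {
                sb.append(" AND ");
            }
            sb.append('(');
            for (int f = 0; f < fields.length; f++) {
                if (f > 0) {
                    sb.append(" OR ");
                }
                sb.append(fields[f]).append(':').append(terms[t]);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "body"};
        String[] terms = {"lucene", "index"};
        System.out.println(perField(fields, terms));
        System.out.println(perTerm(fields, terms));
    }
}
```

With the first form, a document with "lucene" in the title and "index" only
in the body does not match; with the second it does.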

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'd like to fix MultiFieldQueryParser so that it properly works with
> AND 
> queries. Currently it rewrites AND queries so that all terms must
> appear 
> in all fields, which rarely makes sense.
> 
> Eric Jain suggested a new class that works for AND and OR queries:
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1798116
> 
> It seems that his code can just be added to the current 
> MultiFieldQueryParser class. The current static calls can then all be
> 
> deprecated (once some feature like setting required/prohibited per
> field 
> have been added to the new code).
> 
> Does anybody see a problem with that?
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 

Re: Tswana analyzer

2004-11-11 Thread Otis Gospodnetic

And to contribute your Analyzer, please stick it in Lucene's Bugzilla.

Thanks,
Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> 
> On Nov 11, 2004, at 5:45 AM, Laurie Butgereit wrote:
> > We have developed a Tswana analyzer for Lucene.  What
> > do people normally do?  Submit to the sandbox or
> > create an independent project somewhere (like
> > perhaps sourceforge) to release their analyzers?
> 
> I have not heard of any Tswana analyzer, so you're likely to be the 
> pioneer.  You're welcome to contribute any work you do to the Lucene 
> Sandbox, provided you license it with the Apache Software License
> 2.0.  
> Hosting it elsewhere is fine too.
> 
>   Erik
> 
> 

Re: Pluggable Lock Framework - how to submit code enhancement?

2004-11-06 Thread Otis Gospodnetic

Official Hi Jeff,

That sounds like something many users would appreciate.  Bugzilla is
the best way to submit your code.  Open a new 'bug' in Bugzilla, prefix
the summary line with [PATCH], and then attach your code.  This will
have to be a 2-step process: 1) open a bug, 2) attach code.  That's
just how Bugzilla works.

Otis

--- Jeff Patterson <[EMAIL PROTECTED]> wrote:

> I've been lurking here as a watcher for a while, but
> this is my first post - make this an official "hi!"
> 
> We use Lucene at work on a HA system across 2
> machines with multiple JVMs accessing the same
> NFS-mounted index directory.  We overcame the
> NFS locking deficiencies in Lucene by wrapping
> the Lucene API calls in a home-grown database
> locking mechanism.  I have since hooked up to
> the CVS tree for the 1.5 candidate and have
> built in to the codebase a pluggable Lock
> Override framework allowing a user to build their
> own locking mechanism (if you don't, it defaults
> to the current filesystem Lock).
> 
> This framework seems like it would be beneficial
> to the larger community.  What is the best way
> for me to get the changes incorporated in to the
> next release?  I slightly modified:
> 
>   org.apache.lucene.store.FSDirectory
> 
> and added one new small class:
> 
>   org.apache.lucene.store.LockFactory
> 
> Additional tweaks would be advised around
> my changes, but I think they would be minor.
> 
> Please advise on proper submission protocol. Is it
> Bugzilla, to this list, or other?
> 
> Thanks - Jeff
> 

Re: Question about PorterStemFilter class

2004-10-29 Thread Otis Gospodnetic

Hello Murray,

You should open a bug entry in Bugzilla and then attach your code to
it, with ASL on top.

Thanks,
Otis

--- Murray Altheim <[EMAIL PROTECTED]> wrote:

> Erik Hatcher wrote:
> > On Oct 29, 2004, at 4:04 AM, PROYECTA.Fernandez Garcia, Ivan wrote:
> > 
> >>We are using it in our Analyzer class and we have the following
> >>questions:
> >>1º Why does it change the 'y' character to 'i' when using the parser
> >>method?
> >>For instance: study -> studi
> > 
> > 
> > That's what stemmers do.  This allows queries for "study" and
> "studies" 
> > to match the same documents, for example.
> > 
> > 
> >>2º In our case, Lucene finds 50 hits but only the first one
> >>is shown.
> >>If I comment out new PorterStemFilter(ts) in our Analyzer
> >>class, all 50 hits are shown. Why?
> > 
> > You haven't provided enough information.   Please provide a simple 
> > short example that shows one document (that currently does not get 
> > found) being indexed along with the code for your analyzer, along
> with 
> > a sample query that should match but doesn't.
> 
> Erik,
> 
> I just this week joined the mailing list, and on this topic thought
> I'd mention that I've rewritten the PorterStemmer Java class,
> cleaning
> up whitespace and predeclaring all the Strings for better
> performance.
> It passes the file-in file-out test provided by Martin Porter (iow,
> no change from the existing algorithm). The source for mine was taken
> from his site -- I'm not sure of the origin of the one in Lucene. I
> could also add an Apache license to the top.
> 
> What would I need to do to contribute this file? Just fill out the
> ASF IP form and then commit the file in CVS?
> 
> Thanks,
> 
> Murray
> 
>
..
> Murray Altheim   
> http://kmi.open.ac.uk/people/murray/
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK  
> .
> 
> [International terrorism] is a fantasy that has been exaggerated
> and distorted by politicians. It is a dark illusion that has
> spread unquestioned through governments around the world, the
> security services, and the international media. In an age when
> all the grand ideas have lost credibility, fear of a phantom
> enemy is all the politicians have left to maintain their power."
> 
> The making of the terror myth, The Guardian
> http://www.guardian.co.uk/terrorism/story/0,12780,1327904,00.html
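
On the "study -> studi" question quoted above, here is a deliberately partial sketch of why "study" and "studies" end up as the same token. It covers only two rules of the Porter algorithm (the "ies" -> "i" part of step 1a and the "y" -> "i" rule of step 1c) and uses a simplified vowel test. The class name is hypothetical; this is not Lucene's PorterStemmer.

```java
/** Hypothetical, partial illustration of two Porter rules; not Lucene's PorterStemmer. */
class PorterStepSketch {

    // Simplified vowel test: a, e, i, o, u, plus 'y' when it follows a non-vowel letter.
    static boolean hasVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("aeiou".indexOf(c) >= 0) {
                return true;
            }
            if (c == 'y' && i > 0 && "aeiou".indexOf(s.charAt(i - 1)) < 0) {
                return true;
            }
        }
        return false;
    }

    static String stem(String word) {
        // Step 1a (one rule only): "ies" -> "i", e.g. "studies" -> "studi".
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2);
        }
        // Step 1c: a trailing "y" becomes "i" when the rest of the word contains a vowel.
        if (word.endsWith("y")) {
            String rest = word.substring(0, word.length() - 1);
            if (hasVowel(rest)) {
                return rest + "i";
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stem("study"));
        System.out.println(stem("studies"));
    }
}
```

Because both forms reduce to the same stem at index time and at query time, a search for either one matches documents containing the other.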
> 

Re: typo in javadoc

2004-10-22 Thread Otis Gospodnetic

Good eye, fixed.

Otis

--- Paul <[EMAIL PROTECTED]> wrote:

> Class Term, method compareTo:
> "Compares two terms, returning an integer which is less than zero iff
> this term belongs after the argument, (...) and greater than zero iff
> this term belongs after the argument."
> 
> two times "after the argument" :)
> 
> Paul
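
For reference, a small sketch of the compareTo contract the javadoc was meant to state: negative when this term belongs before the argument, zero when equal, positive when it belongs after. It assumes terms are ordered first by field and then by text; FieldText is a hypothetical stand-in, not Lucene's Term class.

```java
/** Hypothetical stand-in for a (field, text) pair; not Lucene's Term. */
class TermOrderSketch {

    static final class FieldText implements Comparable<FieldText> {
        final String field;
        final String text;

        FieldText(String field, String text) {
            this.field = field;
            this.text = text;
        }

        // Negative: this belongs before other. Zero: same field and text.
        // Positive: this belongs after other.
        @Override
        public int compareTo(FieldText other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    public static void main(String[] args) {
        FieldText a = new FieldText("body", "zebra");
        FieldText b = new FieldText("title", "apple");
        System.out.println(a.compareTo(b) < 0);
    }
}
```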
> 

Re: Admit a new Lucene Committer

2004-10-22 Thread Otis Gospodnetic

Bernhard has to fax in the CLA.
Only after that do you need to email [EMAIL PROTECTED] and [EMAIL PROTECTED], with 2
usernames for Bernhard (one backup), his email address, the link to
voting thread, and I think that's it.  Somebody will email Bernhard
when it's all done (a few days).

I did this recently for Dave Spencer and he is now a happy
lucene-sandbox committer.

Otis


--- Christoph Goller <[EMAIL PROTECTED]> wrote:

> Vadim Gritsenko schrieb:
> 
> > Christoph Goller wrote:
> >
> >> I nominated Bernhard Messer as Lucene committer.
> >>
> >> He got a +1 from me, Doug Cutting, Daniel Naber,
> >> Erik Hatcher, and Otis Gospodnetic (all of them Lucene committers).
> >> There was no 0 or -1 vote.
> >>
> >> I don't know how to include a whole thread.
> >
> >
> > http://marc.theaimsgroup.com/?t=10981174214&r=1
> >
> > Vadim
> >
> >
> >> So if
> >> you want to check these votes, please have a look at Lucene
> >> developer mailing list around 2004-10-18
> >>
> >> Bernhard needs access to jakarta-lucene and
> jakarta-lucene-sandbox.
> >>
> >> kind regards,
> >> Christoph Goller
> >
> 
> Thanks,
> and I forgot to include Bernhard's email address:
> [EMAIL PROTECTED]
> 
> Am I supposed to do anything else or is someone from the PMC
> taking the next steps?
> 
> Christoph
> 



RE: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2004-10-21 Thread Otis Gospodnetic

Chuck - if you provide patches and post them to Bugzilla, one of the
developers will try them locally and either provide feedback or merge
your changes into CVS.

Otis

--- Chuck Williams <[EMAIL PROTECTED]> wrote:

> Thanks Otis.  Other than trying to get some consensus a) that this is a
> problem worth fixing, and b) on the best approach to fix it, my central
> question is, if I fix it is it likely to get incorporated back into
> Lucene?  I don't want to deviate from Lucene sources, especially with so
> many classes, and so would like to address this only if there is a
> process to evaluate the changes and incorporate them back into Lucene if
> they provide the improvement I believe they will.
> 
> Thanks for any guidance on that,
> 
> Chuck
> 
>   > -Original Message-
>   > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
>   > Sent: Thursday, October 21, 2004 10:06 AM
>   > To: Lucene Developers List
>   > Subject: Re: Normalized Scoring -- was RE: idf and explain(), was Re:
>   > Search and Scoring
>   > 
>   > Hi Chuck,
>   > 
>   > The relative lack of responses is not because there is no interest, but
>   > probably because there are only a few people on lucene-dev who can
>   > follow/understand every detail of your proposal.  I understand and hear
>   > you, but I have a hard time 'visualizing' some of the formulas in your
>   > proposal.  What you are saying sounds right to me, but I don't have
>   > enough theoretical knowledge to go one way or the other.
>   > 
>   > Otis
>   > 
>   > 
>   > --- Chuck Williams <[EMAIL PROTECTED]> wrote:
>   > 
>   > > Hi everybody,
>   > >
>   > > Although there doesn't seem to be much interest in this I have one
>   > > significant improvement to the below and a couple observations that
>   > > clarify the situation.
>   > >
>   > > To illustrate the problem better normalization is intended to address,
>   > > in my current application for BooleanQuery's of two terms, I frequently
>   > > get a top score of 1.0 when only one of the terms is matched.  1.0
>   > > should indicate a "perfect match".  I'd like to set my UI up to present
>   > > the results differently depending on how good the different results are
>   > > (e.g., showing a visual indication of result quality, dropping the
>   > > really bad results entirely, and segregating the good results from other
>   > > only vaguely relevant results).  To build this kind of "intelligence"
>   > > into the UI, I certainly need to know whether my top result matched all
>   > > query terms, or only half of them.  I'd like to have the score tell me
>   > > directly how good the matches are.  The current normalization does not
>   > > achieve this.
>   > >
>   > > The intent of a new normalization scheme is to preserve the current
>   > > scoring in the following sense:  the ratio of the nth result's score to
>   > > the best result's score remains the same.  Therefore, the only question
>   > > is what normalization factor to apply to all scores.  This reduces to a
>   > > very specific question that determines the entire normalization -- what
>   > > should the score of the best matching result be?
>   > >
>   > > The mechanism below has this property, i.e. it keeps the current score
>   > > ratios, except that I removed one idf term for reasons covered earlier
>   > > (this better reflects the current empirically best scoring algorithms).
>   > > However, removing an idf is a completely separate issue.  The improved
>   > > normalization is independent of whether or not that change is made.
>   > >
>   > > For the central question of what the top score should be, upon
>   > > reflection, I don't like the definition below.  It defined the top score
>   > > as (approximately) the percentage of query terms matched by the top
>   > > scoring result.  A better conceptual definition is to use a weighted
>   > > average based on the boosts.  I.e., downward propagate all boosts to
>   > > t

Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2004-10-21 Thread Otis Gospodnetic

Hi Chuck,

The relative lack of responses is not because there is no interest, but
probably because there are only a few people on lucene-dev who can
follow/understand every detail of your proposal.  I understand and hear
you, but I have a hard time 'visualizing' some of the formulas in your
proposal.  What you are saying sounds right to me, but I don't have
enough theoretical knowledge to go one way or the other.

Otis


--- Chuck Williams <[EMAIL PROTECTED]> wrote:

> Hi everybody,
> 
> Although there doesn't seem to be much interest in this I have one
> significant improvement to the below and a couple observations that
> clarify the situation.
> 
> To illustrate the problem better normalization is intended to
> address,
> in my current application for BooleanQuery's of two terms, I
> frequently
> get a top score of 1.0 when only one of the terms is matched.  1.0
> should indicate a "perfect match".  I'd like to set my UI up to present
> the
> results differently depending on how good the different results are
> (e.g., showing a visual indication of result quality, dropping the
> really bad results entirely, and segregating the good results from
> other
> only vaguely relevant results).  To build this kind of "intelligence"
> into the UI, I certainly need to know whether my top result matched
> all
> query terms, or only half of them.  I'd like to have the score tell
> me
> directly how good the matches are.  The current normalization does
> not
> achieve this.
> 
> The intent of a new normalization scheme is to preserve the current
> scoring in the following sense:  the ratio of the nth result's score
> to
> the best result's score remains the same.  Therefore, the only
> question
> is what normalization factor to apply to all scores.  This reduces to
> a
> very specific question that determines the entire normalization --
> what
> should the score of the best matching result be?
> 
> The mechanism below has this property, i.e. it keeps the current
> score
> ratios, except that I removed one idf term for reasons covered
> earlier
> (this better reflects the current empirically best scoring
> algorithms).
> However, removing an idf is a completely separate issue.  The
> improved
> normalization is independent of whether or not that change is made.
> 
> For the central question of what the top score should be, upon
> reflection, I don't like the definition below.  It defined the top
> score
> as (approximately) the percentage of query terms matched by the top
> scoring result.  A better conceptual definition is to use a weighted
> average based on the boosts.  I.e., downward propagate all boosts to
> the
> underlying terms (or phrases).  Specifically, the "net boost" of a
> term
> is the product of the direct boost of the term and all boosts applied
> to
> encompassing clauses.  Then the score of the top result becomes the
> sum
> of the net boosts of its matching terms divided by the sum of the net
> boosts of all query terms.
> 
> This definition is a refinement of the original proposal below, and
> it
> could probably be further refined if some impact of the tf, idf
> and/or
> lengthNorm was desired in determining the top score.  These other
> factors seem to be harder to normalize for, although I've thought of
> some simple approaches; e.g., assume the unmatched terms in the top
> result have values for these three factors that are the average of
> the
> values of the matched terms, then apply exactly the same concept of
> actual score divided by theoretical maximum score.  That would
> eliminate any need to maintain maximum value statistics in the index.
> 
> As an example of the simple boost-based normalization, for the query
>   ((a^2 b)^3 (c d^2))
> the net boosts are:
>   a --> 6
>   b --> 3
>   c --> 1
>   d --> 2
> 
> So if a and b matched, but not c and d, in the top scoring result,
> its
> score would be 0.75.  The normalizer would be 0.75/(current score
> except
> for the current normalization).  This normalizer would be applied to
> all
> current scores (minus normalization) to create the normalized scores.
> 
> For simple query (a b), if only one of the terms matched in the top
> result, then its score would be 0.5, vs. 1.0 or many other possible
> scores today.
> 
> In addition to enabling more "intelligent" UI's that communicate the
> quality of results to end-users, the proposal below also extends the
> explain() mechanism to fully explain the final normalized score.
> However, that change is also independent -- it could be done with the
> current scoring.
> 
> Am I the only one who would like to see better normalization in
> Lucene?
> Does anybody have a better approach?
> 
> If you've read this far, thanks for indulging me on this.  I would
> love
> to see this or something with similar properties in Lucene.  I really
> just want to build my app, but as stated below would write and
> contribute this if there is interest in putting it in, and nobody
> else
> wants to write it.  Please l
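
The boost-based top score described in the quoted proposal can be sketched as follows, assuming a tiny query tree where each node carries its own boost. Node, netBoosts and topScore are hypothetical names for illustration, not Lucene classes, and summing the boosts of a repeated term is an assumption the proposal does not address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed normalization; none of these names exist in Lucene. */
class NetBoostSketch {

    /** A query node: a term (no children) or a group of nested nodes, each with a boost. */
    static final class Node {
        final String term;        // null for a group
        final double boost;
        final List<Node> children;

        private Node(String term, double boost, List<Node> children) {
            this.term = term;
            this.boost = boost;
            this.children = children;
        }

        static Node term(String text, double boost) {
            return new Node(text, boost, new ArrayList<Node>());
        }

        static Node group(double boost, Node... children) {
            return new Node(null, boost, Arrays.asList(children));
        }
    }

    /** Net boost of a term = its own boost times the boosts of all enclosing groups. */
    static Map<String, Double> netBoosts(Node root) {
        Map<String, Double> result = new LinkedHashMap<>();
        collect(root, 1.0, result);
        return result;
    }

    private static void collect(Node node, double inherited, Map<String, Double> out) {
        double net = inherited * node.boost;
        if (node.term != null) {
            out.merge(node.term, net, Double::sum); // assumption: repeated terms add up
        } else {
            for (Node child : node.children) {
                collect(child, net, out);
            }
        }
    }

    /** Top score = net boosts of the matched terms / net boosts of all query terms. */
    static double topScore(Map<String, Double> netBoosts, Set<String> matched) {
        double total = 0.0;
        double hit = 0.0;
        for (Map.Entry<String, Double> e : netBoosts.entrySet()) {
            total += e.getValue();
            if (matched.contains(e.getKey())) {
                hit += e.getValue();
            }
        }
        return total == 0.0 ? 0.0 : hit / total;
    }

    public static void main(String[] args) {
        // ((a^2 b)^3 (c d^2))
        Node query = Node.group(1.0,
                Node.group(3.0, Node.term("a", 2.0), Node.term("b", 1.0)),
                Node.group(1.0, Node.term("c", 1.0), Node.term("d", 2.0)));
        Map<String, Double> boosts = netBoosts(query);
        System.out.println(boosts);
        System.out.println(topScore(boosts, new HashSet<>(Arrays.asList("a", "b"))));
    }
}
```

For the query in the proposal the net boosts come out as a=6, b=3, c=1, d=2, so a top result matching only a and b scores (6 + 3) / 12 = 0.75, as stated above.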

Re: Propose Bernhard as committer

2004-10-19 Thread Otis Gospodnetic

+1

Otis

--- Christoph Goller <[EMAIL PROTECTED]> wrote:

> I would like to propose Bernhard as Lucene committer.
> 
> He has contributed a number of valuable and high quality patches
> and I am simply tired of checking and committing all his work :-)
> 
> Christoph
> 

Re: QueryParser and backwards-compatibility

2004-10-11 Thread Otis Gospodnetic

--- Christoph Goller <[EMAIL PROTECTED]> wrote:

> >> Since 1.4.2 is already out, we would have to make a version 1.4.3.
> >
> 
> OK, one more vote needed :-)
> Maybe we should wait what Doug says.

I agree with what's been said (keep 4 & 5, put back the old methods and
deprecate them, don't worry about toString()) and I'm for 1.4.3.

Otis


Re: sandbox -> core ?

2004-10-08 Thread Otis Gospodnetic


> In general, I'm a proponent of bundling as much as possible into a 
> single CVS tree and build procedure, since it makes it much easier to
> 
> keep things synchronized.  If folks feel the jar is too big, then we
> can 
> always build these into a separate jar.  I'd also vote to put
> analyzers 
> in the same CVS tree and under the top-level build.xml, for the same 
> reason.  If we like, we could put them each in subdirectories of 
> src/analyzers, and have each built as a separate jar.  Thoughts?

I like this idea.  I don't care so much about 1 or more CVS
repositories, as much as separate Jars, so if we can make
analyzers-1.4.2.jar and highlighter-1.4.2.jar alongside lucene-1.4.2.jar,
that would be ideal, in my opinion.

> The sandbox should be for experimental stuff.  Stuff that's proven 
> widely useful should go into the main tree and get released along
> with 
> every Lucene release.

True.

Otis


Re: sandbox -> core ?

2004-10-08 Thread Otis Gospodnetic

Honestly, I still think that Highlighter and such belong to the
Sandbox.
I think we can make Sandbox components very easy to use if we just
provide a Jar of each component that goes along with each Lucene
version.

MoreLikeThis sounds like it could go into the core.

Otis


--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> I propose adding highlighting and more-like-this code to the standard
> 
> Lucene jar.
> 
> Highlighting is currently in the Sandbox, so folks can find it, but
> they 
> have to compile it, generate javadocs themselves, and ensure that it 
> works with their version of Lucene.  If it's in Lucene's core then it
> 
> will be versioned with Lucene releases.
> 
> More-like-this lives only in the mail archives:
> 
> http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1380971
> 
> This should at least be added to the Sandbox, and probably to the
> core.
> 
> If these were added to the core then the demo code could easily 
> incorporate them.  A demo that made better snippets would be nice.
> 
> What do folks think?
> 
> Doug
> 

Re: lucene.net no more?

2004-10-04 Thread Otis Gospodnetic

Excellent!
I will add a reference to dotLucene in the Ports chapter of Lucene in
Action.

I did not forget about the possibility of making Lucene (and all its
ports) one unified Apache TLP (Top Level Project), I just want to
finish Lucene in Action first.

Otis



--- Yussef Alkhamrichi <[EMAIL PROTECTED]> wrote:

> Hi Otis,
> 
> I had a feeling Pasha wouldn't be enthusiastic about continuing or 
> supporting an open version of Lucene. Some guys have gathered around 
> http://sourceforge.net/projects/dotlucene to continue the effort for
> the 
> .NET port.
> 
> I started off with the 1.3-rc3-001 version (the last zip we had) and
> added 
> some missing features and thought that I had completed the 1.3-final.
> It 
> looks like we are missing some sorting code from the java version.
> Because 
> we are new at maintaining this piece of code we still need to do some
> 
> checking about the completeness of the software (and not only using
> the 
> diffs of the java version to update the .NET port). But we will have
> a new 
> version soon (the '1.3 final' compiles and runs all unit tests
> successfully, 
> files from the .jj files have been generated from the original .jj
> files to 
> C#).
> 
> As you see, some work has been done, some more needs to be done. How
> about 
> the idea of Doug to bring the lucene-related projects (as Lucene.Net)
> under 
> the Jakarta umbrella ? I liked that one.
> 
> Yussef
> 
> 
> >From: Otis Gospodnetic <[EMAIL PROTECTED]>
> >Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>
> >To: Lucene Developers List <[EMAIL PROTECTED]>
> >Subject: Re: lucene.net no more?
> >Date: Sun, 3 Oct 2004 09:13:16 -0700 (PDT)
> >
> >Hello,
> >
> >I've contacted Pasha (Lucene.Net developer) and while he said one
> could
> >continue Lucene.Net development (code was released under ASL), he
> >didn't show any interest in helping me locate the latest version of
> the
> >sources. :(
> >
> >I am now wondering if there is anyone on lucene-dev list interested
> in
> >taking over Lucene.Net development?  The latest sources are here:
> >http://lucenedotnet.wz.cz/.
> >
> >Otis
> >P.S.
> >Related blog entry from Aaron Johnson:
> >http://cephas.net/blog/2004/09/21/conflicting_mindsets_of_c_vs_java_part_ii.html#000732
> >P.P.S.
> >If nobody here is interested, I'll take this to lucene-user - larger
> >audience.
> >
> >
> >--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> >
> > > Otis Gospodnetic wrote:
> > >
> > > > Hm, hm, hm, hm.
> > > > And we just copy-edited the Lucene Ports chapter, where we
> cover
> > > > Lucene.Net.  Now what? :)
> > > >
> > > > I wonder if this is related to Microsoft's purchase of Lookout,
> > > which
> > > > uses Lucene.Net.  I also wonder whether the licence allows
> this...
> > >
> > > ASL allows this, given that you put proper notices in the
> > > documentation
> > > / source code (if you distribute any source code). However, AFAIR
> the
> > >
> > > terms of service at SF.net don't allow such removal, and the
> staff at
> > >
> > > SF.net should be notified and take some action...
> > >
> > > Reaching an amicable conclusion to this would be preferable, of
> > > course,
> > > but there is also a procedure to overtake a project at SF.net, it
> > > takes
> > > ca. 3 weeks to complete, perhaps shorter if you can prove
> violation
> > > of
> > > terms of service. But I think it's easier to restore the project
> > > under
> > > another name, based on the latest published sources...
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki
> > >
> > > -
> > > Software Architect, System Integration Specialist
> > > CEN/ISSS EC Workshop, ECIMF project chair
> > > EU FP6 E-Commerce Expert/Evaluator
> > > -
> > > FreeBSD developer (http://www.freebsd.org)
> > >
> > >
> > >

Lucene JAR for Maven Repo

2004-10-03 Thread Otis Gospodnetic

Here is the email I mentioned earlier on lucene-dev.

--- Brian McCallister <[EMAIL PROTECTED]> wrote:

> To: [EMAIL PROTECTED]
> From: Brian McCallister <[EMAIL PROTECTED]>
> Subject: Maven Repo
> Date: Thu, 26 Aug 2004 19:59:50 -0400
> 
> Hi all,
> 
> Thank you for the amazing work on lucene. That said, any chance you 
> could push lucene-1.4.1.jar onto the ibiblio maven repository? I'm 
> happy to do so myself if you prefer (is just copying it to 
> /www/www.apache.org/dist/java-repository/lucene/jars/ ) but figured
> I'd 
> ask before just copying the jar over =)
> 
> Thank you again!
> 
> -Brian




Re: Lucene 1.4.2?

2004-10-03 Thread Otis Gospodnetic

One person also sent an email about Lucene JAR files not being in some
directory on one of the ASF servers.  This is preventing Maven users
from getting the newest Lucene JARs.  I still have the email and I'll
forward it to lucene-dev so that whoever pushes the release to mirrors
can take care of this issue, too.

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> On Friday 01 October 2004 23:57, Doug Cutting wrote:
> 
> > It is not mirrored yet.  Erik's the only one who has ever done
> that.
> > Erik, do you have time to mirror 1.4.2?  Thanks.
> 
> BTW, the release on the "official" download pages is still 1.4-final:
> http://jakarta.apache.org/site/sourceindex.cgi
> http://jakarta.apache.org/site/binindex.cgi
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 

Re: lucene.net no more?

2004-10-03 Thread Otis Gospodnetic

Hello,

I've contacted Pasha (Lucene.Net developer) and while he said one could
continue Lucene.Net development (code was released under ASL), he
didn't show any interest in helping me locate the latest version of the
sources. :(

I am now wondering if there is anyone on lucene-dev list interested in
taking over Lucene.Net development?  The latest sources are here:
http://lucenedotnet.wz.cz/.

Otis
P.S.
Related blog entry from Aaron Johnson:
http://cephas.net/blog/2004/09/21/conflicting_mindsets_of_c_vs_java_part_ii.html#000732
P.P.S.
If nobody here is interested, I'll take this to lucene-user - larger
audience.


--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Otis Gospodnetic wrote:
> 
> > Hm, hm, hm, hm.
> > And we just copy-edited the Lucene Ports chapter, where we cover
> > Lucene.Net.  Now what? :)
> > 
> > I wonder if this is related to Microsoft's purchase of Lookout,
> which
> > uses Lucene.Net.  I also wonder whether the licence allows this...
> 
> ASL allows this, given that you put proper notices in the
> documentation 
> / source code (if you distribute any source code). However, AFAIR the
> 
> terms of service at SF.net don't allow such removal, and the staff at
> 
> SF.net should be notified and take some action...
> 
> Reaching an amicable conclusion to this would be preferable, of
> course, 
> but there is also a procedure to overtake a project at SF.net, it
> takes 
> ca. 3 weeks to complete, perhaps shorter if you can prove violation
> of 
> terms of service. But I think it's easier to restore the project
> under 
> another name, based on the latest published sources...
> 
> -- 
> Best regards,
> Andrzej Bialecki
> 
> -
> Software Architect, System Integration Specialist
> CEN/ISSS EC Workshop, ECIMF project chair
> EU FP6 E-Commerce Expert/Evaluator
> -
> FreeBSD developer (http://www.freebsd.org)
> 
> 

Re: Lucene 1.4.2?

2004-09-20 Thread Otis Gospodnetic

I am still using XEmacs and need to give Eclipse another try.  The last
time I tried Eclipse, some 6-12 months ago, I remember seeing Eclipse
forcing garbage collection somehow.  Thus, I would guess the
command-line behaviour is probably correct and there is indeed a bug in
Lucene's sorting.

Otis


--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> On Sunday 19 September 2004 21:13, Otis Gospodnetic wrote:
> 
> > It would be good to take care of that memory leak issue that comes
> up
> > when people use sorting.  Dave Spencer found one Comparator or Map
> or
> > something that looked suspicious.
> 
> Yes, the "Comparator" WeakHashMap in FieldSortedHitQueue always grows
> and 
> never deletes entries. I can reproduce that (i.e. the
> OutOfMemoryError), but 
> only on the command line, not inside Eclipse. It looks like whether
> the class 
> works without OutOfMemory depends on some implementation detail of
> the 
> garbage collection.
> 
> Regards
>  Daniel
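
One general way a WeakHashMap cache can grow without ever shrinking, sketched under stated assumptions: if a cached value holds a strong reference back to its own key, the key never becomes weakly reachable and the entry is never dropped, whatever the garbage collector does. This is not a diagnosis of FieldSortedHitQueue; Key, Value and the method name are hypothetical.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical illustration of a WeakHashMap that cannot release its entries. */
class WeakCacheSketch {

    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }
    }

    /** The value keeps a strong reference to its key, which pins the entry. */
    static final class Value {
        final Key owner;

        Value(Key owner) {
            this.owner = owner;
        }
    }

    static Map<Key, Value> fillWithSelfReferencingValues(int count) {
        Map<Key, Value> cache = new WeakHashMap<>();
        for (int i = 0; i < count; i++) {
            Key key = new Key(i);
            cache.put(key, new Value(key)); // map -> value -> key: the key stays strongly reachable
        }
        return cache; // the caller holds no direct reference to any key
    }

    public static void main(String[] args) {
        Map<Key, Value> cache = fillWithSelfReferencingValues(1000);
        System.gc();
        System.out.println("entries still cached: " + cache.size());
    }
}
```

When values do not pin their keys, whether and when entries disappear depends on when the collector runs, which may be why the behaviour differs between the command line and Eclipse.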
> 
> 

Re: parser & bugfixing

2004-09-19 Thread Otis Gospodnetic

All work is done on .jj files.  Once .java files are generated from .jj
files, we just put them both (.jj and .java) in CVS without modifying
.java files manually.

Otis

--- Yussef Alkhamrichi <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> I've a short question about the policy how the parsers of Lucene are 
> maintained. I've just sent a version into the world with the latest
> (public) 
> Lucene.NET port (1.3 final) where I have rebuilt the 3 parsers
> (QueryParser, 
> StandardTokenizer and HTMLParser) from scratch using javaccCS and
> ported .jj 
> files.
> 
> Do you guys do any bugfixing in the generated parser files ? (So I
> know 
> whether or not it is enough to only port the .jj files and generate
> the 
> needed classes). (Probably a stupid question, but I just wanted to
> know to 
> get the .NET port as stable as possible).
> 
> Thanks,
> 
> Yussef
> 

Re: Lucene 1.4.2?

2004-09-19 Thread Otis Gospodnetic

It would be good to take care of that memory leak issue that comes up
when people use sorting.  Dave Spencer found one Comparator or Map or
something that looked suspicious.
Then I'm +1 for 1.4.2.

Otis

--- Daniel Naber <[EMAIL PROTECTED]> wrote:

> On Saturday 18 September 2004 20:21, [EMAIL PROTECTED] wrote:
> 
> >   order was undefined in case of duplicate sort keys, this could
> lead to
> > incorrect results (documents appearing twice in the result set,
> other
> > documents missing) if there were more than 100 matches. PR:31241
> 
> This bug could actually lead to incorrect results. So what about
> releasing 
> Lucene 1.4.2? Here is a list of fixes that could be part of that
> release:
> 
> -4. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
> prevented deletion of obsolete segments. (Christoph Goller)
> 
> -13. Fixed bug #31241: Sorting could lead to incorrect results
> (documents
> missing, others duplicated) if the sort keys were not unique and
> there
> were more than 100 matches. (Daniel Naber)
> 
> -There was a compile problem with StandardTokenizer.jj because of a 
> missing import.
> 
> Opinions?
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 

Dave Spencer

2004-09-18 Thread Otis Gospodnetic

Dave, I think we collected 4 +1 votes for you.  You need to sign and
fax the ASF CLA now.  When you do that, let me know, along with your
desired username (+ 1 alternative username) and your email address, and
then I'll go request an account for you.

Thanks,
Otis



Re: lucene.net no more?

2004-09-17 Thread Otis Gospodnetic

I was thinking the same.  As a matter of fact, I'm talking to Pasha
(Lucene.Net) to see if he/they would be willing to give away the source
code.  Somebody else would have to look into licenses, see whether SF
has the right to give the project to somebody else, etc.

I'd like to help with this, but I am too busy to push this until
mid-November.

Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Yussef Alkhamrichi wrote:
> > The main target for the .NET porters is now to get a SourceForge
> > workspace to continue the .NET port of Lucene. Mohammed is
> > contacting the developers of NLucene and see if we can work
> > something out together to use their space and name for the real
> > open source .NET brother (or sister:) of the Java Lucene version.
> 
> Perhaps it is time to start bringing the various Lucene ports under
> the Apache umbrella.  If we made Lucene a top-level Apache project
> rather than a Jakarta sub-project, at lucene.apache.org rather than
> jakarta.apache.org/lucene, then we could have sub-projects for each
> implementation (Java, C++, C#, Perl, etc.).  The primary obstacle to
> this is a lack of volunteers who are willing to do the legwork of
> re-assembling the website, cvs, mailing lists, etc.  I am currently
> overbooked and cannot take this on.
> 
> This would prevent this sort of thing in the future: an Apache 
> sub-project cannot be unilaterally closed.
> 
> Doug
> 
> 
> 



Re: mg4j - Managing Gigabyte for Java

2004-09-16 Thread Otis Gospodnetic

Hi Anson,

It's not quite correct to compare MG4J and Lucene directly.  Lucene
is a toolkit whose primary goal is to let you create an index and
search it, while MG4J is really a library of Java classes that people
implementing an IR library (such as Lucene, for example) may find
useful.  You cannot create a searchable index with MG4J alone.

Otis

--- Anson Lau <[EMAIL PROTECTED]> wrote:

> Hi All,
> 
> Has anyone seen the project MG4J (Managing Gigabyte for Java)
> http://mg4j.dsi.unimi.it/ ?  Does anybody know enough about both
> Lucene and MG4J to comment on how the two compare?
> 
> Thanks,
> 
> Anson
> 
> 
> 


Re: Lock handling and Lucene 1.9 / 2.0

2004-09-15 Thread Otis Gospodnetic

Pete,

You can use the same IndexReader from multiple threads.  It sounds like
you are not aware of this (I think this is a FAQ actually... on jGuru).
So, a single IndexReader used by multiple threads == OK && does not
require commit.lock on every 'read'.  You need just one commit.lock when
the IndexReader is opened, or when you use the IndexReader to delete a
Document.
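
A minimal sketch of that idea, assuming one shared reader per index
name. SharedReader and ReaderRegistry are hypothetical stand-ins, not
Lucene classes; the point is only that the costly open happens once and
every thread then gets the same instance back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an opened index reader; not the Lucene class.
final class SharedReader {
    final String indexName;

    SharedReader(String indexName) {
        this.indexName = indexName;
    }
}

final class ReaderRegistry {
    private final ConcurrentMap<String, SharedReader> readers = new ConcurrentHashMap<>();

    // Counts how often the (expensive) open step actually ran.
    final AtomicInteger opens = new AtomicInteger();

    // Open the reader for an index at most once; later callers, on any
    // thread, get the same instance back without opening the index again.
    SharedReader get(String indexName) {
        return readers.computeIfAbsent(indexName, name -> {
            opens.incrementAndGet(); // the real open would happen here
            return new SharedReader(name);
        });
    }
}
```

Whether one instance per index is enough depends on the reader really
being safe to share, which is what the paragraph above states for
IndexReader.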

When you say 'library', you are really referring to a Lucene index,
correct?

Otis

--- Pete Lewis <[EMAIL PROTECTED]> wrote:

> Hi Doug
> 
> Weblogic creates a pool of objects for us which are re-initialised
> each time the constructor is called.  It's when we grab an
> IndexReader out of the pool that we have the creation of the cache,
> which is where the spin originates.
> 
> Been thinking about your suggestion of a Hashtable (based on library
> name) for the storage of IndexReaders, but then we'd get a bottleneck
> of access - having a single reader per library means only a single
> process can access the library (for reading) at once, and this would
> create a potential bottleneck across our servers.  Another way might
> be to create a pool of IndexReaders and allocate them on demand, ie
> 10 IndexReaders per library.  This would allow for 10 synchronous
> searches with no commit lock spin, but would be a pain to code.
> 
> Probably would be quickest to create a system property that will
> enable us to turn on/off the commit lock around the FSDirectory cache
> creation, so we'd have them off when we get an IndexReader for just a
> read but have the locks on at other times - don't want to disable all
> locks as our libraries are dynamic and not static.
> 
> Sorry the constructive criticism was off-the-wall but it had made my
> head hurt getting to the bottom of where our waits on spin locks had
> come from ;-)
> 
> Cheers
> 
> Pete Lewis
> 
> - Original Message - 
> From: "Doug Cutting" <[EMAIL PROTECTED]>
> To: "Lucene Developers List" <[EMAIL PROTECTED]>
> Sent: Tuesday, September 14, 2004 10:12 PM
> Subject: Re: Lock handling and Lucene 1.9 / 2.0
> 
> 
> > Pete Lewis wrote:
> > > The only way to continually use the same IndexReader would be to
> > > use a stateful session bean (frowned upon by J2EE Container
> > > writers)
> >
> > Can one implement DB connection pooling in J2EE?  This is
> > analogous.  One may keep a pool of IndexReaders that are reused by
> > subsequent queries.  One difference is that the cache need only
> > contain a single IndexReader per index, rather than a DB connection
> > pool, which typically keeps multiple connections per DB.  Also, at
> > checkout, the cache code should check whether a newer version of
> > the index is available, and, if it is, update the cache.
> >
> > If there are lots of different indexes, more than you'd like to
> > keep open at once, then an LRU cache might work well, implemented
> > e.g. with LinkedHashMap.  Such a cache might be a useful
> > contribution to Lucene.
> >
> > > I thought that it might be a good candidate for Lucene 2 as the
> > > FSDirectory code is horrible and non-J2EE compliant.
> >
> > Your constructive criticism is greatly appreciated!
> >
> > Have a nice day,
> >
> > Doug
> >
> >
> 
> 
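
The LRU cache Doug mentions in the quoted text can be sketched on top of
LinkedHashMap's access-order mode. This is an illustration only:
LruReaderCache is a hypothetical name, R stands for whatever reader type
the application uses, and nothing here is Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU map of index name -> reader, built on LinkedHashMap's access order.
final class LruReaderCache<R> extends LinkedHashMap<String, R> {
    private static final long serialVersionUID = 1L;

    private final int maxOpen;

    LruReaderCache(int maxOpen) {
        super(16, 0.75f, true); // true = order entries by access, oldest first
        this.maxOpen = maxOpen;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry. A real cache would also close the
    // evicted reader here.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
        return size() > maxOpen;
    }
}
```

LinkedHashMap is not thread-safe, so a shared cache would need external
synchronization, for example Collections.synchronizedMap around it.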



RE: lucene.net no more?

2004-09-15 Thread Otis Gospodnetic

I/we shouldn't jump to any conclusions yet.  I made a mistake of
putting my thoughts in writing.  Let's give Pasha some time to update
us all.

Otis

--- Yussef Alkhamrichi <[EMAIL PROTECTED]> wrote:

> Hi Otis,
> 
> Sorry for interrupting the process of the book you are putting
> together :S  But I just feel something needs to be done to make sure
> that the .NET port of Lucene is as open and available for all of us as
> is Lucene itself.
> 
> Hmm, maybe the Lucene.NET guys are counting on the same action from
> Microsoft as what happened to Lookout ;) The acquisition is a great
> thing for the Lookout guys, but I don't think MS can bypass the Apache
> License of Lucene (I guess). If you can't beat them, ...
> 
> We hope to have things cleared out as fast as possible on the .NET
> port.
> 
> Yussef
> 
> 
> >From: Otis Gospodnetic <[EMAIL PROTECTED]>
> >Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>
> >To: Lucene Developers List <[EMAIL PROTECTED]>
> >Subject: RE: lucene.net no more?
> >Date: Wed, 15 Sep 2004 02:51:27 -0700 (PDT)
> >
> >Hm, hm, hm, hm.
> >And we just copy-edited the Lucene Ports chapter, where we cover
> >Lucene.Net.  Now what? :)
> >
> >I wonder if this is related to Microsoft's purchase of Lookout,
> >which uses Lucene.Net.  I also wonder whether the licence allows
> >this...
> >
> >Otis
> >
> >
> >--- Yussef Alkhamrichi <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Doug,
> > >
> > > I've noticed the same odd thing, me and a couple of guys were
> > > having a discussion in the Open Discussion of Lucene.NET to
> > > re-establish a decent up to date .NET port of Lucene when suddenly
> > > all plugs were pulled out (no files released any more, no open
> > > discussion, etc.)
> > >
> > > Last night I sent a mail to all people involved (4 in total,
> > > George -http://www.aroush.net-, Mohammad
> > > -http://mohammad.abdulfatah.net/-, Mansur and me). The idea is to
> > > get a Lucene port in sync with the Java version and get decent
> > > community support behind it (not restricting the people that can
> > > co-develop to a mere 1 or 2 in-crowd). An open source product is
> > > only as good as its community I guess.
> > >
> > > I find it a strange move from the Lucene.NET guys
> > > (http://www.lucenedotnet.com), of course they have put in a lot of
> > > effort to port the thing and it's quite possible to earn money
> > > with it. But where did they get the code from in the first place
> > > ?!?! (Under the Apache License).
> > >
> > > The main target for the .NET porters is now to get a SourceForge
> > > workspace to continue the .NET port of Lucene. Mohammed is
> > > contacting the developers of NLucene and see if we can work
> > > something out together to use their space and name for the real
> > > open source .NET brother (or sister:) of the Java Lucene version.
> > >
> > > We hope we can tell you more shortly. Keep up the great work you
> > > all have done with Lucene so far!
> > >
> > > Yussef
> > >
> > > >From: Doug Cutting <[EMAIL PROTECTED]>
> > > >Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>
> > > >To: Lucene Developers List <[EMAIL PROTECTED]>
> > > >Subject: lucene.net no more?
> > > >Date: Tue, 14 Sep 2004 21:21:29 -0700
> > > >
> > > >Does anyone know more about this?
> > > >
> > > >http://mohammad.abdulfatah.net/mohammad/archives/2004/09/the_mysterious.php
> > > >
> > > >Doug
> > > >
> 
> 



RE: lucene.net no more?

2004-09-15 Thread Otis Gospodnetic

Hm, hm, hm, hm.
And we just copy-edited the Lucene Ports chapter, where we cover
Lucene.Net.  Now what? :)

I wonder if this is related to Microsoft's purchase of Lookout, which
uses Lucene.Net.  I also wonder whether the licence allows this...

Otis


--- Yussef Alkhamrichi <[EMAIL PROTECTED]> wrote:

> Hi Doug,
> 
> I've noticed the same odd thing, me and a couple of guys were having
> a discussion in the Open Discussion of Lucene.NET to re-establish a
> decent up to date .NET port of Lucene when suddenly all plugs were
> pulled out (no files released any more, no open discussion, etc.)
> 
> Last night I sent a mail to all people involved (4 in total, George
> -http://www.aroush.net-, Mohammad -http://mohammad.abdulfatah.net/-,
> Mansur and me). The idea is to get a Lucene port in sync with the Java
> version and get decent community support behind it (not restricting
> the people that can co-develop to a mere 1 or 2 in-crowd). An open
> source product is only as good as its community I guess.
> 
> I find it a strange move from the Lucene.NET guys
> (http://www.lucenedotnet.com), of course they have put in a lot of
> effort to port the thing and it's quite possible to earn money with
> it. But where did they get the code from in the first place ?!?!
> (Under the Apache License).
> 
> The main target for the .NET porters is now to get a SourceForge
> workspace to continue the .NET port of Lucene. Mohammed is contacting
> the developers of NLucene and see if we can work something out
> together to use their space and name for the real open source .NET
> brother (or sister:) of the Java Lucene version.
> 
> We hope we can tell you more shortly. Keep up the great work you all
> have done with Lucene so far!
> 
> Yussef
> 
> >From: Doug Cutting <[EMAIL PROTECTED]>
> >Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>
> >To: Lucene Developers List <[EMAIL PROTECTED]>
> >Subject: lucene.net no more?
> >Date: Tue, 14 Sep 2004 21:21:29 -0700
> >
> >Does anyone know more about this?
> >
> >http://mohammad.abdulfatah.net/mohammad/archives/2004/09/the_mysterious.php
> >
> >Doug
> 
> 



Re: Lock handling and Lucene 1.9 / 2.0

2004-09-14 Thread Otis Gospodnetic

Hello Pete,

--- Pete Lewis <[EMAIL PROTECTED]> wrote:

> Hi Otis
> 
> The only way to continually use the same IndexReader would be to use
> a stateful session bean (frowned upon by J2EE Container writers), and
> then we'd see the same problem cross-machine - remember that the
> indexes that we use are stored once, not replicated across the four
> servers.
> 
> We'll come up with a fix ourselves.

Excellent.  Please contribute it back, if you can.

> I thought that it might be a good candidate for Lucene 2 as the
> FSDirectory code is horrible and non-J2EE compliant.

Patches are welcome!  As you saw on the Wiki, we (well, the German part
of the development team, really) are making bigger changes than usual,
including a deprecation of some frequently used pieces of the API. 
Including a code improvement for version 1.9/2.0 shouldn't be a problem
then.

> Is there any place for making suggestions for future development
> tasks?  I've seen the Lucene 2 whiteboard and wondered how the items
> on the list were derived.

Discussion on lucene-dev => agreement => conclusion => Wiki.  I think
you need an account for Wiki now (pain, but thank spammers for that).

Otis

> Pete Lewis
> 
> 
> - Original Message - 
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Developers List" <[EMAIL PROTECTED]>
> Sent: Tuesday, September 14, 2004 12:11 PM
> Subject: Re: Lock handling and Lucene 1.9 / 2.0
> 
> 
> > It sounds like this is the source of your problem.  Reuse those
> > IndexReaders or IndexSearchers and you'll avoid the waits you are
> > talking about, as Christoph already pointed out.  How you implement
> > the logic for caching IndexReaders/Searchers is up to you.
> >
> > Otis
> > P.S.
> > It sounds like you took the time to figure out how some of the
> > Lucene internals work.  The best way to persuade -dev list
> > subscribers that doing something one way is better than doing it
> > the current way is by providing a clean diff against CVS HEAD,
> > attached to an entry in Bugzilla.  There are now several active
> > Lucene developers and contributors who will look at your suggestion
> > and apply the patch, if it improves the code.
> >
> >
> >
> > --- Pete Lewis <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Christoph
> > >
> > > We are in a cluster running under Bea Weblogic.
> > >
> > > We have a static API to the search component from the portlets.
> > >
> > > We therefore need to open an indexreader per request.
> > >
> > > Cheers
> > > Pete
> > >
> > > - Original Message - 
> > > From: "Christoph Goller" <[EMAIL PROTECTED]>
> > > To: "Lucene Developers List" <[EMAIL PROTECTED]>
> > > Sent: Tuesday, September 14, 2004 10:55 AM
> > > Subject: Re: Lock handling and Lucene 1.9 / 2.0
> > >
> > >
> > > > Pete Lewis wrote:
> > > > > Hi Christoph
> > > > >
> > > > > If we stand back a second and ask why we have commit locks
> > > > > when searching?
> > > > >
> > > > > The answer comes down to handling the FSDirectory - where the
> > > > > methods used are not j2ee compliant.
> > > > >
> > > > > We could open another can of worms and say why does the
> > > > > indexreader delete - but I won't go into that argument again
> > > > > here.
> > > > >
> > > > > The bottom line is that we need the ability to search without
> > > > > waiting on a commit lock.
> > > >
> > > > You have this ability already !
> > > >
> > > > > The FSDirectory is where the problems lie.  We could hack the
> > > > > code to make it work for our particular application; however
> > > > > what I've been trying to get across is the need to have a
> > > > > method that will give users the capability to just search (not
> > > > > delete) without waiting upon the commit lock, that will be
> > > > > j2ee compliant, and that will be appropriate clustered
> > > > > implementations - and that this should be a candidate for
> > > > > Lucene 1.9 / 2.0.
> > > > >

[VOTE] David Spencer as Lucene Sandbox committer

2004-09-14 Thread Otis Gospodnetic

And +1 from me.  Dave has submitted several nice pieces of code over
the years.

Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> David Spencer wrote:
> > If people want to vote me in as a committer to the sandbox then I
> can 
> > check this code in.
> 
> +1
> 
> Doug



Re: Lock handling and Lucene 1.9 / 2.0

2004-09-14 Thread Otis Gospodnetic

It sounds like this is the source of your problem.  Reuse those
IndexReaders or IndexSearchers and you'll avoid the waits you are
talking about, as Christoph already pointed out.  How you implement the
logic for caching IndexReaders/Searchers is up to you.
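
One possible shape for that caching logic, as a sketch under stated
assumptions: keep a single reader and reopen it only when a cheap
version check says the index has changed. RefreshingHolder and both
supplier hooks are hypothetical; how the version is obtained and how the
reader is opened are left to the application.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Holds one reader and reopens it only when the index version has moved on.
// Both suppliers are hypothetical hooks provided by the application.
final class RefreshingHolder<R> {
    private final LongSupplier currentVersion; // cheap check of the index version
    private final Supplier<R> opener;          // expensive open of the index
    private R reader;
    private long openedVersion;

    RefreshingHolder(LongSupplier currentVersion, Supplier<R> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Return the cached reader, reopening first if the index has changed.
    synchronized R checkout() {
        long v = currentVersion.getAsLong();
        if (reader == null || v != openedVersion) {
            reader = opener.get();
            openedVersion = v;
        }
        return reader;
    }
}
```

The sketch does not close the reader it replaces; a real holder would
have to decide when the old instance is no longer in use.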

Otis
P.S.
It sounds like you took the time to figure out how some of the Lucene
internals work.  The best way to persuade -dev list subscribers that
doing something one way is better than doing it the current way is by
providing a clean diff against CVS HEAD, attached to an entry in
Bugzilla.  There are now several active Lucene developers and
contributors who will look at your suggestion and apply the patch, if
it improves the code.



--- Pete Lewis <[EMAIL PROTECTED]> wrote:

> Hi Christoph
> 
> We are in a cluster running under Bea Weblogic.
> 
> We have a static API to the search component from the portlets.
> 
> We therefore need to open an indexreader per request.
> 
> Cheers
> Pete
> 
> - Original Message - 
> From: "Christoph Goller" <[EMAIL PROTECTED]>
> To: "Lucene Developers List" <[EMAIL PROTECTED]>
> Sent: Tuesday, September 14, 2004 10:55 AM
> Subject: Re: Lock handling and Lucene 1.9 / 2.0
> 
> 
> > Pete Lewis wrote:
> > > Hi Christoph
> > >
> > > If we stand back a second and ask why we have commit locks when
> > > searching?
> > >
> > > The answer comes down to handling the FSDirectory - where the
> > > methods used are not j2ee compliant.
> > >
> > > We could open another can of worms and say why does the
> > > indexreader delete - but I won't go into that argument again here.
> > >
> > > The bottom line is that we need the ability to search without
> > > waiting on a commit lock.
> >
> > You have this ability already !
> >
> > > The FSDirectory is where the problems lie.  We could hack the
> > > code to make it work for our particular application; however what
> > > I've been trying to get across is the need to have a method that
> > > will give users the capability to just search (not delete) without
> > > waiting upon the commit lock, that will be j2ee compliant, and
> > > that will be appropriate clustered implementations - and that this
> > > should be a candidate for Lucene 1.9 / 2.0.
> > >
> > > You say that it shouldn't take long to wait.  A 1 sec spin lock
> > > per index per process is an eternity when trying to scale for
> > > potentially thousands of users.
> >
> > I have to admit that I am not an expert in j2ee compliancy. But I
> > would like to learn about it. If a database (I consider Lucene as a
> > database) really has to be initialized for every read-access, then
> > there is a problem with j2ee compliancy. I cannot believe that this
> > is really true.
> >
> > LET ME STATE AGAIN: You should not open a new IndexReader for every
> > search/query. If you do so you definitely have a performance
> > problem independently from synchronization! Opening an IndexReader
> > is much more expensive than any individual query/search.
> >
> > You need a commit.lock for opening an IndexReader not because
> > IndexReader has delete functionality. You need it because if there
> > is some process writing to your index, your index may be in an
> > inconsistent state. An existing commit.lock indicates such an
> > inconsistent state. Therefore, every writer needs a commit.lock
> > while committing, and every reader needs a commit.lock while
> > opening an index.
> >
> > Christoph
> >
> >
> >
> 
> 



Re: where's 1.9 / 2.0

2004-09-13 Thread Otis Gospodnetic

No, people are really referring to a bit longer term plans (read: end
of 2004 or early 2005 or so).  Ideas and plans for that version are on
the Wiki.

Otis

--- Vic <[EMAIL PROTECTED]> wrote:

> Is 1.9/2.0 the current nightly build?
> .V
> 
> -- 
> Please post on Rich Internet Applications User Interface (RiA/SoA)
> 
> 
> 
> 
> 



RE: Binary fields and data compression

2004-08-30 Thread Otis Gospodnetic


--- Robert Engels <[EMAIL PROTECTED]> wrote:

..

> ... thus my request that any compression support be optional.

I think this goes without say.  Say say say...

Otis


> -Original Message-
> From: David Spencer [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 30, 2004 5:33 PM
> To: Lucene Developers List
> Subject: Re: Binary fields and data compression
> 
> 
> Robert Engels wrote:
> 
> > The data size savings is almost certainly not worth the probable
> > 20-40% increase in CPU usage in most cases no?
> >
> > I think it should be optional for those who have extremely large
> > indices and want to save some space (seems not necessary these
> > days), and those who want to maximize performance.
> 
> You don't know until you benchmark it, but I thought that the
> heuristic nowadays was that CPUs are fast and disk i/o is slow ( and
> yes, disk space is 'infinite' :) ) - so therefore I would guess that
> in spite of the CPU cost of compression, you'd save time due to less
> disk i/o.
> 
> 
> >
> >
> > -Original Message-
> > From: Bernhard Messer [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 30, 2004 4:41 PM
> > To: [EMAIL PROTECTED]
> > Subject: Binary fields and data compression
> >
> >
> > hi developers,
> >
> > a few months ago, there was a very interesting discussion about
> > field compression and the possibility to store binary field values
> > within a lucene document. Regarding this topic, Drew Farris came up
> > with a patch to add the necessary functionality. I ran all the
> > necessary tests on his implementation and didn't find one problem.
> > So the original implementation from Drew could now be enhanced to
> > compress the binary field data (maybe even the text fields if they
> > are stored only) before writing to disc. I made some simple
> > statistical measurements using the java.util.zip package for data
> > compression. Enabling it, we could save about 40% data when
> > compressing plain text files with a size from 1KB to 4KB. If there
> > is still some interest, we could first try to update the patch,
> > because it's outdated due to several changes within the Fields
> > class. After finishing that, compression could be added to the
> > updated version of the patch.
> >
> > sounds good to me, what do you think ?
> >
> > best regards
> > Bernhard
> >
> >
> >
> >
> >
> 
> 



Re: Binary fields and data compression

2004-08-30 Thread Otis Gospodnetic

Bernhard,

Sounds good to me.
I would, however, also be interested in the performance impact of
text-field compression.  While adapting Drew's patch, it may be nice to
make the compression mechanism pluggable.
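
A rough sketch of what "pluggable" could mean here, assuming a small
interface with one java.util.zip implementation behind it.
FieldCompressor and DeflateCompressor are hypothetical names; Lucene
defines no such interface, and this is not Drew's patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical plug-in point for stored-field compression.
interface FieldCompressor {
    byte[] compress(byte[] raw);
    byte[] decompress(byte[] packed);
}

// One possible implementation on top of java.util.zip.
final class DeflateCompressor implements FieldCompressor {
    public byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    public byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input is unusable.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated input or preset dictionary required");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Timing compress and decompress on representative field values would be
one way to measure the text-field performance impact asked about above.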

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> hi developers,
> 
> a few months ago, there was a very interesting discussion about field
> compression and the possibility to store binary field values within a
> lucene document. Regarding this topic, Drew Farris came up with a
> patch to add the necessary functionality. I ran all the necessary
> tests on his implementation and didn't find one problem. So the
> original implementation from Drew could now be enhanced to compress
> the binary field data (maybe even the text fields if they are stored
> only) before writing to disc. I made some simple statistical
> measurements using the java.util.zip package for data compression.
> Enabling it, we could save about 40% data when compressing plain text
> files with a size from 1KB to 4KB. If there is still some interest, we
> could first try to update the patch, because it's outdated due to
> several changes within the Fields class. After finishing that,
> compression could be added to the updated version of the patch.
> 
> sounds good to me, what do you think ?
> 
> best regards
> Bernhard
> 
> 
> 
> 
> 
> 



Re: TermPositionVector used/implemented?

2004-08-27 Thread Otis Gospodnetic

I think Grant Ingersoll is working on adding that functionality.  Grant
is the one who added the TV support to Lucene in the first place, after
Dmitry's initial work.

Otis

--- David Spencer <[EMAIL PROTECTED]> wrote:

> Is TermPositionVector used or implemented anywhere?
> I checked the Lucene source, sandbox source, and Nutch source and
> found no uses or implementations of it. Is it just there to reserve
> the concept?
> 
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermPositionVector.html
> 
> Looks useful however as the implication is you can get back at the
> tokenized document w/ the tokens in order (with a little sorting work
> I guess).
> 
> Gosh, even Google found no refs on the mailing list for it...
> 
> thx,
>   Dave
> 
> 
> 
> 



Re: highlighting phrases

2004-08-25 Thread Otis Gospodnetic

Guido, your modification sounds good.  If you can contribute it, that
would be great.

Otis

--- Guido Wegener <[EMAIL PROTECTED]> wrote:

> I am working on a modification to Lucene's highlighter. Currently all
> terms of a phrase query are highlighted, even if they appear out of
> phrase context: Searching for "Foo Bar" in "Foo Bar some stuff Foo"
> will result in "_Foo_ _Bar_ some stuff _Foo_". It would be nicer to
> have "_Foo_ _Bar_ some stuff Foo" as the result.
> 
> I already implemented this behaviour in an older version of the
> highlighter, where things were still simple. But now I see that there
> was a modification to deal with overlapping tokens. These make the
> whole matter much more complicated. But I guess that I will try to
> merge my old phrase highlighter code with the current version of
> Lucene.
> 
> Is anybody working on this kind of phrase highlighting?
> Would my modifications be of interest to you?
> 
> Best regards,
>   Guido Wegener
> 
> -- 
> Guido Wegener
> startext Unternehmensberatung GmbH
> Kennedyallee 2, D-53175 Bonn
> Tel: +49 (0)228 959 96-26, Fax: +49 (0)228 959 96-66
> Internet: http://www.startext.de, E-Mail: [EMAIL PROTECTED]
>  
> 
> 
> 
> 
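
The behaviour Guido asks for in the quoted text can be illustrated with
a toy version that works on whitespace tokens. This is not his code and
not the sandbox highlighter: PhraseHighlighter is a hypothetical name,
and a real implementation would work on analyzer tokens and offsets.

```java
import java.util.Arrays;

final class PhraseHighlighter {
    // Mark a token only when it is part of a complete, in-order occurrence
    // of the phrase. Tokens are split on whitespace and compared exactly.
    static String highlight(String text, String phrase) {
        String[] tokens = text.trim().split("\\s+");
        String[] terms = phrase.trim().split("\\s+");
        boolean[] marked = new boolean[tokens.length];

        // Slide a window of phrase length over the text tokens.
        for (int i = 0; i + terms.length <= tokens.length; i++) {
            String[] window = Arrays.copyOfRange(tokens, i, i + terms.length);
            if (Arrays.equals(window, terms)) {
                Arrays.fill(marked, i, i + terms.length, true);
            }
        }

        // Rebuild the text, wrapping marked tokens in underscores.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(marked[i] ? "_" + tokens[i] + "_" : tokens[i]);
        }
        return out.toString();
    }
}
```

For the example from the mail, highlight("Foo Bar some stuff Foo",
"Foo Bar") returns "_Foo_ _Bar_ some stuff Foo": the trailing Foo is
left alone because it is not followed by Bar.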



Re: 2 new methods in MultiFieldQueryParser

2004-08-23 Thread Otis Gospodnetic

Hello Andraz (dobar dan),

I patched my local version of MultiFieldQueryParser a long time ago,
but never committed your patch.  I see the 2 new methods that you
added, but I'm having trouble coming up with a use case where one would
need to construct a query such as

field1:query1 field2:query2 fieldN:queryN

Hm, something like this:

sender:Andraz   subject:MultiFieldQueryParser   body:patch

?

I guess that can be handy for advanced search screens and such.  I'll
put the patched MultiFieldQueryParser in CVS now.  Thanks, and sorry
for the delay.
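
For an advanced search screen, the per-field query text might be put
together along these lines. This sketch only builds the string and does
not call Lucene; PerFieldQuery and build are hypothetical names and do
not reflect the signatures in Andraz's patch.

```java
final class PerFieldQuery {
    // Pair fields[i] with queries[i] and join the clauses with spaces,
    // giving "field1:query1 field2:query2". No escaping is attempted.
    static String build(String[] fields, String[] queries) {
        if (fields.length != queries.length) {
            throw new IllegalArgumentException("fields and queries must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(fields[i]).append(':').append(queries[i]);
        }
        return sb.toString();
    }
}
```

The resulting string would still have to go through a query parser, and
user input containing query syntax characters would need escaping first.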

Otis

--- Andraz Skoric <[EMAIL PROTECTED]> wrote:

> Hi, I hope i did it right. Diff file is attached. If there's anything
> wrong please let me know.
> 
> Thanks,
> Andraz
> 
> 
> 
> Otis Gospodnetic wrote:
> 
> >My email reader munges inlined text files like this one (line wraps,
> >etc.).
> >Would it be possible for you to:
> >
> >1) create a diff (cvs diff -u )
> >
> >2) attach it to a new entry in Bugzilla
> >   (http://issues.apache.org/bugzilla/enter_bug.cgi?product=Lucene)
> >OR
> >3) zip the cvs diff and attach it to email
> >
> >Thanks,
> >Otis
> >
> 

> ATTACHMENT part 2 application/x-tar
name=MultiFieldQueryParser1.4.diff.tar


Re: DO NOT REPLY [Bug 30736] - [PATCH] to remove synchronized code from TermVectorsReader

2004-08-19 Thread Otis Gospodnetic

OK.

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Otis,
> 
> sorry to say, the implementation i provided with that patch would
> work, but looking at it in detail, it's simply bullshit.
> 
> It's not just the IOException which is caught and not passed to the
> caller. The current implementation would open InputStreams for each
> thread which never get closed (thanks Christoph for the tip). Every
> thread calling the ThreadLocal.get() method is creating a new
> TermVectorsReader within the anonymous inner class. The opened
> InputStreams in TermVectorsReader will never get closed again
> (correct me please if I'm wrong). So the only way i see, is to make
> the TermVectorsReader class cloneable and put a clone of the original
> into the ThreadLocal.
> 
> The implementation i have in mind would look very similar to the one
> Doug introduced in TermInfosReader, handling the SegmentTermEnum
> objects.
> 
> So please, simply forget that bad shot. I'm going to try to correct it
> and add the new files to the current patch in Bugzilla.
> thx
> Bernhard
> 
> Otis Gospodnetic wrote:
> 
> >Ah, I see English.java.  I didn't check test/ directories.
> >
> >IOException - yes, let the caller deal with it.
> >
> >Please just attach new diffs to existing Bugzilla entry.  I'll
> ignore
> >the old ones.
> >
> >Thanks,
> >Otis
> >
> >
> >--- Bernhard Messer <[EMAIL PROTECTED]> wrote:
> >
> >  
> >
> >>Otis,
> >>
> >>the English class is in cvs, that's where i found it. It is also
> used
> >>by 
> >>other test classes like TestTermVectors e.g.
> >>
> >>The IOException was something where i wasn't sure how to process. I
> 
> >>think you're right, the best idea would be to pop it up to the
> >>caller. 
> >>Looking at the original code, the IOException wasn't caught in 
> >>TermVectors constructor.
> >>
> >>Sorry about the tabs, these are my settings in Eclipse, inserting tabs
> >>instead of blanks.
> >>
> >>Shall i create a new patch send it to the list, or do i have to
> >>create a 
> >>new bugzilla issue for that. Is it possible to update attachments
> in 
> >>bugzilla ? Don't think so.
> >>
> >>regards
> >>Bernhard
> >>
> >>[EMAIL PROTECTED] wrote:
> >>
> >>
> >>
> >>>DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
> >>>RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
> >>><http://issues.apache.org/bugzilla/show_bug.cgi?id=30736>.
> >>>ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
> >>>INSERTED IN THE BUG DATABASE.
> >>>
> >>>http://issues.apache.org/bugzilla/show_bug.cgi?id=30736
> >>>
> >>>[PATCH] to remove synchronized code from TermVectorsReader
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>--- Additional Comments From [EMAIL PROTECTED]  2004-08-19 11:47
> >>>  
> >>>
> >>---
> >>
> >>
> >>>Bernhard,
> >>>
> >>>Thanks for the patch.  The unit test requires class
> >>>  
> >>>
> >>o.a.lucene.util.English. 
> >>
> >>
> >>>This is not in CVS.  Is this something that should be in the CVS? 
> >>>  
> >>>
> >>What is it?
> >>
> >>
> >>>I am also wondering about this piece of code:
> >>>
> >>>-  termVectorsReader = new TermVectorsReader(cfsDir, segment,
> >>>  
> >>>
> >>fieldInfos);
> >>
> >>
> >>>+   final Directory dir = cfsDir;
> >>>+   termVectorsLocal = new ThreadLocal() {
> >>>+   protected synchronized Object initialValue() {
> >>>+   try {
> >>>+   return new TermVectorsReader(dir,
> >>>  
> >>>
> >>segment,
> >>
> >>
> >>>fieldInfos);
> >>>+   } catch (IOException ioe) {
> >>>+   ioe.printStackTrace();
> >>>+   return null;
> >>>+   }
> >>>+   }
> >>>+   };

Re: IndexReader and TermVectorsWriter cleanup

2004-08-19 Thread Otis Gospodnetic

Bernhard,

Since IndexReader's directory() was a public method, some people may be
relying on it.
Also, since one needs to pass Directory to IndexReader's
open(Directory) method, the caller already has a Directory reference
that they can mess with.
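
To illustrate the point, here is a small sketch with hypothetical classes (SketchDirectory and SketchReader are not Lucene's Directory and IndexReader). The reader exposes no directory() accessor at all, yet the caller can still close the directory it passed to open().

```java
class DirectoryAccessSketch {
    // Hypothetical stand-in for a Directory.
    static class SketchDirectory {
        private boolean open = true;

        void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    // Hypothetical reader that deliberately has no directory() accessor.
    static class SketchReader {
        private final SketchDirectory directory;

        private SketchReader(SketchDirectory directory) {
            this.directory = directory;
        }

        static SketchReader open(SketchDirectory directory) {
            return new SketchReader(directory);
        }

        // The reader only works while its directory is open.
        boolean usable() {
            return directory.isOpen();
        }
    }
}
```

A caller that does `SketchDirectory dir = new SketchDirectory(); SketchReader r = SketchReader.open(dir); dir.close();` has damaged the reader without ever asking it for its directory.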
 
As for changing those public class variables to protected - do we
really gain much, or anything by making them protected, since they are
already static and final?

I feel like leaving these two classes as they are now.

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Hi developers,
> 
> in the attachments you will find two small cleanups for IndexReader
> and TermVectorsWriter. In TermVectorsWriter, the visibility of some
> public members is changed to protected.
> 
> In IndexReader, there is a public method "directory()", where classes
> 
> outside lucene can get the current directory object attached to this 
> reader. I think it would be better to make the method protected, so
> that 
> no classes outside this package can fetch the directory object from
> an 
> existing reader and maybe even close the directory, which would
> definitely damage the reader.
> 
> best regards
> Bernhard
> 
> 
> 
> > Index: src/java/org/apache/lucene/index/IndexReader.java
> ===
> RCS file:
>
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v
> retrieving revision 1.33
> diff -r1.33 IndexReader.java
> 131c131
> <   public Directory directory() { return directory; }
> ---
> >   protected Directory directory() { return directory; }
> > Index: src/java/org/apache/lucene/index/TermVectorsWriter.java
> ===
> RCS file:
>
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/TermVectorsWriter.java,v
> retrieving revision 1.1
> diff -r1.1 TermVectorsWriter.java
> 34c34
> <   public static final int FORMAT_VERSION = 1;
> ---
> >   protected static final int FORMAT_VERSION = 1;
> 36c36
> <   public static final int FORMAT_SIZE = 4;
> ---
> >   protected static final int FORMAT_SIZE = 4;
> 39,41c39,41
> <   public static final String TVX_EXTENSION = ".tvx";
> <   public static final String TVD_EXTENSION = ".tvd";
> <   public static final String TVF_EXTENSION = ".tvf";
> ---
> >   protected static final String TVX_EXTENSION = ".tvx";
> >   protected static final String TVD_EXTENSION = ".tvd";
> >   protected static final String TVF_EXTENSION = ".tvf";
> 



Re: DO NOT REPLY [Bug 30736] - [PATCH] to remove synchronized code from TermVectorsReader

2004-08-19 Thread Otis Gospodnetic

Ah, I see English.java.  I didn't check test/ directories.

IOException - yes, let the caller deal with it.
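
One wrinkle is that ThreadLocal.initialValue() cannot declare a checked exception, which is probably why the patch caught the IOException in the first place. A possible way around it, sketched here with hypothetical names (FailingReader, PropagateIOExceptionSketch) and java.io.UncheckedIOException from Java 8, is to wrap the exception inside the initializer and unwrap it again before it reaches the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class PropagateIOExceptionSketch {
    // Hypothetical reader whose constructor may fail, like opening term vector files.
    static class FailingReader {
        FailingReader(boolean fail) throws IOException {
            if (fail) {
                throw new IOException("cannot open term vector files");
            }
        }
    }

    private final ThreadLocal<FailingReader> local;

    PropagateIOExceptionSketch(final boolean fail) {
        local = ThreadLocal.withInitial(() -> {
            try {
                return new FailingReader(fail);
            } catch (IOException ioe) {
                // The initializer cannot throw a checked exception, so wrap it
                // instead of printing the stack trace and returning null.
                throw new UncheckedIOException(ioe);
            }
        });
    }

    // Callers see the original IOException again.
    FailingReader reader() throws IOException {
        try {
            return local.get();
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
    }
}
```

This keeps the failure visible to the method that already declares IOException, rather than surfacing later as a null reader.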

Please just attach new diffs to existing Bugzilla entry.  I'll ignore
the old ones.

Thanks,
Otis


--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Otis,
> 
> the English class is in cvs, that's where i found it. It is also used
> by 
> other test classes like TestTermVectors e.g.
> 
> The IOException was something where i wasn't sure how to process. I 
> think you're right, the best idea would be to pop it up to the
> caller. 
> Looking at the original code, the IOException wasn't caught in 
> TermVectors constructor.
> 
> Sorry about the tabs, these are my settings in Eclipse, inserting tabs
> instead of blanks.
> 
> Shall i create a new patch send it to the list, or do i have to
> create a 
> new bugzilla issue for that. Is it possible to update attachments in 
> bugzilla ? Don't think so.
> 
> regards
> Bernhard
> 
> [EMAIL PROTECTED] wrote:
> 
> >DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
> >RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
> ><http://issues.apache.org/bugzilla/show_bug.cgi?id=30736>.
> >ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
> >INSERTED IN THE BUG DATABASE.
> >
> >http://issues.apache.org/bugzilla/show_bug.cgi?id=30736
> >
> >[PATCH] to remove synchronized code from TermVectorsReader
> >
> >
> >
> >
> >
> >--- Additional Comments From [EMAIL PROTECTED]  2004-08-19 11:47
> ---
> >Bernhard,
> >
> >Thanks for the patch.  The unit test requires class
> o.a.lucene.util.English. 
> >This is not in CVS.  Is this something that should be in the CVS? 
> What is it?
> >
> >I am also wondering about this piece of code:
> >
> >-  termVectorsReader = new TermVectorsReader(cfsDir, segment,
> fieldInfos);
> >+   final Directory dir = cfsDir;
> >+   termVectorsLocal = new ThreadLocal() {
> >+   protected synchronized Object initialValue() {
> >+   try {
> >+   return new TermVectorsReader(dir,
> segment,
> >fieldInfos);
> >+   } catch (IOException ioe) {
> >+   ioe.printStackTrace();
> >+   return null;
> >+   }
> >+   }
> >+   };
> >
> >Is it a good thing to 'eat' that IOException and quietly return
> null?  The
> >method where this code is, is already throwing IOException, so why
> not let the
> >IOException pop up?
> >
> >Finally, it looks like diffs contain tabs.  Could you please change
> tabs to 2
> >spaces?
> >
> >Thanks,
> >Otis
> >
>
> >
> >  
> >
> 
> 
> 
> 



Re: optimize TermVectorsReader, remove synchronization from code

2004-08-17 Thread Otis Gospodnetic

I was going to consider applying this, but the included diff has
wrapped lines.  Is there a Bugzilla entry with this code attached?

Also, is there only SegmentReader.java diff here, or am I missing a
piece of diff?  I don't see any synchronized methods/blocks removed
from the code, so I'm confused about this optimization.

Otis

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:

> Sorry, but there was a bug in the patch I provided several minutes
> ago. A NullPointerException can occur in the SegmentReader doClose
> method. The change in the new diff file checks that the ThreadLocal
> object was created and is not null before trying to get the
> TermVectorsReader from it.
> 
> best regards
> Bernhard
> 
> > Index: SegmentReader.java
> ===
> RCS file:
>
/home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/SegmentReader.java,v
> retrieving revision 1.25
> diff -u -r1.25 SegmentReader.java
> --- SegmentReader.java11 Aug 2004 17:37:52 -  1.25
> +++ SegmentReader.java15 Aug 2004 15:07:47 -
> @@ -42,8 +42,7 @@
>private FieldsReader fieldsReader;
>  
>TermInfosReader tis;
> -  TermVectorsReader termVectorsReader;
> -
> +  
>BitVector deletedDocs = null;
>private boolean deletedDocsDirty = false;
>private boolean normsDirty = false;
> @@ -51,6 +50,8 @@
>  
>InputStream freqStream;
>InputStream proxStream;
> +  
> +  private ThreadLocal termVectorsLocal = null;
>  
>// Compound File Reader when based on a compound file segment
>CompoundFileReader cfsReader = null;
> @@ -128,7 +129,17 @@
>  openNorms(cfsDir);
>  
>  if (fieldInfos.hasVectors()) { // open term vector files only as
> needed
> -  termVectorsReader = new TermVectorsReader(cfsDir, segment,
> fieldInfos);
> + final Directory dir = cfsDir;
> + termVectorsLocal = new ThreadLocal() {
> + protected synchronized Object initialValue() {
> + try {
> + return new TermVectorsReader(dir, segment, fieldInfos);
> + } catch (IOException ioe) {
> + ioe.printStackTrace();
> + return null;
> + }
> + }
> + };
>  }
>}
>  
> @@ -164,8 +175,13 @@
>proxStream.close();
>  
>  closeNorms();
> -if (termVectorsReader != null) termVectorsReader.close();
> -
> +if (termVectorsLocal != null) {
> + TermVectorsReader termVectorsReader =
> (TermVectorsReader)termVectorsLocal.get();
> + if (termVectorsReader != null) {
> + termVectorsReader.close();
> + }
> +}
> +
>  if (cfsReader != null)
>cfsReader.close();
>}
> @@ -408,6 +424,15 @@
>  FieldInfo fi = fieldInfos.fieldInfo(field);
>  if (fi == null || !fi.storeTermVector) return null;
>  
> +if (termVectorsLocal == null) {
> + return null;
> +}
> +
> +TermVectorsReader termVectorsReader =
> (TermVectorsReader)termVectorsLocal.get();
> + if (termVectorsReader == null) {
> + return null;
> + }
> +
>  return termVectorsReader.get(docNumber, field);
>}
>  
> @@ -419,9 +444,14 @@
> *  If no such fields existed, the method returns null.
> */
>public TermFreqVector[] getTermFreqVectors(int docNumber) {
> -if (termVectorsReader == null)
> -  return null;
> -
> + if (termVectorsLocal == null) {
> + return null;
> +}
> + 
> + TermVectorsReader termVectorsReader =
> (TermVectorsReader)termVectorsLocal.get();
> + if (termVectorsReader == null) {
> + return null;
> + }
>  return termVectorsReader.get(docNumber);
>}
>  }
> 



Re: procedure for contributing to the sandbox

2004-08-16 Thread Otis Gospodnetic

Hello Michael,

Once you have the code, it's best to put it in a ZIP file or some such,
create a Bugzilla entry with [PATCH]  in the summary, and
then upload your contribution.  Send an email to lucene-dev with your
proposal, description of new code/functionality, etc.

Ideally the contribution will contain unit tests and the Ant build
script that could/should be written like other build.xml files in the
Sandbox, so that the contribution can be easily compiled and Jarred
like other Sandbox contributions.

Otis

--- "Crump, Michael" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
>  
> 
> I was wondering what/where I can find the procedures for contributing
> code to the sandbox.  Can someone point me in the right direction
> please?  I have "spoken" with Kevin Burton about his external storage
> proposal and he has indicated to me that if I will do the work of
> getting it submitted to the sandbox he will let me go ahead and do
> that
> since he is currently too busy with other projects.
> 
>  
> 
> Thank you,
> 
>  
> 
> Michael
> 
> 


Re: moving the analyzers into sandbox

2004-08-15 Thread Otis Gospodnetic

+1 for moving analyzers to the sandbox.  We've talked about this
before, and I believe that is what we concluded with.

+1 for releasing at least some Sandbox components.  Analyzers, Snowball
Analyzers and Highlighter at least.

Is this something you can/want to do, Daniel?

Otis


--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> 
> On Aug 14, 2004, at 1:34 PM, Daniel Naber wrote:
> > any objections against moving the German and Russian analyzers into
> the
> > sandbox? If not, I'd like to do that, but I'm not sure if we
> already
> > agreed on doing so. The current situation with analyzers both in 
> > lucene's
> > core and in the sandbox doesn't seem to make sense.
> 
> +1 on moving those analyzers out.
> 
> > I suggest that we provide an lucene-analyzers.tar.gz with the
> upcoming
> > releases so that people don't have to check out the sandbox from
> CVS to
> > make use of the analyzers.
> 
> We should go even further and release all sandbox components along
> with 
> each release - this would give more exposure to the highlighter, 
> snowball analyzer, and the others.  Maybe we shouldn't package 
> everything that way, as some things might tend to cause more support 
> issues, such as WordNet which requires a few additional steps to make
> 
> use of.  Thoughts?  At the very least though, analyzers, highlighter,
> 
> and snowball make great sense to package and version with Lucene core
> 
> releases, I think.
> 
>   Erik
> 
> 
> 
> 



Re: Is the latest StandardTokenizer.jj corrupted?

2004-08-03 Thread Otis Gospodnetic

Hm, not sure.  As far as I remember, I did run 'ant javacc', because I
also committed the generated .java files.

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> Roy,
> 
> I've committed a fix that fixes the import statement issues (Otis -  
> what happened? - I guess you didn't try regenerating from the JavaCC 
> 
> .jj files)
> 
> All seems to be well now.  Let me know if you still have issues.
> 
>   Erik
> 
> On Aug 2, 2004, at 3:30 PM, Roy wrote:
> 
> > The latest StandardTokenizer.jj in cvs repository seems to be  
> > corrupted.
> >
> > I used ant javacc-StandardAnalyzer to regenerate the java code,
> then
> > tried to rebuild the lucene package but got some strange errors as
> > follows. It seems javacc didn't generate correct code against the
> > latest jj file: the last two errors indicate there are some
> characters
> > missing from those symbols. I checked out the 1.4 jj file and did
> the
> > same thing again. It works!  So I think there's some problems with
> the
> > latest version. The diff shows the change is minor. I didn't figure
> > the cause. Maybe somebody in the core team can look into this?
> >
> > My purpose is to modify the StandardTokenizer.jj to build my own.
> But
> > I didn't succeed. Then I found out even the vanilla jj has some
> > problems.
> >
> > compile-core:
> >[javac] Compiling 160 source files to
> > /home/roy/jakarta-lucene/build/classes/java
> >[javac]  
> >
>
/home/roy/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
> 
> > StandardTokenizer.java:15:
> > cannot resolve symbol
> >[javac] symbol  : class Reader
> >[javac] location: class
> > org.apache.lucene.analysis.standard.StandardTokenizer
> >[javac]   public StandardTokenizer(Reader reader) {
> >[javac]^
> >[javac]  
> >
>
/home/roy/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
> 
> > StandardTokenizer.java:24:
> > cannot resolve symbol
> >[javac] symbol  : class IOException
> >[javac] location: class
> > org.apache.lucene.analysis.standard.StandardTokenizer
> >[javac]   final public org.apache.lucene.analysis.Token next()
> > throws ParseException, IOException {
> >[javac]
> >   ^
> >[javac]  
> >
>
/home/roy/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
> 
> > StandardTokenizer.java:15:
> > recursive constructor invocation
> >[javac]   public StandardTokenizer(Reader reader) {
> >[javac]  ^
> >[javac]  
> >
>
/home/roy/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
> 
> > StandardTokenizerTokenManager.java:493:
> > cannot resolve symbol
> >[javac] symbol  : method jCheckNAdd (int)
> >[javac] location: class
> > org.apache.lucene.analysis.standard.StandardTokenizerTokenManager
> >[javac]  jCheckNAdd(25);
> >[javac]  ^
> >[javac]  
> >
>
/home/roy/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
> 
> > StandardTokenizerTokenManager.java:942:
> > cannot resolve symbol
> >[javac] symbol  : variable ind
> >[javac] location: class
> > org.apache.lucene.analysis.standard.StandardTokenizerTokenManager
> >[javac]  ind = 4;
> >[javac]  ^
> >[javac] 5 errors
> >
> > BUILD FAILED
> > /home/roy/jakarta-lucene/build.xml:140: Compile failed; see the
> > compiler error output for details.
> >
> >
> 
> 
> 
> 


