RE: Subversion conversion

2005-02-03 Thread Kevin L. Cobb
We recently started using SVN for SCM; we were using VSS before. We're
trying out approach A, branching off for each release. Development always
happens on the trunk, except when a bug is discovered that needs to be
patched in a previous version of the product. When that scenario comes up
(and it never has so far), the developer has to make the change on the
branch for the version being patched and then merge those changes into
the other branches and the trunk.

It seems to be a cleaner approach, at least for now. Of course, for an
open source project like Lucene, I'm not sure branching is necessary at
all. If anyone has other models to use for SCM, I'd love to hear them.

Here's some ASCII art showing our model:

                             +--- branch release 1.2
                             |
---trunk|---trunk--|--trunk--|---trunk-----
        |          |
        |          +-- branch release 1.1
        |
        + branch release 1.0 ---
 

Kevin Cobb


-----Original Message-----
From: Chakra Yadavalli [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 02, 2005 7:50 PM
To: Lucene Users List
Subject: Re: Subversion conversion

Hello ALL, It might not be the right place for it but as we are talking
about SCM, I have a quick question. First, I haven't used CVS/SVN on any
project. I am a ClearCase/PVCS guy. I just would like to know WHICH
CONFIGURATION MANAGEMENT PLAN DO YOU FOLLOW IN LUCENE DEVELOPMENT.

PLAN A: DEVELOP IN TRUNK AND BRANCH OFF ON RELEASE
Recently I had a discussion with a friend about developing in the TRUNK
(which is /main in ClearCase speak), which my friend claims is how it is
done in Apache/open source projects. The main advantage he pointed out
was that merging can be avoided if you develop in the TRUNK. When there
is a release, they create a new branch (say a LUCENE_1.5 branch) and
label it. That branch is then used for maintenance, and any code deltas
are merged back to the TRUNK as needed.

PLAN B: BRANCH OFF BEFORE PLANNED RELEASE AND MERGE BACK TO MAIN/TRUNK
As I am from the private workspace/isolated development school of
thought promoted by ClearCase, I am used to creating a branch at
project/release initiation and developing in that branch (say /main/dev).
Similarly, we have /main/int for making changes when the project goes to
integration phase, and a /main/acp branch for acceptance. In this
school, the /main will always have fewer versions of files and the
difference between any two consecutive versions is the NET CHANGE of
that SCM element (either file or dir) between two releases (say LUCENE
1.4 and 1.5).

Thanks in advance for your time.
Chakra Yadavalli
http://jroller.com/page/cyblogue

 -----Original Message-----
 From: aurora [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, February 02, 2005 4:25 PM
 To: lucene-user@jakarta.apache.org
 Subject: Re: Subversion conversion
 
 Subversion rocks!
 
 I have just set up the Windows svn client TortoiseSVN with my favourite
 file manager Total Commander 6.5. The svn status and commands are
 readily integrated with the file manager. Offline diff and revert are
 two things I really like from svn.
 
  The conversion to Subversion is complete.  The new repository is
  available to users read-only at:
 
http://svn.apache.org/repos/asf/lucene/java/trunk
 
  Besides /trunk, there is also /branches and /tags.  /tags contains all
  the CVS tags made so that you could grab a snapshot of a previous
  version.  /trunk is analogous to CVS HEAD.  You can learn more about
  the Apache repository configuration here and how to use the
  command-line client to check out the repository:
 
http://www.apache.org/dev/version-control.html
 
  Learn about Subversion, including the complete O'Reilly Subversion
  book in electronic form for free here:
 
http://subversion.tigris.org
 
  For committers, check out the repository using https and your Apache
  username/password.
 
  The Lucene sandbox has been integrated into our single Subversion
  repository, under /java/trunk/sandbox:
 
http://svn.apache.org/repos/asf/lucene/java/trunk/sandbox/
 
  The Lucene CVS repositories have been locked for read-only.
 
  If there are any issues with this conversion, let me know and I'll
  bring them to the Apache infrastructure group.
 
Erik
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


-- 
Visit my weblog: http://www.jroller.com/page/cyblogue



RE: Stemming

2005-01-24 Thread Kevin L. Cobb
Do stemming algorithms take into consideration abbreviations too? Some
examples:

mg = milligrams
US = United States
CD = compact disc
vcr = video cassette recorder

And, the next logical question, if stemming does not take care of
abbreviations, are there any solutions that include abbreviations inside
or outside of Lucene?

Thanks,

Kevin
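
For what it's worth, suffix-stripping stemmers such as Porter's only work on word endings and have no notion of abbreviations, so one option outside of Lucene is a plain lookup table applied to the text before it is analyzed. The following is only a minimal sketch of that idea; the class name AbbreviationExpander and its expand method are hypothetical, and the table holds just the four examples from the message above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): expands known abbreviations. */
public class AbbreviationExpander {
    private final Map<String, String> expansions = new HashMap<>();

    public AbbreviationExpander() {
        // Keys are case-sensitive on purpose: folding case would make
        // "US" collide with the ordinary word "us".
        expansions.put("mg", "milligrams");
        expansions.put("US", "United States");
        expansions.put("CD", "compact disc");
        expansions.put("vcr", "video cassette recorder");
    }

    /** Replaces each whitespace-separated token that is a known abbreviation. */
    public String expand(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String expansion = expansions.get(token);
            out.add(expansion != null ? expansion : token);
        }
        return String.join(" ", out);
    }
}
```

The expanded text would then be handed to whatever Analyzer is in use, at index time, at query time, or both, depending on whether searches for the short form should still match.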


-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 21, 2005 5:51 PM
To: Lucene Users List
Subject: Re: Stemming

Also if you can't wait, see page 2 of
http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html

or the LIA e-book ;)

On Fri, 21 Jan 2005 09:27:42 -0500, Kevin L. Cobb
[EMAIL PROTECTED] wrote:
 OK, OK ... I'll buy the book. I guess it's about time since I am deeply
 and forever in love with Lucene. Might as well take the final plunge.
 
 
 -----Original Message-----
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
 Sent: Friday, January 21, 2005 9:12 AM
 To: Lucene Users List
 Subject: Re: Stemming
 
 Hi Kevin,
 
 Stemming is an optional operation and is done in the analysis step.
 Lucene comes with a Porter stemmer and a Filter that you can use in an
 Analyzer:
 
 ./src/java/org/apache/lucene/analysis/PorterStemFilter.java
 ./src/java/org/apache/lucene/analysis/PorterStemmer.java
 
 You can find more about it here:
 http://www.lucenebook.com/search?query=stemming
 You can also see mentions of SnowballAnalyzer in those search results,
 and you can find an adapter for SnowballAnalyzers in Lucene Sandbox.
 
 Otis
 
 --- Kevin L. Cobb [EMAIL PROTECTED] wrote:
 
  I want to understand how Lucene uses stemming but can't find any
  documentation on the Lucene site. I'll continue to google but hope
  that this list can help narrow my search. I have several questions on
  the subject currently but hesitate to list them here since finding a
  good document on the subject may answer most of them.
 
 
 
  Thanks in advance for any pointers,
 
 
 
  Kevin
 
 
 
 
 
 
 



Stemming

2005-01-21 Thread Kevin L. Cobb
I want to understand how Lucene uses stemming but can't find any
documentation on the Lucene site. I'll continue to google but hope that
this list can help narrow my search. I have several questions on the
subject currently but hesitate to list them here since finding a good
document on the subject may answer most of them.

Thanks in advance for any pointers,

Kevin

 

 



RE: Stemming

2005-01-21 Thread Kevin L. Cobb
OK, OK ... I'll buy the book. I guess it's about time since I am deeply
and forever in love with Lucene. Might as well take the final plunge.



-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 21, 2005 9:12 AM
To: Lucene Users List
Subject: Re: Stemming

Hi Kevin,

Stemming is an optional operation and is done in the analysis step. 
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:

./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java

You can find more about it here:
http://www.lucenebook.com/search?query=stemming
You can also see mentions of SnowballAnalyzer in those search results,
and you can find an adapter for SnowballAnalyzers in Lucene Sandbox.

Otis

--- Kevin L. Cobb [EMAIL PROTECTED] wrote:

 I want to understand how Lucene uses stemming but can't find any
 documentation on the Lucene site. I'll continue to google but hope
 that this list can help narrow my search. I have several questions on
 the subject currently but hesitate to list them here since finding a
 good document on the subject may answer most of them.

 Thanks in advance for any pointers,

 Kevin
 
  
 
  
 
 





Opinions: Using Lucene as a thin database

2004-12-13 Thread Kevin L. Cobb
I use Lucene as a legitimate search engine, which is cool. But I am also
using it as a simple database. I build an index with a couple of
keyword fields that allows me to retrieve values based on exact matches
in those fields. This is all I need to do, so it works just fine for my
needs. I also love the speed. The index is small enough that it is
wicked fast. I was wondering if anyone out there is doing the same, or if
there are any dissenting opinions on using Lucene for this purpose.

 

 

 



RE: Opinions: Using Lucene as a thin database

2004-12-13 Thread Kevin L. Cobb
I don't have the requirement to do range-type selects, i.e. the only
operator I would need is equals: Select * from MY_TABLE where
MY_NUMERIC_FIELD = 80.

My fields that are searchable in my model are always type KEYWORD. I
believe this forces the match to be exact. So thinking about it in
anything other than equals terms, I believe, would be a mistake. 

In any case, I believe that the requirement to use Lucene as a thin DB
means that your requirements for your database select are fairly simple
and straightforward. 

KLCobb

 
 

-----Original Message-----
From: Akmal Sarhan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 14, 2004 10:24 AM
To: Lucene Users List
Subject: Re: Opinions: Using Lucene as a thin database

That sounds very interesting, but how do you handle queries like

select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you only have the range query, so you would have to say

my_numeric_field:[80 TO ??]

but this would not work in the above-mentioned example, or am I missing
something?

regards

Akmal
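
On the range question: Lucene terms are strings and ranges compare them lexicographically, so raw numbers sort in the wrong order ("100" comes before "80"). A common workaround is to zero-pad numeric values to a fixed width both when indexing and when building the range bounds. A minimal sketch follows; the class name PaddedNumbers and the width of 10 digits are assumptions chosen for illustration, and negative numbers are deliberately not handled.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Lucene): fixed-width numeric encoding. */
public class PaddedNumbers {
    // Assumed width: enough digits for any non-negative int.
    private static final int WIDTH = 10;

    /** Encodes a non-negative number as a zero-padded string of WIDTH digits. */
    public static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }
}
```

With values stored this way, string order and numeric order agree, so the padded form of 80 can serve as the lower bound of a range and the largest padded value as the upper bound.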
On Tue, 14.12.2004 at 16:07, Praveen Peddi wrote:
 We also use Lucene for a similar purpose, except that we index and store
 quite a few fields. In fact, I also update partial documents as people
 suggested. I store all the indexed fields so I don't have to build the
 whole document again while updating part of a document. The reason we do
 this is speed. I found Lucene search on millions of objects to be 4 to 5
 times faster than our Oracle queries (of course this might be due to our
 pitiful database design :) ). It works great so far. The only caveat we
 had until now was incremental updates. But now I am implementing
 real-time updates so that the data in the Lucene index is almost always
 in sync with the data in the database. So now our search does not go to
 the database at all.
 
 Praveen
 ----- Original Message -----
 From: Kevin L. Cobb [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Tuesday, December 14, 2004 9:40 AM
 Subject: Opinions: Using Lucene as a thin database
 
 
 I use Lucene as a legitimate search engine, which is cool. But I am
 also using it as a simple database. I build an index with a couple of
 keyword fields that allows me to retrieve values based on exact matches
 in those fields. This is all I need to do, so it works just fine for my
 needs. I also love the speed. The index is small enough that it is
 wicked fast. I was wondering if anyone out there is doing the same, or if
 there are any dissenting opinions on using Lucene for this purpose.
 
 
 
 
 
 
 
 
 



Removing unnecessary characters from Query Terms

2004-11-29 Thread Kevin L. Cobb
I am parsing very large files with search terms that will be submitted
to a Lucene index. My current problem is that some of the terms contain
special characters that blow up when the phrase is parsed. An example
would be "My Search Phrase (alternate". I won't be adding any special
Booleans or clauses around the text.

My basic line of code that builds the Query object is ...

Query query = MultiFieldQueryParser.parse(term.toLowerCase(), fields,
new StandardAnalyzer());

 

Thanks in advance for any input. 

 

KLCobb
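
One possible approach, given that no Boolean operators or clauses are wanted, is to strip the query-syntax characters from each term before it reaches the parser. The sketch below assumes a hypothetical helper class QueryTermCleaner; the list of special characters is an assumption as well and should be checked against the query syntax of the Lucene version in use.

```java
/** Hypothetical helper (not part of Lucene): strips query-syntax characters. */
public class QueryTermCleaner {
    // Characters assumed to be special in the query syntax; verify this
    // list against the documentation for the Lucene version in use.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Replaces each special character with a space, then collapses whitespace. */
    public static String clean(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            sb.append(SPECIAL.indexOf(c) >= 0 ? ' ' : c);
        }
        return sb.toString().trim().replaceAll("\\s+", " ");
    }
}
```

The cleaned string would then replace term in the parse call above, i.e. QueryTermCleaner.clean(term).toLowerCase(). Note that this also splits hyphenated words into two, which may or may not be acceptable for the data at hand.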

 



Getting everything out of an index

2003-09-10 Thread Kevin L. Cobb
Is there a quick and easy way to get everything that is currently
indexed in a particular index when the searchable field is of type
KEYWORD?