RE: Subversion conversion
We recently started using SVN for SCM; we were using VSS. We're trying out approach A, branching off for each release. Development always happens on the trunk, except when a bug is discovered that needs to be patched in a previous version of the product. When that scenario comes up (and it never has yet), the developer makes the change on the branch that needs the patch and then merges those changes into the other branches and the trunk. It seems to be a cleaner approach, at least for now. Of course, for an open source project like Lucene, I'm not sure branching is necessary at all. If anyone has any other models to use for SCM, I'd love to hear them. Here's some ASCII art showing our model:

                               +--- branch release 1.2
                               |
    ---trunk---|---trunk---|---trunk---|---trunk---
               |           |
               |           +--- branch release 1.1
               |
               +--- branch release 1.0

Kevin Cobb

-----Original Message-----
From: Chakra Yadavalli [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 02, 2005 7:50 PM
To: Lucene Users List
Subject: Re: Subversion conversion

Hello all,

This might not be the right place for it, but as we are talking about SCM, I have a quick question. First, I haven't used CVS/SVN on any project; I am a ClearCase/PVCS guy. I would just like to know WHICH CONFIGURATION MANAGEMENT PLAN YOU FOLLOW IN LUCENE DEVELOPMENT.

PLAN A: DEVELOP IN TRUNK AND BRANCH OFF ON RELEASE

Recently I had a discussion with a friend about developing in the TRUNK (which is /main in ClearCase speak), which my friend claims is how it is done in Apache/open source projects. The main advantage he pointed out was that merging can be avoided if you are developing in the TRUNK. When there is a release, they create a new branch (say, a LUCENE_1.5 branch) and label it. That branch is used for maintenance, and any code deltas are merged back to the TRUNK as needed.
PLAN B: BRANCH OFF BEFORE PLANNED RELEASE AND MERGE BACK TO MAIN/TRUNK

As I am from the private workspace/isolated development school of thought promoted by ClearCase, I am used to creating a branch at project/release initiation and developing in that branch (say /main/dev). Similarly, we have /main/int for making changes when the project goes to the integration phase, and a /main/acp branch for acceptance. In this school, /main will always have fewer versions of files, and the difference between any two consecutive versions is the NET CHANGE of that SCM element (either file or dir) between two releases (say LUCENE 1.4 and 1.5).

Thanks in advance for your time.

Chakra Yadavalli
http://jroller.com/page/cyblogue

-----Original Message-----
From: aurora [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 02, 2005 4:25 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Subversion conversion

Subversion rocks! I have just set up the Windows svn client TortoiseSVN with my favourite file manager, Total Commander 6.5. The svn status and commands are readily integrated with the file manager. Offline diff and revert are two things I really like about svn.

The conversion to Subversion is complete. The new repository is available to users read-only at:

http://svn.apache.org/repos/asf/lucene/java/trunk

Besides /trunk, there are also /branches and /tags. /tags contains all the CVS tags made, so that you can grab a snapshot of a previous version. /trunk is analogous to CVS HEAD. You can learn more about the Apache repository configuration, and how to use the command-line client to check out the repository, here:

http://www.apache.org/dev/version-control.html

Learn about Subversion, including the complete O'Reilly Subversion book in electronic form for free, here:

http://subversion.tigris.org

For committers: check out the repository using https and your Apache username/password.
The Lucene sandbox has been integrated into our single Subversion repository, under /java/trunk/sandbox:

http://svn.apache.org/repos/asf/lucene/java/trunk/sandbox/

The Lucene CVS repositories have been locked for read-only. If there are any issues with this conversion, let me know and I'll bring them to the Apache infrastructure group.

Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
Visit my weblog: http://www.jroller.com/page/cyblogue
RE: Stemming
Do stemming algorithms take into consideration abbreviations too? Some examples:

mg = milligrams
US = United States
CD = compact disc
vcr = video cassette recorder

And, the next logical question: if stemming does not take care of abbreviations, are there any solutions that handle abbreviations, inside or outside of Lucene?

Thanks,
Kevin

-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: Friday, January 21, 2005 5:51 PM
To: Lucene Users List
Subject: Re: Stemming

Also, if you can't wait, see page 2 of http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html or the LIA e-book ;)

On Fri, 21 Jan 2005 09:27:42 -0500, Kevin L. Cobb [EMAIL PROTECTED] wrote:

OK, OK ... I'll buy the book. I guess it's about time, since I am deeply and forever in love with Lucene. Might as well take the final plunge.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, January 21, 2005 9:12 AM
To: Lucene Users List
Subject: Re: Stemming

Hi Kevin,

Stemming is an optional operation and is done in the analysis step. Lucene comes with a Porter stemmer and a Filter that you can use in an Analyzer:

./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java

You can find more about it here:

http://www.lucenebook.com/search?query=stemming

You can also see mentions of SnowballAnalyzer in those search results, and you can find an adapter for SnowballAnalyzers in the Lucene Sandbox.

Otis

--- Kevin L. Cobb [EMAIL PROTECTED] wrote:

I want to understand how Lucene uses stemming but can't find any documentation on the Lucene site. I'll continue to google, but hope that this list can help narrow my search. I have several questions on the subject currently, but hesitate to list them here since finding a good document on the subject may answer most of them.
Thanks in advance for any pointers,
Kevin
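To Kevin's abbreviation question: stemmers such as the Porter stemmer only strip morphological suffixes, so they will never map "mg" to "milligrams". A common approach is to expand abbreviations with a lookup table before (or during) analysis. Here is a minimal sketch in plain Java; the AbbreviationExpander class and its mappings are hypothetical, not part of Lucene:

```java
import java.util.HashMap;
import java.util.Map;

public class AbbreviationExpander {
    private final Map<String, String> abbreviations = new HashMap<String, String>();

    public AbbreviationExpander() {
        // Example mappings; a real list would be domain-specific.
        abbreviations.put("mg", "milligrams");
        abbreviations.put("us", "united states");
        abbreviations.put("cd", "compact disc");
        abbreviations.put("vcr", "video cassette recorder");
    }

    // Replace each whitespace-delimited token with its expansion, if one exists.
    public String expand(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split("\\s+")) {
            String expansion = abbreviations.get(token.toLowerCase());
            out.append(expansion != null ? expansion : token).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        AbbreviationExpander e = new AbbreviationExpander();
        System.out.println(e.expand("dosage in mg")); // dosage in milligrams
    }
}
```

Running the same expansion at both index time and query time keeps the two sides consistent, which is the same discipline Lucene analyzers already impose.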
Stemming
I want to understand how Lucene uses stemming but can't find any documentation on the Lucene site. I'll continue to google but hope that this list can help narrow my search. I have several questions on the subject currently but hesitate to list them here since finding a good document on the subject may answer most of them. Thanks in advance for any pointers, Kevin
RE: Stemming
OK, OK ... I'll buy the book. I guess it's about time, since I am deeply and forever in love with Lucene. Might as well take the final plunge.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, January 21, 2005 9:12 AM
To: Lucene Users List
Subject: Re: Stemming

Hi Kevin,

Stemming is an optional operation and is done in the analysis step. Lucene comes with a Porter stemmer and a Filter that you can use in an Analyzer:

./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java

You can find more about it here:

http://www.lucenebook.com/search?query=stemming

You can also see mentions of SnowballAnalyzer in those search results, and you can find an adapter for SnowballAnalyzers in the Lucene Sandbox.

Otis

--- Kevin L. Cobb [EMAIL PROTECTED] wrote:

I want to understand how Lucene uses stemming but can't find any documentation on the Lucene site. I'll continue to google, but hope that this list can help narrow my search. I have several questions on the subject currently, but hesitate to list them here since finding a good document on the subject may answer most of them.

Thanks in advance for any pointers,
Kevin
Opinions: Using Lucene as a thin database
I use Lucene as a legitimate search engine, which is cool. But I am also using it as a simple database too. I build an index with a couple of keyword fields that allow me to retrieve values based on exact matches in those fields. This is all I need to do, so it works just fine for my needs. I also love the speed; the index is small enough that it is wicked fast. I was wondering if anyone out there is doing the same, or if there are any dissenting opinions on using Lucene for this purpose.
RE: Opinions: Using Lucene as a thin database
I don't have the requirement to do range-type selects; the only operator I need is equals: select * from MY_TABLE where MY_NUMERIC_FIELD = 80. The fields that are searchable in my model are always of type KEYWORD. I believe this forces the match to be exact. So thinking about it in anything other than equals terms would, I believe, be a mistake. In any case, I believe that the requirement to use Lucene as a thin DB means that your requirements for your database selects are fairly simple and straightforward.

KLCobb

-----Original Message-----
From: Akmal Sarhan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 10:24 AM
To: Lucene Users List
Subject: Re: Opinions: Using Lucene as a thin database

That sounds very interesting, but how do you handle queries like

select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you have only the range query, so you would have to say my_numeric_field:[80 TO ??], but this would not work in the above example. Or am I missing something?

regards
Akmal

On Tue, 14.12.2004 at 16:07, Praveen Peddi wrote:

Even we use Lucene for a similar purpose, except that we index and store quite a few fields. In fact I also update partial documents, as people suggested. I store all the indexed fields so I don't have to rebuild the whole document when updating a partial document. The reason we do this is speed. I found that a Lucene search over a million objects is 4 to 5 times faster than our Oracle queries (of course this might be due to our pitiful database design :) ). It works great so far. The only caveat we had till now was incremental updates, but now I am implementing real-time updates so that the data in the Lucene index is almost always in sync with the data in the database. So now, our search does not go to the database at all.

Praveen

----- Original Message -----
From: Kevin L. Cobb [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 9:40 AM
Subject: Opinions: Using Lucene as a thin database

I use Lucene as a legitimate search engine, which is cool. But I am also using it as a simple database too. I build an index with a couple of keyword fields that allow me to retrieve values based on exact matches in those fields. This is all I need to do, so it works just fine for my needs. I also love the speed; the index is small enough that it is wicked fast. I was wondering if anyone out there is doing the same, or if there are any dissenting opinions on using Lucene for this purpose.
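For reference, the Keyword-field pattern Kevin describes can be sketched against the Lucene 1.4-era API roughly as below. The index path and field names are made up for illustration, and in a real application you would reuse a single searcher rather than opening one per lookup:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class ThinDbSketch {
    public static void main(String[] args) throws Exception {
        // Index a "row": Keyword fields are stored and indexed untokenized,
        // so lookups behave like exact-match equality on a column.
        IndexWriter writer = new IndexWriter("/tmp/thindb", new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Keyword("id", "42"));
        doc.add(Field.Keyword("value", "some-payload"));
        writer.addDocument(doc);
        writer.close();

        // Roughly: select value from MY_TABLE where id = 42
        IndexSearcher searcher = new IndexSearcher("/tmp/thindb");
        Hits hits = searcher.search(new TermQuery(new Term("id", "42")));
        if (hits.length() > 0) {
            System.out.println(hits.doc(0).get("value"));
        }
        searcher.close();
    }
}
```

Using TermQuery directly (rather than going through QueryParser) sidesteps analysis entirely, which is exactly the equals-only semantics described above.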
Removing unnecessary characters from Query Terms
I am parsing very large files for search terms that will be submitted to a Lucene index. My current problem is that some of the terms contain special characters that blow up when the phrase is parsed. An example would be "My Search Phrase (alternate". I won't be adding any special Booleans or clauses around the text. My basic line of code that builds the Query object is:

Query query = MultiFieldQueryParser.parse(term.toLowerCase(), fields, new StandardAnalyzer());

Thanks in advance for any input.

KLCobb
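One way around this, assuming you want the parser to treat such terms as literal text, is to backslash-escape the characters that are special to Lucene's query syntax before parsing (I believe later Lucene releases added a QueryParser.escape utility for this; the helper below is a hand-rolled sketch of the same idea):

```java
public class QueryEscaper {
    // Characters that are special to Lucene's QueryParser syntax.
    // & and | are only special as && and ||, but escaping them
    // individually is a harmless, conservative choice.
    private static final String SPECIALS = "+-!(){}[]^\"~*?:\\&|";

    // Backslash-escape any query-syntax character so the parser
    // treats the input as plain text.
    public static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("My Search Phrase (alternate")); // My Search Phrase \(alternate
    }
}
```

The escaped string can then be handed to MultiFieldQueryParser.parse as in the line above without the unbalanced parenthesis triggering a ParseException.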
Getting everything out of an index
Is there a quick and easy way to get everything that is currently indexed in a particular index when the searchable field is of type KEYWORD?
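Assuming the goal is simply to enumerate every document, one approach (sketched against the Lucene 1.x API, with a made-up index path and field name) is to walk the document numbers on an IndexReader and skip deleted slots:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

public class DumpIndex {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/tmp/myindex");
        // Document numbers run from 0 to maxDoc()-1; some slots may
        // refer to deleted documents and must be skipped.
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) continue;
            Document doc = reader.document(i);
            System.out.println(doc.get("myKeywordField"));
        }
        reader.close();
    }
}
```

This needs no query at all, so it works regardless of field type; the KEYWORD fields just come back exactly as they were stored.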