Re: TestRangeQuery.java
Hi, if the tests work outside Eclipse, you just need to configure them correctly in Eclipse :-) Good luck, Vladimir.
On Wed, 20 Oct 2004 19:10:45 +0530 Karthik N S [EMAIL PROTECTED] wrote: Hi, does anybody have trouble compiling TestRangeQuery.java in the Eclipse 3.0 IDE? [ http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/test/org/apache/lucene/search ] There seems to be an error in:
    doc.add(new Field("id", "id" + docCount, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("content", content, Field.Store.NO, Field.Index.TOKENIZED));
The compiler error, with Lucene 1.4.1 on Windows, is that Field.Store.YES is not found. Thanks in advance. With warm regards, have a nice day, N.S. Karthik
To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
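`Field.Store` and `Field.Index` do not seem to exist in the Lucene 1.4.1 release; they appear to come from the newer code in CVS, so a test checked out from CVS would not compile against a 1.4.1 jar, in Eclipse or elsewhere. With 1.4.x the same two fields would be created with `Field.Keyword("id", "id" + docCount)` (stored, indexed, not tokenized) and `Field.UnStored("content", content)` (indexed and tokenized, not stored). To check which API the jar on a given classpath offers, a minimal JDK-only sketch (the class name `FieldApiProbe` is hypothetical) could be:

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
class FieldApiProbe {
    static boolean hasClass(String binaryName) {
        try {
            // initialize=false: only check that the class exists, don't run static initializers
            Class.forName(binaryName, false, FieldApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Nested classes use '$' in their binary name, so Field.Store is Field$Store.
        boolean newApi = hasClass("org.apache.lucene.document.Field$Store");
        System.out.println(newApi
                ? "Field.Store found: new Field(name, value, Field.Store.*, Field.Index.*) should compile"
                : "Field.Store not found: use Field.Keyword / Field.Text / Field.UnStored instead");
    }
}
```

Run it with the same classpath as the Eclipse project to see which Lucene jar Eclipse is actually compiling against.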
Re: MultiSearcher to Indexing.
Hi Joel, the parallel approach needs a lot of memory; MultiSearcher needs somewhat less. Under heavy load, Tomcat throws a system error. If you have other experience with this, please tell me. Regards, Vladimir.
On Fri, 13 Aug 2004 12:22:34 +0200 [EMAIL PROTECTED] wrote: Hi Vladimir, can you please explain what the benefit of this approach is, and why _pickles_? If I understand correctly, the question was how to make a query run in parallel on multiple indexes. Isn't ParallelMultiSearcher meant for this? Regards, Joel
Re: MultiSearcher to Indexing.
Thanks. Vladimir.
On Fri, 13 Aug 2004 14:03:50 +0200 [EMAIL PROTECTED] wrote: Well, actually we use a nice piece of hardware with a lot of memory and 2 CPUs under Linux. As a front end we use a ColdFusion application. It seems to be OK, but we have not tested it under heavy load yet. I'll let you know if something goes wrong. Regards, J.
Re: MultiSearcher to Indexing.
Natarajan, MultiSearcher works well, but this approach has its "pickles" (catches). Here is an example, though not a complete sample:
    public Query combine(Query[] queries) throws IOException {
        if (queries.length < 2) {
            return queries[0];
        }
        Query[] combined = new Query[2];
        combined[0] = new BooleanQuery();
        // NOTE: the value 1 may be garbled in the archive; a limit of 1 would make any
        // BooleanQuery with more than one clause throw BooleanQuery.TooManyClauses.
        BooleanQuery.setMaxClauseCount(1);
        for (int i = 0; i < queries.length; i++) {
            combined[1] = queries[i];
            if (queries[i] instanceof BooleanQuery || queries[i] instanceof MultiTermQuery
                    || queries[i] instanceof PrefixQuery || queries[i] instanceof RangeQuery) {
                // merge this query's clauses into the accumulated BooleanQuery
                combined[0] = Query.mergeBooleanQueries(combined);
            } else if (queries[i] instanceof PhraseQuery) {
                // split the phrase into required single-term queries
                Term[] queryTerms = ((PhraseQuery) queries[i]).getTerms();
                for (int j = 0; j < queryTerms.length; j++) {
                    TermQuery q = new TermQuery(queryTerms[j]);
                    ((BooleanQuery) combined[0]).add(q, true, false); // required, not prohibited
                }
            } else {
                ((BooleanQuery) combined[0]).add(queries[i], true, false);
            }
        }
        return combined[0];
    }
    ...
    Searcher[] searchers = new IndexSearcher[indexName.length];
    for (int i = 0; i < indexName.length; i++) {
        searchers[i] = new IndexSearcher(indexName[i]);
    }
    MultiSearcher multiSearcher = new MultiSearcher(searchers);
    QueryParser qp = new QueryParser(FIELD_CONTENTS, analyzer);
    query = qp.parse(queryString);
    hits = multiSearcher.search(query);
    IndexReader[] reader = new IndexReader[indexName.length];
    Query[] expandedQueries = new Query[indexName.length];
    for (int i = 0; i < indexName.length; i++) {
        reader[i] = IndexReader.open(indexName[i]);
        // expand wildcard/prefix/range terms against this particular index
        expandedQueries[i] = query.rewrite(reader[i]);
    }
    query = combine(expandedQueries);
    ...
Best regards, Vladimir.
On Thu, 12 Aug 2004 20:51:13 +0530 Natarajan.T [EMAIL PROTECTED] wrote: Thanks for your response. OK, I understand the concept. If you have any sample code, please send it to me. If you have any idea about ParallelMultiSearcher, please share it with me.
-Original Message- From: Terence Lai [mailto:[EMAIL PROTECTED] Sent: Thursday, August 12, 2004 8:40 PM To: Lucene Users List Subject: RE: MultiSearcher to Indexing.
This is how I do it:
    IndexSearcher[] is = new IndexSearcher[2];
    is[0] = new IndexSearcher(IndexDir1); // first index folder
    is[1] = new IndexSearcher(IndexDir2); // second index folder
    MultiSearcher searcher = new MultiSearcher(is);
    searcher.search(query);
I think that MultiSearcher only searches sequentially. Alternatively, you can use ParallelMultiSearcher, which searches the indexes in parallel. Hope this helps, Terence
FYI: I have index files in different folders. In that case, how can I do the search using MultiSearcher? Thanks, Natarajan.
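The reason for rewriting the query against each index before combining is that a multi-term query (prefix, wildcard, range) expands into the terms that actually occur in a given index, so the expansion differs from index to index; the per-index expansions then have to be merged so that each clause appears once. Lucene's `Query.mergeBooleanQueries` appears to collect clauses into a set for this purpose. The JDK-only sketch below models that idea with plain strings; the term dictionaries and the class name are hypothetical, and this is not Lucene code:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;

class PerIndexExpansion {
    // Expand a prefix against one index's sorted term dictionary, roughly what a
    // prefix rewrite does: every term in the range [prefix, prefix + '\uffff').
    static SortedSet<String> expandPrefix(String prefix, SortedSet<String> indexTerms) {
        return indexTerms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // Merge the per-index expansions; LinkedHashSet keeps first-seen order and drops duplicates.
    static Set<String> combine(List<? extends Set<String>> expansions) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> expansion : expansions) {
            merged.addAll(expansion);
        }
        return merged;
    }
}
```

With `luc` as the prefix, an index containing `lucene` and `lucky` and another containing `luck` and `lucene` expand differently, and the combined set holds `lucene`, `lucky` and `luck` once each.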
Re: continous index update
Hi! I run automatic index updates from the cron daemon. Regards, Vladimir.
On Wed, 28 Jul 2004 15:05:46 +0530 jitender ahuja [EMAIL PROTECTED] wrote: Hi all, I am trying to make an automatic index update based on a background thread, but it gives errors when deleting the existing index if (and only if) the server accesses the index at the same time or has accessed it once; even if a different request is made, i.e. for a different index directory or a different job, it makes no difference. Can anyone tell me how, in such a continuous update scenario, the old index can be updated? I feel that deleting the earlier contents is a must to get the new contents in place. Regards, Jitender
Re: continous index update
Jitender, use the Windows task scheduler. Regards, Vladimir.
On Wed, 28 Jul 2004 16:13:51 +0530 jitender ahuja [EMAIL PROTECTED] wrote: Hi, I am working on the Windows platform and I don't think that would work there. If it can, please tell me. Regards,
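Vladimir's approach keeps the scheduling outside the application (cron on Unix, the task scheduler on Windows). If the update has to run inside the server's JVM, as Jitender tried, one way to avoid deleting an index that the server still has open is to build each update into a fresh directory and switch new searches to it. The sketch below assumes that approach, which is not proposed in the thread; `buildIndexInto` is a hypothetical callback standing in for the real indexing code, and the `index-N` directory naming is invented for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class PeriodicReindexer {
    private final AtomicReference<String> liveIndex = new AtomicReference<>();
    private final AtomicInteger generation = new AtomicInteger();
    // Hypothetical callback that writes a complete index into the named directory.
    private final Consumer<String> buildIndexInto;

    PeriodicReindexer(Consumer<String> buildIndexInto) {
        this.buildIndexInto = buildIndexInto;
    }

    // One update cycle: build a fresh copy, then switch new searches to it.
    // The live directory is never deleted, which avoids the "cannot delete an
    // index in use" failure; an old generation can be removed later, once every
    // searcher opened on it has been closed (not shown).
    void runOnce() {
        String next = "index-" + generation.incrementAndGet();
        buildIndexInto.accept(next);
        liveIndex.set(next);
    }

    // Directory that newly opened searchers should use (null before the first run).
    String liveIndex() {
        return liveIndex.get();
    }

    // In-JVM stand-in for a cron / scheduled-task entry; fixed delay means runs never overlap.
    ScheduledExecutorService scheduleEvery(long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::runOnce, 0, period, unit);
        return ses;
    }
}
```

For example, `scheduleEvery(1, TimeUnit.HOURS)` would be the in-JVM counterpart of an hourly cron entry.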
Re: ANN: Luke v. 0.5 released
On Thu, 24 Jun 2004 12:34:35 +0200 Andrzej Bialecki [EMAIL PROTECTED] wrote: Vladimir Yuryev wrote: Hi Andrzej! I am sorry for my English :-( I will gladly describe the test and will try to state its conditions in detail.
I don't quite understand what you are saying... Do you suspect there is a bug in Luke somewhere on the Search tab? If that's the case, please provide an example.
1. The search was run on an index encoded in Cp1251. 2. Search settings: Analyzer to use for query parsing: org.apache.lucene.analysis.ru.RussianAnalyzer; Default field: contents. 2.1. Enter search expression here: (encoded in windows-1251) Result: No Results. 2.2. Enter search expression here: * (encoded in windows-1251) Result: 1 doc(s), url: http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news
Time to refresh my Russian... :-) OK, the problem seems to be in the RussianAnalyzer: it uses RussianLetterTokenizer, which filters out anything that is not a letter, and I'm afraid it also filters out the wildcard at the end. Not only that, it then passes the tokens through a RussianStemmer, which further mutilates them. Please try the Parsed query view on the Search tab to see the result of your query, or paste your query into the text area of the AnalyzerTool plugin (Plugins tab) and see what tokens you get using RussianAnalyzer. I just did it, and the result for * was - clearly not what you wanted.
-- Best regards, Andrzej Bialecki - Software Architect, System Integration Specialist CEN/ISSS EC Workshop, ECIMF project chair EU FP6 E-Commerce Expert/Evaluator - FreeBSD developer (http://www.freebsd.org)
Hi Andrzej! OK. At http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news there is a full text in which I searched for the phrase "...Pontiff has expressed importance..." (in Russian).
Please try the Parsed query view on the Search tab to see what is the result of your query
On the Search tab the phrase was not found. For some reason (?!) the problem was in the second and third words: searching for the words separately (as simple terms) showed that the problem is in these last two words. So, with Analyzer to use for query parsing: org.apache.lucene.analysis.ru.RussianAnalyzer and Enter search expression here: [texts encoded in Cp1251]:
1. Enter search expression here: . Parsed query view: contents: . - No Results
2. Enter search expression here: Parsed query view: contents: - 2 doc(s), URLs: http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v01.txt&print=news http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news
3. Enter search expression here: Parsed query view: contents: - No Results
4. Enter search expression here: Parsed query view: contents: - No Results
5. Enter search expression here: . Parsed query view: contents: contents: contents:. - 2 doc(s), the same documents as in point 2.
..., or paste your query into the text area on the AnalyzerTool plugin (Plugins), and see what tokens you get using RussianAnalyzer.
On the Plugins tab, in the Text to be analyzed field, I tested the same three words as a phrase. In the Tokens found field, the analysis showed three stems: , and . The hilite action gave positive results for all three words. (So perhaps the problem is not in the filters?) :-) Best regards, Vladimir.
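Andrzej's explanation is that a letter-only tokenizer treats the trailing `*` like any other non-letter, so it disappears before the stemmer even runs. The JDK-only sketch below reproduces only that splitting rule; it is a simplification, not Lucene's `RussianLetterTokenizer`, whose charset handling is not modelled here:

```java
import java.util.ArrayList;
import java.util.List;

class LetterOnlyTokenizer {
    // Split text into maximal runs of letters. Every non-letter (space, punctuation,
    // and also the wildcard '*') acts as a separator and is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {          // true for Latin and Cyrillic letters alike
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

So a query such as `pontiff*` comes out as the single token `pontiff`, and the wildcard is gone before any stemming happens.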
Re: ANN: Luke v. 0.5 released
Hi Andrzej! Congratulations on the successful release. RussianAnalyzer works with my indexes, but there are problems with some words: these words can only be found with a wildcard query, even though the AnalyzerTool handles them without problems. There is one more small discrepancy on the web page http://www.getopt.org/luke/:
- Remember to put both JARs on your classpath, e.g.: java -classpath luke.jar;lucene.jar org.getopt.luke.Luke
+ Remember to put both JARs on your classpath, e.g.: java -classpath luke.jar:lucene.jar org.getopt.luke.Luke
Regards, Vladimir.
On Tue, 22 Jun 2004 14:10:50 +0200 Andrzej Bialecki [EMAIL PROTECTED] wrote: Hello fellow Luceners, I'm pleased to announce that a new release of Luke is now available. You can download it from: http://www.getopt.org/luke/ This release uses Lucene 1.4-rc4. It also represents a major step forward: many exciting new features have been added. The feature I consider the most important in this release is extensibility: there is a plugin framework, and a sample plugin is provided in the distribution. I encourage you to write more. Here's a short summary of changes in this release:
* NEW: Added support for Term Vectors.
* NEW: Added a plugin framework: plugins found on the classpath are detected automatically and added to the new Plugins tab. Note, however, that for now plugin autoloading doesn't quite work when using Java WebStart; an alternative mechanism is also provided. Plugins have full access to the application context. Please read the JavaDoc for LukePlugin.java for more information.
* NEW: A sample plugin is provided, based on Mark Harwood's tool for analyzing analyzers.
* NEW: All tables now support resizable columns. Some dialogs are also resizable.
* NEW: Added Reconstruct functionality. Using this function, users can reconstruct the content of all (including unstored) fields of a document. This function uses a brute-force approach, so it may be slow for larger indexes (> 500,000 docs).
* NEW: Added pseudo-edit functionality. A new document editor dialog allows you to modify reconstructed documents, and to add them or replace the original ones.
* FIX: Problems with the MRU list solved, and a framework for handling preferences introduced.
* FIX: The list of available Analyzers is now dynamically populated from the classpath, using the same method as in the AnalyzerTool plugin. This also doesn't work in WebStart, so a fallback to a static list is provided.
* FIX: Restructured the source repository and added an Ant build script. Please note that as a result of the package name changes, the main class is now org.getopt.luke.Luke, and NOT luke.Luke as before.
I felt that all these changes merited a slight change in name, from Lucene Index Browser to Lucene Index Toolbox, as this seems to better reflect the current functionality of the tool. Any feedback, patches for enhancements or bugfixes are welcome! If you want to provide a patch, please use diff -bdruN; this will help me to integrate it. Thank you!
-- Best regards, Andrzej Bialecki
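The only difference between the two command lines is the classpath separator: ";" on Windows, ":" on Unix-like systems. When a launcher is written in Java, `java.io.File.pathSeparator` yields the right one for the current platform; a small sketch (the class name `ClasspathBuilder` is hypothetical):

```java
import java.io.File;

class ClasspathBuilder {
    // File.pathSeparator is ";" on Windows and ":" on Unix-like systems,
    // which is exactly the difference between the two Luke command lines.
    static String classpath(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        System.out.println("java -classpath " + classpath("luke.jar", "lucene.jar") + " org.getopt.luke.Luke");
    }
}
```

On Linux `main` prints the ":" form of the command; on Windows it prints the ";" form.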
Re: Lucene search integration with Portal Servers
Hi! For example: http://www.lutece.paris.fr/en/jsp/site/Portal.jsp Regards, Vladimir.
On Fri, 18 Jun 2004 14:32:18 -0700 Hetan Shah [EMAIL PROTECTED] wrote: Hi All, has anyone tried, or does anyone have a sample of, a working integration of Lucene with any J2EE portal server? I am also curious to know the best or most commonly used practices for linking search results to the documents/files on the system. Thanks all, -H
Re: Analyzers
Hi! Good. It would be even better if there were some kind of InterAnalyzer to which the national encoding could be passed and which could inherit the behaviour of the core analyzers. Regards, Vladimir.
Don, you should get the Snowball analyzers: http://jakarta.apache.org/lucene/docs/lucene-sandbox/ Lucene core includes Russian and German analyzers, but in the long run they, too, will most likely be moved out of the core. Otis
--- Don Vaillancourt [EMAIL PROTECTED] wrote: I have recently downloaded the latest version of Lucene 1.3 and was wondering where some of the classes are. For example, all of the analyzers except Standard are missing from the binary. Are these documented but incomplete classes that will be available later? Some articles that I have read seem to have tested these analyzers.
Re: Writing a stemmer
On Sat, 05 Jun 2004 21:15:23 +0200 Andrzej Bialecki [EMAIL PROTECTED] wrote: Vladimir Yuryev wrote: Hi, Andrzej! How did you test the Polish texts, and with which stemmer? Thanks, Vladimir.
No reason to be too modest, Leo... I tested your stemmer on English, Swedish and Polish texts (including F-measure vs. training set size plots), and it works exceptionally well indeed. Highly recommended!
Well, I have several corpora of the Polish language, which together amount to roughly 90,000 words (nouns and verbs) having at least 4 inflected forms. This set is randomized (i.e. lines of words + forms are in random order). I've split it into two parts: one of a fixed size, as a test set, and one of variable size, as a training set. Then I compile stemmer tables using a variable number of training examples and different settings (trie, multi-trie, different optimizations, etc.). Then, for each output table, I test the precision/recall of correct base forms (lemmatization) and the ability to create unique stems (stemming). Finally, I select the best table, which gives reasonably good results vs. table size. To put it in plain terms: e.g. for tables roughly 300kB in size (created from a training set of 3000 unique words + their forms), in the best cases I get ~90% correct stems and ~70% correct lemmas. Which is a _very_ good result! -- Best regards, Andrzej Bialecki
Thanks for the detailed description of the test on Polish texts. It was very important for me. Vladimir.
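Andrzej's procedure (a fixed held-out test set, a training set of varying size, and a score per compiled table) can be outlined in plain Java. This is a hypothetical sketch, not his tooling: the stemmer is passed in as a function, `accuracy` only checks predicted lemmas against expected base forms (it does not measure the "unique stems" criterion), and `f1` is the usual F-measure with equal weight on precision and recall:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

class StemmerEvalSketch {
    // Shuffle once with a fixed seed, then hold out the first testSize items as the
    // fixed test set; the rest is the pool from which training sets of varying size
    // can be taken (e.g. its first n items).
    static <T> List<List<T>> split(List<T> data, int testSize, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, testSize)));
        parts.add(new ArrayList<>(shuffled.subList(testSize, shuffled.size())));
        return parts;
    }

    // Fraction of test words whose predicted base form equals the expected lemma.
    static double accuracy(Map<String, String> expected, Function<String, String> predict) {
        if (expected.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (e.getValue().equals(predict.apply(e.getKey()))) {
                correct++;
            }
        }
        return (double) correct / expected.size();
    }

    // Harmonic mean of precision and recall (F1).
    static double f1(double precision, double recall) {
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}
```

Repeating `accuracy` for tables built from growing training prefixes gives the kind of score-vs-training-size curve Andrzej describes plotting.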
Re: Writing a stemmer
Hi, Andrzej! How did you test the Polish texts, and with which stemmer? Thanks, Vladimir.
No reason to be too modest, Leo... I tested your stemmer on English, Swedish and Polish texts (including F-measure vs. training set size plots), and it works exceptionally well indeed. Highly recommended! -- Best regards, Andrzej Bialecki
Re: ParallelMultiSearcher
Hi, Erik! Thanks for your reply. Vladimir.
What requirements does ParallelMultiSearcher place on the JVM? What memory and process settings does the system need? If somebody knows, please give an example for any Unix system.
ParallelMultiSearcher simply spins a separate thread for each index and waits for the results from all threads before returning results. Depending on your hardware, you may or may not receive performance benefits over using plain MultiSearcher. You would likely need each index on a separate disk so that you would benefit from parallel I/O. Beyond standard multi-threaded Java concerns, there is nothing special about ParallelMultiSearcher, and tuning would be dependent on your environment. Erik
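Erik's description (one thread per index, wait for all of them, then merge) can be sketched with `java.util.concurrent`. This is not Lucene's implementation: the `SubSearcher` interface and `Hit` class are hypothetical stand-ins for a per-index searcher and its scored results:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFanOut {
    // One scored hit from a sub-index (hypothetical stand-in for per-index results).
    static final class Hit {
        final int index;
        final int doc;
        final float score;

        Hit(int index, int doc, float score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return index + ":" + doc + "@" + score;
        }
    }

    // Hypothetical per-index search: returns that index's hits for the query.
    interface SubSearcher {
        List<Hit> search(String query) throws Exception;
    }

    // Fan out one task per index, block until all finish, then merge by descending score.
    static List<Hit> search(List<SubSearcher> searchers, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, searchers.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (SubSearcher s : searchers) {
                tasks.add(() -> s.search(query));
            }
            List<Hit> merged = new ArrayList<>();
            // invokeAll returns only after every task has completed.
            for (Future<List<Hit>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-index search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The memory cost Vladimir worries about shows up here as one thread per index plus every sub-result held in memory until the merge; whether the parallelism pays off still depends on disks and CPUs, as Erik says.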
ParallelMultiSearcher
Hello all, what requirements does ParallelMultiSearcher place on the JVM? What memory and process settings does the system need? If somebody knows, please give an example for any Unix system. Does anyone know anything about this? Thanks, Vladimir
DEFAULT_OPERATOR_AND
Hi! I have lucene-1.4-rc3-dev. TestQueryParser works with RussianAnalyzer(RussianCharsets.CP1251) and Russian terms.
    ...
    public Query getQueryDOA(String query, Analyzer a) throws Exception {
        if (a == null)
            a = new RussianAnalyzer(RussianCharsets.CP1251);
            // a = new SimpleAnalyzer();
        QueryParser qp = new QueryParser(field, a);
        qp.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
        return qp.parse(query);
    }
    ...
In reality, QueryParser behaves as QueryParser.DEFAULT_OPERATOR_OR even after QueryParser.DEFAULT_OPERATOR_AND has been set. For example:
1. Query (after setting DEFAULT_OPERATOR_AND): term1 term2 term3 Result: term1 OR term2 OR term3
2. Query: +term1 +term2 +term3 Result: term1 AND term2 AND term3
Could you please help me solve this problem? Thanks, Vladimir.
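Since explicit `+term` queries behave as expected in Vladimir's test, one possible workaround, until the cause of the operator problem is found, is to add the `+` to bare terms before the string reaches `QueryParser`. A deliberately naive sketch (the helper is hypothetical; it only handles whitespace-separated terms, not quoted phrases, parentheses, `field:` prefixes or AND/OR/NOT keywords):

```java
class RequireAllTerms {
    // Prefix every whitespace-separated term that has no explicit +/- operator with '+',
    // turning "term1 term2 term3" into "+term1 +term2 +term3".
    static String requireAll(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty or blank query yields a single empty piece
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            if (term.startsWith("+") || term.startsWith("-")) {
                out.append(term);                 // keep explicit operators as written
            } else {
                out.append('+').append(term);     // make the bare term required
            }
        }
        return out.toString();
    }
}
```

The output would then be passed to `qp.parse(...)` as in the code above; it sidesteps the default operator rather than fixing it.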
Bug Luke
Hi! Searching with RussianAnalyzer does not work correctly because of the stems it extracts: it only finds words that coincide with the stem. A wildcard search, for example, gives a different result. Thanks, Vladimir.
Re: Patchs for RussianAnalyzer
Erik, please look at my second letter, the one without the attachment; it has the texts in the body of the letter. Vladimir.
On Mon, 29 Mar 2004 12:06:45 -0500 Erik Hatcher [EMAIL PROTECTED] wrote: Vladimir, I have just taken a look at your submitted patches. I have no objections to making Cp1251 the default charset used in the no-arg constructor to RussianAnalyzer, but all of your other changes are formatting, along with the addition of some other constructors. Could you please provide a functionality-only diff for your patches, preferably in a single file attached to a Bugzilla issue? Thanks, Erik
Re: Patchs for RussianAnalyzer
Erik, I filed bug #28050. Vladimir.
On Tue, 30 Mar 2004 06:19:04 -0500 Erik Hatcher [EMAIL PROTECTED] wrote: On Mar 30, 2004, at 3:38 AM, Vladimir Yuryev wrote: Erik, please look at my second letter, the one without the attachment; it has the texts in the body of the letter. Vladimir.
I don't have that e-mail you refer to. Please use the standard Jakarta Bugzilla issue tracking system, though. You can place an attachment on an issue after you create it; e-mail ends up mangling in-line patches. What I'm after is a clean patch that *only* changes the functionality you desire, not code formatting also. We can clean up code formatting in another pass if needed, or I can just do that on my end after reviewing the functionality-only patch. Erik
Re: What happened with build.xml in CVS?
Thanks Rob, it works now. Vladimir.
On Mon, 29 Mar 2004 10:34:44 +0100 Rob Oxspring [EMAIL PROTECTED] wrote: Looks like Erik's commits 2 days back have upped the dependency from Ant 1.5 to 1.6. Previously only selected tasks were allowed outside of targets, and tstamp doesn't look like one of them. Rob
Re: What happened with build.xml in CVS?
Thanks, Erik. Ant 1.6.1 works with build.xml v1.58 without problems. Vladimir.
On Mon, 29 Mar 2004 08:32:56 -0500 Erik Hatcher [EMAIL PROTECTED] wrote: Cool... my sinister plan of subversively getting the world to upgrade to Ant 1.6 is working! :) Erik
What happened with build.xml in CVS?
Hi! After the latest update from the Lucene CVS, build.xml has problems:
    Buildfile: /home/vyuryev/workspace/jakarta-lucene/build.xml
    BUILD FAILED: file:/home/vyuryev/workspace/jakarta-lucene/build.xml:11: Unexpected element tstamp
    Total time: 297 milliseconds
Best Regards, Vladimir Yuryev
Patchs for RussianAnalyzer
Dear developers! I am a Lucene user working with RussianAnalyzer. There is one problem specific to this analyzer: the parameter for the Russian encoding (as you know, having a whole set of code tables for one language never fails to amaze). In Eastern Europe, people who use applications in Russian rely on windows-1251 as the basic encoding, since MS Windows is a widespread client platform. I propose updating the no-argument constructor to use Cp1251 by default. See the attached file RussianAnalyzerPatchs.tgz, which contains RussianAnalyzer.java.path, RussianLetterTokenizer.java.patch, RussianLowerCaseFilter.java.patch, RussianStemFilter.java.patch and TestRussianAnalyzer.java.path. Such an update will remove confusion (for beginners with Lucene or with Russian) and will make it easier to use analyzers when switching languages in multilingual search. Regards, Vladimir Yuryev.
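The reason the encoding parameter matters: the same Cyrillic word has different bytes in each Russian encoding, and bytes read with the wrong one turn into different letters. A JDK-only illustration using `java.nio.charset` (it does not involve `RussianAnalyzer` or `RussianCharsets`; the class name is hypothetical):

```java
import java.nio.charset.Charset;

class Cp1251Demo {
    // Java also accepts the alias "Cp1251" for this charset.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Text -> bytes as a Windows client would store them.
    static byte[] encode(String text) {
        return text.getBytes(CP1251);
    }

    // Bytes -> text, correct only if the bytes really are windows-1251.
    static String decode(byte[] bytes) {
        return new String(bytes, CP1251);
    }
}
```

For instance, the capital letter П is the single byte 0xCF in windows-1251, but the same byte read as KOI8-R is a lowercase о, which is why a default that matches the most common client encoding saves beginners from confusing results.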
Re: Highlighting problem
Hi! Mark Harwood has written a HighlightExtractorTest file for you that shows how the highlighter works. Also, by replacing the <B> tags with, for example, <B style="color:black; background-color:#66">, you get a yellow highlight. If you assign a colour to each word found, it will look like Google etc. Best regards, Vladimir.
On Tue, 2 Mar 2004 18:19:28 + (GMT) Clandes Tino [EMAIL PROTECTED] wrote: Hi all, I have incorporated the highlighting package (http://home.clara.net/markharwood/lucene/highlight.htm), but I am worried about the following issue. If I want to display the best segments of the body field's content, with the terms from the query highlighted, I have to define the body field as Stored. So the complete process would be like this:
Indexing: 1. parse the uploaded document into a temporary ASCII file 2. read the ASCII file and append its content to a String 3. create the field as Text(String name, String value)
Searching: 1. retrieve the String value of the "body" field from the hit (again, the only way to do this, as I understand it, is to declare the "body" field as Stored) 2. pass the String value to the Highlighter methods.
Besides that, in the Lucene FAQ I have read that "body" fields are not good candidates to be declared as Stored. Index size is one obvious reason, but I am wondering how it affects Lucene search performance in general. Does anybody have an idea how to add highlighting for an unstored field? Regards, and thanks in advance, Milan
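Vladimir's suggestion, a styled `<B>` tag with a separate colour per query term, can be sketched without the Highlighter package. This single-pass regular-expression version is an assumption for illustration and works differently from Mark Harwood's highlighter; the class name and the colour values are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StyledHighlighter {
    // Wraps whole-word, case-insensitive matches of the query terms in styled <B> tags.
    // terms.get(i) gets colors.get(i % colors.size()), so each term can have its own colour.
    // Assumes `text` is already HTML-safe; nothing is escaped here.
    static String highlight(String text, List<String> terms, List<String> colors) {
        if (terms.isEmpty() || colors.isEmpty()) {
            return text;
        }
        Map<String, String> colorOf = new HashMap<>();
        StringBuilder alternation = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            String term = terms.get(i);
            colorOf.put(term.toLowerCase(Locale.ROOT), colors.get(i % colors.size()));
            if (i > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        // One pass over the original text, so tags inserted earlier are never re-matched.
        // UNICODE_CHARACTER_CLASS makes \b work for Cyrillic words as well.
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String color = colorOf.getOrDefault(word.toLowerCase(Locale.ROOT), colors.get(0));
            String tag = "<B style=\"color:black; background-color:" + color + "\">" + word + "</B>";
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because this function only needs the text at display time, it works the same whether the text comes from a stored field or is re-read from the original document.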