RE: Does Escaping Really Work?
My understanding is that escaping may not work (as Terry and I believe); however, a workaround for most 'reasonable' cases is to use WhitespaceAnalyzer when parsing a query.

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 26, 2002 1:48 PM
To: Lucene Users List
Subject: Re: Does Escaping Really Work?

Well, pardon me for breathing, Otis. I didn't make the connection (partly 'cause you changed the subject line). But anyway, I don't understand your rather oblique answer - does escaping work or not? Are you saying that, in order for it to work (the way the docs say it does), I need to insert this module in the chain? Or what?

Terry

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 26, 2002 3:07 PM
Subject: Re: Does Escaping Really Work?

Didn't I just answer this last night? WhitespaceAnalyzer?

Otis

--- Terry Steichen [EMAIL PROTECTED] wrote:

I'm confused about how to use escape characters in Lucene. My Lucene configuration is 1.3-dev1, and I use the StandardAnalyzer and QueryParser. My documents have a field called 'path' with a value like 1102/a55407-2002nov2.xml. This field is indexed but not tokenized. Here are the various queries I've tried and their results:

1) When a dash is included in the query, Lucene interprets it as a space. (path:1102/a55402-2002nov2.xml is interpreted as path:1102/a55402 -body:2002nov2.xml)

2) When a backslash is inserted before the dash (and the query does *not* contain a wildcard), Lucene inserts a space in lieu of the next character. ('path:1102/a55402\-2002nov2.xml' is interpreted as 'path:1102/a55402 2002nov2.xml' [note the space where the dash was])

3) When a backslash is inserted before the dash (and the query *does* contain a wildcard), Lucene interprets it literally, without any conversion. (path:1102/55407\-2002nov* is interpreted literally.)

4) When a backslash is inserted before the dash and is immediately followed by a wildcard, Lucene reports an error. ('path:1102/a55407\-*' causes a lexical error: Encountered EOF after :)

My overall observation is that it appears it is not possible to escape a dash - is this true? A previous post (yesterday) suggests that it is also not possible to escape a backslash. If that's also true, what characters can be escaped?

Regards,
Terry
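The thread never settles which escapes QueryParser will honor, but the documented special-character list itself is fixed, so client code can at least emit the escaped form the docs describe. Below is a minimal sketch of such a helper; `QueryEscaper` is a hypothetical name (not part of Lucene), and whether the parser then accepts every escape still depends on the analyzer in use, as discussed above.

```java
// Sketch: prefix each of the query-syntax characters documented by
// Lucene (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) with a backslash.
// This only produces the documented escaped form; QueryParser's
// handling of it depends on the analyzer (see WhitespaceAnalyzer note).
public class QueryEscaper {
    private static final String SPECIALS = "+-!(){}[]^\"~*?:\\&|";

    public static String escape(String term) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');   // emit backslash before the special char
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For the docs' own example, escape("(1+1):2") yields \(1\+1\)\:2, and Terry's path becomes 1102/a55402\-2002nov2.xml.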
RE: Does Escaping Really Work?
I suspect that to dig deeper we'll have to look at QueryParser.jj.

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 26, 2002 3:11 PM
To: Lucene Users List
Subject: Re: Does Escaping Really Work?

Dave, I would say you seem to be right. But this is getting very frustrating. Here is what the Lucene docs say:

[docs quote]
Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ To escape these characters, use the \ before the character. For example, to search for (1+1):2 use the query: \(1\+1\)\:2
[/docs quote]

Is the Lucene documentation in error? Does it work, but only with something other than the standard configuration? If so, precisely what non-standard configuration is necessary? Why can't these questions be answered simply and clearly?

Terry

----- Original Message -----
From: Spencer, Dave [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 26, 2002 5:02 PM
Subject: RE: Does Escaping Really Work?

[rest of thread, quoted above, snipped]
RE: Slash Problem
Funny, I have more or less the same question I've been meaning to post. I think the answer is going to be that the analyzer applies to all parts of a query, even to untokenized fields, which to me seems wrong. So if you have a query like

    body:foo uri:/alpha/beta

with 'body' being tokenized and 'uri' not tokenized, I think the analyzer applies to /alpha/beta and breaks it into alpha beta, which is not desired...

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 9:26 AM
To: Lucene Users List
Subject: Re: Slash Problem

Rob, I presume that means that you used forward slashes (in the URI) rather than backslashes (in the path). I had planned to test that as a workaround, and it's good to know that you've already tested it successfully. But why is this necessary? Why doesn't the escape ('\') allow the use of a backslash?

Regards,
Terry

----- Original Message -----
From: Rob Outar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, November 25, 2002 12:01 PM
Subject: RE: Slash Problem

I don't know if this helps, but I had the exact same problem. I then stored the URI instead of the path, and I was then able to search on the URI.

Thanks,
Rob

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 11:53 AM
To: Lucene Users Group
Subject: Slash Problem

I've got a Text field (tokenized, indexed, stored) called 'path' which contains a string in the form of '1102\A3345-12RT.XML'. When I submit a query like path:1102* it works fine. But when I try to be more specific (such as path:1102\a* or path:1102*a*) it fails. I've tried escaping the slash (path:1102\\a*) but that also fails. I'm using the StandardAnalyzer and the default QueryParser. Could anyone suggest what's going wrong here?

Regards,
Terry
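The confusion in this thread comes down to what a tokenizing analyzer does to a path string before the query ever reaches the index. The following is a rough, stand-alone approximation (not Lucene's actual StandardAnalyzer, just a punctuation-splitting, lowercasing sketch) of why '1102\A3345-12RT.XML' stops being one searchable term once it is tokenized:

```java
import java.util.ArrayList;
import java.util.List;

// Rough approximation of a tokenizing analyzer: split on punctuation,
// lowercase each piece. This is an illustration, not StandardAnalyzer;
// it shows why 'path:1102\A3345-12RT.XML' no longer matches as one term
// once either the field or the query is run through an analyzer.
public class PathTokens {
    public static List tokenize(String text) {
        List tokens = new ArrayList();
        StringBuffer cur = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                tokens.add(cur.toString());   // punctuation ends a token
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }
}
```

A Field.Keyword (untokenized) field, by contrast, stores the entire string as a single term, so only an exact, untokenized query term can match it - which is why Rob's stored-URI workaround behaves differently from Terry's tokenized path field.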
RE: PDF parser
I've tried all 3 of those and none has worked out for me. Our intranet has 802 PDFs from lots of (vendor) sources, and all the pure-Java parsers have trouble with some of them. I've since gone to pdftotext from xpdf at the link below. True, it's not pure Java, but it works on all platforms with my doc set, and I suggest people use it, especially if they have any trouble with the Java tools below.

http://www.foolabs.com/xpdf/

Problems: some Java parsers have trouble with the dummy encryption used, some parsers go into loops on some docs, and some parsers crash on some docs. Yes, I've reported some of these problems to the authors.

-----Original Message-----
From: Borkenhagen, Michael (ofd-ko zdfin) [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 22, 2002 6:42 AM
To: 'Lucene Users List'
Subject: AW: PDF parser

There are different parsers available - every parser has its own advantages and disadvantages. I use a combination of PDFBox http://www.pdfbox.org/ and Etymon PJ http://www.etymon.com/pjc/, because their APIs are very simple. Both of them parse PDF into a format of their own and provide interfaces to get at the PDF document's contents. Other developers on this list prefer JPedal http://www.jpedal.org/, which parses PDF into XML and provides an XML tree with the PDF document's contents. JPedal does the job best, but the documentation isn't very detailed.

Micha

-----Original Message-----
From: Thomas Chacko [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 22, 2002 3:26 PM
To: Lucene Users List
Subject: PDF parser

What's the best parser available to extract text from PDF documents? Expecting a reply ASAP. Thanks in advance.

Thomas Chacko
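Shelling out to pdftotext from Java before indexing might look like the sketch below. This assumes pdftotext is installed and on the PATH; the `-enc UTF-8` and `-` (write to stdout) options are standard xpdf pdftotext flags, and the class/method names here are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch: run xpdf's pdftotext on one file and capture the extracted
// text from stdout, ready to hand to a Lucene Field. Assumes pdftotext
// is on the PATH; error handling is minimal for brevity.
public class PdfText {
    // Build the command line: pdftotext -enc UTF-8 <file> -
    // The trailing "-" tells pdftotext to write the text to stdout.
    public static String[] buildCommand(String pdfPath) {
        return new String[] { "pdftotext", "-enc", "UTF-8", pdfPath, "-" };
    }

    public static String extract(String pdfPath) throws Exception {
        Process p = Runtime.getRuntime().exec(buildCommand(pdfPath));
        BufferedReader in = new BufferedReader(
            new InputStreamReader(p.getInputStream(), "UTF-8"));
        StringBuffer sb = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        p.waitFor();
        return sb.toString();
    }
}
```

One nicety of this setup is isolation: if pdftotext loops or crashes on a bad vendor PDF, it takes down a child process, not the indexing JVM - which is exactly the failure mode reported for the in-process Java parsers.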
RE: Book
I didn't see anyone mention my favorite text, Managing Gigabytes. My Amazon link is: http://www.amazon.com/exec/obidos/ASIN/1558605703/tropoA

-----Original Message-----
From: William W [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 20, 2002 12:14 PM
To: [EMAIL PROTECTED]
Subject: Book

I would like to buy a book about Lucene. Who could write it? :)
RE: Slash Problem
OK, sorry for the noise then. If I can reproduce it, I'll be more precise.

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 12:13 PM
To: Lucene Users List
Subject: Re: Slash Problem

Dave, my recent testing suggests that when the field is not tokenized, it is not split as you suggest. When I search the path field using path:1102/A* I get precisely what I am looking for (though I discovered the lowercase mechanism isn't applied to this field, so the query is case-sensitive - note the uppercase 'A' above).

Regards,
Terry

----- Original Message -----
From: Spencer, Dave [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, November 25, 2002 2:58 PM
Subject: RE: Slash Problem

[rest of thread, quoted above, snipped]
test case - RE: Slash Problem
I'm sure there's something that I'm missing here. Let's say we have an index of a web site with 2 fields, body and url. Body is formed via Field.Text(..., Reader) and the url field by Field.Keyword(); thus the URL is not tokenized but is searchable. I use StandardAnalyzer, I want to find the Document with a matching URL, and I want to use QueryParser to parse the users' queries. I'm using v1.2.

It seems that, if I'm correct, one design problem is that the Analyzer does not have a reference to an index, so it doesn't know whether a field has been tokenized. It probably should not tokenize queries against an untokenized field. AFAIK, queries against untokenized fields are always tokenized, and there is no way to tell QueryParser not to tokenize a field.

I have attached a test program that shows the behavior, with sample output. The "From:" lines are user queries. The "To:" lines are the result of calling QueryParser and then Query.toString(). The 3rd and 4th From/To pairs below are the key ones. The goal is to enter a query like url:http://www.tropo.com/ or url:"http://www.tropo.com/" and not tokenize the 'http://www.tropo.com/'. I tried backslashes too, to no avail (url:http\://www.tropo.com/).

==
C:\proj\tropo_java> java com.tropo.lucene.KeywordProblem
From: foo
To  : foo

From: body:foo
To  : body:foo

From: url:http://www.tropo.com/          <-- first attempt
To  : http                               <-- first problem; ok, we gotta quote

From: url:"http://www.tropo.com/"        <-- second attempt
To  : "http www.tropo.com"               <-- second problem; colon and slashes missing
==

package com.tropo.lucene;

import java.io.*;
import java.util.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;

public class KeywordProblem {
    public static void main(String[] args) throws Throwable {
        String body = "body";
        String url = "url";
        String[] lines = new String[] {
            "foo",
            "body:foo",
            "url:http://www.tropo.com/",
            "url:\"http://www.tropo.com/\""
        };
        Analyzer a = new StandardAnalyzer();
        for (int i = 0; i < lines.length; i++) {
            Query query = QueryParser.parse(lines[i], url, a);
            o.println("From: " + lines[i]);
            o.println("To  : " + query.toString(url));
            o.println();
        }
    }
    private static PrintStream o = System.out;
}

-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 12:13 PM
To: Lucene Users List
Subject: Re: Slash Problem

[rest of quoted thread snipped]
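Dave's diagnosis above is that QueryParser has no way to know a field is untokenized. Until it does, one workaround beyond WhitespaceAnalyzer is to route queries against known keyword fields around the parser entirely and build a TermQuery by hand. The sketch below does only the routing decision (pure Java, so it stands alone); `KeywordRouter` is a hypothetical helper name, and the set of keyword field names has to come from the application, since - as noted in the test case - the analyzer cannot discover it from the index.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: decide whether a user query targets a known untokenized
// (Field.Keyword) field. splitKeywordQuery returns {field, rawTerm}
// when it does, or null when the query should go to QueryParser as
// usual. With the pair in hand, the caller would bypass the parser:
//   new TermQuery(new Term(parts[0], parts[1]));
public class KeywordRouter {
    private final Set keywordFields = new HashSet();

    public KeywordRouter(String[] fields) {
        for (int i = 0; i < fields.length; i++) {
            keywordFields.add(fields[i]);
        }
    }

    public String[] splitKeywordQuery(String query) {
        int colon = query.indexOf(':');
        if (colon < 0) return null;                       // no field prefix
        String field = query.substring(0, colon);
        if (!keywordFields.contains(field)) return null;  // not a keyword field
        return new String[] { field, query.substring(colon + 1) };
    }
}
```

With "url" registered as a keyword field, url:http://www.tropo.com/ yields the raw term http://www.tropo.com/ untouched, while body:foo falls through to normal parsing.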
RE: test case - RE: Slash Problem
Good point, though I thought the rule was that you were supposed to use the same Analyzer on your query as you built the index with. I suspect this will break down if the Field.Keyword text has spaces in it, but it gets past all reasonable uri/url/filename cases, so thanks.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 7:23 PM
To: Lucene Users List
Subject: Re: test case - RE: Slash Problem

Maybe there is a good reason for using WhitespaceAnalyzer in TestQueryParser.java :). Try it.

public void testEscaped() throws Exception {
    Analyzer a = new WhitespaceAnalyzer();
    assertQueryEquals("\\[brackets", a, "\\[brackets");
    assertQueryEquals("\\[brackets", null, "brackets");
    assertQueryEquals("\\\\", a, "\\\\");
    assertQueryEquals("\\+blah", a, "\\+blah");
    assertQueryEquals("\\(blah", a, "\\(blah");
}

Otis

--- Spencer, Dave [EMAIL PROTECTED] wrote:

[original test-case message, quoted above, snipped]
RE: How to get all field names
This fragment (from a JSP page) should dump the fields for an index in alphabetical order. This is not precisely what you're asking - this is all the fields used in an *index*, not a document - but maybe it helps:

IndexReader r = IndexReader.open(indexName);
TermEnum te = r.terms();
Set s = new TreeSet();
while (te.next()) {
    Term t = te.term();
    s.add(t.field());
}
te.close();
r.close();
o.println("These are all the fields in the index and they can be searched on...<p>");
Iterator it = s.iterator();
while (it.hasNext()) {
    o.println(it.next() + "<br>");
}

-----Original Message-----
From: Christoph Kiehl [mailto:kiehl@subshell.com]
Sent: Tuesday, November 12, 2002 1:11 AM
To: [EMAIL PROTECTED]
Subject: How to get all field names

Hi, I was wondering whether there is a way to get a list of all field names that have ever been used to index a document. That way I could filter out some special fields, like identity and such, and search over the remaining ones. That would give me total freedom to choose any document structure and have all fields searched. Is this possible? Or does anyone have a better way of achieving that?

Regards,
Christoph
the code - RE: Indexing synonyms
...terms in the index, but I am pressured on time. Something more sophisticated would be to expand terms depending on the word sense, but this requires the expensive process of building word sense disambiguation. That would solve the problem mentioned by Joshua, like 'minute' (time period) vs. 'minute' (very small). However, this is no easy task, and it's time-consuming!!! Perhaps in my case doing query expansion is the best idea and would avoid all the hassle, but I am still thinking about which way to go.

Regarding the question of how things would be stored in the index, it is as you say, Otis:

Document1: word: word1 word1synonym1 word1synonym2 word1synonym3

But I'm not sure whether I understood your question.

regards,
Aaron

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, November 11, 2002 8:22 PM
Subject: RE: Indexing synonyms

I always thought that WordNet was not accessible to the general public. Wrong? Also, I'm curious - what would you use for storing synonyms? Are you considering a 'static', read-only Lucene index, maybe? An index that makes use of setPositionIncrement(0) calls to store synonyms like this, for instance:

Document1: word: word1 word1synonym1 word1synonym2 word1synonym3
...
DocumentN: word: wordN wordNsynonym1 wordNsynonym2 wordNsynonym3

Unless I am missing something, and if a synonym database is available, this would be pretty easy to implement, no?

Otis

--- Spencer, Dave [EMAIL PROTECTED] wrote:

Re reducing "the set of question/answer pairs to consider" below - I would expect that using synonyms, either in the index or in the reformed query, would (annoyingly) increase the number of potential matches - or is there something I'm missing? Interesting that this topic just came up, as I wanted to experiment with the same thing. My first stab at a public domain synonym list, the Moby list, didn't seem to have synonyms, however. I believe another poster mentioned WordNet, so I'll try that. I'd really like it if it were possible to automatically determine synonyms - maybe something similar to Latent Semantic Analysis - but such things seem kinda hard to code up...

-----Original Message-----
From: Aaron Galea [mailto:agale@nextgen.net.mt]
Sent: Sunday, November 10, 2002 4:18 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Indexing synonyms

Thanks for all your replies. I will start off with an idea of what I am trying to achieve. I am building a question-answering system, and one of its modules is an FAQ module. Since the QA system is concerned with education, users can concentrate their question on a particular subject, reducing the set of question/answer pairs to consider. Since there is this hierarchical indexing, the index files are not that big, so I can store synonyms for each word in a question in the index. Query expansion would solve the problem and eliminate the need to store synonyms in the index, but it would slow things down, as there is no depth limit to consider for term expansion. It is not my intention to build something similar to the FAQFinder system, but I want to further reduce the subset of questions to consider, to which a question reformulation algorithm would then be applied.

Therefore the idea is: take an FAQ file dealing with one education subject, index all of its questions, and expand each term in the question. Using Lucene, I retrieve the questions that are likely to be similar to a user question, select say the top 5, and apply a query reformulation algorithm. If this succeeds, fine, and I return the answer to the user; otherwise I submit the question to an answer extraction module. The most important thing is speed, so putting term expansion in the index should hopefully improve things. Obviously problems arise with this method, as there is no word sense disambiguation, but the query reformulation algorithm will handle that. However, it is slow, so I must reduce the number of questions it is applied to. It is a tradeoff!!!

Well, I managed to solve this by overriding the next() method: when it gets to EOS, I start returning the new expanded terms that I accumulated in a list.

Thanks everyone for your replies.

Aaron

NB: And yep, I am a Malteser, Otis! :)

----- Original Message -----
From: Alex Murzaku [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, November 11, 2002 12:17 AM
Subject: RE: Indexing synonyms

You could also do something with org.apache.lucene.analysis.Token, which includes the following self-explanatory note:

/** Set the position increment. This determines the position of this token
 * relative to the previous Token in a {@link TokenStream}, used in phrase
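The index layout Otis describes - a word plus its synonyms stacked at the same position via Token.setPositionIncrement(0) - can be simulated without Lucene at all. The sketch below is a stand-alone illustration, not Lucene code: each emitted term carries its position increment, and synonyms ride along with increment zero, so phrase positions are unaffected. The synonym map is a hard-coded stand-in for a real source such as WordNet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the stacked-synonym index layout: each source word is
// emitted with position increment 1, and its synonyms follow with
// increment 0 so they occupy the same position (the trick a real
// TokenFilter would play via Token.setPositionIncrement(0)).
public class SynonymExpander {
    private final Map synonyms = new HashMap();

    public void addSynonyms(String word, String[] syns) {
        synonyms.put(word, syns);
    }

    // Returns "term/increment" strings, e.g. [fast/1, quick/0, rapid/0]
    public List expand(String[] words) {
        List out = new ArrayList();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i] + "/1");                       // original word
            String[] syns = (String[]) synonyms.get(words[i]);
            if (syns != null) {
                for (int j = 0; j < syns.length; j++) {
                    out.add(syns[j] + "/0");                // same position
                }
            }
        }
        return out;
    }
}
```

Because the synonyms add no position, a phrase query spanning the original words still matches, which is exactly why this layout is preferred over appending synonyms at new positions.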
RE: Indexing synonyms
Re reducing the set of question/answer pair to consider below - I would expect that using synonyms either in the index or in the reformed query would (annoyingly) increase the number of potential matches or is there something I'm missing. Interesting that this topic just came up as I wanted to experiment w/ the same thing. My first stab at an public domain synonym list, the moby list, didn't seem to have synonyms however. I believe another poster mentioned WordNet so I'll try that. I'd really like it if it was possibly to automatically determine synonyms - maybe something similar to Latent Semantic Analysis - but such things seem kinda hard to code up... -Original Message- From: Aaron Galea [mailto:agale;nextgen.net.mt] Sent: Sunday, November 10, 2002 4:18 PM To: Lucene Users List; [EMAIL PROTECTED] Subject: Re: Indexing synonyms Thanks for all your replies, Well I will start of with an idea of what I am trying to achieve. I am building a question answer system and one of its modules is an FAQ Module. Since the QA system is concerned with education, users can concentrate their question on a particular subject reducing the set of question/answer pair to consider. Since there is this hierarchical indexing the index files are not that big so I can store synonyms for each word in a question in the index. Query expansion will solve the problem and eliminating the need to store synonyms in the index but this will slow things as there is no depth limit to consider for term expansion. It is not my intension to build something similar to the FAQFinder system but I want to further reduce the subset of questions to consider on which a question reformulation algorithm would be applied. Therefore the idea is get an faq file dealing with one education subject, index all of its questions and expand each term in the question. Using lucene I will retrieve the questions that are likely to be similar to a user question, select say the top 5 and apply a query reformulation algorithm. 
If this succeeds fine and I return the answer to user, otherwise submit the question to an answer extraction module. The most important thing is speed so putting term expansion in the index hopefully should improve things. Obviously problems arise with this method as there is no word sense disambiguation but the query reformulation algorithm will solve this. However it is slow so I must reduce the number of questions it is applied on. It is a tradeoff!!! Well I managed to solve this by overriding the next() method and when it gets to an EOS I start returning the new expanded terms that I accumulated in a list. Thanks everyone for your reply Aaron NB : And yep I am a Malteser Otis ! :) - Original Message - From: Alex Murzaku [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent: Monday, November 11, 2002 12:17 AM Subject: RE: Indexing synonyms You could also do something with org.apache.lucene.analyzer.Token which includes the following self-explanatory note: /** Set the position increment. This determines the position of this token * relative to the previous Token in a {@link TokenStream}, used in phrase * searching. * * pThe default value is one. * * pSome common uses for this are:ul * * liSet it to zero to put multiple terms in the same position. This is * useful if, e.g., a word has multiple stems. Searches for phrases * including either stem will match. In this case, all but the first stem's * increment should be set to zero: the increment of the first instance * should be one. Repeating a token with an increment of zero can also be * used to boost the scores of matches on that token. * * liSet it to values greater than one to inhibit exact phrase matches. * If, for example, one does not want phrases to match across removed stop * words, then one could build a stop word filter that removes stop words and * also sets the increment to the number of stop words removed before each * non-stop word. 
 * Then exact phrase queries will only match when the terms
 * occur with no intervening stop words.
 *
 * </ul>
 * @see TermPositions
 */
public void setPositionIncrement(int positionIncrement) {
  if (positionIncrement < 0)
    throw new IllegalArgumentException
      ("Increment must be positive: " + positionIncrement);
  this.positionIncrement = positionIncrement;
}

-- Alex Murzaku ___ alex(at)lissus.com http://www.lissus.com -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Sunday, November 10, 2002 1:30 PM To: Lucene Users List Subject: Re: Indexing synonyms .mt? Malta? That's rare! :) A person called Clemens Marschner just submitted diffs for query rewriting to the lucene-dev list 1-2 weeks ago. The diffs are not in CVS yet, and they are a bit old now because the code they were
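Aaron's index-time expansion and Alex's position-increment note fit together: each synonym is emitted at the same position as the original term (position increment zero). The sketch below models that in plain Java rather than with Lucene's actual Token/TokenStream classes; the `SynonymExpander` name and the tiny synonym map are illustrative stand-ins for a real source such as WordNet.

```java
import java.util.*;

// Sketch of synonym expansion at index time: each synonym is emitted at the
// same position as the original token, mirroring the effect of Lucene's
// Token.setPositionIncrement(0). Terms are rendered as "term@position".
public class SynonymExpander {
    // Hypothetical synonym source; in practice this would come from WordNet.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "quick", List.of("fast", "speedy"));

    static List<String> expand(String[] tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (String t : tokens) {
            pos += 1;                       // original token: increment of one
            out.add(t + "@" + pos);
            for (String syn : SYNONYMS.getOrDefault(t, List.of())) {
                out.add(syn + "@" + pos);   // synonym: increment of zero
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(new String[]{"quick", "fox"}));
    }
}
```

Because the synonyms share the original's position, phrase queries over either variant still match, which is exactly the property the javadoc above describes.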
RE: Indexing Db Table -- Better way request
We have a number of internal systems here (content mgmt, bug db, support email, CRM), all of which are PHP/MySQL combos - and in all cases Lucene is used for the indexing and we have never seen any reason to go to XML as an intermediate step. We've been at this for 6 months or so. The only hassle is that if the group that's doing the PHP/MySQL tweaks the schema, they have to remember to modify the Lucene indexer so that, say, it picks up the new columns - but there's no way around this unless you want to be very generic, in which case XML still doesn't give you anything, since you could just as well use JDBC meta-data to get all columns... -Original Message- From: Michael Caughey [mailto:michael;caughey.com] Sent: Friday, November 08, 2002 4:21 PM To: Spencer, Dave; Lucene Users List Subject: Re: Indexing Db Table -- Better way request Converting straight to a Document seemed to me the best answer as I started to investigate. Somewhere along the line I thought I remembered seeing a suggestion that it was for some reason better to convert to XML and then add it as an XML document. I'd rather not have the hassle of creating and then later parsing the XML. I could not find the reference again. This in part was what I was hoping to hear. Thanks, Michael - Original Message - From: Spencer, Dave [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, November 08, 2002 6:59 PM Subject: RE: Indexing Db Table -- Better way request One small comment: what's the point of converting a row to XML? What I think you want to do is convert a row to a Document and then pass that off to IndexWriter. -Original Message- From: Caughey, Michael [mailto:mcaughey;trigon.com] Sent: Friday, November 08, 2002 2:22 PM To: '[EMAIL PROTECTED]' Cc: '[EMAIL PROTECTED]' Subject: Indexing Db Table -- Better way request Hello, I'm new to Lucene and this group; if it is improper to send such a message to this group, I apologize. 
I tried to do a reasonable amount of up-front research before coming here. I'm about to undertake a piece of my project where I've decided that Lucene will be of use. I have been researching, over the past two weeks, ways to accomplish this. I know I'll use an IndexWriter to write the index to a file, but I'm having difficulty settling on how to process the data to be indexed. What I have is a table in a MySQL database called items. I want to be able to search on a couple of fields and have it return the ID:

Fields:
=
Name VARCHAR(80)
Description TEXT
Location VARCHAR(80)
Qty int
ExpireDate Long MMDD
Category int
ListingPrice FLOAT(9,2)
Supplier int

Return:
=
ItemId int

On start-up of the application every row in the database will be read. After that I need to keep the table and the index in sync. Data in the columns can change, and rows can be added and removed. I have a central entity controller which is responsible for all access to that table. I figured one approach which would work would be, on start-up, to read each row, build an XML document and submit it to the IndexWriter. As inserts, deletes and updates occurred I could modify both Lucene and the database. Seems simple enough, and may be the only way to handle it. Before I did it I wanted to make sure that there wasn't a better way. Are there tools which can automatically read the table and build a document? Should I read the row and just build fields and construct a document? Does anyone see any problems with storing it in memory versus writing it to a file? Or should I say, at what point would you consider writing it to a file - would you base that on total document size? I feel that a file index will most likely be just fine. Thanks in advance for any suggestions. Michael Caughey -- To unsubscribe, e-mail: mailto:lucene-user-unsubscribe;jakarta.apache.org For additional commands, e-mail: mailto:lucene-user-help;jakarta.apache.org
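Dave's row-to-Document suggestion can be sketched without any XML step. The class below is a hypothetical illustration: the row is modeled as a Map standing in for a JDBC ResultSet, and the "document" as plain name/value pairs standing in for Lucene's Document/Field API. Copying every column generically mirrors the "use JDBC meta-data to get all columns" idea from the reply above.

```java
import java.util.*;

// Sketch of mapping one database row straight to index fields, skipping XML.
// With real JDBC you would iterate ResultSetMetaData column names instead of
// Map keys; with real Lucene you would call doc.add(Field...) instead of put.
public class RowToDocument {
    static Map<String, String> toDocument(Map<String, Object> row) {
        Map<String, String> doc = new TreeMap<>();
        for (Map.Entry<String, Object> col : row.entrySet()) {
            // Every column becomes a field; ItemId rides along so search
            // results can be joined back to the table.
            doc.put(col.getKey(), String.valueOf(col.getValue()));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("ItemId", 42, "Name", "Widget",
                                         "Description", "A sample widget");
        System.out.println(toDocument(row));
    }
}
```

The same loop handles schema changes automatically: a new column simply becomes a new field, which is the "very generic" trade-off discussed above.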
RE: stopwords
Suggestion - include a ref to org.apache.lucene.analysis.StopFilter.makeStopTable() which some users of this stop word list will use. Also maybe you want to put in a ref to SMART - this may be the official download site: ftp://ftp.cs.cornell.edu/pub/smart/ -Original Message- From: Otis Gospodnetic [mailto:otis_gospodnetic;yahoo.com] Sent: Thursday, October 17, 2002 11:31 AM To: Lucene Users List Cc: John Caron Subject: Re: stopwords Thanks. I may stick this in the Lucene CVS repository somewhere. Otis --- John Caron [EMAIL PROTECTED] wrote: I am just starting to use Lucene, and it is very impressive! I hope to try Dmitri's new term vectors when he gets them in, in order to do vector model research, in particular LSA. I will port my existing code to use the Lucene framework, and make it available when it is ready. I am appending a longer list of stop words, mostly from SMART, in case these are useful to anyone. Thanks again! private static String smart[] = { a, able, about, above, according, accordingly, across, actually, after, afterwards, again, against, all, allow, allows, almost, alone, along, already, also, although, always, am, among, amongst, an, and, another, any, anybody, anyhow, anyone, anything, anyway, anyways, anywhere, apart, appear, appreciate, appropriate, are, around, as, aside, ask, asking, associated, at, available, away, awfully, b, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, believe, below, beside, besides, best, better, between, beyond, both, brief, but, by, c, came, can, cannot, cant, cause, causes, certain, certainly, changes, clearly, co, com, come, comes, concerning, consequently, consider, considering, contain, containing, contains, corresponding, could, course, currently, d, definitely, described, despite, did, different, do, does, doing, done, down, downwards, during, e, each, edu, eg, eight, either, else, elsewhere, enough, entirely, especially, et, etc, even, ever, every, everybody, everyone, 
everything, everywhere, ex, exactly, example, except, f, far, few, fifth, first, five, followed, following, follows, for, former, formerly, forth, four, from, further, furthermore, g, get, gets, getting, given, gives, go, goes, going, gone, got, gotten, greetings, h, had, happens, hardly, has, have, having, he, hello, help, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, hi, him, himself, his, hither, hopefully, how, howbeit, however, i, ie, if, ignored, immediate, in, inasmuch, inc, indeed, indicate, indicated, indicates, inner, insofar, instead, into, inward, is, it, its, itself, j, just, k, keep, keeps, kept, know, knows, known, l, last, lately, later, latter, latterly, least, less, lest, let, like, liked, likely, little, look, looking, looks, ltd, m, mainly, many, may, maybe, me, mean, meanwhile, merely, might, more, moreover, most, mostly, much, must, my, myself, n, name, namely, nd, near, nearly, necessary, need, needs, neither, never, nevertheless, new, next, nine, no, nobody, non, none, noone, nor, normally, not, nothing, novel, now, nowhere, o, obviously, of, off, often, oh, ok, okay, old, on, once, one, ones, only, onto, or, other, others, otherwise, ought, our, ours, ourselves, out, outside, over, overall, own, p, particular, particularly, per, perhaps, placed, please, plus, possible, presumably, probably, provides, q, que, quite, qv, r, rather, rd, re, really, reasonably, regarding, regardless, regards, relatively, respectively, right, s, said, same, saw, say, saying, says, second,
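To illustrate how such a list is typically consumed (along the lines of the StopFilter.makeStopTable() reference above): build a lookup set once, then drop stop words from the token stream. This is a plain-Java sketch, not Lucene's actual StopFilter, and only a few words from the SMART list above are included.

```java
import java.util.*;

// Sketch of stop-word filtering: a set built once from the word list (the
// role of StopFilter.makeStopTable()), then applied to each token.
public class StopWords {
    // Tiny excerpt of the SMART list; the real table would be built from
    // the full array shown in the message above.
    static final Set<String> STOP_TABLE =
        new HashSet<>(Arrays.asList("a", "about", "the", "of", "and"));

    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_TABLE.contains(t.toLowerCase())) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("the", "quick", "fox")));
    }
}
```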
RE: Using Pooled IndexSearchers?
I/O buffering would certainly be handled by the OS, but in theory the application can do its own buffering - and in a sense RAMDirectory is an extreme example of this. Having an app w/ an adjustable buffer pool gives you more options for tuning. -Original Message- From: Jonathan Pace [mailto:jmpace;fedex.com] Sent: Thursday, October 17, 2002 10:54 AM To: Lucene Users List Subject: RE: Using Pooled IndexSearchers? The index is only a gig, but of course, optimizing will increase that size substantially. At the rate our index grows, it would be better to keep it in a disk array. I assume that I/O buffering would be handled by the underlying OS, wouldn't it? -jon -Original Message- From: Spencer, Dave [mailto:dave;lumos.com] Sent: Thursday, October 17, 2002 11:45 AM To: Lucene Users List Subject: RE: Using Pooled IndexSearchers? One idea - have you tried searching with a RAMDirectory instead of an FSDirectory? If your index fits into memory then this could be a win. Some notes and code here: http://www.tropo.com/techno/java/lucene/rammer.html Note: I know some people have huge indexes that can't fit into RAM... but I'm sure I've read that Google uses solid-state (RAM) disks in their search farm. Can't find the article that says this, however. Might have been an interview w/ E. Schmidt. Also: does Lucene have any buffer control in the API? In theory shouldn't IndexSearcher, or FSDirectory, have control over buffering of disk blocks? -Original Message- From: Jonathan Pace [mailto:jmpace;fedex.com] Sent: Thursday, October 17, 2002 8:08 AM To: Lucene Users List Subject: Using Pooled IndexSearchers? Just a question for the group. Is anyone using, or has anyone benchmarked, a pooled IndexSearcher setup? (Especially the Jakarta Commons Pool implementations.) I am looking to increase concurrent search performance, because quite a few of our users use DateFilter, which dramatically increases search times. Is it worth the effort? Thank you in advance. 
Jonathan M Pace Sr Programmer/Analyst Corporate Portal Development FedEx Services 60 FedEx Pkwy 1st Floor Horiz 901-263-4744 [EMAIL PROTECTED] -- To unsubscribe, e-mail: mailto:lucene-user-unsubscribe;jakarta.apache.org For additional commands, e-mail: mailto:lucene-user-help;jakarta.apache.org
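A pooled-searcher setup along the lines Jonathan asks about can be sketched with a fixed-size blocking queue. This is a hand-rolled stand-in for Jakarta Commons Pool, generic over the pooled object (in practice an IndexSearcher, possibly wrapped so it can be reopened when the index changes); the class name and structure are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a fixed-size pool: acquire() blocks until a searcher is free,
// which also bounds the number of concurrent searches.
public class SearcherPool<T> {
    private final BlockingQueue<T> pool;

    SearcherPool(java.util.List<T> searchers) {
        // Fair queue so waiting threads are served in arrival order.
        pool = new ArrayBlockingQueue<>(searchers.size(), true, searchers);
    }

    T acquire() {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted waiting for a searcher", e);
        }
    }

    void release(T searcher) { pool.offer(searcher); }

    public static void main(String[] args) {
        SearcherPool<String> p =
            new SearcherPool<>(java.util.List.of("searcher-1", "searcher-2"));
        String s = p.acquire();
        System.out.println("got " + s);
        p.release(s);
    }
}
```

Whether pooling beats a single shared searcher depends on whether searches are CPU- or I/O-bound; benchmarking, as Jonathan suggests, is the honest answer.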
RE: Performance with 5 Millions indexed items
I have a 1GHz P4 w/ 512MB of RAM and prob a standard 7200 RPM disk. Running w/ JDK1.4. I have indexed the content from dmoz.org [maybe I should donate this as a kind of example] and the index size is 1GB and it has 3.2M docs in it. I think it takes around 4 hours to produce the index. Briefly, for one quick test, a fuzzy two-word search takes 10x as long as the same search unfuzzy.

Searching for: title:kasparov
35 total matching documents after 1232(ms)
Searching for: title:kasparov title:chess
1046 total matching documents after 1272(ms)
Searching for: title:kasparov~ title:chess~
18965 total matching documents after 11276(ms)

As an aside, you can get the dmoz.org content here: http://dmoz.org/rdf.html I indexed content.rdf.u8.gz. It is invalid XML(!) and I couldn't get several SAX parsers to work, so I had to use Electric XML. -Original Message- From: Mader, Volker [mailto:[EMAIL PROTECTED]] Sent: Tuesday, September 10, 2002 12:00 AM To: [EMAIL PROTECTED] Subject: Performance with 5 Millions indexed items Hi, I've got a question about performance with bigger indexes. We used IndexWriter with GermanAnalyzer to index data with the following fields:

Field1: ID (a long value)
Field2: Description (a free text)
Field3: Groups (a list of up to 10 long values encoded in a single string)
Field4: Classes (a list of up to 10 long values encoded in a single string)

Documents are created with the 4 fields and then added to the IndexWriter. Afterwards the index is optimized. Searching now for a word in field Description using IndexSearcher(GermanAnalyzer) with FuzzyQuery leads to search times up to 30 seconds on a Pentium 4 1.4GHz. Also the retrieval with hits.doc(..) is very slow. Any ideas? Volker -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
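The 10x slowdown is expected: a fuzzy query (term~) has to compare the query term against many terms in the index by edit distance rather than doing a single term lookup. A minimal sketch of the Levenshtein distance computation at the heart of that comparison (plain Java for illustration, not Lucene's actual FuzzyQuery code):

```java
// Classic dynamic-programming Levenshtein edit distance: the number of
// single-character insertions, deletions, or substitutions needed to turn
// one string into the other. FuzzyQuery-style matching accepts terms whose
// distance to the query term is below a similarity threshold.
public class EditDistance {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1,      // deletion
                             d[i][j - 1] + 1),     // insertion
                    d[i - 1][j - 1] + cost);       // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kasparov", "kasparow")); // one edit -> 1
    }
}
```

Running this once is cheap; running it against millions of index terms per query is what makes the fuzzy searches above an order of magnitude slower.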
RE: Question Deleting/Reindexing Files
[1] There's no update, so delete and then add is what you want. [2] I have had the same problem w/ using an IndexWriter and IndexReader at the same time and getting a locking problem when deleting. I think I sent mail to the list w/ a test case a week ago [disclaimer: this is not a complaint!] and I think the issue is still open. Maybe I should turn this into a bug report? I know fixing bugs is encouraged, but I don't have enough context about the right solution, or how the locking apparently changed to foul this up, though I did look thru things. My workaround was to write new entries to a new index and then run a separate merge utility that 1st does a delete pass, and then reopens and does adds, based on a primary key (the URL of each doc in my case). -Original Message- From: Joe Hajek [mailto:[EMAIL PROTECTED]] Sent: Wednesday, March 20, 2002 12:28 AM To: [EMAIL PROTECTED] Subject: Question Deleting/Reindexing Files Hi, I am using Lucene for indexing a relatively large article-based system where articles change from time to time, so I have to reindex them. Reindexing had the effect that a query would return the hit for a file multiple times (according to the number of updates). The only solution to that problem I found was to delete the file to be updated before indexing it again. Is there another possibility? As the system is large I am collecting the articles that have to be updated together, then open a writer and add the documents to the index. This solution worked fine for me using rc1; in rc4 it seems that it is not possible anymore to delete a file from an index while the index is opened for writing. Do you know any solutions to that problem? Thanx a lot in advance, regards, Joe -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
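The delete-pass-then-add-pass merge utility described above is essentially a keyed upsert. In the sketch below the index is modeled as a Map keyed by the primary key (the URL); the real utility would call IndexReader.delete(...) for the first pass, close the reader, then reopen with an IndexWriter for the add pass. The names here are hypothetical.

```java
import java.util.*;

// Sketch of update-as-delete-then-add: pass 1 removes any stale entry with
// the same primary key, pass 2 adds the fresh documents. This avoids the
// duplicate-hits problem Joe describes when re-adding changed articles.
public class MergeUpdate {
    static void merge(Map<String, String> index, Map<String, String> updates) {
        // Pass 1: delete pass - drop stale entries by key.
        for (String url : updates.keySet()) index.remove(url);
        // Pass 2: add pass - add the new versions (and brand-new docs).
        index.putAll(updates);
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("http://a/1", "old version");
        merge(index, Map.of("http://a/1", "new version", "http://a/2", "added"));
        System.out.println(index);
    }
}
```

Keeping the two passes separate matters in real Lucene of this era because deletes go through IndexReader and adds through IndexWriter, and the two cannot hold the write lock at the same time.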
RE: getting relative path after searching
I think this is a JSP question, not a Lucene question, and the answer is application.getResource(...) or application.getRealPath(): http://www.jspinsider.com/reference/jsp/jspapplication.html -Original Message- From: Parag Dharmadhikari [mailto:[EMAIL PROTECTED]] Sent: Monday, March 18, 2002 6:08 AM To: Lucene Users List Subject: getting relative path after searching Hi all, When searching is done it gives you the full path of the searched document, for instance D:/tomcat/webapps/Root/Office/Office/xyz.doc. Now if I want only the relative path, like /Office/Office/xyz.doc, instead of the total path, then how should I proceed? Regards, Parag -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
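Independent of the JSP API, the relative path can also be computed by relativizing the full path against the web application root (the value application.getRealPath("/") would return). A sketch using java.nio.file - a modern-Java convenience, not available in the JDKs of this era:

```java
import java.nio.file.Path;

// Sketch: strip the webapp root off a full document path, returning a
// web-style path with forward slashes and a leading "/".
public class RelativePath {
    static String relativize(String root, String full) {
        return "/" + Path.of(root).relativize(Path.of(full)).toString()
                         .replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(relativize(
            "D:/tomcat/webapps/Root",
            "D:/tomcat/webapps/Root/Office/Office/xyz.doc"));
        // -> /Office/Office/xyz.doc
    }
}
```

A plain String.substring on the known root prefix works just as well; Path.relativize simply handles separators and normalization for you.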
RE: Search result ordering question
Is this question still pending? Well, I haven't tried it but DateFilter might be what you're looking for: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DateFilter.html You could also add a field that's a kind of enumeration indicating how recent the doc is. You add a field "when" with a value of day, week, month or year, to indicate if it is a day old, week old, etc. Then you query using a boost: when:day^2.0 when:week^1.8 when:month^1.6 when:year^1.4 and priority will be given to newer docs. -Original Message- From: Kent Vilhelmsen [mailto:[EMAIL PROTECTED]] Sent: Tuesday, March 12, 2002 12:00 PM To: [EMAIL PROTECTED] Subject: Search result ordering question I've been using Lucene a bit, and find it very flexible and fast. However, I need to order search results by date (or, equally, document id); I've looked a bit into (re)writing a collect method without any luck. I'm not programming Java too much, so I'm not getting anywhere with the (few) hints I've seen regarding date-sorted result sets. Does anyone have a quick solution/example to give? Thanks, Kent Vilhelmsen -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
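The enumerated-recency idea needs a small piece of index-time code to pick the "when" bucket for each document. The sketch below is illustrative: the bucket boundaries (1 day, 7 days, 31 days) are assumptions, and java.time is a modern stand-in for the date handling of the period.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of bucketing a document's age into day/week/month/year at index
// time, so queries can boost newer docs with
// when:day^2.0 when:week^1.8 when:month^1.6 when:year^1.4.
public class RecencyBucket {
    static String bucket(Instant docDate, Instant now) {
        long days = Duration.between(docDate, now).toDays();
        if (days <= 1) return "day";
        if (days <= 7) return "week";
        if (days <= 31) return "month";
        return "year";
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2002-03-12T00:00:00Z");
        System.out.println(bucket(Instant.parse("2002-03-10T00:00:00Z"), now));
    }
}
```

Note this gives a coarse recency preference within score order, not a strict date sort; for the latter, Kent's instinct to customize result collection is the right direction.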
RE: Deleting documents
I think I've come across the same problem. If you have an indexer that adds docs and also deletes docs as it goes (use case: it's updating old docs or adding new ones) it seems that you always get an exception like this thrown from IndexReader.delete():

java.io.IOException: Index locked for write: Lock@C:\tmp\luc\locktest\write.lock

I had code similar to the code below, and then modified it to explicitly use the same Directory, to no avail. Approx code:

Directory dir = FSDirectory.getDirectory(indexName, create);
IndexWriter writer = new IndexWriter(dir, ..., create);
IndexReader reader = IndexReader.open(dir);
// now calls to writer.addDocument() work
// if you call reader.delete(int) it fails

I've attached the full src below, though it's a bit messy w/ trace statements. Should work fine as an isolation test case. Uses Windows dir names, sorry to Unix folk. This fails against rc4 and also the latest build (0312). I'm positive a few months ago this stuff worked fine. If this is indeed a bug then I think the IndexReader and IndexWriter should know they're sharing a Directory, whereas now they don't seem to. As a side note, I've always found it strange that IndexReader is used to delete entries. Reader to me means read-only, so I would have expected IndexWriter to be the thing that is used to add/delete documents. -Original Message- From: Aruna Raghavan [mailto:[EMAIL PROTECTED]] Sent: Friday, March 08, 2002 10:40 AM To: 'Lucene Users List' Subject: Deleting documents Hi, Is there anything wrong with the following code?

try {
    m_lock.write(); // obtain a write lock on a RWLock
    IndexReader indexReader = IndexReader.open(mypath);
    IndexSearcher indexSearcher = new IndexSearcher(mypath);
    // use the searcher to search for documents to be deleted
    // use the reader to do the deletes.
    indexReader.close();
} catch (Throwable e) {
    e.printStackTrace();
} finally {
    m_lock.unlock();
}

Sometimes I am getting the following exception:

java.io.IOException: Index locked for write: Lock@D:\RevealCS\Search\Data\reports\write.lock
    at org.apache.lucene.index.IndexReader.delete(Unknown Source)
    at org.apache.lucene.index.IndexReader.delete(Unknown Source)
    at revsearch.RevSearch$DeleteWatcherThread.checkAction(RevSearch.java:1455)
    at revsearch.RevSearch$WatcherThread.run(RevSearch.java:250)

This exception was not happening every time the code was run; it was intermittent. I suspect it is because I am using indexSearcher and indexWriter to open the myPath dir. I changed it such that indexSearcher uses the indexReader in the constructor. I am hoping that someone can shed some light on what went wrong, thanks. Aruna. -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED] [Attachment: LockTest.java]
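The intent of Aruna's code can be made exception-safe by closing the reader and releasing the lock in finally blocks, so an intermittent failure cannot leave the write lock or the index held. A plain-Java sketch, with a counter standing in for IndexReader open/delete/close and m_lock modeled with java.util.concurrent (which the original RWLock presumably resembles):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the guarded-delete pattern: take the application write lock,
// open, delete, and close - with the close in a finally so no exception can
// skip it. openReaders/deleted stand in for IndexReader calls.
public class GuardedDelete {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int openReaders = 0;
    int deleted = 0;

    void deleteDocs(int n) {
        lock.writeLock().lock();
        try {
            openReaders++;          // stands in for IndexReader.open(path)
            try {
                deleted += n;       // stands in for reader.delete(...)
            } finally {
                openReaders--;      // stands in for indexReader.close()
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedDelete g = new GuardedDelete();
        g.deleteDocs(3);
        System.out.println("deleted=" + g.deleted + " open=" + g.openReaders);
    }
}
```

The application-level lock, however, cannot fix the underlying rc4 issue Dave describes: Lucene's own write.lock is held by the IndexWriter, so the delete must happen while no writer has the index open.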