Hello,

Lingway (http://www.lingway.com) is a French company that specializes in the design, development and implementation of linguistics-based software solutions. We are using Lucene in one of our projects, which can be seen at http://kant.lingway.com/LGfisc/index.html.

This demo provides access to French fiscal legal texts (Code Général des Impôts) through our linguistic technology, which analyses the user input, extracts the most relevant terms and adds semantically related terms. This helps to retrieve more documents related to the query. In addition, the linguistic analysis automatically generates all possible forms of a word (singular, plural, masculine, feminine) and corrects some user typing mistakes (such as the missing accent in "impots" for "impôts"). The analysis also disambiguates homographic forms (e.g. the verb "to book" and the noun "a book"), which is why the system proposes related terms only for the form found in the user's sentence. Finally, the boolean operators used in the query are computed according to the weight and role of the terms in the user query.

Since the documentation of the demo is in French (by the way, it would be interesting to know where Lucene users come from, and in what proportions), I'll give you a brief overview of the functionalities. Let's suppose that we typed the following query ("tax reduction for couples"):

    réduction d'impôts pour les couples

1/ The number of documents found is indicated by:

    24 documents trouvés sur (réduction d' impôt), couple

that is, "24 documents found for (réduction d' impôt), couple". The second element, "(réduction d' impôt), couple", shows which terms (along with their related terms) were sent in the query. You can try other analyses by moving the mouse over this element, which displays a contextual menu with all possible degraded versions of the original query. By default, the system returns the results for the best matching analysis.

2/ Information about the document

    Article 200 sexies Section V : Calcul de l'impôt
    Termes pertinents : réductions impôt - couple - couples - famille - exonérés

Clicking on the document's reference (Article 200 sexies) opens the document in a pop-up window (see below). Relevant terms ("Termes pertinents") are shown in grey on the second line.
These terms were sent to Lucene in a query generated by the system, and they appear in that document. Note that the color of each term depends on its weight in the document (more relevant terms are darker). Here we can see that the generated boolean query contains not only the words present in the original query (réduction - impôt - couple) but also related terms found by the linguistic analysis (famille - exonérés) and morphological variations (singular and plural forms). We can also see that "réduction d' impôts" has been recognized as a compound word. This functionality gives the user a rough idea of a document's content without opening it.

3/ Displaying the document

A click on the document's reference opens it in a pop-up window. The system highlights the words of the text that are present in the query. This functionality is partially based on Mark Schreiber's proposals (http://www.iq-computing.de/lucene/highlight.htm), the difference being that our highlighter recognizes compound words (e.g. it will highlight "réduction d'impôts" as a whole rather than "réduction" and "impôts" separately).

-------------------------------------------------------

A set of examples (in French, of course) is available at http://kant.lingway.com/LGfisc/about.html#exemples

Committers: we would be really happy to be mentioned on the "Powered by Lucene" page (http://jakarta.apache.org/lucene/docs/powered.html). Is that possible?

A demo of our system in English is planned. We are waiting for your suggestions: what would you like us to show you?

Any questions or comments are welcome. You can send them to [EMAIL PROTECTED]

Please take a look at our site (www.lingway.com) for more information about our activities.

Thank you,

Julien Nioche / www.lingway.com