Hi

I am working on a webapp called "Katie" that detects duplicate questions

https://ukatie.com/

As a test case I have imported the Lucene FAQ

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ

to

https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en

and made them available at

https://lucene-faq.ukatie.com/

where the FAQ entries are loaded as JSON from the REST interface of Katie

https://ukatie.com/swagger-ui/?urls.primaryName=API%20V2#/faq-controller-v-2/getFAQUsingGET_1

and the JavaScript can be found at

https://github.com/wyona/katie-4-faq

I am currently "experimenting" with different search algorithms, e.g.

- Lucene only
- SentenceBERT with Lucene Vector Search
- SentenceBERT only
- Weaviate
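
A minimal sketch of how the vector-based variants in this list could rank FAQ entries, assuming each question has already been turned into an embedding vector. The three-dimensional vectors, the FAQ ids and the helper names (cosine, rank_by_similarity) are made up for illustration; they are not SentenceBERT output and not Katie's actual code. A real setup would get the vectors from a sentence embedding model and would normally use an approximate nearest-neighbour index rather than this brute-force loop.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, indexed):
    """Return (faq_id, score) pairs sorted by descending cosine similarity."""
    scored = [(faq_id, cosine(query_vec, vec)) for faq_id, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder embeddings, NOT real model output.
indexed = {
    "mailing-lists": [0.9, 0.1, 0.0],
    "index-format": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]  # placeholder embedding of an incoming question

ranking = rank_by_similarity(query_vec, indexed)
for faq_id, score in ranking:
    print(f"{faq_id}: {score:.3f}")
```

With these placeholder vectors the "mailing-lists" entry ranks first, because its vector points in nearly the same direction as the query vector.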

The goal is to find the right answer with "similar" questions, e.g.

- "Are there mailing lists?"
- "How can I ask questions re Lucene?"

regardless of whether the question was trained/indexed or whether the answer contains keywords from the question.

The answer in this particular case is

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=e19b6f48-62ac-427a-9d5e-d4e4eb110769

and another meaningful answer could be

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=154d9aa7-29e6-457e-a2ad-315b1a67599f
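
To illustrate why keyword matching alone struggles with the two example questions, here is a small sketch that measures their word overlap. It uses a plain Jaccard score over lowercased tokens as a stand-in for lexical matching; this is an assumption made for illustration and is much simpler than what Lucene does with analyzers and BM25 scoring. The helper names (tokens, jaccard) are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of the token sets of two strings (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


q1 = "Are there mailing lists?"
q2 = "How can I ask questions re Lucene?"

overlap = jaccard(q1, q2)
print(f"shared tokens: {sorted(tokens(q1) & tokens(q2))}, jaccard: {overlap}")
```

The two questions share no token at all, so a purely lexical comparison gives no signal that they should lead to the same answer; this is the gap the embedding-based variants are meant to close.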

There is still a lot to be improved :-) but it is a lot of fun to use Lucene for this!

Any feedback is very welcome, and please let me know if you want to know more about the implementation details.

Thanks

Michael


