On 21.12.21 at 18:49, Michael Sokolov wrote:
interesting -- it always matches *something* I guess?

Yes, but this is something I would like to improve: it should know when it does not know :-)

I understand Lucene provides a score, but just defining a threshold doesn't really solve the problem, or do I misunderstand this?

It seems to me one has to implement some kind of "understanding / reasoning" in order to solve this. Or what would be your approach?
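
To make the threshold idea above concrete, here is a minimal sketch in Python. It assumes the search returns (question, score) pairs; the function name `best_match_or_none`, the `min_score` parameter and the sample scores are made up for illustration and are not part of Katie or Lucene.

```python
from typing import List, Optional, Tuple

# A hit is (matched question, score). Both the type and the sample
# values below are made up for illustration.
Hit = Tuple[str, float]


def best_match_or_none(hits: List[Hit], min_score: float) -> Optional[Hit]:
    """Return the highest-scoring hit, or None if it scores below min_score."""
    if not hits:
        return None  # nothing matched at all
    top = max(hits, key=lambda hit: hit[1])
    if top[1] < min_score:
        return None  # "I don't know": the best match is too weak
    return top


# Hypothetical scores for one query.
hits = [("Are there mailing lists?", 0.82), ("What is Lucene?", 0.40)]
print(best_match_or_none(hits, min_score=0.5))
print(best_match_or_none(hits, min_score=0.9))
```

The weak point is choosing `min_score`: raw Lucene scores are generally not comparable across different queries, so a single fixed cutoff may reject good matches for one query and accept poor ones for another.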

It might be helpful to show not only the answer, but also the question that was matched?

Yes, definitely. The Katie frontend already provides this functionality:

https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en

but I still have to enhance the JavaScript client used at

https://lucene-faq.ukatie.com/

Thanks

Michael


On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi

I am working on a web app called "Katie" that detects duplicate
questions:

https://ukatie.com/

As a test case I have imported the Lucene FAQ

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ

to

https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en

and made it available at

https://lucene-faq.ukatie.com/

where the FAQ entries are loaded as JSON from Katie's REST interface

https://ukatie.com/swagger-ui/?urls.primaryName=API%20V2#/faq-controller-v-2/getFAQUsingGET_1

and the JavaScript client can be found at

https://github.com/wyona/katie-4-faq

I am currently "experimenting" with different search algorithms, e.g.

- Lucene only
- SentenceBERT with Lucene Vector Search
- SentenceBERT only
- Weaviate
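
As a rough illustration of what the vector-based variants compare, here is a minimal sketch in Python. It assumes each question has already been turned into an embedding vector (for example by a SentenceBERT model) and ranks indexed questions by cosine similarity. The three-dimensional vectors are toy values rather than real model output, and the names `cosine` and `rank_by_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: List[float], indexed: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (question, similarity) pairs, most similar first."""
    scored = [(question, cosine(query_vec, vec)) for question, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings.
indexed = {
    "Are there mailing lists?": [0.9, 0.1, 0.3],
    "What is Lucene?": [0.0, 1.0, 0.0],
}
query_vec = [1.0, 0.0, 0.2]  # stands in for "How can I ask questions re Lucene?"
for question, score in rank_by_similarity(query_vec, indexed):
    print(f"{score:.3f}  {question}")
```

Because the comparison works on vectors rather than on shared terms, two questions can rank as similar even when they have no keywords in common.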

The goal is to find the right answer for "similar" questions, e.g.

- "Are there mailing lists?"
- "How can I ask questions re Lucene?"

regardless of whether the question was trained/indexed, and of whether
the answer contains keywords from the question,

where the answer in this particular case is

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=e19b6f48-62ac-427a-9d5e-d4e4eb110769

and another meaningful answer could be

https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=154d9aa7-29e6-457e-a2ad-315b1a67599f

There is still a lot to be improved :-) but it is a lot of fun to use
Lucene for this!

Any feedback is very welcome, and let me know if you want to know more
about the implementation details.

Thanks

Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
