On 21.12.21 at 18:49, Michael Sokolov wrote:
Interesting -- it always matches *something*, I guess?
Yes, but this is something I would like to improve: that it knows when
it does not know :-)
I understand that Lucene provides a score, but simply defining a threshold
doesn't really solve the problem, or am I misunderstanding this?
It seems to me that one has to implement some kind of "understanding /
reasoning" in order to solve this. Or what would your approach be?
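For comparison, a common heuristic (not a substitute for real "understanding") is to abstain when the best match is weak in absolute terms or barely beats the runner-up. A minimal sketch, assuming the scores are already normalized to [0, 1], e.g. cosine similarities; raw Lucene BM25 scores are unbounded and not comparable across queries, so they would need normalizing first. The function name `pick_answer` and both threshold values are hypothetical, not taken from Katie:

```python
from typing import Optional, Sequence, Tuple


def pick_answer(
    hits: Sequence[Tuple[str, float]],
    min_score: float = 0.75,
    min_margin: float = 0.05,
) -> Optional[str]:
    """Return the id of the best hit, or None meaning "I don't know".

    hits: (answer_id, score) pairs, scores assumed normalized to [0, 1].
    min_score / min_margin: hypothetical tuning values.
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best_id, best_score = ranked[0]
    # Absolute check: the best match must be good enough on its own.
    if best_score < min_score:
        return None
    # Relative check: if the runner-up is almost as good, the match is ambiguous.
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return None
    return best_id
```

Both values still have to be tuned on real questions, so this narrows the problem rather than solving it.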
It might be
helpful to show not only the answer, but also the question that was
matched?
Yes, definitely. The Katie frontend already provides this
functionality:
https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en
but I still have to enhance the JavaScript client used at
https://lucene-faq.ukatie.com/
Thanks
Michael
On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi
I am working on a webapp called "Katie" to detect duplicate
questions
https://ukatie.com/
As a test case I have imported the Lucene FAQ
https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ
into
https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en
and made it available at
https://lucene-faq.ukatie.com/
The FAQ entries are loaded as JSON from Katie's REST interface
https://ukatie.com/swagger-ui/?urls.primaryName=API%20V2#/faq-controller-v-2/getFAQUsingGET_1
and the JavaScript can be found at
https://github.com/wyona/katie-4-faq
I am currently "experimenting" with different search algorithms, e.g.
- Lucene only
- SentenceBERT + Lucene Vector Search
- SentenceBERT only
- Weaviate
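The embedding-based variants share one core step: embed each FAQ question once, embed the incoming question, and return the stored question (and its answer) with the highest cosine similarity. Lucene's vector search does this approximately with an HNSW graph instead of scanning every vector. A minimal brute-force sketch, assuming the embeddings (e.g. from SentenceBERT) are computed elsewhere; the `FaqEntry` type, the answer ids and the toy 3-dimensional vectors are made up for illustration. Returning the matched question alongside the answer also makes it easy to show the user which question was matched:

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FaqEntry:
    question: str
    answer_id: str           # hypothetical id; Katie itself uses UUIDs
    vector: Sequence[float]  # precomputed embedding of the question


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], entries: List[FaqEntry],
          k: int = 3) -> List[Tuple[FaqEntry, float]]:
    """Exhaustive k-nearest-neighbour search by cosine similarity."""
    scored = [(e, cosine(query, e.vector)) for e in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


faq = [
    FaqEntry("Are there mailing lists?", "answer-mailing-lists", [0.9, 0.1, 0.0]),
    FaqEntry("How do I delete documents?", "answer-delete", [0.0, 0.2, 0.9]),
]
# Toy embedding standing in for "How can I ask questions re Lucene?".
best, score = top_k([0.8, 0.3, 0.1], faq, k=1)[0]
print(best.question, best.answer_id, round(score, 3))
```

An exhaustive scan is fine for an FAQ of a few hundred entries; an approximate index only pays off as the collection grows.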
The goal is to find the right answer for "similar" questions, e.g.
- "Are there mailing lists?"
- "How can I ask questions re Lucene?"
regardless of whether the question was trained/indexed and whether the
answer contains keywords of the question.
The answer in this particular case is
https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=e19b6f48-62ac-427a-9d5e-d4e4eb110769
and another meaningful answer could be
https://ukatie.com/#/read-answer?domain-id=9f206aec-5223-4e03-a2fc-c16e4b885ef8&uuid=154d9aa7-29e6-457e-a2ad-315b1a67599f
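The two example questions above illustrate why a purely keyword-based match struggles here: they share no word at all. A minimal sketch of that observation, using a naive lowercase tokenizer and Jaccard overlap (both are assumptions for illustration; Lucene's analyzers do more, e.g. stemming, though stemming would not create an overlap in this case either):

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Naive tokenizer: lowercase, keep runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


q1 = "Are there mailing lists?"
q2 = "How can I ask questions re Lucene?"
print(tokens(q1) & tokens(q2))  # → set()
print(jaccard(q1, q2))          # → 0.0
```

Embedding-based variants such as SentenceBERT do not depend on shared keywords, so they can still place these two questions close together.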
There is still a lot to be improved :-) but it is a lot of fun to use
Lucene for this!
Any feedback is very welcome, and let me know if you want to know more
about the implementation details.
Thanks
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org