Re: Does Luke already support vector search or are there any plans to support vector search?
I analyzed the logs and the method lucene/luke/src/java/org/apache/lucene/luke/models/util/IndexUtils.java#openIndex(String, String) and realized that the problem was not the index itself: the index directory no longer existed.

I had forgotten that I renamed the index directory, but Luke still listed the previously opened directory paths in the "Index Path" dropdown. When I selected a path that no longer existed, I received the error message "Not a valid lucene index directory or corrupted?" and wrongly assumed that the problem was caused by the index being a vector search index.

So Luke is able to open the vector search index and displays the correct number of indexed vectors :-)

Sorry for the noise!

Nevertheless, it might make sense to improve the error message, so that trying to open a directory which does not exist reports "No such directory", or to have the "Index Path" dropdown check whether the previously opened directories still exist.

Thanks

Michael

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
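The two suggestions in this message (a clearer error for a missing directory, and a history dropdown that only lists paths which still exist) could be sketched roughly as below. This is only an illustration under assumptions: the class IndexPathChecks and its methods describeProblem and existingPaths are hypothetical names, not part of Luke or Lucene, and the sketch uses nothing beyond java.nio.file. Where such a check would be wired into IndexUtils#openIndex is not shown.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Luke or Lucene.
final class IndexPathChecks {

  private IndexPathChecks() {}

  /**
   * Returns a user-facing description of an obvious path problem,
   * or null if the path exists and is a directory. A null result does
   * not mean the directory holds a valid index; that is still decided
   * by actually opening it.
   */
  static String describeProblem(String indexPath) {
    Path path = Paths.get(indexPath);
    if (!Files.exists(path)) {
      return "No such directory: " + path;
    }
    if (!Files.isDirectory(path)) {
      return "Not a directory: " + path;
    }
    return null;
  }

  /** Keeps only those previously opened paths that still exist as directories. */
  static List<String> existingPaths(List<String> history) {
    List<String> result = new ArrayList<>();
    for (String candidate : history) {
      if (Files.isDirectory(Paths.get(candidate))) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

Calling describeProblem before attempting to open the index would let the tool distinguish "the directory is gone" from "the directory exists but does not contain a readable index", which is the confusion described above.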
Re: Does Luke already support vector search or are there any plans to support vector search?
Thanks again for your feedback!

I will give it a try and get back to you if I have more questions :-)

Thanks

Michael
Re: Does Luke already support vector search or are there any plans to support vector search?
> I think besides the query it would be nice if Luke displayed some
> "stats" of the index, for example the various fields besides the actual
> vector, and also how many vectors are in the index.

It would be a good starting point, I think.

> Can you give me a hint where in the code this check currently happens?
> (I guess where the error about the corrupted index is raised.)

Actually, I have few clues about where to start (I haven't tried to read indexes that include vector values with Luke). The stack traces you see should include the full information needed to fix or improve it.

Tomoko
Re: Does Luke already support vector search or are there any plans to support vector search?
On 13.07.21 at 04:22, Tomoko Uchida wrote:
> There aren't any plans for that, and I'm not sure what is actually
> expected of the GUI tool

Yes, I understand; the input for the query would have to be an embedding (a vector of, for example, 768 dimensions).

I currently see two possibilities to do this:

- Import/open the embedding from a file
- Connect the regular search input to a service that generates the embedding, for example https://github.com/hanxiao/bert-as-service

> to support the vector search codec (it'd be a
> costly operation to decode vectors with several hundreds of
> dimensions); though I am open to new ideas which are feasible and
> useful.

I think besides the query it would be nice if Luke displayed some "stats" of the index, for example the various fields besides the actual vector, and also how many vectors are in the index.

> Nonetheless, the error you saw is not great; we could improve that by
> just ignoring the codec for now.

Maybe I can try to improve this :-)

Can you give me a hint where in the code this check currently happens? (I guess where the error about the corrupted index is raised.)

Thanks

Michael
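The first option above (importing the query embedding from a file) could look roughly like the following sketch. Everything here is an assumption made for illustration: the class EmbeddingFile, its methods, and the file format (plain text with numbers separated by whitespace or commas) are hypothetical and not something Luke provides. The expected dimension, for example 768, is passed in by the caller so that a mismatching file is rejected early.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; the file format is an assumption.
final class EmbeddingFile {

  private EmbeddingFile() {}

  /**
   * Parses numbers separated by whitespace and/or commas into a float vector
   * and checks that the number of components matches expectedDim.
   */
  static float[] parse(String text, int expectedDim) {
    String trimmed = text.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalArgumentException("Embedding is empty");
    }
    String[] tokens = trimmed.split("[\\s,]+");
    if (tokens.length != expectedDim) {
      throw new IllegalArgumentException(
          "Expected " + expectedDim + " dimensions but found " + tokens.length);
    }
    float[] vector = new float[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Throws NumberFormatException (an IllegalArgumentException) on bad input.
      vector[i] = Float.parseFloat(tokens[i]);
    }
    return vector;
  }

  /** Reads the whole file as UTF-8 text and parses it as one embedding. */
  static float[] read(Path file, int expectedDim) throws IOException {
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    return parse(text, expectedDim);
  }
}
```

The resulting float[] would then be the input for whatever vector query the tool issues; that part depends on the Lucene version in use and is deliberately left out.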
Re: Does Luke already support vector search or are there any plans to support vector search?
There aren't any plans for that, and I'm not sure what is actually expected of the GUI tool to support the vector search codec (it'd be a costly operation to decode vectors with several hundreds of dimensions); though I am open to new ideas which are feasible and useful.

Nonetheless, the error you saw is not great; we could improve that by just ignoring the codec for now.

Tomoko
Does Luke already support vector search or are there any plans to support vector search?
Hi

I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT based on train-v2.0.json of SQuAD (https://rajpurkar.github.io/SQuAD-explorer/), which contains 86'831 QnAs (for the embeddings I used SentenceBERT).

It took a couple of hours on my Mac laptop, but it worked in the end and I can search successfully :-)

I tried to open the index with Luke, but received an error saying that the index might be corrupt.

Does Luke already support analyzing a vector search index? If not, are there any plans to support vector search?

Thanks

Michael