Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-17 Thread Michael Wechner

Hi Tomoko

I just noticed that you resolved the issue and also made some additional
improvements :-)


Thanks a lot!

Michael


Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
Sure, I understand; I just wanted to ask whether such a change actually
makes sense.

I have created a Jira ticket

https://issues.apache.org/jira/browse/LUCENE-10024

and attached the patch. Let me know if you prefer a pull request.

Cheers

Michael


Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Tomoko Uchida
We don't accept patches by email... please open a Jira.



Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner

would the following patch make sense?

git diff lucene/luke/src/
diff --git a/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java b/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java
index f3fc635872b..ad13745eec8 100644
--- a/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java
+++ b/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java
@@ -18,6 +18,7 @@
 package org.apache.lucene.luke.app;
 
 import java.lang.invoke.MethodHandles;
+import java.nio.file.NoSuchFileException;
 import java.util.Objects;
 import org.apache.logging.log4j.Logger;
 import org.apache.lucene.index.IndexReader;
@@ -71,6 +72,10 @@ public final class IndexHandler extends AbstractHandler {
     IndexReader reader;
     try {
       reader = IndexUtils.openIndex(indexPath, dirImpl);
+    } catch (NoSuchFileException e) {
+      log.error("Error opening index", e);
+      throw new LukeException(
+          MessageUtils.getLocalizedMessage("openindex.message.index_path_does_not_exist", indexPath), e);
     } catch (Exception e) {
       log.error("Error opening index", e);
       throw new LukeException(
diff --git a/lucene/luke/src/resources/org/apache/lucene/luke/app/desktop/messages/messages.properties b/lucene/luke/src/resources/org/apache/lucene/luke/app/desktop/messages/messages.properties
index f9c8c45a0f4..30b43cf18b7 100644
--- a/lucene/luke/src/resources/org/apache/lucene/luke/app/desktop/messages/messages.properties
+++ b/lucene/luke/src/resources/org/apache/lucene/luke/app/desktop/messages/messages.properties
@@ -71,6 +71,7 @@ openindex.radio.keep_only_last_commit=Keep only last commit point
 openindex.radio.keep_all_commits=Keep all commit points
 openindex.message.index_path_not_selected=Please choose index path.
 openindex.message.index_path_invalid=Cannot open index path {0}. Not a valid lucene index directory or corrupted?
+openindex.message.index_path_does_not_exist=Cannot open index path {0}. No such directory!
 openindex.message.index_opened=Index successfully opened.
 openindex.message.index_opened_ro=Index successfully opened. (read-only)

Thanks

Michael
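
For readers who want to try the idea of the patch outside of Luke, here is a minimal, self-contained sketch of the same pattern: catch NoSuchFileException before the generic Exception and fill the {0} placeholder of the message. It is not Luke's code. The class OpenIndexErrorSketch and the methods countEntries and describeFailure are hypothetical, listing the directory merely stands in for IndexUtils.openIndex, and using java.text.MessageFormat for the placeholder is an assumption about how such templates are typically filled.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.MessageFormat;

public class OpenIndexErrorSketch {
  // Message templates copied from the messages.properties hunk in the patch.
  static final String PATH_INVALID =
      "Cannot open index path {0}. Not a valid lucene index directory or corrupted?";
  static final String PATH_DOES_NOT_EXIST =
      "Cannot open index path {0}. No such directory!";

  /** Hypothetical stand-in for opening an index: it only lists the directory. */
  static int countEntries(Path dir) throws IOException {
    int count = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path ignored : entries) {
        count++;
      }
    }
    return count;
  }

  /** Returns a user-facing message, or null if the directory could be listed. */
  static String describeFailure(String indexPath) {
    try {
      countEntries(Paths.get(indexPath));
      return null;
    } catch (NoSuchFileException e) {
      // The more specific exception must be caught first; otherwise the
      // generic handler below would swallow it and report a corrupt index.
      return MessageFormat.format(PATH_DOES_NOT_EXIST, indexPath);
    } catch (Exception e) {
      // Everything else keeps the existing, more general message.
      return MessageFormat.format(PATH_INVALID, indexPath);
    }
  }

  public static void main(String[] args) {
    // Assumes this path does not exist on the machine running the sketch.
    System.out.println(describeFailure("/no/such/index-dir"));
  }
}
```

The order of the catch blocks is the whole point: NoSuchFileException is a subclass of IOException, so a handler for Exception placed first would make the specific branch unreachable (and the compiler would reject it).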




Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner

I analyzed the logs and the class/method

lucene/luke/src/java/org/apache/lucene/luke/models/util/IndexUtils.java#openIndex(String, String)

and realized that the problem was not the index itself, but that the
index directory/path did not exist anymore.

I had forgotten that I renamed the index directory, and Luke still listed
the previously opened directory paths in the "Index Path" dropdown.
So when I selected the one which did not exist anymore, I received
the error message

"Not a valid lucene index directory or corrupted?"

and I wrongly assumed that the problem was that the index is a vector
search index.

So Luke is able to open the vector search index and displays the correct
number of indexed vectors :-)

Sorry for the noise!

Nevertheless, it might make sense to enhance the error message, so that if
one tries to open a directory which does not exist, the error
message reads

"No such directory"

Or the "Index Path" dropdown could check whether the previously
opened directories still exist.
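
The second suggestion, checking the remembered paths before offering them, could look roughly like the following sketch. It uses only the JDK and does not reflect how Luke actually stores its history; the class IndexPathHistorySketch and the method existingDirectories are hypothetical names chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexPathHistorySketch {
  /** Keeps only those remembered paths that still point to an existing directory. */
  static List<String> existingDirectories(List<String> history) {
    List<String> result = new ArrayList<>();
    for (String entry : history) {
      try {
        if (Files.isDirectory(Paths.get(entry))) {
          result.add(entry);
        }
      } catch (InvalidPathException e) {
        // A remembered string that is not even a valid path is dropped as well.
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical history: the temp directory exists, the second entry is
    // assumed not to exist on the machine running the sketch.
    List<String> history =
        Arrays.asList(System.getProperty("java.io.tmpdir"), "/no/such/index-dir");
    System.out.println(existingDirectories(history));
  }
}
```

A check like this only narrows the window: a directory can still disappear between filling the dropdown and opening the index, so a clear "No such directory" message remains useful either way.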


Thanks

Michael



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org





Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner

Thanks again for your feedback!

I will give it a try and get back to you if I have more questions :-)

Thanks

Michael




Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Tomoko Uchida
> I think besides the query it would be nice if Luke displayed some
> "stats" of the index, for example the various fields besides the actual
> vector, and also how many vectors are in the index.

That would be a good starting point, I think.

> Can you give me a hint where in the code this check currently happens?
> (I guess where the error about the corrupted index is raised.)

Actually, I have few clues about where to start (I haven't tried to read
indexes that include vector values with Luke).
The stack traces you see should include the full information needed to fix
or improve it.

Tomoko




Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-12 Thread Michael Wechner



On 13.07.21 at 04:22, Tomoko Uchida wrote:

> There aren't any plans for that, and I'm not sure what is actually
> expected of the GUI tool

Yes, I understand; the input for the query would have to be an embedding
(a vector of, for example, 768 dimensions).

I currently see two possibilities to do this:

- Import/open the embedding from a file
- Connect the regular search input to a service that generates the
  embedding, for example https://github.com/hanxiao/bert-as-service
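
To make the first possibility concrete, here is a small sketch of reading a query embedding from a text file. The file format (floats separated by commas and/or whitespace) is an assumption made for illustration, as are the class EmbeddingFileSketch and its methods; nothing like this exists in Luke as far as this thread shows.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class EmbeddingFileSketch {
  /** Parses floats separated by commas and/or whitespace (assumed format). */
  static float[] parseEmbedding(String text) {
    String trimmed = text.trim();
    if (trimmed.isEmpty()) {
      return new float[0];
    }
    String[] tokens = trimmed.split("[,\\s]+");
    float[] vector = new float[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Throws NumberFormatException on anything that is not a number.
      vector[i] = Float.parseFloat(tokens[i]);
    }
    return vector;
  }

  /** Reads a vector from a file and checks that it has the expected dimension. */
  static float[] readEmbedding(Path file, int expectedDimension) throws IOException {
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    float[] vector = parseEmbedding(text);
    if (vector.length != expectedDimension) {
      throw new IllegalArgumentException(
          "Expected " + expectedDimension + " dimensions but found " + vector.length);
    }
    return vector;
  }

  public static void main(String[] args) {
    // Three values instead of 768, just to keep the demo readable.
    System.out.println(Arrays.toString(parseEmbedding("0.5, -1.0\n2")));
  }
}
```

The dimension check matters because a query vector has to match the dimension of the indexed vectors (768 in the example above); rejecting a mismatched file early gives a clearer error than a failure deep inside the search.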



> to support vector search codec (it'd be
> a costly operation to decode vectors with several hundreds of
> dimensions); though I am open to new ideas which are feasible and
> useful.

I think besides the query it would be nice if Luke displayed some
"stats" of the index, for example the various fields besides the actual
vector, and also how many vectors are in the index.



> Nonetheless the error you saw is not great; we could improve that by
> just ignoring the codec for now.

Maybe I can try to improve this :-)

Can you give me a hint where in the code this check currently happens?
(I guess where the error about the corrupted index is raised.)

Thanks

Michael






Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-12 Thread Tomoko Uchida
There aren't any plans for that, and I'm not sure what is actually
expected of the GUI tool to support vector search codec (it'd be
a costly operation to decode vectors with several hundreds of
dimensions); though I am open to new ideas which are feasible and
useful.
Nonetheless the error you saw is not great; we could improve that by
just ignoring the codec for now.

Tomoko

On Tue, Jul 6, 2021 at 16:23, Michael Wechner wrote:
>
> Hi
>
> I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT
> based on train-v2.0.json of SQuAD
> (https://rajpurkar.github.io/SQuAD-explorer/), which are 86'831 QnAs
> (for the embedding I used SentenceBERT).
>
> It took a couple of hours on my Mac laptop, but it worked in the end and
> I can search successfully :-)
>
> I tried to open the index with Luke, but receive an error, that the
> index might be corrupt.
>
> Does Luke already support analyzing a vector search index? If not, are
> there any plans to support vector search?
>
> Thanks
>
> Michael