Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-14 Thread Mike Adamson
We can't use open-nlp because it is JDK 17 only. I'll pull the smile-nlp
dependency and write something to do the same thing. Our usage was trivial.
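A minimal sketch of such a replacement follows. It assumes the trivial usage amounts to loading pre-computed word vectors from a file in the standard plain-text GloVe layout, where each line holds a word followed by its space-separated components. The class `GloveVectors` and its methods are hypothetical. They are not taken from CEP-30 or from Smile, and the sketch uses only the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only stand-in for the smile-nlp usage: reads word vectors stored in the
// plain-text GloVe format, one entry per line: "<word> <v1> <v2> ... <vN>".
final class GloveVectors
{
    private GloveVectors() {}

    // Loads vectors from a file. The caller supplies the path and the expected dimension.
    static Map<String, float[]> load(Path file, int dimension) throws IOException
    {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8))
        {
            return parse(reader, dimension);
        }
    }

    // Parses GloVe text lines. Insertion order is preserved so that tests iterate deterministically.
    static Map<String, float[]> parse(BufferedReader reader, int dimension) throws IOException
    {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null)
        {
            line = line.trim();
            if (line.isEmpty())
                continue; // tolerate blank lines, e.g. a trailing newline

            String[] fields = line.split("\\s+");
            if (fields.length != dimension + 1)
                throw new IOException("Expected " + (dimension + 1) + " fields but found "
                                      + fields.length + " for word '" + fields[0] + "'");

            float[] vector = new float[dimension];
            for (int i = 0; i < dimension; i++)
                vector[i] = Float.parseFloat(fields[i + 1]);
            vectors.put(fields[0], vector);
        }
        return vectors;
    }
}
```

A test might call `GloveVectors.load(Path.of("glove.3k.50d.txt"), 50)`. The file name is hypothetical, and the value 50 assumes that the "50d" in the data set name means 50 dimensions. The `LinkedHashMap` preserves file order, so tests that iterate over the vectors see the same sequence on every run.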

On Thu, 14 Sept 2023 at 00:10, J. D. Jordan wrote:

> Reading through smile license again, it is licensed pure GPL 3, not GPL
> with classpath exception. So I think that kills all debate here.
>
> -1 on inclusion

-- 
*Mike Adamson*
Engineering

+1 650 389 6000 | datastax.com



Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread J. D. Jordan
Reading through smile license again, it is licensed pure GPL 3, not GPL with classpath exception. So I think that kills all debate here.

-1 on inclusion

On Sep 13, 2023, at 2:30 PM, Jeremiah Jordan wrote:

> I wonder if it can easily be replaced with Apache open-nlp? It also provides an implementation of GloVe.
>
> https://opennlp.apache.org/docs/2.3.0/apidocs/opennlp-tools/opennlp/tools/util/wordvector/Glove.html







Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeremiah Jordan
I wonder if it can easily be replaced with Apache open-nlp? It also
provides an implementation of GloVe.

https://opennlp.apache.org/docs/2.3.0/apidocs/opennlp-tools/opennlp/tools/util/wordvector/Glove.html


On Sep 13, 2023 at 1:17:46 PM, Benedict wrote:

> There’s a distinction for spotbugs and other build related tools where
> they can be downloaded and used during the build so long as they’re not
> critical to the build process.
>
> They have to be downloaded dynamically in binary form I believe though,
> they cannot be included in the release.
>
> So it’s not really in conflict with what Jeff is saying, and my
> recollection accords with Jeff’s


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Benedict
There’s a distinction for spotbugs and other build-related tools, which can be downloaded and used during the build so long as they’re not critical to the build process.

They have to be downloaded dynamically in binary form, I believe, though; they cannot be included in the release.

So it’s not really in conflict with what Jeff is saying, and my recollection accords with Jeff’s.

On 13 Sep 2023, at 17:42, Brandon Williams wrote:

> On Wed, Sep 13, 2023 at 11:37 AM Jeff Jirsa wrote:
>
>> You can open a legal JIRA to confirm, but based on my understanding (and
>> re-confirming reading https://www.apache.org/legal/resolved.html#category-a ):
>
> We should probably get clarification here regardless, iirc this came up
> when we were considering SpotBugs too.


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Brandon Williams
On Wed, Sep 13, 2023 at 11:37 AM Jeff Jirsa wrote:

> You can open a legal JIRA to confirm, but based on my understanding (and
> re-confirming reading
> https://www.apache.org/legal/resolved.html#category-a ):
>
>
We should probably get clarification here regardless; iirc, this came up
when we were considering SpotBugs too.


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeff Jirsa
You can open a legal JIRA to confirm, but based on my understanding (and
re-confirming by reading https://www.apache.org/legal/resolved.html#category-a):

- The restriction is on what can be in A PROJECT. Tests are part of the
project and are not distinguished from the compiled product (especially since
the PROJECT ships SOURCE to build the product: if the SOURCE to build
requires the test, the test is clearly non-optional).
- GPL is Category X: https://www.apache.org/legal/resolved.html#category-x

The Category X section mixes "project" and "product" a few times, but again,
the product is still the source distribution, which would include the test,
which means it's excluded.

"Apache projects may not distribute Category X licensed components, in
source or binary form" doesn't seem ambiguous to me, but if someone wants to
ask ASF legal whether I'm wrong, that's totally ok.



On Wed, Sep 13, 2023 at 9:25 AM Ekaterina Dimitrova wrote:

> Jeff, isn’t this ok as long as it is used only in tests? If we are not
> sure we can open a Jira to legal?




Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Ekaterina Dimitrova
Jeff, isn’t this ok as long as it is used only in tests? If we are not sure,
we can open a Jira to legal?

On Wed, 13 Sep 2023 at 12:23, Jeff Jirsa wrote:

> Just to be clear - this repo?
> https://github.com/haifengl/smile/blob/master/LICENSE
>
> That shows GPL + Commercial?


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeff Jirsa
Just to be clear - this repo?
https://github.com/haifengl/smile/blob/master/LICENSE

That shows GPL + Commercial?



On Wed, Sep 13, 2023 at 9:10 AM Brandon Williams wrote:

> I don't see any problem with this, +1.
>
> Kind Regards,
> Brandon


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Brandon Williams
I don't see any problem with this, +1.

Kind Regards,
Brandon


On Wed, Sep 13, 2023 at 11:09 AM Mike Adamson wrote:

> CEP-30: [Approximate Nearest Neighbor (ANN) Vector Search via
> Storage-Attached Indexes] uses the smile-nlp library
> (com.github.haifengl.smile-nlp) in its testing to allow the creation of
> word2vec embeddings for valid input into the HNSW graph index.
>
> We use this library because we found that using random vectors in
> testing produced very inconsistent results. Using the smile-nlp word2vec
> implementation with the glove.3k.50d library produces repeatable results.
>
> Does anyone have any objections to the use of this library as a test-only
> dependency?


[DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Mike Adamson
CEP-30: [Approximate Nearest Neighbor (ANN) Vector Search via
Storage-Attached Indexes] uses the smile-nlp library
(com.github.haifengl.smile-nlp) in its testing to allow the creation of
word2vec embeddings for valid input into the HNSW graph index.

We use this library because we found that using random vectors in
testing produced very inconsistent results. Using the smile-nlp word2vec
implementation with the glove.3k.50d library produces repeatable results.

Does anyone have any objections to the use of this library as a test-only
dependency?