[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-07-25 Thread ottobackwards
GitHub user ottobackwards opened a pull request:

https://github.com/apache/metron/pull/667

METRON-1061 Add FUZZY_SCORE STELLAR function

[Apache Commons Text 
Similarity](https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/FuzzyScore.html)

FuzzyScore :
"A matching algorithm that is similar to the searching algorithms 
implemented in editors such as Sublime Text, TextMate, Atom and others.
One point is given for every matched character. Subsequent matches yield 
two bonus points. A higher score indicates a higher similarity."

This PR adds FUZZY_SCORE to STELLAR, exposing the Apache Commons FuzzyScore 
class functionality.

This allows for fuzzy matching of strings to queries, and composing 
statements based on thresholds related to those scores.

For example:

term = "metron"
query = "metron"

```java
16 == FUZZY_SCORE(term,query,'en')
```
+1 for each match and +2 bonus for the 2nd, 3rd, 4th, 5th, and 6th matches.
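
To make the arithmetic above concrete, here is a minimal standalone sketch of the scoring rule as quoted from the Commons Text description. It is not the library's source: the class name `FuzzyScoreSketch` and the method `score` are hypothetical, and agreement with `FuzzyScore` on every input is an assumption.

```java
import java.util.Locale;

/** Standalone sketch of the quoted scoring rule; hypothetical, not the Commons Text source. */
class FuzzyScoreSketch {

  static int score(String term, String query, Locale locale) {
    String t = term.toLowerCase(locale);
    String q = query.toLowerCase(locale);
    int score = 0;
    int termIndex = 0;
    int previousMatch = Integer.MIN_VALUE; // index in t of the last matched character
    for (int qi = 0; qi < q.length(); qi++) {
      char qc = q.charAt(qi);
      boolean found = false;
      // Scan forward through the term; earlier characters are never revisited.
      for (; termIndex < t.length() && !found; termIndex++) {
        if (t.charAt(termIndex) == qc) {
          score++; // one point per matched character
          if (previousMatch + 1 == termIndex) {
            score += 2; // bonus when this match directly follows the previous one
          }
          previousMatch = termIndex;
          found = true;
        }
      }
    }
    return score;
  }

  public static void main(String[] args) {
    System.out.println(score("metron", "metron", Locale.ENGLISH)); // 6 matches + 5 bonuses = 16
    System.out.println(score("metron", "mtn", Locale.ENGLISH));    // 3 matches, none adjacent = 3
  }
}
```

The second call shows why the score works as a threshold: the query characters are all found, but because none of the matches are adjacent, no bonus is awarded.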



This is related to 
[METRON-1052](https://issues.apache.org/jira/browse/METRON-1052)

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and/or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] Have you verified the basic functionality of the build by building and 
running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that the format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not, then run 
the following commands and then verify the changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ottobackwards/metron fuzzy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #667


commit 34c9ffa97b47005ac10063a43ff1890f51c9a3cf
Author: Otto Fowler 
Date:   2017-07-25T12:31:18Z

add FUZZY_SCORE stellar function




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-07-25 Thread ottobackwards
Github user ottobackwards closed the pull request at:

https://github.com/apache/metron/pull/667




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-07-25 Thread ottobackwards
GitHub user ottobackwards reopened a pull request:

https://github.com/apache/metron/pull/667



[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131099946
  
--- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+      description =
+          "Returns the Fuzzy Score which indicates the similarity score between two Strings "
+              +
+              "One point is given for every matched character. Subsequent matches yield two bonus "
+              +
+              "points. A higher score indicates a higher similarity",
+      params = {
+          "string - The full term that should be matched against",
+          "string - The query that will be matched against a term",
+          "string - The IETF BCP 47 language code to use"
+      },
+      returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+    @Override
+    public Object apply(List list) {
+      if (list.size() < 3) {
+        throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+      }
+      String term = (String) list.get(0);
--- End diff --

If there's an exception here (specifically a `ClassCastException`), we're 
going to throw.  Given the noise in the data that we have, I would expect 
this to happen.

I have a couple of questions:
* Do we want to return `NaN` or `Infinity` in that case and log at a warn 
level (or error level)?
* Do we want to throw an exception which could not possibly be caught in 
the stream?

I, personally, vote for the first, but I'd like to hear other people's 
impressions.
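
For illustration, the failure mode under discussion can be reproduced outside Stellar with a small sketch. The class and method names below are hypothetical; only the unchecked cast mirrors the line in the diff.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo of the unchecked cast; not part of the pull request. */
class UncheckedCastDemo {

  // Mirrors the cast in the diff: assumes the first argument is already a String.
  static String firstAsString(List<Object> args) {
    return (String) args.get(0);
  }

  public static void main(String[] argv) {
    // A numeric field arrives where a string was expected.
    List<Object> noisy = Arrays.asList(42, "metron", "en");
    try {
      firstAsString(noisy);
    } catch (ClassCastException e) {
      System.out.println("cast failed: " + e.getClass().getSimpleName());
    }
  }
}
```

Whether the function should swallow this and return a sentinel value or let the exception propagate is exactly the question raised above.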




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131142516
  
1. The `(CAST) list.get(0)` idiom is the common pattern in our Stellar code.  I 
believe I asked in another PR why we don't use the conversion utils. 
2. I think we want to return 0 for invalid args.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131143225
  
We really need a pattern or a helper class for handling Stellar variables 
that everyone uses.
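
As a rough sketch of what such a helper might look like: `StellarArgs` and its `get` method are hypothetical names invented for this example, not existing Stellar classes, and the choice of `Optional` as the return type is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for typed access to Stellar function arguments. */
class StellarArgs {

  /** Returns the argument at index as the requested type, or empty if it is missing, null, or another type. */
  static <T> Optional<T> get(List<Object> args, int index, Class<T> type) {
    if (index < 0 || index >= args.size()) {
      return Optional.empty();
    }
    Object value = args.get(index);
    return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
  }

  public static void main(String[] argv) {
    List<Object> args = Arrays.asList("metron", 42, "en");
    System.out.println(get(args, 0, String.class).isPresent()); // true
    System.out.println(get(args, 1, String.class).isPresent()); // false: Integer, not String
    System.out.println(get(args, 5, String.class).isPresent()); // false: index out of range
  }
}
```

Returning `Optional` leaves the policy to each function: a caller can fall back to a default with `orElse` or raise an error with `orElseThrow`, so the helper itself takes no side in the return-versus-throw question.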




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131146245
  
Also, the return value should suit the common usage, which I imagine is something like:

IF (FUZZY_SCORE(fld,qry,'EN') > 4) THEN SET SOME FIELD






[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131148105
  
Well, in this case, I don't think we want to use `ConversionUtils` since we 
probably don't want to coerce a list into a string for a fuzzy score, but I 
might be wrong.

I think we probably want to return `0` in the case that someone passes in a 
wrong type too.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-03 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r131155065
  
done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133090339
  
--- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java ---
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+      description =
+          "Returns the Fuzzy Score which indicates the similarity score between two Strings "
+              +
+              "One point is given for every matched character. Subsequent matches yield two bonus "
+              +
+              "points. A higher score indicates a higher similarity",
+      params = {
+          "string - The full term that should be matched against",
+          "string - The query that will be matched against a term",
+          "string - The IETF BCP 47 language code to use"
+      },
+      returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+    @Override
+    public Object apply(List list) {
+      if (list.size() < 3) {
+        throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+      }
+      Object oterm = list.get(0);
+      Object oquery = list.get(1);
+      Object olang = list.get(2);
+
+      if (!(oterm instanceof String) || !(oquery instanceof String) || !(olang instanceof String)) {
+        return 0;
--- End diff --

This is an error condition, right?  Should we treat it like you did above 
and throw an `IllegalStateException`?  Better yet, maybe we should use 
`IllegalArgumentException`.  




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133091002
  
--- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java ---
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+      description =
+          "Returns the Fuzzy Score which indicates the similarity score between two Strings "
+              +
+              "One point is given for every matched character. Subsequent matches yield two bonus "
+              +
+              "points. A higher score indicates a higher similarity",
+      params = {
+          "string - The full term that should be matched against",
+          "string - The query that will be matched against a term",
+          "string - The IETF BCP 47 language code to use"
+      },
+      returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+    @Override
+    public Object apply(List list) {
+      if (list.size() < 3) {
+        throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+      }
+      Object oterm = list.get(0);
+      Object oquery = list.get(1);
+      Object olang = list.get(2);
+
+      if (!(oterm instanceof String) || !(oquery instanceof String) || !(olang instanceof String)) {
+        return 0;
+      }
+      String term = (String) oterm;
+      String query = (String) oquery;
+      String lang = (String) olang;
+      if (StringUtils.isEmpty(term) || StringUtils.isEmpty(query) || StringUtils.isEmpty(lang)) {
+        return 0;
--- End diff --

Aren't there a few different scenarios that should be treated 
differently here?
* If the `lang` is an empty string, I would assume that is an error 
condition.  
* If either the `term` or `query` is empty, then why not just allow the 
`FuzzyScore` to handle it and return a value?
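
As a minimal sketch of how those two scenarios could be separated: `FuzzyArgs` and `scoreOrFail` are hypothetical names, and the scorer is passed in as a plain function so the sketch compiles without Commons Text. It assumes the underlying scorer returns a sensible value for empty strings, and that a non-string term or query should also be rejected; neither is settled in this thread.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper, not part of the PR: it only illustrates treating an
// empty language code differently from an empty term or query.
final class FuzzyArgs {

  static Object scoreOrFail(List<Object> args, BiFunction<String, String, Integer> scorer) {
    if (args.size() < 3) {
      throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
    }
    Object lang = args.get(2);
    // An empty or non-string language code is treated as a caller error.
    if (!(lang instanceof String) || ((String) lang).isEmpty()) {
      throw new IllegalArgumentException("FUZZY_SCORE expects a non-empty language code");
    }
    Object term = args.get(0);
    Object query = args.get(1);
    // Assumption: a non-string term or query is rejected as well.
    if (!(term instanceof String) || !(query instanceof String)) {
      throw new IllegalArgumentException("FUZZY_SCORE expects a string term and query");
    }
    // An empty term or query is passed straight through; the scorer decides the value.
    return scorer.apply((String) term, (String) query);
  }
}
```

With this shape, an empty `term` or `query` still produces whatever the scorer returns, while an empty `lang` fails loudly.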


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133092693
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

Can we document the common language codes here, so I can easily see them 
when running `?FUZZY_SCORE` in the REPL?  I'll let you define "common 
languages", but I assume English, Spanish, at least.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133092792
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

You seem to have an affinity for Klingon, so I'm fine with that too.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133093147
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score 
between two Strings "
+  +
+  "One point is given for every matched character. Subsequent 
matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity 
FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: 
[string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

Sorry, I am finding this discussion late.  I read through your previous 
comments and changed my mind about 3 times as I read through.

If we know there is an error condition, I think something like `NaN` would 
definitely be more helpful than returning 0.  I feel like returning 0 too easily 
masks a problem and might make this a little frustrating to use.

If there are not even 3 arguments passed in, then throwing an exception 
works.  That's clearly an error that should be caught.
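
To make the `NaN` idea concrete, here is a rough sketch. `FuzzyOutcome`, `scoreOrNaN`, and `isError` are hypothetical names, and the scorer is a stand-in function rather than the Commons Text class. It only shows that a `NaN` result stays distinguishable from a genuine score of 0; how Stellar's comparison operators would treat `NaN` is not covered here.

```java
import java.util.function.ToIntBiFunction;

// Hypothetical sketch: invalid argument types yield NaN instead of 0.
final class FuzzyOutcome {

  static Object scoreOrNaN(Object term, Object query, Object lang,
                           ToIntBiFunction<String, String> scorer) {
    if (!(term instanceof String) || !(query instanceof String) || !(lang instanceof String)) {
      return Double.NaN; // error marker, cannot be confused with a real score
    }
    return scorer.applyAsInt((String) term, (String) query);
  }

  // True only for the NaN marker, never for an integer score such as 0.
  static boolean isError(Object result) {
    return result instanceof Double && ((Double) result).isNaN();
  }
}
```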




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133095223
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

How about List Lang codes()?




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133096320
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

If you're ambitious, sure.  If you do implement a function that lists the 
lang codes, I would try to use the same namespace.

Otherwise, just the common langs and/or a URL link works for me.
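
For reference, a function that lists the lang codes could probably be built on the JDK alone. The sketch below is an assumption about one possible approach, not the PR's implementation; `LanguageTags` and `available` are made-up names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: lists the BCP 47 tags of the locales known to this JVM.
final class LanguageTags {

  static List<String> available() {
    return Arrays.stream(Locale.getAvailableLocales())
        .map(Locale::toLanguageTag)
        .filter(tag -> !"und".equals(tag)) // the root locale renders as "und"
        .distinct()
        .sorted()
        .collect(Collectors.toList());
  }
}
```

The exact list depends on the JVM, so a doc string would still want a few fixed examples such as `en` and `es`.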




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-14 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133117091
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

You mean to the doc?




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-15 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133210881
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

Right, just to the doc string that gets printed when I run `?FUZZY_SCORE` 
in the REPL.  




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-15 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133211499
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score 
between two Strings "
+  +
+  "One point is given for every matched character. Subsequent 
matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity 
FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: 
[string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

> I think we want to return 0 for invalid args.

Hey @ottobackwards, why do you like returning 0 for invalid args?  Maybe 
there is some reasoning that I am missing.

To me it seems like I can't distinguish between an error condition and two 
dissimilar strings.  A 0 would be returned for both.
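
To illustrate the ambiguity, here is a plain-Java sketch of the scoring rule as the function description states it: one point per matched character, plus two bonus points when a match directly follows the previous match. `FuzzyScoreSketch` is a hypothetical stand-in written for this comment, not the Commons Text source, so treat it as an approximation of the real behavior.

```java
import java.util.Locale;

// Hypothetical re-implementation of the documented scoring rule, for illustration only.
final class FuzzyScoreSketch {

  static int score(String term, String query, Locale locale) {
    String t = term.toLowerCase(locale);
    String q = query.toLowerCase(locale);
    int score = 0;
    int termIndex = 0;                      // the scan of the term never moves backwards
    int previousMatch = Integer.MIN_VALUE;  // index in the term of the last matched character
    for (int qi = 0; qi < q.length(); qi++) {
      char wanted = q.charAt(qi);
      boolean found = false;
      for (; termIndex < t.length() && !found; termIndex++) {
        if (wanted == t.charAt(termIndex)) {
          score++;                          // one point per matched character
          if (previousMatch + 1 == termIndex) {
            score += 2;                     // bonus for a consecutive match
          }
          previousMatch = termIndex;
          found = true;
        }
      }
    }
    return score;
  }
}
```

Under this rule `score("metron", "metron", Locale.ENGLISH)` gives 16, while `score("metron", "xyz", Locale.ENGLISH)` gives 0, which is the same value the PR currently returns for invalid arguments.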






[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-15 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133247450
  
--- Diff: metron-stellar/stellar-common/README.md ---
@@ -411,6 +412,14 @@ In the core language functions, we support basic 
functional programming primitiv
 * format - string
 * arguments... - object(s)
   * Returns: A formatted string.
+  
+### `FUZZY_SCORE`
+  * Description: Returns the Fuzzy Score which indicates the similarity 
score between two strings. One point is given for every matched character. 
Subsequent matches yield two bonus points. A higher score indicates a higher 
similarity.
+  * Input:
+* string - The full term that should be matched against.
+* string - The query that will be matched against a term.
+* string - The IETF BCP 47 language code to use.
--- End diff --

I did both





[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-15 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133258977
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "GET_AVAILABLE_LANGUAGE_TAGS",
--- End diff --

Do the language tags have any use outside of `FUZZY_SCORE`?

If its only use is for `FUZZY_SCORE`, I was thinking we could use the same 
namespace for both functions.
`FUZZY_SCORE` -> namespace = "FUZZY", name = "SCORE"
`FUZZY_LANGS` -> namespace = "FUZZY", name = "LANGS"

Or any other logical naming scheme that ties them together with a 
namespace.  I mean, we do have this cool namespace feature at our disposal.
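
Roughly what I have in mind, as a self-contained sketch. `StellarSketch` is a stand-in annotation so the example compiles on its own; it models only the two attributes mentioned above, and the assumption that the callable name is the namespace and name joined by an underscore is mine.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in for the real annotation; only namespace and name are modeled.
@Retention(RetentionPolicy.RUNTIME)
@interface StellarSketch {
  String namespace() default "";
  String name();
}

// Hypothetical grouping of the two functions under one namespace.
final class FuzzyFunctions {

  @StellarSketch(namespace = "FUZZY", name = "SCORE")
  static final class Score {}

  @StellarSketch(namespace = "FUZZY", name = "LANGS")
  static final class Langs {}

  // Assumption: the callable name is namespace + "_" + name when a namespace is set.
  static String fullName(Class<?> function) {
    StellarSketch meta = function.getAnnotation(StellarSketch.class);
    return meta.namespace().isEmpty() ? meta.name() : meta.namespace() + "_" + meta.name();
  }
}
```

Both functions then show up side by side as `FUZZY_SCORE` and `FUZZY_LANGS`.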




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-15 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133273527
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "GET_AVAILABLE_LANGUAGE_TAGS",
--- End diff --

I always forget about the undocumented Namespace feature... I sure will do 
that.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-16 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r133621065
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "GET_AVAILABLE_LANGUAGE_TAGS",
--- End diff --

done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-23 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r134747025
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score 
between two Strings "
+  +
+  "One point is given for every matched character. Subsequent 
matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity 
FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: 
[string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

The thinking is that the user is going to write an expression where they 
want the score between two things.  In an error condition, logically the score 
would be 0, and they can still have the correct logical outcome.  It is just 
more forgiving.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-23 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r134782723
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score 
between two Strings "
+  +
+  "One point is given for every matched character. Subsequent 
matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity 
FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: 
[string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

I would think that if I use the wrong lang, I would want to know it's an 
error, rather than mask that error by returning 0.

For example, it might be frustrating to see the following return 0.  
```
FUZZY_SCORE("metron","metron","eng")
```
Why is this returning 0?  
The terms are exactly the same!  
Oh, wait I used the wrong lang code, it should be 'en'.
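
One possible way to surface that mistake, sketched with the JDK only: `LangCheck` and `requireKnownLocale` are hypothetical names, and checking against `Locale.getAvailableLocales()` is my assumption about what "wrong lang" should mean, not something the PR does today.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical check: reject language codes the JVM does not know about.
final class LangCheck {

  static Locale requireKnownLocale(String lang) {
    if (lang == null || lang.isEmpty()) {
      throw new IllegalArgumentException("FUZZY_SCORE expects a non-empty language code");
    }
    Locale candidate = Locale.forLanguageTag(lang);
    // A code like "eng" parses as a tag, but does not match any available locale.
    boolean known = Arrays.asList(Locale.getAvailableLocales()).contains(candidate);
    if (!known) {
      throw new IllegalArgumentException("Unknown language code: " + lang);
    }
    return candidate;
  }
}
```

With a check like this, `'eng'` would raise an error instead of quietly producing a score of 0.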




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-23 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r134790594
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements.  See the NOTICE file distributed with this work for 
additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the 
Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with 
the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
+ * or implied. See the License for the specific language governing 
permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score 
between two Strings "
+  +
+  "One point is given for every matched character. Subsequent 
matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

OK, I'll tighten that up.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135246886
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score between two Strings "
+  +
+  "One point is given for every matched character. Subsequent matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

@nickwallen Check out my latest.  Note that because of the way validate 
works, we need to return 0 for non-string arguments.
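
A minimal sketch of the guard described above, assuming validation may call the function with unresolved (null) or otherwise non-string arguments. `FuzzyScoreSketch` and its `score` helper are hypothetical stand-ins written for illustration; they are not the Metron `FuzzyScoreFunction` or the Commons Text `FuzzyScore` class, and `score` only re-implements the scoring rule quoted in the PR description.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for illustration only; not the Metron implementation.
class FuzzyScoreSketch {

  // Local re-implementation of the rule from the PR description: one point per
  // matched character, plus two bonus points when a match directly follows the
  // previous match in the term.
  static int score(String term, String query, Locale locale) {
    String t = term.toLowerCase(locale);
    String q = query.toLowerCase(locale);
    int score = 0;
    int termIndex = 0;
    int previousMatch = -2; // sentinel: nothing matched yet
    for (int i = 0; i < q.length(); i++) {
      boolean found = false;
      while (termIndex < t.length() && !found) {
        if (q.charAt(i) == t.charAt(termIndex)) {
          score++;
          if (previousMatch + 1 == termIndex) {
            score += 2;
          }
          previousMatch = termIndex;
          found = true;
        }
        termIndex++;
      }
    }
    return score;
  }

  // Assumed shape of the guard: too few arguments is an error, but arguments
  // that are not strings (including null) produce a score of 0.
  static Object apply(List<?> args) {
    if (args.size() < 3) {
      throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
    }
    Object term = args.get(0);
    Object query = args.get(1);
    Object lang = args.get(2);
    // instanceof is false for null, so unresolved variables also yield 0.
    if (!(term instanceof String) || !(query instanceof String) || !(lang instanceof String)) {
      return 0;
    }
    Locale locale = Locale.forLanguageTag((String) lang);
    return score((String) term, (String) query, locale);
  }
}
```

With these stand-ins, `apply` on `["metron", "metron", "en"]` yields 16, matching the example in the PR description, while a null or numeric first argument yields 0.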




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135262542
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
--- End diff --

Some of these are unused imports.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135256040
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,114 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
--- End diff --

Some of these imports are unused; it would be nice to clean them up.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135262378
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test(expected = ParseException.class)
+  public void testMissingLanguage() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',klingon)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstArg() throws Exception {
+runPredicate("0 == FUZZY_SCORE(empty,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v), v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstTwoArgs() throws Exception {
+runPredicate("0 == FUZZY_SCORE(empty,empty,english)",
--- End diff --

This test will never fail.
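
A sketch of why a test of this shape cannot fail, assuming (as the flagged tests suggest) that `runPredicate` returns the predicate's result as a boolean rather than throwing when it is false. The `runPredicate` below is a hypothetical stand-in that just compares two integers; it is not the Stellar helper.

```java
// Hypothetical stand-ins for illustration; not the Metron test utilities.
class NeverFailingTestSketch {

  // Stand-in predicate runner: reports the outcome as a return value and
  // never throws, which is the assumption this sketch rests on.
  static boolean runPredicate(int expected, int actual) {
    return expected == actual;
  }

  // Shape of the flagged tests: the boolean result is discarded, so the
  // method completes normally even when the predicate is false.
  static void uncheckedTest(int expected, int actual) {
    runPredicate(expected, actual);
  }

  // Shape with the result asserted: a false predicate now fails the test.
  static void checkedTest(int expected, int actual) {
    if (!runPredicate(expected, actual)) {
      throw new AssertionError("predicate evaluated to false");
    }
  }

  // Helper: true if the given test body completes without an AssertionError.
  static boolean passes(Runnable test) {
    try {
      test.run();
      return true;
    } catch (AssertionError e) {
      return false;
    }
  }
}
```

In JUnit, the checked form would typically be written by wrapping the call in `Assert.assertTrue(...)`.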




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135262450
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
--- End diff --

This test will never fail.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135262339
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test(expected = ParseException.class)
+  public void testMissingLanguage() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',klingon)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstArg() throws Exception {
--- End diff --

This test will never fail




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135263000
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.List;
+import java.util.Locale;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.text.similarity.FuzzyScore;
+import org.apache.metron.stellar.dsl.BaseStellarFunction;
+import org.apache.metron.stellar.dsl.Stellar;
+
+public class TextFunctions {
+
+  @Stellar(name = "FUZZY_SCORE",
+  description =
+  "Returns the Fuzzy Score which indicates the similarity score between two Strings "
+  +
+  "One point is given for every matched character. Subsequent matches yield two bonus "
+  +
+  "points. A higher score indicates a higher similarity",
+  params = {
+  "string - The full term that should be matched against",
+  "string - The query that will be matched against a term",
+  "string - The IETF BCP 47 language code to use"
+  },
+  returns = "integer representing the score")
+  /**
+   * FuzzyScoreFunction exposes the Apache Commons Text Similarity FuzzyScore through
+   * Stellar.
+   */
+  public static class FuzzyScoreFunction extends BaseStellarFunction {
+
+@Override
+public Object apply(List list) {
+  if (list.size() < 3) {
+throw new IllegalStateException("FUZZY_SCORE expects three args: [string, string, string]");
+  }
+  String term = (String) list.get(0);
--- End diff --

Thanks @ottobackwards. That change looks good.




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135336818
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
--- End diff --

done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135336803
  
--- Diff: 
metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
 ---
@@ -0,0 +1,114 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.LinkedList;
--- End diff --

done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135336880
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
--- End diff --

sorry, I lost the asserts refactoring, done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135336896
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test(expected = ParseException.class)
+  public void testMissingLanguage() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',klingon)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstArg() throws Exception {
--- End diff --

sorry, I lost the asserts refactoring, done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-25 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:

https://github.com/apache/metron/pull/667#discussion_r135336905
  
--- Diff: 
metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/TextFunctionsTest.java
 ---
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements.  See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership.  The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with the License.  You may obtain
+ * a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.metron.stellar.dsl.functions;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import org.apache.metron.stellar.dsl.DefaultVariableResolver;
+import org.apache.metron.stellar.dsl.ParseException;
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.run;
+import static org.apache.metron.stellar.common.utils.StellarProcessorUtils.runPredicate;
+
+public class TextFunctionsTest {
+
+  static final Map variableMap = new HashMap() {{
+put("metron", "metron");
+put("sentence", "metron is great");
+put("empty", "");
+put("english", "en");
+put("klingon", "Kling");
+put("asf", "Apache Software Foundation");
+  }};
+
+  @Test
+  public void testGetAvailableLanguageTags() {
+Object ret = run("FUZZY_LANGS()", new HashMap<>());
+Assert.assertNotNull(ret);
+Assert.assertTrue(ret instanceof List);
+List tags = (List) ret;
+Assert.assertTrue(tags.size() > 0);
+Assert.assertTrue(tags.contains("en"));
+Assert.assertTrue(tags.contains("fr"));
+  }
+
+  @Test()
+  public void testNoMatchStrings() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test(expected = ParseException.class)
+  public void testMissingLanguage() throws Exception {
+runPredicate("0 == FUZZY_SCORE(metron,'z',klingon)",
+new DefaultVariableResolver(v -> variableMap.get(v),
+v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstArg() throws Exception {
+runPredicate("0 == FUZZY_SCORE(empty,'z',english)",
+new DefaultVariableResolver(v -> variableMap.get(v), v -> variableMap.containsKey(v)));
+  }
+
+  @Test()
+  public void testEmptyFirstTwoArgs() throws Exception {
+runPredicate("0 == FUZZY_SCORE(empty,empty,english)",
--- End diff --

sorry, I lost the asserts refactoring, done




[GitHub] metron pull request #667: METRON-1061 Add FUZZY_SCORE STELLAR function

2017-08-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/667

