GitHub user cestella opened a pull request:
https://github.com/apache/metron/pull/878
METRON-1377: Stellar function to generate typosquatted domains (similar to
dnstwist)
## Contributor Comments
Detecting [typosquatting](https://en.wikipedia.org/wiki/Typosquatting)
requires generating the typosquatted variants of a domain. This PR adds a
Stellar function, `DOMAIN_TYPOSQUAT`, that replicates the functionality of
dnstwist.
You can validate this in the REPL via:
```
{17:10}[system]~/Documents/workspace/metron/fork/incubator-metron:typosquat
$ mvn exec:java -Dexec.mainClass="org.apache.metron.stellar.common.shell.StellarShell" -pl metron-platform/metron-common
[INFO] Scanning for projects...
[INFO]
[INFO]
------------------------------------------------------------------------
[INFO] Building metron-common 0.4.2
[INFO]
------------------------------------------------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.5.0:java (default-cli) @ metron-common ---
log4j:WARN No appenders could be found for logger
(org.apache.metron.stellar.dsl.functions.resolver.BaseFunctionResolver).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
Stellar, Go!
Please note that functions are loading lazily in the background and will be
unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>>
[Stellar]>>> filter := REDUCE( DOMAIN_TYPOSQUAT( 'amazon' ), (s, d) -> BLOOM_ADD(s, d), BLOOM_INIT())
[Stellar]>>> BLOOM_EXISTS( filter, 'amazon')
true
[Stellar]>>> BLOOM_EXISTS( filter, 'google')
false
[Stellar]>>> BLOOM_EXISTS( filter, 'amazoon')
true
[Stellar]>>>
```
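For readers unfamiliar with dnstwist, the following Python sketch illustrates the kind of variants such a function produces. It is not the implementation in this PR: the helper name `typosquat_candidates` is hypothetical, only four strategies are shown (omission, repetition, transposition, insertion), and a plain set stands in for the Bloom filter used in the REPL session above.

```python
import string

def typosquat_candidates(domain):
    """Generate a few dnstwist-style variants of a bare domain label.

    Hypothetical helper for illustration; not the PR's Java implementation.
    """
    variants = set()
    # Omission: drop one character ("amazon" -> "amzon").
    for i in range(len(domain)):
        variants.add(domain[:i] + domain[i + 1:])
    # Repetition: double one character ("amazon" -> "amazoon").
    for i in range(len(domain)):
        variants.add(domain[:i] + domain[i] + domain[i:])
    # Transposition: swap adjacent characters ("amazon" -> "amazno").
    for i in range(len(domain) - 1):
        variants.add(domain[:i] + domain[i + 1] + domain[i] + domain[i + 2:])
    # Insertion: add one lowercase letter at each position.
    for i in range(len(domain) + 1):
        for c in string.ascii_lowercase:
            variants.add(domain[:i] + c + domain[i:])
    # This sketch excludes the original label and the empty string.
    variants.discard(domain)
    variants.discard("")
    return variants

candidates = typosquat_candidates("amazon")
print("amazoon" in candidates)  # True
print("google" in candidates)   # False
```

Unlike a Bloom filter, the set gives exact membership answers, at the cost of storing every variant.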
Note: By itself, this function is of some interest, but it is not a complete
solution. As a follow-on, I suggest two JIRAs:
1. the ability through a new mode for the flat-file loader to write out
serialized objects (e.g. a bloom filter containing all the typosquatted domains
for a CSV of domains)
2. the ability to take a serialized object from HDFS, load it into
memory, and return it (e.g. `OBJECT_GET(path)`, with a cache in front of it)
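The two proposals can be sketched in Python as a write/read pair. This is only an illustration under stated assumptions: `write_object` and `object_get` are hypothetical names, the local filesystem stands in for HDFS, `pickle` stands in for whatever serialization the loader would use, and `functools.lru_cache` plays the role of the cache in front of the load.

```python
import os
import pickle
import tempfile
from functools import lru_cache

def write_object(obj, path):
    """Serialize an object to a file (stand-in for the proposed loader mode)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

@lru_cache(maxsize=16)
def object_get(path):
    """Load and deserialize the object at `path`; repeated calls hit the cache."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Write a small set of variants, then read it back twice.
path = os.path.join(tempfile.mkdtemp(), "typosquat.ser")
write_object({"amazoon", "amzon"}, path)
known_bad = object_get(path)
print("amazoon" in known_bad)         # True
print(object_get(path) is known_bad)  # True: the second call is served from the cache
```

Keying the cache on the path means a replaced file is not picked up until the entry is evicted, so a real implementation would likely need an expiry policy.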
Together with the Stellar function from this PR, these would let us
detect typosquatted domains scalably during the enrichment phase:
1. with the flat file loader, generate a bloom filter containing the
typosquatted domains from the set of known good domains
2. upload to HDFS
3. As an enrichment:
```
is_typosquatted :=
BLOOM_EXISTS(OBJECT_GET('/apps/metron/typosquat/alexa1m.ser'), domain)
```
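The membership test at the heart of this workflow can be sketched with a toy Bloom filter in Python. This is an assumption-laden illustration, not Metron's `BLOOM_*` implementation: the `BloomFilter` class, its sizing defaults, and the hand-picked variant list are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits          # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Step 1: build the filter from typosquatted variants of known-good domains.
typosquat_filter = BloomFilter()
for variant in ["amazoon", "amzon", "gogle"]:
    typosquat_filter.add(variant)

# Step 3: the enrichment reduces to a membership test on each observed domain.
print("amazoon" in typosquat_filter)  # True: Bloom filters have no false negatives
print("apache" in typosquat_filter)   # False, barring an unlikely false positive
```

The filter trades exactness for size: a lookup can report a domain that was never added, but it never misses one that was, which is the acceptable direction of error for flagging suspicious domains.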
## Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.
Please refer also to our [Build Verification
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow
these guidelines and ask you to double check the following:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
```
mvn -q clean integration-test install && build_utils/verify_licenses.sh
```
- [x] Have you written or updated unit tests and/or integration tests to
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
### For documentation related changes:
- [x] Have you ensured that the format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not, then run
the following commands and then verify the changes via
`site-book/target/site/index.html`:
```
cd site-book
mvn site
```
#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up
for your personal repository such that your branches are built there before
submitting a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron typosquat
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/878.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #878
----
commit a95014ed1e145f9133dd95dcbfbf7e9212401fef
Author: cstella <cestella@...>
Date: 2017-12-19T22:26:03Z
METRON-1377: Stellar function to generate typosquatted domains (similar to
dnstwist)
----