[ 
https://issues.apache.org/jira/browse/METRON-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978720#comment-15978720
 ] 

ASF GitHub Bot commented on METRON-870:
---------------------------------------

GitHub user cestella reopened a pull request:

    https://github.com/apache/incubator-metron/pull/541

    METRON-870: Add filtering by packet payload to the pcap query

    ## Contributor Comments
    Currently we have the ability to filter packets in the pcap query tool by 
header information (src/dest ip/port). We should be able to filter by binary 
regex on the packets themselves.
    
    Probably the state of the art and the goal to get to here is integration 
with [Yara](https://virustotal.github.io/yara/), but I'd like to iterate toward 
that solution for a couple of reasons:
    * Yara is hard to integrate with in our stack.
      * It's C and, while the [yara-java](https://github.com/p8a/yara-java) 
project does exist, it would make the build a bit of a pain and no longer 
platform agnostic (i.e. you'd have to build certain modules against the 
machines that you're running in the cluster).  There are paths through that for 
sure, but it's more than I wanted to tackle just now.
    * The core abstraction for the obvious integration, yara-java, is running 
yara over a file, not a byte array.  This would necessitate taking the 
performance penalty with JNI AND writing out every packet to a temporary file, 
then deleting it, in the MR job.  I did not deem that a sensible approach.
    * Yara is a whole language, similar to Stellar.  The point of integration 
would be as a proper `org.apache.metron.pcap.filter.PcapFilter`, not as a 
portion of an existing one.
    
    That led me to look for a stop-gap that was simpler and had the following 
characteristics:
    * Worked within Java easily
    * Was permissively licensed
    * Functioned on byte arrays
    * Could do both hex regex as well as interpreting the byte array as a 
string (similar to Yara)
    
    [byteseek](https://github.com/nishihatapalmer/byteseek) (an all-Java 
regex library that functions on byte arrays, not files) fit the bill without 
taking on all of a full-on Yara integration and fit within our core abstractions 
better.  
    
    As such, the approach that I took is to provide the capability in both of 
the packet filters that we currently have in place:
    * Fixed via a new command line option `--packet_filter` or `-pf` wherein 
you pass the binary regex.
      * This would restrict to a single pattern
    * Query via a new Stellar function `BYTEARRAY_MATCHER(pattern, packet)`
      * This allows you to compose multiple filters with logic operations to 
get closer to a Yara-esque feel via Stellar
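    
    To make the idea of matching on packet bytes concrete, here is a minimal 
    sketch in plain Java. It is not the code in this PR and does not use the 
    byteseek API; the class name `PayloadMatchSketch` and the helpers 
    `fromHex` and `contains` are hypothetical, and it does a naive literal 
    search rather than a real binary regex. It only shows a pattern given 
    either as hex or as a string being checked against a byte array, with 
    several checks combined by ordinary boolean logic.
    
    ```java
    import java.nio.charset.StandardCharsets;

    public class PayloadMatchSketch {

        // Hypothetical helper: decode a hex string such as "2f6574632f" into raw bytes.
        static byte[] fromHex(String hex) {
            if (hex.length() % 2 != 0) {
                throw new IllegalArgumentException("hex string must have an even length");
            }
            byte[] out = new byte[hex.length() / 2];
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            return out;
        }

        // Naive literal search: true if needle occurs anywhere in packet.
        // A real matcher would use a faster search algorithm and support regex syntax.
        static boolean contains(byte[] packet, byte[] needle) {
            if (needle.length == 0) {
                return true;
            }
            for (int start = 0; start + needle.length <= packet.length; start++) {
                int i = 0;
                while (i < needle.length && packet[start + i] == needle[i]) {
                    i++;
                }
                if (i == needle.length) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            // Stand-in for a packet payload; a real packet would come from the pcap data.
            byte[] packet = "GET /etc/passwd HTTP/1.1".getBytes(StandardCharsets.US_ASCII);

            // Pattern given as hex (the bytes of "/etc/") and as a plain string.
            boolean hexHit = contains(packet, fromHex("2f6574632f"));
            boolean strHit = contains(packet, "passwd".getBytes(StandardCharsets.US_ASCII));

            // Several checks combined with ordinary boolean logic.
            System.out.println(hexHit && strHit);
        }
    }
    ```
    
    Working directly on `byte[]` avoids writing each packet out to a 
    temporary file, which is the property the stop-gap was chosen for.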
    
    I have made a follow-on task to integrate with Yara at 
[METRON-871](https://issues.apache.org/jira/browse/METRON-871).
    
    Testing plan will be in the comments.
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron (Incubating).  
    Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    
    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
      ```
      mvn -q clean integration-test install && build_utils/verify_licenses.sh 
      ```
    
    - [ ] Have you written or updated unit tests and/or integration tests to 
verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify the changes via 
`site-book/target/site/index.html`:
    
      ```
      cd site-book
      bin/generate-md.sh
      mvn site:site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-870

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #541
    
----
commit 69fa1b471567731e7024ba63bcb614efbcb83308
Author: cstella <ceste...@gmail.com>
Date:   2017-04-21T01:39:51Z

    METRON-870: Add filtering by packet payload to the pcap query

commit 6fe15eb52e307c10898e6391cd5e02bca344120d
Author: cstella <ceste...@gmail.com>
Date:   2017-04-21T02:48:02Z

    Remove LGPL dependency and replace with something more permissive.

commit 4683bbfbb6a36a7f02b0133003f4afb424de2d43
Author: cstella <ceste...@gmail.com>
Date:   2017-04-21T03:42:50Z

    Wrong mahout math version.

commit 31031481e8f0c29d9bd72f9a1217d87858ff1872
Author: cstella <ceste...@gmail.com>
Date:   2017-04-21T04:01:27Z

    Didn't need the mahout-math bit after all.

commit c50a50d230ae1d71a7a512fc199e26264b17ca60
Author: cstella <ceste...@gmail.com>
Date:   2017-04-21T13:00:59Z

    Changed to a faster bytearray searcher.

----


> Add filtering by packet payload to the pcap query
> -------------------------------------------------
>
>                 Key: METRON-870
>                 URL: https://issues.apache.org/jira/browse/METRON-870
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>
> Currently we have the ability to filter packets in the pcap query tool by 
> header information (src/dest ip/port).  We should be able to filter by binary 
> regex on the packets themselves. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
