Re: [CALL FOR TEST DATA] Request help identifying public domain or opensource test data sets for Metron testing

Matt Foley Thu, 04 May 2017 15:28:33 -0700

Hi Dima,
In terms of process, I’m not aware of any changes to the below.  In response to 
your specific questions:


1 and 2. The individual CLA is always required, as it establishes your 
authority to make your contributions.  It is up to you to determine 
whether the corporate CLA is needed, but if your employer has ownership of the 
code you write, then you’ll want to get the CCLA signed for your own protection 
as well as Apache’s.  The following Apache document section says everything 
better than I could:
https://www.apache.org/dev/new-committers-guide.html#cla

BTW, this is in the context of a new committer signing the ICLA, but a 
non-committer contributor contemplating a significant submission can and should 
also sign the ICLA.  You just won’t get an apache email id yet :-)

3. After you sign up on Github and Apache Jira, drop an email to Casey asking 
him to give you karma on Jira so you can self-assign tickets and change their 
status.

5. Within reason.  You don’t need separate jiras for the little pieces.  But 
probably two per sensor (for parser and test code generator) would make sense.  
Basically you should try to split the work into independent reviewable chunks, 
and have a jira for each chunk.  Several small PRs are a lot easier to review 
and test than one huge one.

6. The email isn’t really a requirement, it’s more to smooth the way.  Think of 
it as an opportunity to send a cover letter to precede your PR.  

Expectations?  The most common response is encouragement.  Which is good, 
because you’re trying to recruit interest.  You’ll need people to put in 
significant time to review your contributions when you make the PR.  If you ask 
questions, advice and suggestions are usually readily offered.

If you were suggesting architecture changes, you’d definitely want to discuss 
them in the email list *before* doing the work to implement them, because the 
PMC members have to approve any architecture changes, and you wouldn’t want to 
get a rejection in the PR after doing all that work.  There can be some 
controversy about architecture issues, but such discussions should be driven 
solely by technical merit, and stay professional and friendly in tone.  Issues 
of maintainability, usability, testability, and performance are fair game, as 
well as features.  Consistency with existing architecture is encouraged.

But you’re talking about parsers, which are pretty well a plug-in model with a 
standard interface you shouldn’t need to change.  So the email is just a “hey 
remember those parsers we talked about?  They’re coming shortly” message.  If 
you have architecture concerns, or want to clarify anything before doing the 
submission, by all means bring them up too.

8. How to create a pull request:

Make a fork of Metron in github, if you haven’t already, and create a branch 
named METRON-XXXX (the jira number your PR will address).  Make sure the branch 
is updated to current Apache master, then merge in your work (for that Jira 
only), commit, and push to your github fork.  Now browse to your fork in 
Github, and select the METRON-XXXX branch, then select the “Pull requests” tab 
at the top of the page.  On this page there’s a big button labeled “New pull 
request”.  Click it, and adjust:
*    base fork: apache/metron
*    base: master
*    head fork: <your-github-name>/metron
*    compare: METRON-XXXX
From here it should be self-explanatory.  It will construct the PR and ask you 
to fill in a template.  You can see the diffs that reviewers will see.  When 
you finalize the PR, it will automatically be published to the dev@ mailing 
list.
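
The branch-and-push steps above can be sketched as a short script.  This is 
only an illustration under stated assumptions, not an official Metron tool: 
the function names, the remote name "upstream", the GitHub user "someuser", 
and the Jira id METRON-1234 are all hypothetical placeholders.  It only 
builds and prints the git commands, so you can review them before running 
anything.

```python
import re
from typing import List

# Assumption: the Apache repository lives at apache/metron on GitHub,
# as named in the "base fork" setting above.
UPSTREAM_URL = "https://github.com/apache/metron.git"


def pr_branch_commands(github_user: str, jira_id: str) -> List[str]:
    """Return, in order, the git commands that prepare a PR branch.

    github_user and jira_id are placeholders supplied by the caller.
    """
    if not re.fullmatch(r"METRON-\d+", jira_id):
        raise ValueError(f"expected an id like METRON-1234, got {jira_id!r}")
    fork_url = f"https://github.com/{github_user}/metron.git"
    return [
        f"git clone {fork_url}",
        "cd metron",
        # "upstream" is just a conventional remote name, not a requirement.
        f"git remote add upstream {UPSTREAM_URL}",
        "git fetch upstream",
        # Start the branch from current Apache master.
        f"git checkout -b {jira_id} upstream/master",
        # ... add your work for this Jira only, then stage and commit it ...
        "git add -A",
        f'git commit -m "{jira_id}: <short summary>"',
        # Push the branch to your own fork, not to Apache.
        f"git push origin {jira_id}",
    ]


def compare_url(github_user: str, jira_id: str) -> str:
    """Build the GitHub compare URL: base apache/metron master, head your branch."""
    return (
        "https://github.com/apache/metron/compare/"
        f"master...{github_user}:{jira_id}"
    )


if __name__ == "__main__":
    for command in pr_branch_commands("someuser", "METRON-1234"):
        print(command)
    print(compare_url("someuser", "METRON-1234"))
```

Opening the printed compare URL in a browser should land on the same "New 
pull request" page with the base and head already filled in, which is a 
shortcut for the manual adjustments listed above.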

Hope this helps,
--Matt

 

On 5/4/17, 2:43 AM, "Dima Kovalyov" <[email protected]> wrote:

    Hello Matt,
    
    It has been a long time, and we would like to continue working in this 
direction. Thank you for the response.
    
    I wanted to ask whether anything has changed since our last discussion 
about contributing parsers, enrichments and generators. Is there anything else 
we should be doing other than:
    1. Sign the Corporate CLA with Apache 
(link<https://www.apache.org/licenses/#clas>).
    2. Sign an Individual CLA for the submitter 
(instructions<https://www.apache.org/licenses/#clas>). Do I need to do that 
despite #1?
    3. Register on Apache GitHub and JIRA.
    4. Open JIRA master ticket for submissions from SSTECH.
    5. Create sub-task for each piece of code we are going to submit.
    6. Send email to [email protected]<mailto:[email protected]> 
describing proposed changes including JIRA case. What to expect from email? 
Approval or suggestions?
    7. Fork Apache Metron master branch internally, merge our changes and test 
them using single-node vagrant.
    8. Create Pull Request (PR), how?
    9. Wait for the dev team to review, accept changes and answer any questions 
or suggestions.
    
    The above applies to code that:
    1. Is written and tested.
    2. Is covered by unit tests.
    3. Can be built using Maven.
    4. Has a place in the Apache Metron folder tree.
    
    - Dima
    
    
    On 10/08/2016 06:43 AM, Matt Foley wrote:
    Hi Dima,
    Sorry this is getting a little long, but TL;DR on 
Metron+Development+Environment+Setup+Instructions<https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions>
 is:
    
    A. Open a Jira for the work you want to do, or the contribution you want to 
make.  Since you have several parsers, you might open an umbrella Jira, with 
four subtask jiras, each of which includes the parser and test data generator 
for one of the four technologies you mentioned.
    B. Send an email to the dev list proposing what you want to submit, and 
referencing the Jira.
    C. Fork the Apache Metron code base in your personal github area.
    D. Make sure your contribution works correctly with the latest master 
branch code.
    E. Decide where in the code tree your contribution would fit best.  The 
parsers themselves would of course go under metron-platform/metron-parsers/.  
The data generators could reasonably be put in the test/ subdirectory, perhaps 
under metron-platform/metron-parsers/src/test/java/org/apache/metron/writers 
(although we would defer to the reviewers).
    F. Add the necessary maven glue so the new pieces build along with the core.
    G. Metron requires all submissions to have unit tests with thorough 
coverage, so add those if they aren’t there yet.
    H. When things are ready to submit, commit everything to your github, and 
create a Pull Request (PR)
    I. Watch the PR and Jira for responses.  Respond to questions, accept 
feedback or suggest alternative solutions, and work through the process with 
the community.  If things need lengthy discussion, you may be asked to do so in 
the dev list.
    J. With patience, all issues will be agreed on, and the contribution will 
be accepted into Metron, for the benefit of the whole community.
    
    Hope this helps.  Feel free to contact me directly, or just ask questions 
on the dev list.
    Best regards,
    —Matt
    
    
    On Oct 7, 2016, at 6:05 PM, Matt Foley 
<[email protected]<mailto:[email protected]>> wrote:
    
    Dima, that’s great!
    
    Since you’re talking about a code contribution (or several :-), let’s move 
the discussion over to the 
[email protected]<mailto:[email protected]> list, 
after this response.  Briefly, here’s how you submit a contribution.
    
    First the housekeeping:
    1. If Sstech has not yet signed a Corporate CLA with Apache, please ask 
them to do so (instructions<https://www.apache.org/licenses/#clas>)
    2. If you, or a colleague who will submit the contributions, has not yet 
signed an Individual CLA, please do so 
(instructions<https://www.apache.org/licenses/#clas>)
    
    Since you’ve been successfully writing Metron parsers, you almost certainly 
have already done the following, but I’ll mention them here for the sake of 
other readers:
    3. If you’re not on the dev mailing list, please join it 
(instructions<https://cwiki.apache.org/confluence/display/METRON/Community+Resources>)
    4. If you weren’t a registered user of Apache’s Jira, you would request to 
be added, but I see you already are, so that’s good.
    5. If you don’t yet have an account on Github.com<http://github.com/>, sign 
up for one (the free level is fine).
    6. Set up a Metron Development Environment, and establish the ability to 
spin up a single-node test environment 
(instructions<https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions>)
    
    To actually make the contribution, you follow the process shown in:
    
https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions
    
    I’ll go into more detail in a direct email.
    Thanks a lot for being interested in submitting these!
    
    Cheers,
    —Matt
    
    ________________________________
    From: Dima Kovalyov 
<[email protected]<mailto:[email protected]>>
    Sent: Friday, October 07, 2016 4:44 PM
    To: 
[email protected]<mailto:[email protected]>; 
Satish Abburi
    Subject: Re: [CALL FOR TEST DATA] Request help identifying public domain or 
opensource test data sets for Metron testing
    
    Hello Matt,
    
    We (Sstech team) currently have parsers and data generators for BlueCoat, 
Unix, MS Exchange, MS Windows and we would gladly contribute them.
    
    Can you please share the procedure for submitting these pieces?
    Thank you.
    
    - Dima
    
    On 10/08/2016 01:49 AM, Matt Foley wrote:
    Hi all,
    Enhanced testing of Metron, especially performance testing, would be aided 
by having data sets of realistic size, that exercise one or more of the various 
parts of Metron:
    
      *   each Parser (bro, yaf, snort, squid, ...)
      *   each Enhancer (geo, user, assets, ...)
      *   each Threat Intel module (Soltra, HailATaxi, ...)
    
    Data sets must meet the following criteria:
    
      *   opensource or public domain
      *   suitably scrubbed, containing no Personally Identifiable Information
      *   unencumbered by company sensitivity, security, or IP concerns.
    
    They may take the form of raw PCAP streams, or they may be already parsed 
or otherwise pre-processed.
    
    If you know of opensource or public domain data sets of this kind, please 
respond with the URL, in this email thread or to the Jira ticket 
METRON-491<https://issues.apache.org/jira/browse/METRON-491>.
    
    If you have an appropriate data set that your company would be willing to 
contribute, please also respond and we will help in any way we can.
    
    
    Thanks,
    --Matt
