Reorganize Apache Links on Spot Asf-site

2020-07-27 Thread Tadd Wood
After reviewing how the Whimsy report website crawler script works, it
appears that it only looks at the homepage for the various Apache
links.  PR-158 was a good step forward, but the crawler isn’t finding the
links we’ve displayed.  We should discuss moving the links up into their
own tab inside the Navbar, which should take care of this problem since
the Navbar code is present on the index.html pages.  Several other
projects have organized their Apache links this way (Livy, for example).
Thoughts?

Thank you,
Tadd Wood
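
A minimal sketch of the homepage-only check described above, assuming the
crawler is satisfied by any href on the page that contains the expected
Apache URL. The REQUIRED_LINKS table and the missing_links helper are
illustrative names, not part of Whimsy, and the authoritative list of links
is whatever the Whimsy report shows for the podling.

```python
from html.parser import HTMLParser

# Assumed link targets for illustration; check the Whimsy report for the
# real list before relying on this.
REQUIRED_LINKS = {
    "License": "apache.org/licenses",
    "Security": "apache.org/security",
    "Sponsorship": "apache.org/foundation/sponsorship",
    "Thanks": "apache.org/foundation/thanks",
}


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def missing_links(html, required=REQUIRED_LINKS):
    """Return the labels whose URL fragment appears in no href on the page."""
    collector = HrefCollector()
    collector.feed(html)
    return sorted(
        label
        for label, fragment in required.items()
        if not any(fragment in href for href in collector.hrefs)
    )


if __name__ == "__main__":
    homepage = """
    <nav>
      <a href="https://www.apache.org/licenses/">License</a>
      <a href="https://www.apache.org/security/">Security</a>
    </nav>
    """
    print(missing_links(homepage))
```

Running this against the generated index.html before pushing to asf-site
would show whether links placed in the Navbar are visible on the homepage.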


Re: Completing clean-up on migrated blog-posts on 'asf-site' branch

2020-07-10 Thread Tadd Wood
+1. Jeremy, I’ve noticed the same.  I’m OK with moving forward with the
clean-up.

Thank you,
Tadd Wood

On Jul 10, 2020, at 10:42 AM, Jeremy Nelson 
wrote:

Everyone,

I noticed that in the ONI days, there were some blog posts made with
WordPress.  These blog posts live in top-level directories (such as
open-network-insight-3-most-asked-questions).

I noticed that when ONI was made Apache Spot, these things occurred:

(1) The top level directories were deep copied and were edited from
ONI to Apache Spot, creating additional top level directories such as
"apache-spot-3-most-asked-questions"

(2) All of these blog posts were deep copied underneath /blog/ when it
was necessary to migrate the Apache Spot website to the Apache format.

(3) The copied blog posts under /blog/* were edited to point at all
the new apache spot web links.

What *didn't* happen is that neither the original blog posts (1) nor the
non-migrated deep copies (2) were cleaned up.

I have verified these things:

(1) Nothing points at these top level blog post directories.
Everything points at /blog/* now.
(2) These top level blog post directories DO NOT POINT at Apache Spot
-- they still point at ONI
(3) Many of these directories contain the only links to files which
have been migrated away from /doc or /wp-content/ into /library

I am using some tooling to look for duplicate files and dead links, and I
believe that by removing these abandoned/orphaned directories, I can more
accurately capture what is stale and needs cleanup.

Does anyone have any concerns with removing directories that have no
inbound links, which do not link to apache spot, and which have been
successfully copied/migrated to another place?

Thanks,
Jeremy
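
The "no inbound links" check described above can be sketched as follows.
This is an illustration under stated assumptions, not the tooling used in
the thread: it only considers site-relative links written as href="/...",
ignores single-quoted and page-relative links, and the helper name
orphaned_top_level_dirs is hypothetical. It does not cover the other two
criteria (still pointing at ONI, already migrated elsewhere).

```python
import re
from pathlib import Path

# Capture the path part of double-quoted hrefs, without fragment or query.
HREF_RE = re.compile(r'href="([^"#?]+)', re.IGNORECASE)


def orphaned_top_level_dirs(site_root):
    """Return top-level directories that no HTML file outside them links to."""
    root = Path(site_root)
    top_dirs = {p.name for p in root.iterdir() if p.is_dir()}
    referenced = set()
    for page in root.rglob("*.html"):
        # First path component: the top-level directory that owns this page
        # (or the file name itself for pages at the site root).
        owner = page.relative_to(root).parts[0]
        for href in HREF_RE.findall(page.read_text(errors="ignore")):
            if href.startswith("/") and not href.startswith("//"):
                target = href.strip("/").split("/")[0]
                # A directory linking to itself does not count as inbound.
                if target in top_dirs and target != owner:
                    referenced.add(target)
    return sorted(top_dirs - referenced)


if __name__ == "__main__":
    import sys

    for name in orphaned_top_level_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```

Directories reported here are only candidates for removal; each still needs
the manual checks listed in the message above.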


Re: SPOT-286 and perhaps unnecessary optimizations

2020-05-07 Thread Tadd Wood
+1. Jeremy, I think we should move forward with the change and revisit it
later if we find that performance degrades too drastically for IPv6 datasets.

Thank you,
Tadd Wood

On Wed, May 6, 2020 at 12:10 PM Jeremy Nelson 
wrote:

> Greetings,
>
> SPOT-286 points out that in
> scala-ml/src/main/scala/org/apache/spot/*/model/*.scala, there are
> calls to org.apache.spark.sql.functions.broadcast(), which is a
> performance optimization hint to the Spark engine. The ticket
> identified at least one scenario where this hint causes a crash.
>
> There is no harm to correctness from removing this optimization, and
> it fixes the crash. I recommend we remove this optimization until the
> matter is re-assessed later.
>
> Jeremy
>


[SPOT-266] ml_ops.sh script should use Spark 2?

2020-03-20 Thread Tadd Wood
I notice there’s a dependency on Spark 2.1+ that didn’t get resolved per
this JIRA (https://issues.apache.org/jira/projects/SPOT/issues/SPOT-266). I
propose adding a check in this script to verify the user’s version of Spark;
if it is older than 2.1, the script will exit and tell the user to upgrade
Spark. We also need to replace spark-submit with spark-submit2. Any thoughts
for/against updating this script?

Also would anyone be willing to update the documentation on the Spot
website to reflect the correct required version of Spark to run Spot?

Thank you,
Tadd Wood
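
The proposed version gate can be sketched as below. This is a sketch under
assumptions, not the change to ml_ops.sh itself: it assumes the text printed
by the Spark launcher's --version option contains "version X.Y.Z", and the
helper names parse_spark_version and meets_minimum are hypothetical.

```python
import re


def parse_spark_version(version_output):
    """Extract (major, minor) from text such as 'version 2.1.0'."""
    match = re.search(r"version\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no Spark version found in output")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_output, minimum=(2, 1)):
    """True when the reported Spark version is at least `minimum`."""
    return parse_spark_version(version_output) >= minimum


if __name__ == "__main__":
    # Sample text standing in for the launcher's real output.
    sample = "Welcome to Spark version 1.6.3"
    if not meets_minimum(sample):
        raise SystemExit("Spark 2.1 or newer is required; please upgrade Spark.")
```

Comparing (major, minor) tuples avoids the string-comparison trap where
"2.10" would sort before "2.9".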


Re: Podling Spot Report Reminder - March 2020

2020-03-04 Thread Tadd Wood
Podling report should be up now.  

Thank you,
Tadd Wood

> On Mar 3, 2020, at 9:42 AM, Uma gangumalla  wrote:
> 
> Great. Thanks for the update, Jeremy !!!
> 
> Regards,
> Uma
> 
> On Tue, Mar 3, 2020 at 9:39 AM Jeremy Nelson 
> wrote:
> 
>> I had talked with Tadd about the report.  I believe he is working on it.
>> Everything seemed on schedule to be ready by the deadline.
>> 
>> Jeremy
>> 
>> On Tue, Mar 3, 2020 at 11:36 AM Uma gangumalla 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> Is someone working on this report?
>>> 
>>> Regards,
>>> Uma
>>> 
>>> On Tue, Feb 25, 2020 at 3:26 PM  wrote:
>>> 
>>>> Dear podling,
>>>> 
>>>> This email was sent by an automated system on behalf of the Apache
>>>> Incubator PMC. It is an initial reminder to give you plenty of time to
>>>> prepare your quarterly board report.
>>>> 
>>>> The board meeting is scheduled for Wed, 18 March 2020, 10:30 am PDT.
>>>> The report for your podling will form a part of the Incubator PMC
>>>> report. The Incubator PMC requires your report to be submitted 2 weeks
>>>> before the board meeting, to allow sufficient time for review and
>>>> submission (Wed, March 04).
>>>> 
>>>> Please submit your report with sufficient time to allow the Incubator
>>>> PMC, and subsequently board members to review and digest. Again, the
>>>> very latest you should submit your report is 2 weeks prior to the board
>>>> meeting.
>>>> 
>>>> Candidate names should not be made public before people are actually
>>>> elected, so please do not include the names of potential committers or
>>>> PPMC members in your report.
>>>> 
>>>> Thanks,
>>>> 
>>>> The Apache Incubator PMC
>>>> 
>>>> Submitting your Report
>>>> 
>>>> ----------------------
>>>> 
>>>> Your report should contain the following:
>>>> 
>>>> *   Your project name
>>>> *   A brief description of your project, which assumes no knowledge of
>>>>     the project or necessarily of its field
>>>> *   A list of the three most important issues to address in the move
>>>>     towards graduation.
>>>> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>>>>     aware of
>>>> *   How has the community developed since the last report
>>>> *   How has the project developed since the last report.
>>>> *   How does the podling rate their own maturity.
>>>> 
>>>> This should be appended to the Incubator Wiki page at:
>>>> 
>>>> https://cwiki.apache.org/confluence/display/INCUBATOR/March2020
>>>> 
>>>> Note: This is manually populated. You may need to wait a little before
>>>> this page is created from a template.
>>>> 
>>>> Note: The format of the report has changed to use markdown.
>>>> 
>>>> Mentors
>>>> -------
>>>> 
>>>> Mentors should review reports for their project(s) and sign them off on
>>>> the Incubator wiki page. Signing off reports shows that you are
>>>> following the project - projects that are not signed may raise alarms
>>>> for the Incubator PMC.
>>>> 
>>>> Incubator PMC
>>>> 
>>> 
>> 



[DISCUSS] Migrate spot-demo code from ONI repository

2020-03-02 Thread Tadd Wood
Similar to NFdump, I think we should start looking at migrating the
spot-demo code properly over from the original ONI repository so that it's
formally managed under the Apache Spot codebase.

We should also look at revamping the demo code at some point soon, but I
think this would be a first step towards that effort. Thoughts?

Thank you,
Tadd Wood


Re: [asf-site] Whimsy report indicates missing Apache links

2020-02-19 Thread Tadd Wood
Vivienne, can you also add a link for Security to the list?  This link
should suffice:  https://www.apache.org/security/

Thank you,
Tadd Wood

On Thu, Jan 30, 2020 at 9:18 PM Tadd Wood  wrote:

> An extra hand would be greatly appreciated.  I've already gone ahead and
> created a JIRA to track this:
> https://issues.apache.org/jira/browse/SPOT-299
>
> Take a look and let me know when your changes are ready for review.
>
> Thank you,
> Tadd Wood
>
> On Wed, Jan 29, 2020 at 6:51 PM Vivienne Pustell 
> wrote:
>
>> That sounds like a great idea, Tadd -- makes sense to get the links in
>> place so that everything is in order and connected where it should be. Do
>> you have that under control, or would you like a hand?
>>
>> Best,
>> -Vivienne
>>
>> On Wed, Jan 29, 2020 at 6:37 PM Tadd Wood  wrote:
>>
>> > I took a look into the Whimsy report for Spot website and it looks like
>> > we’re missing a number of Apache foundation links:
>> >
>> > https://whimsy.apache.org/pods/project/spot
>> >
>> > I think we could easily add these links to this page to fix this issue:
>> >
>> > https://spot.incubator.apache.org/get-started/supporting-apache/
>> >
>> > Thoughts?
>> >
>> > Thank you,
>> > Tadd Wood
>> >
>>
>


Apache Spot logo missing according to Whimsy report?

2020-02-19 Thread Tadd Wood
I noticed a few weeks ago that we were also being flagged on the Whimsy
report for our logo missing in the Apache project logos repo:
https://www.apache.org/img/.  It isn't entirely clear why this was missing,
but our logos do show up when searched here:
https://www.apache.org/logos/?#spot.

I've created a 212px png of the Apache Spot logo for submission as this is
the requirement per the Whimsy report and am going to reach out to
d...@community.apache.org to see if they can help point us in the right
direction to submit this.

Thank you,
Tadd Wood


Re: install instructions have a typo

2020-02-14 Thread Tadd Wood
Skip, thanks for pointing that out.  Should be a quick fix which is great.
Also, this ticket is relevant to that issue:
https://issues.apache.org/jira/projects/SPOT/issues/SPOT-147

Let us know when you've got a PR and we'll get it reviewed.

Thank you,
Tadd Wood
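
For the tar flag raised in the report quoted below: on the command line,
-j selects bzip2 decompression (for example, tar -xjf archive.tar.bz2). The
same step can be sketched with Python's standard tarfile module, where the
"r:bz2" mode plays the role of -j. The file names here are placeholders,
not the actual Wireshark archive from the install instructions.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_bz2_archive(archive_path, dest_dir):
    """Extract a bzip2-compressed tarball, like `tar -xjf archive -C dest`."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        archive.extractall(dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Build a tiny placeholder archive so the example is self-contained.
        source = tmp / "hello.txt"
        source.write_text("hello\n")
        archive_path = tmp / "example.tar.bz2"
        with tarfile.open(archive_path, mode="w:bz2") as archive:
            archive.add(source, arcname="hello.txt")

        out_dir = tmp / "out"
        out_dir.mkdir()
        extract_bz2_archive(archive_path, out_dir)
        print((out_dir / "hello.txt").read_text())
```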

On Fri, Feb 14, 2020 at 9:39 AM skip cruse  wrote:

> I was following the install instructions and noticed that the commands
> cannot be copied and pasted because line breaks are missing between them.
> Additionally, a -j is needed on the tar command to successfully extract the
> bzip2-compressed Wireshark archive. We should update this page to have
> proper line breaks and correct command-line flags.
>
> BR,
> Skip
> --
> if( bool halfWayThere == true);
> printf "WAH! LIVIN ON A PRAYER";
>


Re: Installation instructions imply/recommend that Spot-ML runs on YARN Node Manager, but should be Edge Server

2020-02-13 Thread Tadd Wood
Agreed, Jeremy. I think the diagram is a little misleading.  If you want,
check out this JIRA, which discusses this issue as well as a few other parts
of the documentation we need to revise:
https://issues.apache.org/jira/projects/SPOT/issues/SPOT-224

Thank you,
Tadd Wood

On Mon, Feb 10, 2020 at 2:29 PM Jeremy Nelson 
wrote:

> On https://spot.incubator.apache.org/doc/ in section “2. Deployment
> Recommendations”,
> we are recommending that the “spot-ml” software (which is all
> scala-spark-streaming)
> should be run on the YARN node manager. On the "service layout", it is
> implied that
> the spot-ml software is installed on "Worker" node.
>
> But these jobs are intended to be launched from the edge node, where the
> YARN service packages them up and publishes them to worker nodes. Thus, to
> make this simpler and more obvious for the users, the "ML" task should be
> shown as being installed, configured, and executed from the Edge node.
>
> Jeremy
>


Re: nfdump install instructions seem inaccurate

2020-02-05 Thread Tadd Wood
I support moving/forking the code in until we decide how we want to proceed
with eliminating that dependency.

Thank you,
Tadd Wood

On Wed, Feb 5, 2020 at 7:55 AM Jeremy Nelson 
wrote:

> I agree with the sentiment that we should work towards eliminating the
> dependency.
> In the meantime, until we figure out how to do that, it may be necessary
> for us to make changes to spot-nfdump.
> I would suggest that just for our own sanity, we should include/fork the
> project (open-network-insight/spot-nfdump) so we can make commits to it,
> and so that unknown people can't make changes that might affect our users.
> We could update the documentation to direct users to install our maintained
> version, instead of the external copy.
>
> Jeremy
>
>
> On Wed, Feb 5, 2020 at 12:08 AM Nate Smith  wrote:
>
> > I’ve not been active in this project for a while, so forgive me if this
> > has already been addressed.
> >
> > I’m assuming this is for spot-nfdump?
> >
> > If so it’s worth saying unless someone has been maintaining it we should
> > look at removing the dependency.
> >
> > Originally we cooked our own version due to time stamp issues. But this
> > should really be handled post ingestion IMHO.
> >
> > - nathanael
> >
> > > On Feb 4, 2020, at 3:40 PM, skip cruse  wrote:
> > >
> > > I noticed when setting up nfdump for spot-ingest that there were some
> > > errors around the version of automake that’s required. Apparently
> > > automake 1.14 is required, and after installing that everything worked
> > > fine. I did some digging and it looks like this was already raised via
> > > SPOT-178: https://issues.apache.org/jira/browse/SPOT-178. Perhaps
> > > someone could update the website to reflect this dependency and add the
> > > additional setup steps in there as well?
> > >
> > > Cheers,
> > > Skip
> > > --
> > > if( bool halfWayThere == true);
> > > printf "WAH! LIVIN ON A PRAYER";
> >
>


Re: [asf-site] Whimsy report indicates missing Apache links

2020-01-30 Thread Tadd Wood
An extra hand would be greatly appreciated.  I've already gone ahead and
created a JIRA to track this:
https://issues.apache.org/jira/browse/SPOT-299

Take a look and let me know when your changes are ready for review.

Thank you,
Tadd Wood

On Wed, Jan 29, 2020 at 6:51 PM Vivienne Pustell  wrote:

> That sounds like a great idea, Tadd -- makes sense to get the links in
> place so that everything is in order and connected where it should be. Do
> you have that under control, or would you like a hand?
>
> Best,
> -Vivienne
>
> On Wed, Jan 29, 2020 at 6:37 PM Tadd Wood  wrote:
>
> > I took a look into the Whimsy report for Spot website and it looks like
> > we’re missing a number of Apache foundation links:
> >
> > https://whimsy.apache.org/pods/project/spot
> >
> > I think we could easily add these links to this page to fix this issue:
> >
> > https://spot.incubator.apache.org/get-started/supporting-apache/
> >
> > Thoughts?
> >
> > Thank you,
> > Tadd Wood
> >
>


[asf-site] Whimsy report indicates missing Apache links

2020-01-29 Thread Tadd Wood
I took a look into the Whimsy report for Spot website and it looks like
we’re missing a number of Apache foundation links:

https://whimsy.apache.org/pods/project/spot

I think we could easily add these links to this page to fix this issue:

https://spot.incubator.apache.org/get-started/supporting-apache/

Thoughts?

Thank you,
Tadd Wood


Re: Maybe we should check documentation for /etc/spot.conf as well?

2020-01-29 Thread Tadd Wood
Yes, I agree that we should review this and add better documentation and
code comments around spot.conf. There are notes on the website about certain
variables being deprecated, and we should note both on the website and in
the code in which versions those variables were deprecated to help reduce
confusion.  More example values for the configs would also be great.

Thank you,

Tadd Wood
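
One way to keep deprecation notes in a single place is a small table that
both the documentation and a startup check can read. The sketch below
assumes spot.conf uses shell-style KEY=VALUE lines; the variable names and
the version number in DEPRECATED are hypothetical placeholders, not the
actual deprecated settings.

```python
# Hypothetical entry: variable name -> version in which it was deprecated.
DEPRECATED = {"OLD_SETTING": "1.0"}


def parse_conf(text):
    """Parse shell-style KEY=VALUE lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so 'spot' and "spot" both read as spot.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def deprecation_warnings(settings, deprecated=DEPRECATED):
    """Return one message per deprecated variable that is still set."""
    return [
        f"{name} is deprecated since version {version}"
        for name, version in sorted(deprecated.items())
        if name in settings
    ]


if __name__ == "__main__":
    sample = "# example only\nEXAMPLE_DB='spot'\nexport OLD_SETTING=1\n"
    for message in deprecation_warnings(parse_conf(sample)):
        print(message)
```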

On Tue, Jan 28, 2020 at 8:21 AM Brian McInerney 
wrote:

> Last week, we suggested that we needed to document the “ingest_spot.conf”
> config file, because it was poorly documented on the website. The other
> major config file, /etc/spot.conf, looks like it has much more mature
> documentation on the website, but perhaps we should review whether this
> information is current and accurate so that we have a source of truth.
>


Re: Improving conformance with Apache Podling expectations

2020-01-28 Thread Tadd Wood
Justin,

I think Whimsy might be down or having some issues right now (I'm seeing
500 errors anytime I click a name), but is there a definitive list or
reference to the 13 that are missing so we can reach out to them?

Thank you,
Tadd Wood

On Tue, Jan 28, 2020 at 3:30 PM Justin Mclean  wrote:

> Hi,
>
> I can still see that 13 people on your PPMC are not signed up to the
> private mailing list. Can someone please look into this and try and correct
> it.
>
> Thanks,
> justin
>


Re: [asf-site] Clean-up duplicate files and directories

2020-01-24 Thread Tadd Wood
Jose, since it looks like you're already subscribed to the dev list, I would
also suggest getting an account set up on the Apache JIRA site so that you
can open tickets, explore issues, and comment on outstanding issues.

https://issues.apache.org/jira/browse/SPOT/

Is there a particular component you're interested in digging into or
contributing to?  There's also a project board in GitHub that has groups of
different issues that can be claimed, especially those marked as
"Tickets that need action".

https://github.com/apache/incubator-spot/projects/1

Feel free to start a new thread in the dev list once you've got a project
or discussion you're interested in kicking off.

Thank you,
Tadd Wood

On Fri, Jan 24, 2020 at 10:35 AM Jose Delgado  wrote:

> I would like to start contributing to the project.
>
> What is the process to get started?
>
> On Fri, Jan 24, 2020 at 12:21 AM Tadd Wood  wrote:
>
> > After digging into the asf-site branch for a bit, I've noticed there are
> > duplicate files and directories which could likely be removed. Some of
> > these even look like WordPress carryovers that can either be removed or
> > consolidated. I’d like to clean this up so we have a better baseline for
> > the site's structure. I think in a follow-up project it would be worth
> > reorganizing the site a bit too so that it's easier to expand and add to.
> > Thoughts?
> >
> > Thank you,
> > Tadd Wood
> >
>


[asf-site] Clean-up duplicate files and directories

2020-01-23 Thread Tadd Wood
After digging into the asf-site branch for a bit, I've noticed there are
duplicate files and directories which could likely be removed. Some of
these even look like WordPress carryovers that can either be removed or
consolidated. I’d like to clean this up so we have a better baseline for
the site's structure. I think in a follow-up project it would be worth
reorganizing the site a bit too so that it's easier to expand and add to.
Thoughts?

Thank you,
Tadd Wood


Re: Fixing link to PySparkStreaming-Kafka support JAR

2020-01-23 Thread Tadd Wood
+1, looks like an easy fix and will help avoid confusion.  Let me know when
there's a JIRA and/or PR and I'll review.

Thank you,
Tadd Wood

On Mon, Jan 20, 2020 at 2:09 PM skip cruse  wrote:

> +1, I had to figure this out as well, and it wasn't exactly straightforward
> to be honest.
>
> On Mon, Jan 20, 2020 at 12:48 PM Jeremy Nelson 
> wrote:
>
> > Greetings All!
> >
> > I was figuring out what was involved with running pyspark-streaming with
> > kafka on a test cluster, and I noticed that the URL to the JAR that is
> > required to support PySparkStreaming-Kafka has a typo in it. There is a
> > space between “spark-streaming-kafka-” and “0-8-assembly”. This keeps you
> > from being able to copy/paste the text directly.
> >
> > Although I was able to figure out the problem, new users might find it
> > hard to diagnose the problem. Maybe we can fix this typo?
> >
> > Thanks, Jeremy
> >
>
>
> --
> if( bool halfWayThere == true);
> echo "WAH! LIVIN ON A PRAYER";
>


Re: ingest_conf question

2020-01-22 Thread Tadd Wood
+1, yes I think there's an opportunity to document the config parameters
better.  Once we get that ironed out we should also push to the Spot
website.  Does it make sense to split this into two different JIRA tasks?

Thank you,
Tadd Wood
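
Until the parameters are documented, a quick check can at least show which
settings in a config file are still blank. The sketch below walks a parsed
JSON document and lists every key whose value is empty; the keys in the
sample are hypothetical and are not taken from the real ingest_conf.json
template.

```python
import json


def blank_settings(config, prefix=""):
    """Return dotted paths of keys whose value is an empty string or null."""
    blanks = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            blanks.extend(blank_settings(value, prefix=path + "."))
        elif value in ("", None):
            blanks.append(path)
    return blanks


if __name__ == "__main__":
    # Hypothetical structure for illustration only.
    sample = json.loads(
        '{"example_section": {"host": "", "port": 9092}, "example_path": null}'
    )
    for path in blank_settings(sample):
        print("not set:", path)
```

A list like this could also serve as a checklist of parameters that still
need a description and an example value in the documentation.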

On Wed, Jan 22, 2020 at 2:13 PM skip cruse  wrote:

> I was setting up Spot-Ingest, and I noticed that I needed to create an
> “ingest_conf.json” file. The documentation at
> http://spot.apache.org/doc/#deployment refers to a few things, but doesn’t
> go into any details. It does link to the GitHub page at
> https://github.com/apache/incubator-spot/tree/master/spot-ingest/README.md,
> but that gives a blank config and doesn’t explain in more detail how to
> determine what values are appropriate.
>
> Perhaps we can find out whether people have successfully filled in this
> config file and can share details that leave out the specifics of their own
> deployment but do help a user understand what the values are intended to
> be?
>
> --
> if( bool halfWayThere == true);
> echo "WAH! LIVIN ON A PRAYER";
>


Re: Details on ODM

2020-01-21 Thread Tadd Wood
I've created a JIRA to track this improvement, which can be referenced here:
https://issues.apache.org/jira/browse/SPOT-298

I'll update this thread when there's a PR to review.

Thank you,
Tadd Wood

On Fri, Jan 17, 2020 at 8:55 AM Brian McInerney 
wrote:

> Good call both of you! I look forward to seeing the changes to the site!
>
> Thank you,
> Brian McInerney
>
> On Fri, Jan 17, 2020 at 10:37 AM Tadd Wood  wrote:
>
> > Hi Ethan, thanks for reaching out.  There's documentation in GitHub for
> > the ODM
> > (https://github.com/apache/incubator-spot/blob/SPOT-181_ODM/docs/open-data-model.md),
> > but unfortunately there's nothing on the Spot website currently aside
> > from a high-level description of the purpose and theory around the ODM.
> > I think that's something we should remedy right away. I can take that on
> > as a task.
> >
> > Thank you,
> > Tadd Wood
> >
> > On Thu, Jan 16, 2020 at 10:02 PM Ethan Pemberton wrote:
> >
> > > Hi!
> > >
> > > I’m looking at ways to improve our Security Operations, specifically in
> > > getting more value out of our SIEM. SPOT looks interesting, and I am
> > > especially interested in the ODM. It sounds like it could have a lot of
> > > potential. Other than the high-level mention in the project
> > > descriptions, I can’t find any documentation. Can you provide a link?
> > >
> > > Cheers,
> > >
> > > Ethan
> > >
> >
>


Re: Improving conformance with Apache Podling expectations

2020-01-16 Thread Tadd Wood
Justin,

I see this note on the PPMC guide (
https://incubator.apache.org/guides/ppmc.html):

The mentors should verify that all PPMC members are subscribed to the
private list. The Whimsy Podling Roster
<https://whimsy.apache.org/roster/ppmc/> shows who is subscribed, and any
subscriber can send a "ping - please reply" message to check who is
actually "listening" to the PPMC list.

What's strange is that if I click on Austin's user (as an example) it
doesn't show he's subscribed to any of the lists which doesn't make any
sense given that he's obviously on the dev list.  Any idea why there would
be an inconsistency here?  When I click on my user though I see that I'm a
part of dev and committers, which makes sense but I'm also pretty sure I've
used the private list in the past as well.


Thank you,
Tadd Wood


On Wed, Jan 15, 2020 at 10:08 PM Austin Leahy 
wrote:

> Thanks for surfacing this. Are there any more of these?
>
> > On Jan 15, 2020, at 8:52 PM, Justin Mclean  wrote:
> >
> > Hi,
> >
> > You should look at fixing up your PPMC at the same time. Many PPMC
> > members are not subscribed to the private mailing list, and it is a
> > requirement that they be subscribed.
> >
> > Thanks,
> > Justin
>


Re: [SPOT-INGEST] Ingest file organization

2020-01-15 Thread Tadd Wood
That would be really helpful Kostas.  I'll create a JIRA today for this if
you can work on tracking the files down in the meantime.

Thank you,
Tadd Wood

On Wed, Jan 15, 2020 at 12:57 AM Kostas Tzoulas  wrote:

> Or even better
>
> ./spot-ingest/python
>
> ./spot-ingest/pyspark-streaming
>
> Also, I could help to track down the files that were created for ingestion.
>
>
> - kostas
>
>
> On 15/01/2020 06:28, Nate Smith wrote:
> > Perhaps separating by framework would be good,
> >
> > ./spot-ingest/python
> > ./spot-ingest/spark-streaming
> >
> > Just my 2 cents,
> >
> > - nathanael
> >
> >> On Jan 14, 2020, at 4:45 PM, Skip Cruse  wrote:
> >>
> >> We should keep the name /spot-ingest/ for the original ingester, but
> >> move the new ingester to /spot-ingest-sparkstreaming/ or similar.
> >> Hopefully we can use the ticket to track down the files that were
> >> created, so we can move them to a new home easily.
> >>
> >> 
> >> From: Tadd Wood 
> >> Sent: Tuesday, January 14, 2020 5:51 PM
> >> To: dev@spot.incubator.apache.org
> >> Subject: [SPOT-INGEST] Ingest file organization
> >>
> >> I noticed that SPOT-141 (a new kind of Spot Ingest, using PySpark
> >> Streaming) overlaid its new code on top of the old code in
> >> /spot-ingest/. When debugging, this makes it hard to determine which
> >> files belong to the new ingest process and which to the old one. We
> >> should split them apart. Thoughts?
> >>
> >> Thank you,
> >> Tadd Wood
>


[SPOT-INGEST] Ingest file organization

2020-01-14 Thread Tadd Wood
I noticed that SPOT-141 (a new kind of Spot Ingest, using PySpark
Streaming) overlaid its new code on top of the old code in /spot-ingest/.
When debugging, this makes it hard to determine which files belong to the
new ingest process and which to the old one. We should split them apart.
Thoughts?

Thank you,
Tadd Wood


Re: update copyright

2020-01-13 Thread Tadd Wood
There should now be two PRs available for review to resolve this issue:

https://github.com/apache/incubator-spot/pull/155
https://github.com/apache/incubator-spot/pull/156

Thank you,
Tadd Wood

On Mon, Jan 13, 2020 at 3:34 PM Austin Leahy 
wrote:

> +1
>
> On Mon, Jan 13, 2020 at 2:50 PM Tadd Wood  wrote:
>
> > Good point.  I’ve actually started some work on that effort before the
> > holidays, but need to push my PR.  I will complete that by tonight so we
> > can review quickly and push that update.
> >
> > Thank you,
> > Tadd Wood
> >
> > On Jan 13, 2020, at 2:38 PM, skip cruse  wrote:
> >
> > I think we should update the copyright notices on the spot website
> > http://spot.apache.org/get-started/ to 2020.
> >
> > --
> > if( bool halfWayThere == true);
> > echo "WAH! LIVIN ON A PRAYER";
> >
>


Re: update copyright

2020-01-13 Thread Tadd Wood
Good point.  I’ve actually started some work on that effort before the
holidays, but need to push my PR.  I will complete that by tonight so we
can review quickly and push that update.

Thank you,
Tadd Wood

On Jan 13, 2020, at 2:38 PM, skip cruse  wrote:

I think we should update the copyright notices on the spot website
http://spot.apache.org/get-started/ to 2020.

-- 
if( bool halfWayThere == true);
echo "WAH! LIVIN ON A PRAYER";
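
A yearly update like this can be scripted. The sketch below is an
assumption-laden illustration: it assumes the notice is written as
"Copyright", an optional ©, &copy; or (c) marker, and then a year or a year
range; the pattern and the helper name update_copyright_year are
hypothetical and should be checked against the site's actual footer markup.

```python
import re

# Group 1 keeps the "Copyright" prefix and, for ranges, the start year.
COPYRIGHT_RE = re.compile(
    r"(Copyright\s+(?:©|&copy;|\(c\))?\s*(?:\d{4}\s*-\s*)?)\d{4}",
    re.IGNORECASE,
)


def update_copyright_year(text, year):
    """Replace the (end) year of each copyright notice with `year`."""
    return COPYRIGHT_RE.sub(lambda m: m.group(1) + str(year), text)


if __name__ == "__main__":
    footer = "<p>Copyright © 2019 The Apache Software Foundation</p>"
    print(update_copyright_year(footer, 2020))
```

For a range such as "2016-2019" only the end year changes, so the first
year of publication is preserved.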


Re: [GitHub] [incubator-spot] dependabot[bot] opened a new pull request #154: Bump pyyaml from 3.12 to 5.1 in /spot-ingest

2019-11-01 Thread Tadd Wood
Does anyone have any thoughts on, or know of any obvious incompatibilities
with, this library update that was auto-pushed by GitHub?

If so, please comment on PR 154 so we can evaluate this one quickly and
determine if further work is needed to get it merged in.

Thank you,
Tadd Wood
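
One change in the changelog quoted below is likely to matter for callers:
5.1 deprecates calling yaml.load without an explicit Loader. A minimal
sketch for locating such call sites follows, using only the standard ast
module. It assumes the code refers to the module as "yaml" and that the
source parses under the Python version running the check; the helper name
unsafe_yaml_load_calls is hypothetical. Flagged calls can usually be
switched to yaml.safe_load or given a Loader argument.

```python
import ast


def unsafe_yaml_load_calls(source):
    """Return line numbers of `yaml.load(...)` calls that pass no Loader."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        # Loader may be given as the second positional or as a keyword.
        has_loader = len(node.args) >= 2 or any(
            kw.arg == "Loader" for kw in node.keywords
        )
        if not has_loader:
            lines.append(node.lineno)
    return sorted(lines)


if __name__ == "__main__":
    sample = (
        "import yaml\n"
        "a = yaml.load(stream)\n"
        "b = yaml.load(stream, Loader=yaml.SafeLoader)\n"
        "c = yaml.safe_load(stream)\n"
    )
    print(unsafe_yaml_load_calls(sample))
```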

> On Nov 1, 2019, at 5:19 AM, GitBox  wrote:
> 
> dependabot[bot] opened a new pull request #154: Bump pyyaml from 3.12 to 5.1 
> in /spot-ingest
> URL: https://github.com/apache/incubator-spot/pull/154
> 
> 
>   Bumps [pyyaml](https://github.com/yaml/pyyaml) from 3.12 to 5.1.
>   
>   Changelog
> 
>   *Sourced from [pyyaml's 
> changelog](https://github.com/yaml/pyyaml/blob/master/CHANGES).*
> 
>> 5.1 (2019-03-13)
>> 
>> 
>> * [yaml/pyyaml#35](https://github-redirect.dependabot.com/yaml/pyyaml/pull/35) -- Some modernization of the test running
>> * [yaml/pyyaml#42](https://github-redirect.dependabot.com/yaml/pyyaml/pull/42) -- Install tox in a virtualenv
>> * [yaml/pyyaml#45](https://github-redirect.dependabot.com/yaml/pyyaml/pull/45) -- Allow colon in a plain scalar in a flow context
>> * [yaml/pyyaml#48](https://github-redirect.dependabot.com/yaml/pyyaml/pull/48) -- Fix typos
>> * [yaml/pyyaml#55](https://github-redirect.dependabot.com/yaml/pyyaml/pull/55) -- Improve RepresenterError creation
>> * [yaml/pyyaml#59](https://github-redirect.dependabot.com/yaml/pyyaml/pull/59) -- Resolves [#57](https://github-redirect.dependabot.com/yaml/pyyaml/issues/57), update readme issues link
>> * [yaml/pyyaml#60](https://github-redirect.dependabot.com/yaml/pyyaml/pull/60) -- Document and test Python 3.6 support
>> * [yaml/pyyaml#61](https://github-redirect.dependabot.com/yaml/pyyaml/pull/61) -- Use Travis CI built in pip cache support
>> * [yaml/pyyaml#62](https://github-redirect.dependabot.com/yaml/pyyaml/pull/62) -- Remove tox workaround for Travis CI
>> * [yaml/pyyaml#63](https://github-redirect.dependabot.com/yaml/pyyaml/pull/63) -- Adding support to Unicode characters over codepoint 0x
>> * [yaml/pyyaml#65](https://github-redirect.dependabot.com/yaml/pyyaml/pull/65) -- Support unicode literals over codepoint 0x
>> * [yaml/pyyaml#75](https://github-redirect.dependabot.com/yaml/pyyaml/pull/75) -- add 3.12 changelog
>> * [yaml/pyyaml#76](https://github-redirect.dependabot.com/yaml/pyyaml/pull/76) -- Fallback to Pure Python if Compilation fails
>> * [yaml/pyyaml#84](https://github-redirect.dependabot.com/yaml/pyyaml/pull/84) -- Drop unsupported Python 3.3
>> * [yaml/pyyaml#102](https://github-redirect.dependabot.com/yaml/pyyaml/pull/102) -- Include license file in the generated wheel package
>> * [yaml/pyyaml#105](https://github-redirect.dependabot.com/yaml/pyyaml/pull/105) -- Removed Python 2.6 & 3.3 support
>> * [yaml/pyyaml#111](https://github-redirect.dependabot.com/yaml/pyyaml/pull/111) -- Remove commented out Psyco code
>> * [yaml/pyyaml#129](https://github-redirect.dependabot.com/yaml/pyyaml/pull/129) -- Remove call to `ord` in lib3 emitter code
>> * [yaml/pyyaml#143](https://github-redirect.dependabot.com/yaml/pyyaml/pull/143) -- Allow to turn off sorting keys in Dumper
>> * [yaml/pyyaml#149](https://github-redirect.dependabot.com/yaml/pyyaml/pull/149) -- Test on Python 3.7-dev
>> * [yaml/pyyaml#158](https://github-redirect.dependabot.com/yaml/pyyaml/pull/158) -- Support escaped slash in double quotes "\/"
>> * [yaml/pyyaml#181](https://github-redirect.dependabot.com/yaml/pyyaml/pull/181) -- Import Hashable from collections.abc
>> * [yaml/pyyaml#256](https://github-redirect.dependabot.com/yaml/pyyaml/pull/256) -- Make default_flow_style=False
>> * [yaml/pyyaml#257](https://github-redirect.dependabot.com/yaml/pyyaml/pull/257) -- Deprecate yaml.load and add FullLoader and UnsafeLoader classes
>> * [yaml/pyyaml#263](https://github-redirect.dependabot.com/yaml/pyyaml/pull/263) -- Windows Appveyor build
>> 
>> 3.13 (2018-07-05)
>> -----------------
>> 
>> * Resolved issues around PyYAML working in Python 3.7.
>   
>   
>   Commits
> 
>   - 
> [`e471e86`](https://github.com/yaml/pyyaml/commit/e471e86bf6dabdad45a1438c20a4a5c033eb9034)
>  Updates for 5.1 release
>   - 
> [`9141e90`](https://github.com/yaml/

Re: Podling report due today

2019-06-04 Thread Tadd Wood
Uma,

I updated confluence with our podling report update for June 2019.  Let me know 
if there’s anything I missed or need to adjust.

Thank you,
Tadd Wood

> On Jun 5, 2019, at 1:24 AM, Uma gangumalla  wrote:
> 
> -- Forwarded message -
> From: Justin Mclean 
> Date: Tue, Jun 4, 2019 at 3:58 PM
> Subject: Podling report due today
> To: 
> 
> 
> Hi,
> Don't forget your podling report is due today. [1]
> Thanks,
> Justin
> 1. https://cwiki.apache.org/confluence/display/INCUBATOR/June2019


Re: ODM Merge?

2019-03-25 Thread Tadd Wood
Alan,

Below are the statuses of the currently open PRs.  I can merge any of these
once they’ve got enough votes or have been reviewed further by other
committers.  I'm also happy to discuss any of these PRs individually if
anyone wants to start new threads on the dev list.

Ok to merge, but need more votes:

PR #153  Updates to Spot Website Copyright Year
         Author: @schonz
         Link:   https://github.com/apache/incubator-spot/pull/153

PR #151  Fix sudo command, add help, fix ShellCheck warnings
         Author: @pdion891
         Link:   https://github.com/apache/incubator-spot/pull/151

PR #147  Updates to Spot Website
         Author: @schonz
         Link:   https://github.com/apache/incubator-spot/pull/147

PR #144  spot-ingest for ODM with config-driven Spark streaming (Envelope)
         Author: @curtishoward
         Link:   https://github.com/apache/incubator-spot/pull/144

PR #143  clean config options via configurator.py
         Author: @natedogs911
         Link:   https://github.com/apache/incubator-spot/pull/143

PR #140  odm event schema updates
         Author: @TaddWood
         Link:   https://github.com/apache/incubator-spot/pull/140

PR #126  Fix broken ingest_summary generation
         Author: @castleguarders
         Link:   https://github.com/apache/incubator-spot/pull/126

PR #102  Proxy Spot Schema
         Author: @mpereaji
         Link:   https://github.com/apache/incubator-spot/pull/102

Awaiting updates/fixes from committer:

PR #95   Edited to match documentation
         Author: @rphi
         Link:   https://github.com/apache/incubator-spot/pull/95

Has merge conflicts:

PR #25   Improved hdfs_setup.sh
         Author: @kpeiruza
         Link:   https://github.com/apache/incubator-spot/pull/25

PR #24   aiming to close spot-23
         Author: @natedogs911
         Link:   https://github.com/apache/incubator-spot/pull/24

PR #21   Fix low hanging fruit in the documentation
         Author: @gustavstickley
         Link:   https://github.com/apache/incubator-spot/pull/21

Needs to be broken into a new branch:

PR #150  Spot 181 odm
         ./ml_ops.sh 20181102 flow 5000 0.5
         The order of the two parameters needs to be changed.
         Author: @tzhou2018
         Link:   https://github.com/apache/incubator-spot/pull/150
         Notes:  I think this PR was mistakenly trying to merge the ODM into
                 the master branch.

Need further review by other committers:

PR #141  Ingestion using Spark Streaming
         Author: @ktzoulas
         Link:   https://github.com/apache/incubator-spot/pull/141

PR #149  Inconsistencies in the open data model descriptions
         Author: @cgiraldo
         Link:   https://github.com/apache/incubator-spot/pull/149


Thank you,
Tadd Wood

> On Mar 22, 2019, at 10:48 AM, Austin Leahy  wrote:
> 
> I may be able to leverage some other stuff we have been doing lately to
> close some ui gaps. Will get back in and poke around this afternoon.
> 
> On Thu, Mar 21, 2019 at 10:17 PM Nate Smith  wrote:
> 
>> As I recall I think we can merge envelope without affecting the existing
>> code. But as you pointed out there are other gaps such as the UI.
>> At this point I say merge as much as we can (assuming no obvious code
>> quality issues) as any movement at this point is positive. If something
>> breaks then we know what needs to be fixed.
>> 
>> - nathanael
>> 
>>> On Mar 21, 2019, at 6:07 PM, Tadd Wood wrote:
>>> 
>>> Alan,
>>> 
>>> I can help organize the open PRs.  Right now the biggest barrier to
>>> merging in the ODM branch is bridging the gap between the ingest code and
>>> the ODM.
>>> @curtishoward did some great work in PR #144 using Envelope as the
>>> ingest framework for populating the ODM.  I will reach out to see what work
>>> is left to finish up that PR so we can merge it in.
>>> 
>>> Thank you,
>>> Tadd Wood
>>> 
>>> 
>>>> On Mar 21, 2019, at 4:02 PM, Alan Ross  wrote:
>>>> 
>>>> thanks for the reply, Pierre-Luc.
>>>> 
>>>> Any input on merging PRs? Is there a list of current open and which ones
>>>> have been reviewed?
>>>> 
>>>>> On Thu, Mar 21, 2019 at 1:10 PM Pierre-Luc Dion wrote:
>>>>> 
>>>>> Look like there few

Re: ODM Merge?

2019-03-21 Thread Tadd Wood
Alan,

I can help organize the open PRs.  Right now the biggest barrier to merging in 
the ODM branch is bridging the gap between the ingest code and the ODM.  
@curtishoward did some great work in PR #144 using Envelope as the ingest 
framework for populating the ODM.  I will reach out to see what work is left to 
finish up that PR so we can merge it in.

Thank you,
Tadd Wood


> On Mar 21, 2019, at 4:02 PM, Alan Ross  wrote:
> 
> thanks for the reply, Pierre-Luc.
> 
> Any input on merging PRs? Is there a list of current open and which ones
> have been reviewed?
> 
> On Thu, Mar 21, 2019 at 1:10 PM Pierre-Luc Dion  wrote:
> 
>> It looks like there are a few pending PRs waiting to be merged into this
>> branch; wouldn't it make sense to merge all of those first, then merge the
>> SPOT-181_odm branch into master?
>> I'm not a committer so I can't help with that, but I can help with review
>> wherever possible.
>> 
>> The PR pile looks stalled; a lot of PRs are becoming old :-(
>> 
>> On Tue, Mar 19, 2019 at 4:59 PM Alan Ross  wrote:
>> 
>>> Hey team,
>>> 
>>> It's hard for people to find the ODM as it appears to be tied up in request
>>> 181. Can someone merge this? Not sure if we need to bring it for vote but I
>>> support it being merged.
>>> 
>>> 
>>> 
>>> https://github.com/apache/incubator-spot/blob/SPOT-181_ODM/docs/open-data-model.md
>>> 
>>> Thanks, Alan
>>> 
>> 



Re: [Vote] Relocate Spot Repository

2018-12-17 Thread Tadd Wood
+1

> On Dec 17, 2018, at 11:02 AM, Mark Schoeni  wrote:
> 
> Hey Everyone,
> 
> The ASF has requested that projects voluntarily start moving their
> repositories from *"git-wip-us"* to *"gitbox.apache.org"* *(Email is copied
> below for reference)*. This
> move is highly automated and only requires us to record a vote and create
> an Infra ticket.
> 
> I am starting a vote to move the Spot repository from "git-wip-us" to
> "gitbox.apache.org".
> 
> This will be a simple majority vote, and will be open for at least 4 days.
> Please reply with:
> +1  for Yes let's move
> or
> -1  for No, let's wait until it's mandatory
> 
> Thank you,
> 
> Mark Schoeni (schonz)
> 
> -- Forwarded message -
> From: Daniel Gruno 
> Date: Fri, Dec 7, 2018 at 10:53 AM
> Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> git-wip-us.apache.org
> To: us...@infra.apache.org 
> 
> 
> [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
>  DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
> 
> Hello Apache projects,
> 
> I am writing to you because you may have git repositories on the
> git-wip-us server, which is slated to be decommissioned in the coming
> months. All repositories will be moved to the new gitbox service which
> includes direct write access on github as well as the standard ASF
> commit access via gitbox.apache.org.
> 
> ## Why this move? ##
> The move comes as a result of retiring the git-wip service, as the
> hardware it runs on is longing for retirement. In lieu of this, we
> have decided to consolidate the two services (git-wip and gitbox), to
> ease the management of our repository systems and future-proof the
> underlying hardware. The move is fully automated, and ideally, nothing
> will change in your workflow other than added features and access to
> GitHub.
> 
> ## Timeframe for relocation ##
> Initially, we are asking that projects voluntarily request to move
> their repositories to gitbox, hence this email. The voluntary
> timeframe is between now and January 9th 2019, during which projects
> are free to either move over to gitbox or stay put on git-wip. After
> this phase, we will be requiring the remaining projects to move within
> one month, after which we will move the remaining projects over.
> 
> To have your project moved in this initial phase, you will need:
> 
> - Consensus in the project (documented via the mailing list)
> - File a JIRA ticket with INFRA to voluntarily move your project repos
>   over to gitbox (as stated, this is highly automated and will take
>   between a minute and an hour, depending on the size and number of
>   your repositories)
> 
> To sum up the preliminary timeline;
> 
> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
>   relocation
> - January 9th -> February 6th: Mandated (coordinated) relocation
> - February 7th: All remaining repositories are mass migrated.
> 
> This timeline may change to accommodate various scenarios.
> 
> ## Using GitHub with ASF repositories ##
> When your project has moved, you are free to use either the ASF
> repository system (gitbox.apache.org) OR GitHub for your development
> and code pushes. To be able to use GitHub, please follow the primer
> at: https://reference.apache.org/committer/github
> 
> 
> We appreciate your understanding of this issue, and hope that your
> project can coordinate voluntarily moving your repositories in a
> timely manner.
> 
> All settings, such as commit mail targets, issue linking, PR
> notification schemes etc will automatically be migrated to gitbox as
> well.
> 
> With regards, Daniel on behalf of ASF Infra.
> 
> PS: For inquiries, please reply to us...@infra.apache.org, not your
> project's dev list :-).
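
For anyone updating local clones after the move, here is a minimal sketch of rewriting a remote URL from the old host to the new one. It assumes the repository path stays the same on gitbox and only the host changes; the `/repos/asf/incubator-spot.git` path and the `to_gitbox` helper are illustrative assumptions, not taken from the Infra notice.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "git-wip-us.apache.org"
NEW_HOST = "gitbox.apache.org"


def to_gitbox(remote_url: str) -> str:
    """Return remote_url with the git-wip-us host swapped for gitbox.

    Hypothetical helper: assumes the path is unchanged after migration.
    URLs on any other host are returned untouched.
    """
    parts = urlsplit(remote_url)
    if parts.hostname != OLD_HOST:
        return remote_url
    # Replacing netloc drops any userinfo/port, which plain HTTPS remotes lack.
    return urlunsplit(parts._replace(netloc=NEW_HOST))


if __name__ == "__main__":
    # Assumed example path; check your own remote with `git remote -v`.
    old = "https://git-wip-us.apache.org/repos/asf/incubator-spot.git"
    print(to_gitbox(old))
```

The resulting URL can then be applied to a clone with `git remote set-url origin <new-url>`.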


Re: Configuration-driven ingest for the Open Data Model (ODM) using Spark Streaming (Envelope)

2018-05-03 Thread Tadd Wood
Curtis,

Excited to take a look as well :).  Thanks for the hard work on this.

Thank you,
Tadd Wood



> On May 2, 2018, at 4:45 AM, Austin Leahy <aus...@digitalminion.com> wrote:
> 
> Curtis, this is very cool. Thanks for putting so much time into this; I will
> check out the PR and comment.
> 
> On Tue, May 1, 2018 at 3:37 PM Curtis Howard <cur...@cloudera.com> wrote:
> 
>> Hi Nathanael,
>> 
>> So far only https://github.com/Open-Network-Insight/spot-nfdump.git
>> 
>> The PR code is a proof-of-concept at this point - look forward to your
>> thoughts on next steps though!
>> 
>> Thanks again
>> Curtis
>> 
>> On Tue, May 1, 2018 at 6:28 PM, Nate Smith <natedogs...@gmail.com> wrote:
>> 
>>> Curtis,
>>> 
>>> Have you tested this with a standard version of nfdump? Or only
>>> spot-nfdump?
>>> 
>>> - Nathanael
>>> 
>>>> On May 1, 2018, at 1:12 PM, Curtis Howard <cur...@cloudera.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We had discussed prototyping Envelope for ingest in the past - I've
>>>> submitted a PR for this which includes:
>>>> - Kafka -> Spark streaming -> ODM Hive table applications for dns, flow
>>>> and proxy raw source data
>>>> - a simple alternative for source data collection/dissection using
>>>> tshark/nfdump/unzip + Flume (sinking data to Kafka)
>>>> - https://github.com/apache/incubator-spot/pull/144
>>>> 
>>>> To quote directly from the Envelope site
>>>> (https://github.com/cloudera-labs/envelope#envelope):
>>>> *"Envelope is simply a pre-made Spark application that implements many of
>>>> the tasks commonly found in ETL pipelines. In many cases, Envelope allows
>>>> large pipelines to be developed on Spark with no coding required. When
>>>> custom code is needed, there are pluggable points in Envelope for core
>>>> functionality to be extended. Envelope works in batch and streaming
>>>> modes."*
>>>> 
>>>> For example, the complete Kafka/SparkStreaming/ODM ingest application
>>>> definition for DNS:
>>>> https://github.com/curtishoward/incubator-spot/blob/SPOT-181_envelope_ingest/spot-ingest/odm/workers/spot_proxy.conf
>>>> 
>>>> From the perspective of the Spot project, my thoughts are that it would
>>>> enable:
>>>> - faster turnaround time to ingest new source types while still allowing
>>>> for arbitrarily complex ETL pipelines (data enrichment, data quality
>>>> checks, etc.)
>>>> - simpler future integration with other storage layers (HBase, Kudu, for
>>>> example)
>>>> - a framework that is simple to extend (input sources, output storage
>>>> layers, translators, derivers, UDFs, ...)
>>>> 
>>>> If there is interest, I will continue to refactor the current
>>>> implementation: centralize/integrate configuration with spot.conf, test
>>>> Kerberos integration, run performance tests and tune where possible.
>>>> 
>>>> In the near term, I will also add a PR with Hive views for dns/flow/proxy
>>>> under spot-ml/ - this should enable an end-to-end proof-of-concept ODM
>>>> implementation using Envelope.
>>>> 
>>>> Thanks
>>>> Curtis
>>> 
>>> 
>> 


