Finally figured it out.  Commit is here:
https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1

It came down to figuring out the right combination of maven dependencies
and passing in the HDP version to REST as a Java system property.  I also
included some HDFS setup tasks.  I tested this in full dev and can now
successfully run a pcap query and get results.  All you should have to do
is generate some pcap data first.
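
For anyone following along, passing the HDP version through might look
roughly like the sketch below. It is only an illustration: the function
name, the jar name, and the default version (taken from the
/usr/hdp/2.6.4.0-91 path in the error quoted further down) are assumptions,
not what the commit actually does. The idea is that Hadoop resolves
${hdp.version} in settings like mapreduce.application.framework.path from
Java system properties, so the REST JVM needs -Dhdp.version set.

```shell
#!/bin/sh
# Hypothetical helper: build the JVM option that exposes the HDP version
# as the hdp.version system property. The function name is an assumption.
hdp_java_opts() {
    # $1: HDP version string, e.g. 2.6.4.0-91 (from /usr/hdp/2.6.4.0-91)
    printf '%s' "-Dhdp.version=$1"
}

# The default is an assumption based on the path seen in this thread.
HDP_VERSION="${HDP_VERSION:-2.6.4.0-91}"

# Print the command instead of running it; metron-rest.jar is a placeholder.
echo "java $(hdp_java_opts "$HDP_VERSION") -jar metron-rest.jar"
```

On a real node the version would come from the installed HDP stack rather
than a hardcoded default.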

On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> @Ryan - pulled your branch and experimented with a few things. In doing so,
> it dawned on me that by adding the yarn and hadoop classpath, you probably
> didn't introduce a new classpath issue; rather, you probably just moved on to
> the next classpath issue, i.e. hbase, per your exception about hbase jaxb.
> Anyhow, I put up a branch with some pom changes worth trying in conjunction
> with invoking the rest app startup via "/usr/bin/yarn jar".
>
> https://github.com/mmiklavc/metron/tree/ryan-rest-test
>
> https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
>
> Mike
>
>
> On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
> > That would be a step closer to something more like a micro-service
> > architecture. However, I would want to make sure we think about the
> > operational complexity, and mpack implications of having another server
> > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > etc requirements for that service).
> >
> > On 8 May 2018 at 14:27, Ryan Merriman <merrim...@gmail.com> wrote:
> >
> > > +1 to having metron-api as its own service and using a gateway type
> > > pattern.
> > >
> > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwa...@gmail.com>
> > > wrote:
> > >
> > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > pattern in rest?
> > > >
> > > >
> > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrim...@gmail.com) wrote:
> > > >
> > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > error:
> > > >
> > > > Caused by: java.lang.NoSuchMethodError:
> > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > >
> > > > I will experiment with other combinations; I suspect we will need
> > > > finer-grained control over the order.
> > > >
> > > > The grep matches class names inside jar files. I use this all the time and
> > > > it's really useful.
> > > >
> > > > The metron-rest jar is already shaded.
> > > >
> > > > Reverse engineering the yarn jar command was the next thing I was going to
> > > > try. Will let you know how it goes.
> > > >
> > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > michael.miklav...@gmail.com> wrote:
> > > >
> > > > > In what order did you add the hadoop or yarn classpath? The "shaded"
> > > > > package stands out to me in this name:
> > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider".
> > > > > Maybe try adding those packages earlier on the classpath.
> > > > >
> > > > > I think that find command needs a "jar tvf", otherwise you're looking for
> > > > > a class name in jar file names.
> > > > >
> > > > > Have you tried shading the rest jar?
> > > > >
> > > > > I'd also look at the classpath you get when running "yarn jar" to start
> > > > > the existing pcap service, per the instructions in metron-api/README.md.
> > > > >
> > > > >
> > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrim...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > To explore the idea of merging metron-api into metron-rest and running
> > > > > > pcap queries inside our REST application, I created a simple test here:
> > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > A summary of what's included:
> > > > > >
> > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > - Added a pcap query controller endpoint at
> > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > >
> > > > > > Generate some pcap data using pycapa (
> > > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa )
> > > > > > and the pcap topology (
> > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > After this initial setup there should be data in HDFS at
> > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > already running full dev by building and deploying the metron-rest jar.
> > > > > > I did not rebuild full dev with this change but I would still expect it
> > > > > > to work. Let me know if it doesn't.
> > > > > >
> > > > > > The first error I see when I hit this endpoint is:
> > > > > >
> > > > > > java.lang.NoClassDefFoundError:
> > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > >
> > > > > > Here are the things I've tried so far:
> > > > > >
> > > > > > - Run the REST application with the YARN jar command since this is how
> > > > > > all our other YARN/MR-related applications are started (metron-api,
> > > > > > MAAS, pcap query, etc). I wouldn't expect this to work since we have
> > > > > > runtime dependencies on our shaded elasticsearch and parser jars and I'm
> > > > > > not aware of a way to add additional jars to the classpath with the YARN
> > > > > > jar command (is there a way?). Either way I get this error:
> > > > > >
> > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > java.lang.NullPointerException
> > > > > >
> > > > > >
> > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > > > > > script). I get this error:
> > > > > >
> > > > > > java.lang.ClassNotFoundException:
> > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > >
> > > > > >
> > > > > > - I searched for the class in the previous attempt but could not find
> > > > > > it in full dev:
> > > > > >
> > > > > > find / -name "*.jar" 2>/dev/null | xargs grep \
> > > > > >   org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider \
> > > > > >   2>/dev/null
> > > > > >
> > > > > >
> > > > > > - Further up in the stack trace I see the error happens when initializing
> > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I
> > > > > > get this error:
> > > > > >
> > > > > > Unable to parse
> > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > >
> > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > > > > > dependencies without any success:
> > > > > >   - hadoop-yarn-client
> > > > > >   - hadoop-yarn-common
> > > > > >   - hadoop-mapreduce-client-core
> > > > > >   - hadoop-yarn-server-common
> > > > > >   - hadoop-yarn-api
> > > > > >   - hbase-server
> > > > > >
> > > > > > I will keep exploring other possible solutions. Let me know if anyone
> > > > > > has any ideas.
> > > > > >
> > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwa...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I can imagine a new generic service(s) capability whose job ( pun
> > > > > > > intended ) is to abstract the submittal, tracking, and storage of
> > > > > > > results to yarn.
> > > > > > >
> > > > > > > It would be extended with storage providers, queue providers, possibly
> > > > > > > some set of policies or rather strategies.
> > > > > > >
> > > > > > > The pcap ‘report’ would be a client to that service, one that specializes
> > > > > > > the service operation for the way we want pcap to work.
> > > > > > >
> > > > > > > We can then re-use the generic service for other long running yarn
> > > > > > > things…..
> > > > > > >
> > > > > > >
> > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
> > > > > > >
> > > > > > > RE: Tracking v. users
> > > > > > >
> > > > > > > The submittal and tracking can associate the submitter with the yarn job
> > > > > > > and track that, regardless of the yarn credentials.
> > > > > > >
> > > > > > > I.e., if all submittals and monitoring are by the same yarn user ( Metron )
> > > > > > > from a single or co-operative set of services, that service can maintain
> > > > > > > the mapping.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrim...@gmail.com) wrote:
> > > > > > >
> > > > > > > Otto, your use case makes sense to me. We'll have to think about how to
> > > > > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > > > > submitted as the metron service user so YARN won't keep track of this for
> > > > > > > us. Is that assumption correct? Do you have any ideas for doing that?
> > > > > > >
> > > > > > > Mike, I can start a feature branch and experiment with merging metron-api
> > > > > > > into metron-rest. That should allow us to collaborate on any issues or
> > > > > > > challenges. Also, can you expand on your idea to manage external
> > > > > > > dependencies as a special module? That seems like a very attractive
> > > > > > > option to me.
> > > > > > >
> > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwa...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > stuff:
> > > > > > > >
> > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating a
> > > > > > > > report based on parameters.
> > > > > > > > That report is something that takes some time and an external process to
> > > > > > > > generate… i.e. you have to wait for it.
> > > > > > > >
> > > > > > > > I can almost imagine a flow where you:
> > > > > > > >
> > > > > > > > * Are in the AlertUI
> > > > > > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > > > > > > >   possibly picking from one or more report ‘templates’ that have query
> > > > > > > >   options etc
> > > > > > > > * The report request is ‘queued’, that is, dispatched to be
> > > > > > > >   executed/generated
> > > > > > > > * You as a user have a ‘queue’ of your report results, and when the
> > > > > > > >   report is done it is queued there
> > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > > > > > > >   info/meta has the yarn details )
> > > > > > > > * You can select the report from your queue and view it either in a new
> > > > > > > >   UI or custom component
> > > > > > > > * You can then apply a different ‘view’ to the report or work with the
> > > > > > > >   report data
> > > > > > > > * You can print / save etc
> > > > > > > > * You can associate the report with the alerts ( again in the report
> > > > > > > >   info ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > >
> > > > > > > >
> > > > > > > > We can introduce extensibility into the report templates, report views (
> > > > > > > > things that work with the json data of the report )
> > > > > > > >
> > > > > > > > Something like that.”
> > > > > > > >
> > > > > > > > Maybe we can do :
> > > > > > > >
> > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > yarn info + query info + alert context + yarn status => report info ->
> > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > report persistence added to report info
> > > > > > > > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> > > > > > > >
> > > > > > > >
> > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrim...@gmail.com) wrote:
> > > > > > > >
> > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > related but separate discussions focused.
> > > > > > > >
> > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsum...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > >
> > > > > > > > > - Backend:
> > > > > > > > >
> > > > > > > > > The current metron-api is totally separate; it would be logical to me to
> > > > > > > > > have it in the same place as the other REST APIs. Especially when more
> > > > > > > > > security is added, it will not be necessary to do the job twice.
> > > > > > > > > The current implementation sends back a pcap object which still needs to
> > > > > > > > > be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > frontend. It would be good to have this decoding happen directly on the
> > > > > > > > > backend so as not to create load on the frontend. An option would be to
> > > > > > > > > install tshark on the rest server and use it to convert the pcap to xml
> > > > > > > > > and then to json that will be sent to the frontend.
> > > > > > > > >
> > > > > > > > > I tried to start the map/reduce job directly from the rest server to
> > > > > > > > > search over all the pcap data and, as Ryan mentioned, we had trouble. I
> > > > > > > > > will try to find the error again.
> > > > > > > > >
> > > > > > > > > Then in the POC, what we tried was to use the pcap_query script, and this
> > > > > > > > > works fine. I just modified it so that it sends back the yarn job_id
> > > > > > > > > directly instead of waiting until the job is finished. This allows the
> > > > > > > > > UI and the rest server to know the status of the search by querying the
> > > > > > > > > yarn rest api, so the UI and the rest server can be async without any
> > > > > > > > > blocking phase. What do you think about that?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Having the job submitted directly from the code of the rest server would
> > > > > > > > > be perfect, but I think it will need a lot of investigation (but I'm not
> > > > > > > > > the expert so I might be completely wrong ^^).
> > > > > > > > >
> > > > > > > > > We know that the pcap_query script works fine, so why not call it? Is it
> > > > > > > > > that bad? (maybe a stupid question, but I really don’t see many
> > > > > > > > > drawbacks)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - Front end:
> > > > > > > > >
> > > > > > > > > Adding the pcap search to the alert UI is, I think, the easiest way to
> > > > > > > > > move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> > > > > > > > > Maybe the name of the UI should just change to something like
> > > > > > > > > “Monitoring & Investigation UI”?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > > > > > > > already had discussions on how you see the UI evolving with the new
> > > > > > > > > features that will come in the future?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - Microservices:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What do you mean exactly by microservices? Is it to separate all the
> > > > > > > > > features into different projects? Or something like having the different
> > > > > > > > > components in containers, e.g. on Kubernetes? (again maybe a stupid
> > > > > > > > > question, but I don’t clearly understand what you mean :) )
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Michel
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > --
> > simon elliston ball
> > @sireb
> >
>
