Re: [DISCUSS] Batch Profiler Feature Branch

Nick Allen Wed, 26 Sep 2018 15:26:59 -0700

Or support to offer for merging this feature branch into master?

On Wed, Sep 26, 2018 at 6:20 PM Nick Allen <n...@nickallen.org> wrote:


> Thanks for the review.  With https://github.com/apache/metron/pull/1209
> complete, I think the feature branch is ready to be merged.  Sounds like
> I have Mike's support.  Anyone else have comments, concerns, questions?
>
> On Tue, Sep 25, 2018 at 10:33 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>
>> I just made a couple of minor comments on that PR, and I am in agreement
>> about the readiness for merging with master. Good stuff, Nick.
>>
>> On Fri, Sep 21, 2018 at 12:37 PM Nick Allen <n...@nickallen.org> wrote:
>>
>> > Here is a PR that adds the input time constraints to the Batch Profiler
>> > (METRON-1787): https://github.com/apache/metron/pull/1209.
>> >
>> > It seems that the consensus is that this is probably the last feature we
>> > need before merging the FB into master.  The other two can wait until
>> > after the feature branch has been merged.  Let me know if you disagree.
>> >
>> > Thanks
>> >
>> >
>> > On Thu, Sep 20, 2018 at 1:55 PM Nick Allen <n...@nickallen.org> wrote:
>> >
>> > > Yeah, agreed.  Per use case 3, when deploying to production there
>> > > really wouldn't be a huge overlap like 3 months of already-profiled
>> > > data.  It's day 1; the profile was just deployed around the same time
>> > > as you are running the Batch Profiler, so the overlap is in minutes,
>> > > maybe hours.  But I can definitely see the usefulness of the feature
>> > > for re-runs, etc., as you have described.
>> > >
>> > > Based on this discussion, I created a few JIRAs.  Thanks all for the
>> > > great feedback and keep it coming.
>> > >
>> > > [1] METRON-1787 - Input Time Constraints for Batch Profiler
>> > > [2] METRON-1788 - Fetch Profile Definitions from Zk for Batch Profiler
>> > > [3] METRON-1789 - MPack Should Define Default Input Path for Batch Profiler
>> > >
>> > >
>> > > --
>> > > [1] https://issues.apache.org/jira/browse/METRON-1787
>> > > [2] https://issues.apache.org/jira/browse/METRON-1788
>> > > [3] https://issues.apache.org/jira/browse/METRON-1789
>> > >
>> > > On Thu, Sep 20, 2018 at 1:34 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>> > >
>> > >> I think we might want to allow the flexibility to choose the date range
>> > >> then. I don't yet feel like I have a good enough understanding of all
>> > >> the ways in which users would want to seed to force them to run the
>> > >> batch job over all the data. It might also make it easier to deal with
>> > >> remediation, i.e., an error doesn't force you to re-run over the entire
>> > >> history. Same goes for testing out the profile seeding batch job in the
>> > >> first place.
>> > >>
>> > >> On Thu, Sep 20, 2018 at 11:23 AM Nick Allen <n...@nickallen.org> wrote:
>> > >>
>> > >> > Assuming you have 9 months of data archived, yes.
>> > >> >
>> > >> > On Thu, Sep 20, 2018 at 1:22 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>> > >> >
>> > >> > > So in the case of 3: if you had 6 months of data that hadn't been
>> > >> > > profiled and another 3 that had been profiled (9 months total data),
>> > >> > > in its current form the batch job runs over all 9 months?
>> > >> > >
>> > >> > > On Thu, Sep 20, 2018 at 11:13 AM Nick Allen <n...@nickallen.org> wrote:
>> > >> > >
>> > >> > > > > How do we establish "tm" from 1.1 above? Any concerns about overlap
>> > >> > > > > or gaps after the seeding is performed?
>> > >> > > >
>> > >> > > > Good point.  Right now, if the Streaming and Batch Profiler overlap,
>> > >> > > > the last write wins.  And presumably the outputs of the Streaming
>> > >> > > > and Batch Profiler are the same, so no worries, right? :)
>> > >> > > >
>> > >> > > > So it kind of works, but it is definitely not ideal for use case 3.
>> > >> > > > I could add --begin and --end args to constrain the time frame over
>> > >> > > > which the Batch Profiler runs.  I do not have that in the feature
>> > >> > > > branch.  It would be easy enough to add, though.
>> > >> > > >
>> > >> > > >
>> > >> > > >
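The --begin/--end idea could also address the overlap-or-gap question if the time window is half-open. A minimal sketch, assuming archived telemetry is newline-delimited JSON with an epoch-millisecond "timestamp" field: a batch run over [t0, tm) and a Streaming Profiler that takes over at tm would then share no messages and leave none unprofiled, assuming the Streaming Profiler sees every message from tm on. The functions in_window and filter_window are hypothetical illustrations, not Batch Profiler options; a real implementation would filter inside the Spark job rather than in shell.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not part of the Batch Profiler: keep only messages
# whose epoch-millis "timestamp" lies in the half-open window [begin, end).

# Succeeds (exit status 0) when begin <= ts < end.
in_window() {
  local ts=$1 begin=$2 end=$3
  [ "$ts" -ge "$begin" ] && [ "$ts" -lt "$end" ]
}

# Reads one JSON message per line on stdin and prints the in-window ones.
# Assumes a flat, numeric "timestamp" field; lines without one are skipped.
filter_window() {
  local begin=$1 end=$2 line ts
  while IFS= read -r line; do
    ts=$(printf '%s\n' "$line" |
      sed -n 's/.*"timestamp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    if [ -n "$ts" ] && in_window "$ts" "$begin" "$end"; then
      printf '%s\n' "$line"
    fi
  done
}
```

For example, filter_window "$T0" "$TM" < telemetry.json (the variables and file name are hypothetical) would keep only the messages before tm. One caveat the sketch ignores: if tm falls in the middle of a profile period, each profiler sees only part of that period and the last write wins, so tm should probably be rounded to a period boundary.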
>> > >> > > > On Thu, Sep 20, 2018 at 12:41 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > Ok, makes sense. That's sort of what I was thinking as well, Nick.
>> > >> > > > > Pulling at this thread just a bit more...
>> > >> > > > >
>> > >> > > > >    1. I have an existing system that's been up a while, and I have
>> > >> > > > >       added k profiles; assume these are the first profiles I've
>> > >> > > > >       created.
>> > >> > > > >       1. I would have t0 - tm (where m is the time when the
>> > >> > > > >          profiles were first installed) worth of data that has
>> > >> > > > >          not been profiled yet.
>> > >> > > > >       2. The batch profiler process would be to take that exact
>> > >> > > > >          profile definition from ZK and run the batch loader
>> > >> > > > >          with that from the CLI.
>> > >> > > > >       3. Profiles are now up to date from t0 - tCurrent.
>> > >> > > > >    2. I've already done #1 above. Time goes by and now I want to
>> > >> > > > >       add a new profile.
>> > >> > > > >       1. Same first step as above.
>> > >> > > > >       2. I would run the batch loader with *only* that new
>> > >> > > > >          profile definition to seed?
>> > >> > > > >
>> > >> > > > > Forgive me if I missed this in PRs and discussion in the FB, but
>> > >> > > > > how do we establish "tm" from 1.1 above? Any concerns about overlap
>> > >> > > > > or gaps after the seeding is performed?
>> > >> > > > >
>> > >> > > > > On Thu, Sep 20, 2018 at 10:26 AM Nick Allen <n...@nickallen.org> wrote:
>> > >> > > > >
>> > >> > > > > > I think more often than not, you would want to load your profile
>> > >> > > > > > definition from a file.  This is why I considered the 'load from
>> > >> > > > > > Zk' more of a nice-to-have.
>> > >> > > > > >
>> > >> > > > > >    - In use cases 1 and 2, this would definitely be the case.  The
>> > >> > > > > >      profiles I am working with are speculative and I am using
>> > >> > > > > >      the batch profiler to determine if they are worth keeping.
>> > >> > > > > >      In this case, my speculative profiles would not be in Zk
>> > >> > > > > >      (yet).
>> > >> > > > > >    - In use case 3, I could see it go either way.  It might be
>> > >> > > > > >      useful to load from Zk, but it certainly isn't a blocker.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > > So if the config does not correctly match the profiler config
>> > >> > > > > > > held in ZK and the user runs the batch seeding job, what happens?
>> > >> > > > > >
>> > >> > > > > > You would just get a profile that is slightly different over the
>> > >> > > > > > entire time span.  This is not a new risk.  If the user changes
>> > >> > > > > > their Profile definitions in Zk, the same thing would happen.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > On Thu, Sep 20, 2018 at 12:15 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>> > >> > > > > >
>> > >> > > > > > > I think I'm torn on this, specifically because it's batch and
>> > >> > > > > > > would generally be run as-needed. Justin, can you elaborate on
>> > >> > > > > > > your concerns there? This feels functionally very similar to our
>> > >> > > > > > > flat file loaders, which all take their config from the CLI only.
>> > >> > > > > > > On the other hand, our flat file loaders are not typically
>> > >> > > > > > > seeding an existing structure. My concern about a local file
>> > >> > > > > > > profiler config stems from this stated goal:
>> > >> > > > > > > > The goal would be to enable “profile seeding” which allows
>> > >> > > > > > > > profiles to be populated from a time before the profile was
>> > >> > > > > > > > created.
>> > >> > > > > > > So if the config does not correctly match the profiler config
>> > >> > > > > > > held in ZK and the user runs the batch seeding job, what happens?
>> > >> > > > > > >
>> > >> > > > > > > On Thu, Sep 20, 2018 at 10:06 AM Justin Leet <justinjl...@gmail.com> wrote:
>> > >> > > > > > >
>> > >> > > > > > > > The profiler not being able to read from ZK feels like a fairly
>> > >> > > > > > > > substantial, if subtle, set of potential problems.  I'd like to
>> > >> > > > > > > > see that in, either before merging or at least pretty soon
>> > >> > > > > > > > after merging.  Is it a lot of work to add that functionality
>> > >> > > > > > > > based on where things are right now?
>> > >> > > > > > > >
>> > >> > > > > > > > On Thu, Sep 20, 2018 at 9:59 AM Nick Allen <n...@nickallen.org> wrote:
>> > >> > > > > > > >
>> > >> > > > > > > > > Here is another limitation that I just thought of.  It can
>> > >> > > > > > > > > only read a profile definition from a file.  It probably also
>> > >> > > > > > > > > makes sense to add an option that allows it to read the
>> > >> > > > > > > > > current Profiler configuration from Zookeeper.
>> > >> > > > > > > > >
>> > >> > > > > > > > >
>> > >> > > > > > > > > > Is it worth setting up a default config that pulls from the
>> > >> > > > > > > > > > main indexing output?
>> > >> > > > > > > > >
>> > >> > > > > > > > > Yes, I think that makes sense.  We want the Batch Profiler to
>> > >> > > > > > > > > point to the right HDFS URL, no matter where/how Metron is
>> > >> > > > > > > > > deployed.  When Metron gets spun up on a cluster, I should be
>> > >> > > > > > > > > able to just run the Batch Profiler without having to fuss
>> > >> > > > > > > > > with the input path.
>> > >> > > > > > > > >
>> > >> > > > > > > > > On Thu, Sep 20, 2018 at 9:46 AM Justin Leet <justinjl...@gmail.com> wrote:
>> > >> > > > > > > > >
>> > >> > > > > > > > > > Re:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > >  * You do not configure the Batch Profiler in Ambari.  It is
>> > >> > > > > > > > > > >    configured and executed completely from the command-line.
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Is it worth setting up a default config that pulls from the
>> > >> > > > > > > > > > main indexing output?  I'm a little on the fence about it,
>> > >> > > > > > > > > > but it seems like making the most common case more or less
>> > >> > > > > > > > > > built-in would be nice.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Having said that, I do not consider that a requirement for
>> > >> > > > > > > > > > merging the feature branch.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > On Wed, Sep 19, 2018 at 11:23 AM James Sirota <jsir...@apache.org> wrote:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > I think what you have outlined above is a good initial stab
>> > >> > > > > > > > > > > at the feature.  Manual install of Spark is not a big deal.
>> > >> > > > > > > > > > > Configuring via command line while we mature this feature is
>> > >> > > > > > > > > > > ok as well.  The configuration steps don't look too hard.  I
>> > >> > > > > > > > > > > think you should merge.
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > James
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > 19.09.2018, 08:15, "Nick Allen" <n...@nickallen.org>:
>> > >> > > > > > > > > > > > I would like to open a discussion to get the Batch Profiler
>> > >> > > > > > > > > > > > feature branch merged into master as part of METRON-1699 [1]
>> > >> > > > > > > > > > > > Create Batch Profiler. All of the work that I had in mind for
>> > >> > > > > > > > > > > > our first draft of the Batch Profiler has been completed.
>> > >> > > > > > > > > > > > Please take a look through what I have and let me know if
>> > >> > > > > > > > > > > > there are other features that you think are required *before*
>> > >> > > > > > > > > > > > we merge.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > Previous list discussions on this topic include [2] and [3].
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > (Q) What can I do with the feature branch?
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * With the Batch Profiler, you can backfill/seed profiles using
>> > >> > > > > > > > > > > >     archived telemetry. This enables the following types of use
>> > >> > > > > > > > > > > >     cases.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       1. As a Security Data Scientist, I want to understand the
>> > >> > > > > > > > > > > >          historical behaviors and trends of a profile that I have
>> > >> > > > > > > > > > > >          created so that I can determine if I have created a
>> > >> > > > > > > > > > > >          feature set that has predictive value for model
>> > >> > > > > > > > > > > >          building.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       2. As a Security Data Scientist, I want to understand the
>> > >> > > > > > > > > > > >          historical behaviors and trends of a profile that I have
>> > >> > > > > > > > > > > >          created so that I can determine if I have defined the
>> > >> > > > > > > > > > > >          profile correctly and created a feature set that
>> > >> > > > > > > > > > > >          matches reality.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       3. As a Security Platform Engineer, I want to generate a
>> > >> > > > > > > > > > > >          profile using archived telemetry when I deploy a new
>> > >> > > > > > > > > > > >          model to production so that models depending on that
>> > >> > > > > > > > > > > >          profile can function on day 1.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * METRON-1699 [1] includes a more detailed description of the
>> > >> > > > > > > > > > > >     feature.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > (Q) What work was completed?
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * The Batch Profiler runs on Spark and was implemented in Java
>> > >> > > > > > > > > > > >     to remain consistent with our current Java-heavy code base.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * The Batch Profiler is executed from the command line. It can
>> > >> > > > > > > > > > > >     be launched using a script or by calling `spark-submit`,
>> > >> > > > > > > > > > > >     which may be useful for advanced users.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * Input telemetry can be consumed from multiple sources; for
>> > >> > > > > > > > > > > >     example, HDFS or the local file system.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * Input telemetry can be consumed in multiple formats; for
>> > >> > > > > > > > > > > >     example, JSON or ORC.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * The 'output' profile measurements are persisted in HBase and
>> > >> > > > > > > > > > > >     are consistent with the Storm Profiler.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * It can be run on any underlying engine supported by Spark. I
>> > >> > > > > > > > > > > >     have tested it both in 'local' mode and on a YARN cluster.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * It is installed automatically by the Metron MPack.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * A README was added that documents usage instructions.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * The existing Profiler code was refactored so that as much
>> > >> > > > > > > > > > > >     code as possible is shared between the 3 Profiler ports:
>> > >> > > > > > > > > > > >     Storm, the Stellar REPL, and Spark. For example, the logic
>> > >> > > > > > > > > > > >     which determines the timestamp of a message was refactored
>> > >> > > > > > > > > > > >     so that it could be reused by all ports.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       * metron-profiler-common: The common Profiler code shared
>> > >> > > > > > > > > > > >         among all ports.
>> > >> > > > > > > > > > > >       * metron-profiler-storm: Profiler on Storm
>> > >> > > > > > > > > > > >       * metron-profiler-spark: Profiler on Spark
>> > >> > > > > > > > > > > >       * metron-profiler-repl: Profiler on the Stellar REPL
>> > >> > > > > > > > > > > >       * metron-profiler-client: The client code for retrieving
>> > >> > > > > > > > > > > >         profile data; for example, PROFILE_GET.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * There are 3 separate RPM and DEB packages now created for
>> > >> > > > > > > > > > > >     the Profiler.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       * metron-profiler-storm-*.rpm
>> > >> > > > > > > > > > > >       * metron-profiler-spark-*.rpm
>> > >> > > > > > > > > > > >       * metron-profiler-repl-*.rpm
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * The Profiler integration tests were enhanced to leverage the
>> > >> > > > > > > > > > > >     Profiler Client logic to validate the results.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * Review METRON-1699 [1] for a complete breakdown of the tasks
>> > >> > > > > > > > > > > >     that have been completed on the feature branch.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > (Q) What limitations exist?
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * You must manually install Spark to use the Batch Profiler.
>> > >> > > > > > > > > > > >     The Metron MPack does not treat Spark as a Metron dependency
>> > >> > > > > > > > > > > >     and so does not install it automatically.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * You do not configure the Batch Profiler in Ambari. It is
>> > >> > > > > > > > > > > >     configured and executed completely from the command line.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >   * To run the Batch Profiler in 'Full Dev', you have to take the
>> > >> > > > > > > > > > > >     following manual steps. Some of these are arguably
>> > >> > > > > > > > > > > >     limitations of how Ambari installs Spark 2 in the version of
>> > >> > > > > > > > > > > >     HDP that we run.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       1. Install Spark 2 using Ambari.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       2. Tell Spark how to talk with HBase.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >         SPARK_HOME=/usr/hdp/current/spark2-client
>> > >> > > > > > > > > > > >         cp /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       3. Create the Spark History directory in HDFS.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >         export HADOOP_USER_NAME=hdfs
>> > >> > > > > > > > > > > >         hdfs dfs -mkdir /spark2-history
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >       4. Change the default input path to `hdfs://localhost:8020/...`
>> > >> > > > > > > > > > > >          to match the port defined by HDP, instead of port 9000.
>> > >> > > > > > > > > > > >
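Step 4 is a plain string rewrite. The thread does not say where the default input path is stored or what follows the host and port, so this minimal sketch only rewrites a path handed to it; fix_input_port is a hypothetical name, not part of the MPack or the Batch Profiler.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 4 (not part of Metron): point an input path
# that uses port 9000 at port 8020, the port HDP defines in Full Dev.
# Any other path is printed unchanged.
fix_input_port() {
  printf '%s\n' "$1" | sed 's#^hdfs://localhost:9000/#hdfs://localhost:8020/#'
}
```

For example, fix_input_port "hdfs://localhost:9000/some/path" prints hdfs://localhost:8020/some/path, where some/path is only a placeholder.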
>> > >> > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/METRON-1699
>> > >> > > > > > > > > > > > [2] https://lists.apache.org/thread.html/da81c1227ffda3a47eb2e5bb4d0b162dd6d36006241c4ba4b659587b@%3Cdev.metron.apache.org%3E
>> > >> > > > > > > > > > > > [3] https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.metron.apache.org%3E
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > -------------------
>> > >> > > > > > > > > > > Thank you,
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > James Sirota
>> > >> > > > > > > > > > > PMC- Apache Metron
>> > >> > > > > > > > > > > jsirota AT apache DOT org
