- I agree Gradle is better than "yet another script".

- And a docker container, even if suboptimal, as a thin wrapper to the gradle
container is all you need to deliver the data-generator to the masses so
they can try it w/ zero startup cost.



On Mon, Aug 31, 2015 at 7:36 PM, Konstantin Boudnik <[email protected]> wrote:

> Why would we need yet another script (and potentially an extra readme to
> explain its command options) when we have gradle?
>
> Cos
>
> On Mon, Aug 31, 2015 at 10:51PM, Olaf Flebbe wrote:
> > +1 to the CLI /shell script interface.
> >
> > If I can choose, I would like to have an apt-get install bigtop-datagenerator,
> > running for instance
> >
> > bigtop-data-generator outputDir nStores nCustomers nPurchasingModels
> > simulationLength seed
> >
> > I can help out with packaging if needed.
> >
> > Why should we use the docker indirection for a plain CLI file? Of
> > course, we can provide a trivial Dockerfile to create a container supplying
> > a JVM and running the CLI ... But I do not like our services to depend on
> > the docker registry more than they do now.
> >
> > Olaf
> >
> >
> >
> > > On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> > >
> > > I very much like the shell script wrapper and docker image idea, since
> > > that way we can integrate it directly with the bigtop provisioner, which
> > > yields a perfect UX for the whole thing. I think it's not too hard to do
> > > both; we just need to add a parameter to turn the script into daemon
> > > mode. I see lots of images doing it this way.
> > >
> > > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output
> > > data-dir --etc  foo --etc bar --daemon
> > > On Aug 31, 2015 at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> > >
> > >> The BigPetStore, Bazaar, and weather data generators have
> single-threaded
> > >> command-line interfaces.  We could do the same with the smaller
> generators
> > >> (names, locations, etc.) if there is interest.
> > >>
> > >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <
> [email protected]>
> > >> wrote:
> > >>
> > >>> Nate: Good idea to abstract the interface one level higher....
> > >>>
> > >>> How about a docker run command? That is probably the easiest way for
> > >>> Linux folks to run one-off Java apps nowadays.
> > >>>
> > >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB
> --output
> > >>> data-dir --etc  foo --etc bar
> > >>>
> > >>> I'm happy to curate such a docker image, I already am doing something
> > >> like
> > >>> this in kube for bigtop-transaction-queue, which continuously pumps
> data
> > >>> generator outputs into a REST endpoint or file
> > >>> Queue... So it could be extended to support other generators.
> > >>>
> > >>>
> > >>>> <[email protected]> wrote:
> > >>>>
> > >>>> Could picture at some point supporting something like this for
> non-jvm
> > >>> folk just looking for test/demo data:
> > >>>>
> > >>>> apt-get install bigtop-data-gen
> > >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir
> > >>> --etc  foo --etc bar
> > >>>>
> > >>>>
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: jay vyas [mailto:[email protected]]
> > >>>> Sent: Sunday, August 30, 2015 5:11 PM
> > >>>> To: [email protected]
> > >>>> Subject: Re: Proposal for "BigTop Data Generators"
> > >>>>
> > >>>> Hola nate.  Well, here are the Use cases I know of that I have used
> the
> > >>> data generators for.
> > >>>>
> > >>>> Dockerfile:
> > >>>>
> > >>>> (1) for testing kubernetes.  For this, I just use transaction-queue
> > >>> docker file.
> > >>>> (2) for testing GlusterFS small file workloads, maybe with other
> > >>> analytics tools...
> > >>>>
> > >>>> Maven repo
> > >>>>
> > >>>> (3) Java mapreduce/ignite/spark applications, which can just add a mvn
> > >>>> repo when compiling.  Java developers never add jars through RPM repos.
> > >>>>
> > >>>> RPM/DEB packages:
> > >>>>
> > >>>> I could see people using an RPM/DEB data generator, and I'm not
> against
> > >>> it.  But I simply don't know of any real world projects which
> *currently*
> > >>> need RPM/Deb packages, which is why I haven't bothered to propose it
> as a
> > >>> requirement.  Nevertheless linux packages are always a welcome
> addition
> > >> if
> > >>> someone wants to create em !
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > >>>>>
> > >>>>> Would container be in addition to deb/rpm, or instead of?  If
> latter
> > >>>>> can we do deb/rpm as base then have container either created from
> them
> > >>>>> or directly from artifacts?
> > >>>>>
> > >>>>> On the test usage side, it seems we could probably break up tests
> > >>>>> into base/required and then optional/add-on tests/test-suites.  I
> > >>>>> think I remember seeing mention of certain tests that fail at times
> > >>>>> on certain component(s) in the core builds anyway but don't mean
> > >>>>> that the build is broken, so it would make sense to have some
> > >>>>> cleanup around those anyway.
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: RJ Nowling [mailto:[email protected]]
> > >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> > >>>>>
> > >>>>> I agree with the above. :)
> > >>>>>
> > >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas
> > >>>>> <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi RJ.
> > >>>>>>
> > >>>>>> Maven repositories and docker containers for the transaction queue
> > >>>>>> are good enough IMO.  That will give people a way to compose them
> in
> > >>>>>> different idioms (one for Java folks, another for broader Linux
> > >>>>>> audience
> > >>>>> ).
> > >>>>>>
> > >>>>>> I think the lib designs are fairly intuitive.  I would say that we
> > >>>>>> should constrain them all to being written in Java or Groovy to
> keep
> > >>>>>> the bigtop theme of "JVM for everything" :).
> > >>>>>>
> > >>>>>> Any particular questions you have around technical design can be
> > >>>>>> followed in a JIRA or else maybe a Readme spec that goes in a  top
> > >>>>>> level of the data-generators dir...
> > >>>>>>
> > >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]>
> wrote:
> > >>>>>>>
> > >>>>>>> I'd like to keep this conversation going.
> > >>>>>>>
> > >>>>>>> So here are a few discussion points:
> > >>>>>>>
> > >>>>>>> 1. How do we want to make the data generators available?  Maven?
> > >>>>>>> RPMs
> > >>>>>> and
> > >>>>>>> Debs?
> > >>>>>>>
> > >>>>>>> For now, I'm using a gradle multi-project build to easily build
> > >>>>>>> and
> > >>>>>> install
> > >>>>>>> the BPS data generators and its libraries into a local maven
> repo.
> > >>>>>>> This makes development easy.  Eventually, I would like to post
> > >>>>>>> binaries
> > >>>>>> through
> > >>>>>>> Maven for easy integration by users.  RPMs / Debs could be
> > >>>>>>> interesting since I use a pattern where the data generators are
> > >>>>>>> libraries (to support application integration / parallelization
> by
> > >>>>>>> the host framework) but also provide CLI drivers for local
> testing.
> > >>>>>>>
> > >>>>>>> 2.  The idea of using the data generators as part of the smoke
> > >>>>>>> tests came up.  Since there is concern about making the data
> > >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> > >>>>>>> as optional smoke tests.  Would that be a good compromise?
> > >>>>>>>
> > >>>>>>> 3.  How will they be maintained?
> > >>>>>>>
> > >>>>>>> I'll certainly add myself to the maintainers list and will be
> > >>>>>>> taking responsibility.  I'm happy to have others help as well if
> > >>>>>>> anyone wants to
> > >>>>>>> -- if not, that's cool, too.
> > >>>>>>>
> > >>>>>>> 4. Is anyone interested at all in discussing library APIs and
> > >> designs?
> > >>>>>>> What about internal interfaces and such?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> My plan was to add at least one more data generator (weather
> > >>>>>>> simulator)
> > >>>>>> to
> > >>>>>>> bigtop-data-generators in the short term.  However, given the
> > >>>>>>> concerns raised by Cos (more discussion needed) and Olaf (don't
> > >>>>>>> want to force data generators on unsuspecting users ;) ), I would
> > >>>>>>> like to reach some
> > >>>>>> consensus
> > >>>>>>> on what people are concerned about and solutions.
> > >>>>>>>
> > >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik
> > >>>>>>> <[email protected]>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > >>>>>> created,
> > >>>>>>>> so
> > >>>>>>>> we have a way to connect one to another ;)
> > >>>>>>>>
> > >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I am not confident that moving important design discussions
> with
> > >>>>>>>>> impact
> > >>>>>>>> to
> > >>>>>>>>> the whole project to jira is a good idea.
> > >>>>>>>>>
> > >>>>>>>>> In the current JIRA Traffic storm it is not easy to identify
> and
> > >>>>>>>>> follow
> > >>>>>>>> important tickets.
> > >>>>>>>>>
> > >>>>>>>>> Please keep discussions on the list or at least, please state
> on
> > >>>>>>>>> this
> > >>>>>>>> list which Ticket to follow ...
> > >>>>>>>>>
> > >>>>>>>>> Olaf
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Nice to have data generators in Bigtop.
> > >>>>>>>>>>>
> > >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> > >>>>>>>>>>> package is mandatory. Not everyone needs a data generator .
> > >>>>>>>>>>
> > >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>>>>>>>
> > >>>>>>>>>>> Olaf
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
> > >>>>>>>>>>>> step, then apps can just include it as needed...?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'm not against packaging if someone wants packages for
> this.
> > >>>>>>>>>>>> Maybe
> > >>>>>>>> even include it in bigtop util ?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Let's move to jira,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik
> > >>>>>>>>>>>>> <[email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It is pretty cool indeed!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> > >>>>>>>>>>>>> - easy to access/use from other components wherever it is
> > >>>>>>>>>>>>> needed
> > >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I guess one possible way would be to implement the
> generator
> > >>>>>>>>>>>>> as a
> > >>>>>>>> set of maven
> > >>>>>>>>>>>>> artifacts, that could be installed/consumed transparently
> by
> > >>>>>>>>>>>>> just
> > >>>>>>>> declaring a
> > >>>>>>>>>>>>> dependency e.g as proposed via top-level component.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Another way is to have a new package like we do for
> > >>>>>>>>>>>>> bigtop-utils
> > >>>>>>>> and such.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA or shall we
> > >>>>>>>> continue on the
> > >>>>>>>>>>>>> dev@ ??
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cos
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>>>>>>>> Hi BigTop,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I had a discussion with Jay yesterday, we'd like to
> propose
> > >>>>>>>>>>>>>> a new
> > >>>>>>>> component
> > >>>>>>>>>>>>>> for BigTop: BigTop Data Generators.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > >>>>>>>>>>>>>> libraries
> > >>>>>>>> for
> > >>>>>>>>>>>>>> building data generators and three example data
> generators:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from
> > >>>>>>>>>>>>>> BigPetStore)
> > >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > >>>>>>>>>>>>>> booths
> > >>>>>>>> on a
> > >>>>>>>>>>>>>> showroom floor, at a conference, or at a mall
> > >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > >>>>>>>> (temperature, wind
> > >>>>>>>>>>>>>> speed, wind chill, rainfall, etc.) per zip code.  (From a
> > >>>>>>>>>>>>>> model
> > >>>>>>>> trained on
> > >>>>>>>>>>>>>> NOAA historical weather data)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> We believe that creating a common set of libraries will
> > >>>>>>>>>>>>>> have
> > >>>>>>>> several
> > >>>>>>>>>>>>>> benefits including:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Easier for others to build their own data generators
> > >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> > >>>>>>>>>>>>>> * Share improvements across the data generators
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> More details on the libraries are below.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> BigPetStore will continue to focus on building and
> > >>>>>>>>>>>>>> maintaining blueprints, powered by the BigTop Data Generators.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop
> > >>>>>>>>>>>>>> for tools
> > >>>>>>>> for
> > >>>>>>>>>>>>>> building better, more comprehensive blueprints.  We want
> to
> > >>>>>>>> support these
> > >>>>>>>>>>>>>> efforts through data generators and the initial set of
> > >>>>>>>>>>>>>> blueprints we've been building.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> If the community is generally in support of this, I can
> > >>>>>>>>>>>>>> create a
> > >>>>>>>> top-level
> > >>>>>>>>>>>>>> "bigtop-data-generators" directory and put the data
> > >>>>>>>>>>>>>> generators and libraries in there.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> RJ
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> -------
> > >>>>>>>>>>>>>> Library details:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various
> > >>>>>>>>>>>>>> samplers
> > >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating
> > >>>>>>>>>>>>>> names
> > >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip
> > >>>>>>>>>>>>>> codes, their GPS coordinates, median household incomes, and
> > >>>>>>>>>>>>>> population sizes
> > >>>>>>>>>>>>>> * Product generator -- library for enumerating products
> > >>>>>>>>>>>>>> from a specification file.  Comes with default
> > >>>>>>>>>>>>>> specifications for
> > >>>>>>>> BigPetStore
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> > >>>>>>>>>>>>>> * Latent factor model generation -- generate latent
> > >>>>>>>>>>>>>> factors and customer weights to create something like
> > >>> MovieLens data.
> > >>>>>>>>>>>>>> Used in
> > >>>>>>>> Bazaar
> > >>>>>>>>>>>>>> for booth preferences and potentially in BigPetStore for
> > >>>>>>>>>>>>>> customer
> > >>>>>>>> item
> > >>>>>>>>>>>>>> preferences
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > >>>>>>>>>>>>>> generator
> > >>>>>>>> but the
> > >>>>>>>>>>>>>> other generators have been refactored to be based off the
> > >>>>>>>>>>>>>> standard
> > >>>>>>>> set of
> > >>>>>>>>>>>>>> libraries.
> > >>>>
> > >>>>
> > >>>> --
> > >>>> jay vyas
> > >>>>
> > >>>
> > >>
> >
>
>
>


-- 
jay vyas
