Hi RJ. Maven repositories and Docker containers for the transaction queue are good enough IMO. That will give people a way to compose them in different idioms (one for Java folks, another for a broader Linux audience).
I think the lib designs are fairly intuitive. I would say that we should constrain them all to being written in Java or Groovy to keep the bigtop theme of "JVM for everything" :). Any particular questions you have around technical design can be followed up in a JIRA, or else maybe in a README spec that goes in the top level of the data-generators dir...

> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
>
> I'd like to keep this conversation going.
>
> So here are a few discussion points:
>
> 1. How do we want to make the data generators available? Maven? RPMs and
> Debs?
>
> For now, I'm using a gradle multi-project build to easily build and install
> the BPS data generators and their libraries into a local maven repo. This
> makes development easy. Eventually, I would like to post binaries through
> Maven for easy integration by users. RPMs / Debs could be interesting
> since I use a pattern where the data generators are libraries (to support
> application integration / parallelization by the host framework) but also
> provide CLI drivers for local testing.
>
> 2. The idea of using the data generators as part of the smoke tests came
> up. Since there is concern about making the data generators required, we
> could offer the blueprints (BigPetStore) as optional smoke tests. Would
> that be a good compromise?
>
> 3. How will they be maintained?
>
> I'll certainly add myself to the maintainers list and will be taking
> responsibility. I'm happy to have others help as well if anyone wants to
> -- if not, that's cool, too.
>
> 4. Is anyone interested at all in discussing library APIs and designs?
> What about internal interfaces and such?
>
>
> My plan was to add at least one more data generator (weather simulator) to
> bigtop-data-generators in the short term.
> However, given the concerns
> raised by Cos (more discussion needed) and Olaf (don't want to force data
> generators on unsuspecting users ;) ), I would like to reach some consensus
> on what people are concerned about and solutions.
>
> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik <[email protected]> wrote:
>
>> Fine by me. I have linked this thread to the JIRA ticket that RJ created, so
>> we have a way to connect one to another ;)
>>
>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
>>> Hi,
>>>
>>> I am not confident that moving important design discussions with impact on
>>> the whole project to JIRA is a good idea.
>>>
>>> In the current JIRA traffic storm it is not easy to identify and follow important tickets.
>>>
>>> Please keep discussions on the list or, at least, please state on this list which ticket to follow ...
>>>
>>> Olaf
>>>
>>>
>>>
>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
>>>>
>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
>>>>> Hi,
>>>>>
>>>>> Nice to have data generators in Bigtop.
>>>>>
>>>>> But please do not include it in bigtop_utils, since this package is
>>>>> mandatory. Not everyone needs a data generator.
>>>>
>>>> Yup. And let's move further design discussion to the JIRA!
>>>>
>>>>> Olaf
>>>>>
>>>>>
>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
>>>>>>
>>>>>> Publishing the jar to Bigtop's Maven is probably a good first step. Then apps can just include it as needed...?
>>>>>>
>>>>>> I'm not against packaging if someone wants packages for this. Maybe even include it in bigtop util?
>>>>>>
>>>>>> Let's move to JIRA,
>>>>>>
>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
>>>>>>>
>>>>>>> It is pretty cool indeed!
>>>>>>>
>>>>>>> I wonder how it needs to be structured so that it:
>>>>>>> - is easy to access/use from other components wherever it is needed
>>>>>>> - doesn't interfere with the rest of the stack
>>>>>>>
>>>>>>> I guess one possible way would be to implement the generator as a set of maven
>>>>>>> artifacts that could be installed/consumed transparently by just declaring a
>>>>>>> dependency, e.g. as proposed via a top-level component.
>>>>>>>
>>>>>>> Another way is to have a new package like we do for bigtop-utils and such.
>>>>>>>
>>>>>>> Perhaps this discussion should be moved to JIRA or shall we continue on the
>>>>>>> dev@ ??
>>>>>>>
>>>>>>> Cos
>>>>>>>
>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
>>>>>>>> Hi BigTop,
>>>>>>>>
>>>>>>>> I had a discussion with Jay yesterday, and we'd like to propose a new component
>>>>>>>> for BigTop: BigTop Data Generators.
>>>>>>>>
>>>>>>>> BigTop Data Generators would consist of a common set of libraries for
>>>>>>>> building data generators and three example data generators:
>>>>>>>>
>>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with booths on a
>>>>>>>>   showroom floor, at a conference, or at a mall
>>>>>>>> * BigTop Weatherman -- stochastic weather simulation (temperature, wind
>>>>>>>>   speed, wind chill, rainfall, etc.) per zip code. (From a model trained on
>>>>>>>>   NOAA historical weather data)
>>>>>>>>
>>>>>>>> We believe that creating a common set of libraries will have several
>>>>>>>> benefits including:
>>>>>>>>
>>>>>>>> * Easier for others to build their own data generators
>>>>>>>> * Make data generators smaller and easier to maintain
>>>>>>>> * Share improvements across the data generators
>>>>>>>>
>>>>>>>> More details on the libraries are below.
>>>>>>>>
>>>>>>>> BigPetStore will continue to focus on building and maintaining
>>>>>>>> blueprints, powered by the BigTop Data Generators.
>>>>>>>>
>>>>>>>> Our vision is that we get all of Apache coming to BigTop for tools for
>>>>>>>> building better, more comprehensive blueprints. We want to support these
>>>>>>>> efforts through data generators and the initial set of blueprints we've been
>>>>>>>> building.
>>>>>>>>
>>>>>>>> If the community is generally in support of this, I can create a top-level
>>>>>>>> "bigtop-data-generators" directory and put the data generators and
>>>>>>>> libraries in there.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> RJ
>>>>>>>>
>>>>>>>>
>>>>>>>> -------
>>>>>>>> Library details:
>>>>>>>>
>>>>>>>> So far, I've extracted the following common libraries:
>>>>>>>>
>>>>>>>> * Samplers -- provides classes for PDFs and various samplers
>>>>>>>> * Name generator -- data set and samplers for generating names
>>>>>>>> * Location data set -- data set and classes for US zip codes, their
>>>>>>>>   GPS coordinates, median household incomes, and population sizes
>>>>>>>> * Product generator -- library for enumerating products from a
>>>>>>>>   specification file. Comes with default specifications for BigPetStore
>>>>>>>>
>>>>>>>> I also expect that I'll add libraries for:
>>>>>>>>
>>>>>>>> * Particle simulation -- customer movement in a room
>>>>>>>> * Latent factor model generation -- generate latent factors and
>>>>>>>>   customer weights to create something like MovieLens data. Used in Bazaar
>>>>>>>>   for booth preferences and potentially in BigPetStore for customer item
>>>>>>>>   preferences
>>>>>>>>
>>>>>>>> Most of these libraries came out of the BigPetStore data generator but the
>>>>>>>> other generators have been refactored to be based off the standard set of
>>>>>>>> libraries.
