- I agree Gradle is better than "yet another script".
- And a docker container, even if suboptimal, as a thin wrapper to a gradle container is all you need to deliver the data-generator to the masses so they can try it w/ zero startup cost.
On Mon, Aug 31, 2015 at 7:36 PM, Konstantin Boudnik <[email protected]> wrote:

> Why would we need yet another script (and potentially an extra readme to
> explain its command options) when we have gradle?
>
> Cos
>
> On Mon, Aug 31, 2015 at 10:51PM, Olaf Flebbe wrote:
> > +1 to the CLI / shell script interface.
> >
> > If I can choose, I'd like to have an apt-get install bigtop-datagenerator,
> > running for instance
> >
> > bigtop-data-generator outputDir nStores nCustomers nPurchasingModels simulationLength seed
> >
> > I can help out with packaging if needed.
> >
> > Why should we use the docker indirection for a plain CLI file? Of course,
> > we can provide a trivial Dockerfile to create a container supplying a JVM
> > and running the CLI ... But I do not like to make our services depend on
> > the docker registry more than we do now.
> >
> > Olaf
> >
> > > On 31.08.2015 at 16:40, Evans Ye <[email protected]> wrote:
> > >
> > > I very much like the shell script wrapper and docker image idea, since
> > > that way we can integrate it directly with the bigtop provisioner, which
> > > yields a perfect UX for the whole thing. I think it's not too hard to do
> > > both; we just need to add a parameter to turn the script into daemon
> > > mode. I see lots of images doing it this way.
> > >
> > > docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar --daemon
> > >
> > > On Aug 31, 2015 at 9:06 PM, "RJ Nowling" <[email protected]> wrote:
> > >
> > >> The BigPetStore, Bazaar, and weather data generators have single-threaded
> > >> command-line interfaces. We could do the same with the smaller generators
> > >> (names, locations, etc.) if there is interest.
> > >>
> > >> On Mon, Aug 31, 2015 at 5:24 AM, Jay Vyas <[email protected]> wrote:
> > >>
> > >>> Nate: Good idea to abstract the interface one level higher....
> > >>>
> > >>> How about a docker run command?
> > >>> That is probably the easiest way for Linux folks to run one-off Java
> > >>> apps nowadays.
> > >>>
> > >>> docker run bigtop/bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> > >>>
> > >>> I'm happy to curate such a docker image. I am already doing something
> > >>> like this in kube for bigtop-transaction-queue, which continuously pumps
> > >>> data generator outputs into a REST endpoint or file queue... So it could
> > >>> be extended to support other generators.
> > >>>
> > >>>> om> <[email protected]> wrote:
> > >>>>
> > >>>> Could picture at some point supporting something like this for non-JVM
> > >>>> folk just looking for test/demo data:
> > >>>>
> > >>>> apt-get install bigtop-data-gen
> > >>>> ~/ $ bigtop-data-gen --scheme weather --size 5GB --output data-dir --etc foo --etc bar
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: jay vyas [mailto:[email protected]]
> > >>>> Sent: Sunday, August 30, 2015 5:11 PM
> > >>>> To: [email protected]
> > >>>> Subject: Re: Proposal for "BigTop Data Generators"
> > >>>>
> > >>>> Hola nate. Well, here are the use cases I know of that I have used the
> > >>>> data generators for.
> > >>>>
> > >>>> Dockerfile:
> > >>>>
> > >>>> (1) for testing kubernetes. For this, I just use the transaction-queue
> > >>>> docker file.
> > >>>> (2) for testing GlusterFS small file workloads, maybe with other
> > >>>> analytics tools...
> > >>>>
> > >>>> Maven repo:
> > >>>>
> > >>>> (3) Java MapReduce/Ignite/Spark applications, which can just add a mvn
> > >>>> repo when compiling. Java developers never add jars through RPM repos.
> > >>>>
> > >>>> RPM/DEB packages:
> > >>>>
> > >>>> I could see people using an RPM/DEB data generator, and I'm not against
> > >>>> it. But I simply don't know of any real world projects which *currently*
> > >>>> need RPM/Deb packages, which is why I haven't bothered to propose it as
> > >>>> a requirement.
> > >>>> Nevertheless, Linux packages are always a welcome addition if someone
> > >>>> wants to create 'em!
> > >>>>
> > >>>>> On Sun, Aug 30, 2015 at 4:34 PM, <[email protected]> wrote:
> > >>>>>
> > >>>>> Would the container be in addition to deb/rpm, or instead of? If the
> > >>>>> latter, can we do deb/rpm as the base, then have the container either
> > >>>>> created from them or directly from artifacts?
> > >>>>>
> > >>>>> On the test usage side, it seems we could probably break up tests into
> > >>>>> base/required and then optional/add-on tests/test-suites. I think I
> > >>>>> remember seeing mention of certain tests that fail at times on
> > >>>>> certain component(s) in the core builds anyway but don't mean that
> > >>>>> the build is broken, so it would make sense to have some cleanup
> > >>>>> around those anyway.
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: RJ Nowling [mailto:[email protected]]
> > >>>>> Sent: Sunday, August 30, 2015 1:11 PM
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: Proposal for "BigTop Data Generators"
> > >>>>>
> > >>>>> I agree with the above. :)
> > >>>>>
> > >>>>> On Sun, Aug 30, 2015 at 11:19 AM, Jay Vyas <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi RJ.
> > >>>>>>
> > >>>>>> Maven repositories and docker containers for the transaction queue
> > >>>>>> are good enough IMO. That will give people a way to compose them in
> > >>>>>> different idioms (one for Java folks, another for a broader Linux
> > >>>>>> audience).
> > >>>>>>
> > >>>>>> I think the lib designs are fairly intuitive. I would say that we
> > >>>>>> should constrain them all to being written in Java or Groovy to keep
> > >>>>>> the bigtop theme of "JVM for everything" :).
> > >>>>>>
> > >>>>>> Any particular questions you have around technical design can be
> > >>>>>> followed in a JIRA or else maybe a Readme spec that goes in the top
> > >>>>>> level of the data-generators dir...
> > >>>>>>
> > >>>>>>> On Aug 30, 2015, at 1:51 AM, RJ Nowling <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> I'd like to keep this conversation going.
> > >>>>>>>
> > >>>>>>> So here are a few discussion points:
> > >>>>>>>
> > >>>>>>> 1. How do we want to make the data generators available? Maven?
> > >>>>>>> RPMs and Debs?
> > >>>>>>>
> > >>>>>>> For now, I'm using a gradle multi-project build to easily build and
> > >>>>>>> install the BPS data generators and its libraries into a local maven
> > >>>>>>> repo. This makes development easy. Eventually, I would like to post
> > >>>>>>> binaries through Maven for easy integration by users. RPMs / Debs
> > >>>>>>> could be interesting since I use a pattern where the data generators
> > >>>>>>> are libraries (to support application integration / parallelization
> > >>>>>>> by the host framework) but also provide CLI drivers for local
> > >>>>>>> testing.
> > >>>>>>>
> > >>>>>>> 2. The idea of using the data generators as part of the smoke
> > >>>>>>> tests came up. Since there is concern about making the data
> > >>>>>>> generators required, we could offer the blueprints (BigPetStore)
> > >>>>>>> as optional smoke tests. Would that be a good compromise?
> > >>>>>>>
> > >>>>>>> 3. How will they be maintained?
> > >>>>>>>
> > >>>>>>> I'll certainly add myself to the maintainers list and will be
> > >>>>>>> taking responsibility. I'm happy to have others help as well if
> > >>>>>>> anyone wants to -- if not, that's cool, too.
> > >>>>>>>
> > >>>>>>> 4. Is anyone interested at all in discussing library APIs and
> > >>>>>>> designs? What about internal interfaces and such?
> > >>>>>>>
> > >>>>>>> My plan was to add at least one more data generator (weather
> > >>>>>>> simulator) to bigtop-data-generators in the short term.
> > >>>>>>> However, given the concerns raised by Cos (more discussion needed)
> > >>>>>>> and Olaf (don't want to force data generators on unsuspecting
> > >>>>>>> users ;) ), I would like to reach some consensus on what people are
> > >>>>>>> concerned about and solutions.
> > >>>>>>>
> > >>>>>>> On Thu, Aug 27, 2015 at 12:38 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Fine by me. I have linked this thread to the JIRA ticket that RJ
> > >>>>>>>> created, so we have a way to connect one to another ;)
> > >>>>>>>>
> > >>>>>>>>> On Thu, Aug 27, 2015 at 01:02PM, Olaf Flebbe wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I am not confident that moving important design discussions with
> > >>>>>>>>> impact on the whole project to jira is a good idea.
> > >>>>>>>>>
> > >>>>>>>>> In the current JIRA traffic storm it is not easy to identify and
> > >>>>>>>>> follow important tickets.
> > >>>>>>>>>
> > >>>>>>>>> Please keep discussions on the list or at least, please state on
> > >>>>>>>>> this list which ticket to follow ...
> > >>>>>>>>>
> > >>>>>>>>> Olaf
> > >>>>>>>>>
> > >>>>>>>>>> On 26.08.2015 at 22:56, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Aug 26, 2015 at 10:38PM, Olaf Flebbe wrote:
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Nice to have data generators in Bigtop.
> > >>>>>>>>>>>
> > >>>>>>>>>>> But please do not include it in bigtop_utils, since this
> > >>>>>>>>>>> package is mandatory. Not everyone needs a data generator.
> > >>>>>>>>>>
> > >>>>>>>>>> Yup. And let's move further design discussion to the JIRA!
> > >>>>>>>>>>
> > >>>>>>>>>>> Olaf
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On 26.08.2015 at 11:25, Jay Vyas <[email protected]> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Publishing the jar to bigtop's maven is probably a good first
> > >>>>>>>>>>>> step. Then apps can just include it as needed...?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'm not against packaging if someone wants packages for this.
> > >>>>>>>>>>>> Maybe even include it in bigtop util?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Let's move to jira,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Aug 25, 2015, at 9:41 PM, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It is pretty cool indeed!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I wonder how it needs to be structured to be:
> > >>>>>>>>>>>>> - easy to access/use from other components wherever it is needed
> > >>>>>>>>>>>>> - doesn't interfere with the rest of the stack
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I guess one possible way would be to implement the generator as
> > >>>>>>>>>>>>> a set of maven artifacts that could be installed/consumed
> > >>>>>>>>>>>>> transparently by just declaring a dependency, e.g. as proposed
> > >>>>>>>>>>>>> via a top-level component.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Another way is to have a new package like we do for bigtop-utils
> > >>>>>>>>>>>>> and such.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Perhaps this discussion should be moved to JIRA, or shall we
> > >>>>>>>>>>>>> continue on the dev@ ??
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cos
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> > >>>>>>>>>>>>>> Hi BigTop,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I had a discussion with Jay yesterday, and we'd like to propose
> > >>>>>>>>>>>>>> a new component for BigTop: BigTop Data Generators.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> BigTop Data Generators would consist of a common set of
> > >>>>>>>>>>>>>> libraries for building data generators and three example data
> > >>>>>>>>>>>>>> generators:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * BigPetStore transaction generator (moved from BigPetStore)
> > >>>>>>>>>>>>>> * BigTop Bazaar -- attendee movement and interactions with
> > >>>>>>>>>>>>>>   booths on a showroom floor, at a conference, or at a mall
> > >>>>>>>>>>>>>> * BigTop Weatherman -- stochastic weather simulation
> > >>>>>>>>>>>>>>   (temperature, wind speed, wind chill, rainfall, etc.) per zip
> > >>>>>>>>>>>>>>   code (from a model trained on NOAA historical weather data)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> We believe that creating a common set of libraries will have
> > >>>>>>>>>>>>>> several benefits, including:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Easier for others to build their own data generators
> > >>>>>>>>>>>>>> * Make data generators smaller and easier to maintain
> > >>>>>>>>>>>>>> * Share improvements across the data generators
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> More details on the libraries are below.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> BigPetStore will continue to focus on building and maintaining
> > >>>>>>>>>>>>>> blueprints, powered by the BigTop Data Generators.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Our vision is that we get all of Apache coming to BigTop for
> > >>>>>>>>>>>>>> tools for building better, more comprehensive blueprints. We
> > >>>>>>>>>>>>>> want to support these efforts through data generators and the
> > >>>>>>>>>>>>>> initial set of blueprints we've been building.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> If the community is generally in support of this, I can create
> > >>>>>>>>>>>>>> a top-level "bigtop-data-generators" directory and put the data
> > >>>>>>>>>>>>>> generators and libraries in there.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> RJ
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> -------
> > >>>>>>>>>>>>>> Library details:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> So far, I've extracted the following common libraries:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Samplers -- provides classes for PDFs and various samplers
> > >>>>>>>>>>>>>> * Name generator -- data set and samplers for generating names
> > >>>>>>>>>>>>>> * Location data set -- data set and classes for US zip codes,
> > >>>>>>>>>>>>>>   their GPS coordinates, median household incomes, and
> > >>>>>>>>>>>>>>   population sizes
> > >>>>>>>>>>>>>> * Product generator -- library for enumerating products from a
> > >>>>>>>>>>>>>>   specification file. Comes with default specifications for
> > >>>>>>>>>>>>>>   BigPetStore
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I also expect that I'll add libraries for:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> * Particle simulation -- customer movement in a room
> > >>>>>>>>>>>>>> * Latent factor model generation -- generate latent factors and
> > >>>>>>>>>>>>>>   customer weights to create something like MovieLens data.
> > >>>>>>>>>>>>>>   Used in Bazaar for booth preferences and potentially in
> > >>>>>>>>>>>>>>   BigPetStore for customer item preferences
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Most of these libraries came out of the BigPetStore data
> > >>>>>>>>>>>>>> generator, but the other generators have been refactored to be
> > >>>>>>>>>>>>>> based off the standard set of libraries.
> > >>>>
> > >>>>
> > >>>> --
> > >>>> jay vyas
> > >>>>
> > >>>
> > >>
> > >
> >

--
jay vyas
