[ https://issues.apache.org/jira/browse/BIGTOP-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902623#comment-13902623 ]
jay vyas commented on BIGTOP-1089:
----------------------------------

Great news on this, folks! We finally have a stable, production-quality codebase, with profiles for each ecosystem tool and preliminary testing in a real Hadoop cluster.

- The first phase of testing (generation of transactions) works: bigpetstore now runs in the VMs built with bigtop-deploy/vm/vagrant-puppet.
- We also now have Maven profiles for Pig, Hive, and Crunch. You can build and run any ecosystem ETL using those profiles.

So, once I finish testing the whole pipeline in pseudo-distributed mode, I'll be crafting the first official bigpetstore patch!

Note: it tends to overload VMs, because the custom generating input format creates many tasks (one per state). In the run below the job gets 7 splits, one for each of the 7 store states in the output; two map attempts fail with "Killed by external signal" and are relaunched, so 9 map tasks are launched in total.

{noformat}
14/02/16 02:37:53 INFO mapreduce.JobSubmitter: number of splits:7
14/02/16 02:37:53 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/16 02:37:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392513928307_0005
14/02/16 02:37:58 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
14/02/16 02:37:58 INFO client.YarnClientImpl: Submitted application application_1392513928307_0005 to ResourceManager at vagrant.bigtop1/127.0.0.1:8032
14/02/16 02:37:58 INFO mapreduce.Job: The url to track the job: http://vagrant.bigtop1:20888/proxy/application_1392513928307_0005/
14/02/16 02:37:58 INFO mapreduce.Job: Running job: job_1392513928307_0005
14/02/16 02:38:07 INFO mapreduce.Job: Job job_1392513928307_0005 running in uber mode : false
14/02/16 02:38:07 INFO mapreduce.Job: map 0% reduce 0%
14/02/16 02:38:35 INFO mapreduce.Job: map 14% reduce 0%
14/02/16 02:38:44 INFO mapreduce.Job: map 29% reduce 0%
14/02/16 02:38:45 INFO mapreduce.Job: Task Id : attempt_1392513928307_0005_m_000001_0, Status : FAILED
Killed by external signal
14/02/16 02:38:54 INFO mapreduce.Job: Task Id : attempt_1392513928307_0005_m_000004_0, Status : FAILED
Killed by external signal
14/02/16 02:38:55 INFO mapreduce.Job: map 57% reduce 0%
14/02/16 02:39:13 INFO mapreduce.Job: map 71% reduce 0%
14/02/16 02:39:22 INFO mapreduce.Job: map 71% reduce 1%
14/02/16 02:39:23 INFO mapreduce.Job: map 86% reduce 2%
14/02/16 02:39:26 INFO mapreduce.Job: map 86% reduce 3%
14/02/16 02:39:31 INFO mapreduce.Job: map 86% reduce 2%
14/02/16 02:40:27 INFO mapreduce.Job: map 100% reduce 2%
14/02/16 02:40:28 INFO mapreduce.Job: map 100% reduce 5%
14/02/16 02:40:29 INFO mapreduce.Job: map 100% reduce 9%
14/02/16 02:40:30 INFO mapreduce.Job: map 100% reduce 10%
14/02/16 02:40:32 INFO mapreduce.Job: map 100% reduce 14%
14/02/16 02:40:33 INFO mapreduce.Job: map 100% reduce 17%
14/02/16 02:40:57 INFO mapreduce.Job: map 100% reduce 27%
14/02/16 02:40:58 INFO mapreduce.Job: map 100% reduce 30%
14/02/16 02:40:59 INFO mapreduce.Job: map 100% reduce 37%
14/02/16 02:41:26 INFO mapreduce.Job: map 100% reduce 47%
14/02/16 02:41:27 INFO mapreduce.Job: map 100% reduce 57%
14/02/16 02:41:53 INFO mapreduce.Job: map 100% reduce 67%
14/02/16 02:41:54 INFO mapreduce.Job: map 100% reduce 70%
14/02/16 02:41:55 INFO mapreduce.Job: map 100% reduce 77%
14/02/16 02:42:18 INFO mapreduce.Job: map 100% reduce 80%
14/02/16 02:42:21 INFO mapreduce.Job: map 100% reduce 90%
14/02/16 02:42:22 INFO mapreduce.Job: map 100% reduce 93%
14/02/16 02:42:23 INFO mapreduce.Job: map 100% reduce 97%
14/02/16 02:42:26 INFO mapreduce.Job: map 100% reduce 100%
14/02/16 02:42:26 INFO mapreduce.Job: Job job_1392513928307_0005 completed successfully
14/02/16 02:42:26 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
14/02/16 02:42:26 INFO mapreduce.Job: Counters: 45
    File System Counters
        FILE: Number of bytes read=1067
        FILE: Number of bytes written=2755986
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=497
        HDFS: Number of bytes written=867
        HDFS: Number of read operations=104
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=60
    Job Counters
        Failed map tasks=2
        Killed reduce tasks=11
        Launched map tasks=9
        Launched reduce tasks=41
        Other local map tasks=9
        Total time spent by all maps in occupied slots (ms)=308583
        Total time spent by all reduces in occupied slots (ms)=1013311
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Map output bytes=867
        Map output materialized bytes=2147
        Input split bytes=497
        Combine input records=0
        Combine output records=0
        Reduce input groups=10
        Reduce shuffle bytes=2147
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =210
        Failed Shuffles=0
        Merged Map outputs=210
        GC time elapsed (ms)=9317
        CPU time spent (ms)=21960
        Physical memory (bytes) snapshot=4822437888
        Virtual memory (bytes) snapshot=59736100864
        Total committed heap usage (bytes)=2466344960
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=867
[root@vagrant vagrant]# hadoop fs -cat /tmp/bps2/*
BigPetStore,storeCode_CO,1 heidi,o'neill,Sun Dec 28 01:54:42 UTC 1969,15.1,choke-collar
BigPetStore,storeCode_CT,1 shawn,cantrell,Sat Jan 24 05:08:29 UTC 1970,19.1,fuzzy-collar
BigPetStore,storeCode_OK,1 herbert,dejesus,Fri Jan 16 08:14:57 UTC 1970,10.5,dog-food
BigPetStore,storeCode_AZ,1 walter,richardson,Wed Dec 31 19:45:21 UTC 1969,10.5,dog-food
BigPetStore,storeCode_CA,1 natasha,caldwell,Thu Dec 18 04:46:14 UTC 1969,11.75,fish-food
BigPetStore,storeCode_CA,2 natasha,caldwell,Sat Jan 17 00:50:34 UTC 1970,7.5,cat-food
BigPetStore,storeCode_CA,3 natasha,caldwell,Sun Jan 25 19:31:17 UTC 1970,11.75,fish-food
BigPetStore,storeCode_NY,1 margaret,sims,Wed Jan 21 03:56:34 UTC 1970,10.5,dog-food
BigPetStore,storeCode_NY,2 margaret,sims,Sun Dec 28 06:44:04 UTC 1969,19.75,fish-food
BigPetStore,storeCode_AK,1 sharon,vargas,Thu Jan 22 15:46:47 UTC 1970,19.1,fuzzy-collar
[root@vagrant vagrant]# hadoop fs -cat /tmp/bps3/*
BigPetStore,storeCode_CO,1 shawn,cantrell,Sat Jan 24 05:08:29 UTC 1970,10.5,dog-food
BigPetStore,storeCode_CT,1 clarence,robles,Wed Jan 21 18:14:05 UTC 1970,10.5,dog-food
BigPetStore,storeCode_OK,1 tia,mckee,Tue Jan 06 18:35:34 UTC 1970,5.1,hay-bail
BigPetStore,storeCode_AZ,1 judy,drake,Mon Dec 29 04:55:38 UTC 1969,30.1,snake-bite ointment
BigPetStore,storeCode_CA,1 darrell,watkins,Mon Dec 08 15:04:55 UTC 1969,11.75,fish-food
BigPetStore,storeCode_CA,2 mickey,garrison,Sat Jan 17 20:53:21 UTC 1970,11.75,fish-food
BigPetStore,storeCode_CA,3 mickey,garrison,Fri Jan 23 14:59:35 UTC 1970,7.5,cat-food
BigPetStore,storeCode_NY,1 clarence,robles,Wed Jan 21 18:14:05 UTC 1970,20.1,steel-leash
BigPetStore,storeCode_NY,2 valerie,wise,Sun Jan 04 03:11:53 UTC 1970,20.1,steel-leash
BigPetStore,storeCode_AK,1 lindsey,mcneil,Fri Jan 16 13:43:11 UTC 1970,19.1,fuzzy-collar
{noformat}

> BigPetStore: A polyglot big data processing blueprint inside of bigtop for
> comparing and learning about the tools in the bigtop packaged hadoop
> ecosystem.
> --------------------------------------------------------------------------
>
> Key: BIGTOP-1089
> URL: https://issues.apache.org/jira/browse/BIGTOP-1089
> Project: Bigtop
> Issue Type: New Feature
> Components: Blueprints
> Reporter: jay vyas
> Assignee: jay vyas
>
> The need for templates for processing big data pipelines is obvious. Moreover, given the increasing overlap across different big data and NoSQL projects, such a template will provide a ground truth in the future for comparing the behaviour and approach of different tools in solving a common, easily comprehended problem.
> This ticket formalizes the conversation in the mailing list archives regarding the BigPetStore proposal.
> At the moment (with the exception of word count), there are very few examples of big data problems that have been solved by a variety of different technologies. And, even with word count, there aren't a lot of templates which can be customized for applications.
> Comparatively, other application developer communities (e.g. the Rails folks, those using Maven archetypes, etc.) have a plethora of template applications which can be used to kickstart their applications and use cases.
>
> This BigPetStore JIRA thus aims to do the following:
> 0) Curate a single, central, standard input data set (modified: generating a large input data set on the fly).
> 1) Define a big data processing pipeline (using the pet store theme, but morphing it to be analytics-oriented rather than transaction-oriented), and implement basic aggregations in Hive, Pig, etc.
> 2) Sink the results of the pipeline (step 1) into some kind of NoSQL store or search engine.
>
> Some implementation details (open to change; please comment/review):
> - The initial data source will be raw text or (better yet) some kind of automatically generated data.
> - The source will initially go in bigtop/blueprints.
> - The application sources can be in any modern JVM language (Java, Scala, Groovy, Clojure), since Bigtop already supports Scala, Java, and Groovy natively, and Clojure is easy to support with the right jars.
> - Each "job" will be named according to the corresponding DAG of the big data pipeline.
> - All jobs should (not sure if this is a requirement?) be controlled by a global program (maybe Oozie?) which runs the tasks in order and can easily be customized to use different tools at different stages.
> - For now, all outputs will be to files, so that users don't need servers to run the app.
> - Final data sinks will be into a highly available, transaction-oriented store (Solr/HBase/...).
>
> This ticket will be completed once a first iteration of BigPetStore is complete using 3 ecosystem components, along with a depiction of the pipeline which can be used for development.
> I've assigned this to myself :) I hope that's okay? It seems that at the moment I'm the only one working on it.
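For readers comparing tools, here is a minimal Python sketch (not BigPetStore code, which is JVM-based) of the kind of "basic aggregation" the ticket plans to implement in Hive and Pig, run over the `hadoop fs -cat /tmp/bps2/*` sample from the comment. It assumes each output line is a comma-separated key (`BigPetStore,storeCode_XX,N`), then whitespace, then a comma-separated value (first name, last name, timestamp, price, product). The paste shows a single space there, while Hadoop's default `TextOutputFormat` writes a tab, so the parser splits on either. The `Transaction` type and its field names are hypothetical labels inferred from the sample, not names from the codebase. Prices are parsed as `Decimal` so the revenue sums stay exact.

```python
from collections import Counter, defaultdict
from decimal import Decimal
from typing import NamedTuple


class Transaction(NamedTuple):
    """One generated record. Field names are hypothetical, inferred from the sample."""
    store: str       # state code, e.g. "CA", taken from the "storeCode_CA" key part
    txn_id: int      # per-store transaction counter (third key part)
    first_name: str
    last_name: str
    timestamp: str   # kept as the raw string, e.g. "Thu Dec 18 04:46:14 UTC 1969"
    price: Decimal
    product: str


def parse_line(line: str) -> Transaction:
    # The key contains no whitespace, so split once on the first whitespace run
    # (a tab with Hadoop's default TextOutputFormat; a space in the paste).
    key, value = line.strip().split(None, 1)
    _app, store_code, txn_id = key.split(",")
    first, last, timestamp, price, product = value.split(",")
    return Transaction(
        store=store_code.split("_", 1)[1],
        txn_id=int(txn_id),
        first_name=first,
        last_name=last,
        timestamp=timestamp,
        price=Decimal(price),
        product=product,
    )


def aggregate(lines):
    """Count transactions per store and sum revenue per product."""
    per_store = Counter()
    revenue = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        t = parse_line(line)
        per_store[t.store] += 1
        revenue[t.product] += t.price
    return per_store, dict(revenue)


# The /tmp/bps2 records from the comment, copied verbatim.
SAMPLE = """\
BigPetStore,storeCode_CO,1 heidi,o'neill,Sun Dec 28 01:54:42 UTC 1969,15.1,choke-collar
BigPetStore,storeCode_CT,1 shawn,cantrell,Sat Jan 24 05:08:29 UTC 1970,19.1,fuzzy-collar
BigPetStore,storeCode_OK,1 herbert,dejesus,Fri Jan 16 08:14:57 UTC 1970,10.5,dog-food
BigPetStore,storeCode_AZ,1 walter,richardson,Wed Dec 31 19:45:21 UTC 1969,10.5,dog-food
BigPetStore,storeCode_CA,1 natasha,caldwell,Thu Dec 18 04:46:14 UTC 1969,11.75,fish-food
BigPetStore,storeCode_CA,2 natasha,caldwell,Sat Jan 17 00:50:34 UTC 1970,7.5,cat-food
BigPetStore,storeCode_CA,3 natasha,caldwell,Sun Jan 25 19:31:17 UTC 1970,11.75,fish-food
BigPetStore,storeCode_NY,1 margaret,sims,Wed Jan 21 03:56:34 UTC 1970,10.5,dog-food
BigPetStore,storeCode_NY,2 margaret,sims,Sun Dec 28 06:44:04 UTC 1969,19.75,fish-food
BigPetStore,storeCode_AK,1 sharon,vargas,Thu Jan 22 15:46:47 UTC 1970,19.1,fuzzy-collar
"""

if __name__ == "__main__":
    per_store, revenue = aggregate(SAMPLE.splitlines())
    print(len(per_store), "stores,", sum(per_store.values()), "transactions")
    for product, total in sorted(revenue.items()):
        print(product, total)
```

Run on that sample, it reports 7 stores and 10 transactions. The 7 distinct store codes match the `number of splits:7` line in the job log, which is consistent with the one-split-per-state generating input format described in the comment, and the 10 transactions match the `Map input records=10` counter. A Hive or Pig version would express the same two group-bys declaratively.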