[jira] [Commented] (BIGTOP-1089) BigPetStore: A polyglot big data processing blueprint inside of bigtop for comparing and learning about the tools in the bigtop packaged hadoop ecosystem.

jay vyas (JIRA) Sat, 15 Feb 2014 19:18:32 -0800

    [ https://issues.apache.org/jira/browse/BIGTOP-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902623#comment-13902623 ]


jay vyas commented on BIGTOP-1089:
----------------------------------

Great news on this, folks! We finally have a stable, production-quality 
codebase, with profiles for each ecosystem tool and preliminary testing on a 
real Hadoop cluster. 

- The first phase of testing (generation of transactions) works: bigpetstore 
now runs in bigtop-deploy/vm/vagrant-puppet based VMs.

- We also now have maven profiles for pig, hive and crunch.  You can build and 
run any ecosystem ETL using those profiles.  

So, once I finish testing the whole pipeline in pseudo-distributed mode, I'll 
be crafting the first official bigpetstore patch!

Note: it kinda overloads VMs, because the custom data-generating input format 
creates many tasks (one per state).
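
To illustrate why a generating input format with one split per state produces one task per state, here is a minimal Python sketch. It is not the BigPetStore code: the names STATES, StateSplit, get_splits and read_split are hypothetical, the state list is simply the seven store codes that appear in the sample output below (which matches "number of splits:7" in the log), and the prices are placeholder values.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List

# Assumption: the seven store codes seen in the sample output of this comment.
STATES = ["CO", "CT", "OK", "AZ", "CA", "NY", "AK"]


@dataclass(frozen=True)
class StateSplit:
    """One unit of work; a MapReduce-style framework runs one map task per split."""
    state: str
    num_records: int


def get_splits(states: List[str], records_per_state: int) -> List[StateSplit]:
    # One split per state, no matter how few records each split will produce.
    return [StateSplit(state, records_per_state) for state in states]


def read_split(split: StateSplit, seed: int = 0) -> Iterator[str]:
    # The "record reader" generates records instead of reading them from a file.
    rng = random.Random(f"{seed}-{split.state}")
    for index in range(1, split.num_records + 1):
        price = rng.choice([5.1, 7.5, 10.5, 11.75, 19.1])  # placeholder prices
        yield f"BigPetStore,storeCode_{split.state},{index}\t{price}"


splits = get_splits(STATES, records_per_state=2)
records = [record for split in splits for record in read_split(split)]
```

With this shape the task count is fixed by the number of states rather than by data volume, so a small VM still has to schedule every task even for a tiny data set.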

{noformat}
14/02/16 02:37:53 INFO mapreduce.JobSubmitter: number of splits:7
14/02/16 02:37:53 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/16 02:37:53 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/16 02:37:53 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/02/16 02:37:53 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/16 02:37:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392513928307_0005
14/02/16 02:37:58 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
14/02/16 02:37:58 INFO client.YarnClientImpl: Submitted application application_1392513928307_0005 to ResourceManager at vagrant.bigtop1/127.0.0.1:8032
14/02/16 02:37:58 INFO mapreduce.Job: The url to track the job: http://vagrant.bigtop1:20888/proxy/application_1392513928307_0005/
14/02/16 02:37:58 INFO mapreduce.Job: Running job: job_1392513928307_0005
14/02/16 02:38:07 INFO mapreduce.Job: Job job_1392513928307_0005 running in uber mode : false
14/02/16 02:38:07 INFO mapreduce.Job:  map 0% reduce 0%
14/02/16 02:38:35 INFO mapreduce.Job:  map 14% reduce 0%
14/02/16 02:38:44 INFO mapreduce.Job:  map 29% reduce 0%
14/02/16 02:38:45 INFO mapreduce.Job: Task Id : attempt_1392513928307_0005_m_000001_0, Status : FAILED

Killed by external signal

14/02/16 02:38:54 INFO mapreduce.Job: Task Id : attempt_1392513928307_0005_m_000004_0, Status : FAILED

Killed by external signal

14/02/16 02:38:55 INFO mapreduce.Job:  map 57% reduce 0%
14/02/16 02:39:13 INFO mapreduce.Job:  map 71% reduce 0%
14/02/16 02:39:22 INFO mapreduce.Job:  map 71% reduce 1%
14/02/16 02:39:23 INFO mapreduce.Job:  map 86% reduce 2%
14/02/16 02:39:26 INFO mapreduce.Job:  map 86% reduce 3%
14/02/16 02:39:31 INFO mapreduce.Job:  map 86% reduce 2%
14/02/16 02:40:27 INFO mapreduce.Job:  map 100% reduce 2%
14/02/16 02:40:28 INFO mapreduce.Job:  map 100% reduce 5%
14/02/16 02:40:29 INFO mapreduce.Job:  map 100% reduce 9%
14/02/16 02:40:30 INFO mapreduce.Job:  map 100% reduce 10%
14/02/16 02:40:32 INFO mapreduce.Job:  map 100% reduce 14%
14/02/16 02:40:33 INFO mapreduce.Job:  map 100% reduce 17%
14/02/16 02:40:57 INFO mapreduce.Job:  map 100% reduce 27%
14/02/16 02:40:58 INFO mapreduce.Job:  map 100% reduce 30%
14/02/16 02:40:59 INFO mapreduce.Job:  map 100% reduce 37%
14/02/16 02:41:26 INFO mapreduce.Job:  map 100% reduce 47%
14/02/16 02:41:27 INFO mapreduce.Job:  map 100% reduce 57%
14/02/16 02:41:53 INFO mapreduce.Job:  map 100% reduce 67%
14/02/16 02:41:54 INFO mapreduce.Job:  map 100% reduce 70%
14/02/16 02:41:55 INFO mapreduce.Job:  map 100% reduce 77%
14/02/16 02:42:18 INFO mapreduce.Job:  map 100% reduce 80%
14/02/16 02:42:21 INFO mapreduce.Job:  map 100% reduce 90%
14/02/16 02:42:22 INFO mapreduce.Job:  map 100% reduce 93%
14/02/16 02:42:23 INFO mapreduce.Job:  map 100% reduce 97%
14/02/16 02:42:26 INFO mapreduce.Job:  map 100% reduce 100%
14/02/16 02:42:26 INFO mapreduce.Job: Job job_1392513928307_0005 completed successfully
14/02/16 02:42:26 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
14/02/16 02:42:26 INFO mapreduce.Job: Counters: 45
        File System Counters
                FILE: Number of bytes read=1067
                FILE: Number of bytes written=2755986
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=497
                HDFS: Number of bytes written=867
                HDFS: Number of read operations=104
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=60
        Job Counters 
                Failed map tasks=2
                Killed reduce tasks=11
                Launched map tasks=9
                Launched reduce tasks=41
                Other local map tasks=9
                Total time spent by all maps in occupied slots (ms)=308583
                Total time spent by all reduces in occupied slots (ms)=1013311
        Map-Reduce Framework
                Map input records=10
                Map output records=10
                Map output bytes=867
                Map output materialized bytes=2147
                Input split bytes=497
                Combine input records=0
                Combine output records=0
                Reduce input groups=10
                Reduce shuffle bytes=2147
                Reduce input records=10
                Reduce output records=10
                Spilled Records=20
                Shuffled Maps =210
                Failed Shuffles=0
                Merged Map outputs=210
                GC time elapsed (ms)=9317
                CPU time spent (ms)=21960
                Physical memory (bytes) snapshot=4822437888
                Virtual memory (bytes) snapshot=59736100864
                Total committed heap usage (bytes)=2466344960
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=867
[root@vagrant vagrant]# hadoop fs -cat /tmp/bps2/*
BigPetStore,storeCode_CO,1      heidi,o'neill,Sun Dec 28 01:54:42 UTC 1969,15.1,choke-collar
BigPetStore,storeCode_CT,1      shawn,cantrell,Sat Jan 24 05:08:29 UTC 1970,19.1,fuzzy-collar
BigPetStore,storeCode_OK,1      herbert,dejesus,Fri Jan 16 08:14:57 UTC 1970,10.5,dog-food
BigPetStore,storeCode_AZ,1      walter,richardson,Wed Dec 31 19:45:21 UTC 1969,10.5,dog-food
BigPetStore,storeCode_CA,1      natasha,caldwell,Thu Dec 18 04:46:14 UTC 1969,11.75,fish-food
BigPetStore,storeCode_CA,2      natasha,caldwell,Sat Jan 17 00:50:34 UTC 1970,7.5,cat-food
BigPetStore,storeCode_CA,3      natasha,caldwell,Sun Jan 25 19:31:17 UTC 1970,11.75,fish-food
BigPetStore,storeCode_NY,1      margaret,sims,Wed Jan 21 03:56:34 UTC 1970,10.5,dog-food
BigPetStore,storeCode_NY,2      margaret,sims,Sun Dec 28 06:44:04 UTC 1969,19.75,fish-food
BigPetStore,storeCode_AK,1      sharon,vargas,Thu Jan 22 15:46:47 UTC 1970,19.1,fuzzy-collar
[root@vagrant vagrant]# hadoop fs -cat /tmp/bps3/*
BigPetStore,storeCode_CO,1      shawn,cantrell,Sat Jan 24 05:08:29 UTC 1970,10.5,dog-food
BigPetStore,storeCode_CT,1      clarence,robles,Wed Jan 21 18:14:05 UTC 1970,10.5,dog-food
BigPetStore,storeCode_OK,1      tia,mckee,Tue Jan 06 18:35:34 UTC 1970,5.1,hay-bail
BigPetStore,storeCode_AZ,1      judy,drake,Mon Dec 29 04:55:38 UTC 1969,30.1,snake-bite ointment
BigPetStore,storeCode_CA,1      darrell,watkins,Mon Dec 08 15:04:55 UTC 1969,11.75,fish-food
BigPetStore,storeCode_CA,2      mickey,garrison,Sat Jan 17 20:53:21 UTC 1970,11.75,fish-food
BigPetStore,storeCode_CA,3      mickey,garrison,Fri Jan 23 14:59:35 UTC 1970,7.5,cat-food
BigPetStore,storeCode_NY,1      clarence,robles,Wed Jan 21 18:14:05 UTC 1970,20.1,steel-leash
BigPetStore,storeCode_NY,2      valerie,wise,Sun Jan 04 03:11:53 UTC 1970,20.1,steel-leash
BigPetStore,storeCode_AK,1      lindsey,mcneil,Fri Jan 16 13:43:11 UTC 1970,19.1,fuzzy-collar

{noformat}
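
For anyone who wants to consume the generated records, here is a rough Python sketch of a parser. It assumes each record is a single line (any mail line-wrapping undone), that the key and value are separated by whitespace, and that the fields mean what the sample suggests. The field names and the Transaction and parse_record names are my own guesses for illustration, not taken from the BigPetStore code.

```python
import re
from typing import NamedTuple


class Transaction(NamedTuple):
    # Field names are assumptions inferred from the sample output.
    store_code: str
    txn_index: int
    first_name: str
    last_name: str
    timestamp: str
    price: float
    product: str


def parse_record(line: str) -> Transaction:
    # The key contains no whitespace, so the first whitespace run splits key from value.
    key, value = re.split(r"\s+", line.strip(), maxsplit=1)
    _, store_code, txn_index = key.split(",")
    first, last, timestamp, price, product = value.split(",")
    return Transaction(store_code, int(txn_index), first, last,
                       timestamp, float(price), product)


sample = ("BigPetStore,storeCode_AZ,1      judy,drake,"
          "Mon Dec 29 04:55:38 UTC 1969,30.1,snake-bite ointment")
txn = parse_record(sample)
```

The timestamp is kept as a string here; product names may contain spaces (as in the sample line), which is why the value is split on commas only.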

> BigPetStore: A polyglot big data processing blueprint inside of bigtop for 
> comparing and learning about the tools in the bigtop packaged hadoop 
> ecosystem.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BIGTOP-1089
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1089
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>            Reporter: jay vyas
>            Assignee: jay vyas
>
> The need for templates for processing big data pipelines is obvious. Also, 
> given the increasing amount of overlap across different big data and 
> NoSQL projects, such a template will provide a ground truth in the future for 
> comparing the behaviour and approach of different tools to solve a common, 
> easily comprehended problem. 
> This ticket formalizes the conversation in mailing list archives regarding 
> the BigPetStore proposal. 
> At the moment (with the exception of word count), there are very few 
> examples of big data problems that have been solved by a variety of different 
> technologies. And, even with word count, there aren't a lot of templates 
> which can be customized for applications. 
> By comparison, other application developer communities (e.g. the Rails folks, 
> those using Maven archetypes, etc.) have a plethora of template 
> applications which can be used to kickstart their applications and use cases. 
>   
> This big pet store JIRA thus aims to do the following: 
> 0) Curate a single, central, standard input data set. (Modified: generating 
> a large input data set on the fly.)
> 1) Define a big data processing pipeline (using the pet store theme - except 
> morphing it to be analytics rather than transaction oriented), and implement 
> basic aggregations in hive, pig, etc...
> 2) Sink the results of 1 into some kind of NoSQL store or search engine.
>  
> Some implementation details (open to change, please comment/review):
> - initial data source will be raw text or (better yet) some kind of 
> automatically generated data.
> - the source will initially go in bigtop/blueprints
> - the application sources can be in any modern JVM language 
> (java,scala,groovy,clojure), since bigtop supports scala, java, groovy 
> natively already and clojure is easy to support with the right jars.  
> - each "job" will be named according to the corresponding DAG of the big data 
> pipeline. 
> - all jobs should (not sure if requirement?) be controlled by a global 
> program (maybe oozie?) which runs the tasks in order, and can easily be 
> customized to use different tools at different stages. 
> - for now, all outputs will be to files: so that users don't require servers 
> to run the app. 
> - final data sinks will be into a highly available transaction oriented store 
> (solr/hbase/...)
> This ticket will be completed once a first iteration of BigPetStore is 
> complete using 3 ecosystem components, along with a depiction of the pipeline 
> which can be used for development.
> I've assigned this to myself :) I hope that's okay? Seems like at the moment 
> I'm the only one working on it. 
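
The quoted description asks for a global program that runs the jobs in order and lets a different tool be swapped in at any stage. A minimal Python sketch of that idea follows; it is only an illustration under my own assumptions. The stage names, the in-memory row format, and the functions generate, aggregate_python and run_pipeline are all hypothetical stand-ins, not an Oozie workflow or any BigPetStore code.

```python
from collections import Counter
from typing import Callable, Dict, List

# A stage takes the previous stage's rows and returns new rows.
Stage = Callable[[List[str]], List[str]]


def generate(_: List[str]) -> List[str]:
    # Stand-in for the data generator; emits "store,product" rows.
    return ["CA,fish-food", "CA,cat-food", "NY,dog-food"]


def aggregate_python(rows: List[str]) -> List[str]:
    # Stand-in for a Pig/Hive/Crunch aggregation: count rows per store.
    counts = Counter(row.split(",")[0] for row in rows)
    return [f"{store},{count}" for store, count in sorted(counts.items())]


def run_pipeline(order: List[str], impls: Dict[str, Stage]) -> List[str]:
    # Run the named stages in order; swapping a tool means swapping one entry in impls.
    data: List[str] = []
    for name in order:
        data = impls[name](data)
    return data


result = run_pipeline(
    ["generate", "aggregate"],
    {"generate": generate, "aggregate": aggregate_python},
)
```

Keeping the stage order separate from the stage implementations is what makes the per-stage tool choice easy to change.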



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
