HI Chris,

Great proposal.

Looks like the people from CMU are excluded from list of initial committers?
They are mentioned in the affiliations section but not in the
committers section.


- Henry

On Sun, Mar 22, 2015 at 10:55 PM, Mattmann, Chris A (3980)
<chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi Everyone,
>
> I am pleased to submit for consideration to the Apache Incubator
> the Climate Model Diagnostic Analyzer proposal. We are actively
> soliciting interested mentors in this project related to climate
> science and analytics and big data.
>
> Please find the wiki text of the proposal below and the link up
> on the wiki here:
>
> https://wiki.apache.org/incubator/ClimateModelDiagnosticAnalyzerProposal
>
> Thank you for your consideration!
>
> Cheers,
> Chris
> (on behalf of the Climate Model Diagnostic Analyzer community)
>
> = Apache ClimateModelDiagnosticAnalyzer Proposal =
>
> == Abstract ==
>
> The Climate Model Diagnostic Analyzer (CMDA) provides web services for
> multi-aspect physics-based and phenomenon-oriented climate model
> performance evaluation and diagnosis through the comprehensive and
> synergistic use of multiple observational data, reanalysis data, and model
> outputs.
>
> == Proposal ==
>
> The proposed web-based tools let users display, analyze, and download
> earth science data interactively. These tools help scientists quickly
> examine data to identify specific features, e.g., trends, geographical
> distributions, etc., and determine whether a further study is needed. All
> of the tools are designed and implemented to be general so that data from
> models, observation, and reanalysis are processed and displayed in a
> unified way to facilitate fair comparisons. The services prepare and
> display data as a colored map or an X-Y plot and allow users to download
> the analyzed data. Basic visual capabilities include 1) displaying
> two-dimensional variable as a map, zonal mean, and time series 2)
> displaying three-dimensional variable’s zonal mean, a two-dimensional
> slice at a specific altitude, and a vertical profile. General analysis can
> be done using the difference, scatter plot, and conditional sampling
> services. All the tools support display options for using linear or
> logarithmic scales and allow users to specify a temporal range and months
> in a year. The source/input datasets for these tools are CMIP5 model
> outputs, Obs4MIP observational datasets, and ECMWF reanalysis datasets.
> They are stored on the server and are selectable by a user through the web
> services.
>
> === Service descriptions ===
>
> 1. '''Two dimensional variable services'''
>
> * Map of two-dimensional variable:  This services displays a two
> dimensional variable as a colored longitude and latitude map with values
> represented by a color scheme. Longitude and latitude ranges can be
> specified to magnify a specific region.
>
> * Two dimensional variable zonal mean:  This service plots the zonal mean
> value of a two-dimensional variable as a function of the latitude in terms
> of an X-Y plot.
>
> * Two dimensional variable time series:  This service displays the average
> of a two-dimensional variable over the specific region as function of time
> as an X-Y plot.
>
> 2. '''Three dimensional variable services'''
>
> * Map of a two dimensional slice of a three-dimensional variable:  This
> service displays a two-dimensional slice of a three-dimensional variable
> at a specific altitude as a colored longitude and latitude map with values
> represented by a color scheme.
>
> * Three dimensional zonal mean:  Zonal mean of the specified
> three-dimensional variable is computed and displayed as a colored
> altitude-latitude map.
>
> * Vertical profile of a three-dimensional variable:  Compute the area
> weighted average of a three-dimensional variable over the specified region
> and display the average as function of pressure level (altitude) as an X-Y
> plot.
>
> 3. '''General services'''
>
> * Difference of two variables:  This service displays the differences
> between the two variables, which can be either a two dimensional variable
> or a slice of a three-dimensional variable at a specified altitude as
> colored longitude and latitude maps
>
> * Scatter and histogram plots of two variables:  This service displays the
> scatter plot (X-Y plot) between two specified variables and the histograms
> of the two variables. The number of samples can be specified and the
> correlation is computed. The two variables can be either a two-dimensional
> variable or a slice of a three-dimensional variable at a specific altitude.
>
> * Conditional sampling:  This service lets user to sort a physical
> quantity of two or dimensions according to the values of another variable
> (environmental condition, e.g. SST) which may be a two-dimensional
> variable or a slice of a three-dimensional variable at a specific
> altitude. For a two dimensional quantity, the plot is displayed an X-Y
> plot, and for a two-dimensional quantity, plot is displayed as a
> colored-map.
>
>
> == Background and Rationale ==
>
> The latest Intergovernmental Panel on Climate Change (IPCC) Fourth
> Assessment Report stressed the need for the comprehensive and innovative
> evaluation of climate models with newly available global observations. The
> traditional approach to climate model evaluation, which is the comparison
> of a single parameter at a time, identifies symptomatic model biases and
> errors but fails to diagnose the model problems. The model diagnosis
> process requires physics-based multi-variable comparisons, which typically
> involve large-volume and heterogeneous datasets, and computationally
> demanding and data-intensive operations. We propose to develop a
> computationally efficient information system to enable the physics-based
> multi-variable model performance evaluations and diagnoses through the
> comprehensive and synergistic use of multiple observational data,
> reanalysis data, and model outputs.
>
> Satellite observations have been widely used in model-data
> inter-comparisons and model evaluation studies. These studies normally
> involve the comparison of a single parameter at a time using a time and
> space average. For example, modeling cloud-related processes in global
> climate models requires cloud parameterizations that provide quantitative
> rules for expressing the location, frequency of occurrence, and intensity
> of the clouds in terms of multiple large-scale model-resolved parameters
> such as temperature, pressure, humidity, and wind. One can evaluate the
> performance of the cloud parameterization by comparing the cloud water
> content with satellite data and can identify symptomatic model biases or
> errors. However, in order to understand the cause of the biases and
> errors, one has to simultaneously investigate several parameters that are
> integrated in the cloud parameterization.
>
> Such studies, aimed at a multi-parameter model diagnosis, require
> locating, understanding, and manipulating multi-source observation
> datasets, model outputs, and (re)analysis outputs that are physically
> distributed, massive in volume, heterogeneous in format, and provide
> little information on data quality and production legacy. Additionally,
> these studies involve various data preparation and processing steps that
> can easily become computationally demanding since many datasets have to be
> combined and processed simultaneously. It is notorious that scientists
> spend more than 60% of their research time on just preparing the dataset
> before it can be analyzed for their research.
>
> To address these challenges, we propose to build Climate Model Diagnostic
> Analyzer (CMDA) that will enable a streamlined and structured preparation
> of multiple large-volume and heterogeneous datasets, and provide a
> computationally efficient approach to processing the datasets for model
> diagnosis. We will leverage the existing information technologies and
> scientific tools that we developed in our current NASA ROSES COUND, MAP,
> and AIST projects. We will utilize the open-source Web-service technology.
> We will make CMDA complementary to other climate model analysis tools
> currently available to the research community (e.g., PCMDI’s CDAT and
> NCAR’s CCMVal) by focusing on the missing capabilities such as conditional
> sampling, and probability distribution function and cluster analysis of
> multiple-instrument datasets. The users will be able to use a web browser
> to interface with CMDA.
>
> == Current Status ==
>
> The current version of ClimateModelDiagnosticAnalyzer was developed by a
> team at The Jet Propulsion Laboratory (JPL). The project was initiated as
> a NASA-sponsored project (ROSES-CMAC) in 2011.
>
> == Meritocracy ==
>
> The current developers are not familiar with meritocratic open source
> development at Apache, but would like to encourage this style of
> development for the project.
>
> == Community ==
>
> While ClimateModelDiagnosticAnalyzer started as a JPL research project, it
> has been used in The 2014 Caltech Summer School sponsored by the JPL
> Center for Climate Sciences. Some 23 students from different institutions
> over the world participated. We deployed the tool to the Amazon Cloud and
> let every student each has his or her own virtual machine. Students gave
> positive feedback mostly on the usability and speed of our web services.
> We also collected a number of enhancement requests. We seek to further
> grow the developer and user communities using the Apache open source
> venue. During incubation we will explicitly seek increased academic
> collaborations (e.g., with The Carnegie Mellon University) as well as
> industrial participation.
>
> One instance of our web services can be found at:
> http://cmacws.jpl.nasa.gov:8080/cmac/
>
> == Core Developers ==
>
> The core developers of the project are JPL scientists and software
> developers.
>
> == Alignment ==
>
> Apache is the most natural home for taking the
> ClimateModelDiagnosticAnalyzer project forward. It is well-aligned with
> some Apache projects such as Apache Open Climate Workbench.
> ClimateModelDiagnosticAnalyzer also seeks to achieve an Apache-style
> development model; it is seeking a broader community of contributors and
> users in order to achieve its full potential and value to the Climate
> Science and Big Data community.
>
> There are also a number of dependencies that will be mentioned below in
> the Relationships with Other Apache products section.
>
>
> == Known Risks ==
>
> === Orphaned products ===
>
> Given the current level of intellectual investment in
> ClimateModelDiagnosticAnalyzer, the risk of the project being abandoned is
> very small. The Carnegie Mellon University and JPL are collaborating
> (2014-2015) to build a service for climate analytics workflow
> recommendation using fund from NASA. A two-year NASA AIST project
> (2015-2016) will soon start to add diagnostic analysis methodologies such
> as conditional sampling method, conditional probability density function,
> data co-location, and random forest. We will also infuse the provenance
> technology into CMDA so that the history of the data products and
> workflows will be automatically collected and saved. This information will
> also be indexed so that the products and workflows can be searchable by
> the community of climate scientists and students.
>
> === Inexperience with Open Source ===
>
> The current developers of ClimateModelDiagnosticAnalyzer are inexperienced
> with Open Source. However, our Champion Chris Mattmann is experienced
> (Champions of ApacheOpenClimateWorkbench and AsterixDB) and will be
> working closely with us, also as the Chief Architect of our JPL section.
>
> === Relationships with Other Apache Products ===
>
> Clearly there is a direct relationship between this project and the Apache
> Open Climate Workbench already a top level Apache project and also brought
> to the ASF by its Champion (and ours) Chris Mattmann. We plan on directly
> collaborating with the Open Climate Workbench community via our Champion
> and we also welcome ASF mentors familiar with the OCW project to help
> mentor our project. In addition our team is extremely welcoming of ASF
> projects and if there are synergies with them we invite participation in
> the proposal and in the discussion.
>
> === Homogeneous Developers ===
>
> The current community is within JPL but we would like to increase the
> heterogeneity.
>
> === Reliance on Salaried Developers ===
>
> The initial committers are full-time JPL staff from 2013 to 2014. The
> other committers from 2014 to 2015 are a mix of CMU faculty, students and
> JPL staff.
>
> === An Excessive Fascination with the Apache Brand ===
>
> We believe in the processes, systems, and framework Apache has put in
> place. Apache is also known to foster a great community around their
> projects and provide exposure. While brand is important, our fascination
> with it is not excessive. We believe that the ASF is the right home for
> ClimateModelDiagnosticAnalyzer and that having
> ClimateModelDiagnosticAnalyzer inside of the ASF will lead to a better
> long-term outcome for the Climate Science and Big Data community.
>
> === Documentation ===
>
> The ClimateModelDiagnosticAnalyzer services and documentation can be found
> at: http://cmacws.jpl.nasa.gov:8080/cmac/.
>
> === Initial Source ===
>
> Current source resides in ...
>
> === External Dependencies ===
>
> ClimateModelDiagnosticAnalyzer depends on a number of open source projects:
>
>  * Flask
>  * Gunicorn
>  * Tornado Web Server
>  * GNU octave
>  * epd python
>  * NOAA ferret
>  * GNU plot
>
> == Required Resources ==
>
> === Developer and user mailing lists ===
>
>  * priv...@cmda.incubator.apache.org (with moderated subscriptions)
>  * comm...@cmda.incubator.apache.org
>  * d...@cmda.incubator.apache.org
>  * us...@cmda.incubator.apache.org
>
> A git repository
>
> https://git-wip-us.apache.org/repos/asf/incubator-cmda.git
>
> A JIRA issue tracker
>
> https://issues.apache.org/jira/browse/CMDA
>
> === Initial Committers ===
>
> The following is a list of the planned initial Apache committers (the
> active subset of the committers for the current repository at Google code).
>
>  * Seungwon Lee (seungwon....@jpl.nasa.gov)
>  * Lei Pan (lei....@jpl.nasa.gov)
>  * Chengxing Zhai (chengxing.z...@jpl.nasa.gov)
>  * Benyang Tang (benyang.t...@jpl.nasa.gov)
>
>
> === Affiliations ===
>
> JPL
>
>  * Seungwon Lee
>  * Lei Pan
>  * Chengxing Zhai
>  * Benyang Tang
>
> CMU
>
>  * Jia Zhang
>  * Wei Wang
>  * Chris Lee
>  * Xing Wei
>
> == Sponsors ==
>
> NASA
>
> === Champion ===
>
> Chris Mattmann (NASA/JPL)
>
> === Nominated Mentors ===
>
> TBD
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org

Reply via email to