Re: [RESULT] WAS Re: [VOTE] Accept Science Data Analytics Platform (SDAP) into Apache Incubator WAS Re: [DISCUSS] Accept Science Data Analytics Platform (SDAP) into Apache Incubator

Raphael Bircher Fri, 20 Oct 2017 16:28:08 -0700

Hi all

Well... let's start then. But I have no idea how ;-) The documentation is a "bit" outdated. Maybe someone can tell me how it works, and what is still done manually and what is done within Whimsy.


Regards, Raphael

On .10.2017 at 23:21, lewis john mcgibbney <lewi...@apache.org> wrote:

Hi general@,
72 hours have now elapsed since this VOTE was opened...
The RESULT is as follows
[6] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
Lewis John McGibbney*
Madhawa Kasun Gunasekara
Tom Barber*
Julian Hyde*
Chris Mattmann*
Raphael Bircher*

[0] +/-0 ... just because
[0] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
Incubator... because

* IPMC Binding
I would like to thank everyone from the Incubator community who was able to
review, DISCUSS and VOTE. I would also like to thank Thomas Huang for his patience
and vision to bring SDAP to Apache.
Best
Lewis
On Tue, Oct 17, 2017 at 2:04 PM, lewis john mcgibbney <lewi...@apache.org>
wrote:
Hi Folks,
Having secured a mentorship team consisting of the following IPMC Members,
I am happy to open a formal VOTE thread on accepting the Science Data
Analytics Platform (SDAP) into Apache Incubator.

   - Lewis John McGibbney (lewi...@apache.org)
   - Raphael Bircher (bircher at apache dot org)
   - Suneel Marthi (smarthi at apache dot org)

Thank you to both Raphael and Suneel for coming forward. :)
The VOTE will be open for at least 72 hours.
[ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
[ ] +/-0 ... just because
[ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
Incubator... because

Thanks in advance to all participants.
Lewis

P.S. Here is a binding +1 from me
On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewi...@apache.org>
wrote:
Hi Folks,
I would like to open a DISCUSS thread on the topic of accepting the
Science Data Analytics Platform (SDAP)
<https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
I am CC'ing Thomas Huang from NASA JPL who I have been working with to
build community around a kick-ass set of software projects under the SDAP
umbrella.
At this stage we would very much appreciate critical feedback from the
general@ community. We are also open to mentors who may have an interest
in the project proposal.
The proposal is pasted below.
Thanks in advance,
Lewis

= Abstract =
The Science Data Analytics Platform (SDAP) establishes an integrated data
analytic center for Big Science problems. It focuses on technology
integration, advancement and maturity.

= Proposal =
SDAP currently represents a collaboration between NASA Jet Propulsion
Laboratory (JPL), Florida State University (FSU), the National Center for
Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings
together a number of NASA-funded big data technologies including
OceanXtremes (Anomaly detection and ocean science), NEXUS (Deep data
analytic platform), DOMS (Distributed in-situ to satellite matchup),
MUDROD (Search relevancy and discovery) and VQSS (Virtualized Quality
Screening Service) under a single umbrella. VQSS will not be included in
the original Incubator proposal; however, it is anticipated that a future
source code donation will cover VQSS.

= Background and Rationale =
SDAP is a technology software solution currently geared to better enable
scientists involved in advancing the study of the Earth's physical
oceanography. With increasing global temperature, warming of the ocean, and
melting ice sheets and glaciers, the impacts can be observed from changes
in anomalous ocean temperature and circulation patterns, to increasing
extreme weather events and stronger/more frequent hurricanes, sea level
rise and storm surges affecting coastlines, and may involve drastic changes
and shifts in marine ecosystems. Ocean science communities are relying on
data distributed through data centers such as JPL's Physical
Oceanography Distributed Active Archive Center (PO.DAAC) to conduct their
research. In typical investigations, oceanographers follow a traditional
workflow for using datasets: search, evaluate, download, and apply tools
and algorithms to look for trends. While this workflow has been working
very well historically for the oceanographic community, it cannot scale if
the research involves massive amounts of data. NASA's Surface Water and
Ocean Topography (SWOT) mission, scheduled to launch in April of 2021, is
expected to generate over 20PB of data for a nominal 3-year mission. This will
challenge all existing NASA Earth Science data archival/distribution
paradigms. It will no longer be feasible for Earth scientists to download
and analyze such volumes of data. SDAP was therefore developed primarily as
a Web-service platform for big ocean data science at the PO.DAAC with open
source solutions used to enable fast analysis of oceanographic data. SDAP
has been developed collaboratively between JPL, FSU, NCAR, and GMU and is
rapidly maturing to become the generic platform for the next generation of
big science data solutions. The platform is an orchestration of several
previously funded NASA big ocean data solutions using cloud technology,
which include data analysis (NEXUS), anomaly detection (OceanXtremes),
matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS).
SDAP will enable web-accessible, fast data analysis directly on huge
scientific data archives to minimize data movement and provide access,
including subset, only to the relevant data.

= Science Data Analytics Platform Project Overview =
SDAP consists of several loosely coupled, independently functioning
sub-projects. The graphic below displays an overview of how these
sub-projects fuse together. N.B., although the graphic uses terminology
relating to OceanWorks, essentially the SDAP architecture is identical.

{{attachment:sdap.png}}

== OceanXtremes ==
Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An
application that allows you to view imagery and perform analysis on sea
level rise data.

'''Objective'''
Develop an anomaly detection system which identifies items, events or
observations which do not conform to an expected pattern.
 * Mature and test domain-specific, multi-scale anomaly and feature
detection algorithms.
 * Identify unexpected correlations between key measured variables.

Demonstrate value of technologies in this service:
 * Adapted Map-Reduce data mining.
 * Algorithm profiling service.
 * Shared discovery and exploration search tools.
 * Automatic notification of events of interest.
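
To make "observations which do not conform to an expected pattern" concrete, here is a minimal sketch of one common approach: flagging points whose z-score against the series mean exceeds a threshold. This is only an illustration and is not taken from the OceanXtremes code; the function name, the threshold and the sample sea surface temperature values are all made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value, z_score) for points far from the series mean.

    Hypothetical helper for illustration only: a point is treated as
    anomalous when it lies more than `threshold` sample standard
    deviations away from the mean of the whole series.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # A constant series has no outliers under this definition.
        return []
    return [
        (i, v, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up sea surface temperatures (degrees C) with one obvious spike.
sst = [20.1, 20.3, 19.9, 20.0, 20.2, 24.8, 20.1, 19.8]
anomalies = find_anomalies(sst)
for index, value, z in anomalies:
    print(f"index={index} value={value} z={z:.2f}")
```

A real system would compare against a climatology per location and season rather than a single global mean, but the flagging step has the same shape.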

== NEXUS ==
NEXUS is an emerging technology developed at JPL
 * A Cloud-based/Cluster-based data platform that performs scalable
handling of observational parameters analysis designed to scale horizontally
 * Leveraging high-performance indexed, temporal, and geospatial search
solution
 * Breaks data products into small chunks and stores them in a
Cloud-based data store

''Data Volumes Exploding''
 * SWOT mission is coming
 * File I/O is slow

''Scalable Store & Compute is Available''
 * NoSQL cluster databases
 * Parallel compute, in-memory map-reduce
 * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)

''Pre-Chunk and Summarize Key Variables''
 * Easy statistics instantly (milliseconds)
 * Harder statistics on-demand (in seconds)
 * Visualize original data (layers) on a map quickly
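
The "pre-chunk and summarize" idea can be sketched as follows, assuming a small in-memory grid in place of a real data product and a plain dict in place of a cloud data store. None of these names come from NEXUS; they are hypothetical. Each tile keeps its cells plus a few precomputed statistics, so simple questions (such as an area mean) can be answered from the summaries without reading the cells again.

```python
def tile_grid(grid, tile_size):
    """Split a 2D list into square tiles and precompute per-tile statistics."""
    tiles = {}
    for r0 in range(0, len(grid), tile_size):
        for c0 in range(0, len(grid[0]), tile_size):
            cells = [
                v
                for row in grid[r0:r0 + tile_size]
                for v in row[c0:c0 + tile_size]
            ]
            tiles[(r0 // tile_size, c0 // tile_size)] = {
                "data": cells,          # original values, kept for harder queries
                "count": len(cells),    # summaries used for the easy statistics
                "sum": sum(cells),
                "min": min(cells),
                "max": max(cells),
            }
    return tiles


def mean_over_tiles(tiles, keys):
    """Compute a mean from the stored summaries only, without touching 'data'."""
    total = sum(tiles[k]["sum"] for k in keys)
    count = sum(tiles[k]["count"] for k in keys)
    return total / count


# A made-up 4x4 grid holding the values 0..15, cut into 2x2 tiles.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = tile_grid(grid, 2)
global_mean = mean_over_tiles(tiles, tiles.keys())
print(len(tiles), global_mean)
```

The design choice is the usual trade of storage for latency: the summaries cost a little extra space per tile, and in exchange the common statistics never need the raw chunk.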

== DOMS ==
The Distributed Oceanographic Match-Up Service
DOMS is designed to reconcile satellite and in situ datasets in support
of NASA's Earth Science mission. The service will provide a mechanism for
users to input a series of geospatial references for satellite observations
and receive the in situ observations that are matched to the satellite data
within a selectable temporal and spatial domain. DOMS includes several
characteristic in situ and satellite observation datasets - with an initial
focus on salinity, sea temperature, and winds. DOMS will be used by the
marine and satellite research communities to support a range of activities
and several use cases will be described. The service is designed to provide
a community-accessible tool that dynamically delivers matched data and
allows the scientist to only work with the subset of data where the matches
exist.
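
As a rough illustration of matching within a selectable temporal and spatial domain, the sketch below pairs each satellite observation with the in situ observations that fall inside a distance and time tolerance. It is not the DOMS implementation: the record layout, the function names and the brute-force search are assumptions made for the example, and the observations are invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def match_up(satellite, in_situ, max_km, max_seconds):
    """Pair every satellite record with the in situ records inside both tolerances."""
    matches = []
    for s in satellite:
        near = [
            p
            for p in in_situ
            if abs(p["time"] - s["time"]) <= max_seconds
            and haversine_km(s["lat"], s["lon"], p["lat"], p["lon"]) <= max_km
        ]
        matches.append((s, near))
    return matches


# Invented observations; "time" is seconds since an arbitrary epoch.
satellite = [{"id": "sat-1", "lat": 10.0, "lon": -150.0, "time": 0}]
in_situ = [
    {"id": "buoy-a", "lat": 10.1, "lon": -150.1, "time": 600},    # close in space and time
    {"id": "buoy-b", "lat": 12.0, "lon": -150.0, "time": 600},    # too far away
    {"id": "buoy-c", "lat": 10.0, "lon": -150.0, "time": 90000},  # too late
]
matches = match_up(satellite, in_situ, max_km=25.0, max_seconds=3600)
matched_ids = [p["id"] for p in matches[0][1]]
print(matched_ids)
```

The brute-force loop is quadratic; a service working at scale would presumably rely on a spatial and temporal index instead, which is outside the scope of this sketch.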

== MUDROD ==
Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to
Improve Data Discovery and Access
Data discovery accuracy is a challenging topic for both Earth science and
other domains. It is especially true for scientific data sets that are not
as popular as Amazon or Google data. MUDROD is focused on mining oceanic
knowledge from the PO.DAAC user log files to improve the end user data
discovery experience at PO.DAAC. There are three steps in the research: a)
the oceanographic semantics are extracted from three resources: SWEET, the
GCMD ontology, and the keywords used by end users for searching PO.DAAC
datasets, b) the linkage among different vocabularies is mined based on user
data discovery sessions, and c) the linkage among vocabularies is built based
on a comprehensive approach by considering domain de facto standards, e.g.,
SWEET and GCMD, and the knowledge mined from the log files. The semantics
are used to improve data discovery for ranking results, navigating among
vocabularies, and recommending data based on user searches.

= Current Status =
All components of SDAP were originally designed and developed under
grants from the NASA-funded Advanced Information Systems and Technologies
(AIST) program. The initiative to bring the components together under
the SDAP umbrella was granted through an AIST-funded follow-on grant which
will run for another ~18 or so months.
Currently no projects have made official releases so, outside of community
building, this will be our primary Incubating goal. All SDAP source code is
currently publicly available and licensed under the ALv2.0.

= Meritocracy =
The current developers are familiar with meritocratic open source
development at Apache. The SDAP team consumes Apache products heavily with
members being part of several Apache user communities. SDAP itself has
critical dependencies upon Apache products. Lewis McGibbney (JPL employee),
a Member of the ASF and V.P. of Apache Any23, Gora PMC Nutch, Tika, OODT,
OCW, etc., is championing the effort to bring SDAP into and through the
Apache Incubator and has been evangelizing the Apache Way to the current
SDAP contributors such that the meritocratic process is well understood and
followed. Apache was chosen specifically because we want to encourage this
style of community development for the project and for it to sustain SDAP
forward to become the generic platform for the next generation of big
science data solutions.

= Community =
The SDAP project is a fairly new effort and our community is not yet
fully/firmly established. Initial committers comprising the SDAP roster
have only recently fully come together as a unified team; however, there is a
large degree of synergy between constituent members at JPL, FSU, NCAR, and
GMU. Therefore, community building and publicity continue to be a major
thrust. With the activity and exposure regularly attained by several
community members, we hope to grow the SDAP presence in and across several
(scientific) forums. The SDAP technology is generating interest within
communities such as the Earth Science Information Partnership (ESIP),
American Geophysical Union (AGU) and a plethora of science meetings around
the globe. This in effect, we hope, will further contribute towards the
possibility of SDAP being used across Government Agencies such as NASA,
NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in
academic institutions around the globe.
During incubation, we will explicitly seek to increase our adoption, with
SDAP already being featured on the agenda for several high profile globally
significant scientific conferences and meetings.

= Core Developers =
The current set of core developers is relatively small, including
full-time staff and students from across JPL, FSU, NCAR, and GMU. Initial
community management and participation will be distributed across the
entire team, most of whom have been involved with the constituent projects
for <2 years.

= Alignment =
All SDAP code is licensed under Apache v2.0.

= Known Risks =

== Orphaned products ==
There are currently no orphaned products. Each component of SDAP has
dedicated personnel leading and participating in its ongoing development.
Additionally, there is substantial collaboration between projects
facilitated by regular project meetings which are specific to the initial
member entities and focused on advancing physical oceanographic science.
== Inexperience with Open Source ==
JPL (in particular Lewis McGibbney) has been part of several efforts to
transition to and grow project communities at Apache e.g. Apache OODT,
Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft
(Incubating), Apache DRAT (Incubating). Most of the code developed under
the SDAP umbrella was and is open source prior to the Incubator effort so
we are well familiarized with the nuances of open source software.

= Relationships with Other Apache Products =
SDAP has strong dependency upon a number of high profile and smaller
profile Apache products. Examples can be seen in the breakdown of External
Dependencies. As we continue to grow SDAP within the Incubator, we will
make efforts to share community stories, software advancements and possible
improvements in our use of our Apache dependencies back to those project
communities.

= Developers =
The SDAP project, and hence its developers, is currently funded through a NASA
AIST follow-on grant with funding secured for the next ~18 months. There
are currently no 100% time dedicated developers; however, the same core
team that does work currently will continue to work on the project
throughout the current funding period and after. There is currently no
business strategy aligned with SDAP; however, it is perceived that future,
yet unsecured funding may be directed to further feature advancement and
project evangelism.

= Documentation =
Documentation is currently available in a number of locations e.g. Github
wiki, Github pages, etc. with each repository under the oceanworks-aist
Github Org maintaining documentation available through wikis attached to
the repositories. Additionally, most of the SDAP sub-projects have been
extensively documented within a plethora of formal academic publications
across several academic communities. It would be our intention, certainly
at least, to unify the Github wiki and Github pages documentation, most likely
to make up the sdap.apache.org Website content.

= Initial Source =
Current source resides in several locations on Github and Bitbucket:
 * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
 * https://github.com/dataplumber/edge (EDGE)
 * https://github.com/aist-oceanworks/mudrod (MUDROD)
 * https://bitbucket.org/coaps_mdc/doms/src (DOMS)

= External Dependencies =
Each component of the Science Data Analytics Platform has its own
dependencies. Documentation will be available for integrating them.

== MUDROD ==
'''Core'''
 * com.google.code.gson gson 2.5 compile jar false
 * org.jdom jdom 2.0.2 compile jar false
 * org.elasticsearch elasticsearch 5.2.0 compile jar false
 * org.elasticsearch elasticsearch-spark-20_2.11 5.2.0 compile jar false
 * joda-time joda-time 2.9.4 compile jar false
 * com.carrotsearch hppc 0.7.1 compile jar false
 * org.apache.spark spark-core_2.11 2.1.0 compile jar false
 * org.apache.spark spark-sql_2.11 2.1.0 compile jar false
 * org.apache.spark spark-mllib_2.11 2.1.0 compile jar false
 * org.scala-lang scala-library 2.11.8 compile jar false
 * org.codehaus.jettison jettison 1.3.8 compile jar false
 * commons-cli commons-cli 1.2 compile jar false
 * net.sf.opencsv opencsv 2.3 compile jar false
 * org.apache.jena jena-core 3.3.0 compile jar false
 * junit junit 4.12 test jar false

'''Service'''
 * gov.nasa.jpl.mudrod mudrod-core 0.0.1-SNAPSHOT compile jar false
 * javax.servlet javax.servlet-api 3.1.0 provided jar false
 * com.google.code.gson gson 2.5 compile jar false

'''Web'''
 * AngularJS - MIT License
 * BootstrapJS - MIT License
 * jQueryJS - MIT License
 * Underscore JS - MIT License

== DOMS ==
 * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
 * EDGE https://github.com/dataplumber/edge
 * NetCDF4 http://unidata.github.io/netcdf4-python/
 * Python 3.5 (NOTE: only partial support for py2.7)

Non stdlib Python dependencies:
 * Jinja2==2.9.5
 * python-dateutil==2.6.0
 * cython==0.25.2
 * numpy==1.12.0
 * scipy==0.18.1
 * netCDF4==1.2.7
 * solrpy3
 * siphon==0.4.0
 * neo4j-driver==1.1.0
 * matplotlib==2.0.0
 * requests==2.13.0
 * shapely==1.5.17
 * flask==0.12
 * networkx==1.11
 * pyproj==1.9.5.1
 * blist==1.3.6

== NEXUS ==
'''Analysis'''
 * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
 * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt

'''Client'''
 * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt

'''Climatology'''
 * matplotlib
 * numpy
 * netCDF4
 * pathos (https://pypi.python.org/pypi/pathos)

'''Data-access'''
 * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt

'''Nexus-ingest'''
''Dataset-tiler''
 * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports

''developer-box''
 * Just a collection of scripts/vagrant file used to stand up a developer
instance of nexus ingestion. No dependencies to report

''Groovy-scripts''
 * Collection of Groovy scripts that can be used as part of data
ingestion. They only rely on the standard Groovy library and the
‘nexus-messages’ project

''Nexus-messages''
 * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports

''nexus-sink''
 * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports

''nexus-xd-python-modules''
 * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
 * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt

''spring-xd-python''
 * only python standard libraries are used

''tcp-shell''
 * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports

'''tools/deletebyquery'''
 * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt

= Required Resources =
Mailing Lists
 * priv...@sdap.incubator.apache.org
 * d...@sdap.incubator.apache.org
 * comm...@sdap.incubator.apache.org

Git Repos
 * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
 * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
 * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git

Issue Tracking
 * JIRA Science Data Analytics Platform (SDAP)

Continuous Integration
 * Jenkins builds on https://builds.apache.org/

Web
 * http://sdap.incubator.apache.org/
 * wiki at http://cwiki.apache.org

= Initial Committers =
The following is a list of the planned initial Apache committers (the
active subset of the committers for the current repository on Github).
 * Lewis John McGibbney (lewi...@apache.org)
 * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
 * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
 * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
 * Frank Greguska (gregu...@jpl.nasa.gov)
 * Brian Wilson (brian.wil...@jpl.nasa.gov)
 * Chaowe Phil Yang (cya...@gmu.edu)
 * Yongyao Jiang (yjia...@gmu.edu)
 * Yun Li (yl...@gmu.edu)
 * Shawn R. Smith (sm...@coaps.fsu.edu)
 * Jocelyn Elya (je...@coaps.fsu.edu)
 * Mark Bourassa (boura...@coaps.fsu.edu)
 * Thomas Cram (tc...@ucar.edu)
 * Thomas Huang (thomas.hu...@jpl.nasa.gov)
 * Steven Worley (wor...@ucar.edu)
 * Zaihua Ji (z...@ucar.edu)

= Affiliations =
NASA JPL
 * Lewis John McGibbney (lewi...@apache.org)
 * Vardis M. Tsontos (vardis.m.tson...@jpl.nasa.gov)
 * Joseph C. Jacob (joseph.c.ja...@jpl.nasa.gov)
 * Ed Armstrong (edward.m.armstr...@jpl.nasa.gov)
 * Frank Greguska (gregu...@jpl.nasa.gov)
 * Thomas Huang (thomas.hu...@jpl.nasa.gov)
 * Brian Wilson (brian.wil...@jpl.nasa.gov)

George Mason University
 * Chaowe Phil Yang (cya...@gmu.edu)
 * Yongyao Jiang (yjia...@gmu.edu)
 * Yun Li (yl...@gmu.edu)
Center for Ocean-Atmospheric Prediction Studies, Florida State University
 * Shawn R. Smith (sm...@coaps.fsu.edu)
 * Jocelyn Elya (je...@coaps.fsu.edu)
 * Mark Bourassa (boura...@coaps.fsu.edu)
Computational Information Systems Laboratory (CISL) / National Center for
Atmospheric Research (NCAR)
 * Thomas Cram (tc...@ucar.edu)
 * Zaihua Ji (z...@ucar.edu)
 * Steven Worley (wor...@ucar.edu)

= Sponsors =

= Champion =
* Lewis McGibbney (NASA/JPL)

= Nominated Mentors =
 * TBD
 * TBD
 * TBD

= Sponsoring Entity =
The Apache Incubator


--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney



--
My introduction https://youtu.be/Ln4vly5sxYU
