>> = Any23 =
>> == Abstract ==
>> This proposal is about ''Anything To Triples'' (Any23 for short), a Java 
>> library, a Web service and a set of command line tools to extract and 
>> validate structured data in [[http://www.w3.org/RDF/|RDF]] format from a 
>> variety of Web documents and markup formats. Any23 is what is informally 
>> called an ''RDF Distiller''.
>> == Proposal ==
>> Any23 ("Anything to Triples") is a library written in Java 6 and released 
>> under the Apache 2.0 License. It provides a set of extractors for scraping 
>> semantic markup (such as [[http://microformats.org/|Microformats]], 
>> [[http://www.w3.org/TR/rdfa-syntax/|RDFa]] and 
>> [[http://www.w3.org/TR/microdata/|Microdata]]) from several sources (HTML4, 
>> XHTML5, CSV), a set of data validations, and a set of parsers and writers 
>> to handle the main RDF transport formats (RDF/XML, N-Triples, N-Quads, 
>> Turtle). The library provides a command line tool for data extraction, 
>> conversion and validation, and a REST service implementation. The library 
>> is plugin based, allowing new extractors and validators to be hot loaded. 
>> Any23 enables third-party developers to access structured data from Web 
>> pages without implementing ad hoc scraping techniques. In this sense, 
>> Any23 relieves developers from building complex solutions when developing 
>> data acquisition pipelines and processes targeted at semantically 
>> marked-up Web data.
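As a rough illustration of the plugin-based design described above, the following is a minimal sketch in plain Java of a chain from extractor to triple handler to N-Triples output. It is not the Any23 API: `ToyExtractor`, `ToyTripleSink`, `HtmlTitleExtractor`, `NTriplesSink` and `distill` are hypothetical names introduced only for this example, and the regular expression stands in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy extraction pipeline. Every name here is hypothetical, not the Any23 API. */
public class ExtractionPipelineSketch {

    /** Receives each extracted statement whose object is a plain literal. */
    interface ToyTripleSink {
        void receive(String subjectIri, String predicateIri, String literal);
    }

    /** A pluggable unit that inspects one document and reports what it finds. */
    interface ToyExtractor {
        void extract(String documentIri, String content, ToyTripleSink sink);
    }

    /** Emits the text of the HTML title element as a dcterms:title statement. */
    static final class HtmlTitleExtractor implements ToyExtractor {
        private static final Pattern TITLE = Pattern.compile(
                "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        @Override
        public void extract(String documentIri, String content, ToyTripleSink sink) {
            Matcher m = TITLE.matcher(content);
            if (m.find()) {
                sink.receive(documentIri, "http://purl.org/dc/terms/title", m.group(1).trim());
            }
        }
    }

    /** Serializes every received statement as one N-Triples line. */
    static final class NTriplesSink implements ToyTripleSink {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void receive(String subjectIri, String predicateIri, String literal) {
            // Escape backslashes first, then quotes and line breaks.
            String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"")
                    .replace("\n", "\\n").replace("\r", "\\r");
            out.append('<').append(subjectIri).append("> <").append(predicateIri)
               .append("> \"").append(escaped).append("\" .\n");
        }

        String output() {
            return out.toString();
        }
    }

    /** Runs every registered extractor over one document and returns the N-Triples text. */
    static String distill(String documentIri, String content,
                          List<? extends ToyExtractor> extractors) {
        NTriplesSink sink = new NTriplesSink();
        for (ToyExtractor extractor : extractors) {
            extractor.extract(documentIri, content, sink);
        }
        return sink.output();
    }

    public static void main(String[] args) {
        List<ToyExtractor> extractors = new ArrayList<>();
        extractors.add(new HtmlTitleExtractor());
        String html = "<html><head><title>Hello</title></head><body></body></html>";
        System.out.print(distill("http://example.org/page", html, extractors));
    }
}
```

Adding support for another markup format in this sketch means adding another `ToyExtractor` to the list; the sink and the calling code stay unchanged.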
>> == Background ==
>> Any23 was initially developed at 
>> [[http://www.deri.ie/|DERI (Digital Enterprise Research Institute)]] as the 
>> main component of the RDF extraction pipeline used in 
>> [[http://sindice.com/|Sindice (the Semantic Web Index)]], and is now 
>> evolving as a joint effort with 
>> [[http://www.fbk.eu/|FBK (Fondazione Bruno Kessler)]]. At present, the 
>> official Any23 [[http://developers.any23.org|developers page]] contains all 
>> the documentation, while the code is maintained on 
>> [[http://code.google.com/p/any23/|Google Code]]. An official, up-to-date 
>> [[http://any23.org|demo]] is also available.
>> == Rationale ==
>> Providing and maintaining a robust, standard and up-to-date library for 
>> extracting and validating semantic markup from heterogeneous sources would 
>> greatly benefit the entire Open Source community. Researchers and academic 
>> projects have been adopting RDF-related technologies for years, while 
>> industry is now moving toward Semantic Web technologies more concretely. 
>> Several industry initiatives related to the 
>> [[http://en.wikipedia.org/wiki/Semantic_Web|Web of Data]] are currently 
>> taking place. [[http://schema.org|Schema.org]], for example, is an 
>> initiative sponsored by 
>> [[http://www.google.com/about/corporate/company/|Google Inc]], 
>> [[http://info.yahoo.com/center/us/yahoo/|Yahoo Inc]] and 
>> [[http://www.microsoft.com/about/companyinformation/en/us/default.aspx|Microsoft Corporation]] 
>> to structure data in a harmonized way on 
>> [[http://dev.w3.org/html5/spec/Overview.html|HTML5]] pages. 
>> [[http://schema.org|Schema.org]] builds on the native 
>> [[http://dev.w3.org/html5/md/|HTML5 Microdata]] specification. 
>> [[http://ogp.me/|OpenGraphProtocol]] is the open standard sponsored by 
>> [[https://www.facebook.com/pages/Facebooking/114721225206500|Facebook Inc]] 
>> to include metadata in HTML page headers. 
>> [[http://ogp.me/|OpenGraphProtocol]], initially based on 
>> [[http://www.w3.org/TR/xhtml-rdfa-primer/|RDFa]], makes it possible to 
>> describe the content of a Web page, and its underlying vocabulary can be 
>> directly represented using RDF.
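To make the last point concrete, here is a minimal sketch, in plain Java, of how Open Graph `meta` tags might be mapped to RDF statements in the `http://ogp.me/ns#` namespace and written as N-Triples. The class and method names are hypothetical and this is not how Any23 does it. The regular expression assumes double-quoted attributes with `property` before `content`, and HTML entities are not decoded; a real extractor would use an HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy mapping from Open Graph meta tags to N-Triples. Hypothetical names throughout. */
public class OpenGraphToTriplesSketch {

    private static final String OG_NS = "http://ogp.me/ns#";

    // Simplifying assumption: property="og:..." comes first, content="..." second.
    private static final Pattern OG_META = Pattern.compile(
            "<meta\\s+property=\"og:([A-Za-z_]+)\"\\s+content=\"([^\"]*)\"");

    /** Returns one N-Triples line per og: property found in the given HTML. */
    static String toNTriples(String pageIri, String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = OG_META.matcher(html);
        while (m.find()) {
            // The pattern excludes double quotes, so only backslashes and
            // line breaks need escaping in the literal.
            String value = m.group(2).replace("\\", "\\\\")
                    .replace("\n", "\\n").replace("\r", "\\r");
            out.append('<').append(pageIri).append("> <")
               .append(OG_NS).append(m.group(1)).append("> \"")
               .append(value).append("\" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<head>\n"
                + "<meta property=\"og:title\" content=\"The Rock\" />\n"
                + "<meta property=\"og:type\" content=\"video.movie\" />\n"
                + "</head>";
        System.out.print(toNTriples("http://example.org/rock", html));
    }
}
```

Each `og:` property becomes a predicate IRI in the Open Graph namespace, the page itself is the subject, and the `content` value is the literal object.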
>> = Current Status =
>> == Meritocracy ==
>> The historical Any23 team believes in meritocracy and has always acted as a 
>> community. Mailing lists, an open issue tracker and other communication 
>> channels have been in use since the first release. Joining a larger 
>> community, such as Apache, is the natural evolution for Any23. Moreover, 
>> the Apache standards will reinforce the existing Any23 community practices 
>> and will be a foundation for involving future committers.
>> == Core Developers ==
>> In alphabetical order:
>> * Davide Palmisano <dpalmisano at gmail dot com>
>> * Giovanni Tummarello <giovanni dot tummarello at deri dot org>
>> * Michele Mostarda <michele dot mostarda at gmail dot com>
>> * Reto Bachmann-Gmuer <reto at apache dot org>
>> * Richard Cyganiak <richard at cyganiak dot de>
>> * Simone Tripodi <simonetripodi at apache dot org>
>> * Szymon Danielczyk <danielczyk.szymon at gmail dot com>
>> * Tommaso Teofili <tommaso at apache dot org>
>> == Alignment ==
>> The main aim of the project is to develop and maintain a full-featured 
>> semantic markup distiller that can be used by other Apache projects that 
>> need an RDF extraction tool. The Any23 library core is written using the 
>> following Apache libraries:
>> * [[http://commons.apache.org/lang/|Apache Commons Lang]]
>> * [[http://hc.apache.org/httpclient-3.x/|Apache Commons HTTP Client]]
>> * [[http://commons.apache.org/codec/|Apache Commons Codec]]
>> * [[http://tika.apache.org/|Apache Tika]]
>> * [[http://commons.apache.org/cli/|Apache Commons CLI]]
>> * [[http://poi.apache.org/|Apache POI]]
>> The Any23 service is designed to run within any compliant Servlet 
>> container, such as Tomcat.
>> = Known Risks =
>> == Orphaned Products ==
>> The increasing number of Any23 adopters and the rising interest in 
>> Semantic Web technologies lead us to believe that there is minimal risk of 
>> this work being abandoned by the community. Moreover, Any23 has already 
>> been used in production by Sindice.com and other DERI projects for years.
>> == Inexperience with Open Source ==
>> All of the committers have experience working in one or more open source 
>> projects inside and outside the ASF.
>> == Homogeneous Developers ==
>> The initial committers are geographically distributed across Europe, with 
>> no single company associated with a majority of the developers. Many of 
>> these initial developers are already experienced Apache committers, and all 
>> are experienced in working in distributed development communities.
>> == Reliance on Salaried Developers ==
>> To the best of our knowledge, most of the initial committers are paid to 
>> develop code for this project because Any23 has been adopted in their 
>> organizations' infrastructures. Nevertheless, some of the core historical 
>> developers (some of them no longer paid by the original companies behind 
>> Any23) are still committing even though Any23 is not used in their current 
>> organizations. Any23 has already proven its capability to attract external 
>> developers.
>> == Relationships with Other Apache Products ==
>> In recent years, other projects relying on the Semantic Web technology 
>> stack, such as Apache Clerezza, Stanbol and Jena, have entered the ASF 
>> incubation process. This can be seen as evidence of the consolidation and 
>> growing adoption of such technologies. Beyond the specifics of those 
>> projects, which share the same underlying stack, Any23 could be employed 
>> in any project that needs a reliable framework to access structured 
>> semantic markup. The Any23 core could also easily be released as an 
>> [[http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]] and then 
>> used to conveniently populate 
>> [[http://www.openrdf.org/doc/sesame2/system/ch05.html|SAIL-compliant]] 
>> triple stores.
>> == An Excessive Fascination with the Apache Brand ==
>> Although the Any23 community recognizes the power and attractiveness of 
>> the ASF brand, we are well aware of our already established role in the 
>> wider community of Semantic Web developers. Any23 has already proved its 
>> reliability in closely supporting all the new specifications coming from 
>> the Microformats communities, our major contributors in terms of issues 
>> opened for new feature requests. Furthermore, we are convinced that we can 
>> bring new and fresh energy into the ASF, improve our vision, insight and 
>> knowledge of the other projects and, most importantly, enlarge our small 
>> community with talented and passionate developers.
>> = Documentation =
>> Any23 Documentation
>> 1. [[http://developers.any23.org/|Any23 Project Homepage]]
>> 1. [[http://code.google.com/p/any23/|Any23 Developer Homepage]]
>> 1. [[http://any23.org/|Any23 Live Demo]]
>> Any23 Related Specifications
>> 1. [[http://www.w3.org/RDF/|RDF]]
>> 1. [[http://www.w3.org/TR/html5/|HTML5]]
>> 1. [[http://www.w3.org/TR/rdfa-syntax/|RDFa]]
>> 1. [[http://www.w3.org/TR/microdata/|Microdata]]
>> 1. [[http://microformats.org/|Microformats]]
>> 1. [[http://www.w3.org/TR/rdf-syntax-grammar/|RDF/XML]]
>> 1. [[http://www.w3.org/TeamSubmission/turtle/|Turtle]]
>> 1. [[http://www.w3.org/TR/rdf-testcases/#ntriples|N-Triples]]
>> 1. [[http://sw.deri.org/2008/07/n-quads/|N-Quads]]
>> Any23 Other documentation
>> 1. [[http://www.slideshare.net/dpalmisano/distilling-the-web-of-data-drop-by-drop-with-java|Any23 presentation on Slideshare]]
>> = Initial Source =
>> The initial source comprises code developed on 
>> [[http://code.google.com/p/any23/|GoogleCode]] and licensed under the 
>> Apache License 2.0 (to be contributed under Grant from Giovanni Tummarello 
>> for Any23).
>> = Source and Intellectual Property Submission Plan =
>> Source code will be moved from 
>> [[http://code.google.com/p/any23/|GoogleCode]] to the SVN space of the 
>> podling.
>> = External Dependencies =
>> The external dependencies used by Any23, with their licenses, are:
>> * [[http://nekohtml.sourceforge.net/|Nekohtml]] (Apache 2.0)
>> * [[http://www.openrdf.org|OpenRDF Sesame]] (BSD-style license)
>> * [[http://jetty.codehaus.org/jetty/|Jetty]] (Apache License 2.0 and Eclipse 
>> Public License 1.0)
>> * [[http://code.google.com/p/jspf/|Java Simple Plugin Framework]] (new BSD 
>> License)
>> * [[http://code.google.com/p/boilerpipe/|Boilerpipe]] (Apache License 2.0)
>> * [[http://www.slf4j.org/|slf4j]] (MIT License)
>> * [[http://www.junit.org/|junit]] (Common Public License - v 1.0)
>> * [[http://mockito.org/|Mockito]] (MIT License)
>> = Cryptography =
>> The project does not handle cryptography in any way.
>> = Required Resources =
>> * Mailing lists
>>  * any23-private (with moderated subscriptions)
>>  * any23-dev
>>  * any23-user
>>  * any23-commits
>> * Subversion directory
>>  * https://svn.apache.org/repos/asf/incubator/any23
>> * Website
>>  * Confluence (ANY23)
>> * Issue Tracking
>>  * JIRA (ANY23)
>> = Initial Committers =
>> Names of initial committers - in alphabetical order - with current ASF 
>> status:
>> * Chris Mattmann <mattmann at apache dot org> (Member)
>> * Davide Palmisano <dpalmisano at gmail dot com> (ICLA signed)
>> * Giovanni Tummarello <giovanni dot tummarello at deri dot org> (ICLA signed)
>> * Lewis John !McGibbney <lewismc at apache dot org> (PMC Member)
>> * Michele Mostarda <michele dot mostarda at gmail dot com> (ICLA signed)
>> * Paul Ramirez <pramirez at apache dot org> (Member)
>> * Reto Bachmann-Gmuer <reto at apache dot org> (Committer)
>> * Szymon Danielczyk <danielczyk.szymon at gmail dot com> (ICLA signed)
>> = Sponsors =
>> == Champion ==
>> * Chris Mattmann <mattmann at apache dot org> (Member)
>> == Nominated Mentors ==
>> * Chris Mattmann <mattmann at apache dot org>
>> * Paul Ramirez <pramirez at apache dot org>
>> * Simone Tripodi <simonetripodi at apache dot org>
>> * Tommaso Teofili <tommaso at apache dot org>
>> == Sponsoring Entity ==
>> * Tika PMC
>> = Other interested people (in alphabetical order) =