Begin forwarded message:

===
CERN Fellowships: text mining scientific documents; author
disambiguation in INSPIRE

The CERN Scientific Information Service is looking for two enthusiastic
and motivated developers with experience in text-mining or digital
libraries, to join a dynamic international collaboration which is
building, enhancing and operating the INSPIRE information service, a
digital library which is a key working tool used by 50’000 scientists
worldwide in their cutting-edge research in High-Energy Physics. We have
two fellowships: the first for the text mining of scientific documents,
the second for author disambiguation and management.

What you will do (text mining fellowship):

-       Develop and expand our current text mining of documents to
extract all possible metadata: authors, affiliations, references and
additional scientific content (figures, tables and more). Build
infrastructure to mine documents in real time as scientists share them,
leveraging user feedback, as well as to bulk-mine large collections of
scanned/OCR’ed historical material.
-       Integrate, harmonize and expand all steps in the treatment of
documents upon ingestion in INSPIRE from multiple sources, from
extracting metadata to grabbing figures, from detecting similarities to
spotting duplication.
-       Explore opportunities in the extraction of the contextual
information provided by the location of references, figures and tables
in scientific texts.

What you will do (author disambiguation and management fellowship):

-       Expand and develop our author disambiguation and
profile-claiming production infrastructure, with the aim of
automatically associating every new document with the correct
author profile.
-       Extend our algorithmic and crowd-sourced author-article tools to
provide assertions about the academic affiliation of scientists.
-       Ensure seamless interoperability and bulk-data exchange with
other relevant partners such as NASA-ADS, arXiv.org, ORCID and leading
publishers in Physics.

Other things you will do (for both fellowships):

-       According to your inclination and abilities, help out on other
projects, such as crowdsourcing aspects of digital library curation,
integrating our services with other data sources via linked open data,
UI/UX design, production operations and usage-data mining.
-       We require limited participation in stand-by duty for hot-fixes
in the operation of the INSPIRE web service on evenings, weekends and
public holidays.

Your profile:

-       You are a citizen of one of the CERN Member States: Austria,
Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France,
Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal,
Slovakia, Spain, Sweden, Switzerland or the United Kingdom. Citizens
of Romania can now also apply.
-       You hold a BSc, MSc or PhD in Computer Science and have less
than 10 years of professional experience after your highest diploma.
-       You understand how scientists communicate and have either a
proven track record in handling or mining technical or academic
documents, or experience in author disambiguation in a large-scale
digital library.
-       You have solid experience developing on a LAMP (Linux,
Apache, MySQL, Python) stack, preferably in open-source projects, using
git or a similar DVCS, and ideally in a production environment.
-       Familiarity with issues and standards in information systems is
an asset: XML, XSLT, RSS, OAI-PMH.

Who we are:

CERN is the world's leading laboratory in High-Energy Physics, home to the
record-smashing LHC accelerator. Together with partners at
SLAC/Stanford, Fermilab and DESY/Hamburg, the CERN Scientific
Information Service and IT teams are building INSPIRE: a digital library
serving 1 million records to 50’000 scientists in the field worldwide,
which is in beta at http://inspirebeta.net. We collaborate closely with
sister infrastructures arXiv at Cornell and the NASA/ADS at Harvard, as
well as leading publishers in the field. We are founding members of the
ORCID initiative and stalwarts of Open Access through myriad projects
and initiatives.

What we offer:

-       Contract duration: one year, which may be extended for a
second year, conditional on performance. Further extension up to a
maximum of three years can be granted under some circumstances.
-       Financial conditions: Fellows' stipends are competitive and
calculated individually according to age and qualifications, in the
range 55’000-85’000 CHF per annum, net. Fellows are entitled to
additional family and child allowances. International civil servants in
the area are allowed to purchase discounted tax-free vehicles.
-       Leave: Fellows are entitled to 2.5 days paid leave per month,
plus two weeks at Christmas and a few other local holidays.
-       Insurance: Fellows are covered by CERN’s comprehensive health
scheme for themselves and their dependents.
-       Travel expenses: Fellows are entitled to travel expenses for
themselves and their family and may be entitled to an installation
grant. We also offer help with finding suitable accommodation.

How to apply:

Create an account and submit a complete electronic application form at
http://bit.ly/oDhSRq, containing your Curriculum Vitae, a photocopy of
your last (highest) qualification, a short (half-page) description of
your motivation for coming to CERN and working with INSPIRE, and the names
of three referees who will provide us with letters of recommendation. It
is your responsibility to arrange for these letters. Please indicate
“INSPIRE” in the field “Miscellaneous information: Please give details
of the work you are interested in doing at CERN”. NOTE that we will not
be able to process your application otherwise.

In parallel, you must also send a copy of your CV to
[email protected]

Deadline:

Irrespective of the deadlines indicated on the application web page, the
application and ALL supporting documents should reach us BEFORE August
10th, 2011. Retained candidates will be interviewed remotely in the
second half of August. The two successful candidates will start on
October 1st.

Background:

Built on the CERN Open Source Invenio digital library software, and
hosting 1 million records hand-curated over 40 years by partners at
SLAC/Stanford, Fermilab and DESY/Hamburg, INSPIRE serves these records
to 50’000 High-Energy Physics researchers worldwide. INSPIRE, in
beta at http://inspirebeta.net, provides fast metadata and full-text
search, author disambiguation and citation analysis, and, through a
community-centric approach, is expanding its content and services
beyond journal publications to other scientific content. We anticipate users
will soon be submitting scientific documents, and large scale recovery
of historical OCR’ed material will take place, with hundreds of
thousands of documents from 3 to 300 pages long, which will have to be
mined for automatic generation of metadata. Further, we will explore and
expand initiatives for figure and table extraction from the text, as
well as contextual information on references.

Further information about the CERN fellowship program is available
at https://ert.cern.ch/browse_www/wd_pds?p_web_site_id=1&p_web_page_id=5834

Further information about the position can be obtained by writing to
[email protected]


-- 
Samuele Kaplun
Invenio Developer ** <http://invenio-software.org/>
