Cornell University seeks an experienced applications programmer for arXiv to 
serve as Service Reliability Engineer for its next-generation system. arXiv is 
the premier open access platform serving scientists in physics, mathematics, 
computer science, and other disciplines. For over 25 years, arXiv has enabled 
scientists to share papers within scientific communities and to publish 
"pre-prints," scientific papers shared before they are formally published. 
Around the world, arXiv is recognized as an essential resource for 
the scientists it serves.

The arXiv team is adopting practices that improve the reliability of 
deployment, the ability to respond to increases in traffic, and the ability to 
observe, monitor, and troubleshoot the system. The primary activity involved in 
these changes is designing, coding, testing, and debugging highly complex 
programs that control the infrastructure and configurations that form the 
backbone of the arXiv platform. This work also has strong ties to site 
security, including detecting malicious activity in the system, and ensuring 
compliance with security and data protection best practices.

This programmer will be a full-time member of the arXiv IT team, reporting to 
the IT Team Lead, and will collaborate with team members on the design and 
implementation of programs, configurations, and workflows to test, deploy, 
monitor, and scale the arXiv software system.

Responsibilities include:

- Design, code, test, debug, document, and maintain highly complex systems for 
  deploying, monitoring, and scaling software that supports the arXiv platform.
- Design and validate test routines and schedules for deployment, monitoring, 
  and scaling processes.
- Analyze and support end-user needs and experiences related to service 
  reliability.
- Analyze system errors, including security faults and missed performance 
  goals. Plan and implement solutions to address these system faults.
- Identify risks and threats related to the availability and security of 
  software components and the arXiv system as a whole, and design and 
  implement strategies to mitigate those risks.
- Collaborate with members of the IT team to research, identify, plan, and lead 
  the adoption of practices and technologies that increase service reliability 
  and reduce the complexity of operating the arXiv system.

Required Qualifications

- A Bachelor’s degree or equivalent experience
- 5-7 years of relevant experience (developing, deploying, and monitoring web 
  applications)
- Demonstrated aptitude for collaboration and open communication
- Experience developing and deploying production services based on open source 
  technologies and tools
- Demonstrated aptitude for quickly learning new tools and technologies
- Proficiency with managing AWS services to support production systems
- Experience with DevOps practices and tools, such as Ansible, Puppet, and 
  Terraform
- Experience running Docker containers in a production environment
- Experience operating Kubernetes in a production environment
- Knowledge of security best practices for distributed online systems

Preferred Qualifications

- Knowledge of or experience with the ELK (Elasticsearch, Logstash, Kibana) 
  stack and related technology
- Knowledge of or experience with Helm
- Experience developing and deploying web applications using Python web 
  frameworks such as Flask, Django, or Pyramid


----
Brought to you by code4lib jobs: 
https://jobs.code4lib.org/jobs/33564-arxiv-service-reliability-security-developer