Operations Reliability Engineer_San Ramon, CA_6 months(OPT, GC or USC)

Mayank Dixit Tue, 08 Dec 2015 11:20:07 -0800

*Hi,*

*Kindly let me know if you are interested in the position below.*




*Position: Operations Reliability Engineer*

*Location: San Ramon, CA*

*OPT, GC or USC*








*MUST HAVEs:*

   1. UNIX systems
   2. Automation with scripting
   3. Troubleshooting across the network, storage, application and compute
   layers
   4. Hands-on experience
   5. Willingness to eventually work 24x7 (nights, evenings, weekends, etc.)

*Project:* 24x7 Predix cloud operations. Maintain and provide services on
the Predix cloud.



*(Mid-Level) Operations Reliability Engineer*
The Operations Reliability Engineering (ORE) team at GE Software in San Ramon
reports to the Manager of Cloud Command Center (C3), Industrial Cloud
Services team. The team is responsible for operational delivery and support of
the production cloud infrastructure for Industrial Internet services and
applications, focusing on proactive monitoring, rapid response, and Tier 2
support for our IaaS, PaaS and Application services. The purpose of the ORE
team is to identify and address the factors that directly and indirectly
affect the reliability of service offerings for GE’s Predix Cloud Service.

*Job Function/Responsibilities*
As an Operations Reliability Engineer, you will be a key member of the ORE
team, working to improve the reliability and performance of our
operations. The ORE works as a first responder and is ultimately
responsible for ensuring our cloud infrastructure services are up and
running. You will work shoulder-to-shoulder with our engineering teams to
deliver, build and operate the next generation of IaaS, PaaS and SaaS Cloud
infrastructure and services, focusing on automation, availability and
performance. You will diagnose and resolve latent and systemic reliability
issues across the entire stack: hardware, software, services, database,
application and network. You will also drive standardization efforts across
multiple disciplines and services.

   - Willing to roll up your sleeves and debug/tune/code/fix
   - Strong background and experience in scripting and automation
   - Represent the ORE organization in design reviews and operational
   readiness exercises for new and existing services with other teams.
   - Work with internal operational teams to drive the availability, latency,
   scalability and efficiency of services and applications by instilling
   reliability into the operational life cycle, with a focus on fault-tolerant
   approaches
   - Ensure the IaaS, PaaS & SaaS Cloud infrastructure and services
   platform meets or exceeds organization goals for availability, capacity,
   efficiency, scalability, and performance by engineering reliability into
   software and systems
   - Perform proactive daily system monitoring including reviewing system
   and application logs as well as responding to, triaging, troubleshooting
   and remediating incidents
   - Repair and recover from hardware or software failures. Coordinate and
   communicate with impacted stakeholders and clients, escalating where
   appropriate
   - Work closely with Infrastructure services, software support, security,
   development and engineering teams, helping to build, maintain and extend the
   IaaS, PaaS & SaaS “live” services. Contribute to new and ongoing technology
   projects in performance, high availability and scalability, including
   partitioning, sharding, dynamic provisioning and de-provisioning of systems
   for current load, etc.
   - Review the entire environment and execute initiatives to reduce failures
   and defects and improve overall performance.
   - Design, develop and execute automated tests to validate solutions and
   environments.
   - Monitor and troubleshoot issues across the entire stack - hardware,
   software, application and network.
   - Participate in performance analysis and tuning, service capacity
   planning and demand forecasting
   - Document current and future configuration processes and policies.
   - Assist with the implementation and development of SRE tools and
   applications
   - Manage and support SRE tools and applications
   - *Participate in a 24x7 rotation for production issue escalations.*

*Qualifications & Requirements*

   - BS or MS degree in Computer Science, or a related field
   - 3 - 5 years of experience administering Linux systems and
   infrastructure in a SaaS/Cloud production environment – AWS/Private Cloud
   - Good understanding of service orientation methodology
   - Strong working knowledge of networking, packet tracing, understanding
   latency and throughput.
   - Strong working knowledge of Linux operating systems, their underlying
   components, system statistics, performance tuning, filesystems and I/O.
   - Solid understanding of systems and application design, including the
   operational trade-offs of various designs
   - Practical knowledge of various aspects of service design, including
   messaging protocols & behavior, caching strategies and software service
   design practices
   - Specialist in at least 2-3 of the following: Pivotal Cloud Foundry,
   OpenStack, Hadoop, Pivotal HD, HAWQ, MSQL, RabbitMQ, Redis, Jenkins, IaaS
   [Compute – Linux, Storage, Network – SDN – Juniper Contrail, Palo Alto
   Networks FW, F5 load balancers]
   - Experience administering in customer-facing, high-availability, large
   scale environments.
   - Experience in one or more of the following languages: Shell, Python,
   PHP or Perl
   - Must have an understanding of building and managing large-scale
   systems and application architectures
   - Prior experience with configuration and maintenance of common
   applications such as Apache, MySQL, DHCP, SSH, DNS, etc.
   - Proficient in one or more of the following monitoring and logging
   tools: New Relic, AppDynamics, Neustar, Gomez, Nimsoft, Zabbix, Nagios,
   Ganglia, Cacti, Splunk, Logstash, Graphite.
   - Working knowledge of Linux, TCP/IP, and web services
   - Prior experience with one or more of the following tools: Chef,
   Puppet, BOSH
   - Experience working in Agile environments
   - Solid verbal and written communication skills

*Desired*

   - You obsess about uptime, availability, and reliability to deliver a
   world-class service
   - You are proactive about everything. You believe in continual
   improvement, and are the kind of person who treats failures as big
   learning opportunities
   - Your peers describe you as a responsible, conscientious, dependable
   colleague and collaborator
   - A team player, fast learner, with a focus on getting work done
   - You look at the big picture; think long term, while still delivering
   on short term
   - You are constantly looking at ways to automate, operationalize, and
   standardize day-to-day work
   - You are excited by the opportunity of being part of a team of
   engineers that are responsible for making sure GE’s Industrial Cloud
   Infrastructure





Regards

*Mayank*

978-558-4666 x 103

*may...@teknavigators.com* <may...@teknavigators.com>
