Spark Developer - Columbus, Indiana
Skills: Spark, Hive

Profile Summary
• Leads projects for the design, development and maintenance of a data and analytics platform. Effectively and efficiently processes, stores and makes data available to analysts and other consumers.
• Works with key business stakeholders, IT experts and subject-matter experts to plan, design and deliver optimal analytics and data science solutions.
• Works on one or many product teams at a time.

Key Responsibilities
• Designs and automates deployment of our distributed system for ingesting and transforming data from various types of sources (relational, event-based, unstructured).
• Designs and implements a framework to continuously monitor and troubleshoot data quality and data integrity issues.
• Implements data governance processes and methods for managing metadata, access and retention of data for internal and external users.
• Designs and provides guidance on building reliable, efficient, scalable, high-quality data pipelines with monitoring and alert mechanisms that combine a variety of sources using ETL/ELT tools or scripting languages.
• Designs and implements physical data models to define the database structure; optimizes database performance through efficient indexing and table relationships.
• Participates in optimizing, testing and troubleshooting data pipelines.
• Designs, develops and operates large-scale data storage and processing solutions using distributed and cloud-based platforms for storing data (e.g. data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB and others).
• Uses innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks, minimizing manual, error-prone processes and improving productivity. Assists with renovating the data management infrastructure to drive automation in data integration and management.
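As one hypothetical illustration of the data-quality monitoring responsibility above, a pipeline might run per-batch validation rules before loading data downstream. The sketch below is a minimal, platform-independent example; the rule name, field names and threshold are assumptions for illustration only, not details from this posting.

```python
# Minimal sketch of a batch-level data-quality check; the field name
# ("amount") and threshold (0.9) are illustrative assumptions.
def check_completeness(records, field, threshold=0.95):
    """Flag the batch if too many records are missing `field`."""
    present = sum(1 for r in records if r.get(field) is not None)
    ratio = present / len(records) if records else 0.0
    return {"rule": f"completeness:{field}",
            "ratio": ratio,
            "passed": ratio >= threshold}

# Example batch: one of three records is missing "amount",
# so the 0.9 completeness threshold is not met.
batch = [{"id": 1, "amount": 10.0},
         {"id": 2, "amount": None},
         {"id": 3, "amount": 7.5}]
result = check_completeness(batch, "amount", threshold=0.9)
```

In a real deployment, a failing result like this would typically trigger the alerting mechanism the posting mentions rather than silently loading the batch.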
• Ensures the timeliness and success of critical analytics initiatives by using agile development practices such as DevOps, Scrum and Kanban.
• Coaches and develops less experienced team members.

Experience

REQUIRED - Intermediate experience resulting in the following skills and knowledge:
- Familiarity with analyzing complex business systems, industry requirements and/or data regulations
- Background in processing and managing large data sets
- Design and development for a Big Data platform using open-source and third-party tools
- Spark, Scala/Java, MapReduce, Hive, HBase and Kafka, or equivalent college coursework
- SQL query language
- Clustered, cloud-based compute implementation experience
- Experience developing applications requiring large file movement in a cloud-based environment, plus other data extraction tools and methods for a variety of sources
- Experience building analytical solutions

PREFERRED - Intermediate experience resulting in the following skills and knowledge:
- Experience with IoT technology
- Experience in Agile software development

Regards,
ANKIT MENDIRATTA
Lead Recruiter
Net2Source Inc.
Global HQ Address - 7250 Dallas Pkwy, Suite 825, Plano, Texas 75024
Office: (201) 340-8700 x 459 | Fax: (201) 221-8131 | Email: anki...@net2source.com