GitHub user SamSynnada closed a discussion: Enhancing DataFusion's Community Engagement and Visibility
### Who are we?

I'm Sami, co-founder of Synnada, and I'm working alongside my colleague Kuter to support the DataFusion community. We are ready to dedicate time and energy to increasing awareness of DataFusion and helping the project expand its audience. We believe our team can create high-quality semi-technical content that makes DataFusion more accessible to a broader audience. We can repurpose existing technical information into more digestible formats, conduct user interviews, and manage social media to engage the community effectively.

### **Objectives**

1. Making sure that DF is the go-to choice for data system builders, recognized as the "LLVM of data systems" that provides a robust, flexible, and efficient foundation.
2. Significantly lowering the barrier to entry for data system builders, improving ease of use and reducing time-to-first-prototype, thus accelerating adoption and innovation in the data systems space.
3. Transforming DataFusion's brand perception into a mark of quality and reliability, so that "Built on DataFusion" becomes synonymous with robust, high-performance data systems.
4. Expanding DataFusion's reach and recognition beyond system builders to the broader audience of data-intensive application developers, positioning it as an essential tool in their toolkit.

## **Proposed actions**

We propose the following short-term actions for community management. We can take the lead on these.

- Collecting & presenting DF-related content on [apache.datafusion.org](http://apache.datafusion.org/)
  - Content on DataFusion is all over the place. We need a centralized repository for all relevant content, at least a simple web page linking to other sites. Some content should be linked (e.g. content on your personal site, our website, etc.), some content should be migrated (e.g. release notes on the Arrow website), while some content could be re-posted.
  - [As of October 21, 2024](https://docs.google.com/spreadsheets/d/1c2QXGhpcYjbXY6hyWlF00IqV347i_ZinyOgmnTe_Dl8), we identified 71 core content items, of which 23 are listed on the DataFusion website.
- Repurposing and distributing existing core content
  - Once we identify core content, we can repurpose it for other channels.
    - DF Paper → Turn it into a series of blog posts explaining the **inner workings of DF**.
    - Meetup presentations → Turn into show & tell / use case content.
    - The Apache Arrow DataFusion Architecture series by Andrew Lamb → Turn slides into a series of blog posts.
- Initiating **Show-and-Tell** sessions to grow core content
  - **What?** We may start with **Show-and-Tell** blog posts. These could be published on [apache.datafusion.org](http://apache.datafusion.org/) (and the co-author's website, if applicable). Authors can present the blog post content at meetups (digital or physical), and that content can be distributed on YouTube. Our main objective will be to keep a comprehensive and accurate list of active DataFusion users and showcase how they are using DF in their projects.
  - **How?** We can create an interview template, start interviewing people, turn each transcript into a blog post, publish it together with the author (on DataFusion’s website and the author’s preferred medium), promote and distribute it, and reuse the content in meetup presentations. This could be done in reverse too: turn meetup presentations into show-and-tells.
  - **Draft Question Set**
    1. **Could you please introduce yourself and your organization?**
    2. **How did you first discover Apache DataFusion?** *What motivated you to give it a chance over other alternatives? Why did you choose DataFusion, and what factors influenced your decision?*
    3. **Can you describe your learning process with DataFusion?** *Include any resources or strategies that were particularly helpful. Did you face any challenges during the learning or implementation phase? If so, how did you overcome them?*
    4. **What challenges or problems were you facing before using DataFusion?** *What tools or solutions were you using at that time? What limitations did you encounter with those solutions?*
    5. **Please explain your specific use case for Apache DataFusion.** *Detail how you utilize it in your project or workflow. How did DataFusion solve your problem or improve your workflow? What benefits or improvements have you observed since implementing it?*
    6. **Do you have any performance metrics or results that demonstrate the impact of using DataFusion?** *If available, could you share any performance metrics, screenshots, graphs, or diagrams that illustrate your use case or results? Did you discover any unexpected benefits or features in DataFusion that were particularly helpful? Can you comment on the return on investment (ROI) since implementing DataFusion, in terms of time saved, cost reduction, or other efficiencies?*
    7. **What key insights or lessons have you learned from using DataFusion?** *What advice would you give to others considering using it? What are the key takeaways from your experience with DataFusion that you believe would be valuable for the community? How satisfied are you with DataFusion overall, and would you recommend it to others? Why or why not?*
    8. **Are there any features or improvements you would like to see in future versions of DataFusion, and what are your future plans with it?**

    **Additional Thoughts (Optional)**
    1. Would you like to share any additional thoughts or experiences regarding DataFusion?
    2. Please provide any relevant links (e.g., project repositories, blog posts) or contact information if you'd like to be contacted for further discussion.
    3. How was your experience interacting with the Apache DataFusion community or support channels?
    4. Have you contributed back to the DataFusion project (e.g., bug reports, feature requests, code contributions)? If so, could you describe your contributions?
- Other content ideas that could be quick wins
  - Write a regular “What’s New” blog/newsletter covering updates and changes to DataFusion.
  - Coordinate community calls, and make sure calls are recorded and shared with the rest of the community.
- Active Twitter/X management
  - Our objective for social media management should be to increase DataFusion's visibility, engage the community, and foster growth by consistently sharing valuable content and updates. We are volunteering to manage the Twitter account for the project, adhering to the following general guidelines:
    - Who to follow?
      - ASF Official
      - Other relevant Apache projects (Arrow, sub-project of DF)
      - PMC Members: Consider following either all PMC members or just those who actively engage on Twitter.
      - Key Project Users: Identify and follow notable users of DF.
    - What to share?
      - **Tone of voice**: We can adopt the tone of voice of other Apache projects that have demonstrated successful community management, such as Cassandra, Superset, and Airflow.
      - **Regular Updates / Release Notes**: Use a consistent format for each post, making it easier for the audience to recognize and engage with the content.
      - **Event Announcements**:
        - Announce events at least one month in advance to give sufficient notice.
        - Post weekly reminders leading up to the event, each with a clear call to action (CTA) to boost engagement.
        - During the event, post live updates to maintain momentum and interaction with the community.

### Call for actions for the community

- **Quarterly roadmap**
  - **What?** Create a comprehensive roadmap blog post detailing upcoming features and improvements.
- **Benchmarks & comparisons**
  - **What?** Create a more comprehensive benchmark methodology.
- Please comment on this document. Any other suggestions? Would anyone like to contribute?

GitHub link: https://github.com/apache/datafusion/discussions/13049
