rzo1 opened a new issue, #1542:
URL: https://github.com/apache/incubator-stormcrawler/issues/1542
# Description
Many new users report that Apache StormCrawler (SC) is difficult to set up
and run for the first time. To improve accessibility and lower the entry
barrier, the documentation should include a beginner-friendly tutorial that
walks through the basic setup and execution of a simple crawler topology.
# Proposed Solution
Add a new section or page in the documentation that includes:
1. Quickstart Tutorial
A step-by-step guide that covers:
- Setting up SC using Docker (or Docker Compose)
- Setting up and configuring a basic topology
- Submitting and running the topology on a local cluster (and in the Docker
Compose environment)
- Verifying that the crawler is working (e.g., viewing fetched URLs/logs)
2. Follow-up Topics
Provide links or notes on how users can:
- Extend the setup with custom configurations
- Use SC with Playwright or other browser automation tools
- Handle politeness, filters, and parsing rules
- Integrate storage (e.g., OpenSearch)
- Build custom bolts
# Motivation
This will help new users get started quickly and understand the power of SC
without having to dig through fragmented examples or tackle advanced configuration too
early. Better documentation will make SC more approachable and could help grow
the community by reducing the initial learning curve.
# Additional Context
I’ve often heard that while SC is powerful, it's perceived as too
complicated to get up and running. A quickstart tutorial could go a long way
toward solving that issue.