The Apache StormCrawler team is pleased to announce the release of version 3.4.0 of Apache StormCrawler. StormCrawler is a collection of resources for building low-latency, customisable and scalable web crawlers on Apache Storm.

Apache StormCrawler 3.4.0 source distributions are available for download from our download page:
https://stormcrawler.apache.org/download/index.html

Apache StormCrawler is also available through Maven Central.

Notable changes in this version:

- LLM-based Text Extraction: A new text extractor component leveraging OpenAI-compatible APIs enables advanced content processing using large language models.
- Improved Apache Solr Integration: Now with asynchronous queries and update batching, offering better indexing performance.

For a complete list of fixed bugs and improvements, please see the release notes on GitHub.

The Apache StormCrawler Team
