Hi all,

The crawler issue has been identified and fixed.

The root cause was that the crawler fails when the latest result
contains fewer than 90% of the records in the previous result. Increasing
the `maxLostRecordsPercentage` threshold resolves the issue.

https://www.algolia.com/doc/tools/crawler/apis/configuration/safety-checks
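
For anyone who wants to reason about the threshold, here is a rough sketch
of the check as I understand it. This is not Algolia's code: the function
names are made up for illustration, and the default of 10 is an assumption
based on the 90% figure above.

```python
def lost_records_percentage(previous_count: int, latest_count: int) -> float:
    """Percentage of records lost between two crawls (0 if none were lost)."""
    if previous_count <= 0:
        # Nothing to compare against, so nothing can have been lost.
        return 0.0
    lost = max(previous_count - latest_count, 0)
    return 100.0 * lost / previous_count


def passes_safety_check(
    previous_count: int,
    latest_count: int,
    max_lost_records_percentage: float = 10.0,  # assumed default
) -> bool:
    """True if the crawl lost no more than the allowed share of records."""
    lost = lost_records_percentage(previous_count, latest_count)
    return lost <= max_lost_records_percentage


# Hypothetical numbers: a crawl that drops from 1000 to 850 records loses
# 15%, so it fails at a threshold of 10 and passes at a threshold of 20.
print(passes_safety_check(1000, 850))
print(passes_safety_check(1000, 850, max_lost_records_percentage=20.0))
```

Under these assumptions, a legitimate drop in record count (for example,
after a docs restructuring) would keep failing until the threshold is
raised above the observed loss.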

On Wed, Dec 17, 2025 at 10:03 PM Xiao Li <[email protected]> wrote:

> Thanks for reporting it! Will take a look
>
> Nicholas Chammas <[email protected]> wrote on Fri, Dec 5, 2025 at 04:19:
>
>> Bueller?
>>
>> Is anyone on this list able to fix the crawler?
>>
>>
>> On Dec 1, 2025, at 12:19 PM, Nicholas Chammas <[email protected]>
>> wrote:
>>
>> Hello,
>>
>> This seems to be happening again.
>>
>> Perhaps we should add a new test (but where, I wonder?) to ensure that
>> Algolia search doesn’t break without us knowing.
>>
>> Nick
>>
>>
>> On Dec 11, 2023, at 5:02 AM, Gengliang Wang <[email protected]> wrote:
>>
>> Hi Nick,
>>
>> Thank you for reporting the issue with our web crawler.
>>
>> I've found that the issue was due to a change (specifically, pull request
>> #40269 <https://github.com/apache/spark/pull/40269>) in the website's
>> HTML structure, where the JavaScript selector ".container-wrapper" is now
>> ".container". I've updated the crawler accordingly, and it's working
>> properly now.
>>
>> Gengliang
>>
>> On Sun, Dec 10, 2023 at 8:15 AM Nicholas Chammas <
>> [email protected]> wrote:
>>
>>> Pinging Gengliang and Xiao about this, per these docs
>>> <https://github.com/apache/spark-website/blob/0ceaaaf528ec1d0201e1eab1288f37cce607268b/release-process.md#update-the-configuration-of-algolia-crawler>
>>> .
>>>
>>> It looks like to fix this problem you need access to the Algolia Crawler
>>> Admin Console.
>>>
>>>
>>> On Dec 5, 2023, at 11:28 AM, Nicholas Chammas <
>>> [email protected]> wrote:
>>>
>>> Should I report this instead on Jira? Apologies if the dev list is not
>>> the right place.
>>>
>>> Search on the website appears to be broken. For example, here is a
>>> search for “analyze”:
>>>
>>> <Image 12-5-23 at 11.26 AM.jpeg>
>>>
>>> And here is the same search using DDG
>>> <https://duckduckgo.com/?q=site:https://spark.apache.org/docs/latest/+analyze&t=osx&ia=web>
>>> .
>>>
>>> Nick
>>>
>>>
>>>
>>
>>
