[ https://issues.apache.org/jira/browse/CAMEL-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156745#comment-17156745 ]

ASF GitHub Bot commented on CAMEL-14952:
----------------------------------------

aashnajena edited a comment on pull request #423:
URL: https://github.com/apache/camel-website/pull/423#issuecomment-657582171


   @zregvart @Delawen @AemieJ I've made some changes and pushed a new index.
   This iteration fixes many of the previous issues and has the following
   features:
   
   - Pages are grouped by version. Versions are displayed along with the
     top-level heading and should now appear for all documentation pages.
   - Each search result is grouped under User Manual [version], Camel
     Components [version], Blog, Books, Downloads, Community, or a
     sub-project [version].
   - No duplicate results are shown. Try searching for "Components" or
     "Kafka", or for an intermediate hierarchy level like "Architecture";
     you should no longer see the duplicate results that appeared earlier.
   - Each result contains a snippet of text (abstract) with the relevant
     words highlighted. Earlier, the abstract was missing in many cases.
   - Intermediate hierarchy levels are visible in the header/title of each
     search result. As far as I have checked, no levels are missing; earlier
     I had observed some inconsistencies with pages like "Properties".
   
   Known problems:
   - For Blog, the < > arrows show up in the abstract. I have removed them
     in the final config file; however, recreating the whole index takes too
     long, so please overlook this detail.
   - The "Building" page appears separately because it is not included in
     the user manual. We need to either move it or decide how to display it.
   
   Please review the changes this week, because I'm on a free Algolia trial
   again: our community account does not allow me to push more than 10k
   records for the test app.
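The grouping and de-duplication described in the comment could be sketched on the client side roughly as follows. This is a minimal TypeScript sketch, not taken from the pull request (which may achieve the same through the DocSearch crawler configuration); the `Hit` fields (`url`, `category`, `version`, `hierarchy`) and the `groupAndDedupe` helper are hypothetical.

```typescript
// Hypothetical hit shape: these field names are illustrative only and are
// not the real schema of the Camel website index.
interface Hit {
  url: string;         // page URL, possibly with a #anchor
  category: string;    // e.g. "User Manual", "Camel Components", "Blog"
  version?: string;    // documentation version; absent for Blog, Books, ...
  hierarchy: string[]; // headings from the top-level title down to the match
}

// Group label such as "User Manual [latest]", or just "Blog" if unversioned.
function groupLabel(hit: Hit): string {
  return hit.version ? `${hit.category} [${hit.version}]` : hit.category;
}

// Result title with every intermediate hierarchy level visible.
function displayTitle(hit: Hit): string {
  return hit.hierarchy.join(" > ");
}

interface GroupedHits {
  labels: string[];              // group labels in order of first appearance
  groups: Record<string, Hit[]>; // hits per label, in ranking order
}

// Keeps only the first (highest-ranked) hit for each URL + hierarchy path,
// then buckets the remaining hits by group label.
function groupAndDedupe(hits: Hit[]): GroupedHits {
  const seen: Record<string, boolean> = Object.create(null);
  const groups: Record<string, Hit[]> = Object.create(null);
  const labels: string[] = [];
  for (const hit of hits) {
    const key = hit.url + "|" + displayTitle(hit);
    if (seen[key]) continue; // duplicate of an already listed result
    seen[key] = true;
    const label = groupLabel(hit);
    let bucket = groups[label];
    if (!bucket) {
      bucket = [];
      groups[label] = bucket;
      labels.push(label);
    }
    bucket.push(hit);
  }
  return { labels, groups };
}
```

Keying on URL plus heading path, rather than on URL alone, keeps distinct sections of the same page while still dropping repeated entries for one section.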


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Better search on the website
> ----------------------------
>
>                 Key: CAMEL-14952
>                 URL: https://issues.apache.org/jira/browse/CAMEL-14952
>             Project: Camel
>          Issue Type: Improvement
>          Components: website
>            Reporter: Zoran Regvart
>            Priority: Major
>         Attachments: 
> BH4D9OD16A_apache_camel_20200608-20200614_no_result_searches.csv,
> List_Of_Crawled_Pages_by_DocSearch.txt,
> Screenshot_2020-07-08 Home - Apache Camel.png, apache_demo.json, camel.json,
> error-search.png, getbootstrap-searchresult.png,
> image-2020-06-13-14-39-08-776.png, list_of_crawled_pages.txt,
> pr-preview-search.png, properties-new-config.png,
> search-result-hazelcast.png, search-results-example.png, sitemap-camel.png
>
>
> We use [Algolia|http://algolia.com/] for the search functionality on the 
> website using their [free plan|https://www.algolia.com/for-open-source/] for 
> Open Source projects. The index is built by Algolia's crawler using the 
> [DocSearch|https://docsearch.algolia.com/].
> When this was done, we built our own UI on top of the Algolia JavaScript
> API, as one of the requirements is that clients use Algolia's JavaScript
> clients. We did not use the Algolia UI because at that point it was a rather
> large JavaScript dependency to add and would slow down loading of the website.
> We also have some [initial 
> work|https://github.com/apache/camel-website/pull/74] on creating our own 
> Algolia index at build time.
> The current search doesn't seem to index the whole website; some pages
> don't appear in the search results. It looks like most of the content from
> Antora is not indexed: when searching for {{removeHeader}}, the
> [FAQ entry|https://camel.apache.org/manual/latest/faq/how-to-avoid-sending-some-or-all-message-headers.html]
> is not found. There's also a list of failed searches on the Algolia
> dashboard that we can use to benchmark the search.
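As an illustration of using the failed searches as a benchmark, the sketch below measures how many previously failing queries return at least one hit against a candidate index. It assumes a hypothetical in-memory `naiveSearch` as a stand-in for a real Algolia query (no ranking, no typo tolerance); against the real index one would substitute a call made through the Algolia JavaScript client.

```typescript
// Hypothetical record: only the text that would be searched.
interface IndexedPage {
  url: string;
  content: string;
}

// Naive stand-in for a real Algolia query: a page matches when it contains
// every whitespace-separated query term, ignoring case.
function naiveSearch(pages: IndexedPage[], query: string): IndexedPage[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return [];
  return pages.filter((page) => {
    const text = page.content.toLowerCase();
    return terms.every((t) => text.indexOf(t) !== -1);
  });
}

// Share of queries (e.g. copied from the dashboard's list of searches with
// no results) that return at least one hit against a candidate index.
function recoveredShare(
  queries: string[],
  search: (query: string) => IndexedPage[],
): number {
  if (queries.length === 0) return 0;
  let recovered = 0;
  for (const q of queries) {
    if (search(q).length > 0) recovered++;
  }
  return recovered / queries.length;
}

// Usage with placeholder content (not the real text of the FAQ page).
const pages: IndexedPage[] = [
  {
    url: "/manual/latest/faq/how-to-avoid-sending-some-or-all-message-headers.html",
    content: "Placeholder text that mentions removeHeader.",
  },
];
const share = recoveredShare(["removeHeader", "no-such-term"], (q) => naiveSearch(pages, q)); // 0.5
```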
> What we need is to build the search index over the whole content. The
> approach taken in [#74|https://github.com/apache/camel-website/pull/74] is a
> good start for Hugo-generated content; we need to expand it to Antora-built
> content as well.
> This search index would be built at website build time and would include
> both Hugo and Antora content in the same file, or possibly in several files
> if we use multi-index search. More on how indexes are built can be seen in
> the [Algolia documentation|https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/].
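A minimal sketch of splitting one page into several small records at build time, one per heading section, assuming the page has already been parsed into a hypothetical `Section` list. The record fields loosely follow the hierarchy idea used by DocSearch, but the names here are assumptions rather than a required Algolia schema, apart from `objectID`, which Algolia uses as the record identifier.

```typescript
// Hypothetical input: one page already parsed into headings and text, in
// document order. Producing this from Hugo or Antora output is not shown.
interface Section {
  level: number;   // 1 for the page title, 2 for an h2, and so on
  heading: string;
  anchor: string;  // id that links directly to the section
  text: string;    // plain text of the section body
}

// One record per section; field names other than objectID are assumptions.
interface SearchRecord {
  objectID: string;    // unique record id (Algolia's objectID attribute)
  url: string;
  hierarchy: string[]; // heading path, e.g. ["Kafka", "Options"]
  content: string;
}

function toRecords(pageUrl: string, sections: Section[]): SearchRecord[] {
  const path: string[] = []; // current heading path while walking the page
  const records: SearchRecord[] = [];
  for (const s of sections) {
    // Leaving a section: drop headings at the same or a deeper level.
    while (path.length >= s.level) path.pop();
    path.push(s.heading);
    records.push({
      objectID: pageUrl + "#" + s.anchor,
      url: pageUrl + "#" + s.anchor,
      hierarchy: path.slice(), // copy, since path keeps changing
      content: s.text,
    });
  }
  return records;
}
```

Smaller per-section records let a hit link straight to the matching anchor, and they also show why the record count (and the 10k limit mentioned in the comment) grows quickly once every versioned page is indexed.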
> We need to figure out what data to send and how to integrate this with
> Antora. For Hugo we have a good idea from
> [#74|https://github.com/apache/camel-website/pull/74]; importantly, the
> structure needs to be the same for both. One good source of inspiration for
> building the index for Antora content is the
> [Lunr.js integration|https://github.com/Mogztter/antora-lunr].
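Since the structure needs to be the same for Hugo and Antora content, one option is to map both sources into a single record type. The `HugoPage` and `AntoraPage` inputs below are hypothetical shapes that a Hugo output template and an Antora extension would each have to produce; they are not existing Hugo or Antora APIs.

```typescript
// Hypothetical per-page output of a Hugo template (not a Hugo API).
interface HugoPage {
  section: string;   // e.g. "Blog", "Community", "Downloads"
  title: string;
  permalink: string;
  plainText: string;
}

// Hypothetical per-page output of an Antora extension (not an Antora API).
interface AntoraPage {
  componentTitle: string; // e.g. "User Manual", "Camel Components"
  version: string;        // e.g. "latest"
  title: string;
  url: string;
  plainText: string;
}

// The single record shape both sources map to, so the index (and the UI
// reading it) does not need to know where a page came from.
interface UnifiedRecord {
  objectID: string;
  category: string;
  version: string | null; // null for unversioned Hugo content
  url: string;
  hierarchy: string[];
  content: string;
}

function fromHugo(page: HugoPage): UnifiedRecord {
  return {
    objectID: page.permalink,
    category: page.section,
    version: null,
    url: page.permalink,
    hierarchy: [page.section, page.title],
    content: page.plainText,
  };
}

function fromAntora(page: AntoraPage): UnifiedRecord {
  return {
    objectID: page.url,
    category: page.componentTitle,
    version: page.version,
    url: page.url,
    hierarchy: [page.componentTitle, page.title],
    content: page.plainText,
  };
}
```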
> We need to build the index with the search UI in mind, i.e. the index needs
> to contain the data we wish to present in the UI as well as enough content
> for Algolia to perform the search. So a good first step would be a mockup of
> what we wish to present/utilize in the search UI, from which we can derive
> the data structure for the index.
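Working backwards from a results mockup, the index record only needs the fields the UI displays plus the text Algolia searches. The sketch below assumes a hypothetical `ResultRecord` and derives a `ResultView` (group heading, hierarchy title, link and highlighted abstract) from it. `makeAbstract` is an illustrative client-side helper; Algolia can also return highlighted snippets itself.

```typescript
// Hypothetical record, derived from what a results mockup would display.
interface ResultRecord {
  category: string;
  version: string | null;
  url: string;
  hierarchy: string[];
  content: string; // searched by Algolia and used for the abstract
}

// What one rendered search result needs; every field comes from the record.
interface ResultView {
  group: string;    // e.g. "Camel Components [latest]"
  title: string;    // e.g. "Kafka > Options"
  url: string;
  abstract: string; // excerpt with query terms wrapped in <mark>
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Excerpt of about `radius` characters on each side of the earliest matching
// term, with every occurrence of the terms highlighted. Sketch only: the
// content is not HTML-escaped before the <mark> tags are added.
function makeAbstract(content: string, query: string, radius: number): string {
  const terms = query.split(/\s+/).filter((t) => t.length > 0);
  const lower = content.toLowerCase();
  let first = -1;
  let firstLen = 0;
  for (const t of terms) {
    const i = lower.indexOf(t.toLowerCase());
    if (i !== -1 && (first === -1 || i < first)) {
      first = i;
      firstLen = t.length;
    }
  }
  const start = first === -1 ? 0 : Math.max(0, first - radius);
  const end = first === -1
    ? Math.min(content.length, 2 * radius)
    : Math.min(content.length, first + firstLen + radius);
  let excerpt = content.slice(start, end);
  if (terms.length > 0) {
    const pattern = new RegExp("(" + terms.map(escapeRegExp).join("|") + ")", "gi");
    excerpt = excerpt.replace(pattern, "<mark>$1</mark>");
  }
  return (start > 0 ? "..." : "") + excerpt + (end < content.length ? "..." : "");
}

function toView(record: ResultRecord, query: string): ResultView {
  return {
    group: record.version ? `${record.category} [${record.version}]` : record.category,
    title: record.hierarchy.join(" > "),
    url: record.url,
    abstract: makeAbstract(record.content, query, 60),
  };
}
```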



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
