[ 
https://issues.apache.org/jira/browse/SOLR-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064927#comment-16064927
 ] 

Cassandra Targett commented on SOLR-10299:
------------------------------------------

Parsing the raw content is one approach that might be successful. Indexing the 
generated HTML is another option. Seeing what happens with {{bin/post}} on the 
HTML files would be another simple experiment to try. I'm not sure it would be 
preferable, but it would reflect what end-users see. We don't do this yet, but 
someday we will have raw content files that do not stand alone but are snippets 
included inside another file that together become a single HTML page. 
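
As a rough illustration of the generated-HTML option, the sketch below pulls the title and the visible text out of one rendered page, so that the result could be sent to an index as a single document. It is only a sketch under assumptions: the class name, the function name, and the {{id}}/{{title}}/{{body}} field names are hypothetical and not part of the Ref Guide build, and it uses only Python's standard library.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects the <title> text and the visible text of one HTML page."""

    # Content of these elements is never shown to the reader, so skip it.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title_parts.append(text)
        elif self._skip_depth == 0:
            self.body_parts.append(text)


def html_to_doc(page_id, html):
    """Return a dict with hypothetical field names for one rendered page."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": page_id,
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.body_parts),
    }
```

Because this works on the rendered page, any snippets included from other files are already merged in by the time the text is extracted, which is the main argument for this option over parsing the raw content.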

The harder questions IMO are going to be how to integrate it with the CMS, 
keeping the index up to date, the facet options, the end-user UI, etc.

bq. One thing that might help in the short term could be enabling fuzzy search 
mentioned on https://github.com/christian-fei/Simple-Jekyll-Search ? the 
search.json file we have doesn't mention it and the docs doesn't specify 
whether it is true or false by default

As I've mentioned a few times to the list(s), we're currently using 
JavaScript to generate the title-keyword lookup that's in use now. That 
doesn't come from Jekyll, but from an open-source Jekyll theme that I borrowed 
for the basic layout of the pages. That JavaScript _can_ index the body when 
it's generated, but the author of it notes in his documentation that it can 
cause problems. I never had time to try it to see what these problems are so I 
can't speak to it being a satisfactory stopgap - I'll guess, though, that the 
problems are related to performance, relevance, and proper parsing of text 
(only, you know, all the problems that we know plague inadequate attempts at 
full-text search).

If you are interested, though, here are the docs for the keyword lookup that's 
currently in place: 
http://idratherbewriting.com/documentation-theme-jekyll/mydoc_search_configuration.html
You will see immediately the similarities between that site and ours ;)

I saw the Simple-Jekyll-Search project early on, but I suspect it will also 
be inadequate, for reasons similar to those that make the current JavaScript 
solution inadequate. Since the theme I used already had a JavaScript-based 
lookup, I didn't investigate another solution and instead worked on other 
issues that needed to be dealt with. Perhaps it's worth a look, I'm not sure.

By the way, the title-keyword lookup was 100% intended as *the* stopgap 
solution. I knew it would be unsatisfactory, but I also know that despite all I 
know of Solr, I cannot carry the majority of the weight to make this feature 
happen.

> Provide search for online Ref Guide
> -----------------------------------
>
>                 Key: SOLR-10299
>                 URL: https://issues.apache.org/jira/browse/SOLR-10299
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: documentation
>            Reporter: Cassandra Targett
>
> The POC to move the Ref Guide off Confluence did not address providing 
> full-text search of the page content. Not because it's hard or impossible, 
> but because there were plenty of other issues to work on.
> The current HTML page design provides a title index, but to replicate the 
> current Confluence experience, the online version(s) need to provide a 
> full-text search experience.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
