kinow commented on code in PR #146:
URL: https://github.com/apache/jena-site/pull/146#discussion_r1103840564


##########
layouts/_default/search.html:
##########
@@ -0,0 +1,200 @@
+{{ define "main" }}
+<!-- Source: https://makewithhugo.com/add-search-to-a-hugo-site/ -->
+<main>
+  <div id="search-results"></div>
+  <div class="search-loading">Loading...</div>
+
+  <script id="search-result-template" type="text/x-js-template">
+    <div id="summary-${key}">
+      <h3><a href="${link}">${title}</a></h3>
+      <p class="pb-0 mb-0">${snippet}</p>
+      <p class="opacity-50 pt-0 mt-0"><small>Score: ${score}</small></p>
+      <p>
+        <small>
+          ${ isset tags }Tags: ${tags}<br>${ end }
+        </small>
+      </p>
+    </div>
+  </script>
+
+  <script src="/js/fuse.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script src="/js/mark.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script type="text/javascript">
+    (function() {
+      const summaryInclude = 180;
+      // See: https://fusejs.io/api/options.html
+      const fuseOptions = {
+        // Indicates whether comparisons should be case sensitive.
+        isCaseSensitive: false,
+        // Whether the score should be included in the result set.
+        // A score of 0 indicates a perfect match, while a score of 1 indicates a complete mismatch.
+        includeScore: true,
+        // Whether the matches should be included in the result set.
+        // When true, each record in the result set will include the indices of the matched characters.
+        // These can consequently be used for highlighting purposes.
+        includeMatches: true,
+        // Only the matches whose length exceeds this value will be returned.
+        // (For instance, if you want to ignore single character matches in the result, set it to 2).
+        minMatchCharLength: 2,
+        // Whether to sort the result list, by score.
+        shouldSort: true,
+        // List of keys that will be searched.
+        // This supports nested paths, weighted search, searching in arrays of strings and objects.
+        keys: [
+          {name: "title", weight: 0.8},
+          {name: "contents", weight: 0.7},
+          // {name: "tags", weight: 0.95},
+          // {name: "categories", weight: 0.05}
+        ],
+        // --- Fuzzy Matching Options
+        // Determines approximately where in the text is the pattern expected to be found.
+        location: 0,
+        // At what point does the match algorithm give up.
+        // A threshold of 0.0 requires a perfect match (of both letters and location),
+        // a threshold of 1.0 would match anything.
+        threshold: 0.2,

Review Comment:
   With a `threshold` of `0` I get 7 search results for SHACL. That's the 
same number of files I get when grepping for it (case-insensitively):
   
   ```bash
   kinow@ranma:~/Development/java/jena/jena-site/source$ grep -r -H -o -i SHACL | awk -F: '{ print $1 }' | sort -h | uniq
   documentation/fuseki2/fuseki-config-endpoint.md
   documentation/__index.md
   documentation/javadoc.md
   documentation/notes/system-initialization.md
   documentation/shacl/__index.md
   documentation/tools/__index.md
   download/maven.md
   ```
   
   But if I search for "shakl", it returns 0 results.
   
   With `0.2`, both SHACL and SHAKL return 14 search results. The first 7 
results have a score lower than `1` (in Fuse.js a higher score is worse), and 
the other 7 have a score of `1`. I left the score displayed with each result 
to help users.
   
   So I decided to leave it at `0.2`, so users still get some results if they 
misspell their search query.
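
   The threshold behaviour described above can be sketched with a simplified 
model. This is not Fuse.js's actual code: `fuzzinessScore` and `accepts` are 
hypothetical helpers, and the sketch assumes the fuzziness score is roughly the 
number of mismatched characters divided by the pattern length (location 
ignored), with a candidate accepted when that score is at most `threshold`. 
Under that assumption, one wrong character in the five-character query "shakl" 
scores 0.2, which fails a threshold of `0` but passes `0.2`.

   ```javascript
   // Simplified, hypothetical model of a Bitap-style fuzziness score.
   // Assumption: score = mismatched characters / pattern length,
   // with the match location ignored (as with ignoreLocation: true).
   function fuzzinessScore(pattern, errors) {
     return errors / pattern.length;
   }

   // Assumption: a candidate is accepted when its score does not
   // exceed the configured threshold.
   function accepts(pattern, errors, threshold) {
     return fuzzinessScore(pattern, errors) <= threshold;
   }

   // "shakl" differs from "shacl" by one substituted character.
   const query = "shakl";
   console.log(accepts(query, 1, 0.0)); // threshold 0: the typo is rejected
   console.log(accepts(query, 1, 0.2)); // threshold 0.2: the typo is accepted
   console.log(accepts(query, 0, 0.0)); // an exact match always passes
   ```

   As far as I understand, the scores Fuse.js reports also factor in key 
weights and field length, so the displayed values will not equal these raw 
ratios.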



##########
layouts/_default/search.html:
##########
@@ -0,0 +1,200 @@
+{{ define "main" }}
+<!-- Source: https://makewithhugo.com/add-search-to-a-hugo-site/ -->
+<main>
+  <div id="search-results"></div>
+  <div class="search-loading">Loading...</div>
+
+  <script id="search-result-template" type="text/x-js-template">
+    <div id="summary-${key}">
+      <h3><a href="${link}">${title}</a></h3>
+      <p class="pb-0 mb-0">${snippet}</p>
+      <p class="opacity-50 pt-0 mt-0"><small>Score: ${score}</small></p>
+      <p>
+        <small>
+          ${ isset tags }Tags: ${tags}<br>${ end }
+        </small>
+      </p>
+    </div>
+  </script>
+
+  <script src="/js/fuse.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script src="/js/mark.min.js" type="text/javascript" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+  <script type="text/javascript">
+    (function() {
+      const summaryInclude = 180;
+      // See: https://fusejs.io/api/options.html
+      const fuseOptions = {
+        // Indicates whether comparisons should be case sensitive.
+        isCaseSensitive: false,
+        // Whether the score should be included in the result set.
+        // A score of 0 indicates a perfect match, while a score of 1 indicates a complete mismatch.
+        includeScore: true,
+        // Whether the matches should be included in the result set.
+        // When true, each record in the result set will include the indices of the matched characters.
+        // These can consequently be used for highlighting purposes.
+        includeMatches: true,
+        // Only the matches whose length exceeds this value will be returned.
+        // (For instance, if you want to ignore single character matches in the result, set it to 2).
+        minMatchCharLength: 2,
+        // Whether to sort the result list, by score.
+        shouldSort: true,
+        // List of keys that will be searched.
+        // This supports nested paths, weighted search, searching in arrays of strings and objects.
+        keys: [
+          {name: "title", weight: 0.8},
+          {name: "contents", weight: 0.7},
+          // {name: "tags", weight: 0.95},
+          // {name: "categories", weight: 0.05}
+        ],
+        // --- Fuzzy Matching Options
+        // Determines approximately where in the text is the pattern expected to be found.
+        location: 0,
+        // At what point does the match algorithm give up.
+        // A threshold of 0.0 requires a perfect match (of both letters and location),
+        // a threshold of 1.0 would match anything.
+        threshold: 0.2,
+        // Determines how close the match must be to the fuzzy location (specified by location).
+        // An exact letter match which is distance characters away from the fuzzy location would
+        // score as a complete mismatch. A distance of 0 requires the match be at the exact
+        // location specified. A distance of 1000 would require a perfect match to be within 800
+        // characters of the location to be found using a threshold of 0.8.
+        distance: 100,
+        // When true, search will ignore location and distance, so it won't matter where in
+        // the string the pattern appears.
+        //
+        // NOTE: These settings are used to calculate the Fuzziness Score (Bitap algorithm) in Fuse.js.
+        //       It calculates threshold (default 0.6) * distance (default (100), which gives 60 by
+        //       default, meaning it will search for the query-term within 60 characters from the location
+        //       (default 0). Since Jena docs may have very long text that includes the query term anywhere
+        //       we disable it with ignoreLocation: true.
+        //       For more: https://fusejs.io/concepts/scoring-theory.html#scoring-theory
+        ignoreLocation: true,

Review Comment:
   @afs Fuse.js uses the location of the match in its scoring algorithm, which 
IMO doesn't make much sense for our use case. For example, with the default 
settings it excludes documents where the match appears more than 60 characters 
from the start of the text.
   
   The setting above disables it.
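
   To illustrate the effect, here is a simplified sketch, not Fuse.js's actual 
implementation. `locationAwareScore` is a hypothetical helper that assumes the 
score is the error ratio plus the match's offset from `location` divided by 
`distance` (with `distance` greater than 0), following the scoring-theory page 
linked in the code comment. With the defaults quoted there (`threshold` 0.6, 
`distance` 100, `location` 0), an exact match far into a long page is rejected 
unless `ignoreLocation` is set.

   ```javascript
   // Simplified, hypothetical model of location-aware fuzzy scoring.
   // Assumption: score = errors / pattern length + |location - matchIndex| / distance,
   // and the location term is dropped entirely when ignoreLocation is true.
   function locationAwareScore({
     patternLength,
     errors,
     matchIndex,
     location = 0,
     distance = 100, // assumed to be greater than 0 in this sketch
     ignoreLocation = false,
   }) {
     const accuracy = errors / patternLength;
     if (ignoreLocation) {
       return accuracy;
     }
     return accuracy + Math.abs(location - matchIndex) / distance;
   }

   const threshold = 0.6; // default threshold quoted in the code comment above

   // Exact five-character match, 40 characters into the text.
   const near = locationAwareScore({ patternLength: 5, errors: 0, matchIndex: 40 });
   // The same exact match, 500 characters into the text.
   const far = locationAwareScore({ patternLength: 5, errors: 0, matchIndex: 500 });
   // The far match again, with location scoring disabled.
   const farIgnored = locationAwareScore({
     patternLength: 5,
     errors: 0,
     matchIndex: 500,
     ignoreLocation: true,
   });

   console.log(near <= threshold);       // true: within 60 characters of location 0
   console.log(far <= threshold);        // false: too far from location 0
   console.log(farIgnored <= threshold); // true: position no longer matters
   ```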



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.