This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-dev-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 6b15d3f  2024/11/22 11:22:30: Generated dev website from 
groovy-website@e37adf7
6b15d3f is described below

commit 6b15d3f809d1262ee19640ffe5c3a08758757401
Author: jenkins <[email protected]>
AuthorDate: Fri Nov 22 11:22:30 2024 +0000

    2024/11/22 11:22:30: Generated dev website from groovy-website@e37adf7
---
 blog/community-over-code-eu-2024.html   |   8 +-
 blog/feed.atom                          |   2 +-
 blog/groovy-lucene.html                 | 677 +++++++++++++++++++++++---------
 blog/index.html                         |   2 +-
 blog/reading-and-writing-csv-files.html |   2 +-
 5 files changed, 488 insertions(+), 203 deletions(-)

diff --git a/blog/community-over-code-eu-2024.html 
b/blog/community-over-code-eu-2024.html
index ccae3ae..15a7c26 100644
--- a/blog/community-over-code-eu-2024.html
+++ b/blog/community-over-code-eu-2024.html
@@ -340,23 +340,23 @@ ways to visualize the results were examined:</p>
 <li>
 <p>The same case study was also done using Spark:</p>
 <div class="paragraph">
-<p><span class="image"><img src="img/coceu2024_whiskey1.png" alt="Whiskey 
flavour profiles with Spark"></span></p>
+<p><span class="image"><img src="img/coceu2024_whiskey1.png" alt="Whiskey 
flavour profiles with Apache Spark"></span></p>
 </div>
 </li>
 <li>
-<p>The same case study was also done using Wayang:</p>
+<p>The same case study was also done using Apache Wayang:</p>
 <div class="paragraph">
 <p><span class="image"><img src="img/coceu2024_whiskey2.png" alt="Whiskey 
flavour profiles with Wayang"></span></p>
 </div>
 </li>
 <li>
-<p>The same case study was also done using Beam (Python-style version shown 
here):</p>
+<p>The same case study was also done using Apache Beam (Python-style version 
shown here):</p>
 <div class="paragraph">
 <p><span class="image"><img src="img/coceu2024_whiskey3.png" alt="Whiskey 
flavour profiles with Beam"></span></p>
 </div>
 </li>
 <li>
-<p>The same case study was also done using Flink:</p>
+<p>The same case study was also done using Apache Flink:</p>
 <div class="paragraph">
 <p><span class="image"><img src="img/coceu2024_whiskey4.png" alt="Whiskey 
flavour profiles with Flink"></span></p>
 </div>
diff --git a/blog/feed.atom b/blog/feed.atom
index f82ca9d..57380d9 100644
--- a/blog/feed.atom
+++ b/blog/feed.atom
@@ -564,7 +564,7 @@
     <link href="http://groovy.apache.org/blog/reading-and-writing-csv-files"/>
     <updated>2022-07-25T14:26:20Z</updated>
     <published>2022-07-25T14:26:20Z</published>
-    <summary type="html">This post looks at processing CSV files using 
OpenCSV, Commons CSV, and Jackson Databind libraries.</summary>
+    <summary type="html">This post looks at processing CSV files using 
OpenCSV, Apache Commons CSV, and Jackson Databind libraries.</summary>
   </entry>
   <entry>
     <id>http://groovy.apache.org/blog/groovy-release-train-4-0</id>
diff --git a/blog/groovy-lucene.html b/blog/groovy-lucene.html
index 41a0c8b..28397ab 100644
--- a/blog/groovy-lucene.html
+++ b/blog/groovy-lucene.html
@@ -53,11 +53,15 @@
                                     </ul>
                                 </div>
                             </div>
-                        </div><div id='content' class='page-1'><div 
class='row'><div class='row-fluid'><div class='col-lg-3'><ul 
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a 
href='#doc'>Searching with Lucene</a></li><li><a 
href='#_finding_project_names_with_a_regex' class='anchor-link'>Finding project 
names with a regex</a></li><li><a 
href='#_finding_project_names_using_regex_matching' class='anchor-link'>Finding 
project names using regex matching</a></li [...]
+                        </div><div id='content' class='page-1'><div 
class='row'><div class='row-fluid'><div class='col-lg-3'><ul 
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a 
href='#doc'>Searching with Lucene</a></li><li><a 
href='#_finding_project_names_with_a_regex' class='anchor-link'>Finding project 
names with a regex</a></li><li><a 
href='#_finding_project_names_using_regex_matching' class='anchor-link'>Finding 
project names using regex matching</a></li [...]
 <div class="sectionbody">
 <div class="paragraph">
 <p>The Groovy <a href="https://groovy.apache.org/blog/";>blog posts</a> often 
reference other Apache projects.
-Let&#8217;s have a look at how we can find such references, first using 
regular expressions
+Given that these pages are published, we could use something like <a 
href="https://nutch.apache.org";>Apache Nutch</a> or
+<a href="https://solr.apache.org";>Apache Solr</a> to crawl/index those web 
pages and search using those tools.
+For this post, we are going to search for the
+information we require from the original source (<a 
href="https://asciidoc.org/";>AsciiDoc</a>) files.
+We&#8217;ll first look at how we can find project references using regular 
expressions
 and then using Apache Lucene.</p>
 </div>
 </div>
@@ -85,33 +89,24 @@ so we won&#8217;t.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">String tokenRegex 
= /(?ix)           # ignore case, enable whitespace &amp; comments
-    \b                               # word boundary
-    (                                # start capture of all terms
-        (                            # capture project name
-            (apache|eclipse)\s       # foundation name
-            (commons\s)?             # optional subproject name
-                (                    # capture next word unless excluded word
-                    ?!(
-                        groovy       # excluded words
-                      | and
-                      | license
-                      | users
-                      | software
-                      | projects
-                      | https
-                      | or
-                      | prefixes
-                      | technologies
-                      )
-                )\w+                 # end capture #2
-        )
-        |                            # alternatively
-        (                            # capture non-project word
-            (?!(apache|eclipse))
-            \w+
-        )                            # end capture #3
-    )                                # end capture #1
+<pre class="prettyprint highlight"><code data-lang="groovy">String tokenRegex 
= /(?ix)               # ignore case, enable whitespace &amp; comments
+    \b                                   # word boundary
+    (                                    # start capture of all terms
+        (                                # capture project name term
+            (apache|eclipse)\s           # foundation name
+            (commons\s)?                 # optional subproject name
+            (
+                ?!(groovy                # negative lookahead for excluded 
words
+                | and   | license  | users
+                | https | projects | software
+                | or    | prefixes | technologies)
+            )\w+
+        )                                # end capture project name term
+        |                                # alternatively
+        (                                # capture non-project term
+            \w+?\b                       # non-greedily match any other words
+        )                                # end capture non-project term
+    )                                    # end capture term
 /</code></pre>
 </div>
 </div>
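As an aside for readers following along in plain Java rather than Groovy, the same free-spacing regex idea can be sketched with `java.util.regex`. This is an illustrative rendition of the post's pattern, not its exact code; group 2 is the project-name capture:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenRegexDemo {

    // Condensed Java rendition of the post's free-spacing token regex.
    // Group 2 captures a project-name term: "apache"/"eclipse", an optional
    // "commons", and one more word, unless that word is on the exclusion list.
    static final Pattern TOKEN = Pattern.compile("""
        (?ix) \\b (
            ( (apache|eclipse)\\s (commons\\s)?
              (?! (groovy|and|license|users|software|projects
                  |https|or|prefixes|technologies) ) \\w+ )
          | ( \\w+?\\b )
        )""");

    static List<String> extract(String text) {
        List<String> projects = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String project = m.group(2); // null for the plain-word alternative
            if (project != null) {
                projects.add(project.toLowerCase().replaceAll("\\s+", " "));
            }
        }
        return projects;
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "We used Apache Lucene and Apache Commons Math, not Apache Groovy itself."));
    }
}
```

Running `main` on the sample sentence prints `[apache lucene, apache commons math]`: the lookahead rejects "Apache Groovy", while the plain-word alternative (group 6) quietly consumes everything else.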
@@ -127,14 +122,31 @@ Feel free to make a compact (long) one-liner without 
comments if you prefer.</p>
 <div class="sectionbody">
 <div class="paragraph">
 <p>With our regex sorted, let&#8217;s look at how you could use a Groovy 
matcher
-to find all the project names.</p>
+to find all the project names. First we&#8217;ll define one other common 
constant,
+the base directory for our blogs, which you might need to change if you
+want to follow along and run these examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var blogBaseDir = 
'/projects/apache-websites/groovy-website/site/src/site/blog' // <b 
class="conum">(1)</b>
-var histogram = [:].withDefault { 0 }
+<pre class="prettyprint highlight"><code data-lang="groovy">String baseDir = 
'/projects/apache-websites/groovy-website/site/src/site/blog' // <b 
class="conum">(1)</b></code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
+<li>
+<p>You&#8217;d need to check out the Groovy website and point to it here</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>Now our script will traverse all the files in that directory, processing 
them with our regex
+and tracking the hits we find.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var histogram = 
[:].withDefault { 0 } // <b class="conum">(1)</b>
 
-new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;  // <b 
class="conum">(2)</b>
+new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;  // <b 
class="conum">(2)</b>
     var m = file.text =~ tokenRegex // <b class="conum">(3)</b>
     var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', ' ') // 
<b class="conum">(4)</b>
     var counts = projects.countBy() // <b class="conum">(5)</b>
@@ -144,7 +156,7 @@ new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { 
file -&gt;  // <b clas
     }
 }
 
-println()
+println "\nFrequency of total hits mentioning a project:"
 histogram.sort { e -&gt; -e.value }.each { k, v -&gt; // <b 
class="conum">(8)</b>
     var label = "$k ($v)"
     println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
@@ -154,10 +166,10 @@ histogram.sort { e -&gt; -e.value }.each { k, v -&gt; // 
<b class="conum">(8)</b
 <div class="colist arabic">
 <ol>
 <li>
-<p>You&#8217;d need to check out the Groovy website and point to it here</p>
+<p>This is a map which provides a default value for non-existent keys</p>
 </li>
 <li>
-<p>This traverse the directory processing each asciidoc file</p>
+<p>This traverses the directory, processing each AsciiDoc file</p>
 </li>
 <li>
 <p>We define our matcher</p>
@@ -185,7 +197,7 @@ histogram.sort { e -&gt; -e.value }.each { k, v -&gt; // <b 
class="conum">(8)</b
 <pre>
 apache-nlpcraft-with-groovy.adoc: [apache&nbsp;nlpcraft:5]
 classifying-iris-flowers-with-deep.adoc: [eclipse&nbsp;deeplearning4j:5, 
apache&nbsp;commons math:1, apache&nbsp;spark:2]
-community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1]
+community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1, apache&nbsp;spark:1, apache&nbsp;wayang:1, 
apache&nbsp;beam:1, apache&nbsp;flink:1]
 community-over-code-na-2023.adoc: [apache&nbsp;ignite:8, apache&nbsp;commons 
numbers:1, apache&nbsp;commons csv:1]
 deck-of-cards-with-groovy.adoc: [eclipse&nbsp;collections:5]
 deep-learning-and-eclipse-collections.adoc: [eclipse&nbsp;collections:7, 
eclipse&nbsp;deeplearning4j:2]
@@ -196,7 +208,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons 
cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, 
apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
 groovy-list-processing-cheat-sheet.adoc: [eclipse&nbsp;collections:4, 
apache&nbsp;commons collections:3]
-groovy-lucene.adoc: [apache&nbsp;lucene:2, apache&nbsp;commons:1, 
apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, 
apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
 groovy-null-processing.adoc: [eclipse&nbsp;collections:6, apache&nbsp;commons 
collections:4]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
@@ -204,7 +216,7 @@ handling-byte-order-mark-characters.adoc: 
[apache&nbsp;commons io:1]
 lego-bricks-with-groovy.adoc: [eclipse&nbsp;collections:6]
 matrix-calculations-with-groovy-apache.adoc: [apache&nbsp;commons math:6, 
eclipse&nbsp;deeplearning4j:1, apache&nbsp;commons:1]
 natural-language-processing-with-groovy.adoc: [apache&nbsp;opennlp:2, 
apache&nbsp;spark:1]
-reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:1]
+reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:2]
 set-operations-with-groovy.adoc: [eclipse&nbsp;collections:3]
 solving-simple-optimization-problems-with-groovy.adoc: [apache&nbsp;commons 
math:5, apache&nbsp;kie:1]
 using-groovy-with-apache-wayang.adoc: [apache&nbsp;wayang:9, 
apache&nbsp;spark:7, apache&nbsp;flink:1, apache&nbsp;commons csv:1, 
apache&nbsp;ignite:1]
@@ -212,38 +224,42 @@ whiskey-clustering-with-groovy-and.adoc: 
[apache&nbsp;ignite:7, apache&nbsp;waya
 wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
+Frequency of total hits mentioning a project:
 eclipse&nbsp;collections (50)         
██████████████████████████████████████████████████▏
 apache&nbsp;commons math (18)         ██████████████████▏
 apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (12)                ████████████▏
+apache&nbsp;spark (13)                █████████████▏
 apache&nbsp;mxnet (12)                ████████████▏
+apache&nbsp;wayang (11)               ███████████▏
 apache&nbsp;age (11)                  ███████████▏
-apache&nbsp;wayang (10)               ██████████▏
 eclipse&nbsp;deeplearning4j (8)       ████████▏
 apache&nbsp;commons collections (7)   ███████▏
+apache&nbsp;commons csv (6)           ██████▏
 apache&nbsp;nlpcraft (5)              █████▏
-apache&nbsp;commons csv (5)           █████▏
 apache&nbsp;pekko (4)                 ████▏
 apache&nbsp;hugegraph (3)             ███▏
 apache&nbsp;tinkerpop (3)             ███▏
+apache&nbsp;flink (2)                 ██▏
 apache&nbsp;commons cli (2)           ██▏
-apache&nbsp;commons (2)               ██▏
 apache&nbsp;lucene (2)                ██▏
+apache&nbsp;commons (2)               ██▏
 apache&nbsp;opennlp (2)               ██▏
 apache&nbsp;ofbiz (1)                 █▏
+apache&nbsp;beam (1)                  █▏
 apache&nbsp;commons numbers (1)       █▏
+apache&nbsp;nutch (1)                 █▏
+apache&nbsp;solr (1)                  █▏
 apache&nbsp;commons codec (1)         █▏
 apache&nbsp;commons io (1)            █▏
 apache&nbsp;kie (1)                   █▏
-apache&nbsp;flink (1)                 █▏
 </pre>
 </div>
 </div>
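For comparison, the aggregation step in the script above (each file's `countBy()` result folded into a `withDefault { 0 }` histogram, then sorted by descending count) can be sketched in plain Java. A rough sketch with illustrative names, not the post's code:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HistogramDemo {

    // Plain-Java sketch of the Groovy aggregation: per-file project lists are
    // folded into one histogram (Map.merge plays the role of withDefault { 0 }
    // plus countBy), then sorted by descending frequency for display.
    static LinkedHashMap<String, Integer> aggregate(List<List<String>> perFileProjects) {
        Map<String, Integer> histogram = new HashMap<>();
        for (List<String> projects : perFileProjects) {
            for (String project : projects) {
                histogram.merge(project, 1, Integer::sum); // default-0 counting
            }
        }
        LinkedHashMap<String, Integer> sorted = new LinkedHashMap<>();
        histogram.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        var histogram = aggregate(List.of(
            List.of("apache lucene", "apache spark"),
            List.of("apache spark", "eclipse collections", "apache spark")));
        histogram.forEach((k, v) -> System.out.println(k + " (" + v + ")"));
    }
}
```

The most-mentioned project comes out first, just as in the bar chart above; ties keep no particular order here.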
 <div class="sect1">
-<h2 id="_using_lucene">Using Lucene</h2>
+<h2 id="_indexing_with_lucene">Indexing with Lucene</h2>
 <div class="sectionbody">
 <div class="paragraph">
-<p><span class="image right"><img 
src="https://www.apache.org/logos/res/lucene/default.png"; alt="lucene logo" 
width="100"></span>
+<p><span class="image right"><img 
src="https://www.apache.org/logos/res/lucene/default.png"; alt="lucene logo" 
width="200"></span>
 Okay, regular expressions weren&#8217;t that hard but in general we might want 
to search many things.
 Search frameworks like Lucene help with that. Let&#8217;s see what it looks 
like to apply
 Lucene to our problem.</p>
@@ -252,15 +268,19 @@ Lucene to our problem.</p>
 <p>First, we&#8217;ll define a custom analyzer. Lucene is very flexible and 
comes with builtin
 analyzers. In a typical scenario, we might just search on all words.
 There&#8217;s a builtin analyzer for that.
-If we used that, to query for our project names,
+If we used one of the builtin analyzers, then to query for our project names
 we&#8217;d construct a query that spanned multiple (word) terms.
-For the purposes of our little example, we are going to assume project names
-are indivisible terms and slice them up that way. There is a pattern tokenizer
+We&#8217;ll see an example of that later, but
+for the purposes of our little example, we are going to assume project names
+are indivisible terms and slice up our documents that way.</p>
+</div>
+<div class="paragraph">
+<p>Luckily, Lucene has a pattern tokenizer
 which lets us reuse our existing regex.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">class 
ApacheProjectAnalyzer extends Analyzer {
+<pre class="prettyprint highlight"><code data-lang="groovy">class 
ProjectNameAnalyzer extends Analyzer {
     @Override
     protected TokenStreamComponents createComponents(String fieldName) {
         var src = new PatternTokenizer(~tokenRegex, 0)
@@ -275,74 +295,281 @@ which lets us reuse our existing regex.</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new 
ApacheProjectAnalyzer() // <b class="conum">(1)</b>
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new 
ProjectNameAnalyzer() // <b class="conum">(1)</b>
 var indexDir = new ByteBuffersDirectory() // <b class="conum">(2)</b>
 var config = new IndexWriterConfig(analyzer)
 
-var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
 new IndexWriter(indexDir, config).withCloseable { writer -&gt;
-    new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
+    var indexedWithFreq = new FieldType(stored: true,
+        indexOptions: IndexOptions.DOCS_AND_FREQS,
+        storeTermVectors: true)
+    new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
         file.withReader { br -&gt;
             var document = new Document()
-            var fieldType = new FieldType(stored: true,
-                indexOptions: 
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
-                storeTermVectors: true,
-                storeTermVectorPositions: true,
-                storeTermVectorOffsets: true)
-            document.add(new Field('content', br.text, fieldType)) // <b 
class="conum">(3)</b>
+            document.add(new Field('content', br.text, indexedWithFreq)) // <b 
class="conum">(3)</b>
             document.add(new StringField('name', file.name, Field.Store.YES)) 
// <b class="conum">(4)</b>
             writer.addDocument(document)
         }
     }
-}
-
-var reader = DirectoryReader.open(indexDir)
-var searcher = new IndexSearcher(reader)
-var parser = new QueryParser("content", analyzer)
-
-var query = parser.parse('apache* OR eclipse*') // <b class="conum">(5)</b>
-var results = searcher.search(query, 30) // <b class="conum">(6)</b>
-println "Total documents with hits for $query --&gt; $results.totalHits"
+}</code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
+<li>
+<p>This is our regex-based analyzer</p>
+</li>
+<li>
+<p>We&#8217;ll use a memory-based index for our little example</p>
+</li>
+<li>
+<p>Store content of document along with term position info</p>
+</li>
+<li>
+<p>Also store the name of the file</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>With an index defined, we&#8217;d typically now perform some kind of search.
+We&#8217;ll do just that shortly, but first, since we are interested in
+term-level information, we&#8217;ll use the part of the Lucene API that lets
+us explore the index directly. Here is how we might do that:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var reader = 
DirectoryReader.open(indexDir)
+var vectors = reader.termVectors()
+var storedFields = reader.storedFields()
 
-var storedFields = searcher.storedFields()
-var histogram = [:].withDefault { 0 }
-results.scoreDocs.each { ScoreDoc doc -&gt; // <b class="conum">(7)</b>
-    var document = storedFields.document(doc.doc)
-    var found = handleHit(doc, query, reader) // <b class="conum">(8)</b>
-    println "${document.get('name')}: ${found*.replaceAll('\n', ' 
').countBy()}"
-    found.each { histogram[it.replaceAll('\n', ' ')] += 1 } // <b 
class="conum">(9)</b>
+Set projects = []
+for (docId in 0..&lt;reader.maxDoc()) {
+    String name = storedFields.document(docId).get('name')
+    TermsEnum terms = vectors.get(docId, 'content').iterator() // <b 
class="conum">(1)</b>
+    var found = [:]
+    while (terms.next() != null) {
+        PostingsEnum postingsEnum = terms.postings(null, PostingsEnum.ALL)
+        while (postingsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
+            int freq = postingsEnum.freq()
+            var string = terms.term().utf8ToString().replaceAll('\n', ' ')
+            if (string.startsWith('apache ') || string.startsWith('eclipse ')) 
{ // <b class="conum">(2)</b>
+                found[string] = freq
+            }
+        }
+    }
+    if (found) {
+        println "$name: $found"
+        projects += found.keySet()
+    }
 }
-println()
 
-histogram.sort { e -&gt; -e.value }.each { k, v -&gt; // <b 
class="conum">(10)</b>
+var terms = projects.collect { name -&gt; new Term('content', name) }
+var byReverseValue = { e -&gt; -e.value }
+
+println "\nFrequency of total hits mentioning a project (top 10):"
+var termFreq = terms.collectEntries { term -&gt; [term.text(), 
reader.totalTermFreq(term)] } // <b class="conum">(3)</b>
+termFreq.sort(byReverseValue).take(10).each { k, v -&gt;
     var label = "$k ($v)"
     println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
 }
 
-List&lt;String&gt; handleHit(ScoreDoc hit, Query query, DirectoryReader 
dirReader) { // <b class="conum">(11)</b>
-    boolean phraseHighlight = true
-    boolean fieldMatch = true
-    FieldQuery fieldQuery = new FieldQuery(query, dirReader, phraseHighlight, 
fieldMatch)
-    FieldTermStack stack = new FieldTermStack(dirReader, hit.doc, 'content', 
fieldQuery)
-    FieldPhraseList phrases = new FieldPhraseList(stack, fieldQuery)
-    phrases.phraseList*.termsInfos*.text.flatten()
+println "\nFrequency of documents mentioning a project (top 10):"
+var docFreq = terms.collectEntries { term -&gt; [term.text(), 
reader.docFreq(term)] } // <b class="conum">(4)</b>
+docFreq.sort(byReverseValue).take(10).each { k, v -&gt;
+    var label = "$k ($v)"
+    println "${label.padRight(32)} ${bar(v * 2, 0, 20, 20)}"
 }</code></pre>
 </div>
 </div>
 <div class="colist arabic">
 <ol>
 <li>
-<p>This is our regex-based analyzer</p>
+<p>Get all index terms</p>
 </li>
 <li>
-<p>We&#8217;ll use a memory-based index for our little example</p>
+<p>Look for terms which match project names, so we can save them to a set</p>
 </li>
 <li>
-<p>Store content of document along with term position info</p>
+<p>Grab hit frequency metadata for our term</p>
 </li>
 <li>
-<p>Also store the name of the file</p>
+<p>Grab document frequency metadata for our term</p>
 </li>
+</ol>
+</div>
+<div class="paragraph">
+<p>When we run this we see:</p>
+</div>
+<pre>
+apache-nlpcraft-with-groovy.adoc: [apache&nbsp;nlpcraft:5]
+classifying-iris-flowers-with-deep.adoc: [apache&nbsp;commons math:1, 
apache&nbsp;spark:2, eclipse&nbsp;deeplearning4j:5]
+community-over-code-eu-2024.adoc: [apache&nbsp;beam:1, apache&nbsp;commons 
math:2, apache&nbsp;flink:1, apache&nbsp;ignite:1, apache&nbsp;ofbiz:1, 
apache&nbsp;spark:1, apache&nbsp;wayang:1]
+community-over-code-na-2023.adoc: [apache&nbsp;commons csv:1, 
apache&nbsp;commons numbers:1, apache&nbsp;ignite:8]
+deck-of-cards-with-groovy.adoc: [eclipse&nbsp;collections:5]
+deep-learning-and-eclipse-collections.adoc: [eclipse&nbsp;collections:7, 
eclipse&nbsp;deeplearning4j:2]
+detecting-objects-with-groovy-the.adoc: [apache&nbsp;mxnet:12]
+fruity-eclipse-collections.adoc: [apache&nbsp;commons math:1, 
eclipse&nbsp;collections:9]
+fun-with-obfuscated-groovy.adoc: [apache&nbsp;commons math:1]
+groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
+groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, 
apache&nbsp;tinkerpop:3]
+groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
+groovy-list-processing-cheat-sheet.adoc: [apache&nbsp;commons collections:3, 
eclipse&nbsp;collections:4]
+groovy-lucene.adoc: [apache&nbsp;commons:1, apache&nbsp;commons math:2, 
apache&nbsp;lucene:2, apache&nbsp;nutch:1, apache&nbsp;solr:1]
+groovy-null-processing.adoc: [apache&nbsp;commons collections:4, 
eclipse&nbsp;collections:6]
+groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
+groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
+handling-byte-order-mark-characters.adoc: [apache&nbsp;commons io:1]
+lego-bricks-with-groovy.adoc: [eclipse&nbsp;collections:6]
+matrix-calculations-with-groovy-apache.adoc: [apache&nbsp;commons:1, 
apache&nbsp;commons math:6, eclipse&nbsp;deeplearning4j:1]
+natural-language-processing-with-groovy.adoc: [apache&nbsp;opennlp:2, 
apache&nbsp;spark:1]
+reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:2]
+set-operations-with-groovy.adoc: [eclipse&nbsp;collections:3]
+solving-simple-optimization-problems-with-groovy.adoc: [apache&nbsp;commons 
math:4, apache&nbsp;kie:1]
+using-groovy-with-apache-wayang.adoc: [apache&nbsp;commons csv:1, 
apache&nbsp;flink:1, apache&nbsp;ignite:1, apache&nbsp;spark:7, 
apache&nbsp;wayang:9]
+whiskey-clustering-with-groovy-and.adoc: [apache&nbsp;commons csv:2, 
apache&nbsp;ignite:7, apache&nbsp;spark:2, apache&nbsp;wayang:1]
+wordle-checker.adoc: [eclipse&nbsp;collections:3]
+zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
+
+Frequency of total hits mentioning a project (top 10):
+eclipse&nbsp;collections (50)         
██████████████████████████████████████████████████▏
+apache&nbsp;commons math (17)         █████████████████▏
+apache&nbsp;ignite (17)               █████████████████▏
+apache&nbsp;spark (13)                █████████████▏
+apache&nbsp;mxnet (12)                ████████████▏
+apache&nbsp;wayang (11)               ███████████▏
+apache&nbsp;age (11)                  ███████████▏
+eclipse&nbsp;deeplearning4j (8)       ████████▏
+apache&nbsp;commons collections (7)   ███████▏
+apache&nbsp;commons csv (6)           ██████▏
+
+Frequency of documents mentioning a project (top 10):
+eclipse&nbsp;collections (10)         ████████████████████▏
+apache&nbsp;commons math (7)          ██████████████▏
+apache&nbsp;spark (5)                 ██████████▏
+apache&nbsp;ignite (4)                ████████▏
+apache&nbsp;commons csv (4)           ████████▏
+eclipse&nbsp;deeplearning4j (3)       ██████▏
+apache&nbsp;wayang (3)                ██████▏
+apache&nbsp;flink (2)                 ████▏
+apache&nbsp;commons collections (2)   ████▏
+apache&nbsp;commons (2)               ████▏
+
+</pre>
+<div class="paragraph">
+<p>So far, we have just displayed curated metadata about our index.
+But just to show that we have an index that supports searching,
+let&#8217;s look for all documents which mention emojis.
+They often make programming examples a lot of fun!</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var parser = new 
QueryParser("content", analyzer)
+var searcher = new IndexSearcher(reader)
+var query = parser.parse('emoji*')
+var results = searcher.search(query, 10)
+println "\nTotal documents with hits for $query --&gt; $results.totalHits"
+results.scoreDocs.each {
+    var doc = storedFields.document(it.doc)
+    println "${doc.get('name')}"
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When we run this we see:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Total documents with hits for content:emoji* --&gt; 11 hits
+adventures-with-groovyfx.adoc
+create-groovy-blog.adoc
+deep-learning-and-eclipse-collections.adoc
+fruity-eclipse-collections.adoc
+groovy-haiku-processing.adoc
+groovy-lucene.adoc
+helloworldemoji.adoc
+seasons-greetings-emoji.adoc
+set-operations-with-groovy.adoc
+solving-simple-optimization-problems-with-groovy.adoc</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Lucene has a very rich API. Let&#8217;s now look at some alternative
+ways we could use Lucene.</p>
+</div>
+<div class="paragraph">
+<p>Rather than exploring index metadata, we&#8217;d more typically run queries
+and explore those results. We&#8217;ll look at how to do that now.
+When exploring query results, we are going to use some classes in the 
<code>vectorhighlight</code>
+package in the <code>lucene-highlight</code> module. You&#8217;d typically use 
functionality in that
+module to highlight hits when displaying search results, say on a web page.
+For us, we are going to just pick out the terms of interest:
+the project names that match our query.</p>
+</div>
+<div class="paragraph">
+<p>For the highlight functionality to work, we ask the indexer to store some
+additional information about term positions when indexing.
+The index code changes to look like this:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">new 
IndexWriter(indexDir, config).withCloseable { writer -&gt;
+    new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
+        file.withReader { br -&gt;
+            var document = new Document()
+            var fieldType = new FieldType(stored: true,
+                indexOptions: 
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+                storeTermVectors: true,
+                storeTermVectorPositions: true,
+                storeTermVectorOffsets: true)
+            document.add(new Field('content', br.text, fieldType))
+            document.add(new StringField('name', file.name, Field.Store.YES))
+            writer.addDocument(document)
+        }
+    }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We could have stored this additional information even for our previous 
example,
+but it wasn&#8217;t needed until now.</p>
+</div>
+<div class="paragraph">
+<p>Next, we define a helper method to extract the actual project names from 
matches:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">List&lt;String&gt; 
handleHit(ScoreDoc hit, Query query, DirectoryReader dirReader) {
+    boolean phraseHighlight = true
+    boolean fieldMatch = true
+    var fieldQuery = new FieldQuery(query, dirReader, phraseHighlight, 
fieldMatch)
+    var stack = new FieldTermStack(dirReader, hit.doc, 'content', fieldQuery)
+    var phrases = new FieldPhraseList(stack, fieldQuery)
+    phrases.phraseList*.termsInfos*.text.flatten()
+}</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var query = 
parser.parse(/apache\ * OR eclipse\ */) // <b class="conum">(1)</b>
+var results = searcher.search(query, 30) // <b class="conum">(2)</b>
+println "Total documents with hits for $query --&gt; $results.totalHits\n"
+
+var storedFields = searcher.storedFields()
+var histogram = [:].withDefault { 0 }
+results.scoreDocs.each { ScoreDoc scoreDoc -&gt; // <b class="conum">(3)</b>
+    var doc = storedFields.document(scoreDoc.doc)
+    var found = handleHit(scoreDoc, query, reader) // <b class="conum">(4)</b>
+    println "${doc.get('name')}: ${found*.replaceAll('\n', ' ').countBy()}"
+    found.each { histogram[it.replaceAll('\n', ' ')] += 1 } // <b 
class="conum">(5)</b>
+}
+
+println "\nFrequency of total hits mentioning a project:"
+histogram.sort { e -&gt; -e.value }.each { k, v -&gt; // <b 
class="conum">(6)</b>
+    var label = "$k ($v)"
+    println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
+}</code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
 <li>
 <p>Search for terms with the apache or eclipse prefixes</p>
 </li>
@@ -361,23 +588,21 @@ List&lt;String&gt; handleHit(ScoreDoc hit, Query query, 
DirectoryReader dirReade
 <li>
 <p>Display the aggregates as a pretty barchart</p>
 </li>
-<li>
-<p>Helper method</p>
-</li>
 </ol>
 </div>
 <div class="paragraph">
 <p>The output is essentially the same as before:</p>
 </div>
 <pre>
-Total documents with hits for content:apache* content:eclipse* --> 28 hits
+Total documents with hits for content:apache&nbsp;* content:eclipse&nbsp;* --> 
28 hits
+
 classifying-iris-flowers-with-deep.adoc: [eclipse&nbsp;deeplearning4j:5, 
apache&nbsp;commons math:1, apache&nbsp;spark:2]
 fruity-eclipse-collections.adoc: [eclipse&nbsp;collections:9, 
apache&nbsp;commons math:1]
 groovy-list-processing-cheat-sheet.adoc: [eclipse&nbsp;collections:4, 
apache&nbsp;commons collections:3]
 groovy-null-processing.adoc: [eclipse&nbsp;collections:6, apache&nbsp;commons 
collections:4]
 matrix-calculations-with-groovy-apache.adoc: [apache&nbsp;commons math:6, 
eclipse&nbsp;deeplearning4j:1, apache&nbsp;commons:1]
 apache-nlpcraft-with-groovy.adoc: [apache&nbsp;nlpcraft:5]
-community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1]
+community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1, apache&nbsp;spark:1, apache&nbsp;wayang:1, 
apache&nbsp;beam:1, apache&nbsp;flink:1]
 community-over-code-na-2023.adoc: [apache&nbsp;ignite:8, apache&nbsp;commons 
numbers:1, apache&nbsp;commons csv:1]
 deck-of-cards-with-groovy.adoc: [eclipse&nbsp;collections:5]
 deep-learning-and-eclipse-collections.adoc: [eclipse&nbsp;collections:7, 
eclipse&nbsp;deeplearning4j:2]
@@ -386,13 +611,13 @@ fun-with-obfuscated-groovy.adoc: [apache&nbsp;commons 
math:1]
 groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, 
apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
-groovy-lucene.adoc: [apache&nbsp;lucene:2, apache&nbsp;commons:1, 
apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, 
apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
 handling-byte-order-mark-characters.adoc: [apache&nbsp;commons io:1]
 lego-bricks-with-groovy.adoc: [eclipse&nbsp;collections:6]
 natural-language-processing-with-groovy.adoc: [apache&nbsp;opennlp:2, 
apache&nbsp;spark:1]
-reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:1]
+reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:2]
 set-operations-with-groovy.adoc: [eclipse&nbsp;collections:3]
 solving-simple-optimization-problems-with-groovy.adoc: [apache&nbsp;commons 
math:5, apache&nbsp;kie:1]
 using-groovy-with-apache-wayang.adoc: [apache&nbsp;wayang:9, 
apache&nbsp;spark:7, apache&nbsp;flink:1, apache&nbsp;commons csv:1, 
apache&nbsp;ignite:1]
@@ -400,30 +625,17 @@ whiskey-clustering-with-groovy-and.adoc: 
[apache&nbsp;ignite:7, apache&nbsp;waya
 wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
+Frequency of total hits mentioning a project (top 10):
 eclipse&nbsp;collections (50)         
██████████████████████████████████████████████████▏
 apache&nbsp;commons math (18)         ██████████████████▏
 apache&nbsp;ignite (17)               █████████████████▏
-apache&nbsp;spark (12)                ████████████▏
+apache&nbsp;spark (13)                █████████████▏
 apache&nbsp;mxnet (12)                ████████████▏
+apache&nbsp;wayang (11)               ███████████▏
 apache&nbsp;age (11)                  ███████████▏
-apache&nbsp;wayang (10)               ██████████▏
 eclipse&nbsp;deeplearning4j (8)       ████████▏
 apache&nbsp;commons collections (7)   ███████▏
-apache&nbsp;nlpcraft (5)              █████▏
-apache&nbsp;commons csv (5)           █████▏
-apache&nbsp;pekko (4)                 ████▏
-apache&nbsp;hugegraph (3)             ███▏
-apache&nbsp;tinkerpop (3)             ███▏
-apache&nbsp;commons (2)               ██▏
-apache&nbsp;commons cli (2)           ██▏
-apache&nbsp;lucene (2)                ██▏
-apache&nbsp;opennlp (2)               ██▏
-apache&nbsp;ofbiz (1)                 █▏
-apache&nbsp;commons numbers (1)       █▏
-apache&nbsp;commons codec (1)         █▏
-apache&nbsp;commons io (1)            █▏
-apache&nbsp;kie (1)                   █▏
-apache&nbsp;flink (1)                 █▏
+apache&nbsp;commons csv (6)           ██████▏
 </pre>
 </div>
 </div>
@@ -432,7 +644,7 @@ apache&nbsp;flink (1)                 █▏
 <div class="sectionbody">
 <div class="listingblock">
 <div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new 
ApacheProjectAnalyzer()
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new 
ProjectNameAnalyzer()
 var indexDir = new ByteBuffersDirectory()
 var taxonDir = new ByteBuffersDirectory()
 var config = new IndexWriterConfig(analyzer)
@@ -447,18 +659,15 @@ var fConfig = new FacetsConfig().tap {
     setIndexFieldName('projectHitCounts', '$projectHitCounts')
 }
 
-var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
-new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
+new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
     var m = file.text =~ tokenRegex
     var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', ' 
').countBy()
     file.withReader { br -&gt;
         var document = new Document()
-        var fieldType = new FieldType(stored: true,
-            indexOptions: 
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
-            storeTermVectors: true,
-            storeTermVectorPositions: true,
-            storeTermVectorOffsets: true)
-        document.add(new Field('content', br.text, fieldType))
+        var indexedWithFreq = new FieldType(stored: true,
+            indexOptions: IndexOptions.DOCS_AND_FREQS,
+            storeTermVectors: true)
+        document.add(new Field('content', br.text, indexedWithFreq))
         document.add(new StringField('name', file.name, Field.Store.YES))
         if (projects) {
             println "$file.name: $projects"
@@ -472,45 +681,13 @@ new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { 
file -&gt;
     }
 }
 indexWriter.close()
-taxonWriter.close()
-println()
-
-var reader = DirectoryReader.open(indexDir)
-var searcher = new IndexSearcher(reader)
-var taxonReader = new DirectoryTaxonomyReader(taxonDir)
-var fcm = new FacetsCollectorManager()
-var fc = FacetsCollectorManager.search(searcher, new MatchAllDocsQuery(), 10, 
fcm).facetsCollector()
-
-var projects = new TaxonomyFacetIntAssociations('$projectHitCounts', 
taxonReader, fConfig, fc, AssociationAggregationFunction.SUM)
-var hitCounts = projects.getTopChildren(10, "projectHitCounts")
-println hitCounts
-
-var facets = new FastTaxonomyFacetCounts(taxonReader, fConfig, fc)
-var fileCounts = facets.getTopChildren(10, "projectFileCounts")
-println fileCounts
-
-var nameCounts = facets.getTopChildren(10, "projectNameCounts")
-println nameCounts
-nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache')
-println nameCounts
-nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache', 
'commons')
-println nameCounts
-
-var parser = new QueryParser("content", analyzer)
-var query = parser.parse('apache* AND eclipse*')
-var results = searcher.search(query, 10)
-println "Total documents with hits for $query --&gt; $results.totalHits"
-var storedFields = searcher.storedFields()
-results.scoreDocs.each { ScoreDoc doc -&gt;
-    var document = storedFields.document(doc.doc)
-    println "${document.get('name')}"
-}</code></pre>
+taxonWriter.close()</code></pre>
 </div>
 </div>
 <pre>
 apache-nlpcraft-with-groovy.adoc: [apache&nbsp;nlpcraft:5]
 classifying-iris-flowers-with-deep.adoc: [eclipse&nbsp;deeplearning4j:5, 
apache&nbsp;commons math:1, apache&nbsp;spark:2]
-community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1]
+community-over-code-eu-2024.adoc: [apache&nbsp;ofbiz:1, apache&nbsp;commons 
math:2, apache&nbsp;ignite:1, apache&nbsp;spark:1, apache&nbsp;wayang:1, 
apache&nbsp;beam:1, apache&nbsp;flink:1]
 community-over-code-na-2023.adoc: [apache&nbsp;ignite:8, apache&nbsp;commons 
numbers:1, apache&nbsp;commons csv:1]
 deck-of-cards-with-groovy.adoc: [eclipse&nbsp;collections:5]
 deep-learning-and-eclipse-collections.adoc: [eclipse&nbsp;collections:7, 
eclipse&nbsp;deeplearning4j:2]
@@ -521,7 +698,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache&nbsp;commons 
cli:2]
 groovy-graph-databases.adoc: [apache&nbsp;age:11, apache&nbsp;hugegraph:3, 
apache&nbsp;tinkerpop:3]
 groovy-haiku-processing.adoc: [eclipse&nbsp;collections:3]
 groovy-list-processing-cheat-sheet.adoc: [eclipse&nbsp;collections:4, 
apache&nbsp;commons collections:3]
-groovy-lucene.adoc: [apache&nbsp;lucene:2, apache&nbsp;commons:1, 
apache&nbsp;commons math:2]
+groovy-lucene.adoc: [apache&nbsp;nutch:1, apache&nbsp;solr:1, 
apache&nbsp;lucene:2, apache&nbsp;commons:1, apache&nbsp;commons math:2]
 groovy-null-processing.adoc: [eclipse&nbsp;collections:6, apache&nbsp;commons 
collections:4]
 groovy-pekko-gpars.adoc: [apache&nbsp;pekko:4]
 groovy-record-performance.adoc: [apache&nbsp;commons codec:1]
@@ -529,7 +706,7 @@ handling-byte-order-mark-characters.adoc: 
[apache&nbsp;commons io:1]
 lego-bricks-with-groovy.adoc: [eclipse&nbsp;collections:6]
 matrix-calculations-with-groovy-apache.adoc: [apache&nbsp;commons math:6, 
eclipse&nbsp;deeplearning4j:1, apache&nbsp;commons:1]
 natural-language-processing-with-groovy.adoc: [apache&nbsp;opennlp:2, 
apache&nbsp;spark:1]
-reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:1]
+reading-and-writing-csv-files.adoc: [apache&nbsp;commons csv:2]
 set-operations-with-groovy.adoc: [eclipse&nbsp;collections:3]
 solving-simple-optimization-problems-with-groovy.adoc: [apache&nbsp;commons 
math:5, apache&nbsp;kie:1]
 using-groovy-with-apache-wayang.adoc: [apache&nbsp;wayang:9, 
apache&nbsp;spark:7, apache&nbsp;flink:1, apache&nbsp;commons csv:1, 
apache&nbsp;ignite:1]
@@ -537,62 +714,170 @@ whiskey-clustering-with-groovy-and.adoc: 
[apache&nbsp;ignite:7, apache&nbsp;waya
 wordle-checker.adoc: [eclipse&nbsp;collections:3]
 zipping-collections-with-groovy.adoc: [eclipse&nbsp;collections:4]
 
-dim=projectHitCounts path=[] value=-1 childCount=24
-  eclipse&nbsp;collections (50)
-  apache&nbsp;commons math (18)
-  apache&nbsp;ignite (17)
-  apache&nbsp;spark (12)
-  apache&nbsp;mxnet (12)
-  apache&nbsp;age (11)
-  apache&nbsp;wayang (10)
-  eclipse&nbsp;deeplearning4j (8)
-  apache&nbsp;commons collections (7)
-  apache&nbsp;nlpcraft (5)
-
-dim=projectFileCounts path=[] value=-1 childCount=24
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var reader = 
DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+var taxonReader = new DirectoryTaxonomyReader(taxonDir)
+var fcm = new FacetsCollectorManager()
+var fc = FacetsCollectorManager.search(searcher, new MatchAllDocsQuery(), 0, 
fcm).facetsCollector()
+
+var topN = 5
+var projects = new TaxonomyFacetIntAssociations('$projectHitCounts', 
taxonReader, fConfig, fc, AssociationAggregationFunction.SUM)
+var hitCounts = projects.getTopChildren(topN, 
"projectHitCounts").labelValues.collect{
+    [label: it.label, hits: it.value, files: it.count]
+}
+
+println "\nFrequency of total hits mentioning a project (top $topN):"
+hitCounts.sort{ m -&gt; -m.hits }.each { m -&gt;
+    var label = "$m.label ($m.hits)"
+    println "${label.padRight(32)} ${bar(m.hits, 0, 50, 50)}"
+}
+
+println "\nFrequency of documents mentioning a project (top $topN):"
+hitCounts.sort{ m -&gt; -m.files }.each { m -&gt;
+    var label = "$m.label ($m.files)"
+    println "${label.padRight(32)} ${bar(m.files * 2, 0, 20, 20)}"
+}</code></pre>
+</div>
+</div>
+<pre>
+Frequency of total hits mentioning a project (top 5):
+eclipse&nbsp;collections (50)         
██████████████████████████████████████████████████▏
+apache&nbsp;commons math (18)         ██████████████████▏
+apache&nbsp;ignite (17)               █████████████████▏
+apache&nbsp;spark (13)                █████████████▏
+apache&nbsp;mxnet (12)                ████████████▏
+
+Frequency of documents mentioning a project (top 5):
+eclipse&nbsp;collections (10)         ████████████████████▏
+apache&nbsp;commons math (7)          ██████████████▏
+apache&nbsp;spark (5)                 ██████████▏
+apache&nbsp;ignite (4)                ████████▏
+apache&nbsp;mxnet (1)                 ██▏
+
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var facets = new 
FastTaxonomyFacetCounts(taxonReader, fConfig, fc)
+
+println "\nFrequency of documents mentioning a project (top $topN):"
+var fileCounts = facets.getTopChildren(topN, "projectFileCounts")
+println fileCounts</code></pre>
+</div>
+</div>
+<pre>
+Frequency of documents mentioning a project (top 5):
+dim=projectFileCounts path=[] value=-1 childCount=27
   eclipse&nbsp;collections (10)
   apache&nbsp;commons math (7)
-  apache&nbsp;spark (4)
+  apache&nbsp;spark (5)
   apache&nbsp;ignite (4)
-  apache&nbsp;commons csv (4)
-  eclipse&nbsp;deeplearning4j (3)
-  apache&nbsp;commons collections (2)
-  apache&nbsp;commons (2)
-  apache&nbsp;wayang (2)
-  apache&nbsp;nlpcraft (1)
+  apache&nbsp;commons csv (4)
 
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">['apache', 
'commons'].inits().reverseEach { path -&gt;
+    println "Frequency of documents mentioning a project with path $path (top 
$topN):"
+    var nameCounts = facets.getTopChildren(topN, "projectNameCounts", *path)
+    println "$nameCounts"
+}</code></pre>
+</div>
+</div>
+<pre>
+Frequency of documents mentioning a project with path [] (top 5):
 dim=projectNameCounts path=[] value=-1 childCount=2
   apache (21)
   eclipse (12)
 
-dim=projectNameCounts path=[apache] value=-1 childCount=15
+Frequency of documents mentioning a project with path [apache] (top 5):
+dim=projectNameCounts path=[apache] value=-1 childCount=18
   commons (16)
-  spark (4)
+  spark (5)
   ignite (4)
-  wayang (2)
-  nlpcraft (1)
-  ofbiz (1)
-  mxnet (1)
-  age (1)
-  hugegraph (1)
-  tinkerpop (1)
-
-dim=projectNameCounts path=[apache,&nbsp;commons] value=-1 childCount=7
+  wayang (3)
+  flink (2)
+
+Frequency of documents mentioning a project with path [apache, commons] (top 
5):
+dim=projectNameCounts path=[apache, commons] value=-1 childCount=7
   math (7)
   csv (4)
   collections (2)
   numbers (1)
   cli (1)
-  codec (1)
-  io (1)
 
-Total documents with hits for +content:apache* +content:eclipse* --> 5 hits
-classifying-iris-flowers-with-deep.adoc
-fruity-eclipse-collections.adoc
-groovy-list-processing-cheat-sheet.adoc
-groovy-null-processing.adoc
-matrix-calculations-with-groovy-apache.adoc
 </pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var parser = new 
QueryParser("content", analyzer)
+var query = parser.parse(/apache\ * AND eclipse\ * AND emoji*/)
+var results = searcher.search(query, topN)
+var storedFields = searcher.storedFields()
+assert results.totalHits.value() == 1 &amp;&amp;
+    storedFields.document(results.scoreDocs[0].doc).get('name') == 
'fruity-eclipse-collections.adoc'</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_more_complex_queries">More complex queries</h2>
+<div class="sectionbody">
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new 
StandardAnalyzer()
+var indexDir = new ByteBuffersDirectory()
+var config = new IndexWriterConfig(analyzer)
+
+new IndexWriter(indexDir, config).withCloseable { writer -&gt;
+    new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -&gt;
+        file.withReader { br -&gt;
+            var document = new Document()
+            var fieldType = new FieldType(stored: true,
+                indexOptions: 
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+                storeTermVectors: true,
+                storeTermVectorPositions: true,
+                storeTermVectorOffsets: true)
+            document.add(new Field('content', br.text, fieldType))
+            document.add(new StringField('name', file.name, Field.Store.YES))
+            writer.addDocument(document)
+        }
+    }
+}</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">IndexReader reader 
= DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+
+var namepart = new SpanMultiTermQueryWrapper(new RegexpQuery(new 
Term("content", '''(
+math|spark|lucene|collections|deeplearning4j
+|beam|wayang|csv|io|numbers|ignite|mxnet|age
+|nlpcraft|pekko|hugegraph|tinkerpop|commons
+|cli|opennlp|ofbiz|codec|kie|flink
+)'''.replaceAll('\n', ''))))
+
+var (apache, commons) = ['apache', 'commons'].collect{ new Term('content', it) 
}
+var apacheCommons = new SpanNearQuery([new SpanTermQuery(apache), new 
SpanTermQuery(commons), namepart] as SpanQuery[], 0, true)
+
+var foundation = new SpanMultiTermQueryWrapper(new RegexpQuery(new 
Term("content", "(apache|eclipse)")))
+var otherProject = new SpanNearQuery([foundation, namepart] as SpanQuery[], 0, 
true)
+
+var builder = new BooleanQuery.Builder(minimumNumberShouldMatch: 1)
+builder.add(otherProject, BooleanClause.Occur.SHOULD)
+builder.add(apacheCommons, BooleanClause.Occur.SHOULD)
+var query = builder.build()
+var results = searcher.search(query, 30)
+println "Total documents with hits for $query --&gt; 
$results.totalHits"</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Total documents with hits for 
(spanNear([SpanMultiTermQueryWrapper(content:/(apache|eclipse)/), 
SpanMultiTermQueryWrapper(content:/(math|spark|lucene|collections|deeplearning4j|beam|wayang|csv|io|numbers|ignite|mxnet|age|nlpcraft|pekko|hugegraph|tinkerpop|commons|cli|opennlp|ofbiz|codec|kie|flink)/)],
 0, true) spanNear([content:apache, content:commons, 
SpanMultiTermQueryWrapper(content:/(math|spark|lucene|collections|deeplearning4j|beam|wayang|csv|io|numbers|ignite|mxnet|age|nlpcraf
 [...]
+</div>
+</div>
 </div>
 </div>
 <div class="sect1">
diff --git a/blog/index.html b/blog/index.html
index 705993c..502c128 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -53,7 +53,7 @@
                                     </ul>
                                 </div>
                             </div>
-                        </div><div id='content' class='page-1'><div 
class='row'><div class='row-fluid'><div class='col-lg-3' id='blog-index'><ul 
class='nav-sidebar list'><li class='active'><a 
href='/blog/'>Blogs</a></li><li><a href='groovy-pekko-gpars'>Using Apache Pekko 
actors and GPars actors with Groovy</a></li><li><a 
href='groovy-graph-databases'>Using Graph Databases with Groovy</a></li><li><a 
href='handling-byte-order-mark-characters'>Handling Byte-Order-Mark Characters 
in Groovy</ [...]
+                        </div><div id='content' class='page-1'><div 
class='row'><div class='row-fluid'><div class='col-lg-3' id='blog-index'><ul 
class='nav-sidebar list'><li class='active'><a 
href='/blog/'>Blogs</a></li><li><a href='groovy-pekko-gpars'>Using Apache Pekko 
actors and GPars actors with Groovy</a></li><li><a 
href='groovy-graph-databases'>Using Graph Databases with Groovy</a></li><li><a 
href='handling-byte-order-mark-characters'>Handling Byte-Order-Mark Characters 
in Groovy</ [...]
                             <div class='row'>
                                 <div class='colset-3-footer'>
                                     <div class='col-1'>
diff --git a/blog/reading-and-writing-csv-files.html 
b/blog/reading-and-writing-csv-files.html
index ae32642..6d9c3b8 100644
--- a/blog/reading-and-writing-csv-files.html
+++ b/blog/reading-and-writing-csv-files.html
@@ -3,7 +3,7 @@
 <!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
 <!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
 <!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head>
-    <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' 
content='IE=edge'/><meta name='viewport' content='width=device-width, 
initial-scale=1'/><meta name='keywords' content='csv, data, deserialization, 
files, groovy, reading, records, data science, serialization, writing, opencsv, 
commons csv, jackson databind, cycling'/><meta name='description' content='This 
post looks at processing CSV files using OpenCSV, Commons CSV, and Jackson 
Databind libraries.'/><title>The Apache Groovy p [...]
+    <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' 
content='IE=edge'/><meta name='viewport' content='width=device-width, 
initial-scale=1'/><meta name='keywords' content='csv, data, deserialization, 
files, groovy, reading, records, data science, serialization, writing, opencsv, 
commons csv, jackson databind, cycling'/><meta name='description' content='This 
post looks at processing CSV files using OpenCSV, Apache Commons CSV, and 
Jackson Databind libraries.'/><title>The Apache G [...]
 </head><body>
     <div id='fork-me'>
         <a href='https://github.com/apache/groovy'>
