This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-dev-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6b15d3f 2024/11/22 11:22:30: Generated dev website from
groovy-website@e37adf7
6b15d3f is described below
commit 6b15d3f809d1262ee19640ffe5c3a08758757401
Author: jenkins <[email protected]>
AuthorDate: Fri Nov 22 11:22:30 2024 +0000
2024/11/22 11:22:30: Generated dev website from groovy-website@e37adf7
---
blog/community-over-code-eu-2024.html | 8 +-
blog/feed.atom | 2 +-
blog/groovy-lucene.html | 677 +++++++++++++++++++++++---------
blog/index.html | 2 +-
blog/reading-and-writing-csv-files.html | 2 +-
5 files changed, 488 insertions(+), 203 deletions(-)
diff --git a/blog/community-over-code-eu-2024.html
b/blog/community-over-code-eu-2024.html
index ccae3ae..15a7c26 100644
--- a/blog/community-over-code-eu-2024.html
+++ b/blog/community-over-code-eu-2024.html
@@ -340,23 +340,23 @@ ways to visualize the results were examined:</p>
<li>
<p>The same case study was also done using Spark:</p>
<div class="paragraph">
-<p><span class="image"><img src="img/coceu2024_whiskey1.png" alt="Whiskey
flavour profiles with Spark"></span></p>
+<p><span class="image"><img src="img/coceu2024_whiskey1.png" alt="Whiskey
flavour profiles with Apache Spark"></span></p>
</div>
</li>
<li>
-<p>The same case study was also done using Wayang:</p>
+<p>The same case study was also done using Apache Wayang:</p>
<div class="paragraph">
<p><span class="image"><img src="img/coceu2024_whiskey2.png" alt="Whiskey
flavour profiles with Wayang"></span></p>
</div>
</li>
<li>
-<p>The same case study was also done using Beam (Python-style version shown
here):</p>
+<p>The same case study was also done using Apache Beam (Python-style version
shown here):</p>
<div class="paragraph">
<p><span class="image"><img src="img/coceu2024_whiskey3.png" alt="Whiskey
flavour profiles with Beam"></span></p>
</div>
</li>
<li>
-<p>The same case study was also done using Flink:</p>
+<p>The same case study was also done using Apache Flink:</p>
<div class="paragraph">
<p><span class="image"><img src="img/coceu2024_whiskey4.png" alt="Whiskey
flavour profiles with Flink"></span></p>
</div>
diff --git a/blog/feed.atom b/blog/feed.atom
index f82ca9d..57380d9 100644
--- a/blog/feed.atom
+++ b/blog/feed.atom
@@ -564,7 +564,7 @@
<link href="http://groovy.apache.org/blog/reading-and-writing-csv-files"/>
<updated>2022-07-25T14:26:20Z</updated>
<published>2022-07-25T14:26:20Z</published>
- <summary type="html">This post looks at processing CSV files using
OpenCSV, Commons CSV, and Jackson Databind libraries.</summary>
+ <summary type="html">This post looks at processing CSV files using
OpenCSV, Apache Commons CSV, and Jackson Databind libraries.</summary>
</entry>
<entry>
<id>http://groovy.apache.org/blog/groovy-release-train-4-0</id>
diff --git a/blog/groovy-lucene.html b/blog/groovy-lucene.html
index 41a0c8b..28397ab 100644
--- a/blog/groovy-lucene.html
+++ b/blog/groovy-lucene.html
@@ -53,11 +53,15 @@
</ul>
</div>
</div>
- </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3'><ul
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a
href='#doc'>Searching with Lucene</a></li><li><a
href='#_finding_project_names_with_a_regex' class='anchor-link'>Finding project
names with a regex</a></li><li><a
href='#_finding_project_names_using_regex_matching' class='anchor-link'>Finding
project names using regex matching</a></li [...]
+ </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3'><ul
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a
href='#doc'>Searching with Lucene</a></li><li><a
href='#_finding_project_names_with_a_regex' class='anchor-link'>Finding project
names with a regex</a></li><li><a
href='#_finding_project_names_using_regex_matching' class='anchor-link'>Finding
project names using regex matching</a></li [...]
<div class="sectionbody">
<div class="paragraph">
<p>The Groovy <a href="https://groovy.apache.org/blog/">blog posts</a> often
reference other Apache projects.
-Let’s have a look at how we can find such references, first using
regular expressions
+Given that these pages are published, we could use something like <a
href="https://nutch.apache.org">Apache Nutch</a> or
+<a href="https://solr.apache.org">Apache Solr</a> to crawl/index those web
pages and search using those tools.
+For this post, we are going to search for the
+information we require from the original source (<a
href="https://asciidoc.org/">AsciiDoc</a>) files.
+We’ll first look at how we can find project references using regular
expressions
and then using Apache Lucene.</p>
</div>
</div>
@@ -85,33 +89,24 @@ so we won’t.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">String tokenRegex
= /(?ix) # ignore case, enable whitespace & comments
- \b # word boundary
- ( # start capture of all terms
- ( # capture project name
- (apache|eclipse)\s # foundation name
- (commons\s)? # optional subproject name
- ( # capture next word unless excluded word
- ?!(
- groovy # excluded words
- | and
- | license
- | users
- | software
- | projects
- | https
- | or
- | prefixes
- | technologies
- )
- )\w+ # end capture #2
- )
- | # alternatively
- ( # capture non-project word
- (?!(apache|eclipse))
- \w+
- ) # end capture #3
- ) # end capture #1
+<pre class="prettyprint highlight"><code data-lang="groovy">String tokenRegex
= /(?ix) # ignore case, enable whitespace & comments
+ \b # word boundary
+ ( # start capture of all terms
+ ( # capture project name term
+ (apache|eclipse)\s # foundation name
+ (commons\s)? # optional subproject name
+ (
+ ?!(groovy # negative lookahead for excluded
words
+ | and | license | users
+ | https | projects | software
+ | or | prefixes | technologies)
+ )\w+
+ ) # end capture project name term
+ | # alternatively
+ ( # capture non-project term
+ \w+?\b # non-greedily match any other words
+ ) # end capture non-project term
+ ) # end capture term
/</code></pre>
</div>
</div>
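[Editor's note: the free-spacing regex above can be sanity-checked in isolation. Here is a compact one-line equivalent of the new (`+`) pattern, exercised with plain `java.util.regex` so the snippet is dependency-free; the class name, helper method, and sample sentence are illustrative inventions, not part of the blog's scripts. Group 2 is the project-name term, matching the `m*.get(2)` usage later in the post.]

```java
import java.util.*;
import java.util.regex.*;

public class TokenRegexSketch {
    // Compact one-liner form of the blog's free-spacing pattern (same group numbering).
    static final Pattern TOKEN = Pattern.compile(
        "(?i)\\b(((apache|eclipse)\\s(commons\\s)?"
        + "(?!(groovy|and|license|users|https|projects|software|or|prefixes|technologies))\\w+)"
        + "|(\\w+?\\b))");

    // Collect group 2 (the project-name term) for every match, lowercased and
    // with whitespace normalized, much like the blog's spread-operator pipeline.
    static List<String> findProjects(String text) {
        List<String> projects = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(2) != null) {
                projects.add(m.group(2).toLowerCase().replaceAll("\\s+", " "));
            }
        }
        return projects;
    }

    public static void main(String[] args) {
        String sample = "We used Apache Spark and Eclipse Collections alongside Apache Commons Math.";
        System.out.println(findProjects(sample));
        // [apache spark, eclipse collections, apache commons math]
    }
}
```
Note how the negative lookahead keeps "Apache and" from counting as a project, while the optional `commons\s` lets "Apache Commons Math" match as a single three-word term.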
@@ -127,14 +122,31 @@ Feel free to make a compact (long) one-liner without
comments if you prefer.</p>
<div class="sectionbody">
<div class="paragraph">
<p>With our regex sorted, let’s look at how you could use a Groovy
matcher
-to find all the project names.</p>
+to find all the project names. First we’ll define one other common
constant,
+the base directory for our blogs, which you might need to change if you
+want to follow along and run these examples:</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var blogBaseDir =
'/projects/apache-websites/groovy-website/site/src/site/blog' // <b
class="conum">(1)</b>
-var histogram = [:].withDefault { 0 }
+<pre class="prettyprint highlight"><code data-lang="groovy">String baseDir =
'/projects/apache-websites/groovy-website/site/src/site/blog' // <b
class="conum">(1)</b></code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
+<li>
+<p>You’d need to check out the Groovy website and point to it here</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>Now our script will traverse all the files in that directory, processing
them with our regex
+and tracking the hits we find.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var histogram =
[:].withDefault { 0 } // <b class="conum">(1)</b>
-new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file -> // <b
class="conum">(2)</b>
+new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file -> // <b
class="conum">(2)</b>
var m = file.text =~ tokenRegex // <b class="conum">(3)</b>
var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', ' ') //
<b class="conum">(4)</b>
var counts = projects.countBy() // <b class="conum">(5)</b>
@@ -144,7 +156,7 @@ new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) {
file -> // <b clas
}
}
-println()
+println "\nFrequency of total hits mentioning a project:"
histogram.sort { e -> -e.value }.each { k, v -> // <b
class="conum">(8)</b>
var label = "$k ($v)"
println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
@@ -154,10 +166,10 @@ histogram.sort { e -> -e.value }.each { k, v -> //
<b class="conum">(8)</b
<div class="colist arabic">
<ol>
<li>
-<p>You’d need to check out the Groovy website and point to it here</p>
+<p>This is a map which provides a default value for non-existent keys</p>
</li>
<li>
-<p>This traverse the directory processing each asciidoc file</p>
+<p>This traverses the directory, processing each AsciiDoc file</p>
</li>
<li>
<p>We define our matcher</p>
@@ -185,7 +197,7 @@ histogram.sort { e -> -e.value }.each { k, v -> // <b
class="conum">(8)</b
<pre>
apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5,
apache commons math:1, apache spark:2]
-community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1, apache spark:1, apache wayang:1,
apache beam:1, apache flink:1]
community-over-code-na-2023.adoc: [apache ignite:8, apache commons
numbers:1, apache commons csv:1]
deck-of-cards-with-groovy.adoc: [eclipse collections:5]
deep-learning-and-eclipse-collections.adoc: [eclipse collections:7,
eclipse deeplearning4j:2]
@@ -196,7 +208,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache commons
cli:2]
groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3,
apache tinkerpop:3]
groovy-haiku-processing.adoc: [eclipse collections:3]
groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4,
apache commons collections:3]
-groovy-lucene.adoc: [apache lucene:2, apache commons:1,
apache commons math:2]
+groovy-lucene.adoc: [apache nutch:1, apache solr:1,
apache lucene:2, apache commons:1, apache commons math:2]
groovy-null-processing.adoc: [eclipse collections:6, apache commons
collections:4]
groovy-pekko-gpars.adoc: [apache pekko:4]
groovy-record-performance.adoc: [apache commons codec:1]
@@ -204,7 +216,7 @@ handling-byte-order-mark-characters.adoc:
[apache commons io:1]
lego-bricks-with-groovy.adoc: [eclipse collections:6]
matrix-calculations-with-groovy-apache.adoc: [apache commons math:6,
eclipse deeplearning4j:1, apache commons:1]
natural-language-processing-with-groovy.adoc: [apache opennlp:2,
apache spark:1]
-reading-and-writing-csv-files.adoc: [apache commons csv:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:2]
set-operations-with-groovy.adoc: [eclipse collections:3]
solving-simple-optimization-problems-with-groovy.adoc: [apache commons
math:5, apache kie:1]
using-groovy-with-apache-wayang.adoc: [apache wayang:9,
apache spark:7, apache flink:1, apache commons csv:1,
apache ignite:1]
@@ -212,38 +224,42 @@ whiskey-clustering-with-groovy-and.adoc:
[apache ignite:7, apache waya
wordle-checker.adoc: [eclipse collections:3]
zipping-collections-with-groovy.adoc: [eclipse collections:4]
+Frequency of total hits mentioning a project:
eclipse collections (50)
██████████████████████████████████████████████████▏
apache commons math (18) ██████████████████▏
apache ignite (17) █████████████████▏
-apache spark (12) ████████████▏
+apache spark (13) █████████████▏
apache mxnet (12) ████████████▏
+apache wayang (11) ███████████▏
apache age (11) ███████████▏
-apache wayang (10) ██████████▏
eclipse deeplearning4j (8) ████████▏
apache commons collections (7) ███████▏
+apache commons csv (6) ██████▏
apache nlpcraft (5) █████▏
-apache commons csv (5) █████▏
apache pekko (4) ████▏
apache hugegraph (3) ███▏
apache tinkerpop (3) ███▏
+apache flink (2) ██▏
apache commons cli (2) ██▏
-apache commons (2) ██▏
apache lucene (2) ██▏
+apache commons (2) ██▏
apache opennlp (2) ██▏
apache ofbiz (1) █▏
+apache beam (1) █▏
apache commons numbers (1) █▏
+apache nutch (1) █▏
+apache solr (1) █▏
apache commons codec (1) █▏
apache commons io (1) █▏
apache kie (1) █▏
-apache flink (1) █▏
</pre>
</div>
</div>
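[Editor's note: the `bar` helper that renders these charts is not shown in this excerpt. Judging from the calls `bar(v, 0, 50, 50)` and `bar(v * 2, 0, 20, 20)` and the output, it scales a value from a range into a fixed-width run of block characters. A minimal sketch in plain Java follows; the rounding behavior and class name are guesses, not the blog's actual implementation.]

```java
public class BarSketch {
    // Scale value from [min, max] into a run of 0..width '█' characters,
    // finished with a thin '▏' cap, matching the chart style in the output above.
    static String bar(int value, int min, int max, int width) {
        int filled = Math.round((value - min) / (float) (max - min) * width);
        return "█".repeat(Math.max(0, Math.min(filled, width))) + "▏";
    }

    public static void main(String[] args) {
        System.out.println("apache lucene (2)  " + bar(2, 0, 50, 50));  // ██▏
        System.out.println("apache spark (13)  " + bar(13, 0, 50, 50)); // thirteen blocks
    }
}
```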
<div class="sect1">
-<h2 id="_using_lucene">Using Lucene</h2>
+<h2 id="_indexing_with_lucene">Indexing with Lucene</h2>
<div class="sectionbody">
<div class="paragraph">
-<p><span class="image right"><img
src="https://www.apache.org/logos/res/lucene/default.png" alt="lucene logo"
width="100"></span>
+<p><span class="image right"><img
src="https://www.apache.org/logos/res/lucene/default.png" alt="lucene logo"
width="200"></span>
Okay, regular expressions weren’t that hard but in general we might want
to search many things.
Search frameworks like Lucene help with that. Let’s see what it looks
like to apply
Lucene to our problem.</p>
@@ -252,15 +268,19 @@ Lucene to our problem.</p>
<p>First, we’ll define a custom analyzer. Lucene is very flexible and
comes with builtin
analyzers. In a typical scenario, we might just search on all words.
There’s a builtin analyzer for that.
-If we used that, to query for our project names,
+If we used one of the builtin analyzers, to query for our project names,
we’d construct a query that spanned multiple (word) terms.
-For the purposes of our little example, we are going to assume project names
-are indivisible terms and slice them up that way. There is a pattern tokenizer
+We’ll look at what that might look like later, but
+for the purposes of our little example, we are going to assume project names
+are indivisible terms and slice up our documents that way.</p>
+</div>
+<div class="paragraph">
+<p>Luckily, Lucene has a pattern tokenizer
which lets us reuse our existing regex.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">class
ApacheProjectAnalyzer extends Analyzer {
+<pre class="prettyprint highlight"><code data-lang="groovy">class
ProjectNameAnalyzer extends Analyzer {
@Override
protected TokenStreamComponents createComponents(String fieldName) {
var src = new PatternTokenizer(~tokenRegex, 0)
@@ -275,74 +295,281 @@ which lets us reuse our existing regex.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new
ApacheProjectAnalyzer() // <b class="conum">(1)</b>
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new
ProjectNameAnalyzer() // <b class="conum">(1)</b>
var indexDir = new ByteBuffersDirectory() // <b class="conum">(2)</b>
var config = new IndexWriterConfig(analyzer)
-var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
new IndexWriter(indexDir, config).withCloseable { writer ->
- new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+ var indexedWithFreq = new FieldType(stored: true,
+ indexOptions: IndexOptions.DOCS_AND_FREQS,
+ storeTermVectors: true)
+ new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
file.withReader { br ->
var document = new Document()
- var fieldType = new FieldType(stored: true,
- indexOptions:
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
- storeTermVectors: true,
- storeTermVectorPositions: true,
- storeTermVectorOffsets: true)
- document.add(new Field('content', br.text, fieldType)) // <b
class="conum">(3)</b>
+ document.add(new Field('content', br.text, indexedWithFreq)) // <b
class="conum">(3)</b>
document.add(new StringField('name', file.name, Field.Store.YES))
// <b class="conum">(4)</b>
writer.addDocument(document)
}
}
-}
-
-var reader = DirectoryReader.open(indexDir)
-var searcher = new IndexSearcher(reader)
-var parser = new QueryParser("content", analyzer)
-
-var query = parser.parse('apache* OR eclipse*') // <b class="conum">(5)</b>
-var results = searcher.search(query, 30) // <b class="conum">(6)</b>
-println "Total documents with hits for $query --> $results.totalHits"
+}</code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
+<li>
+<p>This is our regex-based analyzer</p>
+</li>
+<li>
+<p>We’ll use a memory-based index for our little example</p>
+</li>
+<li>
+<p>Store content of document along with term position info</p>
+</li>
+<li>
+<p>Also store the name of the file</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>With an index defined, we’d typically now perform some kind of search.
+We’ll do just that shortly, but first, for the kind of information we are
interested in,
+the Lucene API also lets us explore the index directly. Here is how we might
do that:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var reader =
DirectoryReader.open(indexDir)
+var vectors = reader.termVectors()
+var storedFields = reader.storedFields()
-var storedFields = searcher.storedFields()
-var histogram = [:].withDefault { 0 }
-results.scoreDocs.each { ScoreDoc doc -> // <b class="conum">(7)</b>
- var document = storedFields.document(doc.doc)
- var found = handleHit(doc, query, reader) // <b class="conum">(8)</b>
- println "${document.get('name')}: ${found*.replaceAll('\n', '
').countBy()}"
- found.each { histogram[it.replaceAll('\n', ' ')] += 1 } // <b
class="conum">(9)</b>
+Set projects = []
+for (docId in 0..<reader.maxDoc()) {
+ String name = storedFields.document(docId).get('name')
+ TermsEnum terms = vectors.get(docId, 'content').iterator() // <b
class="conum">(1)</b>
+ var found = [:]
+ while (terms.next() != null) {
+ PostingsEnum postingsEnum = terms.postings(null, PostingsEnum.ALL)
+ while (postingsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
+ int freq = postingsEnum.freq()
+ var string = terms.term().utf8ToString().replaceAll('\n', ' ')
+ if (string.startsWith('apache ') || string.startsWith('eclipse '))
{ // <b class="conum">(2)</b>
+ found[string] = freq
+ }
+ }
+ }
+ if (found) {
+ println "$name: $found"
+ projects += found.keySet()
+ }
}
-println()
-histogram.sort { e -> -e.value }.each { k, v -> // <b
class="conum">(10)</b>
+var terms = projects.collect { name -> new Term('content', name) }
+var byReverseValue = { e -> -e.value }
+
+println "\nFrequency of total hits mentioning a project (top 10):"
+var termFreq = terms.collectEntries { term -> [term.text(),
reader.totalTermFreq(term)] } // <b class="conum">(3)</b>
+termFreq.sort(byReverseValue).take(10).each { k, v ->
var label = "$k ($v)"
println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
}
-List<String> handleHit(ScoreDoc hit, Query query, DirectoryReader
dirReader) { // <b class="conum">(11)</b>
- boolean phraseHighlight = true
- boolean fieldMatch = true
- FieldQuery fieldQuery = new FieldQuery(query, dirReader, phraseHighlight,
fieldMatch)
- FieldTermStack stack = new FieldTermStack(dirReader, hit.doc, 'content',
fieldQuery)
- FieldPhraseList phrases = new FieldPhraseList(stack, fieldQuery)
- phrases.phraseList*.termsInfos*.text.flatten()
+println "\nFrequency of documents mentioning a project (top 10):"
+var docFreq = terms.collectEntries { term -> [term.text(),
reader.docFreq(term)] } // <b class="conum">(4)</b>
+docFreq.sort(byReverseValue).take(10).each { k, v ->
+ var label = "$k ($v)"
+ println "${label.padRight(32)} ${bar(v * 2, 0, 20, 20)}"
}</code></pre>
</div>
</div>
<div class="colist arabic">
<ol>
<li>
-<p>This is our regex-based analyzer</p>
+<p>Get all index terms</p>
</li>
<li>
-<p>We’ll use a memory-based index for our little example</p>
+<p>Look for terms which match project names, so we can save them to a set</p>
</li>
<li>
-<p>Store content of document along with term position info</p>
+<p>Grab hit frequency metadata for our term</p>
</li>
<li>
-<p>Also store the name of the file</p>
+<p>Grab document frequency metadata for our term</p>
</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>When we run this we see:</p>
+</div>
+<pre>
+apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
+classifying-iris-flowers-with-deep.adoc: [apache commons math:1,
apache spark:2, eclipse deeplearning4j:5]
+community-over-code-eu-2024.adoc: [apache beam:1, apache commons
math:2, apache flink:1, apache ignite:1, apache ofbiz:1,
apache spark:1, apache wayang:1]
+community-over-code-na-2023.adoc: [apache commons csv:1,
apache commons numbers:1, apache ignite:8]
+deck-of-cards-with-groovy.adoc: [eclipse collections:5]
+deep-learning-and-eclipse-collections.adoc: [eclipse collections:7,
eclipse deeplearning4j:2]
+detecting-objects-with-groovy-the.adoc: [apache mxnet:12]
+fruity-eclipse-collections.adoc: [apache commons math:1,
eclipse collections:9]
+fun-with-obfuscated-groovy.adoc: [apache commons math:1]
+groovy-2-5-clibuilder-renewal.adoc: [apache commons cli:2]
+groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3,
apache tinkerpop:3]
+groovy-haiku-processing.adoc: [eclipse collections:3]
+groovy-list-processing-cheat-sheet.adoc: [apache commons collections:3,
eclipse collections:4]
+groovy-lucene.adoc: [apache commons:1, apache commons math:2,
apache lucene:2, apache nutch:1, apache solr:1]
+groovy-null-processing.adoc: [apache commons collections:4,
eclipse collections:6]
+groovy-pekko-gpars.adoc: [apache pekko:4]
+groovy-record-performance.adoc: [apache commons codec:1]
+handling-byte-order-mark-characters.adoc: [apache commons io:1]
+lego-bricks-with-groovy.adoc: [eclipse collections:6]
+matrix-calculations-with-groovy-apache.adoc: [apache commons:1,
apache commons math:6, eclipse deeplearning4j:1]
+natural-language-processing-with-groovy.adoc: [apache opennlp:2,
apache spark:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:2]
+set-operations-with-groovy.adoc: [eclipse collections:3]
+solving-simple-optimization-problems-with-groovy.adoc: [apache commons
math:4, apache kie:1]
+using-groovy-with-apache-wayang.adoc: [apache commons csv:1,
apache flink:1, apache ignite:1, apache spark:7,
apache wayang:9]
+whiskey-clustering-with-groovy-and.adoc: [apache commons csv:2,
apache ignite:7, apache spark:2, apache wayang:1]
+wordle-checker.adoc: [eclipse collections:3]
+zipping-collections-with-groovy.adoc: [eclipse collections:4]
+
+Frequency of total hits mentioning a project (top 10):
+eclipse collections (50)
██████████████████████████████████████████████████▏
+apache commons math (17) █████████████████▏
+apache ignite (17) █████████████████▏
+apache spark (13) █████████████▏
+apache mxnet (12) ████████████▏
+apache wayang (11) ███████████▏
+apache age (11) ███████████▏
+eclipse deeplearning4j (8) ████████▏
+apache commons collections (7) ███████▏
+apache commons csv (6) ██████▏
+
+Frequency of documents mentioning a project (top 10):
+eclipse collections (10) ████████████████████▏
+apache commons math (7) ██████████████▏
+apache spark (5) ██████████▏
+apache ignite (4) ████████▏
+apache commons csv (4) ████████▏
+eclipse deeplearning4j (3) ██████▏
+apache wayang (3) ██████▏
+apache flink (2) ████▏
+apache commons collections (2) ████▏
+apache commons (2) ████▏
+
+</pre>
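[Editor's note: the distinction driving the two charts above is that `reader.totalTermFreq(term)` sums every occurrence of a term across the index, while `reader.docFreq(term)` counts how many documents contain it at least once. The same aggregation can be sketched over a toy per-file hit map in plain Java (made-up data, not Lucene API):]

```java
import java.util.*;

public class FreqDemo {
    // Analogue of totalTermFreq: sum each term's counts across all files.
    static Map<String, Integer> totalTermFreq(Map<String, Map<String, Integer>> hitsPerFile) {
        Map<String, Integer> totals = new TreeMap<>();
        for (var hits : hitsPerFile.values())
            hits.forEach((term, n) -> totals.merge(term, n, Integer::sum));
        return totals;
    }

    // Analogue of docFreq: count the files in which each term appears at all.
    static Map<String, Integer> docFreq(Map<String, Map<String, Integer>> hitsPerFile) {
        Map<String, Integer> docs = new TreeMap<>();
        for (var hits : hitsPerFile.values())
            hits.keySet().forEach(term -> docs.merge(term, 1, Integer::sum));
        return docs;
    }

    public static void main(String[] args) {
        var hits = Map.of(
            "using-groovy-with-apache-wayang.adoc",
                Map.of("apache spark", 7, "apache wayang", 9),
            "community-over-code-eu-2024.adoc",
                Map.of("apache spark", 1, "apache flink", 1));
        System.out.println(totalTermFreq(hits)); // {apache flink=1, apache spark=8, apache wayang=9}
        System.out.println(docFreq(hits));       // {apache flink=1, apache spark=2, apache wayang=1}
    }
}
```
So "apache spark" scores 8 total hits but a document frequency of only 2, which is why the two top-10 lists above rank projects differently.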
+<div class="paragraph">
+<p>So far, we have just displayed curated metadata about our index.
+But just to show that we have an index that supports searching,
+let’s look for all documents which mention emojis.
+They often make programming examples a lot of fun!</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var parser = new
QueryParser("content", analyzer)
+var searcher = new IndexSearcher(reader)
+var query = parser.parse('emoji*')
+var results = searcher.search(query, 10)
+println "\nTotal documents with hits for $query --> $results.totalHits"
+results.scoreDocs.each {
+ var doc = storedFields.document(it.doc)
+ println "${doc.get('name')}"
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>When we run this we see:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Total documents with hits for content:emoji* --> 11 hits
+adventures-with-groovyfx.adoc
+create-groovy-blog.adoc
+deep-learning-and-eclipse-collections.adoc
+fruity-eclipse-collections.adoc
+groovy-haiku-processing.adoc
+groovy-lucene.adoc
+helloworldemoji.adoc
+seasons-greetings-emoji.adoc
+set-operations-with-groovy.adoc
+solving-simple-optimization-problems-with-groovy.adoc</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Lucene has a very rich API. Let’s now look at some alternative
+ways we could use Lucene.</p>
+</div>
+<div class="paragraph">
+<p>Rather than exploring index metadata, we’d more typically run queries
+and explore those results. We’ll look at how to do that now.
+When exploring query results, we are going to use some classes in the
<code>vectorhighlight</code>
+package in the <code>lucene-highlight</code> module. You’d typically use
functionality in that
+module to highlight hits when displaying them on a web page
+as part of some web search functionality. For us, we are going to just
+pick out the terms of interest: the project names that match our query.</p>
+</div>
+<div class="paragraph">
+<p>For the highlight functionality to work, we ask the indexer to store some
additional information
+about term positions when indexing. The indexing code changes to look like
this:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">new
IndexWriter(indexDir, config).withCloseable { writer ->
+ new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+ file.withReader { br ->
+ var document = new Document()
+ var fieldType = new FieldType(stored: true,
+ indexOptions:
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+ storeTermVectors: true,
+ storeTermVectorPositions: true,
+ storeTermVectorOffsets: true)
+ document.add(new Field('content', br.text, fieldType))
+ document.add(new StringField('name', file.name, Field.Store.YES))
+ writer.addDocument(document)
+ }
+ }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We could have stored this additional information even for our previous
example,
+but it wasn’t needed there.</p>
+</div>
+<div class="paragraph">
+<p>Next, we define a helper method to extract the actual project names from
matches:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">List<String>
handleHit(ScoreDoc hit, Query query, DirectoryReader dirReader) {
+ boolean phraseHighlight = true
+ boolean fieldMatch = true
+ var fieldQuery = new FieldQuery(query, dirReader, phraseHighlight,
fieldMatch)
+ var stack = new FieldTermStack(dirReader, hit.doc, 'content', fieldQuery)
+ var phrases = new FieldPhraseList(stack, fieldQuery)
+ phrases.phraseList*.termsInfos*.text.flatten()
+}</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var query =
parser.parse(/apache\ * OR eclipse\ */) // <b class="conum">(1)</b>
+var results = searcher.search(query, 30) // <b class="conum">(2)</b>
+println "Total documents with hits for $query --> $results.totalHits\n"
+
+var storedFields = searcher.storedFields()
+var histogram = [:].withDefault { 0 }
+results.scoreDocs.each { ScoreDoc scoreDoc -> // <b class="conum">(3)</b>
+ var doc = storedFields.document(scoreDoc.doc)
+ var found = handleHit(scoreDoc, query, reader) // <b class="conum">(4)</b>
+ println "${doc.get('name')}: ${found*.replaceAll('\n', ' ').countBy()}"
+ found.each { histogram[it.replaceAll('\n', ' ')] += 1 } // <b
class="conum">(5)</b>
+}
+
+println "\nFrequency of total hits mentioning a project:"
+histogram.sort { e -> -e.value }.each { k, v -> // <b
class="conum">(6)</b>
+ var label = "$k ($v)"
+ println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
+}</code></pre>
+</div>
+</div>
+<div class="colist arabic">
+<ol>
<li>
<p>Search for terms with the apache or eclipse prefixes</p>
</li>
@@ -361,23 +588,21 @@ List<String> handleHit(ScoreDoc hit, Query query,
DirectoryReader dirReade
<li>
<p>Display the aggregates as a pretty barchart</p>
</li>
-<li>
-<p>Helper method</p>
-</li>
</ol>
</div>
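[Editor's note: the `countBy()` call used when printing each document's hits is a Groovy idiom that tallies occurrences into a map. For readers following along in Java, the same aggregation can be sketched with streams (toy data; the class name is illustrative):]

```java
import java.util.*;
import java.util.stream.*;

public class CountBySketch {
    // Equivalent of Groovy's list.countBy { it }: term -> number of occurrences.
    static Map<String, Long> countBy(List<String> terms) {
        return terms.stream()
            .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        var found = List.of("apache spark", "apache spark", "eclipse collections");
        System.out.println(countBy(found)); // {apache spark=2, eclipse collections=1}
    }
}
```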
<div class="paragraph">
<p>The output is essentially the same as before:</p>
</div>
<pre>
-Total documents with hits for content:apache* content:eclipse* --> 28 hits
+Total documents with hits for content:apache * content:eclipse * -->
28 hits
+
classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5,
apache commons math:1, apache spark:2]
fruity-eclipse-collections.adoc: [eclipse collections:9,
apache commons math:1]
groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4,
apache commons collections:3]
groovy-null-processing.adoc: [eclipse collections:6, apache commons
collections:4]
matrix-calculations-with-groovy-apache.adoc: [apache commons math:6,
eclipse deeplearning4j:1, apache commons:1]
apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
-community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1, apache spark:1, apache wayang:1,
apache beam:1, apache flink:1]
community-over-code-na-2023.adoc: [apache ignite:8, apache commons
numbers:1, apache commons csv:1]
deck-of-cards-with-groovy.adoc: [eclipse collections:5]
deep-learning-and-eclipse-collections.adoc: [eclipse collections:7,
eclipse deeplearning4j:2]
@@ -386,13 +611,13 @@ fun-with-obfuscated-groovy.adoc: [apache commons
math:1]
groovy-2-5-clibuilder-renewal.adoc: [apache commons cli:2]
groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3,
apache tinkerpop:3]
groovy-haiku-processing.adoc: [eclipse collections:3]
-groovy-lucene.adoc: [apache lucene:2, apache commons:1,
apache commons math:2]
+groovy-lucene.adoc: [apache nutch:1, apache solr:1,
apache lucene:2, apache commons:1, apache commons math:2]
groovy-pekko-gpars.adoc: [apache pekko:4]
groovy-record-performance.adoc: [apache commons codec:1]
handling-byte-order-mark-characters.adoc: [apache commons io:1]
lego-bricks-with-groovy.adoc: [eclipse collections:6]
natural-language-processing-with-groovy.adoc: [apache opennlp:2,
apache spark:1]
-reading-and-writing-csv-files.adoc: [apache commons csv:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:2]
set-operations-with-groovy.adoc: [eclipse collections:3]
solving-simple-optimization-problems-with-groovy.adoc: [apache commons
math:5, apache kie:1]
using-groovy-with-apache-wayang.adoc: [apache wayang:9,
apache spark:7, apache flink:1, apache commons csv:1,
apache ignite:1]
@@ -400,30 +625,17 @@ whiskey-clustering-with-groovy-and.adoc:
[apache ignite:7, apache waya
wordle-checker.adoc: [eclipse collections:3]
zipping-collections-with-groovy.adoc: [eclipse collections:4]
+Frequency of total hits mentioning a project (top 10):
eclipse collections (50)
██████████████████████████████████████████████████▏
apache commons math (18) ██████████████████▏
apache ignite (17) █████████████████▏
-apache spark (12) ████████████▏
+apache spark (13) █████████████▏
apache mxnet (12) ████████████▏
+apache wayang (11) ███████████▏
apache age (11) ███████████▏
-apache wayang (10) ██████████▏
eclipse deeplearning4j (8) ████████▏
apache commons collections (7) ███████▏
-apache nlpcraft (5) █████▏
-apache commons csv (5) █████▏
-apache pekko (4) ████▏
-apache hugegraph (3) ███▏
-apache tinkerpop (3) ███▏
-apache commons (2) ██▏
-apache commons cli (2) ██▏
-apache lucene (2) ██▏
-apache opennlp (2) ██▏
-apache ofbiz (1) █▏
-apache commons numbers (1) █▏
-apache commons codec (1) █▏
-apache commons io (1) █▏
-apache kie (1) █▏
-apache flink (1) █▏
+apache commons csv (6) ██████▏
</pre>
</div>
</div>
@@ -432,7 +644,7 @@ apache flink (1) █▏
<div class="sectionbody">
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new
ApacheProjectAnalyzer()
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new
ProjectNameAnalyzer()
var indexDir = new ByteBuffersDirectory()
var taxonDir = new ByteBuffersDirectory()
var config = new IndexWriterConfig(analyzer)
@@ -447,18 +659,15 @@ var fConfig = new FacetsConfig().tap {
setIndexFieldName('projectHitCounts', '$projectHitCounts')
}
-var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
-new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
var m = file.text =~ tokenRegex
var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', '
').countBy()
file.withReader { br ->
var document = new Document()
- var fieldType = new FieldType(stored: true,
- indexOptions:
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
- storeTermVectors: true,
- storeTermVectorPositions: true,
- storeTermVectorOffsets: true)
- document.add(new Field('content', br.text, fieldType))
+ var indexedWithFreq = new FieldType(stored: true,
+ indexOptions: IndexOptions.DOCS_AND_FREQS,
+ storeTermVectors: true)
+ document.add(new Field('content', br.text, indexedWithFreq))
document.add(new StringField('name', file.name, Field.Store.YES))
if (projects) {
println "$file.name: $projects"
@@ -472,45 +681,13 @@ new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) {
file ->
}
}
indexWriter.close()
-taxonWriter.close()
-println()
-
-var reader = DirectoryReader.open(indexDir)
-var searcher = new IndexSearcher(reader)
-var taxonReader = new DirectoryTaxonomyReader(taxonDir)
-var fcm = new FacetsCollectorManager()
-var fc = FacetsCollectorManager.search(searcher, new MatchAllDocsQuery(), 10,
fcm).facetsCollector()
-
-var projects = new TaxonomyFacetIntAssociations('$projectHitCounts',
taxonReader, fConfig, fc, AssociationAggregationFunction.SUM)
-var hitCounts = projects.getTopChildren(10, "projectHitCounts")
-println hitCounts
-
-var facets = new FastTaxonomyFacetCounts(taxonReader, fConfig, fc)
-var fileCounts = facets.getTopChildren(10, "projectFileCounts")
-println fileCounts
-
-var nameCounts = facets.getTopChildren(10, "projectNameCounts")
-println nameCounts
-nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache')
-println nameCounts
-nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache',
'commons')
-println nameCounts
-
-var parser = new QueryParser("content", analyzer)
-var query = parser.parse('apache* AND eclipse*')
-var results = searcher.search(query, 10)
-println "Total documents with hits for $query --> $results.totalHits"
-var storedFields = searcher.storedFields()
-results.scoreDocs.each { ScoreDoc doc ->
- var document = storedFields.document(doc.doc)
- println "${document.get('name')}"
-}</code></pre>
+taxonWriter.close()</code></pre>
</div>
</div>
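The `tokenRegex` driving the `countBy()` step in the listing above is defined elsewhere and not part of this diff. Purely as illustration, a Python sketch of the same count-the-mentions idea, using a hypothetical stand-in regex (group 1 captures an "apache …" or "eclipse …" mention, possibly split across a line wrap):

```python
import re
from collections import Counter

# Hypothetical stand-in for the Groovy `tokenRegex` (the real one is not in
# this diff): captures a foundation word plus a project name, allowing an
# optional "commons" segment and a line break inside the mention.
token_regex = re.compile(
    r'\b((apache|eclipse)[ \n](?:commons[ \n])?\w+)', re.IGNORECASE)

def project_counts(text):
    """Count project-name mentions, normalising case and line breaks,
    mirroring Groovy's m*.get(...)*.toLowerCase()*.replaceAll(...).countBy()."""
    hits = [m.group(1).lower().replace('\n', ' ')
            for m in token_regex.finditer(text)]
    return Counter(hits)

counts = project_counts("Apache Spark and apache\nspark with Eclipse Collections")
```

Here `counts` maps `"apache spark"` to 2 (the wrapped mention is normalised) and `"eclipse collections"` to 1.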
<pre>
apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5,
apache commons math:1, apache spark:2]
-community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons
math:2, apache ignite:1, apache spark:1, apache wayang:1,
apache beam:1, apache flink:1]
community-over-code-na-2023.adoc: [apache ignite:8, apache commons
numbers:1, apache commons csv:1]
deck-of-cards-with-groovy.adoc: [eclipse collections:5]
deep-learning-and-eclipse-collections.adoc: [eclipse collections:7,
eclipse deeplearning4j:2]
@@ -521,7 +698,7 @@ groovy-2-5-clibuilder-renewal.adoc: [apache commons
cli:2]
groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3,
apache tinkerpop:3]
groovy-haiku-processing.adoc: [eclipse collections:3]
groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4,
apache commons collections:3]
-groovy-lucene.adoc: [apache lucene:2, apache commons:1,
apache commons math:2]
+groovy-lucene.adoc: [apache nutch:1, apache solr:1,
apache lucene:2, apache commons:1, apache commons math:2]
groovy-null-processing.adoc: [eclipse collections:6, apache commons
collections:4]
groovy-pekko-gpars.adoc: [apache pekko:4]
groovy-record-performance.adoc: [apache commons codec:1]
@@ -529,7 +706,7 @@ handling-byte-order-mark-characters.adoc:
[apache commons io:1]
lego-bricks-with-groovy.adoc: [eclipse collections:6]
matrix-calculations-with-groovy-apache.adoc: [apache commons math:6,
eclipse deeplearning4j:1, apache commons:1]
natural-language-processing-with-groovy.adoc: [apache opennlp:2,
apache spark:1]
-reading-and-writing-csv-files.adoc: [apache commons csv:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:2]
set-operations-with-groovy.adoc: [eclipse collections:3]
solving-simple-optimization-problems-with-groovy.adoc: [apache commons
math:5, apache kie:1]
using-groovy-with-apache-wayang.adoc: [apache wayang:9,
apache spark:7, apache flink:1, apache commons csv:1,
apache ignite:1]
@@ -537,62 +714,170 @@ whiskey-clustering-with-groovy-and.adoc:
[apache ignite:7, apache waya
wordle-checker.adoc: [eclipse collections:3]
zipping-collections-with-groovy.adoc: [eclipse collections:4]
-dim=projectHitCounts path=[] value=-1 childCount=24
- eclipse collections (50)
- apache commons math (18)
- apache ignite (17)
- apache spark (12)
- apache mxnet (12)
- apache age (11)
- apache wayang (10)
- eclipse deeplearning4j (8)
- apache commons collections (7)
- apache nlpcraft (5)
-
-dim=projectFileCounts path=[] value=-1 childCount=24
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var reader =
DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+var taxonReader = new DirectoryTaxonomyReader(taxonDir)
+var fcm = new FacetsCollectorManager()
+var fc = FacetsCollectorManager.search(searcher, new MatchAllDocsQuery(), 0,
fcm).facetsCollector()
+
+var topN = 5
+var projects = new TaxonomyFacetIntAssociations('$projectHitCounts',
taxonReader, fConfig, fc, AssociationAggregationFunction.SUM)
+var hitCounts = projects.getTopChildren(topN,
"projectHitCounts").labelValues.collect{
+ [label: it.label, hits: it.value, files: it.count]
+}
+
+println "\nFrequency of total hits mentioning a project (top $topN):"
+hitCounts.sort{ m -> -m.hits }.each { m ->
+ var label = "$m.label ($m.hits)"
+ println "${label.padRight(32)} ${bar(m.hits, 0, 50, 50)}"
+}
+
+println "\nFrequency of documents mentioning a project (top $topN):"
+hitCounts.sort{ m -> -m.files }.each { m ->
+ var label = "$m.label ($m.files)"
+ println "${label.padRight(32)} ${bar(m.files * 2, 0, 20, 20)}"
+}</code></pre>
+</div>
+</div>
+<pre>
+Frequency of total hits mentioning a project (top 5):
+eclipse collections (50)
██████████████████████████████████████████████████▏
+apache commons math (18) ██████████████████▏
+apache ignite (17) █████████████████▏
+apache spark (13) █████████████▏
+apache mxnet (12) ████████████▏
+
+Frequency of documents mentioning a project (top 5):
+eclipse collections (10) ████████████████████▏
+apache commons math (7) ██████████████▏
+apache spark (5) ██████████▏
+apache ignite (4) ████████▏
+apache mxnet (1) ██▏
+
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var facets = new
FastTaxonomyFacetCounts(taxonReader, fConfig, fc)
+
+println "\nFrequency of documents mentioning a project (top $topN):"
+var fileCounts = facets.getTopChildren(topN, "projectFileCounts")
+println fileCounts</code></pre>
+</div>
+</div>
+<pre>
+Frequency of documents mentioning a project (top 5):
+dim=projectFileCounts path=[] value=-1 childCount=27
eclipse collections (10)
apache commons math (7)
- apache spark (4)
+ apache spark (5)
apache ignite (4)
- apache commons csv (4)
- eclipse deeplearning4j (3)
- apache commons collections (2)
- apache commons (2)
- apache wayang (2)
- apache nlpcraft (1)
+ apache commons csv (4)
+</pre>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">['apache',
'commons'].inits().reverseEach { path ->
+ println "Frequency of documents mentioning a project with path $path (top
$topN):"
+ var nameCounts = facets.getTopChildren(topN, "projectNameCounts", *path)
+ println "$nameCounts"
+}</code></pre>
+</div>
+</div>
+<pre>
+Frequency of documents mentioning a project with path [] (top 5):
dim=projectNameCounts path=[] value=-1 childCount=2
apache (21)
eclipse (12)
-dim=projectNameCounts path=[apache] value=-1 childCount=15
+Frequency of documents mentioning a project with path [apache] (top 5):
+dim=projectNameCounts path=[apache] value=-1 childCount=18
commons (16)
- spark (4)
+ spark (5)
ignite (4)
- wayang (2)
- nlpcraft (1)
- ofbiz (1)
- mxnet (1)
- age (1)
- hugegraph (1)
- tinkerpop (1)
-
-dim=projectNameCounts path=[apache, commons] value=-1 childCount=7
+ wayang (3)
+ flink (2)
+
+Frequency of documents mentioning a project with path [apache, commons] (top
5):
+dim=projectNameCounts path=[apache, commons] value=-1 childCount=7
math (7)
csv (4)
collections (2)
numbers (1)
cli (1)
- codec (1)
- io (1)
-Total documents with hits for +content:apache* +content:eclipse* --> 5 hits
-classifying-iris-flowers-with-deep.adoc
-fruity-eclipse-collections.adoc
-groovy-list-processing-cheat-sheet.adoc
-groovy-null-processing.adoc
-matrix-calculations-with-groovy-apache.adoc
</pre>
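Groovy's `List#inits()`, used above to drill from the facet root through `['apache']` down to `['apache', 'commons']`, returns all leading prefixes of a list, longest first; `reverseEach` then visits them shortest first, i.e. root outward. A small Python sketch of that prefix walk:

```python
def inits(xs):
    """All leading prefixes of xs, longest first -- like Groovy's List#inits():
    inits(['apache', 'commons']) == [['apache', 'commons'], ['apache'], []]."""
    return [xs[:i] for i in range(len(xs), -1, -1)]

# Reversing gives the drill-down order used for the facet queries above:
# root path first, then ['apache'], then ['apache', 'commons'].
paths = list(reversed(inits(['apache', 'commons'])))
for path in paths:
    print(f"drill down into path {path}")
```

Each `path` would then be splatted into `facets.getTopChildren(topN, "projectNameCounts", *path)`, as in the Groovy listing.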
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var parser = new
QueryParser("content", analyzer)
+var query = parser.parse(/apache\ * AND eclipse\ * AND emoji*/)
+var results = searcher.search(query, topN)
+var storedFields = searcher.storedFields()
+assert results.totalHits.value() == 1 &&
+ storedFields.document(results.scoreDocs[0].doc).get('name') ==
'fruity-eclipse-collections.adoc'</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_more_complex_queries">More complex queries</h2>
+<div class="sectionbody">
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">var analyzer = new
StandardAnalyzer()
+var indexDir = new ByteBuffersDirectory()
+var config = new IndexWriterConfig(analyzer)
+
+new IndexWriter(indexDir, config).withCloseable { writer ->
+ new File(baseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+ file.withReader { br ->
+ var document = new Document()
+ var fieldType = new FieldType(stored: true,
+ indexOptions:
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+ storeTermVectors: true,
+ storeTermVectorPositions: true,
+ storeTermVectorOffsets: true)
+ document.add(new Field('content', br.text, fieldType))
+ document.add(new StringField('name', file.name, Field.Store.YES))
+ writer.addDocument(document)
+ }
+ }
+}</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">IndexReader reader
= DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+
+var namepart = new SpanMultiTermQueryWrapper(new RegexpQuery(new
Term("content", '''(
+math|spark|lucene|collections|deeplearning4j
+|beam|wayang|csv|io|numbers|ignite|mxnet|age
+|nlpcraft|pekko|hugegraph|tinkerpop|commons
+|cli|opennlp|ofbiz|codec|kie|flink
+)'''.replaceAll('\n', ''))))
+
+var (apache, commons) = ['apache', 'commons'].collect{ new Term('content', it)
}
+var apacheCommons = new SpanNearQuery([new SpanTermQuery(apache), new
SpanTermQuery(commons), namepart] as SpanQuery[], 0, true)
+
+var foundation = new SpanMultiTermQueryWrapper(new RegexpQuery(new
Term("content", "(apache|eclipse)")))
+var otherProject = new SpanNearQuery([foundation, namepart] as SpanQuery[], 0,
true)
+
+var builder = new BooleanQuery.Builder(minimumNumberShouldMatch: 1)
+builder.add(otherProject, BooleanClause.Occur.SHOULD)
+builder.add(apacheCommons, BooleanClause.Occur.SHOULD)
+var query = builder.build()
+var results = searcher.search(query, 30)
+println "Total documents with hits for $query -->
$results.totalHits"</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Total documents with hits for
(spanNear([SpanMultiTermQueryWrapper(content:/(apache|eclipse)/),
SpanMultiTermQueryWrapper(content:/(math|spark|lucene|collections|deeplearning4j|beam|wayang|csv|io|numbers|ignite|mxnet|age|nlpcraft|pekko|hugegraph|tinkerpop|commons|cli|opennlp|ofbiz|codec|kie|flink)/)],
0, true) spanNear([content:apache, content:commons,
SpanMultiTermQueryWrapper(content:/(math|spark|lucene|collections|deeplearning4j|beam|wayang|csv|io|numbers|ignite|mxnet|age|nlpcraf
[...]
+</div>
+</div>
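The span queries above operate on Lucene's analyzed token stream, but the matching idea itself (a foundation word immediately followed by a known project name, or the exact "apache commons <name>" phrase, with at least one of the two patterns required) can be roughly approximated with plain regular expressions. A hedged Python sketch, with an abbreviated project-name list and none of Lucene's tokenization or scoring:

```python
import re

# Abbreviated name list for illustration; the real query lists many more.
names = r'(?:math|spark|lucene|collections|csv|ignite|wayang|flink)'

# Approximates the two SpanNearQuery clauses: (apache|eclipse) adjacent to a
# project name, or the literal phrase "apache commons <name>".
other_project = re.compile(rf'\b(?:apache|eclipse)\s+{names}\b', re.IGNORECASE)
apache_commons = re.compile(rf'\bapache\s+commons\s+{names}\b', re.IGNORECASE)

def mentions_project(text):
    """True if either pattern matches, mimicking the BooleanQuery built with
    minimumNumberShouldMatch = 1 over two SHOULD clauses."""
    return bool(other_project.search(text) or apache_commons.search(text))
```

So `mentions_project("uses Apache Commons CSV heavily")` is true via the second clause, while text mentioning only "apache" with no known project name matches neither.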
</div>
</div>
<div class="sect1">
diff --git a/blog/index.html b/blog/index.html
index 705993c..502c128 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -53,7 +53,7 @@
</ul>
</div>
</div>
- </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3' id='blog-index'><ul
class='nav-sidebar list'><li class='active'><a
href='/blog/'>Blogs</a></li><li><a href='groovy-pekko-gpars'>Using Apache Pekko
actors and GPars actors with Groovy</a></li><li><a
href='groovy-graph-databases'>Using Graph Databases with Groovy</a></li><li><a
href='handling-byte-order-mark-characters'>Handling Byte-Order-Mark Characters
in Groovy</ [...]
+ </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3' id='blog-index'><ul
class='nav-sidebar list'><li class='active'><a
href='/blog/'>Blogs</a></li><li><a href='groovy-pekko-gpars'>Using Apache Pekko
actors and GPars actors with Groovy</a></li><li><a
href='groovy-graph-databases'>Using Graph Databases with Groovy</a></li><li><a
href='handling-byte-order-mark-characters'>Handling Byte-Order-Mark Characters
in Groovy</ [...]
<div class='row'>
<div class='colset-3-footer'>
<div class='col-1'>
diff --git a/blog/reading-and-writing-csv-files.html
b/blog/reading-and-writing-csv-files.html
index ae32642..6d9c3b8 100644
--- a/blog/reading-and-writing-csv-files.html
+++ b/blog/reading-and-writing-csv-files.html
@@ -3,7 +3,7 @@
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head>
- <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible'
content='IE=edge'/><meta name='viewport' content='width=device-width,
initial-scale=1'/><meta name='keywords' content='csv, data, deserialization,
files, groovy, reading, records, data science, serialization, writing, opencsv,
commons csv, jackson databind, cycling'/><meta name='description' content='This
post looks at processing CSV files using OpenCSV, Commons CSV, and Jackson
Databind libraries.'/><title>The Apache Groovy p [...]
+ <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible'
content='IE=edge'/><meta name='viewport' content='width=device-width,
initial-scale=1'/><meta name='keywords' content='csv, data, deserialization,
files, groovy, reading, records, data science, serialization, writing, opencsv,
commons csv, jackson databind, cycling'/><meta name='description' content='This
post looks at processing CSV files using OpenCSV, Apache Commons CSV, and
Jackson Databind libraries.'/><title>The Apache G [...]
</head><body>
<div id='fork-me'>
<a href='https://github.com/apache/groovy'>