This is an automated email from the ASF dual-hosted git repository.
paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 53329d1 draft blog about using lucene with groovy
53329d1 is described below
commit 53329d16574e6dc4fdd1cc3e312fb1d1fc3b6cbf
Author: Paul King <[email protected]>
AuthorDate: Tue Nov 19 07:08:00 2024 +1000
draft blog about using lucene with groovy
---
site/src/site/blog/groovy-lucene.adoc | 490 ++++++++++++++++++++++++++++++++++
1 file changed, 490 insertions(+)
diff --git a/site/src/site/blog/groovy-lucene.adoc
b/site/src/site/blog/groovy-lucene.adoc
new file mode 100644
index 0000000..25ae523
--- /dev/null
+++ b/site/src/site/blog/groovy-lucene.adoc
@@ -0,0 +1,490 @@
+= Searching with Lucene
+Paul King
+:revdate: 2024-11-18T20:30:00+00:00
+:draft: true
+:keywords: aggregation, search, lucene, groovy
+:description: This post looks at using Lucene to find references to other projects in Groovy's blog posts.
+
+The Groovy https://groovy.apache.org/blog/[blog posts] often reference other Apache projects.
+Let's have a look at how we can find such references, first using regular expressions
+and then using Apache Lucene.
+
+== Finding project names with a regex
+
+For the sake of this post, let's assume that project references will
+include the word "Apache" followed by the project name. To make it more
+interesting, we'll also include references to Eclipse projects.
+We'll also make provision for projects with subprojects, at least for
+Apache Commons, so this will pick up names like Apache Commons Math
+for instance. We'll exclude Apache Groovy since that would hit possibly
+every Groovy blog post. We'll also exclude a bunch of words that appear in
+commonly used phrases like "Apache License" and "Apache Projects".
+
+This is by no means a perfect name reference finder. For example,
+we often refer to Apache Commons Math by its full name when first introduced,
+but later in a post we fall back to the friendlier "Commons Math" reference,
+where the "Apache" is understood from the context. We could make the regex
+more elaborate to cater for such cases, but there isn't really any benefit,
+so we won't.
+
+[source,groovy]
+----
+String tokenRegex = /(?ix)          # ignore case, enable whitespace & comments
+    \b                              # word boundary
+    (                               # start capture of all terms
+        (                           # capture project name
+            (apache|eclipse)\s      # foundation name
+            (commons\s)?            # optional subproject name
+            (?!(                    # next word must not be an excluded word
+                groovy              # excluded words
+                | and
+                | license
+                | users
+                | software
+                | projects
+                | https
+                | technologies
+            ))\w+
+        )                           # end capture #2
+        |                           # alternatively
+        (                           # capture non-project word
+            (?!(apache|eclipse))
+            \w+
+        )                           # end capture #3
+    )                               # end capture #1
+/
+----
+
+We've used Groovy's multiline slashy string to save having to escape backslashes.
+We've also enabled regex whitespace and comments to explain the regex.
+Feel free to make it a compact (long) one-liner without comments if you prefer.
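+
+To sanity-check the pattern, here is what a compact, comment-free equivalent looks
+like when run against a made-up sample sentence (illustration only):
+
+[source,groovy]
+----
+String tokenRegex = /(?i)\b(((apache|eclipse)\s(commons\s)?(?!(groovy|and|license|users|software|projects|https|technologies))\w+)|((?!(apache|eclipse))\w+))/
+
+var text = 'We used Apache Commons Math under the Apache License with Apache Groovy.'
+// capture group 2 holds project names; grep() drops the nulls from ordinary words
+var projects = (text =~ tokenRegex)*.get(2).grep()*.toLowerCase()
+assert projects == ['apache commons math']
+----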
+
+== Finding project names using regex matching
+
+With our regex sorted, let's look at how you could use a Groovy matcher
+to find all the project names.
+
+[source,groovy]
+----
+var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog' // <1>
+var histogram = [:].withDefault { 0 }
+
+new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file -> // <2>
+ var m = file.text =~ tokenRegex // <3>
+    var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', ' ').countBy() // <4>
+ if (projects) {
+ println "$file.name: $projects" // <5>
+ projects.each { k, v -> histogram[k] += v } // <6>
+ }
+}
+
+println()
+
+histogram.sort { e -> -e.value }.each { k, v -> // <7>
+ var label = "$k ($v)"
+ println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
+}
+----
+<1> You'd need to check out the Groovy website and point to it here
+<2> This traverses the directory, processing each asciidoc file
+<3> We define our matcher
+<4> This pulls out project names (capture group 2), ignores other words (using grep), then aggregates the hits for that file
+<5> We print out each blog post file name and its project references
+<6> We add the file aggregates to the overall aggregates
+<7> We print out a pretty ASCII bar chart summarising the overall aggregates
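+
+The `bar` helper isn't shown in the snippet above. A minimal sketch
+(an assumption, matching the `bar(value, min, max, width)` usage) could be:
+
+[source,groovy]
+----
+// render value (between min and max) as a proportional bar of up to width blocks
+static String bar(int value, int min, int max, int width) {
+    int blocks = ((value - min) * width).intdiv(max - min)
+    ('█' * blocks) + '▏'
+}
+
+assert bar(18, 0, 50, 50) == '█' * 18 + '▏'
+----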
+
+The output looks like:
+
+// entered below so that we don't hit this whole table as a bunch of references
+++++
+<pre>
+apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
+classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5, apache commons math:1, apache spark:2]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons math:2, apache ignite:1]
+community-over-code-na-2023.adoc: [apache ignite:8, apache commons numbers:1, apache commons csv:1]
+deck-of-cards-with-groovy.adoc: [eclipse collections:5]
+deep-learning-and-eclipse-collections.adoc: [eclipse collections:7, eclipse deeplearning4j:2]
+detecting-objects-with-groovy-the.adoc: [apache mxnet:12]
+fruity-eclipse-collections.adoc: [eclipse collections:9, apache commons math:1]
+fun-with-obfuscated-groovy.adoc: [apache commons math:1]
+groovy-2-5-clibuilder-renewal.adoc: [apache commons cli:2]
+groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3, apache tinkerpop:3]
+groovy-haiku-processing.adoc: [eclipse collections:3]
+groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4, apache commons collections:3]
+groovy-lucene.adoc: [apache lucene:2, apache commons:1, apache commons math:2]
+groovy-null-processing.adoc: [eclipse collections:6, apache commons collections:4]
+groovy-pekko-gpars.adoc: [apache pekko:4]
+groovy-record-performance.adoc: [apache commons codec:1]
+handling-byte-order-mark-characters.adoc: [apache commons io:1]
+lego-bricks-with-groovy.adoc: [eclipse collections:6]
+matrix-calculations-with-groovy-apache.adoc: [apache commons math:6, eclipse deeplearning4j:1, apache commons:1]
+natural-language-processing-with-groovy.adoc: [apache opennlp:2, apache spark:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:1]
+set-operations-with-groovy.adoc: [eclipse collections:3]
+solving-simple-optimization-problems-with-groovy.adoc: [apache commons math:5, apache kie:1]
+using-groovy-with-apache-wayang.adoc: [apache wayang:9, apache spark:7, apache flink:1, apache commons csv:1, apache ignite:1]
+whiskey-clustering-with-groovy-and.adoc: [apache ignite:7, apache wayang:1, apache spark:2, apache commons csv:2]
+wordle-checker.adoc: [eclipse collections:3]
+zipping-collections-with-groovy.adoc: [eclipse collections:4]
+
+eclipse collections (50)         ██████████████████████████████████████████████████▏
+apache commons math (18)         ██████████████████▏
+apache ignite (17)               █████████████████▏
+apache spark (12)                ████████████▏
+apache mxnet (12)                ████████████▏
+apache age (11)                  ███████████▏
+apache wayang (10)               ██████████▏
+eclipse deeplearning4j (8)       ████████▏
+apache commons collections (7)   ███████▏
+apache nlpcraft (5)              █████▏
+apache commons csv (5)           █████▏
+apache pekko (4)                 ████▏
+apache hugegraph (3)             ███▏
+apache tinkerpop (3)             ███▏
+apache commons cli (2)           ██▏
+apache commons (2)               ██▏
+apache lucene (2)                ██▏
+apache opennlp (2)               ██▏
+apache ofbiz (1)                 █▏
+apache commons numbers (1)       █▏
+apache commons codec (1)         █▏
+apache commons io (1)            █▏
+apache kie (1)                   █▏
+apache flink (1)                 █▏
+</pre>
+++++
+
+== Using Lucene
+
+Okay, regular expressions weren't that hard, but in general we might want to search for many things.
+Search frameworks like Lucene help with that. Let's see what it looks like to
+apply Lucene to our problem.
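+
+The snippets below assume the relevant Lucene modules are on the classpath.
+With Groovy you might grab them like this (the module names are the standard
+Maven coordinates; the version shown is just an example of a recent release):
+
+[source,groovy]
+----
+@Grab('org.apache.lucene:lucene-core:9.11.1')
+@Grab('org.apache.lucene:lucene-analysis-common:9.11.1')
+@Grab('org.apache.lucene:lucene-queryparser:9.11.1')
+@Grab('org.apache.lucene:lucene-highlighter:9.11.1')
+@Grab('org.apache.lucene:lucene-facet:9.11.1')
+import org.apache.lucene.analysis.Analyzer
+----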
+
+First, we'll define a custom analyzer. Lucene is very flexible and comes with
+builtin analyzers. In a typical scenario, we might just search on all words,
+and there's a builtin analyzer for that. If we used it, to query for our
+project names we'd construct a query spanning multiple (word) terms.
+For the purposes of our little example, we are going to assume project names
+are indivisible terms and slice them up that way. There is a pattern tokenizer
+which lets us reuse our existing regex.
+
+[source,groovy]
+----
+class ApacheProjectAnalyzer extends Analyzer {
+ @Override
+ protected TokenStreamComponents createComponents(String fieldName) {
+ var src = new PatternTokenizer(~tokenRegex, 0)
+ var result = new LowerCaseFilter(src)
+ new TokenStreamComponents(src, result)
+ }
+}
+----
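+
+As a quick sanity check, we can pull tokens straight from the analyzer
+(a sketch; it assumes `tokenRegex` from earlier and a `CharTermAttribute`
+import from `org.apache.lucene.analysis.tokenattributes`):
+
+[source,groovy]
+----
+var analyzer = new ApacheProjectAnalyzer()
+var stream = analyzer.tokenStream('content', 'Uses Apache Lucene and Eclipse Collections')
+var termAttr = stream.addAttribute(CharTermAttribute)
+stream.reset()
+var tokens = []
+while (stream.incrementToken()) { tokens << termAttr.toString() }
+stream.end()
+stream.close()
+println tokens // e.g. [uses, apache lucene, and, eclipse collections]
+----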
+
+Let's now tokenize our documents and let Lucene index them.
+
+[source,groovy]
+----
+var analyzer = new ApacheProjectAnalyzer() // <1>
+var indexDir = new ByteBuffersDirectory() // <2>
+var config = new IndexWriterConfig(analyzer)
+var writer = new IndexWriter(indexDir, config)
+
+var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
+new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+ file.withReader { br ->
+ var document = new Document()
+        var fieldType = new FieldType(stored: true,
+            indexOptions: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+            storeTermVectors: true,
+            storeTermVectorPositions: true,
+            storeTermVectorOffsets: true)
+        document.add(new Field('content', br.text, fieldType)) // <3>
+        document.add(new StringField('name', file.name, Field.Store.YES)) // <4>
+ writer.addDocument(document)
+ }
+}
+writer.close()
+
+var reader = DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+var parser = new QueryParser("content", analyzer)
+
+var query = parser.parse('apache* OR eclipse*') // <5>
+var results = searcher.search(query, 30) // <6>
+println "Total documents with hits for $query --> $results.totalHits"
+
+var storedFields = searcher.storedFields()
+var histogram = [:].withDefault { 0 }
+results.scoreDocs.each { ScoreDoc doc -> // <7>
+ var document = storedFields.document(doc.doc)
+ var found = handleHit(doc, query, reader) // <8>
+    println "${document.get('name')}: ${found*.replaceAll('\n', ' ').countBy()}"
+    found.each { histogram[it.replaceAll('\n', ' ')] += 1 } // <9>
+}
+
+println()
+
+histogram.sort { e -> -e.value }.each { k, v -> // <10>
+ var label = "$k ($v)"
+ println "${label.padRight(32)} ${bar(v, 0, 50, 50)}"
+}
+
+List<String> handleHit(ScoreDoc hit, Query query, DirectoryReader dirReader) { // <11>
+    boolean phraseHighlight = Boolean.TRUE
+    boolean fieldMatch = Boolean.TRUE
+    FieldQuery fieldQuery = new FieldQuery(query, dirReader, phraseHighlight, fieldMatch)
+    FieldTermStack stack = new FieldTermStack(dirReader, hit.doc, 'content', fieldQuery)
+    FieldPhraseList phrases = new FieldPhraseList(stack, fieldQuery)
+    phrases.phraseList*.termsInfos*.text.flatten()
+}
+----
+<1> This is our regex-based analyzer
+<2> We'll use a memory-based index for our little example
+<3> Store content of document along with term position info
+<4> Also store the name of the file
+<5> Search for terms with the apache or eclipse prefixes
+<6> Perform our query with a limit of 30 results
+<7> Process each result
+<8> Pull out the actual matched terms
+<9> Also aggregate the counts
+<10> Display the aggregates as a pretty bar chart
+<11> Helper method
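+
+As an aside, the parsed query could equally be built programmatically,
+which avoids query-syntax escaping issues. A sketch using standard Lucene
+query classes (`QueryParser` turns `apache*` into a prefix query anyway):
+
+[source,groovy]
+----
+var query = new BooleanQuery.Builder()
+    .add(new PrefixQuery(new Term('content', 'apache')), BooleanClause.Occur.SHOULD)
+    .add(new PrefixQuery(new Term('content', 'eclipse')), BooleanClause.Occur.SHOULD)
+    .build()
+----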
+
+The output is essentially the same as before:
+
+// used instead of space below so that we don't hit this whole table as a bunch of project references
+++++
+<pre>
+Total documents with hits for content:apache* content:eclipse* --> 28 hits
+classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5, apache commons math:1, apache spark:2]
+fruity-eclipse-collections.adoc: [eclipse collections:9, apache commons math:1]
+groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4, apache commons collections:3]
+groovy-null-processing.adoc: [eclipse collections:6, apache commons collections:4]
+matrix-calculations-with-groovy-apache.adoc: [apache commons math:6, eclipse deeplearning4j:1, apache commons:1]
+apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons math:2, apache ignite:1]
+community-over-code-na-2023.adoc: [apache ignite:8, apache commons numbers:1, apache commons csv:1]
+deck-of-cards-with-groovy.adoc: [eclipse collections:5]
+deep-learning-and-eclipse-collections.adoc: [eclipse collections:7, eclipse deeplearning4j:2]
+detecting-objects-with-groovy-the.adoc: [apache mxnet:12]
+fun-with-obfuscated-groovy.adoc: [apache commons math:1]
+groovy-2-5-clibuilder-renewal.adoc: [apache commons cli:2]
+groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3, apache tinkerpop:3]
+groovy-haiku-processing.adoc: [eclipse collections:3]
+groovy-lucene.adoc: [apache lucene:2, apache commons:1, apache commons math:2]
+groovy-pekko-gpars.adoc: [apache pekko:4]
+groovy-record-performance.adoc: [apache commons codec:1]
+handling-byte-order-mark-characters.adoc: [apache commons io:1]
+lego-bricks-with-groovy.adoc: [eclipse collections:6]
+natural-language-processing-with-groovy.adoc: [apache opennlp:2, apache spark:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:1]
+set-operations-with-groovy.adoc: [eclipse collections:3]
+solving-simple-optimization-problems-with-groovy.adoc: [apache commons math:5, apache kie:1]
+using-groovy-with-apache-wayang.adoc: [apache wayang:9, apache spark:7, apache flink:1, apache commons csv:1, apache ignite:1]
+whiskey-clustering-with-groovy-and.adoc: [apache ignite:7, apache wayang:1, apache spark:2, apache commons csv:2]
+wordle-checker.adoc: [eclipse collections:3]
+zipping-collections-with-groovy.adoc: [eclipse collections:4]
+
+eclipse collections (50)         ██████████████████████████████████████████████████▏
+apache commons math (18)         ██████████████████▏
+apache ignite (17)               █████████████████▏
+apache spark (12)                ████████████▏
+apache mxnet (12)                ████████████▏
+apache age (11)                  ███████████▏
+apache wayang (10)               ██████████▏
+eclipse deeplearning4j (8)       ████████▏
+apache commons collections (7)   ███████▏
+apache nlpcraft (5)              █████▏
+apache commons csv (5)           █████▏
+apache pekko (4)                 ████▏
+apache hugegraph (3)             ███▏
+apache tinkerpop (3)             ███▏
+apache commons (2)               ██▏
+apache commons cli (2)           ██▏
+apache lucene (2)                ██▏
+apache opennlp (2)               ██▏
+apache ofbiz (1)                 █▏
+apache commons numbers (1)       █▏
+apache commons codec (1)         █▏
+apache commons io (1)            █▏
+apache kie (1)                   █▏
+apache flink (1)                 █▏
+</pre>
+++++
+
+== Using Lucene Facets
+
+Lucene also lets us compute aggregates using facets. We store facet
+information at indexing time and can then efficiently retrieve aggregated
+counts when searching. Let's redo our aggregation using facets.
+
+[source,groovy]
+----
+var analyzer = new ApacheProjectAnalyzer()
+var indexDir = new ByteBuffersDirectory()
+var taxonDir = new ByteBuffersDirectory()
+var config = new IndexWriterConfig(analyzer)
+var indexWriter = new IndexWriter(indexDir, config)
+var taxonWriter = new DirectoryTaxonomyWriter(taxonDir)
+
+var fConfig = new FacetsConfig().tap {
+ setHierarchical("projectNameCounts", true)
+ setMultiValued("projectNameCounts", true)
+ setMultiValued("projectFileCounts", true)
+ setMultiValued("projectHitCounts", true)
+ setIndexFieldName('projectHitCounts', '$projectHitCounts')
+}
+
+var blogBaseDir = '/projects/apache-websites/groovy-website/site/src/site/blog'
+new File(blogBaseDir).traverse(nameFilter: ~/.*\.adoc/) { file ->
+ var m = file.text =~ tokenRegex
+    var projects = m*.get(2).grep()*.toLowerCase()*.replaceAll('\n', ' ').countBy()
+ file.withReader { br ->
+ var document = new Document()
+        var fieldType = new FieldType(stored: true,
+            indexOptions: IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS,
+            storeTermVectors: true,
+            storeTermVectorPositions: true,
+            storeTermVectorOffsets: true)
+ document.add(new Field('content', br.text, fieldType))
+ document.add(new StringField('name', file.name, Field.Store.YES))
+ if (projects) {
+ println "$file.name: $projects"
+ projects.each { k, v ->
+                document.add(new IntAssociationFacetField(v, "projectHitCounts", k))
+ document.add(new FacetField("projectFileCounts", k))
+ document.add(new FacetField("projectNameCounts", k.split()))
+ }
+ }
+ indexWriter.addDocument(fConfig.build(taxonWriter, document))
+ }
+}
+indexWriter.close()
+taxonWriter.close()
+println()
+
+var reader = DirectoryReader.open(indexDir)
+var searcher = new IndexSearcher(reader)
+var taxonReader = new DirectoryTaxonomyReader(taxonDir)
+var fcm = new FacetsCollectorManager()
+var fc = FacetsCollectorManager.search(searcher, new MatchAllDocsQuery(), 10, fcm).facetsCollector()
+
+var projects = new TaxonomyFacetIntAssociations('$projectHitCounts', taxonReader, fConfig, fc, AssociationAggregationFunction.SUM)
+var hitCounts = projects.getTopChildren(10, "projectHitCounts")
+println hitCounts
+
+var facets = new FastTaxonomyFacetCounts(taxonReader, fConfig, fc)
+var fileCounts = facets.getTopChildren(10, "projectFileCounts")
+println fileCounts
+
+var nameCounts = facets.getTopChildren(10, "projectNameCounts")
+println nameCounts
+nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache')
+println nameCounts
+nameCounts = facets.getTopChildren(10, "projectNameCounts", 'apache', 'commons')
+println nameCounts
+
+var parser = new QueryParser("content", analyzer)
+var query = parser.parse('apache* AND eclipse*')
+var results = searcher.search(query, 10)
+println "Total documents with hits for $query --> $results.totalHits"
+var storedFields = searcher.storedFields()
+results.scoreDocs.each { ScoreDoc doc ->
+ var document = storedFields.document(doc.doc)
+ println "${document.get('name')}"
+}
+----
+
+// entered below so that we don't hit this whole table as a bunch of references
+++++
+<pre>
+apache-nlpcraft-with-groovy.adoc: [apache nlpcraft:5]
+classifying-iris-flowers-with-deep.adoc: [eclipse deeplearning4j:5, apache commons math:1, apache spark:2]
+community-over-code-eu-2024.adoc: [apache ofbiz:1, apache commons math:2, apache ignite:1]
+community-over-code-na-2023.adoc: [apache ignite:8, apache commons numbers:1, apache commons csv:1]
+deck-of-cards-with-groovy.adoc: [eclipse collections:5]
+deep-learning-and-eclipse-collections.adoc: [eclipse collections:7, eclipse deeplearning4j:2]
+detecting-objects-with-groovy-the.adoc: [apache mxnet:12]
+fruity-eclipse-collections.adoc: [eclipse collections:9, apache commons math:1]
+fun-with-obfuscated-groovy.adoc: [apache commons math:1]
+groovy-2-5-clibuilder-renewal.adoc: [apache commons cli:2]
+groovy-graph-databases.adoc: [apache age:11, apache hugegraph:3, apache tinkerpop:3]
+groovy-haiku-processing.adoc: [eclipse collections:3]
+groovy-list-processing-cheat-sheet.adoc: [eclipse collections:4, apache commons collections:3]
+groovy-lucene.adoc: [apache lucene:2, apache commons:1, apache commons math:2]
+groovy-null-processing.adoc: [eclipse collections:6, apache commons collections:4]
+groovy-pekko-gpars.adoc: [apache pekko:4]
+groovy-record-performance.adoc: [apache commons codec:1]
+handling-byte-order-mark-characters.adoc: [apache commons io:1]
+lego-bricks-with-groovy.adoc: [eclipse collections:6]
+matrix-calculations-with-groovy-apache.adoc: [apache commons math:6, eclipse deeplearning4j:1, apache commons:1]
+natural-language-processing-with-groovy.adoc: [apache opennlp:2, apache spark:1]
+reading-and-writing-csv-files.adoc: [apache commons csv:1]
+set-operations-with-groovy.adoc: [eclipse collections:3]
+solving-simple-optimization-problems-with-groovy.adoc: [apache commons math:5, apache kie:1]
+using-groovy-with-apache-wayang.adoc: [apache wayang:9, apache spark:7, apache flink:1, apache commons csv:1, apache ignite:1]
+whiskey-clustering-with-groovy-and.adoc: [apache ignite:7, apache wayang:1, apache spark:2, apache commons csv:2]
+wordle-checker.adoc: [eclipse collections:3]
+zipping-collections-with-groovy.adoc: [eclipse collections:4]
+
+dim=projectHitCounts path=[] value=-1 childCount=24
+ eclipse collections (50)
+ apache commons math (18)
+ apache ignite (17)
+ apache spark (12)
+ apache mxnet (12)
+ apache age (11)
+ apache wayang (10)
+ eclipse deeplearning4j (8)
+ apache commons collections (7)
+ apache nlpcraft (5)
+
+dim=projectFileCounts path=[] value=-1 childCount=24
+ eclipse collections (10)
+ apache commons math (7)
+ apache spark (4)
+ apache ignite (4)
+ apache commons csv (4)
+ eclipse deeplearning4j (3)
+ apache commons collections (2)
+ apache commons (2)
+ apache wayang (2)
+ apache nlpcraft (1)
+
+dim=projectNameCounts path=[] value=-1 childCount=2
+ apache (21)
+ eclipse (12)
+
+dim=projectNameCounts path=[apache] value=-1 childCount=15
+ commons (16)
+ spark (4)
+ ignite (4)
+ wayang (2)
+ nlpcraft (1)
+ ofbiz (1)
+ mxnet (1)
+ age (1)
+ hugegraph (1)
+ tinkerpop (1)
+
+dim=projectNameCounts path=[apache, commons] value=-1 childCount=7
+ math (7)
+ csv (4)
+ collections (2)
+ numbers (1)
+ cli (1)
+ codec (1)
+ io (1)
+
+Total documents with hits for +content:apache* +content:eclipse* --> 5 hits
+classifying-iris-flowers-with-deep.adoc
+fruity-eclipse-collections.adoc
+groovy-list-processing-cheat-sheet.adoc
+groovy-null-processing.adoc
+matrix-calculations-with-groovy-apache.adoc
+</pre>
+++++
+
+== Conclusion
+
+We have analyzed the Groovy blog posts looking for referenced projects
+using regular expressions and Apache Lucene.