This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 7caf909  add BOM characters blog post
7caf909 is described below

commit 7caf9095f3d9d68353eb95d960b0a9b7ec34c983
Author: Paul King <pa...@asert.com.au>
AuthorDate: Fri Jul 12 00:50:04 2024 +1000

    add BOM characters blog post
---
 .../blog/handling-byte-order-mark-characters.adoc  | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/site/src/site/blog/handling-byte-order-mark-characters.adoc 
b/site/src/site/blog/handling-byte-order-mark-characters.adoc
new file mode 100644
index 0000000..35c268c
--- /dev/null
+++ b/site/src/site/blog/handling-byte-order-mark-characters.adoc
@@ -0,0 +1,39 @@
+= Handling Byte-Order-Mark Characters in Groovy
+Paul King
+:revdate: 2024-07-11T20:00:00+00:00
+:keywords: groovy, bom_chars, unicode, encoding
+:description: Handling Byte Order Mark (BOM) characters in Groovy
+
+A 
https://www.javacodegeeks.com/remove-byte-order-mark-characters-from-file.html[recent
 article]
+showed how to process https://en.wikipedia.org/wiki/Byte_order_mark[Byte Order 
Mark (BOM)] characters
+within text files when coding in Java. In particular, often manual removal of 
those characters might
+be needed when processing text files. The article showed how to remove the BOM 
characters when using
+the `InputStream` and `Reader` classes as well as how to do it using `NIO` 
functionality. It also showed
+how the `BOMInputStream` class in 
https://commons.apache.org/proper/commons-io/[Apache Commons IO]
+could be used. It automatically skips over the BOM characters.
+
+Those examples can be run as is in Groovy (albeit after fixing a bug in the 
first example)
+but the (complete!) idiomatic solution in Groovy is:
+
+[source,groovy]
+----
+println new File('file.txt').text
+----
+
+That's right, Groovy automatically detects
+the encoding, and removes BOM characters,
+when using the `getText()` method
+along with others like `eachLine`, `splitEachLine`,
+`readLines`, `withReader`, and `filterLine`.
+The same functionality can be obtained using
+the `newReader` method too on files and URLs.
+
+When needed there are variants that let you
+specify the encoding should you wish to explicitly
+declare it. In that case, you'd need to handle the
+BOM characters manually.
+
+Groovy's methods like `getText` call an underlying
+`CharsetToolkit` class. You can also use that class directly
+should you wish to learn more about the encoding
+of a file.
\ No newline at end of file

Reply via email to