RE: Cannot process zip file with Groovy

2024-02-17 Thread Bob Brown
Not entirely sure that is what James is looking for…I THINK he’s more 
interested in reading than creating.

Commons compress has some example code at 
https://commons.apache.org/proper/commons-compress/examples.html:

===
InputStream fin = Files.newInputStream(Paths.get("some-file"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = defIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
defIn.close();
===

BOB

From: MG 
Sent: Saturday, February 17, 2024 10:37 AM
To: users@groovy.apache.org; Bob Brown 
Subject: Re: Cannot process zip file with Groovy

I agree, would also recommend using Apache libs, we use e.g. the ZIP classes 
that come with the ant lib in the Groovy distribution (org.apache.tools.zip.*):

Here is a quickly sanitized version of our code (disclaimer: not 
compiled/tested; Zip64Mode.Always is important if you expect larger files):

import org.apache.tools.zip.*

InputStream zipInputStream(String compressedFilename) {
    final zipFile = new ZipFile(new File(compressedFilename))
    final zipEntry = (ZipEntry) zipFile.entries.nextElement()
    if (zipEntry === null) { throw new Exception("${zipFile.name} has no entries") }
    final zis = zipFile.getInputStream(zipEntry)
    return zis
}

OutputStream zipOutputStream(String filename, String compressedFileExtension = "zip") {
    final fos = new FileOutputStream(filename + '.' + compressedFileExtension)
    final zos = new ZipOutputStream(fos)
    // To avoid org.apache.tools.zip.Zip64RequiredException: ... exceeds the limit of 4GByte.
    zos.useZip64 = Zip64Mode.Always
    final zipFileName = org.apache.commons.io.FilenameUtils.getName(filename)
    final zipEntry = new ZipEntry(zipFileName)
    zos.putNextEntry(zipEntry)
    return zos
}

Cheers,
mg



RE: Cannot process zip file with Groovy

2024-02-16 Thread Bob Brown
MY first thought was “are you SURE it is a kosher Zip file?”

Sometimes one gets ‘odd’ gzip files masquerading as plain zip files.

Also, apparently “java.util.Zip does not support DEFLATE64 compression method.” 
: 
https://www.ibm.com/support/pages/zip-file-fails-route-invalid-compression-method-error

IF this is the case, you may need to use: 
https://commons.apache.org/proper/commons-compress/zip.html
(maybe worth looking at the “Known Interoperability Problems” section of the 
above doc)
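
Both suspicions (a gzip file in disguise, or a zip entry that uses DEFLATE64) can be checked by looking at the first few bytes of the file. The following is only a minimal sketch in plain Java, which should also run as Groovy; it assumes the archive begins with a local file header, and the names `ZipSniffer` and `describe` are made up for illustration. The method codes are the ones listed in the ZIP format specification: 0 is stored, 8 is Deflate, 9 is Deflate64.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSniffer {

    /** Describes a file from its first bytes (a zip needs at least 10 of them). */
    public static String describe(byte[] head) {
        // gzip streams start with the magic bytes 0x1f 0x8b
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        // a zip local file header starts with 'P' 'K' 0x03 0x04
        if (head.length >= 10 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            // the compression method is a little-endian 16-bit value at offset 8
            int method = (head[8] & 0xff) | ((head[9] & 0xff) << 8);
            switch (method) {
                case 0:  return "zip, first entry stored (method 0)";
                case 8:  return "zip, first entry Deflate (method 8)";
                case 9:  return "zip, first entry Deflate64 (method 9)";
                default: return "zip, first entry method " + method;
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory; ZipOutputStream uses Deflate by default.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        System.out.println(describe(bytes.toByteArray()));
    }
}
```

This only inspects the first entry; an archive can mix methods, so a "method 8" answer here does not rule out a Deflate64 entry further in.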

May be helpful: https://stackoverflow.com/a/76321625

HTH

BOB

From: James McMahon 
Sent: Saturday, February 17, 2024 4:20 AM
To: users@groovy.apache.org
Subject: Re: Cannot process zip file with Groovy

Hello Paul, and thanks again for taking a moment to look at this. I tried as 
you suggested:
- - - - - - - - - -
import java.util.zip.ZipInputStream

def ff = session.get()
if (!ff) return

try {
    ff = session.write(ff, { inputStream, outputStream ->
        def zipInputStream = new ZipInputStream(inputStream)
        def entry = zipInputStream.getNextEntry()
        while (entry != null) {
            entry = zipInputStream.getNextEntry()
        }
        outputStream = inputStream
    } as StreamCallback)

    session.transfer(ff, REL_SUCCESS)
} catch (Exception e) {
    log.error('Error occurred processing FlowFile', e)
    session.transfer(ff, REL_FAILURE)
}
- - - - - - - - - -

Once again it threw this error and failed:


ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred 
processing FlowFile: org.apache.nifi.processor.exception.ProcessException: 
IOException thrown from ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]: 
java.util.zip.ZipException: invalid compression method

- Caused by: java.util.zip.ZipException: invalid compression method



It bears repeating: I am able to list and unzip the file at the linux command 
line, but cannot get it to work from the script.



What is interesting (and a little frustrating) is that the NiFi UnpackContent 
will successfully unzip the zip file. However, the reason I am trying to do it 
in Groovy is that UnpackContent exposes the file metadata for each file in a 
tar archive - lastModifiedDate, for example - but it does not do so for files 
extracted from zips. And I need that metadata. So here I be.
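
For reference, the per-entry metadata in question is available on `ZipEntry` itself. The sketch below uses only the JDK's `java.util.zip` classes, so it is subject to the same "invalid compression method" limitation for archives the JDK cannot inflate; it is meant to show where the last-modified time lives, not to solve the DEFLATE64 problem. The class and method names (`ZipEntryLister`, `lastModifiedByName`, `sampleZip`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.attribute.FileTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLister {

    /** Maps each entry name to its last-modified time, in archive order. */
    public static Map<String, FileTime> lastModifiedByName(InputStream in) throws IOException {
        Map<String, FileTime> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                result.put(entry.getName(), entry.getLastModifiedTime());
            }
        }
        return result;
    }

    /** Builds a two-entry demo archive in memory. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "dir/b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        lastModifiedByName(new ByteArrayInputStream(sampleZip()))
                .forEach((name, time) -> System.out.println(name + " -> " + time));
    }
}
```

Note that this sketch closes the stream it is given, which may not be what you want inside a NiFi callback.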



Can I explicitly set my (de)compression in the Groovy script? Where would I do 
that, and what values does one typically encounter for zip compression?



Jim

On Thu, Feb 15, 2024 at 9:26 PM Paul King <pa...@asert.com.au> wrote:
What you are doing to read the zip looks okay.

Just a guess, but it could be that because you haven't written to the
output stream, it is essentially a corrupt data stream as far as NiFi
processing is concerned. What happens if you set "outputStream =
inputStream" as the last line of your callback?

Paul.



On Fri, Feb 16, 2024 at 8:48 AM James McMahon <jsmcmah...@gmail.com> wrote:
>
> I am struggling to build a Groovy script I can run from a NiFi ExecuteScript 
> processor to extract from a zip file and stream to a tar archive.
>
> I tried to tackle it all at once and made little progress.
> I am now just trying to read the zip file, and am getting this error:
>
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred 
> processing FlowFile: org.apache.nifi.processor.exception.ProcessException: 
> IOException thrown from 
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]: 
> java.util.zip.ZipException: invalid compression method
> - Caused by: java.util.zip.ZipException: invalid compression method
>
>
> This is my simplified code:
>
>
> import java.util.zip.ZipInputStream
>
> def ff = session.get()
> if (!ff) return
>
> try {
> ff = session.write(ff, { inputStream, outputStream ->
> def zipInputStream = new ZipInputStream(inputStream)
> def entry = zipInputStream.getNextEntry()
> while (entry != null) {
> entry = zipInputStream.getNextEntry()
> }
> } as StreamCallback)
>
> session.transfer(ff, REL_SUCCESS)
> } catch (Exception e) {
> log.error('Error occurred processing FlowFile', e)
> session.transfer(ff, REL_FAILURE)
> }
>
>
> I am able to list and unzip the file at the linux command line, but cannot 
> get it to work from the script.
>
>
> Has anyone had success doing this? Can anyone help me get past this error?
>
>
> Thanks in advance.
>
> Jim
>
>


RE: How to add a ssl server cert for groovy?

2024-02-15 Thread Bob Brown
As Paul says, Groovy sits on top of Java...but...over the years, I have noticed 
that people are often unaware of the flexibility in the JVM.
I have seen "how to make a certificate for Application X" documents that are 
several dozen pages long, filled with generations of accumulated "good stuff" 
but...a 3-line command sequence would actually suffice...

So: Just FYI,

These two links GREATLY help with debugging cert-related issues:
https://stackoverflow.com/questions/23659564/limiting-java-ssl-debug-logging
https://colinpaice.blog/2020/04/05/using-java-djavax-net-debug-to-examine-data-flows-including-tls/

I also find the following useful; just add it to a class that is early in your 
application's startup sequence:

// The following will help debug the HttpURLConnection, including showing all headers.
// Needs: import java.util.logging.ConsoleHandler and java.util.logging.Level
static {
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(Level.ALL);
    java.util.logging.Logger jlog =
        java.util.logging.Logger.getLogger("sun.net.www.protocol.http.HttpURLConnection");
    jlog.addHandler(handler);
    jlog.setLevel(Level.ALL);
}

*I* am of the opinion that "Thou Shalt Never Modify The JVM Installation" and 
this especially applies to the cacerts file. *I* always pass configuration 
options around, eg:

-Djavax.net.ssl.trustStore="..."
-Djavax.net.ssl.trustStorePassword=password
-Djavax.net.ssl.trustStoreType=PKCS12

(...many other properties exist, including keystore-related ones) Note that 
these properties are looked for 'automatically.' You should be able to specify 
them for your CI step.
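
Where even the system properties are inconvenient, the same idea can be applied in code: load your own trust store and build an `SSLContext` from it, leaving cacerts alone. This is a minimal sketch using only standard JDK classes; the class name `CustomTrust` and its method names are made up for illustration, and the store path, password and type are whatever your environment uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class CustomTrust {

    /** Loads a trust store file without touching the JDK's cacerts. */
    public static KeyStore loadTrustStore(Path file, char[] password, String type)
            throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(file)) {
            store.load(in, password);
        }
        return store;
    }

    /** Builds an SSLContext whose trust decisions are based on the given store only. */
    public static SSLContext contextFor(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // usage: java CustomTrust <store-file> <password> [type]
        String type = args.length > 2 ? args[2] : "PKCS12";
        KeyStore store = loadTrustStore(Paths.get(args[0]), args[1].toCharArray(), type);
        System.out.println("certificates/entries in store: " + store.size());
    }
}
```

The resulting context can then be handed to a single connection, for example via `HttpsURLConnection.setSSLSocketFactory(context.getSocketFactory())`, so nothing global changes.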

One benefit of this is (especially in a closed infrastructure environment): 
YOUR stores can contain ONLY immediately relevant certificates...the cacerts 
file is a BIiiig catch-all blob. This is a nasty beast!

MUCH more at: 
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html

If you worry about sensitive info being leaked on the command line, you MAY be 
able to build a 'starter' class that simply asserts these into the 
System.properties Map and then starts your app proper, something like:

System.properties.with { p ->
p['sun.security.ssl.allowUnsafeRenegotiation'] = 'true'
p['javax.net.debug'] = 'ssl:handshake'

p['com.unboundid.ldap.sdk.debug.enabled'] = 'true'
p['com.unboundid.ldap.sdk.debug.level'] = 'ALL'
p['com.unboundid.ldap.sdk.debug.type'] = 'LDAP,LDIF'

p['javax.net.ssl.trustStore']='...'
 }

I'd also point you to: https://keystore-explorer.org/ a VERY nice tool for 
those who don't worship the command-line.

HTH,

BOB

-----Original Message-----
From: Paul King  
Sent: Friday, February 16, 2024 12:07 PM
To: users@groovy.apache.org
Subject: Re: How to add a ssl server cert for groovy?

Hi David,

Groovy sits on top of the JDK, so if you install cacerts into the JDK you are 
using, then Groovy should use them just fine.

Possibly there could be issues depending on what client library you are using 
to make the https connection.

Cheers, Paul.



On Fri, Feb 16, 2024 at 9:25 AM David Karr  wrote:
>
> I work behind a firewall, and it requires that I add a cert for our proxy to 
> the cacerts file in the Java distribution. This works fine.
>
> I have a quite old version of Groovy installed on my desktop, v2.4.21, which 
> is the version used by our Jenkins pipeline script.  I want to test some code 
> in groovyConsole before I try to run it on our CI server. For many things, 
> this works fine. However, I'm trying to iterate on some code that makes a 
> https connection, and I'm getting an error in groovyConsole that I believe is 
> the same error I get when the server cert is missing ("PKIX path building 
> failed"), which isn't surprising because I never installed the root cert in 
> the Groovy distribution.
>
> I've never really looked inside the Groovy distribution before. I don't even 
> see a cacerts file or anything that really looks like it, so it must do this 
> in a different way than the Java distribution. Is it possible that this is 
> because I'm using such an old version of Groovy?


RE: Groovy on Windows 11, unable to resolve class

2024-01-19 Thread Bob Brown
*I* find that the best way to handle the "spaces in path" problem on windows is 
to avoid it by using the '8dot3' version of a file/dir name:

===
C:\>dir /x
 Volume in drive C has no label.
 Volume Serial Number is D215-0F37

 Directory of C:\

31/10/2023  08:01 AM    <DIR>                       BIN
22/11/2023  06:58 PM    <DIR>                       Dell
23/12/2023  09:44 PM    <DIR>                       DEV
20/01/2024  10:41 AM    <DIR>                       DEVTOOLS
13/01/2024  10:43 AM    <DIR>                       Java
31/10/2023  07:42 AM    <DIR>                       LIB
19/01/2024  09:23 AM    <DIR>          PROGRA~1     Program Files
30/12/2023  10:20 AM    <DIR>          PROGRA~2     Program Files (x86)
23/10/2023  07:24 PM    <DIR>                       Users
10/01/2024  08:11 PM    <DIR>                       Windows
24/11/2023  07:24 PM    <DIR>                       WSL2
               0 File(s)              0 bytes
              11 Dir(s)  736,955,072,512 bytes free

C:\>set GROOVY_HOME=C:\PROGRA~2\Groovy

C:\>
===


Of course, this relies on the filesystem actually generating these alternative 
filenames, not all do. To determine your situation:

=== (run as administrator):
C:\>fsutil 8dot3name query c:
The volume state is: 0 (8dot3 name creation is ENABLED)
The registry state is: 2 (Per volume setting - the default)

Based on the above settings, 8dot3 name creation is ENABLED on "c:"

C:\>
===

Ref: 
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/fsutil-8dot3name

HTH

BOB


-----Original Message-----
From: Paul King  
Sent: Friday, January 19, 2024 10:00 PM
To: users@groovy.apache.org
Subject: Re: Groovy on Windows 11, unable to resolve class

Or do you already have GROOVY_HOME set but to somewhere else?



On Thu, Jan 18, 2024 at 9:58 PM Søren Berg Glasius  wrote:
>
> I'm not a Windows user myself, but I seem to remember that this is most likely 
> because of the spaces in "C:\Program Files (x86)\Groovy\"
>
> On Thu, 18 Jan 2024 at 11:47, poubelle zenira wrote:
>>
>> I just installed groovy 4 on windows from the installer.
>>
>> When running groovysh I get:
>> "ClassNotFoundException: org.apache.groovy.groovysh.Main"
>>
>> And when trying to import a groovy, i get an "unable to resolve 
>> class" error
>>
>> I checked the Windows path:
>> GROOVY_HOME is set to C:\Program Files (x86)\Groovy\, where Groovy is 
>> installed (I left the default installation parameters), and I manually 
>> added %GROOVY_HOME%\bin to the path because it was not there.
>>
>> But it still doesn't work.
>
>
>
> Best regards,
> Søren Berg Glasius


RE: Strange behaviour when using the Grab annotation

2024-01-06 Thread Bob Brown
I think that @Grab needs to be 'attached' to something like an import.

The doco (https://groovy-lang.org/grape.html) says:

"""
Note that we are using an annotated import here, which is the recommended way.
"""

Take a look at:

https://dzone.com/articles/groovy-and-jsch-sftp

HTH

BOB

From: Quoß, Clemens (UIT) 
Sent: Sunday, January 7, 2024 7:04 AM
To: users@groovy.apache.org
Subject: Strange behaviour when using the Grab annotation

Hello everyone!

When I am running this script with 4.0.17 ...
>>>
@GrabResolver(name = 'nexus', root = 'https://...')
@Grab(group = 'com.jcraft', module = 'jsch', version = '0.1.55')

println "Hallo, Groovy!"
<<<

... I am getting this:
>>>
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
C:\Temp\test.groovy: 4: Unexpected input: '"Hallo, Groovy!"' @ line 4, column 9.
   println "Hallo, Groovy!"
   ^

1 error
<<<

When I remove the Grapes annotations everything works as expected.

Has anyone encountered similar issues? Is there a cure? Is this considered a 
bug? To me it looks that way. But maybe I am missing something here.

TIA

Regards

Union IT-Services GmbH
Clemens Quoß



Re: Working with Calendar object in Groovy

2023-06-19 Thread Bob Brown
Try parseDate, rather than parseDateStrictly?

https://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/time/DateUtils.html#parseDate(java.lang.String,%20java.lang.String...)

Trouble is, you are deliberately trying to make an invalid date, I think...
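
To illustrate what "lenient" versus "strict" means for a value like 19990000, here is a small sketch that uses the JDK's own `SimpleDateFormat` as a stand-in for the Commons Lang `DateUtils` methods (an assumption made so the example needs no extra library). The class name `LenientVsStrict` is made up. A lenient parser accepts month 00 and day 00 and rolls the date backwards; a strict one rejects the input.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LenientVsStrict {

    /** Parses text as yyyyMMdd and returns it as yyyy-MM-dd, or "rejected" if parsing fails. */
    public static String parse(String text, boolean lenient) {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        in.setLenient(lenient);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parsed = in.parse(text);
            return out.format(parsed);
        } catch (ParseException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println("lenient: " + parse("19990000", true));
        System.out.println("strict:  " + parse("19990000", false));
    }
}
```

Either way, the result is never "1999 with no month or day": the lenient answer is a different real date, and the strict answer is a failure.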


From: James McMahon 
Sent: Tuesday, 20 June 2023 10:58 AM
To: users@groovy.apache.org 
Subject: Re: Working with Calendar object in Groovy

If I want to set the month and day in the date|calendar object to 00 when I 
only have a year, it seems that DateUtils forces those to be 01 even when all it 
gets is a year such as 1999. I can't allow it to default to 19990101 in such 
cases - there will be legit occurrences of 19990101. I want to force it to be 
19990000. The Calendar object comes close, because it sets cal.MONTH to 0, but 
it does that whether the month is legitimately 01 or is missing. That seems 
unfortunate.

This is my code so far:

// https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
final String[] PATTERNS = new String[] {
    "MMMM dd, yyyy",
    "yyyy",
    "// //  dd, ",  // I am not sure this one is a legit pattern, and may drop this
    "MM/dd/yyyy",
    "yyyy-MM-dd", "yyyy-mm"
}

// https://stackoverflow.com/a/54952272, reference credit: Bob Brown
datesToNormalize.split(/(?ms)::-::/, -1).collect { it.trim() }.each { candidate ->
    try {
        parsed = DateUtils.parseDateStrictly(candidate, PATTERNS)
        def Calendar cal = Calendar.getInstance()
        cal.setTime(parsed)

        log.info("Given: ${candidate}; parsed: ${cal} month: ${cal.get(Calendar.MONTH)}")
    } catch (Exception e) {
        log.error("Could not parse: ${candidate}")
    }
}

What I find when I look closely at the details of the calendar object for a 
year only, such as 1991..

2023-06-19 17:21:26,235 INFO [Timer-Driven Process Thread-8] 
o.a.nifi.processors.script.ExecuteScript 
ExecuteScript[id=33a5179c-1df4-128b-52be-aaa96b947012] Given: 1991; parsed: 
java.util.GregorianCalendar[time=662688000000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=1991,MONTH=0,WEEK_OF_YEAR=1,WEEK_OF_MONTH=1,DAY_OF_MONTH=1,DAY_OF_YEAR=1,DAY_OF_WEEK=3,DAY_OF_WEEK_IN_MONTH=1,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0]
 month: 0

It is also unfortunate that it sets DAY_OF_MONTH to 1 when we only present a year.

I wish to get my final normalized output to be 19910000 in such cases.


Re: Working with Calendar object in Groovy

2023-06-19 Thread Bob Brown
I guess the key thing to bear in mind is:

https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Calendar.html
"""
The calendar field values can be set by calling the set methods. Any field 
values set in a Calendar will not be interpreted until it needs to calculate 
its time value (milliseconds from the Epoch) or values of the calendar fields. 
Calling the get, getTimeInMillis, getTime, add and roll involves such 
calculation.
"""

Calendar doesn't provide a way to differentiate, in other words.

See default values for fields at 
https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/GregorianCalendar.html

Calendar is often used as a 'bucket' for timey-wimey things: you stuff things 
into fields that contextually make sense and ignore those that don't. Not very 
nice OO or helpful as an API.

It MAY be (speaking off the top of my head here) that you don't need/want 
Calendar...the newer java.time package has many "finer-grained" classes for 
things like Instant, Period, Duration, etc. that might be a better fit for a 
specific use-case: https://www.baeldung.com/java-8-date-time-intro
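
As a concrete illustration of "finer-grained" classes: `java.time` has separate `Year`, `YearMonth` and `LocalDate` types, so the type itself records how much of the date was actually present. The sketch below is one possible way to use that, not a recommendation from the java.time documentation; the class name `PartialDates`, the ISO-style inputs, and the normalized output that pads missing parts with 00 are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.Temporal;

public class PartialDates {

    /** Parses "1999", "1999-01" or "1999-01-01" into the narrowest matching type. */
    public static Temporal parse(String text) {
        try {
            return LocalDate.parse(text);
        } catch (DateTimeParseException notAFullDate) {
            // fall through to the less precise types
        }
        try {
            return YearMonth.parse(text);
        } catch (DateTimeParseException notAYearMonth) {
            // fall through to year only
        }
        return Year.parse(text); // throws DateTimeParseException if nothing fits
    }

    /** Renders the value as eight digits, using 00 for parts that were absent. */
    public static String normalize(Temporal value) {
        if (value instanceof LocalDate) {
            return ((LocalDate) value).format(DateTimeFormatter.BASIC_ISO_DATE);
        }
        if (value instanceof YearMonth) {
            YearMonth ym = (YearMonth) value;
            return String.format("%04d%02d00", ym.getYear(), ym.getMonthValue());
        }
        return String.format("%04d0000", ((Year) value).getValue());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1999", "1999-01", "1999-01-01"}) {
            System.out.println(s + " -> " + normalize(parse(s)));
        }
    }
}
```

With this approach "1999" and "1999-01-01" never collapse into the same value, which is exactly the distinction Calendar cannot express.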

BOB

From: James McMahon 
Sent: Tuesday, 20 June 2023 8:53 AM
To: users@groovy.apache.org 
Subject: Working with Calendar object in Groovy

If I have a Calendar object created for 1999-01-01, a get() of calendar.MONTH 
will return 0. From references I’ve found through Google, we have to add 1 to 
MONTH to get the 1 for January. The calendar object has 0 for MONTH.

Now let’s take the case where we set our calendar object from “1999”,. When we 
get MONTH it is also 0 - but not because our month was January, but because it 
was not present.

How does the calendar instance differentiate between those two cases? Is there 
another calendar object element that tells me “hey, I could set no day or month 
from a date like 1999”?


Re: Existing resources to seek date patterns from raw data and normalize them

2023-06-14 Thread Bob Brown
Just wondering if this will help you:

https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/DateUtils.html#parseDate-java.lang.String-java.util.Locale-java.lang.String...-

You'll still need to extract the candidate date strings but once you have them, 
this can parse them using various formats.
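
For the extraction step itself, a plain regex plus a list of formatters may already go a long way. The sketch below uses `java.time` instead of Commons Lang (an assumption, to keep it dependency-free) and handles only three numeric layouts; the class name `DateFinder` and the sample formats are illustrative, not a complete answer to "any number of forms". Candidates that look like dates but are not valid ones are silently skipped.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFinder {

    // Candidate shapes: 2023-06-14, 2023/06/14, 20230614
    private static final Pattern CANDIDATE =
            Pattern.compile("\\b(\\d{4}-\\d{2}-\\d{2}|\\d{4}/\\d{2}/\\d{2}|\\d{8})\\b");

    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ISO_LOCAL_DATE,
            DateTimeFormatter.ofPattern("uuuu/MM/dd").withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.BASIC_ISO_DATE);

    /** Returns every valid date found in the text, sorted ascending. */
    public static List<LocalDate> findDates(String text) {
        List<LocalDate> found = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            for (DateTimeFormatter format : FORMATS) {
                try {
                    found.add(LocalDate.parse(m.group(1), format));
                    break; // first formatter that accepts the candidate wins
                } catch (DateTimeParseException ignored) {
                    // try the next format
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        String raw = "Invoices dated 2023-06-14, 2021/01/05 and 19991231; ref 99999999 is not a date.";
        List<LocalDate> dates = findDates(raw);
        System.out.println(dates);
        if (!dates.isEmpty()) {
            System.out.println("range: " + dates.get(0) + " .. " + dates.get(dates.size() - 1));
        }
    }
}
```

Normalizing to `LocalDate` first makes the sorting and range steps trivial, since the type is already comparable.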

Perhaps...since we are now in "The Age of AI" (:-)), you could use Apache 
OpenNLP, per this:

https://stackoverflow.com/questions/27182040/how-to-detect-dates-with-opennlp

I've used NLP in other situations...it's not popular but it does the job nicely.

A bit more of a general discussion:

https://www.baeldung.com/cs/finding-dates-addresses-in-emails

Hope this helps.

BOB


From: Jochen Theodorou 
Sent: Wednesday, 14 June 2023 4:42 AM
To: users@groovy.apache.org 
Subject: Re: Existing resources to seek date patterns from raw data and 
normalize them

On 13.06.23 16:52, James McMahon wrote:
> Hello.  I have a task to parse dates out of incoming raw content. Of
> course the date patterns can assume any number of forms - YYYY-MM-DD,
> YYYY/MM/DD, YYYYMMDD, MMDD, etc etc etc. I can build myself a robust
> regex to match a broad set of such patterns in the raw data, but I
> wonder if there is a project or library available for Groovy that
> already offers this?

I always wanted to try one time
https://github.com/joestelmach/natty/tree/master or at least
https://github.com/sisyphsu/dateparser... never came to it ;)

> Assuming I get pattern matches parsed out of my raw data, I will have a
> collection of strings representing year-month-days in a variety of
> formats. I'd then like to normalize them to a standard form so that I
> can sort and compare them. I intend to identify the range of dates in
> the raw data as a sorted Groovy list.

once you have the library identified the format this is the easy step

  [...]
> I intend to write a Groovy script that will run from an Apache NiFi
> ExecuteScript processor. I'll read in my data flowfile content using a
> buffered reader so I can handle flowfiles that may be large.

what does large mean? 1TB? Then BufferedReader may not be the right
choice ;)

bye Jochen


RE: Dynamic assignment of list name in iterator statement?

2023-03-05 Thread Bob Brown

Forgive me if I am misunderstanding, but does one of the following options do 
what you need?

===
println "Option 1"
def langs = ['english', 'spanish']

def englishMethod() { 'hello, old chap!' }
def spanishMethod() { 'hola! Como esta?' }

langs.each { lang ->
  println(lang + ': ' + "${lang}Method"())
}

println "Option 2"
def fns = [
('english'): { -> 'hello, old chap!' },
('spanish'): { -> 'hola! Como esta?' }
]

langs.each { lang ->
  println(lang + ': ' + fns[(lang)]())
}
===

Runs in groovyConsole/4.0.9/jdk20 like:

“””
groovy> println "Option 1"
groovy> def langs = ['english', 'spanish']
groovy> def englishMethod() { 'hello, old chap!' }
groovy> def spanishMethod() { 'hola! Como esta?' }
groovy> langs.each { lang ->
groovy>   println(lang + ': ' + "${lang}Method"())
groovy> }
groovy> println "Option 2"
groovy> def fns = [
groovy> ('english'): { -> 'hello, old chap!' },
groovy> ('spanish'): { -> 'hola! Como esta?' }
groovy> ]
groovy> langs.each { lang ->
groovy>   println(lang + ': ' + fns[(lang)]())
groovy> }

Option 1
english: hello, old chap!
spanish: hola! Como esta?
Option 2
english: hello, old chap!
spanish: hola! Como esta?
Result: [english, spanish]
“””

Note: There is some trickiness associated with MUTABLE “GStrings” as keys. It 
is often best to use string-valued expressions (eg (lang), as shown).
See: 
http://docs.groovy-lang.org/latest/html/documentation/index.html#_gstring_and_string_hashcodes

Note also, it is perfectly OK to have methods named like:

===
def "A very…long…{weird} name!"() { }

def "null"() { true }  // not recommended; potentially confusing
===

The Spock and Geb frameworks use this feature to great effect…

HTH

BOB

From: Søren Berg Glasius 
Sent: Sunday, March 5, 2023 9:32 PM
To: users@groovy.apache.org
Subject: Re: Dynamic assignment of list name in iterator statement?

Hi Jim,

If your switch hits "English" it will also set the rest of the cases. You need 
a "break" after "containsEnglish = true" - just like in Java
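
The fall-through behaviour is easy to see in isolation. The following sketch is plain Java with made-up names (`FallThrough`, `withoutBreak`, `withBreak`) and only three languages; it records which "containsXYZ" flags a given key would set, with and without `break`.

```java
import java.util.ArrayList;
import java.util.List;

public class FallThrough {

    /** Mirrors a switch with no breaks: every case after the match also runs. */
    public static List<String> withoutBreak(String k) {
        List<String> flags = new ArrayList<>();
        switch (k) {
            case "English":
                flags.add("containsEnglish");
            case "Spanish":
                flags.add("containsSpanish");
            case "French":
                flags.add("containsFrench");
            default:
                break;
        }
        return flags;
    }

    /** Same switch with a break after each case: only the matching flag is set. */
    public static List<String> withBreak(String k) {
        List<String> flags = new ArrayList<>();
        switch (k) {
            case "English":
                flags.add("containsEnglish");
                break;
            case "Spanish":
                flags.add("containsSpanish");
                break;
            case "French":
                flags.add("containsFrench");
                break;
            default:
                break;
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("without break: " + withoutBreak("English"));
        System.out.println("with break:    " + withBreak("English"));
    }
}
```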


Best regards,
Søren Berg Glasius


On Sun, 5 Mar 2023 at 09:37, James McMahon <jsmcmah...@gmail.com> wrote:
Was trying to come up with a Groovy way to collapse a lengthy switch statement 
to dynamically building the variable name. I've failed at that. Instead, I've 
fallen back on this option:

 switch("$k") {
   case "English":
containsEnglish = true
   case "Spanish":
containsSpanish = true
   case "French":
containsFrench = true
   case "Japanese":
containsJapanese = true
   case "German":
containsGerman = true
   .
   .
   .
   default:
break
  }

I initialize each of my "containsXYZ" variables to false at the beginning of my 
Groovy script. It works well, though it seems to lack elegance and brevity to 
me.
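
One way to get rid of both the switch and the family of "containsXYZ" variables is to keep the flags in a single map keyed by language name. This is a sketch of that alternative, not what the script above does; it is written in plain Java (it should also run as Groovy) and the names `LanguageFlags` and `flagsFor` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LanguageFlags {

    /** Returns one true/false flag per known language, in the order given. */
    public static Map<String, Boolean> flagsFor(Collection<String> known, Collection<String> detected) {
        Map<String, Boolean> contains = new LinkedHashMap<>();
        for (String language : known) {
            contains.put(language, detected.contains(language));
        }
        return contains;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("English", "Spanish", "French", "Japanese", "German");
        List<String> detected = Arrays.asList("French", "English");
        Map<String, Boolean> contains = flagsFor(known, detected);
        System.out.println(contains);
        // a lookup replaces the old variable: containsFrench becomes contains.get("French")
        System.out.println("French present? " + contains.get("French"));
    }
}
```

Adding a language then means adding one string to the list rather than another case and another variable.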

Thanks again.
Jim

On Sat, Mar 4, 2023 at 5:10 PM James McMahon <jsmcmah...@gmail.com> wrote:
Søren,
May I ask you a follow up? I am trying what I thought I read in your reply 
(thank you for that, by the way). But I continue to get this error:
"The LHS of an assignment should be a variable or a field accessing expression 
@ "

This is what I currently have, attempting to set my variable name to include 
the key drawn from my Groovy map. How must I change this to get it to work?

 mapLanguages.each { k, x ->
  log.warn('mapLanguages entry is this: {} {}', ["$k", "$x"] as 
Object[])
  x.each {
   languageChar -> log.warn('language char in {} is this: {}', 
["$k", "$languageChar"] as Object[])
  }
  "contains${k}" = true
 }

Many thanks again,
Jim

On Thu, Feb 23, 2023 at 3:01 AM Søren Berg Glasius <soe...@glasius.dk> wrote:
Hi Jim,

It is possible:

languages = ['english', 'french', 'spanish']
englishCharsList = ['a','b']
frenchCharsList = ['c','d']
spanishCharsList = ['e','f']

languages.each { lang ->
this."${lang}CharsList".each { ch ->
println "$lang -> $ch"
}
}

Check it out here: 
https://gwc-experiment.appspot.com/?g=groovy_3_0=eJxVjkEKwyAQRfeeYhDBTZobtJtue4PShbVGBRmCY1fBu2e0ppBZDMN__38mGfRf4x3BFZ7aoU-Rgp5AL9mh7RetBpv4EgPfg8n0iFR6xuhJvxn-AmdmmX2YjYozdAwXhiIdP8zO2AAbNAEuNwE8JUSapdqaVv8F8rDyGsY2a45YEoJUowKUDbLjKqrYAZXRSNo


Best regards,
Søren Berg Glasius

On Thu, 23 Feb 2023 at 01:52, James McMahon <jsmcmah...@gmail.com> wrote:
Good evening. I have a list named languageCharactersList. I begin my iteration 
through elements in that list with this:

languageCharactersList.eachWithIndex( it, i ->

I hope to make this more generic, so that I can build a variable name 

RE: Design pattern for processing a huge directory tree of files using GPars

2022-05-13 Thread Bob Brown
A few (untested, off-the-cuff) follow-up thoughts:

(if you have a 32-bit JVM) doing "processedCount++" in multiple threads will 
blow up on you:
https://stackoverflow.com/questions/17481153/long-and-double-assignments-are-not-atomic-how-does-it-matter

You should use something like LongAdder: 
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/atomic/LongAdder.html

And peek might be your friend (https://www.baeldung.com/java-streams-peek-api); 
for the counting part, you could do something like:


streamyStuff.peek(unused -> longAdder.increment()).forEach(msgFile -> );
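
Put together, the counting idea might look like the sketch below. It is untested against real message folders and makes several assumptions: the names `ParallelScan`, `scanFolder` and `process` are placeholders, the regex is matched against the file name only, and the filter is kept before the collect (moving it into the parallel part is the experiment suggested further down). A `LongAdder` is used because a plain `long` counter incremented from several threads can lose updates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelScan {

    /** Processes matching files in one directory in parallel and returns how many were handled. */
    public static long scanFolder(Path directory, Pattern fileMatch) throws IOException {
        LongAdder processed = new LongAdder();
        List<Path> files;
        // depth 1: only the directory itself, not its subdirectories
        try (Stream<Path> walk = Files.walk(directory, 1)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> fileMatch.matcher(p.getFileName().toString()).matches())
                        .collect(Collectors.toList());
        }
        files.parallelStream()
             .peek(unused -> processed.increment()) // thread-safe counting step
             .forEach(ParallelScan::process);       // the actual work
        return processed.sum();
    }

    /** Placeholder for the real per-file work. */
    static void process(Path file) {
        // intentionally empty in this sketch
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scan-demo");
        Files.createFile(dir.resolve("a.msg"));
        Files.createFile(dir.resolve("b.msg"));
        Files.createFile(dir.resolve("c.txt"));
        System.out.println("processed: " + scanFolder(dir, Pattern.compile(".*\\.msg")));
    }
}
```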

*I* think it is a good idea to keep each step in a stream as "single-minded" as 
possible: makes working with tools like debuggers, 
https://www.baeldung.com/intellij-debugging-java-streams, etc. easier.

*I* really think you should experiment with the position of the filter. You 
may not need to worry about 'wasted' work if you have lots of threads "chipping 
in" to get the work done. Your way guarantees that only 1 thread can be 
dedicated to the filtering, so no speedup is possible for that step. In the 
parallel situation, even if 50% of the files are rejected by the filter, you 
might still get an approx. (#threads / 2) speedup.
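
A rough way to try both filter positions side by side might look like the 
following (Java) sketch. The class and method names are invented for this 
example, and unlike the snippet quoted below it matches the regex against the 
file name only, which is an assumption on my part:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterPlacementSketch {

    // Shared predicate: a regular file whose name matches the pattern.
    static boolean wanted(Path path, Pattern fileMatch) {
        return Files.isRegularFile(path)
                && fileMatch.matcher(path.getFileName().toString()).matches();
    }

    // Variant A: filter on the sequential walk, then go parallel.
    static List<Path> filterThenParallel(Path dir, Pattern fileMatch) {
        try (Stream<Path> walk = Files.walk(dir, 1)) {
            return walk.filter(p -> wanted(p, fileMatch))
                       .collect(Collectors.toList())
                       .parallelStream()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Variant B: go parallel first, so the filter itself runs on many threads.
    static List<Path> parallelThenFilter(Path dir, Pattern fileMatch) {
        try (Stream<Path> walk = Files.walk(dir, 1)) {
            return walk.collect(Collectors.toList())
                       .parallelStream()
                       .filter(p -> wanted(p, fileMatch))
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both variants should select the same files; which one is faster depends on how 
expensive the filter is and how many files there are, so it is worth timing 
each on a real directory.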

This all seems so 'groovy' to me that I haven't looked for even-more-Groovy 
ways...

BOB

From: Merlin Beedell 
Sent: Friday, 13 May 2022 7:05 PM
To: users@groovy.apache.org
Subject: RE: Design pattern for processing a huge directory tree of files using 
GPars

Thank you Bob, that did work for me.
Some Java syntax is new to me - like this .map(Path::toFile). Back to school 
again.
This is standard Java, which is pretty groovy already, but I wonder if this 
could be (or already has been) groovy-ised in some way, e.g. to simplify the 
Files.walk(..).collect(..).parallelStream().
I put the filter before the collect - on the assertion that it would be more 
efficient to skip unnecessary files before adding to the parallel processing.
In the following snippet I include a processedCount counter - and although this 
works, I am aware that altering things outside of the parallel process can be 
bad.

import java.nio.file.*
import java.util.regex.Pattern
import java.util.stream.*

long scanFolder (File directory, Pattern fileMatch)
{
    long processedCount = 0
    Files.walk(directory.toPath(), 1)  // just walk the current directory, not subdirectories
        .filter(p -> (Files.isRegularFile(p) && p.toString().matches(fileMatch)))  // skip files that do not match a regex pattern
        .collect(Collectors.toList())
        .parallelStream()
        .map(Path::toFile)
        .forEach( msgFile -> {

            processedCount++  // shared counter updated from several threads: not thread-safe
        } )
    return processedCount
}
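
One possible way to get the count without touching anything outside the 
parallel pipeline is to let the stream do the counting as a reduction. This is 
only a sketch in plain Java, under the assumption that each file is handled by 
a method of its own; ScanFolderSketch and process are hypothetical names:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ScanFolderSketch {

    // Hypothetical placeholder for the real per-file work.
    static void process(File msgFile) {
    }

    static long scanFolder(File directory, Pattern fileMatch) {
        List<Path> matching;
        // Walk only the given directory (depth 1) and keep matching regular files
        try (Stream<Path> walk = Files.walk(directory.toPath(), 1)) {
            matching = walk
                .filter(p -> Files.isRegularFile(p)
                        && fileMatch.matcher(p.toString()).matches())
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Each element contributes 1 after it has been processed; sum() combines
        // the per-thread partial results, so no shared counter is needed.
        return matching.parallelStream()
            .map(Path::toFile)
            .mapToLong(msgFile -> {
                process(msgFile);
                return 1L;
            })
            .sum();
    }
}
```

As in the snippet above, the pattern is matched against the whole path string, 
so a call could look like scanFolder(new File("/some/dir"), 
Pattern.compile(".*\\.eml")).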

Merlin Beedell

From: Bob Brown <b...@transentia.com.au>
Sent: 10 May 2022 09:19
To: users@groovy.apache.org
Subject: RE: Design pattern for processing a huge directory tree of files using 
GPars

If you are able to use a modern Java implementation, you can use pure-Java 
streams, eg:

https://stackoverflow.com/a/66044221

///
// requires: import java.nio.file.*; import java.util.stream.Collectors;
Files.walk(Paths.get("/path/to/root/directory")) // create a stream of paths
    .collect(Collectors.toList()) // collect paths into a list to parallelize better
    .parallelStream() // process this stream in multiple threads
    .filter(Files::isRegularFile) // filter out any non-files (such as directories)
    .map(Path::toFile) // convert Path to File object
    .sorted((a, b) -> Long.compare(a.lastModified(), b.lastModified())) // sort files by date
    .limit(500) // limit processing to 500 files (optional)
    .forEachOrdered(f -> {
        // do processing here
        System.out.println(f);
    });
///

also read : 
https://www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams

Hope this helps some.

BOB


From: Merlin Beedell <mbeed...@cryoserver.com>
Sent: Monday, 9 May 2022 8:12 PM
To: users@groovy.apache.org
Subject: Design pattern for processing a huge directory tree of files using 
GPars

I am trying to process millions of files, spread over a tree of directories.  
At the moment I can collect the set of top level directories into a list and 
then process these in parallel using GPars with list processing (e.g. 
.eachParallel).
But what would be more efficient would be a 'parallel' for the File handling 
routines, for example:

   withPool() {
  directory.eachFileMatchParallel (FILES, ~/($fileMatch)/) {aFile ->  ...

then I would be a very happy bunny!

I know I could copy the list of matching files into an Array list and then use 
the withPool { filesArray.eachParallel { ... - but this does not seem like an 
efficient solution - especially if there are several hundred thousand files in 
a directory.

What design pattern(s) might be better to consider using?

Merlin Beedell



RE: Design pattern for processing a huge directory tree of files using GPars

2022-05-10 Thread Bob Brown
If you are able to use a modern Java implementation, you can use pure-Java 
streams, eg:

https://stackoverflow.com/a/66044221

///
// requires: import java.nio.file.*; import java.util.stream.Collectors;
Files.walk(Paths.get("/path/to/root/directory")) // create a stream of paths
    .collect(Collectors.toList()) // collect paths into a list to parallelize better
    .parallelStream() // process this stream in multiple threads
    .filter(Files::isRegularFile) // filter out any non-files (such as directories)
    .map(Path::toFile) // convert Path to File object
    .sorted((a, b) -> Long.compare(a.lastModified(), b.lastModified())) // sort files by date
    .limit(500) // limit processing to 500 files (optional)
    .forEachOrdered(f -> {
        // do processing here
        System.out.println(f);
    });
///

also read : 
https://www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams

Hope this helps some.

BOB


From: Merlin Beedell 
Sent: Monday, 9 May 2022 8:12 PM
To: users@groovy.apache.org
Subject: Design pattern for processing a huge directory tree of files using 
GPars

I am trying to process millions of files, spread over a tree of directories.  
At the moment I can collect the set of top level directories into a list and 
then process these in parallel using GPars with list processing (e.g. 
.eachParallel).
But what would be more efficient would be a 'parallel' for the File handling 
routines, for example:

   withPool() {
  directory.eachFileMatchParallel (FILES, ~/($fileMatch)/) {aFile ->  ...

then I would be a very happy bunny!

I know I could copy the list of matching files into an Array list and then use 
the withPool { filesArray.eachParallel { ... - but this does not seem like an 
efficient solution - especially if there are several hundred thousand files in 
a directory.

What design pattern(s) might be better to consider using?

Merlin Beedell