Hi All,

I am closing this vote and will use everything we have learned to create a new
release candidate.
I would like to thank everyone for their vote and insights!

Cheers,
Hans

On Mon, Dec 21, 2020 at 12:59 PM Matt Casters
<[email protected]> wrote:

> Well, we're apparently still carrying around old archived kettle code which
> hasn't been ported to Hop yet.  I'm in favor of getting rid of it since
> it's still available elsewhere.
> Same goes for the old samples.  So that should clear out most of the
> ignored code and files.
>
> https://issues.apache.org/jira/browse/HOP-2335 : remove archive-samples
> https://issues.apache.org/jira/browse/HOP-2336 : Remove the
> archive-pipeline-transforms folder
>
> On the other hand we'll be building up integration tests since we do want
> to do things better than before.
> These tests will indeed use very old FoxPro files to check if these .dbf
> files are still being read as they should.  You'd be surprised how many of
> those are still around.
>
> https://issues.apache.org/jira/browse/HOP-2325 : .properties files
> https://issues.apache.org/jira/browse/HOP-2326 : .sh and .bat files
> https://issues.apache.org/jira/browse/HOP-2327 : .xml files
>
> Those 3 cover over 4000 files so that's that.
>
> So your logic makes a lot of sense.  We'll continue to exclude files like
> SVG and indeed Hop Pipelines and Workflows (all XML variants but considered
> binary files).
>
> Cheers,
>
> Matt
>
>
> On Sun, Dec 20, 2020 at 7:32 PM Julian Hyde <[email protected]>
> wrote:
>
> >
> >
> > > On Dec 20, 2020, at 1:18 AM, Matt Casters <[email protected]> wrote:
> > >
> > > Thank you very much Julian.
> > > I mainly wonder where on earth that font comes from since we're not using
> > > it anywhere.
> >
> > Yeah, fonts have a habit of sneaking in. :)
> >
> > > As for RAT exclusions: are there any particular file formats besides .java
> > > files that need an Apache license header? We'd be happy to add them
> > > elsewhere.
> > > The shell scripts perhaps, as they support comments?  We could even add them
> > > to the SVG files, even though that will probably blow up memory consumption
> > > unless we strip the comments out when loading the files somehow.
> > > Perhaps it's easier to just look at other projects and ask which files need
> > > a header?
> >
> > My preference is to put a header on pretty much any file that can have a
> > header, which in my experience is pretty much all text files, except those
> > used as test inputs or reference logs. For example, in .md files you can
> > add the header inside comments that do not appear in the generated HTML.
> > Shell scripts, pom files, properties files, etc. all support comments, so
> > we should add headers.
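
To get a rough list of comment-capable files that still lack a header, something like the sketch below might help. It is not how Apache RAT works, only a plain substring check on the opening phrase of the standard ASF source header. The names ASF_MARKER, TEXT_SUFFIXES, has_license_header and find_missing_headers are made up for this example, and the suffix list is an assumption based on the file types mentioned in this thread.

```python
from pathlib import Path

# Opening phrase of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation"

# Assumed selection of comment-capable file types; adjust per project.
TEXT_SUFFIXES = {".java", ".sh", ".bat", ".properties", ".xml", ".md"}


def has_license_header(text, max_lines=30):
    """Return True if the marker phrase appears in the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return ASF_MARKER in head


def find_missing_headers(root, suffixes=TEXT_SUFFIXES):
    """Yield files under root that have a listed suffix but no header marker."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_license_header(text):
                yield path
```

Calling list(find_missing_headers(".")) from the root of a checkout would list the candidates. Files such as SVG are skipped simply because their suffix is not in the set.
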
> >
> > I agree, I would not put a header on SVG files because they are treated as
> > de facto binaries and they need to be small.
> >
> > I suggest that for 0.60 we pare down the RAT exclusions to the absolute
> > minimum. RAT is a powerful tool if we are not holding it back! I ran RAT
> > with the -debug flag and I saw lots of Java files being excluded, and that
> > was concerning.
> >
> > Binary files are always a problem. They are just as susceptible to
> > copyright and licensing issues but are more difficult to audit. One
> > strategy is to audit them one by one and add an exclusion line for each
> > individual file. I know that’s a big task, so definitely not for 0.50.
> >
> > By the way, I ran a command to find out what kinds of files are in Hop.
> > The results are interesting. There’s even one FoxPro file in there!:
> >
> > $ git ls-files -z | xargs -0 file -b | sort | uniq -c
> >    2827 ASCII text
> >       9 ASCII text, with CRLF, LF line terminators
> >      47 ASCII text, with CRLF line terminators
> >       3 ASCII text, with CR line terminators
> >      16 ASCII text, with no line terminators
> >     428 ASCII text, with very long lines
> >       2 Big-endian UTF-16 Unicode text, with no line terminators
> >       7 Bourne-Again shell script, ASCII text executable
> >       1 Bourne-Again shell script, ASCII text executable, with very long lines
> >       2 bzip2 compressed data, block size = 900k
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Matthias Hietland Heie, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Fri Nov 17 14:48:53 2017, Last Saved Time/Date: Tue Jun 18 09:34:04 2019, Security: 0
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Sergio Ribeiro, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 11 09:41:24 2018, Last Saved Time/Date: Tue Sep 11 10:20:56 2018, Security: 0
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Sergio Ribeiro, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 11 09:41:24 2018, Last Saved Time/Date: Tue Sep 11 10:55:49 2018, Security: 0
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Author: JB, Revision Number: 3, Total Editing Time: 02:08, Create Time/Date: Thu Oct 27 19:46:23 2011, Last Saved Time/Date: Thu Feb 20 09:00:44 2014
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.0, Code page: 0
> >       1 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.0, Code page: 1252, Author: Jens Bleuel, Last Saved By: Jens Bleuel, Name of Creating Application: Microsoft Excel, Create Time/Date: Wed Aug 23 15:46:56 2006, Last Saved Time/Date: Wed Aug 23 15:56:14 2006, Security: 0
> >       1 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Matt Casters, Last Saved By: Matt Casters, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep  7 16:08:18 2010, Last Saved Time/Date: Tue Sep  7 16:15:32 2010, Security: 0
> >       2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Last Saved By: Jens Bleuel, Name of Creating Application: Microsoft Excel, Create Time/Date: Thu Oct 17 06:27:31 1996, Last Saved Time/Date: Tue Nov 28 15:07:48 2006, Security: 0
> >       5 C source, ASCII text
> >       7 C++ source, ASCII text
> >      25 CSV text
> >       1 data
> >       3 DOS batch file, ASCII text
> >       1 Embedded OpenType (EOT), icomoon family
> >       1 Embedded OpenType (EOT), OpenSansLight family
> >       1 Embedded OpenType (EOT), OpenSansRegular family
> >      28 empty
> >       9 exported SGML document, ASCII text
> >       1 FoxBase+/dBase III DBF, 279 records * 52, update-date 106-7-25,
> > codepage ID=0xf, at offset 161 1st record "     1das ist doch keine
> > leistung        44.00hw   *     2Meister                            48"
> >       2 GIF image data, version 89a, 16 x 16
> >       1 GIF image data, version 89a, 9 x 9
> >       1 gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 703
> >       2 gzip compressed data, was "default.csv", last modified: Wed Aug 26 08:50:54 2015, from Unix, original size modulo 2^32 67
> >      30 HTML document, ASCII text
> >       1 HTML document, ASCII text, with very long lines
> >       2 HTML document, UTF-8 Unicode text
> >       1 ISO-8859 text
> >       1 ISO-8859 text, with CR line terminators
> >       3 ISO-8859 text, with very long lines
> >    3179 Java source, ASCII text
> >       1 Java source, ASCII text, with CRLF, LF line terminators
> >       1 Java source, ASCII text, with very long lines
> >      13 Java source, UTF-8 Unicode text
> >       1 JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, progressive, precision 8, 400x400, components 3
> >      29 JSON data
> >       2 Little-endian UTF-16 Unicode text, with CRLF line terminators
> >       2 Little-endian UTF-16 Unicode text, with no line terminators
> >      10 Microsoft Excel 2007+
> >       1 Microsoft OOXML
> >       1 MS Windows icon resource - 1 icon, 32x32, 24 bits/pixel
> >       2 MS Windows icon resource - 1 icon, 32x32, 32 bits/pixel
> >       3 Non-ISO extended-ASCII text, with no line terminators
> >       5 OpenDocument Spreadsheet
> >       1 PNG image data, 1244 x 686, 8-bit/color RGB, non-interlaced
> >       1 PNG image data, 1460 x 816, 8-bit/color RGB, non-interlaced
> >       2 PNG image data, 15 x 15, 8-bit/color RGBA, non-interlaced
> >       1 PNG image data, 1680 x 1050, 8-bit/color RGB, non-interlaced
> >       3 PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced
> >       2 PNG image data, 22 x 22, 8-bit/color RGB, non-interlaced
> >       1 PNG image data, 403 x 138, 8-bit/color RGB, non-interlaced
> >       4 PNG image data, 4702 x 1702, 8-bit/color RGB, non-interlaced
> >       4 PNG image data, 5010 x 1990, 8-bit/color RGB, non-interlaced
> >       1 PNG image data, 551 x 626, 8-bit/color RGB, non-interlaced
> >       1 PNG image data, 642 x 368, 8-bit/color RGBA, non-interlaced
> >       1 PNG image data, 972 x 464, 8-bit/color RGB, non-interlaced
> >       3 ReStructuredText file, ASCII text
> >       1 ReStructuredText file, ASCII text, with very long lines
> >       2 SAS
> >     654 SVG Scalable Vector Graphics image
> >       1 TIFF image data, big-endian, direntries=16, height=16, bps=0, compression=none, PhotometricIntepretation=RGB, orientation=upper-left, width=16
> >       1 TrueType Font data, 11 tables, 1st "OS/2", 14 names, Macintosh, type 1 string, icomoon
> >       1 TrueType Font data, 18 tables, 1st "FFTM", 26 names, Macintosh
> >       1 TrueType Font data, 18 tables, 1st "FFTM", 30 names, Macintosh
> >       2 Unicode text, UTF-32, big-endian
> >       2 Unicode text, UTF-32, little-endian
> >     385 UTF-8 Unicode text
> >       2 UTF-8 Unicode text, with no line terminators
> >      40 UTF-8 Unicode text, with very long lines
> >       2 UTF-8 Unicode (with BOM) text, with no line terminators
> >       1 Visual FoxPro DBF, 2 records * 205, update-date 15-10-20, at
> > offset 129 1st record "value11
> >                                           "
> >       1 Web Open Font Format, TrueType, length 1168, version 1.0
> >       1 Web Open Font Format, TrueType, length 67528, version 1.10
> >       1 Web Open Font Format, TrueType, length 69392, version 1.10
> >     958 XML 1.0 document, ASCII text
> >       1 XML 1.0 document, ASCII text, with CRLF, LF line terminators
> >      82 XML 1.0 document, ASCII text, with very long lines
> >       1 XML 1.0 document, ASCII text, with very long lines, with no line terminators
> >       1 XML 1.0 document, UTF-8 Unicode text
> >       2 XML 1.0 document, UTF-8 Unicode text, with very long lines
> >       1 XML 1.0 document, UTF-8 Unicode (with BOM) text
> >       2 Zip data (MIME type "application/vnd.pentaho.reporting.classic"?)
> >
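
For anyone who wants to post-process such a listing (for instance to group entries before deciding on exclusions), here is a minimal Python sketch of the same pipeline. It assumes git and the file utility are on the PATH. The function names tally and describe_tracked_files and the batch_size parameter are invented for this example, and the one-description-per-line parsing is an assumption about the output of file -b.

```python
import os
import subprocess
from collections import Counter


def tally(descriptions):
    """Count identical descriptions, like `sort | uniq -c`."""
    return Counter(d.strip() for d in descriptions if d.strip())


def describe_tracked_files(repo=".", batch_size=500):
    """Return the `file -b` description of every path git tracks in repo."""
    listing = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo, capture_output=True, check=True
    ).stdout
    paths = [os.fsdecode(p) for p in listing.split(b"\0") if p]
    descriptions = []
    # Batch the paths so each argument list stays short, much as xargs does.
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        result = subprocess.run(
            ["file", "-b", "--"] + batch,
            cwd=repo,
            capture_output=True,
            check=True,
        )
        descriptions.extend(
            result.stdout.decode("utf-8", errors="replace").splitlines()
        )
    return descriptions
```

Running tally(describe_tracked_files()) in a checkout gives a Counter keyed by description, which can then be sorted or filtered as needed.
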
> > Julian
> >
> >
>
> --
> Neo4j Chief Solutions Architect
> *✉   *[email protected]
> ☎  +32486972937
>
