[GitHub] [commons-collections] dota17 opened a new pull request #118: Fixed the typo and handled the NPE with Objects.requireNonNull in Abstr…

2019-11-24 Thread GitBox
dota17 opened a new pull request #118: Fixed the typo and handled the NPE with 
Objects.requireNonNull in Abstr…
URL: https://github.com/apache/commons-collections/pull/118
 
 
   Fixed the typo and handled the NPE with Objects.requireNonNull in 
AbstractLinkedList


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-compress] coveralls commented on issue #87: COMPRESS-124 : Add support for extracting sparse entries from tar archives

2019-11-24 Thread GitBox
coveralls commented on issue #87: COMPRESS-124 : Add support for extracting 
sparse entries from tar archives
URL: https://github.com/apache/commons-compress/pull/87#issuecomment-557956506
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/27209256/badge)](https://coveralls.io/builds/27209256)
   
   Coverage increased (+0.3%) to 86.979% when pulling 
**269c522e3910a9c197856e9054a8d0ef7649b322 on PeterAlfreadLee:COMPRESS-124** 
into **28049d2f89a76c5a4845c12151a483ca7773fb6f on apache:master**.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (COMPRESS-124) Unable to extract sparse entries from tar archives

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-124?focusedWorklogId=348883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348883
 ]

ASF GitHub Bot logged work on COMPRESS-124:
---

Author: ASF GitHub Bot
Created on: 25/Nov/19 01:49
Start Date: 25/Nov/19 01:49
Worklog Time Spent: 10m 
  Work Description: coveralls commented on issue #87: COMPRESS-124 : Add 
support for extracting sparse entries from tar archives
URL: https://github.com/apache/commons-compress/pull/87#issuecomment-557956506
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/27209256/badge)](https://coveralls.io/builds/27209256)
   
   Coverage increased (+0.3%) to 86.979% when pulling 
**269c522e3910a9c197856e9054a8d0ef7649b322 on PeterAlfreadLee:COMPRESS-124** 
into **28049d2f89a76c5a4845c12151a483ca7773fb6f on apache:master**.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348883)
Time Spent: 20m  (was: 10m)

> Unable to extract sparse entries from tar archives
> 
>
> Key: COMPRESS-124
> URL: https://issues.apache.org/jira/browse/COMPRESS-124
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Affects Versions: 1.1, 1.2
> Environment: Platform independent. However, I'm currently using 
> Windows 7 Enterprise.
>Reporter: Patrick Dreyer
>Priority: Major
>  Labels: tar
> Attachments: gnuSparseFile.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Good news first: I already have the patch ready for that.
> I got several TAR files which I could not extract with any of the existing 
> Java implementations, but I could extract all those TAR files successfully 
> with GNU tar.
> It turned out that all the failing TAR files contained so-called sparse 
> files. Investigating the source code of all existing Java TAR implementations 
> showed me that none of them even recognizes the existence of GNU sparse 
> entries.
> Actually, I don't need to process any of the contained sparse files and I'm 
> happy if I'm at least able to correctly untar all the non-sparse files. 
> Thus, it would be sufficient to recognize sparse files without the need to 
> correctly un-sparse them while extracting. As long as all non-sparse files 
> get extracted correctly, I'm fine.
> The TAR files in question have all been VMware Diagnostic File bundles.
> See 
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=653
>  to know how to get them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (COMPRESS-124) Unable to extract sparse entries from tar archives

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-124?focusedWorklogId=348880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348880
 ]

ASF GitHub Bot logged work on COMPRESS-124:
---

Author: ASF GitHub Bot
Created on: 25/Nov/19 01:40
Start Date: 25/Nov/19 01:40
Worklog Time Spent: 10m 
  Work Description: PeterAlfreadLee commented on pull request #87: 
COMPRESS-124 : Add support for extracting sparse entries from tar archives
URL: https://github.com/apache/commons-compress/pull/87
 
 
   [COMPRESS-124](https://issues.apache.org/jira/browse/COMPRESS-124)
   Add support for extracting sparse entries from tar archives, including:
   
   1. Old GNU Format
   The sparse map of the old GNU format is stored in the tar header with only 
4 sparse headers. If more sparse headers exist, they are stored in the 
extension block.
   
   2. PAX 0.0 Format
   The sparse map of the PAX 0.0 format is stored in tar headers with 
`GNU.sparse.offset` and `GNU.sparse.numbytes`. These may appear more than 
once, so they cannot be stored in a map.
   
   3. PAX 0.1 Format
   The sparse map of the PAX 0.1 format is stored in tar headers with 
`GNU.sparse.map`.
   
   4. PAX 1.0 Format
   The sparse map of the PAX 1.0 format is stored at the head of the data file 
block. Therefore we need to handle it separately when we encounter a PAX 1.0 
tar.
   
   The implementation is mainly based on TarArchiveSparseInputStream. It 
returns zeros when reading the "holes", and the file data from the original 
tar input stream when reading the sparse content; which one applies is decided 
by the sparse headers. The TarArchiveSparseInputStream reads all sparse 
headers, creates all-zero input streams wrapped in BoundedInputStream and 
slices of the actual tar input stream wrapped in BoundedInputStream, and 
combines them into a single TarArchiveSparseInputStream. Then we can read 
directly from the TarArchiveSparseInputStream when we encounter a sparse tar 
entry.
   
   TODO: I noticed in the code that there is some kind of "star sparse data". 
I didn't find it in the [GNU Tar Manual](https://www.gnu.org/software/tar/manual), 
so I don't know how to parse such tars.
   
   Please let me know if something needs to be modified. :-) @bodewig 
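
   A rough sketch of the reassembly idea described above (the SparseBlock, 
zeros, slice and reassemble helpers below are hypothetical simplifications for 
illustration, not the classes added by this PR): a sparse entry is rebuilt by 
alternating bounded all-zero streams for the holes with bounded slices of the 
archive stream for the stored data.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SparseReassemblySketch {

    /** One (offset, numbytes) pair from a sparse map. */
    record SparseBlock(long offset, long numbytes) {}

    /** A bounded run of zero bytes standing in for a hole. */
    static InputStream zeros(long n) {
        return new ByteArrayInputStream(new byte[(int) n]);
    }

    /** A bounded slice of the archive stream holding real data (simplified for the sketch). */
    static InputStream slice(InputStream archive, long n) throws IOException {
        return new ByteArrayInputStream(archive.readNBytes((int) n));
    }

    /** Alternate zero-filled holes and data slices, then expose them as one stream. */
    static InputStream reassemble(List<SparseBlock> sparseMap, InputStream archive, long realSize)
            throws IOException {
        List<InputStream> parts = new ArrayList<>();
        long position = 0;
        for (SparseBlock block : sparseMap) {
            parts.add(zeros(block.offset() - position));   // hole before this data block
            parts.add(slice(archive, block.numbytes()));   // the stored data itself
            position = block.offset() + block.numbytes();
        }
        parts.add(zeros(realSize - position));             // trailing hole, if any
        return new SequenceInputStream(Collections.enumeration(parts));
    }
}
{code}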
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348880)
Remaining Estimate: 0h
Time Spent: 10m

> Unable to extract sparse entries from tar archives
> 
>
> Key: COMPRESS-124
> URL: https://issues.apache.org/jira/browse/COMPRESS-124
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Affects Versions: 1.1, 1.2
> Environment: Platform independent. However, I'm currently using 
> Windows 7 Enterprise.
>Reporter: Patrick Dreyer
>Priority: Major
>  Labels: tar
> Attachments: gnuSparseFile.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Good news first: I already have the patch ready for that.
> I got several TAR files which I could not extract with any of the existing 
> Java implementations, but I could extract all those TAR files successfully 
> with GNU tar.
> It turned out that all the failing TAR files contained so-called sparse 
> files. Investigating the source code of all existing Java TAR implementations 
> showed me that none of them even recognizes the existence of GNU sparse 
> entries.
> Actually, I don't need to process any of the contained sparse files and I'm 
> happy if I'm at least able to correctly untar all the non-sparse files. 
> Thus, it would be sufficient to recognize sparse files without the need to 
> correctly un-sparse them while extracting. As long as all non-sparse files 
> get extracted correctly, I'm fine.
> The TAR files in question have all been VMware Diagnostic File bundles.
> See 
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=653
>  to know how to get them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-compress] PeterAlfreadLee opened a new pull request #87: COMPRESS-124 : Add support for extracting sparse entries from tar archives

2019-11-24 Thread GitBox
PeterAlfreadLee opened a new pull request #87: COMPRESS-124 : Add support for 
extracting sparse entries from tar archives
URL: https://github.com/apache/commons-compress/pull/87
 
 
   [COMPRESS-124](https://issues.apache.org/jira/browse/COMPRESS-124)
   Add support for extracting sparse entries from tar archives, including:
   
   1. Old GNU Format
   The sparse map of the old GNU format is stored in the tar header with only 
4 sparse headers. If more sparse headers exist, they are stored in the 
extension block.
   
   2. PAX 0.0 Format
   The sparse map of the PAX 0.0 format is stored in tar headers with 
`GNU.sparse.offset` and `GNU.sparse.numbytes`. These may appear more than 
once, so they cannot be stored in a map.
   
   3. PAX 0.1 Format
   The sparse map of the PAX 0.1 format is stored in tar headers with 
`GNU.sparse.map`.
   
   4. PAX 1.0 Format
   The sparse map of the PAX 1.0 format is stored at the head of the data file 
block. Therefore we need to handle it separately when we encounter a PAX 1.0 
tar.
   
   The implementation is mainly based on TarArchiveSparseInputStream. It 
returns zeros when reading the "holes", and the file data from the original 
tar input stream when reading the sparse content; which one applies is decided 
by the sparse headers. The TarArchiveSparseInputStream reads all sparse 
headers, creates all-zero input streams wrapped in BoundedInputStream and 
slices of the actual tar input stream wrapped in BoundedInputStream, and 
combines them into a single TarArchiveSparseInputStream. Then we can read 
directly from the TarArchiveSparseInputStream when we encounter a sparse tar 
entry.
   
   TODO: I noticed in the code that there is some kind of "star sparse data". 
I didn't find it in the [GNU Tar Manual](https://www.gnu.org/software/tar/manual), 
so I don't know how to parse such tars.
   
   Please let me know if something needs to be modified. :-) @bodewig 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-codec] coveralls commented on issue #29: [CODEC-270] Base32/64: Fix masked check of the final bits to discard.

2019-11-24 Thread GitBox
coveralls commented on issue #29: [CODEC-270] Base32/64: Fix masked check of 
the final bits to discard.
URL: https://github.com/apache/commons-codec/pull/29#issuecomment-557947630
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/27208428/badge)](https://coveralls.io/builds/27208428)
   
   Coverage increased (+0.2%) to 93.08% when pulling 
**a0447626dcffd6bcd4decd13e4dac1da4661d6f7 on aherbert:fix-CODEC-270** into 
**1b9889762a86ae50657d5ce0048cbf9c5f62ffa7 on apache:master**.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (CODEC-270) Base32 and Base64 still allow decoding some invalid trailing characters

2019-11-24 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981218#comment-16981218
 ] 

Alex Herbert commented on CODEC-270:


Fix is in PR #29.

When implementing the fix I noticed that Base64 ignores a single trailing base 
64 character. This encodes only 6 bits, which is not enough to create a byte. But 
for consistency, should this:
{code:java}
switch (context.modulus) {
    //  case 0 : // impossible, as excluded above
    case 1 : // 6 bits - ignore entirely
        // TODO not currently tested; perhaps it is impossible?
        break;
{code}
be changed to:
{code:java}
switch (context.modulus) {
    //  case 0 : // impossible, as excluded above
    case 1 : // 6 bits = no bytes
        validateCharacter(MASK_6BITS, context);
        break;
{code}
This would then throw under all circumstances except if the single trailing 
digit encoded zero. Given that this is confusing (the '=' character is padding) 
I would instead just throw an exception:
{code:java}
switch (context.modulus) {
    //  case 0 : // impossible, as excluded above
    case 1 : // 6 bits = no bytes
        throw new IllegalArgumentException("Single trailing character is not allowed");
{code}
This would make the following impossible to decode by throwing:
{code:java}
String text = "A";
byte[] bytes = Base64.decodeBase64(text);
{code}
Currently it returns a zero-length byte[], as this test passes:
{code:java}
@Test
public void decodeSmallString() {
    String text = "A";
    byte[] bytes = Base64.decodeBase64(text);
    Assert.assertArrayEquals(new byte[0], bytes);
}
{code}

> Base32 and Base64 still allow decoding some invalid trailing characters
> ---
>
> Key: CODEC-270
> URL: https://issues.apache.org/jira/browse/CODEC-270
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both Base32 and Base64 check that the final bits from the trailing digit that 
> will be discarded are zero.
> The test for the trailing bits in the final digits in Base64 is:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> if ((context.ibitWorkArea & numBitsToDrop) != 0) {
> {code}
> It should be:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> int mask = (1 << numBitsToDrop) - 1;
> if ((context.ibitWorkArea & mask) != 0) {
> {code}
> Likewise in Base32.
> The following base64 is illegal but is still decoded:
> {noformat}
> AB==
> A : 000000
> B : 000001
> byte = 00000000 + 0001 discarded 
> {noformat}
> Here the check for the 4 trailing bits to drop in this case checks only bit 3 
> and ignores bit 1 which is set.
> Same for Base32, this is illegal:
> {noformat}
> AB==
> A : 00000
> B : 00001
> byte = 00000000 + 01 discarded
> {noformat}
> But the check for the 2 trailing bits to drop in this case checks bit 2 and 
> ignores bit 1 which is set.
> Note: The test case using "AC" has bit 2 set and so is flagged as invalid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (CODEC-270) Base32 and Base64 still allow decoding some invalid trailing characters

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CODEC-270?focusedWorklogId=348857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348857
 ]

ASF GitHub Bot logged work on CODEC-270:


Author: ASF GitHub Bot
Created on: 24/Nov/19 23:48
Start Date: 24/Nov/19 23:48
Worklog Time Spent: 10m 
  Work Description: aherbert commented on pull request #29: [CODEC-270] 
Base32/64: Fix masked check of the final bits to discard.
URL: https://github.com/apache/commons-codec/pull/29
 
 
   Fixed the Base32/64 mask check to ensure it checks all the bits to be 
discarded are zero.
   
   Added tests that enumerate all possible trailing characters. Only those with 
zero for the discarded bits should be decoded.
   
   Also fixed the BCodecTest data to use the RFC 1522 "encoded-word" header.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348857)
Remaining Estimate: 0h
Time Spent: 10m

> Base32 and Base64 still allow decoding some invalid trailing characters
> ---
>
> Key: CODEC-270
> URL: https://issues.apache.org/jira/browse/CODEC-270
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both Base32 and Base64 check that the final bits from the trailing digit that 
> will be discarded are zero.
> The test for the trailing bits in the final digits in Base64 is:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> if ((context.ibitWorkArea & numBitsToDrop) != 0) {
> {code}
> It should be:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> int mask = (1 << numBitsToDrop) - 1;
> if ((context.ibitWorkArea & mask) != 0) {
> {code}
> Likewise in Base32.
> The following base64 is illegal but is still decoded:
> {noformat}
> AB==
> A : 000000
> B : 000001
> byte = 00000000 + 0001 discarded 
> {noformat}
> Here the check for the 4 trailing bits to drop in this case checks only bit 3 
> and ignores bit 1 which is set.
> Same for Base32, this is illegal:
> {noformat}
> AB==
> A : 00000
> B : 00001
> byte = 00000000 + 01 discarded
> {noformat}
> But the check for the 2 trailing bits to drop in this case checks bit 2 and 
> ignores bit 1 which is set.
> Note: The test case using "AC" has bit 2 set and so is flagged as invalid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-codec] aherbert opened a new pull request #29: [CODEC-270] Base32/64: Fix masked check of the final bits to discard.

2019-11-24 Thread GitBox
aherbert opened a new pull request #29: [CODEC-270] Base32/64: Fix masked check 
of the final bits to discard.
URL: https://github.com/apache/commons-codec/pull/29
 
 
   Fixed the Base32/64 mask check to ensure it checks all the bits to be 
discarded are zero.
   
   Added tests that enumerate all possible trailing characters. Only those with 
zero for the discarded bits should be decoded.
   
   Also fixed the BCodecTest data to use the RFC 1522 "encoded-word" header.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-geometry] asfgit merged pull request #40: GEOMETRY-32: Refactor Partitioning Code

2019-11-24 Thread GitBox
asfgit merged pull request #40: GEOMETRY-32: Refactor Partitioning Code
URL: https://github.com/apache/commons-geometry/pull/40
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (CODEC-270) Base32 and Base64 still allow decoding some invalid trailing characters

2019-11-24 Thread Alex Herbert (Jira)


 [ 
https://issues.apache.org/jira/browse/CODEC-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert updated CODEC-270:
---
Description: 
Both Base32 and Base64 check that the final bits from the trailing digit that 
will be discarded are zero.

The test for the trailing bits in the final digits in Base64 is:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
if ((context.ibitWorkArea & numBitsToDrop) != 0) {
{code}

It should be:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
int mask = (1 << numBitsToDrop) - 1;
if ((context.ibitWorkArea & mask) != 0) {
{code}

Likewise in Base32.

The following base64 is illegal but is still decoded:

{noformat}
AB==

A : 000000
B : 000001

byte = 00000000 + 0001 discarded 
{noformat}

Here the check for the 4 trailing bits to drop in this case checks only bit 3 
and ignores bit 1 which is set.

Same for Base32, this is illegal:

{noformat}
AB==

A : 00000
B : 00001

byte = 00000000 + 01 discarded
{noformat}

But the check for the 2 trailing bits to drop in this case checks bit 2 and 
ignores bit 1 which is set.

Note: The test case using "AC" has bit 2 set and so is flagged as invalid.



  was:
Both Base32 and Base64 check that the final bits from the trailing digit that 
will be discarded are zero.

The test for the trailing bits in the final digits in Base64 is:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
if ((context.ibitWorkArea & numBitsToDrop) != 0) {
{code}

It should be:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
int mask = (1 << numBitsToDrop) - 1;
if ((context.ibitWorkArea & mask) != 0) {
{code}

Likewise in Base32.

The following currently are illegal but are still decoded:

{noformat}
AB==

A : 000000
B : 000001

byte = 00000000 + 0001 discarded 
{noformat}

But the check for the 4 trailing bits to drop in this case checks bit 3 and 
ignores bit 1 which is set.

Same for Base32

{noformat}
AB==

A : 00000
B : 00001

byte = 00000000 + 01 discarded
{noformat}

But the check for the 2 trailing bits to drop in this case checks bit 2 and 
ignores bit 1 which is set.

Note: The test case using "AC" has bit 2 set and so is flagged as invalid.




> Base32 and Base64 still allow decoding some invalid trailing characters
> ---
>
> Key: CODEC-270
> URL: https://issues.apache.org/jira/browse/CODEC-270
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>
> Both Base32 and Base64 check that the final bits from the trailing digit that 
> will be discarded are zero.
> The test for the trailing bits in the final digits in Base64 is:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> if ((context.ibitWorkArea & numBitsToDrop) != 0) {
> {code}
> It should be:
> {code:java}
> private long validateCharacter(final int numBitsToDrop, final Context 
> context) {
> int mask = (1 << numBitsToDrop) - 1;
> if ((context.ibitWorkArea & mask) != 0) {
> {code}
> Likewise in Base32.
> The following base64 is illegal but is still decoded:
> {noformat}
> AB==
> A : 000000
> B : 000001
> byte = 00000000 + 0001 discarded 
> {noformat}
> Here the check for the 4 trailing bits to drop in this case checks only bit 3 
> and ignores bit 1 which is set.
> Same for Base32, this is illegal:
> {noformat}
> AB==
> A : 00000
> B : 00001
> byte = 00000000 + 01 discarded
> {noformat}
> But the check for the 2 trailing bits to drop in this case checks bit 2 and 
> ignores bit 1 which is set.
> Note: The test case using "AC" has bit 2 set and so is flagged as invalid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CODEC-270) Base32 and Base64 still allow decoding some invalid trailing characters

2019-11-24 Thread Alex Herbert (Jira)
Alex Herbert created CODEC-270:
--

 Summary: Base32 and Base64 still allow decoding some invalid 
trailing characters
 Key: CODEC-270
 URL: https://issues.apache.org/jira/browse/CODEC-270
 Project: Commons Codec
  Issue Type: Bug
Affects Versions: 1.13
Reporter: Alex Herbert
Assignee: Alex Herbert


Both Base32 and Base64 check that the final bits from the trailing digit that 
will be discarded are zero.

The test for the trailing bits in the final digits in Base64 is:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
if ((context.ibitWorkArea & numBitsToDrop) != 0) {
{code}

It should be:
{code:java}
private long validateCharacter(final int numBitsToDrop, final Context context) {
int mask = (1 << numBitsToDrop) - 1;
if ((context.ibitWorkArea & mask) != 0) {
{code}

Likewise in Base32.

The following currently are illegal but are still decoded:

{noformat}
AB==

A : 000000
B : 000001

byte = 00000000 + 0001 discarded 
{noformat}

But the check for the 4 trailing bits to drop in this case checks bit 3 and 
ignores bit 1 which is set.

Same for Base32

{noformat}
AB==

A : 00000
B : 00001

byte = 00000000 + 01 discarded
{noformat}

But the check for the 2 trailing bits to drop in this case checks bit 2 and 
ignores bit 1 which is set.

Note: The test case using "AC" has bit 2 set and so is flagged as invalid.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CODEC-134) Base32 would decode some invalid Base32 encoded string into arbitrary value

2019-11-24 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981199#comment-16981199
 ] 

Alex Herbert commented on CODEC-134:


This is not totally fixed. For example the fix for Base64 uses:
{code:java}
if ((context.ibitWorkArea & numBitsToDrop) != 0) {
{code}
This is called with either 2 or 4 as the numBitsToDrop argument, so it only 
tests whether a single bit of the trailing bits is non-zero. It should be:
{code:java}
int mask = (1 << numBitsToDrop) - 1;
if ((context.ibitWorkArea & mask) != 0) {
{code}
This will check that all the bits to drop are zero.

I will open a new ticket to demonstrate.
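
A small self-contained illustration of that difference (the values below are 
illustrative, not the codec internals): with numBitsToDrop = 4 the old 
expression masks with 0b0100 and tests a single bit, while the fixed expression 
masks with 0b1111 and tests every bit that will be dropped.
{code:java}
public class MaskCheckSketch {
    public static void main(String[] args) {
        final int numBitsToDrop = 4;      // Base64 case: 4 trailing bits are discarded
        final int trailingBits = 0b0001;  // the "AB==" example: lowest discarded bit is set

        // Old check: uses the count itself (0b0100) as the mask, so only bit 3 is tested.
        final boolean oldCheckRejects = (trailingBits & numBitsToDrop) != 0;

        // Fixed check: build a mask covering every bit that will be dropped (0b1111).
        final int mask = (1 << numBitsToDrop) - 1;
        final boolean newCheckRejects = (trailingBits & mask) != 0;

        System.out.println("old check rejects AB== : " + oldCheckRejects); // false
        System.out.println("new check rejects AB== : " + newCheckRejects); // true
    }
}
{code}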

> Base32 would decode some invalid Base32 encoded string into arbitrary value
> ---
>
> Key: CODEC-134
> URL: https://issues.apache.org/jira/browse/CODEC-134
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.6
> Environment: All
>Reporter: Hanson Char
>Assignee: Gary D. Gregory
>Priority: Major
>  Labels: security
> Fix For: 1.13
>
> Attachments: diff-120305-20.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example, there is no byte array value that can be encoded into the string 
> "C5CYMIHWQUUZMKUGZHGEOSJSQDE4L===", but the existing Base32 implementation 
> would not reject it but decode it into an arbitrary value which if re-encoded 
> again using the same implementation would result in the string 
> "C5CYMIHWQUUZMKUGZHGEOSJSQDE4K===".
> Instead of blindly decoding the invalid string, the Base32 codec should 
> reject it (e.g. by throwing IllegalArgumentException) to avoid security 
> exploitation (such as tunneling additional information via seemingly valid 
> base 32 strings).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CODEC-263) Base64.decodeBase64 throw exception

2019-11-24 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981194#comment-16981194
 ] 

Alex Herbert commented on CODEC-263:


When encoding bytes which are 8-bit into 6-bit characters you can encode 3 
bytes using 4 characters. Any trailing bytes leftover when dividing your byte 
length by 3 (i.e. 1 or 2 trailing bytes) are encoded as either 2 or 3 
characters and padded with the '=' character.

If you have 1 trailing byte it is encoded using 1 6-bit character plus a 
second 6-bit character whose most significant 2 bits carry the remaining bits. 
The final 4 bits of that second character should be zero.

If you have 2 trailing bytes they are encoded using 2 6-bit characters plus a 
third 6-bit character whose most significant 4 bits carry the remaining bits. 
The final 2 bits of that third character should be zero.

When decoding you collate the bits from 4 6-bit characters to make 3 8-bit 
bytes. If we remove all the characters that can be collected from your text 
file we have the following remaining at the end and the decoder throws an 
exception:
{noformat}
final String message = "/9o=";
Base64.decodeBase64(message);
{noformat}
The 6-bit hex digits are:
{noformat}
/ = 63 = 111111
9 = 61 = 111101
n = 39 = 100111
{noformat}
This is packed to make the following 2 bytes with 2 bits to discard:
{noformat}
111111111101100111
{noformat}
Since 1.13 the decoder checks that the trailing bits to be discarded are zero. 
In the case above the bits are not zero, so decoding produces an error since 
the encoding is illegal.

The same can be done for the final characters of "publishMessage":
{noformat}
final String message = "ge==";
Base64.decodeBase64(message);
{noformat}
The 6-bit hex digits are:
{noformat}
g = 32 = 100000
e = 30 = 011110
{noformat}
This is packed to make the following 1 byte with 4 bits to discard:
{noformat}
100000011110
{noformat}
Again in the case above the bits are not zero and so decoding produces an error 
since the encoding is illegal.

 

The question is should {{Base64.isBase64(String)}} do the same validation on 
the final character of the string?
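
The arithmetic above for the trailing "ge==" group can be checked with a few 
lines (illustrative only; the real decoder accumulates these bits in its 
Context):
{code:java}
public class TrailingBitsSketch {
    public static void main(String[] args) {
        final int g = 32;               // 'g' -> 100000 in the base 64 alphabet
        final int e = 30;               // 'e' -> 011110

        final int packed = (g << 6) | e;      // 100000011110 : 12 bits from two characters
        final int decodedByte = packed >> 4;  // 10000001     : the single byte that is kept
        final int droppedBits = packed & 0xF; // 1110         : the 4 bits that are discarded

        // Non-zero dropped bits mean the encoding is illegal, which is why 1.13 throws.
        System.out.println("decoded byte = " + Integer.toBinaryString(decodedByte));
        System.out.println("dropped bits = " + Integer.toBinaryString(droppedBits));
    }
}
{code}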

> Base64.decodeBase64 throw exception
> ---
>
> Key: CODEC-263
> URL: https://issues.apache.org/jira/browse/CODEC-263
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
> Environment: JDK 7/JDK 8 
> commons-codec 1.13
>Reporter: xie tao
>Priority: Critical
> Attachments: image-jpg-01-big.base64.txt
>
>
> After upgrading Codec to 1.13, the code throws an exception as follows:
> {code:java}
>   @Test
>   public  void test(){
> Base64.decodeBase64("publishMessage");
>   }
> {code}
> exception like:
> {code:java}
> java.lang.IllegalArgumentException: Last encoded character (before the 
> paddings if any) is a valid base 64 alphabet but not a possible value
>   at 
> org.apache.commons.codec.binary.Base64.validateCharacter(Base64.java:798)
>   at org.apache.commons.codec.binary.Base64.decode(Base64.java:472)
>   at 
> org.apache.commons.codec.binary.BaseNCodec.decode(BaseNCodec.java:412)
>   at 
> org.apache.commons.codec.binary.BaseNCodec.decode(BaseNCodec.java:395)
>   at org.apache.commons.codec.binary.Base64.decodeBase64(Base64.java:694)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-32) BSPTree Updates

2019-11-24 Thread Matt Juntunen (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981171#comment-16981171
 ] 

Matt Juntunen commented on GEOMETRY-32:
---

bq. Isn't it possible to increase the number of allowed parameters?

It is, but I think we still want the project as a whole to follow the current 
max parameter number. Each exception to that rule should be called out 
explicitly.

bq. Note: Javadoc for the linearCombination methods needs fixing

Good call. Fixed.

bq. we should definitely use MathJax

Sounds good. I've never used that before. I recommend that as another follow-up 
issue.

bq. What about e.g. FieldVector3D?

Oh, yeah. I didn't go far enough back :-) None of the field-related classes made 
it over in the initial creation of commons-geometry because it would have 
required a dependency back to commons-math for the {{Field}} functionality.


> BSPTree Updates
> ---
>
> Key: GEOMETRY-32
> URL: https://issues.apache.org/jira/browse/GEOMETRY-32
> Project: Apache Commons Geometry
>  Issue Type: Improvement
>  Components: core
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following updates should be made to the BSPTree class:
> - add an {{isLeaf()}} method to replace all of the {{node.getCut() == null}} 
> expressions
> - add unit tests
> _Edit [2019-02-17]:_
> Additional goals:
> - Refactor the API to split the idea of a general BSPTree and a BSPTree used 
> for defining in/out regions. This could result in a BSPTree interface and a 
> RegionBSPTree interface. The goal here is to allow end-users to create their 
> own extensions of these classes and specialize them for their own 
> applications (for example, to implement spatial sorting or other algorithms). 
> This will be one of the only planned extension points in the library.
> - Make the API easier to use and extend and reduce the necessity of casting 
> (especially unchecked casting) as much as possible.
> - Add the idea of convex subhyperplanes to allow for more efficient tree 
> construction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-32) BSPTree Updates

2019-11-24 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981151#comment-16981151
 ] 

Gilles Sadowski commented on GEOMETRY-32:
-

{quote}VectorXD.linearCombination()
{quote}
Isn't it possible to increase the number of allowed parameters?  It would be 
more robust than disabling the rule.
 Note: Javadoc for the {{linearCombination}} methods needs fixing (it mentions 
"coordinate" instead of "vector").
 Also, for clarity, we should definitely use MathJax; the meaning of this:
{noformat}
* @return vector with coordinates calculated by {@code (a1 * v1) + (a2 * v2) + 
(a3 * v3)}
{noformat}
would be more clearly conveyed with:
{noformat}
* @return a new vector: \( a1 \vec{v1} +  a2 \vec{v2} + a3 \vec{v3} \)
{noformat}
{quote}[not] losing any [functionality]
{quote}
What about e.g. 
[FieldVector3D|http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/geometry/euclidean/threed/FieldVector3D.html]?

Not that it would be within the scope of Commons Geometry, but we should figure 
out what should not be deleted from Commons Math (as not having any replacement 
in the new component).
{quote}it could be done after the main code is refactored
{quote}
Sure.

> BSPTree Updates
> ---
>
> Key: GEOMETRY-32
> URL: https://issues.apache.org/jira/browse/GEOMETRY-32
> Project: Apache Commons Geometry
>  Issue Type: Improvement
>  Components: core
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following updates should be made to the BSPTree class:
> - add an {{isLeaf()}} method to replace all of the {{node.getCut() == null}} 
> expressions
> - add unit tests
> _Edit [2019-02-17]:_
> Additional goals:
> - Refactor the API to split the idea of a general BSPTree and a BSPTree used 
> for defining in/out regions. This could result in a BSPTree interface and a 
> RegionBSPTree interface. The goal here is to allow end-users to create their 
> own extensions of these classes and specialize them for their own 
> applications (for example, to implement spatial sorting or other algorithms). 
> This will be one of the only planned extension points in the library.
> - Make the API easier to use and extend and reduce the necessity of casting 
> (especially unchecked casting) as much as possible.
> - Add the idea of convex subhyperplanes to allow for more efficient tree 
> construction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-32) BSPTree Updates

2019-11-24 Thread Matt Juntunen (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981142#comment-16981142
 ] 

Matt Juntunen commented on GEOMETRY-32:
---

{quote}there are too many files
{quote}
Agreed :)

bq. ... suppression rules for CheckStyle. Are they necessary?

I believe they are. They all relate to the maximum number of method parameters 
allowed and apply either to 1) internal methods used to avoid the necessity of 
arrays (ex: {{Matrices.determinant()}}) or 2) public convenience methods (ex: 
{{VectorXD.linearCombination()}}). The linear combination methods existed before 
this PR, and are similar to the {{LinearCombination}} methods from 
commons-numbers-arrays, which also have a suppression rule. I'll add 
documentation in checkstyle-suppressions.xml to say specifically what's being 
suppressed.

bq.  Does this PR entail that all the functionality of Commons Math's package 
o.a.c.math4.geometry is now covered by Commons Geometry?

In general, yes. Overall, we have gained functionality instead of losing any. 
The one exception that I can think of is the 
{{o.a.c.g.euclidean.threed.OutlineExtractor}} class. I have not ported that 
over to the new code yet and so it is not present in this PR. That is one of 
the follow-up issues that I intend to make. I left it out for now because I 
consider this as something of an edge-case functionality and I thought it could 
be done after the main code is refactored and the API settled.

> BSPTree Updates
> ---
>
> Key: GEOMETRY-32
> URL: https://issues.apache.org/jira/browse/GEOMETRY-32
> Project: Apache Commons Geometry
>  Issue Type: Improvement
>  Components: core
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following updates should be made to the BSPTree class:
> - add an {{isLeaf()}} method to replace all of the {{node.getCut() == null}} 
> expressions
> - add unit tests
> _Edit [2019-02-17]:_
> Additional goals:
> - Refactor the API to split the idea of a general BSPTree and a BSPTree used 
> for defining in/out regions. This could result in a BSPTree interface and a 
> RegionBSPTree interface. The goal here is to allow end-users to create their 
> own extensions of these classes and specialize them for their own 
> applications (for example, to implement spatial sorting or other algorithms). 
> This will be one of the only planned extension points in the library.
> - Make the API easier to use and extend and reduce the necessity of casting 
> (especially unchecked casting) as much as possible.
> - Add the idea of convex subhyperplanes to allow for more efficient tree 
> construction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CODEC-268) Deprecate methods in MurmurHash3 with no equivalent in the reference c++ source.

2019-11-24 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981103#comment-16981103
 ] 

Claude Warren commented on CODEC-268:
-

I think it was the last line of the original post that confuses me:

All deprecated methods should be referred to use these instead.

I read this to mean that `hash128(byte[], int, int, int)` would call 
`hash128_x64(byte[], int, int, int)`, but I see now that you mean the 
deprecation tag should direct users to the "newer" corrected methods.
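
A sketch of what such a deprecation tag might look like (the class, body and 
wording here are hypothetical, not the committed Commons Codec code):
{code:java}
public final class MurmurHash3DeprecationSketch {
    private MurmurHash3DeprecationSketch() {}

    /**
     * @deprecated Not part of the reference C++ implementation and does not equal the
     *             upper 64 bits of the 128-bit hash; callers should be directed to the
     *             corrected 128-bit method instead and take the part of the result they need.
     */
    @Deprecated
    public static long hash64(final byte[] data) {
        throw new UnsupportedOperationException("placeholder body for this sketch");
    }
}
{code}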



> Deprecate methods in MurmurHash3 with no equivalent in the reference c++ 
> source.
> 
>
> Key: CODEC-268
> URL: https://issues.apache.org/jira/browse/CODEC-268
> Project: Commons Codec
>  Issue Type: Wish
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>
> The following methods have no equivalent in the [MurmurHash3 c++ source 
> code|https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp]
> {code:java}
> public static long hash64(final long data) {
> public static long hash64(final int data) {
> public static long hash64(final short data) {
> public static long hash64(final byte[] data) {
> public static long hash64(final byte[] data, final int offset, final int 
> length) {
> public static long hash64(final byte[] data, final int offset, final int 
> length, final int seed) {
> {code}
> They are documented to return the upper 64-bits of the 128-bit hash method. 
> This is false as the code neglects to mix in alternating blocks of 64-bits 
> with the corresponding other hash that would be built in the 128-bit method. 
> Thus this passes using a NotEquals check:
> {code:java}
> @Test
> public void testHash64InNotEqualToHash128() {
> for (int i = 0; i < 32; i++) {
> final byte[] bytes = Arrays.copyOf(RANDOM_BYTES, i);
> final long h1 = MurmurHash3.hash64(bytes);
> final long[] hash = MurmurHash3.hash128(bytes);
> Assert.assertNotEquals("Did not expect hash64 to match upper bits of 
> hash128", hash[0], h1);
> Assert.assertNotEquals("Did not expect hash64 to match lower bits of 
> hash128", hash[1], h1);
> }
> }
> {code}
> The following methods are convenience methods that use an arbitrary default 
> seed:
> {code:java}
> public static final int DEFAULT_SEED = 104729;
> public static int hash32(final long data1, final long data2) {
> public static int hash32(final long data1, final long data2, final int 
> seed) {
> public static int hash32(final long data) {
> public static int hash32(final long data, final int seed) {
> {code}
> Although they match calling hash32() with the equivalent bytes this assumes 
> big-endian format. The reference c++ code is designed for little endian. If 
> you hash the corresponding values using a Google Guava implementation with 
> the same seed then they do not match as Guava faithfully converts primitives 
> as little endian. Also note that the available methods for hashing primitives 
> are not the same as those in hash64.
> These methods thus make very little sense and should be removed as an 
> unwanted legacy from the port from Apache Hive.
> The following methods are convenience methods that use default encoding via 
> String.getBytes():
> {code:java}
> public static int hash32(final String data) {
> public static long[] hash128(final String data) {
> {code}
> These effectively do nothing and save 0 lines of code for the caller:
> {code:java}
> String s;
> int hash1 = MurmurHash3.hash32(s);
> int hash2 = MurmurHash3.hash32(s.getBytes());
> assert hash1 == hash2;
> {code}
> This is not even used:
> {code:java}
> public static final long NULL_HASHCODE = 2862933555777941757L;
> {code}
> The following methods for hash32, hash64 and hash128 are not consistent with 
> the available arguments:
> {code:java}
> public static int hash32(final byte[] data) {
> public static int hash32(final String data) {
> public static int hash32(final byte[] data, final int length) {
> public static int hash32(final byte[] data, final int length, final int 
> seed) {
> public static int hash32(final byte[] data, final int offset, final int 
> length, final int seed) {
> public static long hash64(final byte[] data) {
> // Two int arguments specify (offset, length) not (length, seed)
> public static long hash64(final byte[] data, final int offset, final int 
> length) {
> public static long hash64(final byte[] data, final int offset, final int 
> length, final int seed) {
> public static long[] hash128(final byte[] data) {
> public static long[] hash128(final String data) {
> // No other (offset, length, see

[jira] [Commented] (GEOMETRY-32) BSPTree Updates

2019-11-24 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981097#comment-16981097
 ] 

Gilles Sadowski commented on GEOMETRY-32:
-

Thanks!
I wish I could review this, but there are too many files (and far too little 
time...). ;-)

I noticed that the PR added suppression rules for CheckStyle.  Are they 
necessary?  If so, please document them (i.e. which methods trigger them) so 
that we can update them if the corresponding code is refactored.

On a general level: Does this PR entail that all the functionality of Commons 
Math's package {{o.a.c.math4.geometry}} is now covered by Commons Geometry?

> BSPTree Updates
> ---
>
> Key: GEOMETRY-32
> URL: https://issues.apache.org/jira/browse/GEOMETRY-32
> Project: Apache Commons Geometry
>  Issue Type: Improvement
>  Components: core
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following updates should be made to the BSPTree class:
> - add an {{isLeaf()}} method to replace all of the {{node.getCut() == null}} 
> expressions
> - add unit tests
> _Edit [2019-02-17]:_
> Additional goals:
> - Refactor the API to split the idea of a general BSPTree and a BSPTree used 
> for defining in/out regions. This could result in a BSPTree interface and a 
> RegionBSPTree interface. The goal here is to allow end-users to create their 
> own extensions of these classes and specialize them for their own 
> applications (for example, to implement spatial sorting or other algorithms). 
> This will be one of the only planned extension points in the library.
> - Make the API easier to use and extend and reduce the necessity of casting 
> (especially unchecked casting) as much as possible.
> - Add the idea of convex subhyperplanes to allow for more efficient tree 
> construction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CODEC-267) MurmurHash3.hash32() does not process trailing bytes as unsigned

2019-11-24 Thread Alex Herbert (Jira)


 [ 
https://issues.apache.org/jira/browse/CODEC-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert resolved CODEC-267.

Fix Version/s: 1.14
   Resolution: Fixed

New method hash32x86 with fix for trailing negative bytes added to master.
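
For context, a small sketch of the sign-extension problem behind this fix 
(illustrative values only, not the MurmurHash3 source): Java bytes are signed, 
so a negative trailing byte must be masked with 0xff before it is shifted into 
the block.
{code:java}
public class TrailingByteSketch {
    public static void main(String[] args) {
        final byte trailing = -1;                   // 0xFF as an unsigned octet

        final int unmasked = trailing << 16;        // sign-extended: 0xffff0000 pollutes the upper bits
        final int masked = (trailing & 0xff) << 16; // what the x86_32 reference expects: 0x00ff0000

        System.out.println(Integer.toHexString(unmasked)); // ffff0000
        System.out.println(Integer.toHexString(masked));   // ff0000
    }
}
{code}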

> MurmurHash3.hash32() does not process trailing bytes as unsigned
> 
>
> Key: CODEC-267
> URL: https://issues.apache.org/jira/browse/CODEC-267
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Claude Warren
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.14
>
>
> The hash32() algorithm processes blocks of 4 bytes. Trailing bytes of 1, 2 or 
> 3 that are negative are not masked to unsigned leading to an error.
> This test passes using data generated from the Python mmh3 library which 
> calls the MurmurHash3 c++ code (modified for Python):
> {code:java}
> /**
>  * Test to demonstrate the errors in
>  * {@link MurmurHash3#hash32(byte[], int, int, int)}
>  * if the final 1, 2, or 3 bytes are negative.
>  */
> @Test
> public void testHash32With1TrailingSignedByteIsInvalid() {
> // Generate test data:
> // import mmh3
> // import numpy as np
> // mmh3.hash(np.uint8([-1]))
> // mmh3.hash(np.uint8([0, -1]))
> // mmh3.hash(np.uint8([0, 0, -1]))
> // mmh3.hash(np.uint8([-1, 0]))
> // mmh3.hash(np.uint8([-1, 0, 0]))
> // mmh3.hash(np.uint8([0, -1, 0]))
> Assert.assertNotEquals(-43192051, MurmurHash3.hash32(new byte[] {-1}, 0, 
> 1, 0));
> Assert.assertNotEquals(-582037868, MurmurHash3.hash32(new byte[] {0, -1}, 
> 0, 1, 0));
> Assert.assertNotEquals(922088087, MurmurHash3.hash32(new byte[] {0, 0, 
> -1}, 0, 1, 0));
> Assert.assertNotEquals(-1309567588, MurmurHash3.hash32(new byte[] {-1, 
> 0}, 0, 1, 0));
> Assert.assertNotEquals(-363779670, MurmurHash3.hash32(new byte[] {-1, 0, 
> 0}, 0, 1, 0));
> Assert.assertNotEquals(-225068062, MurmurHash3.hash32(new byte[] {0, -1, 
> 0}, 0, 1, 0));
> }
> {code}
> This test passes with {{assertEquals}} when the code is fixed to apply 
> masking to the final 3 bytes:
> {code:java}
> case 3:
> k1 ^= (data[index + 2] & 0xff) << 16;
> case 2:
> k1 ^= (data[index + 1] & 0xff) << 8;
> case 1:
> k1 ^= (data[index] & 0xff);
> {code}
> Fixing this error will be a behavioural change.
> It is recommended to leave this method alone and implement a new hash32x86 
> method that should match the {{MurmurHash3_x86_32}} method from the c++ 
> source code.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument

2019-11-24 Thread Alex Herbert (Jira)


 [ 
https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert resolved CODEC-264.

Fix Version/s: 1.14
   Resolution: Fixed

New method hash128x64 to handle unsigned seed argument added to master.
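
For context, a small sketch of the unsigned-seed problem (illustrative values 
only, not the library internals): the reference murmur3_x64_128 treats the seed 
as an unsigned 32-bit value, so a negative Java int seed must not be 
sign-extended when it is widened to 64 bits.
{code:java}
public class UnsignedSeedSketch {
    public static void main(String[] args) {
        final int seed = 0xCAFEBABE;                       // negative when read as a signed int

        final long signExtended = seed;                    // 0xffffffffcafebabe
        final long treatedAsUnsigned = seed & 0xFFFFFFFFL; // 0x00000000cafebabe

        System.out.println(Long.toHexString(signExtended));      // ffffffffcafebabe
        System.out.println(Long.toHexString(treatedAsUnsigned)); // cafebabe
    }
}
{code}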

> murmur3.hash128() does not account for unsigned seed argument
> -
>
> Key: CODEC-264
> URL: https://issues.apache.org/jira/browse/CODEC-264
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Claude Warren
>Assignee: Alex Herbert
>Priority: Major
> Fix For: 1.14
>
> Attachments: YonikMurmur3Tests.java
>
>
> The original murmur3_x64_128 code used unsigned int for seed arguments.  
> Using the equivalent bit patterns in the commons codec version does not yield 
> the same results.
> I believe this is because the commons version does not account for sign 
> extension etc.
> Yonik Seeley [~yonik] explains the issue in his implementation 
> https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the 
> original C/C++ code.  I modified that test to call the codec version to show 
> the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument

2019-11-24 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981049#comment-16981049
 ] 

Claude Warren commented on CODEC-264:
-

This issue should be closed with the changes that Alex made to master. I am 
not certain what the Codec procedure is. As the person who opened this issue, 
should I mark it as resolved, close it, or just leave it alone for Alex to 
record?

> murmur3.hash128() does not account for unsigned seed argument
> -
>
> Key: CODEC-264
> URL: https://issues.apache.org/jira/browse/CODEC-264
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.13
>Reporter: Claude Warren
>Assignee: Alex Herbert
>Priority: Major
> Attachments: YonikMurmur3Tests.java
>
>
> The original murmur3_x64_128 code used unsigned int for seed arguments.  
> Using the equivalent bit patterns in the commons codec version does not yield 
> the same results.
> I believe this is because the commons version does not account for sign 
> extension etc.
> Yonik Seeley [~yonik] explains the issue in his implementation 
> https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the 
> original C/C++ code.  I modified that test to call the codec version to show 
> the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-codec] Claudenw closed pull request #27: Murmur3fix

2019-11-24 Thread GitBox
Claudenw closed pull request #27: Murmur3fix
URL: https://github.com/apache/commons-codec/pull/27
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-codec] Claudenw commented on issue #27: Murmur3fix

2019-11-24 Thread GitBox
Claudenw commented on issue #27: Murmur3fix
URL: https://github.com/apache/commons-codec/pull/27#issuecomment-557881487
 
 
   Pull request objective fulfilled by other changes to master branch


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-codec] aherbert commented on issue #27: Murmur3fix

2019-11-24 Thread GitBox
aherbert commented on issue #27: Murmur3fix
URL: https://github.com/apache/commons-codec/pull/27#issuecomment-557881044
 
 
   Yes. I have separated them into tests for positive and negative seed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (CODEC-268) Deprecate methods in MurmurHash3 with no equivalent in the reference c++ source.

2019-11-24 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981047#comment-16981047
 ] 

Alex Herbert commented on CODEC-268:


{quote}The older implementations should not be removed
{quote}
I only suggested deprecation. They would be removed in 2.0.

Maybe it was not clear because I also suggested what the remaining, cleaner API 
would be. However, all the other methods would remain until they are removed in 
a major release update.

 

> Deprecate methods in MurmurHash3 with no equivalent in the reference c++ 
> source.
> 
>
> Key: CODEC-268
> URL: https://issues.apache.org/jira/browse/CODEC-268
> Project: Commons Codec
>  Issue Type: Wish
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>
> The following methods have no equivalent in the [MurmurHash3 c++ source 
> code|https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp]
> {code:java}
> public static long hash64(final long data) {
> public static long hash64(final int data) {
> public static long hash64(final short data) {
> public static long hash64(final byte[] data) {
> public static long hash64(final byte[] data, final int offset, final int 
> length) {
> public static long hash64(final byte[] data, final int offset, final int 
> length, final int seed) {
> {code}
> They are documented to return the upper 64-bits of the 128-bit hash method. 
> This is false as the code neglects to mix in alternating blocks of 64-bits 
> with the corresponding other hash that would be built in the 128-bit method. 
> Thus this passes using a NotEquals check:
> {code:java}
> @Test
> public void testHash64InNotEqualToHash128() {
> for (int i = 0; i < 32; i++) {
> final byte[] bytes = Arrays.copyOf(RANDOM_BYTES, i);
> final long h1 = MurmurHash3.hash64(bytes);
> final long[] hash = MurmurHash3.hash128(bytes);
> Assert.assertNotEquals("Did not expect hash64 to match upper bits of 
> hash128", hash[0], h1);
> Assert.assertNotEquals("Did not expect hash64 to match lower bits of 
> hash128", hash[1], h1);
> }
> }
> {code}
> The following methods are convenience methods that use an arbitrary default 
> seed:
> {code:java}
> public static final int DEFAULT_SEED = 104729;
> public static int hash32(final long data1, final long data2) {
> public static int hash32(final long data1, final long data2, final int 
> seed) {
> public static int hash32(final long data) {
> public static int hash32(final long data, final int seed) {
> {code}
> Although they match calling hash32() with the equivalent bytes this assumes 
> big-endian format. The reference c++ code is designed for little endian. If 
> you hash the corresponding values using a Google Guava implementation with 
> the same seed then they do not match as Guava faithfully converts primitives 
> as little endian. Also note that the available methods for hashing primitives 
> are not the same as those in hash64.
> These methods thus make very little sense and should be removed as an 
> unwanted legacy from the port from Apache Hive.
> The following methods are convenience methods that use default encoding via 
> String.getBytes():
> {code:java}
> public static int hash32(final String data) {
> public static long[] hash128(final String data) {
> {code}
> These effectively do nothing and save 0 lines of code for the caller:
> {code:java}
> String s;
> int hash1 = MurmurHash3.hash32(s);
> int hash2 = MurmurHash3.hash32(s.getBytes());
> assert hash1 == hash2;
> {code}
> This is not even used:
> {code:java}
> public static final long NULL_HASHCODE = 2862933555777941757L;
> {code}
> The following methods for hash32, hash64 and hash128 are not consistent with 
> the available arguments:
> {code:java}
> public static int hash32(final byte[] data) {
> public static int hash32(final String data) {
> public static int hash32(final byte[] data, final int length) {
> public static int hash32(final byte[] data, final int length, final int 
> seed) {
> public static int hash32(final byte[] data, final int offset, final int 
> length, final int seed) {
> public static long hash64(final byte[] data) {
> // Two int arguments specify (offset, length) not (length, seed)
> public static long hash64(final byte[] data, final int offset, final int 
> length) {
> public static long hash64(final byte[] data, final int offset, final int 
> length, final int seed) {
> public static long[] hash128(final byte[] data) {
> public static long[] hash128(final String data) {
> // No other (offset, length, seed) combinations
> public static long[] hash128(final byte[] da
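
For readers following the endianness point above, here is a minimal, illustrative 
sketch (the class name and sample value are invented; only MurmurHash3.hash32(byte[]) 
and the JDK ByteBuffer API are real) of hashing a primitive long through the byte[] 
overload after an explicit little-endian conversion, which is the convention of the 
reference C++ code:

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import org.apache.commons.codec.digest.MurmurHash3;

// Illustrative only: hash a long as little-endian bytes so the result matches
// the byte order used by the reference C++ implementation.
public class LittleEndianHashExample {
    public static void main(final String[] args) {
        final long value = 123456789L; // arbitrary sample value
        final byte[] littleEndian = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
        System.out.println(MurmurHash3.hash32(littleEndian));
    }
}
{code}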

[jira] [Work logged] (IMAGING-242) Move to JUnit 5

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-242?focusedWorklogId=348688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348688
 ]

ASF GitHub Bot logged work on IMAGING-242:
--

Author: ASF GitHub Bot
Created on: 24/Nov/19 10:19
Start Date: 24/Nov/19 10:19
Worklog Time Spent: 10m 
  Work Description: kinow commented on issue #59: [IMAGING-242] Upgrade to 
Junit5
URL: https://github.com/apache/commons-imaging/pull/59#issuecomment-557875098
 
 
   >The side advantage of this is to lock down the dependencies and not allow 
people to mix Jupiter and Vintage in the same class as was done in 
TestImageReadException and TestImageWriteException (presumably by mistake?).
   
   Mea culpa. I did 90% with IDE search-replace following the docs & looking at 
your CSV PR. But for Theories, Datapoints and other things that weren't in CSV 
I struggled a bit. Thanks for tidying it up :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348688)
Time Spent: 1h 50m  (was: 1h 40m)

> Move to JUnit 5
> ---
>
> Key: IMAGING-242
> URL: https://issues.apache.org/jira/browse/IMAGING-242
> Project: Commons Imaging
>  Issue Type: Improvement
>  Components: imaging.*
>Affects Versions: 1.0-alpha1
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
> Fix For: 1.0-alpha2
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (IMAGING-242) Move to JUnit 5

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-242?focusedWorklogId=348687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348687
 ]

ASF GitHub Bot logged work on IMAGING-242:
--

Author: ASF GitHub Bot
Created on: 24/Nov/19 10:19
Start Date: 24/Nov/19 10:19
Worklog Time Spent: 10m 
  Work Description: kinow commented on issue #59: [IMAGING-242] Upgrade to 
Junit5
URL: https://github.com/apache/commons-imaging/pull/59#issuecomment-557875098
 
 
   >The side advantage of this is to lock down the dependencies and not allow 
people to mix Jupiter and Vintage in the same class as was done in 
TestImageReadException and TestImageWriteException (presumably by mistake?).
   
   Probably mea culpa. I did 90% with IDE search-replace following the docs & 
looking at your CSV PR. But for Theories, Datapoints and other things that 
weren't in CSV I struggled a bit. Thanks for tidying it up :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348687)
Time Spent: 1h 40m  (was: 1.5h)

> Move to JUnit 5
> ---
>
> Key: IMAGING-242
> URL: https://issues.apache.org/jira/browse/IMAGING-242
> Project: Commons Imaging
>  Issue Type: Improvement
>  Components: imaging.*
>Affects Versions: 1.0-alpha1
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
> Fix For: 1.0-alpha2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-imaging] kinow edited a comment on issue #59: [IMAGING-242] Upgrade to Junit5

2019-11-24 Thread GitBox
kinow edited a comment on issue #59: [IMAGING-242] Upgrade to Junit5
URL: https://github.com/apache/commons-imaging/pull/59#issuecomment-557875098
 
 
   >The side advantage of this is to lock down the dependencies and not allow 
people to mix Jupiter and Vintage in the same class as was done in 
TestImageReadException and TestImageWriteException (presumably by mistake?).
   
   Mea culpa. I did 90% with IDE search-replace following the docs & looking at 
your CSV PR. But for Theories, Datapoints and other things that weren't in CSV 
I struggled a bit. Thanks for tidying it up :+1: 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-imaging] kinow commented on issue #59: [IMAGING-242] Upgrade to Junit5

2019-11-24 Thread GitBox
kinow commented on issue #59: [IMAGING-242] Upgrade to Junit5
URL: https://github.com/apache/commons-imaging/pull/59#issuecomment-557875098
 
 
   >The side advantage of this is to lock down the dependencies and not allow 
people to mix Jupiter and Vintage in the same class as was done in 
TestImageReadException and TestImageWriteException (presumably by mistake?).
   
   Probably mea culpa. I did 90% with IDE search-replace following the docs & 
looking at your CSV PR. But for Theories, Datapoints and other things that 
weren't in CSV I struggled a bit. Thanks for tidying it up :+1: 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (IMAGING-242) Move to JUnit 5

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-242?focusedWorklogId=348685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348685
 ]

ASF GitHub Bot logged work on IMAGING-242:
--

Author: ASF GitHub Bot
Created on: 24/Nov/19 10:18
Start Date: 24/Nov/19 10:18
Worklog Time Spent: 10m 
  Work Description: kinow commented on issue #61: IMAGING-242 Complete the 
migration to JUnit Jupiter
URL: https://github.com/apache/commons-imaging/pull/61#issuecomment-557874998
 
 
   Thanks a lot @mureinik ! I learned a bit more after this PR, and now Imaging 
is fully Junit 5 :confetti_ball: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348685)
Time Spent: 1h 20m  (was: 1h 10m)

> Move to JUnit 5
> ---
>
> Key: IMAGING-242
> URL: https://issues.apache.org/jira/browse/IMAGING-242
> Project: Commons Imaging
>  Issue Type: Improvement
>  Components: imaging.*
>Affects Versions: 1.0-alpha1
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
> Fix For: 1.0-alpha2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (IMAGING-242) Move to JUnit 5

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-242?focusedWorklogId=348686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348686
 ]

ASF GitHub Bot logged work on IMAGING-242:
--

Author: ASF GitHub Bot
Created on: 24/Nov/19 10:18
Start Date: 24/Nov/19 10:18
Worklog Time Spent: 10m 
  Work Description: kinow commented on pull request #61: IMAGING-242 
Complete the migration to JUnit Jupiter
URL: https://github.com/apache/commons-imaging/pull/61
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348686)
Time Spent: 1.5h  (was: 1h 20m)

> Move to JUnit 5
> ---
>
> Key: IMAGING-242
> URL: https://issues.apache.org/jira/browse/IMAGING-242
> Project: Commons Imaging
>  Issue Type: Improvement
>  Components: imaging.*
>Affects Versions: 1.0-alpha1
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
> Fix For: 1.0-alpha2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-imaging] kinow merged pull request #61: IMAGING-242 Complete the migration to JUnit Jupiter

2019-11-24 Thread GitBox
kinow merged pull request #61: IMAGING-242 Complete the migration to JUnit 
Jupiter
URL: https://github.com/apache/commons-imaging/pull/61
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-imaging] kinow commented on a change in pull request #61: IMAGING-242 Complete the migration to JUnit Jupiter

2019-11-24 Thread GitBox
kinow commented on a change in pull request #61: IMAGING-242 Complete the 
migration to JUnit Jupiter
URL: https://github.com/apache/commons-imaging/pull/61#discussion_r349914359
 
 

 ##
 File path: 
src/test/java/org/apache/commons/imaging/roundtrip/BitmapRoundtripTest.java
 ##
 @@ -40,10 +38,12 @@
 TestImages.createBitmapBitmapImage(300, 300), // larger than 256
 };
 
-@DataPoints
-public static FormatInfo[] formatInfos = FormatInfo.READ_WRITE_FORMATS;
+public static Stream testBitmapRoundtrip() {
 
 Review comment:
   Ah! `Arguments`? Hadn't seen this in their docs. Brilliant!
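   
   For context, a minimal sketch of the Jupiter pattern referred to here (the class, 
method, and parameter names are invented; only the annotations and the Arguments API 
are the actual JUnit 5 pieces): a @MethodSource factory returning Stream<Arguments> 
takes over the role of @DataPoints, and @ParameterizedTest takes over the role of 
@Theory.
   
{code:java}
import java.util.stream.Stream;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

// Illustrative only: a generic JUnit 4 Theories -> JUnit Jupiter migration sketch.
public class RoundtripStyleExampleTest {

    // Replaces @DataPoints: each Arguments instance is one parameter set.
    public static Stream<Arguments> imageAndFormat() {
        return Stream.of(
                Arguments.of("smallImage", "PNG"),
                Arguments.of("largeImage", "BMP"));
    }

    // Replaces @Theory: the test method runs once per Arguments entry.
    @ParameterizedTest
    @MethodSource("imageAndFormat")
    void roundtrip(final String image, final String format) {
        // A real test would write the image in the given format and read it back.
        Assertions.assertNotNull(image);
        Assertions.assertNotNull(format);
    }
}
{code}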


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-imaging] kinow commented on issue #61: IMAGING-242 Complete the migration to JUnit Jupiter

2019-11-24 Thread GitBox
kinow commented on issue #61: IMAGING-242 Complete the migration to JUnit 
Jupiter
URL: https://github.com/apache/commons-imaging/pull/61#issuecomment-557874998
 
 
   Thanks a lot @mureinik ! I learned a bit more after this PR, and now Imaging 
is fully Junit 5 :confetti_ball: 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work logged] (IMAGING-242) Move to JUnit 5

2019-11-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/IMAGING-242?focusedWorklogId=348684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348684
 ]

ASF GitHub Bot logged work on IMAGING-242:
--

Author: ASF GitHub Bot
Created on: 24/Nov/19 10:17
Start Date: 24/Nov/19 10:17
Worklog Time Spent: 10m 
  Work Description: kinow commented on pull request #61: IMAGING-242 
Complete the migration to JUnit Jupiter
URL: https://github.com/apache/commons-imaging/pull/61#discussion_r349914359
 
 

 ##
 File path: 
src/test/java/org/apache/commons/imaging/roundtrip/BitmapRoundtripTest.java
 ##
 @@ -40,10 +38,12 @@
 TestImages.createBitmapBitmapImage(300, 300), // larger than 256
 };
 
-@DataPoints
-public static FormatInfo[] formatInfos = FormatInfo.READ_WRITE_FORMATS;
+public static Stream testBitmapRoundtrip() {
 
 Review comment:
   Ah! `Arguments`? Hadn't seen this in their docs. Brilliant!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348684)
Time Spent: 1h 10m  (was: 1h)

> Move to JUnit 5
> ---
>
> Key: IMAGING-242
> URL: https://issues.apache.org/jira/browse/IMAGING-242
> Project: Commons Imaging
>  Issue Type: Improvement
>  Components: imaging.*
>Affects Versions: 1.0-alpha1
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
> Fix For: 1.0-alpha2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-collections] Claudenw commented on issue #83: Initial bloom filter code contribution

2019-11-24 Thread GitBox
Claudenw commented on issue #83: Initial bloom filter code contribution
URL: 
https://github.com/apache/commons-collections/pull/83#issuecomment-557871331
 
 
   Are there any open issues against this pull request?
   
   The latest change places a dependency on commons-codec 1.14-SNAPSHOT, so that 
would have to be released first. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (CODEC-268) Deprecate methods in MurmurHash3 with no equivalent in the reference c++ source.

2019-11-24 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CODEC-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981011#comment-16981011
 ] 

Claude Warren commented on CODEC-268:
-

I agree with most of the resolution steps in this issue, with the following 
exceptions:

The older implementations should not be removed. There are cases where the use 
of the "broken" implementation cannot easily be changed. For example, if 
MurmurHash is used to build a hash of a file and that hash is then used to 
calculate a storage location, an application using the hash in this way cannot 
simply change the hash, as all previously stored files would become 
unreachable. Therefore access to the old implementations should remain for 
some period of time.

I think the older methods should be deprecated and developers pointed to the 
new implementations by means of javadoc.

The older implementations must remain, however, to support earlier releases 
that are already in the wild. 
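
A minimal, hypothetical sketch of the deprecation approach described above (the 
class and method below are invented purely for illustration and are not the real 
MurmurHash3 code): keep the old method, annotate it, and point callers at the 
replacement in the javadoc.

{code:java}
// Illustrative only: shows the "deprecate but keep" pattern, not real hashing code.
public final class LegacyHashExample {

    private LegacyHashExample() {
    }

    /**
     * Legacy 64-bit hash retained so that values already stored by callers
     * (for example as file placement keys) stay reachable.
     *
     * @deprecated not part of the reference implementation; new code should
     *             hash little-endian bytes via the {@code byte[]} overloads.
     */
    @Deprecated
    public static long legacyHash64(final long data) {
        return data * 0x9E3779B97F4A7C15L; // placeholder mixing, not MurmurHash3
    }
}
{code}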


> Deprecate methods in MurmurHash3 with no equivalent in the reference c++ 
> source.
> 
>
> Key: CODEC-268
> URL: https://issues.apache.org/jira/browse/CODEC-268
> Project: Commons Codec
>  Issue Type: Wish
>Affects Versions: 1.13
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
>
> The following methods have no equivalent in the [MurmurHash3 c++ source 
> code|https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp]
> {code:java}
> public static long hash64(final long data) {
> public static long hash64(final int data) {
> public static long hash64(final short data) {
> public static long hash64(final byte[] data) {
> public static long hash64(final byte[] data, final int offset, final int 
> length) {
> public static long hash64(final byte[] data, final int offset, final int 
> length, final int seed) {
> {code}
> They are documented to return the upper 64-bits of the 128-bit hash method. 
> This is false as the code neglects to mix in alternating blocks of 64-bits 
> with the corresponding other hash that would be built in the 128-bit method. 
> Thus this passes using a NotEquals check:
> {code:java}
> @Test
> public void testHash64InNotEqualToHash128() {
> for (int i = 0; i < 32; i++) {
> final byte[] bytes = Arrays.copyOf(RANDOM_BYTES, i);
> final long h1 = MurmurHash3.hash64(bytes);
> final long[] hash = MurmurHash3.hash128(bytes);
> Assert.assertNotEquals("Did not expect hash64 to match upper bits of 
> hash128", hash[0], h1);
> Assert.assertNotEquals("Did not expect hash64 to match lower bits of 
> hash128", hash[1], h1);
> }
> }
> {code}
> The following methods are convenience methods that use an arbitrary default 
> seed:
> {code:java}
> public static final int DEFAULT_SEED = 104729;
> public static int hash32(final long data1, final long data2) {
> public static int hash32(final long data1, final long data2, final int 
> seed) {
> public static int hash32(final long data) {
> public static int hash32(final long data, final int seed) {
> {code}
> Although they match calling hash32() with the equivalent bytes this assumes 
> big-endian format. The reference c++ code is designed for little endian. If 
> you hash the corresponding values using a Google Guava implementation with 
> the same seed then they do not match as Guava faithfully converts primitives 
> as little endian. Also note that the available methods for hashing primitives 
> are not the same as those in hash64.
> These methods thus make very little sense and should be removed as an 
> unwanted legacy from the port from Apache Hive.
> The following methods are convenience methods that use default encoding via 
> String.getBytes():
> {code:java}
> public static int hash32(final String data) {
> public static long[] hash128(final String data) {
> {code}
> These effectively do nothing and save 0 lines of code for the caller:
> {code:java}
> String s;
> int hash1 = MurmurHash3.hash32(s);
> int hash2 = MurmurHash3.hash32(s.getBytes());
> assert hash1 == hash2;
> {code}
> This is not even used:
> {code:java}
> public static final long NULL_HASHCODE = 2862933555777941757L;
> {code}
> The following methods for hash32, hash64 and hash128 are not consistent with 
> the available arguments:
> {code:java}
> public static int hash32(final byte[] data) {
> public static int hash32(final String data) {
> public static int hash32(final byte[] data, final int length) {
> public static int hash32(final byte[] data, final int length, final int 
> seed) {
> public static int hash32(final byte[] data, final int offset, final int 
> length, final int seed) {
> public static long hash64(final by

[GitHub] [commons-codec] Claudenw commented on issue #27: Murmur3fix

2019-11-24 Thread GitBox
Claudenw commented on issue #27: Murmur3fix
URL: https://github.com/apache/commons-codec/pull/27#issuecomment-557868665
 
 
   Nit: I think that in MurmurHash3Test line 740 it should use answers2 not
   answers:
   
   for (int i = 0; i < answers2.length; i++) {
   
   not
   
   for (int i = 0; i < answers.length; i++) {
   
   
   
   
   
   On Thu, Nov 21, 2019 at 4:13 PM Alex Herbert 
   wrote:
   
   > @Claudenw  I have created a separate ticket
   > for the sign extension bug in hash32 [CODEC-267].
   >
   > I fully documented what the class is currently doing. I rewrote the test
   > class entirely to ensure that all the documentation for what the various
   > methods are doing is tested. This tests hashing with random bytes from the
   > full range of [0-255]. The reference data is from Python mmh3 which is
   > based on the MurmurHash3 c++ code.
   >
   > I created tests to show the sign extension error.
   >
   > I then took your fixes for the hash32 bug and hash128 sign extension bug
   > and implemented them in master. The changes log has been attributed to you.
   >
   > Please can you verify that master passes any additional tests you have
   > written for the PR for the methods with bugs and the new methods,
   > hopefully, without bugs.
   >
   > The new API has a minimal footprint with 4 new methods and a new
   > IncrementHash32x86 class. I don't think all the other methods are
   > warranted. See [CODEC-268] for a wish to deprecate most of them.
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   
   
   -- 
   I like: Like Like - The likeliest place on the web
   
   LinkedIn: http://www.linkedin.com/in/claudewarren
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services