[GitHub] [commons-compress] PeterAlfreadLee commented on issue #90: Compress-477 : add zip64 support for split zip

2020-02-02 Thread GitBox
PeterAlfreadLee commented on issue #90: Compress-477 : add zip64 support for 
split zip
URL: https://github.com/apache/commons-compress/pull/90#issuecomment-581281141
 
 
   > This is not really a new issue introduced with the split zips feature, 
MultiReadOnlySeekableByteChannel has been keeping channels open before as well.
   
   Yes, you're right. This is not a new issue with split zips. And since 
`MultiReadOnlySeekableByteChannel` is also used for 7z split segments, I 
believe they are affected by this as well.
   
   > OTOH I expect most split archives to only have a pretty small number of 
parts, an archive with more than 100 parts is something that I don't expect to 
be common. Therefore making sure we don't open too many streams is probably not 
that important at all.
   
   I agree. Actually I have never met any split zips with more than 20 
segments. This is not common, just a special use case introduced in the zip 
specification.
   
   I agree it's an extremely special use case that may not be needed in most 
cases. Actually I have already finished this, so I will push the PR and it's 
up to you to decide whether to merge it or not. @bodewig 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (BEANUTILS-532) Require commons-beanutils library which supports commons-collections-4.1 version

2020-02-02 Thread AvanthikaNC (Jira)


[ 
https://issues.apache.org/jira/browse/BEANUTILS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028703#comment-17028703
 ] 

AvanthikaNC commented on BEANUTILS-532:
---

Hi Gary D. Gregory,

We cannot find the library in the snapshot repository that you provided. Even 
so, we could not release it to production, as you mentioned it needs a bit 
more work to complete.

Could you please provide a tentative date for the Commons BeanUtils version 
supporting the Commons Collections 4.x library?

 

Thanks

> Require commons-beanutils library which supports commons-collections-4.1 
> version 
> -
>
> Key: BEANUTILS-532
> URL: https://issues.apache.org/jira/browse/BEANUTILS-532
> Project: Commons BeanUtils
>  Issue Type: Bug
>  Components: Bean-Collections
>Reporter: AvanthikaNC
>Priority: Blocker
> Attachments: image-2020-01-31-14-52-43-114.png
>
>
> Hi Team,
>  
>  We are working on an ATM SWITCH project. The project currently uses the 
> commons-beanutils library 1.9.4, and we upgraded to commons-collections-4.1 
> as part of our project requirements because the older version contained 
> vulnerabilities.
> We are facing some errors due to the above-mentioned upgrade, because 
> commons-beanutils 1.9.4 only supports commons-collections 3.2.2.
> As per our requirements we cannot downgrade the commons-collections library, 
> so we need a commons-beanutils version which supports 
> commons-collections4-4.1.
> Please provide your response as soon as possible.
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (COLLECTIONS-737) The test FluentIterableTest.size should be split

2020-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/COLLECTIONS-737?focusedWorklogId=380558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-380558
 ]

ASF GitHub Bot logged work on COLLECTIONS-737:
--

Author: ASF GitHub Bot
Created on: 02/Feb/20 16:56
Start Date: 02/Feb/20 16:56
Worklog Time Spent: 10m 
  Work Description: Prodigysov commented on issue #120: [COLLECTIONS-737] 
The test FluentIterableTest.size should be splitted
URL: 
https://github.com/apache/commons-collections/pull/120#issuecomment-581154786
 
 
   Sounds good, thanks for the review!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 380558)
Time Spent: 1h  (was: 50m)

> The test FluentIterableTest.size should be split
> 
>
> Key: COLLECTIONS-737
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-737
> Project: Commons Collections
>  Issue Type: Test
>  Components: Collection
>Reporter: Pengyu Nie
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The first part of FluentIterableTest.size is not testing function
> FluentIterable.size (see code copied below). Actually,
> FluentIterable.size will not be invoked at all, because
> FluentIterable.of(null) will throw an NPE before that. This part
> should be extracted as a separate unit test like
> FluentIterableTest.ofNull.
> {code:java}
> try {
>     FluentIterable.of((Iterable) null).size();
>     fail("expecting NullPointerException");
> } catch (final NullPointerException npe) {
>     // expected
> }
> {code}
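> As a sketch of the suggested extraction (illustrative only; it assumes the 
> JUnit 4 imports already used by the test class):
> {code:java}
> @Test
> public void testOfNull() {
>     try {
>         // the NPE comes from of(null) itself, before size() could run
>         FluentIterable.of((Iterable) null);
>         fail("expecting NullPointerException");
>     } catch (final NullPointerException npe) {
>         // expected
>     }
> }
> {code}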
> I'll create a pull request for this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-collections] Prodigysov commented on issue #120: [COLLECTIONS-737] The test FluentIterableTest.size should be splitted

2020-02-02 Thread GitBox
Prodigysov commented on issue #120: [COLLECTIONS-737] The test 
FluentIterableTest.size should be splitted
URL: 
https://github.com/apache/commons-collections/pull/120#issuecomment-581154786
 
 
   Sounds good, thanks for the review!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (NUMBERS-70) Userguide and reports

2020-02-02 Thread Kaaira Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028479#comment-17028479
 ] 

Kaaira Gupta commented on NUMBERS-70:
-

Hey [~erans]!! Is this up for Outreachy Summer 2020 as well?

> Userguide and reports
> -
>
> Key: NUMBERS-70
> URL: https://issues.apache.org/jira/browse/NUMBERS-70
> Project: Commons Numbers
>  Issue Type: Wish
>Reporter: Gilles Sadowski
>Priority: Minor
>  Labels: benchmark, documentation, gsoc2020
> Attachments: 0001-Angles-xdoc-is-added.patch, 
> 0001-Prime-xdoc-file-is-added.patch, 0001-Primes-xdoc-is-added.patch
>
>
> Review contents of the component towards providing an up-to-date userguide 
> and write benchmarking code for generating performance reports 
> ([JMH|http://openjdk.java.net/projects/code-tools/jmh/]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-491) Deflate64CompressorInputStream.read(byte[]) works incorrectly

2020-02-02 Thread Stefan Bodewig (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig updated COMPRESS-491:

Component/s: Archivers

> Deflate64CompressorInputStream.read(byte[]) works incorrectly
> -
>
> Key: COMPRESS-491
> URL: https://issues.apache.org/jira/browse/COMPRESS-491
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.18
>Reporter: Juha Syrjälä
>Priority: Major
> Fix For: 1.20
>
>
> The read(byte[]) method in 
> org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream
> sometimes incorrectly returns the value 0:
> https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read(byte[])
> > Reads some number of bytes from the input stream and stores them into the 
> > buffer array b. The number of bytes actually read is returned as an 
> > integer. This method blocks until input data is available, end of file is 
> > detected, or an exception is thrown.
> If the length of b is zero, then no bytes are read and 0 is returned; 
> otherwise, there is an attempt to read at least one byte. If no byte is 
> available because the stream is at the end of the file, the value -1 is 
> returned; otherwise, at least one byte is read and stored into b.
> The first byte read is stored into element b[0], the next one into b[1], and 
> so on. The number of bytes read is, at most, equal to the length of b. Let k 
> be the number of bytes actually read; these bytes will be stored in elements 
> b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected.
> This means that the `read` method can return `0` only when a zero-length 
> byte array is passed in.
> Otherwise read must block until there is at least 1 byte of data available, 
> or return -1 for end of stream.
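> For illustration, a sketch of a typical caller (my own code, not code from 
> the library) that relies on that contract and makes no progress if read 
> returns 0 for a non-empty buffer:
> {code:java}
> static void copy(InputStream in, OutputStream out) throws IOException {
>     byte[] buffer = new byte[8192];
>     int n;
>     while ((n = in.read(buffer)) != -1) {
>         // the contract guarantees n > 0 here because buffer.length > 0;
>         // a 0 return would make this loop spin without progress
>         out.write(buffer, 0, n);
>     }
> }
> {code}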
> Currently in commons-compress 1.18, class 
> org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream
> returns `0` for some buffer sizes and some Zip files.
> See [https://github.com/jsyrjala/apache-commons-compress-bug] for test case



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUMBERS-143) Investigate Math.hypot for computing the absolute of a complex number

2020-02-02 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028450#comment-17028450
 ] 

Alex Herbert commented on NUMBERS-143:
--

Here are the definitions of what hypot is required to do from ISO C99:

7.12.7.3

The hypot functions compute the square root of the sum of the squares of x and 
y, without undue overflow or underflow. A range error may occur.

F.9.4.3

Special cases:
 * hypot(x, y), hypot(y, x), and hypot(x, −y) are equivalent.
 * hypot(x, ±0) is equivalent to |x|.
 * hypot(±∞, y) returns +∞, even if y is a NaN.

There is no requirement for a set level of accuracy. The fdlibm implementation 
of hypot specifies it achieves <1 ULP from the exact result.

Thus an implementation should scale the intermediate sum {{x^2+y^2}} to avoid 
over/underflow, be commutative in its arguments, and handle the special cases 
of zero and infinite arguments.

Here is the pseudocode for what fdlibm does:
{noformat}
// Get exponents of input x and y
// Get the absolutes: |x| and |y|
// Sort |x| and |y| by magnitude
// If the exponent difference is large: return |x| (this may be inf/nan)
// If the exponent is large then:
//   If |x| is inf/nan then: return inf/nan (depends on |y|)
//   else: scale down
// If the exponent is small then:
//   If |y| is zero then: return |x|
//   else: scale up
// Compute sqrt(x*x + y*y)
// Scale back the result
// return result
{noformat}
The code uses the exponent to perform magnitude checks and handle the special 
cases of inf/nan without ever requiring explicit identification of inf/nan.

There are variations on this to do the scaling and inf/nan checks. A key point 
is that the computation of {{x^2+y^2}} could be done using any method. The 
hypot function aims for high precision using a condition based on the relative 
magnitude of |x| and |y|:
{code:java}
final double w = x - y;
if (w > y) {
    // x > 2y
    // Extended precision computation of x^2, standard y^2
} else {
    // 2y > x > y
    // Extended precision computation of x^2 + y^2
}
{code}

The point here is that when y is of a similar scale to x, any round-off of 
y^2 is as important as the round-off of x^2.

This branch is a key part of the performance of the method. For uniformly 
distributed polar coordinate (magnitude, angle) input data, x and y are the 
edges of a right-angled triangle with the magnitude as the hypotenuse. If |x| 
and |y| can take any value then they will be within a 2-fold factor of each 
other when the ratio of the edges of the triangle is between 2/1 and 1/2. The 
angle between these ratios is arctan(2/1) - arctan(1/2) = arctan(3/4) (by 
trigonometric identity). This forms a segment of the quarter circle with area 
fraction arctan(3/4) / (pi/2) = 0.41, so the branch for (2y > x > y) will be 
taken 41% of the time. If data is generated using a uniform distribution of x 
and y then the input data is a square (with one vertex at the origin (0,0)) 
containing the point (x,y), and the branch (2y > x > y) will be taken 50% of 
the time. This follows from the lines y = x/2 and y = 2x creating segments in 
a 2x2 square: the area between them is (2*2 - 2*1/2 - 2*1/2) / 4 = 2/4. (A 
diagram would have helped here.) The result is that for randomly simulated 
data that is uniform in polar or Cartesian coordinates this branch is taken 
around half the time. The branch is hard to predict for arbitrary input data, 
so the processor cannot efficiently pipeline this computation. This will be 
demonstrated in the results from the JMH performance benchmarks.
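As a quick numerical check of these two fractions (my own verification sketch, 
not part of the benchmark code):
{code:java}
import java.util.concurrent.ThreadLocalRandom;

public class BranchFrequency {
    public static void main(String[] args) {
        final ThreadLocalRandom rng = ThreadLocalRandom.current();
        final int n = 10_000_000;
        int polar = 0;
        int cartesian = 0;
        for (int i = 0; i < n; i++) {
            // Uniform polar data: uniform angle in [0, pi/2)
            final double t = rng.nextDouble() * Math.PI / 2;
            double x = Math.max(Math.cos(t), Math.sin(t));
            double y = Math.min(Math.cos(t), Math.sin(t));
            if (x - y <= y) {
                polar++; // the (2y > x > y) branch
            }
            // Uniform Cartesian data: uniform x and y in [0, 1)
            final double u = rng.nextDouble();
            final double v = rng.nextDouble();
            x = Math.max(u, v);
            y = Math.min(u, v);
            if (x - y <= y) {
                cartesian++;
            }
        }
        // prints approximately 0.41 and 0.50
        System.out.printf("polar %.3f cartesian %.3f%n",
            polar / (double) n, cartesian / (double) n);
    }
}
{code}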

The following candidate methods will be tested to compute {{x^2+y^2}}:

* fdlibm computation (involves a branch)
* simple x*x + y*y
* fused multiply add: Math.fma(x, x, y*y)
* extended precision x*x + y*y summation using Dekker's method then standard 
sqrt
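As an illustration only, minimal sketches of the last three candidates (the 
method names are mine, and no over/underflow scaling is shown, so these are 
not the final benchmarked implementations):
{code:java}
static double simple(double x, double y) {
    return x * x + y * y;
}

// Requires Java 9+ for Math.fma
static double fma(double x, double y) {
    return Math.fma(x, x, y * y);
}

// Extended precision x*x + y*y using Dekker's split, two-product and two-sum:
// the round-off of each square and of the addition is recovered and added
// back at the end.
static double extended(double x, double y) {
    final double split = 0x1.0p27 + 1; // 2^27 + 1 for a 53-bit significand
    // Dekker's split of x and y into high and low parts
    final double cx = split * x;
    final double xHi = cx - (cx - x);
    final double xLo = x - xHi;
    final double cy = split * y;
    final double yHi = cy - (cy - y);
    final double yLo = y - yHi;
    // squares and their round-off (Dekker's two-product)
    final double x2 = x * x;
    final double ex = ((xHi * xHi - x2) + 2 * xHi * xLo) + xLo * xLo;
    final double y2 = y * y;
    final double ey = ((yHi * yHi - y2) + 2 * yHi * yLo) + yLo * yLo;
    // two-sum of the high parts recovers the round-off of the addition
    final double s = x2 + y2;
    final double t = s - x2;
    final double es = (x2 - (s - t)) + (y2 - t);
    return s + (es + ex + ey);
}
{code}
Each would be followed by a standard Math.sqrt.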

Ideally the reference would be computed using 128-bit precision. This can be 
done using Java 9 BigDecimal which has a sqrt() function. An alternative is 
extended precision x*x + y*y summation and extended precision sqrt using 
Dekker's method [1] to set a reference value.
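For example, a reference along these lines (a sketch; requires Java 9+ for 
BigDecimal.sqrt, and finite arguments only since new BigDecimal(double) 
rejects infinities and NaN):
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;

static double hypotReference(double x, double y) {
    // new BigDecimal(double) is exact, so x*x + y*y has no round-off here
    final BigDecimal bx = new BigDecimal(x);
    final BigDecimal by = new BigDecimal(y);
    final BigDecimal sum = bx.multiply(bx).add(by.multiply(by));
    return sum.sqrt(MathContext.DECIMAL128).doubleValue();
}
{code}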

Results from an accuracy test and performance test will be added to this ticket.

1. [Dekker (1971) A floating-point technique for extending the available 
precision|https://doi.org/10.1007/BF01397083]


> Investigate Math.hypot for computing the absolute of a complex number
> -
>
> Key: NUMBERS-143
> URL: https://issues.apache.org/jira/browse/NUMBERS-143
> Project: Commons Numbers
>  Issue Type: Task
>  Components: complex
>Reporter: Alex Herbert
>Priority: Minor
>
> {{Math.hypot}} computes the value {{sqrt(x^2+y^2)}} to within 1 ULP. The 
> function uses the [e_hypot.c|https://www.netlib.org/fdlibm/e_hypot.c] 
> implementation from the Freely Distributable Math Library (fdlibm).
> Pre-Java 9 this function used JNI to call an external implementation. The 
> performance was slow.

[jira] [Commented] (COMPRESS-502) Allow to disable closing files in the finalizer of ZipFile

2020-02-02 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028444#comment-17028444
 ] 

Stefan Bodewig commented on COMPRESS-502:
-

Sounds as if both [~ggregory] and I could live with either a constructor-arg or 
an instance setter.

> Allow to disable closing files in the finalizer of ZipFile
> --
>
> Key: COMPRESS-502
> URL: https://issues.apache.org/jira/browse/COMPRESS-502
> Project: Commons Compress
>  Issue Type: Improvement
>  Components: Compressors
>Affects Versions: 1.19
>Reporter: Dominik Stadler
>Priority: Major
>
> Apache POI uses commons-compress for handling ZipFiles. We found that it 
> sometimes does some auto-close magic in the finalizer of the ZipFile class, 
> printing to stderr, see 
> https://gitbox.apache.org/repos/asf?p=commons-compress.git;a=blob;f=src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java;h=23194560dace91d8052626f3bdc8f765f9c46d7e;hb=HEAD#l652.
>  
> This has some shortcomings:
>  * It prints to stderr, which many large-scale applications try to avoid by 
> using a logging framework; thus this output might "vanish" unseen in some 
> installations or might cause unexpected side-effects
>  * It prevents us from using tools for checking file leaks, e.g. we use 
> [https://github.com/kohsuke/file-leak-detector/] heavily for analyzing 
> test-runs for leaked file-handles, but commons-compress prevents this because 
> it "hides" the unclosed file from this functionality
>  * The behavior of automatically closing and reporting the problem is 
> non-reproducible because it depends on finalization/garbage collection, and 
> thus re-runs or unit tests usually do not show the same behavior
>  
> There are some fairly simple options to improve this:
>  * Allow disabling this functionality via configuration/system-property/...
>  * Make this "pluggable" so a logging framework can be plugged in or closing 
> can be prevented for certain runs
>  
> I can provide a simple patch if you state which approach you think would 
> make the most sense here.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-compress] bodewig commented on issue #90: Compress-477 : add zip64 support for split zip

2020-02-02 Thread GitBox
bodewig commented on issue #90: Compress-477 : add zip64 support for split zip
URL: https://github.com/apache/commons-compress/pull/90#issuecomment-581135612
 
 
   This is not really a new issue introduced with the split zips feature, 
`MultiReadOnlySeekableByteChannel` has been keeping channels open before as 
well.
   
   Honestly I have no idea how much "jumping around" reading split 
archives (zip or 7z) actually involves. In both cases we read the channels 
containing file metadata once and will likely never go back to them. So a 
small number of open channels may be sufficient. OTOH I expect most split 
archives to have only a pretty small number of parts; an archive with more 
than 100 parts is something that I don't expect to be common. Therefore making 
sure we don't open too many streams is probably not that important at all.
   
   I realize I'm not answering your question :-)
   
   I'd make the number of simultaneously opened streams configurable. 
"infinite" might even be a good default so that people only need to deal with 
it explicitly if they run into trouble. But in real life situations I'd expect 
any number bigger than say 20 to have the same effect as infinity (i.e. all 
channels are open as there are no more than 20 channels anyway).
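   As a sketch of the configurable idea (a hypothetical helper of mine, not 
existing Compress API): an access-ordered LRU map that closes the least 
recently used channel once the cap is exceeded, where `Integer.MAX_VALUE` 
would play the role of the "infinite" default.
```java
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class ChannelCache {
    private final int maxOpen;
    private final Map<Path, SeekableByteChannel> open;

    ChannelCache(final int maxOpen) {
        this.maxOpen = maxOpen;
        // access-order LinkedHashMap evicts the least recently used entry
        this.open = new LinkedHashMap<Path, SeekableByteChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<Path, SeekableByteChannel> eldest) {
                if (size() > ChannelCache.this.maxOpen) {
                    try {
                        eldest.getValue().close();
                    } catch (final IOException ignored) {
                        // best-effort close on eviction
                    }
                    return true;
                }
                return false;
            }
        };
    }

    SeekableByteChannel channel(final Path segment) throws IOException {
        SeekableByteChannel ch = open.get(segment);
        if (ch == null) {
            ch = Files.newByteChannel(segment);
            open.put(segment, ch);
        }
        return ch;
    }
}
```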


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (COMPRESS-494) ZipArchieveInputStream component is throwing "Invalid Entry Size"

2020-02-02 Thread Stefan Bodewig (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-494.
-
Resolution: Not A Bug

As I explained in my last comment here and later in COMPRESS-500 there are 
certain archives that are impossible to read with {{ZipArchiveInputStream}}. We 
have documented this and by now throw a helpful exception (thanks to 
COMPRESS-483) if such a file is encountered. Unfortunately you are facing such 
an archive and there is no workaround.

> ZipArchieveInputStream component is throwing "Invalid Entry Size"
> -
>
> Key: COMPRESS-494
> URL: https://issues.apache.org/jira/browse/COMPRESS-494
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.8, 1.18
>Reporter: Anvesh Mora
>Priority: Critical
> Attachments: commons-compress-1.20-SNAPSHOT.jar
>
>
> I've observed in my development that certain zip files which we are able to 
> extract with the unzip utility on Linux fail with our Compress library.
>  
> As of now I have a stack-trace to share; I'm going to add more here as and 
> when the discussion begins:
>  
> {code:java}
> Caused by: java.lang.IllegalArgumentException: invalid entry size
> at org.apache.commons.compress.archivers.zip.ZipArchiveEntry.setSize(ZipArchiveEntry.java:550)
> at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readDataDescriptor(ZipArchiveInputStream.java:702)
> at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.bufferContainsSignature(ZipArchiveInputStream.java:805)
> at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readStoredEntry(ZipArchiveInputStream.java:758)
> at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readStored(ZipArchiveInputStream.java:407)
> at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:382)
> {code}
> I missed adding the version info, here it is:
> the version of the lib I'm using is 1.9.
> I also tried version 1.18; the issue is observed in that version too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (COMPRESS-500) Discrepancy in file size extracted using ZipArchieveInputStream and Gzip decompress component

2020-02-02 Thread Stefan Bodewig (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-500.
-
Resolution: Not A Bug

> Discrepancy in file size extracted using ZipArchieveInputStream and Gzip 
> decompress component 
> --
>
> Key: COMPRESS-500
> URL: https://issues.apache.org/jira/browse/COMPRESS-500
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.8, 1.18
>Reporter: Anvesh Mora
>Priority: Major
> Attachments: Compress500.java, invalidzip.zip.partaa, 
> invalidzip.zip.partab, invalidzip.zip.partac, invalidzip.zip.partad, 
> invalidzip.zip.partae, invalidzip.zip.partaf, invalidzip.zip.partag, 
> invalidzip.zip.partah, invalidzip.zip.partai
>
>
> Recently I raised a bug about an "invalid Entry Size" issue, COMPRESS-494 
> (not resolved yet).
>  
> Now we are seeing a new issue. Before explaining it: we have the file 
> structure below, and it is received as a stream of data over HTTPS.
>  
> *File Structure*:
> In the zip file
>      we have zero or more gz files which need to be decompressed,
>      and metadata at the end of the zip entries (end of stream), as plain 
> text, used for downloading the next zip file.
>  
> Now in production we are seeing a new issue where the entire gz file is not 
> decompressed. We found out that the utilities on CentOS 7 are able to 
> extract and decompress the entire file whereas our library fails. Below are 
> the differences in sizes:
> Using the API: *765460480* bytes
> Using the CentOS 7 Linux utilities: *2032925215* bytes.
>  
> We are getting an EOF exception at GzipCompressorInputStream.java:278; I'm 
> not sure why.
>  
> We need your help on this as we are blocked in production. This could be a 
> potential fix for our library to make it more robust.
>  
> Please let me know how we can increase the priority if needed!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-500) Discrepancy in file size extracted using ZipArchieveInputStream and Gzip decompress component

2020-02-02 Thread Stefan Bodewig (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig updated COMPRESS-500:

Attachment: Compress500.java

> Discrepancy in file size extracted using ZipArchieveInputStream and Gzip 
> decompress component 
> --
>
> Key: COMPRESS-500
> URL: https://issues.apache.org/jira/browse/COMPRESS-500
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.8, 1.18
>Reporter: Anvesh Mora
>Priority: Major
> Attachments: Compress500.java, invalidzip.zip.partaa, 
> invalidzip.zip.partab, invalidzip.zip.partac, invalidzip.zip.partad, 
> invalidzip.zip.partae, invalidzip.zip.partaf, invalidzip.zip.partag, 
> invalidzip.zip.partah, invalidzip.zip.partai
>
>
> Recently I raised a bug about an "invalid Entry Size" issue, COMPRESS-494 
> (not resolved yet).
>  
> Now we are seeing a new issue. Before explaining it: we have the file 
> structure below, and it is received as a stream of data over HTTPS.
>  
> *File Structure*:
> In the zip file
>      we have zero or more gz files which need to be decompressed,
>      and metadata at the end of the zip entries (end of stream), as plain 
> text, used for downloading the next zip file.
>  
> Now in production we are seeing a new issue where the entire gz file is not 
> decompressed. We found out that the utilities on CentOS 7 are able to 
> extract and decompress the entire file whereas our library fails. Below are 
> the differences in sizes:
> Using the API: *765460480* bytes
> Using the CentOS 7 Linux utilities: *2032925215* bytes.
>  
> We are getting an EOF exception at GzipCompressorInputStream.java:278; I'm 
> not sure why.
>  
> We need your help on this as we are blocked in production. This could be a 
> potential fix for our library to make it more robust.
>  
> Please let me know how we can increase the priority if needed!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-500) Discrepancy in file size extracted using ZipArchieveInputStream and Gzip decompress component

2020-02-02 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028427#comment-17028427
 ] 

Stefan Bodewig commented on COMPRESS-500:
-

I've written a small test program and will attach it. Running it you see:

 
 * things work as expected when using ZipFile
 * things don't work at all and you get an exception when using 
ZipArchiveInputStream without explicitly allowing the combination of data 
descriptors and stored entries. This was true in 1.8 and remains true
 * when allowing the combination, Compress 1.19 and later throw an exception:

{code:java}
java.util.zip.ZipException: compressed and uncompressed size don't match while 
reading a stored entry using data descriptor. Either the archive is broken or 
it can not be read using ZipArchiveInputStream and you must use ZipFile. A 
common cause for this is a ZIP archive containing a ZIP archive. See 
http://commons.apache.org/proper/commons-compress/zip.html#ZipArchiveInputStream_vs_ZipFile
{code}
So unfortunately what I suspected is true. You are looking at the kind of 
archive that cannot be extracted using {{ZipArchiveInputStream}}, and there is 
no workaround for it.

If you create the archive yourself, either ensure you don't use a data 
descriptor and store the size information inside the local file header, or 
use the DEFLATED method, as wasteful as it may seem.

If you do not control the original archive, then you must store it to disk or 
keep it in memory (see {{SeekableInMemoryByteChannel}}) and use {{ZipFile}}.
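As a sketch of the second option (my own code, assuming the archive fits in 
memory; the class and method names are mine):
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.utils.IOUtils;
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;

public final class ZipFileWorkaround {
    public static void extractAll(final InputStream httpsStream) throws IOException {
        // buffer the whole archive, then read it with ZipFile instead of
        // ZipArchiveInputStream so sizes come from the central directory
        final byte[] archive = IOUtils.toByteArray(httpsStream);
        try (ZipFile zip = new ZipFile(new SeekableInMemoryByteChannel(archive))) {
            final Enumeration<ZipArchiveEntry> entries = zip.getEntries();
            while (entries.hasMoreElements()) {
                final ZipArchiveEntry entry = entries.nextElement();
                try (InputStream in = zip.getInputStream(entry)) {
                    // process the entry content, e.g. wrap gz entries in
                    // GzipCompressorInputStream
                }
            }
        }
    }
}
{code}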

> Discrepancy in file size extracted using ZipArchieveInputStream and Gzip 
> decompress component 
> --
>
> Key: COMPRESS-500
> URL: https://issues.apache.org/jira/browse/COMPRESS-500
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.8, 1.18
>Reporter: Anvesh Mora
>Priority: Major
> Attachments: invalidzip.zip.partaa, invalidzip.zip.partab, 
> invalidzip.zip.partac, invalidzip.zip.partad, invalidzip.zip.partae, 
> invalidzip.zip.partaf, invalidzip.zip.partag, invalidzip.zip.partah, 
> invalidzip.zip.partai
>
>
> Recently I raised a bug about an "invalid Entry Size" issue, COMPRESS-494 
> (not resolved yet).
>  
> Now we are seeing a new issue. Before explaining it: we have the file 
> structure below, and it is received as a stream of data over HTTPS.
>  
> *File Structure*:
> In the zip file
>      we have zero or more gz files which need to be decompressed,
>      and metadata at the end of the zip entries (end of stream), as plain 
> text, used for downloading the next zip file.
>  
> Now in production we are seeing a new issue where the entire gz file is not 
> decompressed. We found out that the utilities on CentOS 7 are able to 
> extract and decompress the entire file whereas our library fails. Below are 
> the differences in sizes:
> Using the API: *765460480* bytes
> Using the CentOS 7 Linux utilities: *2032925215* bytes.
>  
> We are getting an EOF exception at GzipCompressorInputStream.java:278; I'm 
> not sure why.
>  
> We need your help on this as we are blocked in production. This could be a 
> potential fix for our library to make it more robust.
>  
> Please let me know how we can increase the priority if needed!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JEXL-323) Ant-style variables can throw exception when evaluated for their value

2020-02-02 Thread Dmitri Blinov (Jira)


[ 
https://issues.apache.org/jira/browse/JEXL-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028425#comment-17028425
 ] 

Dmitri Blinov commented on JEXL-323:


Not sure whether it's worth creating a separate issue for this, but the 
following test case fails because we are mixing null values with undefined 
variables when resolving antish variables.
{code:java}
@Test
public void testBadAnt() throws Exception {
    JexlEvalContext ctxt = new JexlEvalContext();
    JexlOptions options = ctxt.getEngineOptions();
    ctxt.set("x.y", 42);
    JexlScript script = JEXL.createScript("var x = null; x.y");
    try {
        Object result = script.execute(ctxt);
        Assert.fail("antish var shall not be resolved");
    } catch(JexlException xother) {
        Assert.assertTrue(xother != null);
    }
}
{code}

> Ant-style variables can throw exception when evaluated for their value
> --
>
> Key: JEXL-323
> URL: https://issues.apache.org/jira/browse/JEXL-323
> Project: Commons JEXL
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: David Costanzo
>Assignee: Henri Biestro
>Priority: Minor
> Fix For: 3.2
>
>
> When try to evaluate an expression that is the name of a variable and the 
> value is null, I get the value null. This is good. However, when I do the 
> same thing with an ant-style variable name, a JexlException$Variable is 
> thrown claiming that the variable is null. I think this is a bug because I 
> would expect all variables to behave the same, regardless of their name.
> The reason for this behavior is evident in Interpreter.visit() and 
> InterpreterBase.unsolvableVariable().  There is already special-case logic to 
> detect when an ant variable is null versus when it's undefined, and this 
> information is given to unsolvableVariable(), but it still throws an 
> exception.
>  
> {code:java}
> if (object == null && !isTernaryProtected(node)) {
>     if (antish && ant != null) {
>         // V--- NOTE: context.has() returns true, so undefined is false
>         boolean undefined = !(context.has(ant.toString()) || isLocalVariable(node, 0));
>         // variable unknown in context and not a local
>         return unsolvableVariable(node, ant.toString(), undefined); // <-- still throws exception
>     } else if (!pty) {
>         return unsolvableProperty(node, ".", null);
>     }
> }
> {code}
> And in unsolvableVariable():
>  
> {code:java}
> protected Object unsolvableVariable(JexlNode node, String var, boolean undef) {
>     // V-- NOTE: both my engine and arithmetic are strict, so this evaluates to true
>     if (isStrictEngine() && (undef || arithmetic.isStrict())) {
>         throw new JexlException.Variable(node, var, undef);
>     } else if (logger.isDebugEnabled()) {
>         logger.debug(JexlException.variableError(node, var, undef));
>     }
>     return null;
> }
> {code}
>  
>  
> h3. Steps to Reproduce:
>  
> {code:java}
> @Test
> public void testNullAntVariable() throws IOException {
>     // Create or retrieve an engine
>     JexlEngine jexl = new JexlBuilder().create();
>     // on recent code: JexlEngine jexl = new JexlBuilder().safe(false).create();
>     // Populate two identical global variables
>     JexlContext jc = new MapContext();
>     jc.set("NormalVariable", null);
>     jc.set("ant.variable", null);
>     // Evaluate the value of the normal variable
>     JexlExpression expression1 = jexl.createExpression("NormalVariable");
>     Object o1 = expression1.evaluate(jc);
>     Assert.assertEquals(null, o1);
>     // Evaluate the value of the ant-style variable
>     JexlExpression expression2 = jexl.createExpression("ant.variable");
>     Object o2 = expression2.evaluate(jc); // <-- BUG: throws exception instead of returning null
>     Assert.assertEquals(null, o2);
> }
> {code}
>  
>  
> h3. What Happens:
> "expression2.evaluate(jc)" throws an JexlException$Variable exception with 
> text like "variable 'ant.variable' is null".
> h3. Expected Result:
> expression2.evaluate(jc) returns the value of 'ant.variable', which is null.
> h3.  
> Note:
> This was found on JEXL 3.1, the latest official release. I reproduced it on a 
> snapshot of JEXL 3.2 built from github source, but had to disable "safe".
> h3.  
> Impact:
> My organization uses JEXL to build datasets for clinical trials. In our 
> domain, it's very common to have an expression that is simply the name of a 
> variable whose value is desired. We want any sloppy 
> expressions to be a hard error, so we use strict engines and will use 
> "safe=false" when we update to JEXL 3.2. In our domain, "null" has a specific 
> meaning (it means "missing").  A

[jira] [Created] (NUMBERS-143) Investigate Math.hypot for computing the absolute of a complex number

2020-02-02 Thread Alex Herbert (Jira)
Alex Herbert created NUMBERS-143:


 Summary: Investigate Math.hypot for computing the absolute of a 
complex number
 Key: NUMBERS-143
 URL: https://issues.apache.org/jira/browse/NUMBERS-143
 Project: Commons Numbers
  Issue Type: Task
  Components: complex
Reporter: Alex Herbert


{{Math.hypot}} computes the value {{sqrt(x^2+y^2)}} to within 1 ULP. The 
function uses the [e_hypot.c|https://www.netlib.org/fdlibm/e_hypot.c] 
implementation from the Freely Distributable Math Library (fdlibm).

Pre-Java 9 this function used JNI to call an external implementation. The 
performance was slow. Java 9 ported the function to native Java (see 
[JDK-7130085 : Port fdlibm hypot to 
Java|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7130085]).

This function is used to define the absolute value of a complex number. It is 
also used in sqrt() and log(). This ticket is to investigate the performance 
and accuracy of {{Math.hypot}} against alternatives for use in Complex.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-50) Overflow in Vector norm and distance

2020-02-02 Thread Alex Herbert (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028420#comment-17028420
 ] 

Alex Herbert commented on GEOMETRY-50:
--

Checking each value for over/underflow is costly. Even if the branch 
practically never occurs and branch prediction can learn to ignore the 'safe' 
branches, there is still a cost to ensuring those branches are not needed. You 
could run this with a size of 10 and see if SafeNorm improves relatively. At 
that data size the branch prediction will be much better.

Also note that SafeNorm is written for arrays of any length. It could be 
unrolled for lengths 2 and 3. That may be an interesting addition to the 
benchmark.
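As an illustrative sketch of such an unrolling for 3D (my own code, using 
simple division by the maximum rather than SafeNorm's exponent-based scaling):
{code:java}
static double norm3(double x, double y, double z) {
    final double max = Math.max(Math.abs(x), Math.max(Math.abs(y), Math.abs(z)));
    if (max == 0 || Double.isInfinite(max)) {
        // all components zero, or at least one infinite
        return max;
    }
    // scale into [0, 1] to avoid overflow/underflow of the squares
    final double a = x / max;
    final double b = y / max;
    final double c = z / max;
    return max * Math.sqrt(a * a + b * b + c * c);
}
{code}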

{{Math.hypot}} ensures accuracy below 1 ULP. It does this with careful 
computation of {{x^2+y^2}} to minimise round-off. This may not be needed 
here. However my tests show that this computation is not the only factor in 
performance; the number of unpredictable branches in the code plays a key 
role.

I am going to put some analysis results on {{Math.hypot}} under a ticket for 
Numbers. I'll link to this issue for reference.

 

> Overflow in Vector norm and distance
> 
>
> Key: GEOMETRY-50
> URL: https://issues.apache.org/jira/browse/GEOMETRY-50
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Baljit Singh
>Priority: Major
>
> In Euclidean Vector classes (Vector2D, Vector3D), norm() and distance() rely 
> on Math.sqrt(), which can overflow if the components of the vectors are 
> large. Instead, they should rely on SafeNorm.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUMBERS-142) Improve LinearCombination accuracy during summation of the round-off errors

2020-02-02 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/NUMBERS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028370#comment-17028370
 ] 

Gilles Sadowski commented on NUMBERS-142:
-

bq. retain the use of primitives for performance and convenience.

Sure; it is necessary even if just to quantify the impact (of using objects) on 
performance.

bq. add alternative implementations that use Field.

That would be the proposal.  I was wondering whether this task could become a 
GSoC project.

> Improve LinearCombination accuracy during summation of the round-off errors
> ---
>
> Key: NUMBERS-142
> URL: https://issues.apache.org/jira/browse/NUMBERS-142
> Project: Commons Numbers
>  Issue Type: Improvement
>  Components: arrays
>Affects Versions: 1.0
>Reporter: Alex Herbert
>Assignee: Alex Herbert
>Priority: Minor
> Attachments: array_performance.jpg, cond_no.jpg, 
> error_vs_condition_no.jpg, inline_perfomance.jpg
>
>
> The algorithm in LinearCombination is an implementation of dot2 from [Ogita 
> el al|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547] 
> (algorithm 5.3). There is a subtle difference in that the original dot2 
> algorithm sums the round-off from the products and the round-off from the 
> product summation together. The method in LinearCombination sums them 
> separately (using an extra variable) and adds them at the end. This actually 
> improves the accuracy under conditions where the round-off is of greater sum 
> than the products, as described below.
> The dot2 algorithm suffers when the total sum is close to 0 but the 
> intermediate sum is far enough from zero that there is a large difference 
> between the exponents of summed terms and the final result. In this case the 
> sum of the round-off is more important than the sum of the products which due 
> to massive cancellation is zero. The case is important for Complex numbers 
> which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1 
> such that log(1) would be ~zero yet the true logarithm is representable to 
> very high precision.
> This can be protected against by using the dotK algorithm of Ogita et al with 
> K>2. This saves all the round-off parts from each product and the running 
> sum. These are subject to an error free transform that repeatedly adds 
> adjacent pairs to generate a new split pair with a closer upper and lower 
> part. Over time this will order the parts from low to high and these can be 
> summed low first for an error free dot product.
> Using this algorithm with a single pass (K=3 for dot3) removes the 
> cancellation error observed in the mentioned use case. Adding a single pass 
> over the parts changes the algorithm from 25n floating point operations 
> (flops) to 31n flops for the sum of n products.
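> As an illustrative sketch of that single pass (my own code; the name is not 
> from LinearCombination), using Knuth's two-sum as the error-free transform 
> over the saved round-off parts and products:
> {code:java}
> static double passAndSum(double[] p) {
>     for (int i = 1; i < p.length; i++) {
>         // two-sum: s + e == p[i] + p[i - 1] exactly
>         final double s = p[i] + p[i - 1];
>         final double v = s - p[i];
>         final double e = (p[i] - (s - v)) + (p[i - 1] - v);
>         p[i] = s;     // larger part moves up
>         p[i - 1] = e; // round-off stays behind
>     }
>     // after the pass the parts are approximately ordered; sum low first
>     double sum = 0;
>     for (final double d : p) {
>         sum += d;
>     }
>     return sum;
> }
> {code}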
> A second change for the algorithm is to switch to using 
> [Dekker's|https://doi.org/10.1007/BF01397083] algorithm (Dekker, 1971) to 
> split the number. This extracts two 26-bit mantissas from a 53-bit mantissa 
> (in IEEE 754 the leading bit in front of the 52-bit mantissa is assumed to 
> be 1). This is done by multiplication by 2^s+1 with s = ceil(53/2) = 27:
> big = (2^s+1) * a
> a_hi = (big - (big - a))
> The extra bit of precision is carried into the sign of the low part of the 
> split number.
> This is in contrast to the method in LinearCombination that uses a simple 
> mask on the long representation to obtain the a_hi part in 26-bits and the 
> lower part will be 27 bits.
> The advantage of Dekker's method is it produces 2 parts with 26 bits in the 
> mantissa that can be multiplied exactly. The disadvantage is the potential 
> for overflow requiring a branch condition to check for extreme values.
> It also appropriately handles very small sub-normal numbers that would be 
> masked to create a 0 high part with all the non-zero bits left in the low 
> part using the current method. This will have knock on effects on split 
> multiplication which requires the high part to be larger.
> A simple change to the current implementation to use Dekker's split improves 
> the precision on a wide range of test data (not shown).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-50) Overflow in Vector norm and distance

2020-02-02 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028366#comment-17028366
 ] 

Gilles Sadowski commented on GEOMETRY-50:
-

In the context of "Commons Geometry" usage, are there examples where 
{{Math.hypot}} is required?  If not, a common implementation of the norm 
computation would avoid some future puzzling about the performance issue.

bq. {{SafeNorm}} [is] a modest performance hit in 3D.

(5026-4191)/(12742-4111) = 0.0967
So, if IIUC the above table, {{SafeNorm}} is ~10 times slower.


> Overflow in Vector norm and distance
> 
>
> Key: GEOMETRY-50
> URL: https://issues.apache.org/jira/browse/GEOMETRY-50
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Baljit Singh
>Priority: Major
>
> In Euclidean Vector classes (Vector2D, Vector3D), norm() and distance() rely 
> on Math.sqrt(), which can overflow if the components of the vectors are 
> large. Instead, they should rely on SafeNorm.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)