[jira] [Updated] (STATISTICS-71) Implementation of Univariate Statistics

2023-07-01 Thread Anirudh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Joshi updated STATISTICS-71:

Description: 
Jira ticket to track the implementation of the Univariate statistics required 
for the updated SummaryStatistics API. 
The implementation would be "storeless". It should be used for calculating 
statistics that can be computed in one pass through the data without storing 
the sample values.

Currently I have the definition of API as (this might evolve as I continue 
working)
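{code:java}
import java.util.function.DoubleSupplier;

public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {

    // Adds one value to the statistic.
    DoubleStorelessUnivariateStatistic add(double v);

    // Number of values added so far.
    long getCount();

    // Merges the state of another statistic into this one.
    void combine(DoubleStorelessUnivariateStatistic other);
}
{code}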
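To make the one-pass idea concrete, here is a minimal sketch of a statistic implementing the draft interface. It assumes that combine takes the same DoubleStorelessUnivariateStatistic type and that only instances of the same concrete class are merged. StorelessVariance is a hypothetical name; add uses Welford's update and combine uses the pairwise merge formula of Chan et al. These are common choices, not necessarily what the final implementation will use.

```java
import java.util.function.DoubleSupplier;

/** Draft interface, redeclared here so the sketch compiles on its own. */
interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
    DoubleStorelessUnivariateStatistic add(double v);

    long getCount();

    // Assumption: combine accepts the same statistic type.
    void combine(DoubleStorelessUnivariateStatistic other);
}

/** Hypothetical one-pass sample variance: only n, mean and M2 are stored. */
final class StorelessVariance implements DoubleStorelessUnivariateStatistic {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    @Override
    public StorelessVariance add(double v) {
        n++;
        final double delta = v - mean;
        mean += delta / n;
        m2 += delta * (v - mean); // Welford update, uses the new mean
        return this;
    }

    @Override
    public long getCount() {
        return n;
    }

    @Override
    public void combine(DoubleStorelessUnivariateStatistic other) {
        // Assumption: only statistics of the same concrete type are merged.
        final StorelessVariance o = (StorelessVariance) other;
        if (o.n == 0) {
            return;
        }
        final long total = n + o.n;
        final double delta = o.mean - mean;
        // Pairwise merge: M2 = M2a + M2b + delta^2 * na * nb / n
        m2 += o.m2 + delta * delta * ((double) n * o.n / total);
        mean += delta * o.n / total;
        n = total;
    }

    /** Unbiased sample variance; NaN when fewer than two values were added. */
    @Override
    public double getAsDouble() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```

Because only the count, the running mean and the sum of squared deviations are kept, partial results computed on separate chunks of data (for example in parallel) can be merged with combine without revisiting the values.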
 

  was:
Jira ticket to track the implementation of the Univariate statistics required 
for the updated SummaryStatistics API. 
The implementation would be "storeless". It should be used for calculating 
statistics that can be computed in one pass through the data without storing 
the sample values.

Currently I have the definition of API as (this might evolve as I continue 
working)
{code:java}
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

public interface StorelessUnivariateStatistic extends DoubleConsumer, DoubleSupplier {

    StorelessUnivariateStatistic add(double d);

    StorelessUnivariateStatistic addAll(double[] values);

    StorelessUnivariateStatistic addAll(double[] values, int start, int length);

    long getN();

    void combine(StorelessUnivariateStatistic other);
}
{code}
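The earlier draft extended DoubleConsumer and declared two addAll overloads. One possible way to keep implementations small is to make those default methods layered on add(double), so that a concrete statistic only supplies add, getN, combine and getAsDouble. This is a sketch under that assumption: StorelessSum is a hypothetical example class, and no explicit validation of start/length is done.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

interface StorelessUnivariateStatistic extends DoubleConsumer, DoubleSupplier {
    StorelessUnivariateStatistic add(double d);

    long getN();

    void combine(StorelessUnivariateStatistic other);

    /** DoubleConsumer entry point: simply forwards to add. */
    @Override
    default void accept(double value) {
        add(value);
    }

    default StorelessUnivariateStatistic addAll(double[] values) {
        return addAll(values, 0, values.length);
    }

    default StorelessUnivariateStatistic addAll(double[] values, int start, int length) {
        for (int i = start; i < start + length; i++) {
            add(values[i]);
        }
        return this;
    }
}

/** Hypothetical running sum used to exercise the default methods. */
final class StorelessSum implements StorelessUnivariateStatistic {
    private long n;
    private double sum;

    @Override
    public StorelessSum add(double d) {
        n++;
        sum += d;
        return this;
    }

    @Override
    public long getN() {
        return n;
    }

    @Override
    public void combine(StorelessUnivariateStatistic other) {
        // Assumption: only instances of the same class are combined.
        final StorelessSum o = (StorelessSum) other;
        n += o.n;
        sum += o.sum;
    }

    @Override
    public double getAsDouble() {
        return sum;
    }
}
```

Since accept delegates to add, an instance can be handed directly to DoubleStream.forEach.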


> Implementation of Univariate Statistics
> ---
>
> Key: STATISTICS-71
> URL: https://issues.apache.org/jira/browse/STATISTICS-71
> Project: Commons Statistics
>  Issue Type: Task
>  Components: descriptive
>Reporter: Anirudh Joshi
>Priority: Minor
>  Labels: gsoc, gsoc2023
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT

2023-07-01 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739337#comment-17739337
 ] 

Gary D. Gregory commented on IMAGING-356:
-

[~gwlucas] 

Please try the latest from git master. I just updated the size() implementation. 

> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---
>
> Key: IMAGING-356
> URL: https://issues.apache.org/jira/browse/IMAGING-356
> Project: Commons Imaging
>  Issue Type: Bug
>  Components: Format: TIFF
>Affects Versions: 1.0
>Reporter: Gary Lucas
>Priority: Major
>
> I am using the latest code from GitHub (1.0-SNAPSHOT, downloaded in June 
> 2023) to read a 300 megabyte TIFF file. Version 1.0-alpha3 required 673 
> milliseconds to read that file. The new code requires upwards of 15 
> minutes. Clearly something broke since the last release.
> The TIFF file is a 1x1 pixel 4 byte image format organized in strips. 
> The bottleneck appears to occur in the TiffReader getTiffRawImageData method, 
> which reads raw data from the file in preparation for creating a 
> BufferedImage object.
> I suspect that there may be a general slowness in file access. While 
> debugging, even reading the initial metadata (22 TIFF tags) took a couple of 
> seconds.





[GitHub] [commons-lang] orionlibs opened a new pull request, #1078: updated the JavaDoc for the insert methods in ArrayUtils

2023-07-01 Thread via GitHub


orionlibs opened a new pull request, #1078:
URL: https://github.com/apache/commons-lang/pull/1078

   Updated the Javadoc for the insert methods in ArrayUtils to state that the 
methods also return null if the input array is null.
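The documented contract can be illustrated with a simplified, self-contained insert for int arrays. This is a hypothetical sketch modeled on the behavior described in the pull request, not the Commons Lang source: a null input array yields null rather than an exception.

```java
final class InsertSketch {
    /**
     * Inserts values into a copy of array at index (hypothetical helper).
     *
     * @param index  the insertion point, from 0 to array.length inclusive
     * @param array  the source array, may be null
     * @param values the elements to insert
     * @return a new array, or null if array is null
     * @throws IndexOutOfBoundsException if array is not null and index is out of range
     */
    static int[] insert(int index, int[] array, int... values) {
        if (array == null) {
            return null; // the null-in, null-out case documented by the PR
        }
        if (values == null || values.length == 0) {
            return array.clone(); // nothing to insert: still return a new array
        }
        if (index < 0 || index > array.length) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Length: " + array.length);
        }
        final int[] result = new int[array.length + values.length];
        System.arraycopy(array, 0, result, 0, index);
        System.arraycopy(values, 0, result, index, values.length);
        System.arraycopy(array, index, result, index + values.length, array.length - index);
        return result;
    }
}
```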


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [commons-lang] orionlibs opened a new pull request, #1077: refactored ArrayUtils to reuse the getLength(Object array) method

2023-07-01 Thread via GitHub


orionlibs opened a new pull request, #1077:
URL: https://github.com/apache/commons-lang/pull/1077

   refactored ArrayUtils to reuse the getLength(Object array) method inside the 
class instead of calling Array.getLength()
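One detail such a refactoring has to preserve: Commons Lang's ArrayUtils.getLength(Object) returns 0 for a null array, whereas java.lang.reflect.Array.getLength throws for null. A minimal null-safe helper in the same spirit (the class name is hypothetical):

```java
import java.lang.reflect.Array;

final class LengthSketch {
    /**
     * Length of any array, object or primitive; 0 for null.
     * Non-array arguments still throw IllegalArgumentException from Array.getLength.
     */
    static int getLength(Object array) {
        return array == null ? 0 : Array.getLength(array);
    }
}
```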





[GitHub] [commons-lang] orionlibs opened a new pull request, #1076: refactored an addAll method in ArrayUtils

2023-07-01 Thread via GitHub


orionlibs opened a new pull request, #1076:
URL: https://github.com/apache/commons-lang/pull/1076

   refactored an addAll method in ArrayUtils to reuse code and defer the array1 
component type check to the exception catch block
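The pattern described, attempting the copy first and only inspecting component types once an ArrayStoreException occurs, can be sketched as follows. This is a simplified, hypothetical version (null handling omitted), not the code in the pull request; it relies on System.arraycopy checking each element's runtime type when the array types differ.

```java
import java.lang.reflect.Array;

final class ConcatSketch {
    /** Joins two arrays; the component-type diagnosis runs only if the copy fails. */
    @SafeVarargs
    static <T> T[] addAll(T[] array1, T... array2) {
        @SuppressWarnings("unchecked")
        final T[] joined = (T[]) Array.newInstance(
                array1.getClass().getComponentType(), array1.length + array2.length);
        System.arraycopy(array1, 0, joined, 0, array1.length);
        try {
            System.arraycopy(array2, 0, joined, array1.length, array2.length);
        } catch (ArrayStoreException e) {
            // Only now pay for the check, to turn the failure into a clear message.
            final Class<?> type1 = array1.getClass().getComponentType();
            final Class<?> type2 = array2.getClass().getComponentType();
            if (!type1.isAssignableFrom(type2)) {
                throw new IllegalArgumentException(
                        "Cannot store " + type2.getName() + " in an array of " + type1.getName(), e);
            }
            throw e;
        }
        return joined;
    }
}
```

In the common case, where the element types are compatible, no extra reflection work is done.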





[GitHub] [commons-lang] orionlibs opened a new pull request, #1075: ArchUtils refactoring

2023-07-01 Thread via GitHub


orionlibs opened a new pull request, #1075:
URL: https://github.com/apache/commons-lang/pull/1075

   The logic to get the Processor for a given architecture string is spread 
across the init_* methods. A cleaner design would be to have a separate private 
method that maps the architecture string to a Processor, and calls that from 
the init_* methods.
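A sketch of the proposed structure, with every init method delegating to one private mapping method. Processor, Bits, Type, the init method names and the architecture strings listed are simplified, hypothetical stand-ins for illustration, not the actual ArchUtils types or tables.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ArchUtils; not the Commons Lang code. */
final class ArchRegistry {
    enum Bits { BIT_32, BIT_64 }

    enum Type { X86, PPC }

    static final class Processor {
        final Bits bits;
        final Type type;

        Processor(Bits bits, Type type) {
            this.bits = bits;
            this.type = type;
        }
    }

    // Declared before the static block so it exists when the init methods run.
    private static final Map<String, Processor> ARCH_TO_PROCESSOR = new HashMap<>();

    static {
        initX86_32();
        initX86_64();
    }

    private static void initX86_32() {
        register(new Processor(Bits.BIT_32, Type.X86), "x86", "i386", "i686");
    }

    private static void initX86_64() {
        register(new Processor(Bits.BIT_64, Type.X86), "x86_64", "amd64");
    }

    /** The single place where architecture strings are mapped to a Processor. */
    private static void register(Processor processor, String... archNames) {
        for (String arch : archNames) {
            if (ARCH_TO_PROCESSOR.putIfAbsent(arch, processor) != null) {
                throw new IllegalStateException("Duplicate architecture key: " + arch);
            }
        }
    }

    /** Returns the Processor for an architecture string, or null if unknown. */
    static Processor getProcessor(String arch) {
        return ARCH_TO_PROCESSOR.get(arch);
    }
}
```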





[jira] [Commented] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT

2023-07-01 Thread Gary Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739288#comment-17739288
 ] 

Gary Lucas commented on IMAGING-356:


I haven't studied the changes that were made, so I can't offer any 
authoritative recommendations on the approach. Instead, I have a few general 
observations about the way TIFF files work that may be useful in figuring out 
how to tackle the problem. Or perhaps not, so take them with a grain of salt.

TIFF files are something of a special case among image formats. First, one 
can never assume that a TIFF file is going to be accessed in order. It is 
common for the "directory" section of the file (which describes how the file 
is organized) to come last rather than first. And, of course, a TIFF file may 
have multiple directories (because it may contain multiple images). Second, 
TIFF files are typically quite large, often in the hundreds of megabytes and 
sometimes in the gigabyte range, so it is often preferable not to keep the 
entire file in memory. In many cases, an application will not access the 
entire file, but only a subsection. For example, a mapping program displaying 
an aerial photograph might access only the part of the photograph that is 
actually visible on the map. Finally, I note that TIFF files are often not 
images at all, but are used to store numerical raster data (such as Earth 
elevation and ocean depth data).
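The point about directories can be illustrated by walking the IFD (Image File Directory) chain. In the TIFF 6.0 layout, the 8-byte header holds the byte order ("II" or "MM"), the number 42, and the offset of the first IFD; each IFD holds a 2-byte entry count, 12-byte entries, and the 4-byte offset of the next IFD, with 0 ending the chain. The sketch (class name hypothetical, classic TIFF only, no BigTIFF, no bounds checks beyond those ByteBuffer performs) shows that finding the directories means jumping to arbitrary offsets rather than reading front to back.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class TiffDirectories {
    /** Returns the offset of every IFD, in chain order. */
    static List<Long> ifdOffsets(ByteBuffer buf) {
        final byte b0 = buf.get(0);
        final byte b1 = buf.get(1);
        if (b0 == 'I' && b1 == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (b0 == 'M' && b1 == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF byte-order mark");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Bad TIFF magic number");
        }
        final List<Long> offsets = new ArrayList<>();
        long next = Integer.toUnsignedLong(buf.getInt(4)); // first IFD, anywhere in the file
        while (next != 0) {
            if (offsets.contains(next)) { // guard against a malformed, cyclic chain
                throw new IllegalStateException("IFD loop at offset " + next);
            }
            offsets.add(next);
            final int pos = (int) next; // ByteBuffer indices are ints: files under 2 GB only
            final int entryCount = Short.toUnsignedInt(buf.getShort(pos));
            // Entries are 12 bytes each; the next-IFD offset follows them.
            next = Integer.toUnsignedLong(buf.getInt(pos + 2 + 12 * entryCount));
        }
        return offsets;
    }
}
```

The same walk works whether the first directory sits right after the header or at the very end of the file.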

All of this means that the file-access pattern for a TIFF file is a closer fit 
to a random-access file than to a sequential I/O channel such as a network 
socket or a serial device. I know that the PNG format (the only other one I've 
studied in depth) was designed with network access specifically in mind. The 
TIFF format evolved before network access became as dominant as it is today.

That being said, even the original Commons Imaging approach to TIFF file I/O 
wasn't a perfect fit. For one thing, the original code opens and closes the 
file multiple times (once for each part of the file it accesses). That is 
suboptimal, since opening and closing a file carries its own performance 
overhead. Also, when I was looking at refactoring Commons Imaging I/O to 
implement Closeable in support of try-with-resources blocks, I didn't see a 
way to accomplish that without a significant rewrite and compatibility-breaking 
changes to the public API.
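A minimal sketch of the alternative: open the file once, read any region by absolute offset, and release the handle through Closeable so callers can use try-with-resources. It uses FileChannel positional reads, which do not move the channel's position. RandomAccessSource and readAt are hypothetical names, not Commons Imaging API.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical random-access source: the file is opened once and closed once. */
final class RandomAccessSource implements Closeable {
    private final FileChannel channel;

    RandomAccessSource(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Reads length bytes starting at offset, without scanning the bytes before it. */
    byte[] readAt(long offset, int length) throws IOException {
        final ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            final int n = channel.read(buf, pos); // positional read
            if (n < 0) {
                throw new EOFException("Unexpected end of file at offset " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

For a TIFF organized in strips, the offset and length of each strip would come from the StripOffsets (tag 273) and StripByteCounts (tag 279) entries, and all strips could be read through the same open source inside a single try-with-resources block.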


