[jira] [Updated] (STATISTICS-71) Implementation of Univariate Statistics
[ https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Joshi updated STATISTICS-71:

Description:

Jira ticket to track the implementation of the Univariate statistics required for the updated SummaryStatistics API.

The implementation would be "storeless". It should be used for calculating statistics that can be computed in one pass through the data without storing the sample values.

Currently I have the definition of API as (this might evolve as I continue working)

{code:java}
public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
    DoubleStorelessUnivariateStatistic add(double v);
    long getCount();
    void combine(StorelessUnivariateStatistic other);
}
{code}

was:

Jira ticket to track the implementation of the Univariate statistics required for the updated SummaryStatistics API.

The implementation would be "storeless". It should be used for calculating statistics that can be computed in one pass through the data without storing the sample values.

Currently I have the definition of API as (this might evolve as I continue working)

{code:java}
public interface StorelessUnivariateStatistic extends DoubleConsumer, DoubleSupplier {
    StorelessUnivariateStatistic add(double d);
    StorelessUnivariateStatistic addAll(double[] values);
    StorelessUnivariateStatistic addAll(double[] values, int start, int length);
    long getN();
    void combine(StorelessUnivariateStatistic other);
}
{code}

> Implementation of Univariate Statistics
> ---
>
> Key: STATISTICS-71
> URL: https://issues.apache.org/jira/browse/STATISTICS-71
> Project: Commons Statistics
> Issue Type: Task
> Components: descriptive
> Reporter: Anirudh Joshi
> Priority: Minor
> Labels: gsoc, gsoc2023

--
This message was sent by Atlassian Jira (v8.20.10#820010)
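To illustrate what "storeless" could mean in practice, here is a minimal sketch of a running mean that keeps only a count and the current mean. The class name RunningMean and the choice to give combine a parameter of the same concrete type are assumptions made for this example; they are not taken from the ticket or from Commons Statistics.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical storeless mean: state is a count and a mean, never the samples. */
final class RunningMean implements DoubleSupplier {
    private long count;
    private double mean;

    /** Adds one value with the incremental update mean += (v - mean) / n. */
    RunningMean add(double v) {
        count++;
        mean += (v - mean) / count;
        return this;
    }

    /** Returns how many values have been added or merged so far. */
    long getCount() {
        return count;
    }

    /** Merges another instance as if its values had been added to this one. */
    void combine(RunningMean other) {
        if (other.count == 0) {
            return; // nothing to merge
        }
        long total = count + other.count;
        // Weighted update, equal to (n1 * m1 + n2 * m2) / (n1 + n2).
        mean += (other.mean - mean) * other.count / total;
        count = total;
    }

    /** Returns the mean, or NaN when no values have been added. */
    @Override
    public double getAsDouble() {
        return count == 0 ? Double.NaN : mean;
    }
}
```

A combine operation of this kind is what would let partial results computed on separate chunks of data be merged afterwards, for example one instance per worker thread.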
[jira] [Commented] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739337#comment-17739337 ]

Gary D. Gregory commented on IMAGING-356:

[~gwlucas] Please try the latest from git master. I just updated the size() implementation.

> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---
>
> Key: IMAGING-356
> URL: https://issues.apache.org/jira/browse/IMAGING-356
> Project: Commons Imaging
> Issue Type: Bug
> Components: Format: TIFF
> Affects Versions: 1.0
> Reporter: Gary Lucas
> Priority: Major
>
> I am using the latest code from github (1.0-SNAPSHOT downloaded from github
> of June 2023) to read a 300 megabyte TIFF file. Version 1.0-alpha3 required
> 673 milliseconds to read that file. The new code requires upward of 15
> minutes. Clearly something got broken since the last release.
> The TIFF file is a 1x1 pixel 4 byte image format organized in strips.
> The bottleneck appears to occur in the TiffReader getTiffRawImageData method
> which reads raw data from the file in preparation of creating a BufferedImage
> object.
> I suspect that there may be a general slowness of file access. In debugging,
> even reading the initial metadata (22 TIFF tags) took a couple of seconds.
[GitHub] [commons-lang] orionlibs opened a new pull request, #1078: updated the JavaDoc for the insert methods in ArrayUtils
orionlibs opened a new pull request, #1078:
URL: https://github.com/apache/commons-lang/pull/1078

Updated the JavaDoc for the insert methods in ArrayUtils to say that the methods also return null if the input array is null.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-lang] orionlibs opened a new pull request, #1077: refactored ArrayUtils to reuse the getLength(Object array) method
orionlibs opened a new pull request, #1077:
URL: https://github.com/apache/commons-lang/pull/1077

refactored ArrayUtils to reuse the getLength(Object array) method inside the class instead of calling Array.getLength()
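For context, a null-safe length helper of the kind this refactoring reuses might look like the sketch below. The class name ArrayLengths is hypothetical, and the body assumes the helper is a thin wrapper that returns 0 for null and otherwise delegates to java.lang.reflect.Array.getLength(); the actual code in the pull request may differ.

```java
import java.lang.reflect.Array;

/** Hypothetical stand-in for a null-safe array length helper. */
final class ArrayLengths {
    private ArrayLengths() {
    }

    /**
     * Returns 0 for a null reference, otherwise the reflective array length.
     * Array.getLength throws IllegalArgumentException if the argument is not an array.
     */
    static int getLength(Object array) {
        return array == null ? 0 : Array.getLength(array);
    }
}
```

Routing internal callers through one such helper keeps the null check in a single place, whereas a direct Array.getLength(null) call throws a NullPointerException.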
[GitHub] [commons-lang] orionlibs opened a new pull request, #1076: refactored an addAll method in ArrayUtils
orionlibs opened a new pull request, #1076:
URL: https://github.com/apache/commons-lang/pull/1076

refactored an addAll method in ArrayUtils to reuse code and defer the array1 component type check to the exception catch block
[GitHub] [commons-lang] orionlibs opened a new pull request, #1075: ArchUtils refactoring
orionlibs opened a new pull request, #1075:
URL: https://github.com/apache/commons-lang/pull/1075

The logic to get the Processor for a given architecture string is spread across the init_* methods. A cleaner design would be to have a separate private method that maps the architecture string to a Processor, and to call that method from the init_* methods.
[jira] [Commented] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739288#comment-17739288 ]

Gary Lucas commented on IMAGING-356:

I haven't studied the changes that were made, so I can't offer any authoritative recommendations on the approach. Instead, I have a few general observations about the way TIFF files work that may be useful in figuring out how to tackle the problem. Or perhaps not. So take them with a grain of salt.

TIFF files are kind of a special case in terms of image formats. First off, one can never assume that a TIFF file is going to be accessed in order. It is common for the "directory" section of the file (which tells how it's organized) to come last rather than first. And, of course, a TIFF file may have multiple directories (because it may contain multiple images).

Second, TIFF files are typically quite large, often in the hundreds of megabytes range, and sometimes in the gigabyte range. So it is often preferred to not keep the entire thing in memory. In many cases, an application will not access the entire file, but only a subsection. For example, a mapping program displaying an aerial photograph might only access the subsection of the photograph that is actually visible on the map.

And finally, I note that TIFF files are often not images at all, but are used to store numerical raster data (such as Earth elevation and ocean depth data).

All of this means that the file-access pattern for a TIFF file is a closer fit to the idea of a random access file than to the idea of a sequential IO channel such as a network socket or a serial device. I know that the PNG format (the only other one I've studied in depth) was designed with network access specifically in mind. The TIFF format evolved before network access was in the ascendancy as it is today.

That being said, even the original Commons Imaging approach to TIFF file IO wasn't quite a perfect fit. For one thing, the original authors open and close a file multiple times (as they access each part of the file). That is suboptimal since opening and closing a file carries its own performance overhead. Also, when I was looking at refactoring Commons Imaging IO to implement Closeable to support try-with-resources blocks, I didn't see a way to accomplish that without a significant rewrite and compatibility-breaking changes to the public API.
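The random-access pattern described in this comment can be sketched with plain JDK classes. The example below is not Commons Imaging code: TiffDirectoryPeek and its methods are hypothetical names, and it handles only the classic TIFF header (two byte-order bytes, the magic number 42, and a 4-byte offset to the first directory). It opens the file once in a try-with-resources block and uses positional reads to jump straight to the directory, wherever it sits in the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of Commons Imaging. */
final class TiffDirectoryPeek {
    private TiffDirectoryPeek() {
    }

    /** Reads exactly length bytes starting at the given absolute file position. */
    private static ByteBuffer readAt(FileChannel ch, long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = ch.read(buf, position + buf.position());
            if (n < 0) {
                throw new IOException("Unexpected end of file at offset " + position);
            }
        }
        buf.flip();
        return buf;
    }

    /** Returns the number of entries in the first image file directory (IFD). */
    static int countFirstDirectoryEntries(Path file) throws IOException {
        // One channel, opened once and closed automatically, serves every read.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer header = readAt(ch, 0, 8);
            byte b0 = header.get(0);
            byte b1 = header.get(1);
            ByteOrder order;
            if (b0 == 'I' && b1 == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (b0 == 'M' && b1 == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Not a TIFF file: bad byte-order mark");
            }
            header.order(order);
            if (header.getShort(2) != 42) {
                throw new IOException("Not a classic TIFF file: bad magic number");
            }
            // The offset is an unsigned 32-bit value and may point near the end of the file.
            long ifdOffset = header.getInt(4) & 0xFFFFFFFFL;
            ByteBuffer count = readAt(ch, ifdOffset, 2).order(order);
            return count.getShort(0) & 0xFFFF;
        }
    }
}
```

Only ten bytes are read here regardless of file size, which is the property that matters for files in the hundreds of megabytes.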