[ https://issues.apache.org/jira/browse/STATISTICS-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961563#comment-16961563 ]
Eric Barnhill commented on STATISTICS-8:
----------------------------------------

!commons-stats-regression.png!

Updated, more abstracted proposal for commons-statistics-regression

> Implementation of regression libraries within common-statistics framework
> -------------------------------------------------------------------------
>
>                 Key: STATISTICS-8
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-8
>             Project: Apache Commons Statistics
>          Issue Type: Task
>            Reporter: Eric Barnhill
>            Priority: Major
>         Attachments:
> PublicFinalDraft_Nguyen_Proposal_ApacheStatRegression.pdf,
> commons-stats-regression.png
>
>
> Apache commons is one of the most widely used resources by Java programmers
> around the world. Data related applications are soaring and Java is one of
> the most commonly used languages for data engineering. Consequently the
> commons-statistics library, currently under development, is likely to find a
> widespread audience.
> For this project we aim to implement regression methods, arguably the most
> widely used techniques in statistics and machine learning, within the Apache
> commons framework, in particular within the new commons-statistics library.
> The assignee will:
> * Use core functionality from the regression sub-libraries of the deprecated
> commons-math 4 framework as a starting point
> * Create a new, standalone commons component for regression statistics,
> focusing first on linear and logistic regression
> * Make architectural and design decisions in the commons philosophy, that
> is, lightweight standalone components easy to understand and use by a wide
> range of Java developers (i.e. not a large, omnibus mathematical library with
> many degrees of abstraction)
> * Draw inspiration from widely used libraries in scikit-learn and R to
> design an up-to-date statistics package
> * Design unit testing and documentation for these libraries
> Particularly challenging design decisions include how to incorporate core
> matrix libraries with a minimum of dependencies and redundancies.
> We see this project as potentially having a large impact on big data
> applications. Java and the JVM are fundamental to popular data engineering
> tools like Hadoop and Spark. Regression analyses are however often handled
> downstream, on the other side of the "data fence", by tools like Python and
> R. A robust and scalable pure Java regression library, easily visible and
> accessible through Apache commons, can enable better integration of both
> sides of this data divide by enabling many machine learning steps to be
> programmed at scale on the Java side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)