[ 
https://issues.apache.org/jira/browse/STATISTICS-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961563#comment-16961563
 ] 

Eric Barnhill commented on STATISTICS-8:
----------------------------------------

Updated, more abstract proposal for commons-statistics-regression (see the
attached commons-stats-regression.png).

> Implementation of regression libraries within commons-statistics framework
> --------------------------------------------------------------------------
>
>                 Key: STATISTICS-8
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-8
>             Project: Apache Commons Statistics
>          Issue Type: Task
>            Reporter: Eric Barnhill
>            Priority: Major
>         Attachments: 
> PublicFinalDraft_Nguyen_Proposal_ApacheStatRegression.pdf, 
> commons-stats-regression.png
>
>
> Apache Commons is one of the resources most widely used by Java programmers
> around the world. Data-related applications are growing rapidly, and Java is
> one of the most commonly used languages for data engineering. Consequently,
> the commons-statistics library, currently under development, is likely to
> find a wide audience.
> For this project we aim to implement regression methods, arguably the most
> widely used techniques in statistics and machine learning, within the Apache
> Commons framework, in particular within the new commons-statistics library.
> The assignee will:
>  * Use core functionality from the regression sub-libraries of the deprecated 
> commons-math 4 framework as a starting point
>  * Create a new, standalone commons component for regression statistics, 
> focusing first on linear and logistic regression
>  * Make architectural and design decisions in keeping with the Commons
> philosophy, that is, lightweight standalone components that are easy for a
> wide range of Java developers to understand and use (i.e., not a large,
> omnibus mathematical library with many degrees of abstraction)
>  * Draw inspiration from widely used tools such as scikit-learn and R to
> design an up-to-date statistics package
>  * Design unit testing and documentation for these libraries
> Particularly challenging design decisions include how to incorporate core 
> matrix libraries with a minimum of dependencies and redundancies.
> We see this project as potentially having a large impact on big data
> applications. Java and the JVM are fundamental to popular data engineering
> tools like Hadoop and Spark. Regression analyses, however, are often handled
> downstream, on the other side of the "data fence", by tools like Python and
> R. A robust and scalable pure Java regression library, easily visible and
> accessible through Apache Commons, can better integrate both sides of this
> data divide by allowing many machine learning steps to be programmed at
> scale on the Java side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
