[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-29 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829243#comment-16829243
 ] 

Gilles commented on STATISTICS-7:
-

Hi [~Udit Arora],
{quote}this is kinda fun...
{quote}
Glad to hear. :)

Such small changes can indeed be a good opportunity to learn the process:
 # File an appropriate JIRA report: Almost every modification must be tracked 
(notable exceptions are Javadoc improvements, e.g. correcting typos, and 
additional unit tests, e.g. to improve code coverage).
 # Prefix the commit message with the key of the JIRA ticket (e.g. 
"STATISTICS-123: ..."). See the output of "git log" for examples of how 
detailed the message should be.
 Long-time committers sometimes neglect to open a JIRA ticket, but that should 
not be emulated. ;)
 # Describe the changes in the commit message: It's obvious that the commit 
contains a "change", but the reviewer should be able to tell, by reading the 
commit message, what the purpose of the change was.
 # It's always good to state that you ran the unit test suite, that the change 
is covered, and that the suite still produces the expected results. Side-note: 
We should ask INFRA to activate 
[Travis|https://travis-ci.org/apache/commons-statistics] for "Commons 
Statistics".

> Stream-based Java statistical processing
> 
>
> Key: STATISTICS-7
> URL: https://issues.apache.org/jira/browse/STATISTICS-7
> Project: Apache Commons Statistics
> Issue Type: New Feature
> Reporter: Eric Barnhill
> Priority: Major
> Labels: GSoC2019, gsoc2019, statistics, streams
>
> The new component aims to be a library of commons statistics functions 
> synchronized with the latest developments in the Java language, in particular 
> Java's functional programming syntax.
> The library will make commonly used statistical functions available to an end 
> user through a simple grammar comparable to commons-math-statistics or 
> scikit-learn, while under the hood it will use Java's mapping, streaming, 
> and other producer and consumer functions to ensure the statistical methods 
> run optimally in new Java implementations.
> As functional programming grows increasingly central to big data applications 
> we believe these libraries will play an important function in the data 
> engineering ecosystem. In particular, data engineering is widely done with 
> Java, then passed to other languages for data-scientific analyses; however, 
> the common availability of functionally implemented statistical mapping and 
> reductions in Java could prove very useful at the interface of data science 
> and engineering, by enabling teams to more easily perform reductions on the 
> engineering side before handing off to the analysis side.
> Developers working on the project will have the opportunity to demonstrate 
> Java programming, functional programming, algorithm design, and data science 
> skills and receive authorship on a commons project that is likely to be 
> widely used.
> The ideal contributor will also be able to help with important architectural 
> decision making. The old source of these libraries, commons-math, grew too 
> large, hierarchically complex and interdependent for the commons mission. The 
> developers on this project need to make architectural choices that will 
> enable the statistical code to be lightweight and reusable, with a minimum of 
> outside dependencies while avoiding redundancy.
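
To make the "functionally implemented statistical mapping and reductions" in the description above more concrete, here is a minimal sketch of a mean/variance reduction over a Java {{DoubleStream}}. It is only an illustration: the class {{StreamingMoments}} and its methods are hypothetical names, not part of Commons Statistics or Commons Math, and the update rule used here (Welford's online algorithm with a pairwise merge step) is an assumption about how such a reduction could be written, not the project's design.

```java
import java.util.stream.DoubleStream;

/** Hypothetical accumulator for illustration; not an Apache Commons API. */
final class StreamingMoments {
    private long n;      // number of values seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    /** Folds one value into the running moments (Welford's update). */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partial result, as needed for parallel streams. */
    void combine(StreamingMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
    }

    long count() {
        return n;
    }

    double mean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Unbiased sample variance; NaN when fewer than two values were seen. */
    double variance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Reduces a stream to its moments with a mutable reduction. */
    static StreamingMoments of(DoubleStream values) {
        return values.collect(StreamingMoments::new,
                              StreamingMoments::accept,
                              StreamingMoments::combine);
    }

    public static void main(String[] args) {
        StreamingMoments m = StreamingMoments.of(
            DoubleStream.of(2, 4, 4, 4, 5, 5, 7, 9).parallel());
        System.out.println("n=" + m.count()
            + " mean=" + m.mean()
            + " variance=" + m.variance());
    }
}
```

Because the accumulator exposes a merge step, the same reduction works on sequential and parallel streams; the JDK's own {{DoubleSummaryStatistics}} follows the same accept/combine shape but does not report variance.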



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-28 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827987#comment-16827987
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Gilles and Sir Eric,
I made a pull request. I don't know if I did it right or if I missed anything. 
Also, it's a minor change which I thought might be helpful. Please let me know 
if it's fine. I will continue contributing since this is kinda fun... :)
Thanks



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-28 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827911#comment-16827911
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir, I found a way to improve some code in the commons-statistics repository, 
but I can't figure out how to show the changes I want to make. I have cloned 
the repository, but after that I am not sure what to do.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-12 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816411#comment-16816411
 ] 

Udit Arora commented on STATISTICS-7:
-

Sure, Sir. I intend to stay around. I am currently finishing my end-of-semester 
exams and lab exams, so I might be a little less active, but I will try to keep 
up with any updates here. I will also look at the commons-statistics repository 
and see what I can do.
Thanks



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-11 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815553#comment-16815553
 ] 

Eric Barnhill commented on STATISTICS-7:


Welcome aboard, [~Udit Arora]. Fork the commons-statistics repository and start 
making contributions. You submit the contributions with a pull request. I will 
review the pull request and interact with you until it is satisfactory. Then, 
probably, Gilles will review my review. :) Even if we can't get you a GSoC 
slot this year, if you start contributing you will be in a great position for 
one next year. The main thing we want to know is whether applicants are going 
to stick around, contribute, and become part of the project.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-11 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815446#comment-16815446
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir,
Since I now have to wait, what could I do for this project in the meantime? Is 
there anything that I should explore or learn?




[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-08 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812516#comment-16812516
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir, I have written my final comment. Please take a look at it so that I can 
submit my final proposal tomorrow, in good time.
Thanks a lot



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-08 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812440#comment-16812440
 ] 

Gilles commented on STATISTICS-7:
-

{quote}Please look at my replies.
{quote}
I answered on "Google Docs".
As noted over there, the right place to discuss changes, provide suggestions 
and ask for clarifications is the "dev" mailing list.

We look forward to hearing from all of you, regardless of what happens in the 
next step of GSoC 2019. :)



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-08 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812358#comment-16812358
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Gilles: Please look at my replies. As always, I am open to feedback. Thanks.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-08 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812326#comment-16812326
 ] 

Gilles commented on STATISTICS-7:
-

{quote}making a formal proposal on GSOC19 portal
{quote}
Do you know that the deadline for submitting applications is tomorrow?

That said, nothing prevents you from making concrete suggestions about the 
refactoring, but we won't have much time to discuss them.

In any case, you are welcome to contribute, GSoC or not.
 If you are interested, please [subscribe to the "dev" mailing 
list|http://commons.apache.org/mail-lists.html].



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-07 Thread Mukul chand yadav (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812011#comment-16812011
 ] 

Mukul chand yadav commented on STATISTICS-7:


[~ericbarnhill] [~virendrasinghrp]

I am Mukul, pursuing a master's in computer science with a major in ML. I would 
love to contribute to the overhaul of the {{org.apache.commons.math4}} package 
using Java 8's functional APIs, where I can leverage my experience of 
developing stream-based APIs.

Based on the discussion here, please let me know if I need to consider any 
other details before making a formal proposal on GSOC19 portal.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-07 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811962#comment-16811962
 ] 

Virendra Singh commented on STATISTICS-7:
-

Yes, [~ericbarnhill], I received your feedback and I am studying the topics you 
mentioned in the mail. I've already forked the repositories from GitHub, and 
once I finish studying and designing the flow, I'll start contributing.

As of now, I've submitted the draft as my final proposal. Is that okay? If you 
recommend any changes, I'll make them.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-07 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811694#comment-16811694
 ] 

Eric Barnhill commented on STATISTICS-7:


Hi [~virendrasinghrp] I want to make sure you received my feedback by email 
because you didn't respond, and the deadline is almost here.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-05 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811219#comment-16811219
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Gilles
I have made some changes, and am open to all feedback you give. Please let me 
know what to improve. 
Thanks.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-04 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810293#comment-16810293
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir
I have submitted my draft. I have shared the link. Please let me know the 
feedback. Thanks a lot.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-02 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807991#comment-16807991
 ] 

Eric Barnhill commented on STATISTICS-7:


Yes, you are in the right place. I realize the deadline is coming up, and I 
will look at the draft proposals soon.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-02 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807976#comment-16807976
 ] 

Virendra Singh commented on STATISTICS-7:
-

Hey [~ericbarnhill] & [~erans], I also submitted a draft proposal for 
_stat.descriptive.*_. Is there another ticket for it, or am I good here? I ask 
because the discussion has moved mainly towards _regression_. 

Still, summary statistics are the core of statistics, so I think I'm in the 
right place.

Also, I'm waiting for feedback on the draft proposal, [~ericbarnhill]. Tell me 
if there are any changes to make.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-02 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807899#comment-16807899
 ] 

Eric Barnhill commented on STATISTICS-7:


[~Udit Arora]

 

https://issues.apache.org/jira/browse/NUMBERS-98



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-02 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807832#comment-16807832
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir
I am interested in both. Since I have been following this discussion, I 
plan to stick with this one. Now that I know, I might send a proposal there as 
well, but currently this project is my priority.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807184#comment-16807184
 ] 

Eric Barnhill commented on STATISTICS-7:


Just send an email to that address

[dev-subscr...@commons.apache.org|mailto:dev-subscr...@commons.apache.org]

Jira seems to make it a link no matter what you do. 

[~Udit Arora], if your interest was to port and refactor commons-math-linear 
that would be awesome and I am sure your proposal would be welcomed.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807172#comment-16807172
 ] 

Virendra Singh commented on STATISTICS-7:
-

An error appears when subscribing to the dev mailing list:

_The requested URL /proper/commons-statistics/dev-subscr...@commons.apache.org 
was not found on this server._



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807126#comment-16807126
 ] 

Eric Barnhill commented on STATISTICS-7:


Udit,

Discussion has moved to the apache commons developers mailing list.

Eric



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807103#comment-16807103
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Gilles
I may not be the best, but I have very deep knowledge of linear algebra (or 
matrix algebra, as you said). I have studied this topic in detail and am 
familiar with all of the mathematics. It is actually one of my favorite 
fields of mathematics, and it was one of our courses in the 1st semester. From 
determinants to eigenvectors and SVD, I know the mathematics part 
thoroughly. There surely is more to it than that, but I am familiar with this 
topic to a good level. If I can be of any help in this regard, let me know.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806966#comment-16806966
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir
Since this is my first time applying, I am a bit scared and worried. Any 
advice, sir?



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806936#comment-16806936
 ] 

Eric Barnhill commented on STATISTICS-7:


I have replied to this thread on the dev mailing list; interested mentees 
should subscribe and continue this excellent discussion there.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806878#comment-16806878
 ] 

Gilles commented on STATISTICS-7:
-

bq. current "math-linear" will be ported to "Commons Linear" in the future?

Perhaps; we'd need expert advice on how to design a modern implementation of 
matrix algebra (?).

In the meantime, it may be worth exploring the implications of having a very 
focused {{commons-numbers-matrix}} module in "Commons Numbers".

bq. port necessary functionality into private packages

Yes. But IMO it should be very limited (i.e. code that is not called should be 
stripped).

bq. just use the current library temporarily for now

I'd rather not, as it will perpetuate the impression that "Commons Math" is 
still supported.  A new major version of CM should be released (with "legacy" 
codes) that will depend on "Commons Statistics".

bq. "math-exceptions"

No.  I now consider that specific exceptions generated by low-level components 
should not be public.
See how it's done in "Commons Numbers".

bq. "math-util"

Anything in there that is still useful is a candidate for "Commons Numbers".  
Did you have a look at what's there already?


By the way, this discussion should be moved to the "dev" ML.

> Stream-based Java statistical processing
> 
>
> Key: STATISTICS-7
> URL: https://issues.apache.org/jira/browse/STATISTICS-7
> Project: Apache Commons Statistics
>  Issue Type: New Feature
>Reporter: Eric Barnhill
>Priority: Major
>  Labels: GSoC2019, gsoc2019, statistics, streams
>
> The new component aims to be a library of commons statistics functions 
> synchronized with the latest developments in the Java language, in particular 
> Java's functional programming syntax.
> The library will make commonly used statistical functions available to an end 
> user through a simple grammar comparable to commons-math-statistics or 
> scikit-learn, while under the hood will implement Java's mapping, streaming, 
> and other producer and consumer functions to ensure the statistical methods 
> run optimally in new Java implementations.
> Developers working on the project will have the opportunity to demonstrate 
> Java programming, functional programming, algorithm design, and data science 
> skills and receive authorship on a commons project that is likely to be 
> widely used.
> The ideal contributor will also be able to help with important architectural 
> decision making. The old source of these libraries, commons-math, grew too 
> large, hierarchically complex and interdependent for the commons mission. The 
> developers on this project need to make architectural choices that will 
> enable the statistical code to be lightweight and reusable, with a minimum of 
> outside dependencies while avoiding redundancy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Ben Nguyen (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806845#comment-16806845
 ] 

Ben Nguyen commented on STATISTICS-7:
-

Ah, that clarified it, thanks!

So I am guessing the current "math-linear" will be ported to "Commons Linear" 
in the future? But for now we may port necessary functionality into 
private packages as mentioned (or perhaps just use the current library 
temporarily for now), then later convert to the new library?

Does the same go for "math-exceptions" and "math-util"?

Thanks



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-04-01 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806695#comment-16806695
 ] 

Gilles commented on STATISTICS-7:
-

Both "Commons Numbers" and "Commons Statistics" are new components (not 
released yet).
There must not be any overlapping code; "Commons Statistics" can (and, very 
likely, will) depend on "Commons Numbers". Modules of "Commons Numbers" collect 
lowest-level tools (i.e. no dependency allowed).

bq. assumption correct?

Not sure what you mean.




[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-31 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806296#comment-16806296
 ] 

Gilles commented on STATISTICS-7:
-

[~Salman],

In your proposal, you mention
* T-distribution
* MathArrays
* Precision

Some have been moved to ["Commons 
Numbers"|http://commons.apache.org/proper/commons-numbers/] or ["Commons 
Statistics"|http://commons.apache.org/proper/commons-statistics/modules.html].



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-31 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806252#comment-16806252
 ] 

Eric Barnhill commented on STATISTICS-7:


I can confirm I see two draft proposals in. Will give them a look this week.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-31 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806156#comment-16806156
 ] 

Gilles commented on STATISTICS-7:
-

I don't see any link.
{quote}Where else can I share it?
{quote}
Through "Google Docs", but the "portal" should be fine too, provided we know 
where to look. ;)



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-31 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806148#comment-16806148
 ] 

Virendra Singh commented on STATISTICS-7:
-

On GSoC portal.

Following is written there:
h2. Draft Shared

"The Apache Software Foundation may respond with comments to help you improve 
your draft proposal, if there is enough time before the deadline. You may also 
contact the organization to request feedback."

 

Where else can I share it?



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806039#comment-16806039
 ] 

Gilles commented on STATISTICS-7:
-

bq.  I am thinking about some scenarios of approaches

Best place to start discussing design issues is the "dev" ML.
As was already mentioned, when there are dependencies, the focus should first 
be on how to best port those. By this, I mean that issues already identified 
(cf. "Commons Math" bug-tracking system) should be fixed as part of the porting 
work.  If the way to go is not obvious, it might be wiser to start with 
utilities that are easier to port.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806037#comment-16806037
 ] 

Gilles commented on STATISTICS-7:
-

bq. I've shared my draft for the project.

Where is it?



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806032#comment-16806032
 ] 

Virendra Singh commented on STATISTICS-7:
-

Hey [~ericbarnhill],[~erans], I've shared my draft for the project. I've 
prepared the draft based on my limited understanding. Please give feedback on 
it, and mention what needs to be added or changed.

Thank You :)



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Ben Nguyen (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805888#comment-16805888
 ] 

Ben Nguyen commented on STATISTICS-7:
-

Hello,

I am thinking about some scenarios of approaches, and the main one is an 
approach that would have near zero dependencies (extremely lightweight), with 
all necessary functionality from dependencies (such as linear) 're-implemented' 
for the specific use, as Mr. [~erans] mentioned earlier. This would simplify 
usage but increase the overall workload through repeated code (a future problem 
if everyone does this). To my understanding, this is the debate. Is there 
some kind of consensus regarding an approach style (extent of 'lightweight') 
for the entire new commons porting, which we should be consistent with? Or are 
we truly free to get something up and running as effectively as possible (using 
Java 8 features, etc.) asap, as Mr. [~ericbarnhill] mentioned? I'm new to 
Apache (and open source) and would like to learn more about how things should 
operate :D

Thank you



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Ben Nguyen (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805886#comment-16805886
 ] 

Ben Nguyen commented on STATISTICS-7:
-

Hello,

I am thinking about some scenarios of approaches and the main one is an 
approach that would have near zero dependencies (extremely lightweight) with 
all necessary functionality from dependencies 're-implemented' to specific use 
as 



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805859#comment-16805859
 ] 

Gilles commented on STATISTICS-7:
-

Yes.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-30 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805751#comment-16805751
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir,
http://community.apache.org/gsoc.html#students-read-this 
Should I follow the application template at this link when drafting my 
proposal?
Thanks




[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-29 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805337#comment-16805337
 ] 

Eric Barnhill commented on STATISTICS-7:


Hi Udit, if you are new to machine learning, this is not the place to start. But 
we could use help porting the stat.descriptive and stat.inference libraries; 
these are all quite relevant for machine learning IMO.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-29 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805264#comment-16805264
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Eric Barnhill and Sir Gilles,
Do we intend to include functions that aid calculations in machine 
learning? We could add something such that, given a data set, it returns the 
number of parameters that would give a decent enough classifier, so that the 
loss is not minimized but is still a significantly low number. Since I am new 
to machine learning, I am not aware whether such a function already exists.
Thanks



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-29 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805157#comment-16805157
 ] 

Eric Barnhill commented on STATISTICS-7:


[~BenN] [~Salman]

Interest has flowed organically in the direction of regression, and that's 
great. [~erans] is right that this unavoidably brings up the linear library, 
what to do with it, and how. We are a do-ocracy here at commons-numbers and the 
proposed solution does not need to be ideal; the priority is to get the 
component up and running.

I have only one suggestion for what *not* to do, and that is what was done 
before: implementing basic linear operations under many layers of 
object-oriented abstraction with the goal of assembling some sort of omnibus OO 
math library. The focus in commons is lightweight reusable components that are 
widely used and easy to use in real life Java programming. The user should 
*not* have to digest a large mathematically focused API to get Pearson's r from 
two vectors or solve Ax=b. The production developers in my shop should feel as 
comfortable grabbing commons-statistics-regression for a task as they do 
commons-csv.

If you want to just accept array or List input and adapt the current linear 
functionality to process those inputs, I think that is a good solution. If you 
would rather create some sort of re-usable Matrix component and stick that into 
commons stats, to make your code more readable, that's fine too. We could also 
start up a small commons-numbers-linear project if someone was excited to do 
that, but that is definitely not necessary.
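
As an illustration of the "array in, number out" style described above, here is 
a minimal sketch of Pearson's r computed directly from two double arrays. The 
class and method names are hypothetical and are not part of Commons Statistics 
or Commons Math; the point is only that no matrix or vector type is needed.

```java
/** Hypothetical helper for illustration; not an existing Commons class. */
final class PearsonSketch {
    private PearsonSketch() {}

    /** Returns Pearson's correlation coefficient of two equal-length samples. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two samples of equal length >= 2");
        }
        int n = x.length;

        // First pass: means of both samples.
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: sums of squared deviations and cross products.
        double sxx = 0;
        double syy = 0;
        double sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        // NaN if either sample is constant (zero variance).
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

A caller would write `PearsonSketch.correlation(xs, ys)` with two plain arrays 
and nothing else to learn.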

Hopefully we can find a way for all interested mentees to take roles that 
complement each other in ways that interest them. Once we have a sense for 
that, and the proposals are approved, I will write up and assign the necessary 
tickets. And of course we hope that after the summer you will continue working 
with us.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-29 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805011#comment-16805011
 ] 

Gilles commented on STATISTICS-7:
-

The question of matrix utilities is indeed fundamental and makes porting part 
of the stat tools not an obvious task.
The "linear" package itself is in need of refactoring (see the list of open 
issues on the [Commons Math JIRA 
page|https://issues.apache.org/jira/projects/MATH/issues/MATH]).

The new "STATISTICS" component is logically intended to be a dependency for the 
next major release of (legacy) "Commons Math" (i.e. v4.0), which will contain 
whatever code has not been moved to more focused components (i.e. "Commons 
Numbers", "Commons RNG", "Commons Geometry" and "Commons Statistics").  Hence 
making the last/old official release of CM a dependency seems *not* the right 
way to go, indeed. ;-)
This can be solved "temporarily", even though you are right that this is bad in 
general, by copying (into _private_ or _package-private_ classes) the necessary 
functionality.  Better still would be to consider a refactoring (bringing in 
only required functionality) of the linear algebra utilities specifically 
geared to its usage in the STATISTICS component.
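
To make the "copy only the necessary functionality into package-private 
classes" option concrete, here is a minimal sketch of what such an internal 
helper might look like. The class name and its placement are assumptions for 
illustration only; it is not code from Commons Math's "linear" package, and it 
uses plain Gaussian elimination with partial pivoting rather than any 
decomposition the existing regression code relies on.

```java
/** Package-private on purpose: a hypothetical internal helper, not public API. */
final class SmallLinearSolver {
    private SmallLinearSolver() {}

    /** Solves A x = b for a square matrix A; the inputs are not modified. */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        if (a.length != n) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        // Work on copies so the caller's arrays are left untouched.
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) {
                throw new IllegalArgumentException("Matrix must be square");
            }
            m[i] = a[i].clone();
        }
        double[] x = b.clone();

        for (int col = 0; col < n; col++) {
            // Partial pivoting: use the row with the largest entry in this column.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (m[pivot][col] == 0.0) {
                throw new ArithmeticException("Singular matrix");
            }
            double[] tmpRow = m[col];
            m[col] = m[pivot];
            m[pivot] = tmpRow;
            double tmp = x[col];
            x[col] = x[pivot];
            x[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                x[r] -= f * x[col];
                for (int c = col; c < n; c++) {
                    m[r][c] -= f * m[col][c];
                }
            }
        }

        // Back substitution on the upper-triangular system.
        for (int r = n - 1; r >= 0; r--) {
            double s = x[r];
            for (int c = r + 1; c < n; c++) {
                s -= m[r][c] * x[c];
            }
            x[r] = s / m[r][r];
        }
        return x;
    }
}
```

Because the class has no `public` modifier, it stays invisible to users of the 
component and can be replaced later by a refactored linear algebra utility 
without an API change.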



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-29 Thread Ben Nguyen (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804625#comment-16804625
 ] 

Ben Nguyen commented on STATISTICS-7:
-

Hello,

I am a second-year computer science and economics student who has taken some 
higher-level econometrics courses. I'm very interested in working with the 
stat.regression library (I can see myself using it a lot in the near future). 
Looking through it briefly, I do see the problem: the implementation could be 
more intuitive to use. I am in the process of drafting a proposal, which led 
me to wonder how to approach the dependency issue: to what extent should the 
new library be 'standalone'? For example, should it still depend on 
math.linear (especially needed in OLS and GLS), since linear won't be getting an 
upgrade anytime soon? (I have not looked into math.linear yet.) Or should 
matrices be implemented internally within the new stats library? (Repeating code 
in general is bad, though there is the benefit of 'zero' dependencies -> better 
maintainability?) I guess the same question applies to other dependencies too 
(math.exceptions, util): to what extent should this new library be 
standalone?

 



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-28 Thread Salman Hussain (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803935#comment-16803935
 ] 

Salman Hussain commented on STATISTICS-7:
-

Sorry about that [~erans], I have changed it now. Having looked more closely at 
the linear regression used in scikit-learn, there seem to be a lot of 
dependencies, including from scipy. I think focusing on porting the existing 
regression library would be more realistic, while taking inspiration and 
design elements from scikit-learn. I aim to submit a draft proposal by EOD 
through the GSoC portal.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-27 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803493#comment-16803493
 ] 

Gilles commented on STATISTICS-7:
-

Hi [~Salman].

Could you please remove the restriction as to who can see your comments?
When one is not logged in, part of the conversation is missing and it looks 
quite odd...

Thanks.




[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-27 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803477#comment-16803477
 ] 

Eric Barnhill commented on STATISTICS-7:


Sounds very ambitious – just make sure you start with linear and logistic 
regression before you head to anything fancier. :) Obviously work can continue 
after summer's end if there is interest.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-27 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803070#comment-16803070
 ] 

Eric Barnhill commented on STATISTICS-7:


[~Salman] First of all I'll be delighted to read over the proposal, this is 
happening right now in some other projects.

Seems to me time frame depends entirely on the skill set of the applicant. But 
I would set aside some time for each of the following in the proposal:
 * Project design – choose which stat functions you are interested in developing
 * Dependency analysis – what other classes do the stat functions currently 
depend on in commons-math and what is the purpose of those functions? What 
changes need to be made in order to create a standalone statistical component? 
For example, is the method depending on abstract classes, exception classes, 
formatting classes? Are these dependencies necessary or can they be 
restructured to be more stand alone? Whatever decision creates the best 
architecture is best of course
 * New component architecture - what will be the class hierarchy of the 
redesigned component?
 * Class design - do I want the user to create an object instance, or call a 
static method, to use this functionality? Do I want some enums to handle 
parameters of various kinds? What kinds of inputs should it take? How do I 
handle different data types? What are the outputs?
 * Unit test design - creating a property-driven unit testing scheme
 * Algorithm Validation – is there code in other languages that I can use to 
validate test inputs?
 * Documentation - what doc will go with each class and method? What additional 
doc should be in the user guides? Ideally the project could conclude with a 
couple of tutorials implementing working examples
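
The class design and unit test questions in the list above can be sketched 
together. The example below assumes one possible answer (a static method, with 
an enum instead of a boolean flag for the bias correction) and adds a simple 
property check of the kind a property-driven test could assert. All names are 
hypothetical and do not come from Commons Statistics.

```java
/** Hypothetical names throughout; not an existing Commons Statistics API. */
final class VarianceSketch {
    /** Enum parameter: more readable at the call site than a boolean flag. */
    enum Bias { POPULATION, SAMPLE }

    private VarianceSketch() {}

    /** Two-pass variance; divides by n (POPULATION) or n - 1 (SAMPLE). */
    static double variance(double[] data, Bias bias) {
        int n = data.length;
        int divisor = (bias == Bias.SAMPLE) ? n - 1 : n;
        if (divisor <= 0) {
            throw new IllegalArgumentException("Not enough data");
        }
        double mean = 0;
        for (double v : data) {
            mean += v;
        }
        mean /= n;
        double sumSq = 0;
        for (double v : data) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / divisor;
    }

    /** Property: adding a constant to every value leaves the variance unchanged. */
    static boolean isShiftInvariant(double[] data, double shift, double tolerance) {
        double[] shifted = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            shifted[i] = data[i] + shift;
        }
        double before = variance(data, Bias.SAMPLE);
        double after = variance(shifted, Bias.SAMPLE);
        return Math.abs(before - after) <= tolerance;
    }
}
```

A property such as shift invariance can be checked on many generated inputs, 
while reference values from another language (the "Algorithm Validation" item) 
pin down the absolute numbers.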



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802218#comment-16802218
 ] 

Eric Barnhill commented on STATISTICS-7:


Hi [~Salman], sounds like you would be in a great position to contribute the 
streaming code that is needed.

Just to be clear, everyone: there is plenty of work here, and [~erans] and I can 
(I assume) split this ticket so that we can assign a project of interest to 
whoever is interested. Right off the top of my head, I can think of a few 
different work packages here:

There are the summary statistics in stat.descriptive.*, which are already quite 
a lot, however the work is not restricted to this in any way. Some of these 
stats are pretty obscure for a "common" library, for example who uses 
FourthMoment?; meanwhile many metrics found in for example 
[https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics] 
are very commonly used in 2019 and their implementation with a clear user 
interface in Java is long overdue. If you write a proposal for such a project, 
you will be very free to take the initiative and contribute what you want here.

There is also a whole regression library, and porting that might be of 
particular interest to anyone interested in machine learning. It is also a 
suboptimally designed library IMO; I think the "SimpleRegression" class is 
evidence of bad design. Regression is such a huge percentage of actual ML in 
the wild that knowing it very technically, and being able to code it up, is I 
think a real asset that would put you ahead of 99% of aspiring data scientists.

Finally there is a whole library of statistical tests, as well as correlation 
and covariance libraries. This would be another work package, perhaps for 
someone more interested in statistics who wants to get into applied math and 
code those algorithms.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801900#comment-16801900
 ] 

Udit Arora commented on STATISTICS-7:
-

Sure sir. I will continue to dig deeper, since you can never know enough of any 
particular thing.. :)



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801890#comment-16801890
 ] 

Eric Barnhill commented on STATISTICS-7:


[~Udit Arora] Sounds like you are in great shape to me. For anything you don't 
know, we are here to mentor you of course! And we will welcome your efforts.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801879#comment-16801879
 ] 

Udit Arora commented on STATISTICS-7:
-

Sir Eric Barnhill and Sir Gilles,
I am not a data scientist, sir, but I am pursuing Computer Science Engineering 
and am currently in my second semester. I have decent knowledge of Statistics, 
as it has been taught to us, I am also looking some things up on the Internet, 
and I have decent knowledge of Java and Python as well. What else should I 
know in order to be able to work on this project?




[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801871#comment-16801871
 ] 

Eric Barnhill commented on STATISTICS-7:


Hi [~virendrasinghrp]

I work as a data scientist for a Silicon Valley company so I know data science 
is a big thing and I know what languages are used.

Java makes up a large part of commercial infrastructure and Apache commons in 
particular is used everywhere. Even in a small project I am working on right 
now the devs use commons-cli, commons-csv, commons-lang. The applications of 
commons-statistics will just be innumerable.

If you want something data science specific you probably know that nearly all 
data engineering infrastructure is in Java. The ability to run some statistical 
mappings on the side of that infrastructure would I think be very valuable. If 
the mappings were implemented functionally, they would scale effortlessly. A lot 
of job offers for data scientists are really mostly engineering, so knowing that 
side of the business is really good.
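
One way to read "implemented functionally" is as a mergeable accumulator that 
plugs into `DoubleStream.collect`, so the same code runs on sequential and 
parallel streams. The sketch below makes that assumption; the class name is 
hypothetical, and the update and merge formulas are the standard 
Welford/Chan-style running mean and variance, not code taken from the existing 
stat.descriptive package.

```java
import java.util.stream.DoubleStream;

/** Hypothetical mergeable accumulator for count, mean and sample variance. */
final class RunningMoments {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** Adds one value (Welford's online update). */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partial result into this one (needed for parallel streams). */
    void combine(RunningMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long count() {
        return n;
    }

    double mean() {
        return n == 0 ? Double.NaN : mean;
    }

    double sampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Reduces a stream using the supplier/accumulator/combiner form of collect. */
    static RunningMoments of(DoubleStream values) {
        return values.collect(RunningMoments::new, RunningMoments::accept, RunningMoments::combine);
    }
}
```

Calling `RunningMoments.of(stream.parallel())` uses the same accumulator; 
whether that is actually faster depends on the data size and source, which is 
why "scale" here should be read as "can be parallelized", not as a measured 
result.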

But I am not going to spend a lot of time defending the project; I think it 
obviously has a very wide audience.

I also disagree that it is easy. There is a lot that goes into the development, 
testing, documentation and release of widely used software. Also, as Gilles 
said, there are a lot of architectural decisions to be made about how the stats 
libraries are going to hang together. It is decisions at this level that 
separate good engineers from bad. We do not want a bunch of independent 
scripts; I could write those very quickly.

If we make good progress there are lots of ways to extend the project into ML 
tools like logistic regression (which is probably what people use 90%+ of the 
time, from what I hear).

 



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801653#comment-16801653
 ] 

Gilles commented on STATISTICS-7:
-

Some of the "Commons Math" open issues stem from design bugs (e.g. MATH-1281).

The new component allows us to start from a clean slate, without the 
backward-compatibility constraint, hopefully with design lessons learnt from 
other libraries, and from mistakes made in "Commons Math" whose fixes have been 
delayed indefinitely.

Indeed, the design of the {{stat}} package dates from the inception of the 
component, back in 2003: In the [initial 
proposal|http://commons.apache.org/proper/commons-math/proposal.html], half of 
the source description pertains to statistics and from those, the random 
utilities have gone into their [own 
component|http://commons.apache.org/proper/commons-rng/] which I consider as a 
step in the right direction, i.e. away from a huge monolithic library that 
proved to be an unsustainable project, mostly because requested stability of 
some packages prevented a sane evolution of others (solely because they were 
part of the same component!).

That said, an advantage of having "Commons Math" is that a lot of the "core" 
code and unit tests can be leveraged to make the port work relatively fast 
and robust, once a new design has been put forward.  And if there are competing 
proposals, they can be developed in parallel, for some time, until one seems to 
gather more interest.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801610#comment-16801610
 ] 

Virendra Singh commented on STATISTICS-7:
-

Hi [~erans], based on what you said, I understand that the main purpose of this 
package is to develop statistical functions using Java 8 features like 
Streams, Lambdas and Functional Interfaces in a functional programming style.
You mentioned *"...order to fix design issues..."*. Are we talking about using 
functional programming to fix design issues, since it has features like pure 
functions, referential transparency, etc.? If not, please correct me. 

We are only discussing in the comments section. Is this the only way, or is 
there another platform to discuss the project?



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801603#comment-16801603
 ] 

Gilles commented on STATISTICS-7:
-

Hi.

Contributors are most wanted.

The idea behind the new "Commons Statistics" component is to offer the same 
statistical tools as exist under the {{org.apache.commons.math4.stat}} package 
of ["Commons 
Math"|http://commons.apache.org/proper/commons-math/apidocs/index.html] (see 
also STATISTICS-5) but with the opportunity for a complete overhaul in order to 
fix design issues that have existed for years (see e.g. [this list of JIRA 
reports|https://issues.apache.org/jira/issues/?jql=project%20%3D%20MATH%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22stat%22]),
 as well as for a modern API (Java 8+).

There aren't yet any specific plans or priorities; anything can be proposed to 
start a discussion, and people with an interest in using these functionalities 
(i.e. having real-life problems to solve) are especially welcome to drive the 
discussions.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801500#comment-16801500
 ] 

Virendra Singh commented on STATISTICS-7:
-

I am asking this because I am inclined towards Data Science, and the statistics 
package would be great to work on in Java. Also, I am starting with GSoC (as it 
would definitely boost my confidence), but that's not the limit; I want to 
work on it even after that, which is why I am asking.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-26 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801496#comment-16801496
 ] 

Virendra Singh commented on STATISTICS-7:
-

[~ericbarnhill], what are the future plans for this statistics package? I ask 
because summary statistics is a very basic thing.
I assume that you are planning to make a package for applied advanced 
statistics which can be further used in Data Science and Machine Learning, as 
Data Science is currently the new big thing and most of its work is done in 
Python & R.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801361#comment-16801361
 ] 

Udit Arora commented on STATISTICS-7:
-

Khaled Emara Sir, I am very much interested in this project. As you said 
languages like R and Python may be needed, I was trying to work with some 
statistical data in Python, involving multivariate regression, so as to 
be more comfortable with the language.
Sorry for not being active. But sir, I want to be fully prepared for this 
project, so I started doing some things in Python. Some assignments from 
college also took my time. But sir, I am familiar with all the terms Virendra 
Singh mentioned in his last comment. Sir, now that you drew my attention, I will 
try to be more active.
Thanks



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801298#comment-16801298
 ] 

Eric Barnhill commented on STATISTICS-7:


Yes. Perfect.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801253#comment-16801253
 ] 

Virendra Singh commented on STATISTICS-7:
-

[~ericbarnhill], as you mentioned, "This ticket addresses more the commons-stats 
functions, in particular summary statistics that are widely used".

I know that summary statistics include the basic and widely used 
mean, median, mode, standard deviation, variance, skewness and kurtosis.

For this project we need to work on them.

Am I right?
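
As an illustration of what a stream-friendly summary statistic could look like, here is a rough sketch of a mean and variance accumulator. It is an assumption for discussion, not the project's design: the class `MomentsSketch` and all of its method names are hypothetical. It uses Welford's single-pass update plus a pairwise merge step, so partial results from parallel chunks can be combined:

```java
import java.util.stream.DoubleStream;

// Hypothetical accumulator for illustration; not a Commons Statistics class.
final class MomentsSketch {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Welford's update for one new value. */
    public void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partial result into this one (needed for parallel streams). */
    public void combine(MomentsSketch other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
    }

    public long count() { return n; }

    public double mean() { return n > 0 ? mean : Double.NaN; }

    /** Unbiased sample variance; NaN when fewer than two values were seen. */
    public double sampleVariance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    /** Collects a stream into an accumulator using the three-argument collect. */
    public static MomentsSketch of(DoubleStream values) {
        return values.collect(MomentsSketch::new, MomentsSketch::accept, MomentsSketch::combine);
    }

    public static void main(String[] args) {
        MomentsSketch m = MomentsSketch.of(DoubleStream.of(1, 2, 3, 4, 5).parallel());
        System.out.println("mean=" + m.mean() + " variance=" + m.sampleVariance());
    }
}
```

Median and mode need the whole data set (or an approximation), so they would not fit this single-pass accumulator shape; skewness and kurtosis could be added by tracking higher moments in the same way.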



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801241#comment-16801241
 ] 

Virendra Singh commented on STATISTICS-7:
-

We need to port classes from "commons.math4.stat" which are *not* coded using 
functional programming concepts (I just saw the code).
We need to port those classes and code them using the functional programming 
concepts introduced in Java 8 (as functional programming is used for 
mathematical computations and when concurrency and parallelism are required).
Am I correct?
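
To show the kind of change being asked about, here is a small side-by-side sketch. It is only an illustration and is not taken from "Commons Math": the class `PortingSketch` and its two methods are hypothetical. The first method uses an explicit loop; the second expresses the same computation with the Java 8 stream API:

```java
import java.util.Arrays;

// Hypothetical example class; neither method is copied from an existing library.
final class PortingSketch {

    /** Imperative style: explicit loop with a mutable running sum. */
    public static double meanLoop(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length; // 0.0 / 0 yields NaN for an empty array
    }

    /** Stream style: no mutable state visible in user code. */
    public static double meanStream(double[] values) {
        // average() returns an empty OptionalDouble for an empty array
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        System.out.println(meanLoop(data) + " " + meanStream(data));
    }
}
```

The two versions are not guaranteed to agree to the last bit on arbitrary input, because the order of the floating-point additions may differ, so a port would need unit tests with tolerances.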



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801239#comment-16801239
 ] 

Eric Barnhill commented on STATISTICS-7:


Oh, you are right, those have been ported, but not much developed. I see Gilles 
has one GSoC ticket regarding the distributions and he can speak to what work 
needs to be done there. 

This ticket addresses more the commons-stats functions, in particular summary 
statistics that are widely used.



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Virendra Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801233#comment-16801233
 ] 

Virendra Singh commented on STATISTICS-7:
-

No [~ericbarnhill], not from commons.math4.distribution.

The distribution classes are in 
[org.apache.commons.statistics.distribution|https://commons.apache.org/proper/commons-statistics/xref/org/apache/commons/statistics/distribution/package-frame.html]

Here is the link: 
https://commons.apache.org/proper/commons-statistics/xref/index.html



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801221#comment-16801221
 ] 

Eric Barnhill commented on STATISTICS-7:


Hello [~virendrasinghrp], my idea was to port the classes from 
commons.math4.stat but we would welcome someone to port the 
commons.math4.distribution classes as well. Is that what you meant when you 
said you saw them already programmed?

Some of your English was not clear (by the way, I am happy to help you with 
English too) and I am not sure I understand your question. We are looking for 
up-to-date implementations of these Java statistical libraries, but the 
developer will also need to make some architectural decisions, so really I 
think it is a rich project. I have added some more information to the ticket so 
that everyone can see it. Does this information address your question?



[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread VIRENDRA SINGH (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801176#comment-16801176
 ] 

VIRENDRA SINGH commented on STATISTICS-7:
-

Hi [~erans], I am Virendra Singh and I am interested in this project. To be 
clear, I want to work on this project under GSoC 2019. I am currently studying 
Business Analytics and Data Science.
 From the description I understood that we are using Functional Interfaces from 
Java 8 for developing this statistics library. I also saw some statistical 
functions already programmed, like the chi-square distribution, gamma 
distribution, normal distribution, etc.
 My question is: for this project, do we have to code more statistical 
functions, or anything else? Also, if I've misunderstood anything, please 
correct me.






[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Eric Barnhill (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801010#comment-16801010
 ] 

Eric Barnhill commented on STATISTICS-7:


You need to submit a proposal, and that proposal should make it clear that you 
have the requisite abilities.

It sounds like you have plenty of Java background. What we are looking for is 
someone who is familiar with the functional programming side of Java and can 
produce an up-to-date Java statistics library, or someone who wants to learn 
this side of Java through this project. 






[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-25 Thread Khaled Emara (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800948#comment-16800948
 ] 

Khaled Emara commented on STATISTICS-7:
---

Hi [~Udit Arora], are you still interested in this project? I haven't seen any 
activity for a while.
Are there any competency tests, or some issues easy enough for newcomers to 
solve as proof of ability, [~erans]?
I am fairly new to open source development, but I have previously worked on 
reasonably large Android projects using Kotlin and Java. I have also worked on 
a window manager using Xlib and did some game development. Do you think I 
would be a good fit for this project?






[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-16 Thread Gilles (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794279#comment-16794279
 ] 

Gilles commented on STATISTICS-7:
-

For the library, no; but contributors sometimes used other languages (e.g. "R", 
"Python") to generate "reference" values to compare against in unit tests.
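
A minimal sketch of what such a comparison could look like, assuming a
hypothetical helper class ReferenceValueCheck (neither the class nor its
methods come from the library, and a real test would normally use the
project's unit test framework rather than a main method). The reference
constant is the value R's var(c(1, 2, 3, 4, 5)) gives for the same data.

```java
import java.util.stream.DoubleStream;

/** Hypothetical sketch of checking Java output against an external reference value. */
final class ReferenceValueCheck {
    // Reference value computed outside Java: R's var(c(1, 2, 3, 4, 5)) is 2.5.
    static final double EXPECTED_VARIANCE = 2.5;

    /** Two-pass sample variance (n - 1 denominator, matching R's var). */
    static double sampleVariance(double[] data) {
        double mean = DoubleStream.of(data).average().orElse(Double.NaN);
        double sumSq = DoubleStream.of(data).map(x -> (x - mean) * (x - mean)).sum();
        return sumSq / (data.length - 1);
    }

    /** Relative-tolerance comparison, since exact equality is too strict for doubles. */
    static boolean closeTo(double expected, double actual, double relTol) {
        return Math.abs(actual - expected) <= relTol * Math.abs(expected);
    }

    public static void main(String[] args) {
        double actual = sampleVariance(new double[] {1, 2, 3, 4, 5});
        if (!closeTo(EXPECTED_VARIANCE, actual, 1e-12)) {
            throw new AssertionError("expected " + EXPECTED_VARIANCE + " but got " + actual);
        }
        System.out.println("variance agrees with the reference value");
    }
}
```

The tolerance of 1e-12 is an arbitrary choice for this sketch; the appropriate
value depends on the algorithm and on how many digits the reference provides.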






[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing

2019-03-16 Thread Udit Arora (JIRA)


[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794275#comment-16794275
 ] 

Udit Arora commented on STATISTICS-7:
-

Thanks for the reply. Would any language other than Java be needed so that I 
can prepare myself for it?



