[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829243#comment-16829243 ] Gilles commented on STATISTICS-7: - Hi [~Udit Arora], {quote}this is kinda fun... {quote} Glad to hear. :) Such small changes can indeed be a good opportunity to learn the process: # File an appropriate JIRA report: Almost every modification must be tracked (a notable exception is made for Javadoc improvement, e.g. correcting typos, or adding more unit tests, e.g. to improve code coverage). # The commit message should be prepended by the name of the JIRA ticket (e.g. "STATISTICS-123: ..."). See the output of the "git log" for examples of how detailed the message should be. Long time committers sometimes omit to open a JIRA ticket, but that should not be emulated. ;) # Describe the changes in the commit message: It's obvious that the commit contains a "change", but the reviewer should know, by reading the commit message, what was the purpose of the change. # It's always good to specify that you ran the unit test suite, and that the change is covered, and still produce the expected results. Side-note: We should ask INFRA to activate [Travis|https://travis-ci.org/apache/commons-statistics] for "Commons Statistics". > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827987#comment-16827987 ] Udit Arora commented on STATISTICS-7: - Sir Gilles and Sir Eric I made a pull request. I don't know if I did it right of if I missed anything. Also its a minor change which I thought might be helpful. Please let me know if its fine. I will continue contributing since this is kinda fun... :) Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827911#comment-16827911 ] Udit Arora commented on STATISTICS-7: - Sir, I saw some way to make a code better in commons-statistics repository. But I can't figure out how to make show the changes i wanna make. I have cloned the repository after that I am not sure what to do.? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816411#comment-16816411 ] Udit Arora commented on STATISTICS-7: - Sure Sir. I intend to stay around. Currently I am closing my end semester exams and lab exams. So I might be a little less active. But I will try to be active here for any updates. Also I will look at the commons statistics repository, see what I can do.. Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815553#comment-16815553 ] Eric Barnhill commented on STATISTICS-7: Welcome aboard [~Udit Arora] . Fork the commons-statistics repository and start making contributions. You submit the contributions with a pull request. I will review the pull request and interact with you until it is satisfactory. Then, probably, Gilles will review my review :) . Even if we can't get you a GSoc slot this year, if you start contributing you will be in a great position for one next year. Because the main thing we want to know is, are these applicants going to stick around, contribute, become part of the project. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815446#comment-16815446 ] Udit Arora commented on STATISTICS-7: - Sir Now since I have to wait.. what could I do towards this project? Anything that I should explore or learn..? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812516#comment-16812516 ] Udit Arora commented on STATISTICS-7: - Sir, I have written my final comment. Just see to it, so that I can submit my final proposal tomorrow, well in time. Thanks a lot > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812440#comment-16812440 ] Gilles commented on STATISTICS-7: - {quote}Please look at my replies. {quote} I answered on "Google Docs". As noted over there, the right place to discuss changes, provide suggestions and ask for clarifications is the "dev" ML. We look forward to reading from all of you, independently of what happens in the next step of GSoC 2019. :) > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812358#comment-16812358 ] Udit Arora commented on STATISTICS-7: - Sir Gilles. Please look at my replies. As always I am open to feedback. Thanks. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812326#comment-16812326 ] Gilles commented on STATISTICS-7: - {quote}making a formal proposal on GSOC19 portal {quote} Do you know that the deadline for submitting applications is tomorrow? That said, nothing prevents you from making concrete suggestions about the refactoring, but we won't have much time to exchange about it. In any case, you are welcome to contribute, GSoC or not. If you are interested, please [subscribe to the "dev" mailing list|http://commons.apache.org/mail-lists.html]. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812011#comment-16812011 ] Mukul chand yadav commented on STATISTICS-7: [~ericbarnhill] [~virendrasinghrp] Myself Mukul pursuing master's in computer science majoring in ML, would love to contribute to overhaul of {{org.apache.commons.math4}} package using Java 8's functional APIs where I can leverage my experience of developing stream based APIs. Based on discussion here, please let me know if I need to consider any other details before making a formal proposal on GSOC19 portal. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811962#comment-16811962 ] Virendra Singh commented on STATISTICS-7: - Yes, [~ericbarnhill] I received your feedback & I am studying the topics you mentioned in the mail. I've already forked the repositories from GitHub and once I finish studying and designing the flow, I'll start contributing. As of now, I've submitted the draft as my final proposal. Is that okay? If you recommend any change, I'll do that. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > As functional programming grows increasingly central to big data applications > we believe these libraries will play an important function in the data > engineering ecosystem. In particular, data engineering is widely done with > Java, then passed to other languages for data-scientific analyses; however, > the common availability of functionally implemented statistical mapping and > reductions in Java could prove very useful at the interface of data science > and engineering, by enabling teams to more easily perform reductions on the > engineering side before handing off to the analysis side. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811694#comment-16811694 ] Eric Barnhill commented on STATISTICS-7: Hi [~virendrasinghrp] I want to make sure you received my feedback by email because you didn't respond, and the deadline is almost here. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811219#comment-16811219 ] Udit Arora commented on STATISTICS-7: - Sir Gilles I have made some changes, and am open to all feedback you give. Please let me know what to improve. Thanks. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810293#comment-16810293 ] Udit Arora commented on STATISTICS-7: - Sir I have submitted my draft. I have shared the link. Please let me know the feedback. Thanks a lot. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807991#comment-16807991 ] Eric Barnhill commented on STATISTICS-7: Yes you are in the right place. And I realize the deadline is coming up and will look at the draft proposals soon. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807976#comment-16807976 ] Virendra Singh commented on STATISTICS-7: - Hey [~ericbarnhill] & [~erans], Even I submitted the draft proposal for _stat.descriptive.* ._ Is there any other ticket or Am I good here? Because, I can see the discussion is moved mainly towards _regression_. But, also , summary statistics is the basic & core of Statistics so I think I'm at right place. Also, I'm waiting for feedback on draft proposal, [~ericbarnhill] . Tell me if any changes to make. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807899#comment-16807899 ] Eric Barnhill commented on STATISTICS-7: [~Udit Arora] https://issues.apache.org/jira/browse/NUMBERS-98 > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807832#comment-16807832 ] Udit Arora commented on STATISTICS-7: - Sir I am interested in both. But since I have been following this discussion, I plan to stick to this. But now that I know I might send a proposal there as well. But currently this project is my priority. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807184#comment-16807184 ] Eric Barnhill commented on STATISTICS-7: Just send an email to that address [dev-subscr...@commons.apache.org |mailto:dev-subscr...@commons.apache.org] Jira seems to make it a link no matter what you do. [~Udit Arora], if your interest was to port and refactor commons-math-linear that would be awesome and I am sure your proposal would be welcomed. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807172#comment-16807172 ] Virendra Singh commented on STATISTICS-7: - Error shows while subscribing to dev-mail list: _The requested URL /proper/commons-statistics/dev-subscr...@commons.apache.org was not found on this server._ > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807126#comment-16807126 ] Eric Barnhill commented on STATISTICS-7: Udit, Discussion has moved to the apache commons developers mailing list. Eric > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807103#comment-16807103 ] Udit Arora commented on STATISTICS-7: - Sir Gilles I may not be the best but I have very deep knowledge of linear algebra or matrix algebra as you said. I have studied this topic in detail, and am familiar with all of the mathematics part. It actually is one of my favorite fields of mathematics. It has been one of our courses in 1st semester. From determinant to eigenvectors, SVD I have knowledge of mathematics part completely. There surely must be more than that, but I am familiar with this topic to a good level. If I am of any in this regard let me know. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806966#comment-16806966 ] Udit Arora commented on STATISTICS-7: - Sir Since this is my first time applying, I am a bit scared and worried, any advice sir? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806936#comment-16806936 ] Eric Barnhill commented on STATISTICS-7: I have replied to this thread on the dev mailing list; interested mentees should subscribe and continue this excellent discussion there. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806878#comment-16806878 ] Gilles commented on STATISTICS-7: - bq. current "math-linear" will be ported to "Commons Linear" in the future? Perhaps; we'd need expert advice on how to design a modern implementation of matrix algebra (?). In the meantime, it may be worth exploring the implications of having a very focused {{commons-numbers-matrix}} module in "Commons Numbers". bq. port necessary functionality into private packages Yes. But IMO it should be very limited (i.e. code that is not called should be stripped). bq. just use the current library temporarily for now I'd rather not, as it will perpetuate the impression that "Commons Math" is still supported. A new major version of CM should be released (with "legacy" codes) that will depend on "Commons Statistics". bq. "math-exceptions" No. I now consider that specific exceptions generated by low-level components should not be public. See how it's done in "Commons Numbers". bq. "math-util" Anything in there that is still useful is a candidate for "Commons Numbers". Did you have a look at what's there already? By the way, this discussion should be moved to the "dev" ML. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806845#comment-16806845 ] Ben Nguyen commented on STATISTICS-7: - Ah, that clarified it thanks! So I am guessing the current "math-linear" will be ported to "Commons Linear" in the future? But for now we may port necessary functionality into private-packages as mentioned (or perhaps just use the current library temporarily for now), then later convert to the new library? Does the same go for "math-exceptions" and "math-util"? Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806695#comment-16806695 ] Gilles commented on STATISTICS-7: - Both "Commons Numbers" and "Commons Statistics" are new components (not released yet). There must not be any overlapping code; "Commons Statistics" can (and, very likely, will) depend on "Commons Numbers". Modules of "Commons Numbers" collect lowest-level tools (i.e. no dependency allowed). bq. assumption correct? Not sure what you mean. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806296#comment-16806296 ] Gilles commented on STATISTICS-7: - [~Salman], In your proposal, you mention * T-distribution * MathArrays * Precision Some have been moved to ["Commons Numbers"|http://commons.apache.org/proper/commons-numbers/] or ["Commons Statistics"|http://commons.apache.org/proper/commons-statistics/modules.html]. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806252#comment-16806252 ] Eric Barnhill commented on STATISTICS-7: I can confirm I see two draft proposals in. Will give them a look this week. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806156#comment-16806156 ] Gilles commented on STATISTICS-7: - I don't see any link. {quote}Where else can I share it? {quote} Through "Google Docs", but the "portal" should be fine too, provided we know where to look. ;) > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806148#comment-16806148 ] Virendra Singh commented on STATISTICS-7: - On GSoC portal. Following is written there: h2. Draft Shared "The Apache Software Foundation may respond with comments to help you improve your draft proposal, if there is enough time before the deadline. You may also contact the organization to request feedback." Where else can I share it? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806039#comment-16806039 ] Gilles commented on STATISTICS-7: - bq. I am thinking about some scenarios of approaches Best place to start discussing design issues is the "dev" ML. As was already mentioned, when there are dependencies, the focus should first be on how to best port those. By this, I mean that issues already identified (cf. "Commons Math" bug-tracking system) should be fixed as part of the porting work. If the way to go is not obvious, it might be wiser to start with utilities that are easier to port. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806037#comment-16806037 ] Gilles commented on STATISTICS-7: - bq. I've shared my draft for the project. Where is it? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806032#comment-16806032 ] Virendra Singh commented on STATISTICS-7: - Hey [~ericbarnhill],[~erans], I've shared my draft for the project. I've prepared the draft based on my limited understanding. Please give feedback on it, and mention what needs to be added or changed. Thank You :) > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805888#comment-16805888 ] Ben Nguyen commented on STATISTICS-7: - Hello, I am thinking about some scenarios of approaches and the main one is an approach that would have near zero dependencies (extremely lightweight) with all necessary functionality from dependencies 're-implemented' (such as linear) to specific use as Mr. [~erans] mentioned earlier. This would simplify usage but increase overall workload with repeating code (a future problem to come if everyone does this) to my understanding, this is the debate, but is there some kind of consensus regarding an approach style (extent of 'lightweight') for the entire new commons porting which we should be consistent with? Or are we truly free to have something up and running as effective as possible (using Java 8 features, etc) asap like Mr. [~ericbarnhill] mentioned? I'm new to apache (and open-source) and would like to learn more about how things should operate :D Thank you > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805886#comment-16805886 ] Ben Nguyen commented on STATISTICS-7: - Hello, I am thinking about some scenarios of approaches and the main one is an approach that would have near zero dependencies (extremely lightweight) with all necessary functionality from dependencies 're-implemented' to specific use as > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805859#comment-16805859 ] Gilles commented on STATISTICS-7: - Yes. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805751#comment-16805751 ] Udit Arora commented on STATISTICS-7: - Sir http://community.apache.org/gsoc.html#students-read-this Sir should I follow the application template in this link for drafting my proposal? Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805337#comment-16805337 ] Eric Barnhill commented on STATISTICS-7: Hi Udit, if you are new to machine learning this is not the place to start. But we could use help porting the stat.descriptive and stat.inference libraries, these are all quite relevant for machine learning IMO. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805264#comment-16805264 ] Udit Arora commented on STATISTICS-7: - Sir Eric Barnhilll and Sir Gilles Do we intend to include functions that help aid calculations in machine learning. We can add somethings such that given a data set we could give the number of parameters that would give a classifier decent enough so that loss is not minimized but still is a significantly low number? Since I am new to machine learning I am not aware if there already exists a function as that. Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805157#comment-16805157 ] Eric Barnhill commented on STATISTICS-7: [~BenN] [~Salman] Interest has flowed organically in the direction of regression, and that's great. [~erans] is right that this unavoidably brings up the linear library, what to do with it, and how. We are a do-ocracy here at commons-numbers and the proposed solution does not need to be ideal, the priority is to get the component up and running. I have only one suggestion for what *not* to do and this is what was done before. This is to implement basic linear operations under many layers of object-oriented abstraction with the goal of assembling some sort of omnibus OO math library. The focus in commons is lightweight reusable components that are widely used and easy to use in real life Java programming. The user should *not* have to digest a large mathematically focused API to get Pearson's r from two vectors or solve Ax=b. The production developers in my shop should feel as comfortable grabbing commons-statistics-regression for a task as they do commons-csv . If you want to just accept array or List input and adapt the current linear functionality to process those inputs, I think that is a good solution. If you would rather create some sort of re-usable Matrix component and stick that into commons stats, to make your code more readable, that's fine too. We could also start up a small commons-numbers-linear project if someone was excited to do that, but that is definitely not necessary. Hopefully we can find a way all interested mentees can take roles that complement each other in ways that interest them. Once we have a sense for that, and the proposals are approved I will write up and assign the necessary tickets. And of course we hope that after the summer you will continue working with us. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805011#comment-16805011 ] Gilles commented on STATISTICS-7: - The question of matrix utilities is indeed fundamental and makes porting part of the stat tools not an abvious task. The "linear" package itself is in need of refactoring (see list of open issues on the [Commons Math JIRA page|https://issues.apache.org/jira/projects/MATH/issues/MATH]. The new "STATISTICS" component is logically intended to be a dependency for the next major release of (legacy) "Commons Math" (i.e. v.4.0) who will contain whatever codes have not been moved to more focused components (i.e. "Commons Numbers", "Commons RNG", "Commons Geometry" and "Commons Statistics"). Hence making the last/old official release of CM a dependency seems *not* the right way to go, indeed. ;-) This can be solved "temporarily", even though you are right that this is bad in general, by copying (into _private_ or _package-private_ classes) the necessary functionality. Better still would be to consider a refactoring (bringing in only required functionality) of the linear algebra utilities specifically geared to its usage in the STATISTICS component. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804625#comment-16804625 ] Ben Nguyen commented on STATISTICS-7: - Hello, I am a second year computer science and economics student who has taken some of higher level econometrics courses, I'm very interested in working with the stat.regression library (I can see myself using it a lot in the near future). Looking through it briefly, I do see the problem; the implementation could be more intuitive to use. I am in the process of drafting a proposal. which lead me to wonder how to approach the dependency issue; to what extent should the new library be 'standalone' ; for example: should it still depend on math.linear (especially needed in OLS and GLS) since linear won't be getting an upgrade anytime soon? (I have not looked into math.linear yet) or should matrices be implemented internally with the new stats library? (repeating code in general is bad, though there is the benefit of 'zero' dependencies -> better maintainability?). I guess the same question lies with other dependencies too (math.exceptions, util) 'to what extent should this new library be standalone'? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803935#comment-16803935 ] Salman Hussain commented on STATISTICS-7: - Sorry about that [~erans], I have changed it now. Having looked more closely at the linear regression used in scikit-learn, there seem to be a lot of dependencies, including from scipy. I think focusing on porting the existing regression library would be more realistic, but whilst taking inspiration and design elements from the scikit-learn. I aim to submit a draft proposal by EOD through the GSoC portal. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803493#comment-16803493 ] Gilles commented on STATISTICS-7: - Hi [~Salman]. Could you please remove the restriction as to who can see your comments? When one is not logged in, part of the conversation is missing and it looks quite odd... Thanks. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803477#comment-16803477 ] Eric Barnhill commented on STATISTICS-7: Sounds very ambitious – just make sure you start with linear and logistic regression before you head to anything fancier. :) Obviously work can continue after summer's end if there is interest. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803070#comment-16803070 ] Eric Barnhill commented on STATISTICS-7: [~Salman] First of all I'll be delighted to read over the proposal, this is happening right now in some other projects. Seems to me time frame depends entirely on the skill set of the applicant. But I would set aside some time for each of the following in the proposal: * Project design – choose which stat functions you are interested in developing * Dependency analysis – what other classes do the stat functions currently depend on in commons-math and what is the purpose of those functions? What changes need to be made in order to create a standalone statistical component? For example, is the method depending on abstract classes, exception classes, formatting classes? Are these dependencies necessary or can they be restructured to be more stand alone? Whatever decision creates the best architecture is best of course * New component architecture - what will be the class hiearchy of the redesigned component? * Class design - do I want the user to create an object instance, or call a static method, to use this functionality? Do I want some enums to handle parameters of various kinds? What kinds of inputs should it take? How do I handle different data types? What are the outputs? * Unit test design - creating a property-driven unit testing scheme * Algorithm Validation – is there code in other languages that I can use to validate test inputs? * Documentation - what doc will go with each class and method? What addition doc should be in the user guides? Ideally the project could conclude with a couple of tutorials implementing working examples > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802218#comment-16802218 ] Eric Barnhill commented on STATISTICS-7: Hi [~Salman], sounds like you would be in a great position to contribute the streaming code that is needed. Just to be clear everyone, there is plenty of work here and [~erans] and I can (I assume) split this ticket so that we can assign a project of interest to whomever is interested. Right off the top of my head, I can think of a few different work packages here: There are the summary statistics in stat.descriptive.*, which are already quite a lot, however the work is not restricted to this in any way. Some of these stats are pretty obscure for a "common" library, for example who uses FourthMoment?; meanwhile many metrics found in for example [https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics] are very commonly used in 2019 and their implementation with a clear user interface in Java is long overdue. If you write a proposal for such a project, you will be very free to take the initiative and contribute what you want here. There is also a whole regression library, and porting that might be of particular interest to anyone interested in machine learning. It is also a suboptimally designed library IMO, I think the "SimpleRegression" class is evidence of bad design. Regression is such a huge percentage of actual ML in the wild, knowing it very technically, and being able to code it up, is I think a real asset that would put you ahead of 99% of aspiring data scientists. Finally there is a whole library of statistical tests, as well as correlation and covariance lbraries, this would be another work package, for someone perhaps more interested in statistics getting into applied math, and coding those algorithms. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801900#comment-16801900 ] Udit Arora commented on STATISTICS-7: - Sure sir. I will continue to dig deeper, since you can never know enough of any particular thing.. :) > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801890#comment-16801890 ] Eric Barnhill commented on STATISTICS-7: [~Udit Arora] Sounds like you are in great shape to me. For anything you don't know, we are here to mentor you of course! And we will welcome your efforts. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801879#comment-16801879 ] Udit Arora commented on STATISTICS-7: - Sir Eric Barnhill and Sir Gilles I am not a data scientist sir, but I am pursuing Computer Science Engineering and I am currently in my second semester. Even though I have decent good knowledge in Statistics as it has been taught to us and I am also looking up somethings on the Internet and have decent knowledge of Java and Python as well, what things other than this should I know so as to be capable to work on this project? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801871#comment-16801871 ] Eric Barnhill commented on STATISTICS-7: Hi [~virendrasinghrp] I work as a data scientist for a Silicon Valley company so I know data science is a big thing and I know what languages are used. Java makes up a large part of commercial infrastructure and Apache commons in particular is used everywhere. Even in a small project I am working on right now the devs use commons-cli, commons-csv, commons-lang . The applications of commons-statistics will just be innumerable. If you want something data science specific you probably know that nearly all data engineering infrastructure is in Java. The ability to run some statistical mappings on the side of that infrastructure would I think be very valuable. If the mappings were implemented functionally, it would scale effortlessly. A lot of job offers for data scientist are really mostly engineering, so knowing that side of the business is really good. But I am not going to spend a lot of time defending the project I think it obviously has a very wide audience. I also disagree that it is easy. There is a lot that goes into development, testing, documentation and release of widely used software. Also like Gilles said there are a lot of architectural decisions to be made about how the stats libraries are going to hang together. It is decisions at this level, that separate good engineers from bad. We do not want a bunch of independent scripts, I could write those very quickly. If we make good progress there are lots of ways to extend the project into ML tools like logistic regression (which is probably what people use 90%+ of the time, from what I hear). > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801653#comment-16801653 ] Gilles commented on STATISTICS-7: - Some of the "Commons Math" open issues stem from design bugs (e.g. MATH-1281). The new component allows to start from a clean slate, without the backward-compatibility constraint, hopefully with design lessons learnt from other libraries, and from mistakes made in "Commons Math" whose fix have been delayed indefinitely. Indeed, the design of the {{stat}} package dates from the inception of the component, back in 2003: In the [initial proposal|http://commons.apache.org/proper/commons-math/proposal.html], half of the source description pertains to statistics and from those, the random utilities have gone into their [own component|http://commons.apache.org/proper/commons-rng/] which I consider as a step in the right direction, i.e. away from a huge monolithic library that proved to be an unsustainable project, mostly because requested stability of some packages prevented a sane evolution of others (solely because they were part of the same component!). That said, an advantage to having "Commons Math" is that a lot of the "core" codes and unit tests can be leveraged for making the port work relatively fast and robust, once a new design has been put forward. And if there are competing proposals, they can be developed in parallel, for some time, until one seems to gather more interest. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801610#comment-16801610 ] Virendra Singh commented on STATISTICS-7: - Hi [~erans], based on what you said, I understood that the main purpose of this package is to develop statistical functions using Java 8 features like Streams,Lambda and Functions Interfaces in Functional Programming style. You mentioned, *"...order to fix design issues..."*,Are we talking about using Functional programming to fix design issues as it has features like pure functions,referential transparency etc. If not, please correct me. We are discussing in comments section only.Is this the only way or there is any other platform to discuss project? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801603#comment-16801603 ] Gilles commented on STATISTICS-7: - Hi. Contributors are most wanted. The idea behind the new "Commons Statistics" component is to offer the same statistical tools as exist under the {{org.apache.commons.math4.stat}} package of ["Commons Math"|http://commons.apache.org/proper/commons-math/apidocs/index.html] (see also STATISTICS-5) but with the opportunity for a complete overhaul in order to fix design issues that have existed for years (see e.g. [this list of JIRA reports|https://issues.apache.org/jira/issues/?jql=project%20%3D%20MATH%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22stat%22]), as well as for a modern API (Java 8+). There isn't yet any specific plan or priorities; anything can be proposed to start a discussion, and people with an interest in using these functionalities (i.e. having real-life problems to solve) are especially welcome to drive the discussions. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801500#comment-16801500 ] Virendra Singh commented on STATISTICS-7: - I am asking this because I am inclined towards Data Science, and the statistics package would be great to work on in Java. Also, I'am starting with GSoC(as it would definitely boost up my confidence) but thats not the limit, I want to work on it even after that,that's why I am asking. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801496#comment-16801496 ] Virendra Singh commented on STATISTICS-7: - [~ericbarnhill], what are the future plans for this statistics package? because summary statistics is a very basic thing. I assume that you are planning to make a package for applied advanced statistics which can be further used in Data Science and Machine Learning . As, currently Data Science is the new big thing and most of its work is done in Python & R. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801361#comment-16801361 ] Udit Arora commented on STATISTICS-7: - Khaled Emara Sir, I am very much interested in this project. As you said languages like R and Python may be needed, I was trying to implement some statistical data using python which involved multivariate regression, so as to be more comfortable with the language. Sorry for not being active. But sir I want to be fully prepared for this project so I started doing somethings on python. Also some assignments from college also took my time. But sir, I am familiar with all the terms Virendra Singh mentioned in his last comment. Sir now that you drew my attention, I will try to be more active. Thanks > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801298#comment-16801298 ] Eric Barnhill commented on STATISTICS-7: Yes. Perfect. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801253#comment-16801253 ] Virendra Singh commented on STATISTICS-7: - [~ericbarnhill] as you mentioned,"This ticket addresses more the commons-stats functions, in particular summary statistics that are widely used". I know that summary statistics include basic and widely used mean,median,mode,standard deviation,variance,skewness,kurtosis. For this project we need to work on them. Am I right? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801241#comment-16801241 ] Virendra Singh commented on STATISTICS-7: - We need to port classes from "commons.math4.stat" which are *not* coded using Functional programming concept(I just saw the code). We need to port those classes and code using Functional programming concept which is introduced in Java 8.(As, functional programming is used for mathematical computations and when concurrency and parallelism are required) Am I correct? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801239#comment-16801239 ] Eric Barnhill commented on STATISTICS-7: Oh you are right, those have been ported, but not much developed. I see Gilles has one Gsoc ticket regarding the distributions and he can speak to what work needs to be done there. This ticket addresses more the commons-stats functions, in particular summary statistics that are widely used. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801233#comment-16801233 ] Virendra Singh commented on STATISTICS-7: - No [~ericbarnhill] not from commons.math4.distribution. The distribution classes are on [org.apache.commons.statistics.distribution|https://commons.apache.org/proper/commons-statistics/xref/org/apache/commons/statistics/distribution/package-frame.html] Here is the link: https://commons.apache.org/proper/commons-statistics/xref/index.html > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801221#comment-16801221 ] Eric Barnhill commented on STATISTICS-7: Hello [~virendrasinghrp], my idea was to port the classes from commons.math4.stat but we would welcome someone to also port the commons.math4.distribution classes as well. Is that what you meant, when you said you saw them already programmed? Some of your English was not clear (by the way I am happy to help you with English too.) and I am not sure I understand your question. We are looking for up to date implementations of these Java statistical libraries but also the deceloper will need to make some architectural decisions, so really I think it is a rich project. I had added some more information to the ticket so that everyone can see it. Does this information address your question? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > The ideal contributor will also be able to help with important architectural > decision making. The old source of these libraries, commons-math, grew too > large, hierarchically complex and interdependent for the commons mission. The > developers on this project need to make architectural choices that will > enable the statiscal code to be lightweight and reusable, with a minimum of > outside dependencies while avoiding redundancy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801176#comment-16801176 ] VIRENDRA SINGH commented on STATISTICS-7: - Hi [~erans], I am Virendra Singh and I am interested in this project.To be clear I want to work on this project under GSoC 2019.I am currently studying Business analytics and Data science. From the description I understood that we are using Functional Interfaces from Java 8 for developing this statistics library.I also saw some statistical functions already programmed like chi-square distribution,Gamma Distribution,Normal Distribution etc. My question is ,for this project do we have to code more statistical function or anything else?Also, if I've understood anything wrong please correct me. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801010#comment-16801010 ] Eric Barnhill commented on STATISTICS-7: You need to submit a proposal, that proposal should make it clear that you have the requisite abilities. It sounds like you have plenty of Java background, what we are looking for is someone who has familiarity with the functional programming side of Java to produce an up-to-date Java statistics library. Or who wants to learn this side of Java through this project. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800948#comment-16800948 ] Khaled Emara commented on STATISTICS-7: --- Hi [~Udit Arora], are you still interested in this project? I haven't seen an activity for a while. Are there any competency tests, or some issues easy for newcomers to be solved as a proof of ability, [~erans]? I kind of am new to Open Source development theme, but I have worked on a reasonably large Android projects using Kotlin and Java previously. I have also worked on a window manager using Xlib before and did some Game Development. Do you think I would be a good fir for this project? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794279#comment-16794279 ] Gilles commented on STATISTICS-7: - For the library, no; but contributors sometimes used other languages (e.g. "R", "Python") to generate "reference" values to compare against in unit tests. > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-7) Stream-based Java statistical processing
[ https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794275#comment-16794275 ] Udit Arora commented on STATISTICS-7: - Thanks for the reply. Would any language other than Java be needed so that I can prepare myself for it? > Stream-based Java statistical processing > > > Key: STATISTICS-7 > URL: https://issues.apache.org/jira/browse/STATISTICS-7 > Project: Apache Commons Statistics > Issue Type: New Feature >Reporter: Eric Barnhill >Priority: Major > Labels: GSoC2019, gsoc2019, statistics, streams > > The new component aims to be a library of commons statistics functions > synchronized with the latest developments in the Java language, in particular > Java's functional programming syntax. > The library will make commonly used statistical functions available to an end > user through a simple grammar comparable to commons-math-statistics or > scikit-learn, while under the hood will implement Java's mapping, streaming, > and other producer and consumer functions to ensure the statistical methods > run optimally in new Java implementations. > Developers working on the project will have the opportunity to demonstrate > Java programming, functional programming, algorithm design, and data science > skills and receive authorship on a commons project that is likely to be > widely used. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)