Re: [Rdkit-discuss] chemfp preprint

2019-03-26 Thread Greg Landrum
On Mon, Mar 25, 2019 at 11:36 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> > Sometimes, I wish there was a rdkit consortium/NPO (so that donations
> are tax deductible), so that rdkit could be massively funded by all its
> commercial users, and even accepting individual donations.
>
> I don't want to hijack the thread, so please feel free to take this
> off-list with anyone interested.
>

It's been pretty thoroughly hijacked. :-)

It's an important topic, so I'm going to start a new thread for this.


> I think it's an interesting idea in general in open chemistry. We have set
> up an Open Chemistry collective - this receives $$ from Google Summer of
> Code. The "host" is the Open Source Collective, a 501c6 non-profit in the
> United States (
> https://docs.opencollective.com/help/hosts/open-source-collective)
>
> The collective isn't perfect, it skims 5% for transaction fees and
> overhead, but it's:
> - completely transparent for donations
> - completely transparent for expenses
> - allows both one-time and recurring donations
>
> Greg can correct me - I think we handled the $$ to RDKit from Google
> Summer of Code 2018 before we set this up, but it's certainly there to use.
> You can create your own RDKit collective pretty easily too:
> https://opencollective.com/open-chemistry


Yeah, the travel grants for the two RDKit GSoC students were handled via a
different mechanism.
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] chemfp preprint

2019-03-26 Thread Andrew Dalke
On Mar 25, 2019, at 04:05, Francois Berenger wrote:
> Sometimes, I wish there was a rdkit consortium/NPO (so that donations are tax 
> deductible), so that rdkit could be massively funded by all its commercial 
> users, and even accepting individual donations.

Setting up such an organization is not difficult. It does take time, money, and 
effort, which add overhead to the funding process. It also requires people who 
are willing to do that sort of work. I was on the board of the Open 
Bioinformatics Foundation, and involved with the Python Software Foundation. I 
know that I am *not* that sort of person.

In any case, as Geoff Hutchison points out, there are umbrella organizations 
like the Open Source Collective which can handle most of that overhead.

My question is, why would users - commercial or otherwise - be willing to fund 
such work in the first place?

As far as I can tell, nearly everyone uses free and open source software 
because it is available at no cost. Users are rarely willing to pay for 
software freedom, or for economic benefits like avoiding vendor lock-in.

And it seems like commercial users are often more willing to use an internal 
fork of a project than to work with upstream to develop new features. This 
might be because it's easier to work with existing staff or existing 
contracting arrangements than to figure out how to get upstream to do the 
work, set up a new contractual relationship, and take the risk that the work 
isn't done on schedule.

My conjecture is that there are several issues at play.

1) Most end users don't realize there is a funding problem for many FOSS 
projects. Package managers like pip/PyPI, Conda, Homebrew and apt make it 
*really easy* to install a large number of packages without knowing anything 
about the funding or staffing status of each underlying project.

Consider that one of the early business models for PyMol was the idea that 
people would be willing to pay for pre-compiled packages from the main 
developer, even though the source code was available for free as open source. 
That business model somewhat worked then. It would not work now.

2) The proponents of "open source" in the late 1990s emphasized the volunteer 
nature of open source, going so far as to argue that there was a "gift 
culture" (using E. Raymond's term). The implication is that there was a sort of 
social contract, where donations of source code would be met with other sorts 
of payment, including job/consulting offers and non-trivial amounts of 
reciprocal code contributions.

This has not turned out to be true, with rare exceptions. Instead, I think the 
association with volunteerism and gifts has caused people to avoid talking 
about fund raising. This is particularly odd, as many volunteer 
organizations outside of computing have funding drives.

3) FOSS developers who distribute at no cost are ignoring any capital value in 
the software. They can only earn income from gifts (which are rare) or through 
labor (e.g., consulting). This places them at a funding disadvantage compared 
to proprietary software vendors, who can amortize labor costs across multiple 
sales.

To be clear, I am only talking about self-funded FOSS projects. My paper 
mentions a few other funding models, like research grants at universities, or 
in-house projects funded by the ability to reduce costs. In the latter case, 
the minor additional costs for releasing the project as FOSS can be justified 
by even small benefits.

4) The pricing of per-unit sales of FOSS software, either institutional sales 
like what I tried with chemfp, or end-user sales like PyMol, should factor in 
the likelihood that customers will redistribute the software further, and by 
doing so reduce the market size. This factor is hard to estimate, and is 
generally higher for universities than for pharmaceutical companies, which 
makes it harder to give universities the kind of significant discount that 
proprietary vendors can offer.

5) In my paper I bring up the "free rider problem" as a way to think about 
these issues. To be clear, this is only a *problem* if people expect anything 
back from releasing and/or maintaining an open source software project. (Or if 
they don't expect users to insist on support, as has happened to me with the 
no-cost/open source version of chemfp.)

Suppose I want to add a new feature to mmpdb, the matched molecular pair 
program which I helped develop and which has been contributed to the RDKit 
project. I might go around to various users and ask for development funding as 
a consortium. 20 organizations might be interested, and each one willing to pay 
50% of the development cost, which means in principle I could get 10x the cost 
of labor, which provides the extra profit that could go towards documentation, 
testing, and general support.
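
The arithmetic behind that estimate can be sketched in a few lines of Python. 
This is only an illustration: the function name and the normalized development 
cost of 1.0 are assumptions made for the example, not figures from the paper.

```python
def consortium_revenue(n_orgs, share_of_cost, dev_cost):
    """Total paid in if every organization contributes its share.

    Hypothetical helper for illustration only.
    """
    return n_orgs * share_of_cost * dev_cost


dev_cost = 1.0  # assumed: development (labor) cost normalized to 1
total = consortium_revenue(n_orgs=20, share_of_cost=0.5, dev_cost=dev_cost)
multiple = total / dev_cost  # revenue as a multiple of the labor cost

print(f"revenue = {multiple:.0f}x the cost of labor")
```

With 20 organizations each paying half the cost, the multiple is 10, which 
matches the figure above. It only holds if all 20 actually pay.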

However, it's also easy for each of those 20 organizations to think that 
someone else ("Let George do it", as the Stanford Encyclopedia of Philosophy 
article explains) is going t

Re: [Rdkit-discuss] chemfp preprint

2019-03-25 Thread Geoffrey Hutchison
> Sometimes, I wish there was a rdkit consortium/NPO (so that donations are tax 
> deductible), so that rdkit could be massively funded by all its commercial 
> users, and even accepting individual donations.


I don't want to hijack the thread, so please feel free to take this off-list 
with anyone interested.

I think it's an interesting idea in general in open chemistry. We have set up 
an Open Chemistry collective - this receives $$ from Google Summer of Code. The 
"host" is the Open Source Collective, a 501c6 non-profit in the United States 
(https://docs.opencollective.com/help/hosts/open-source-collective)

The collective isn't perfect, it skims 5% for transaction fees and overhead, 
but it's:
- completely transparent for donations
- completely transparent for expenses
- allows both one-time and recurring donations

Greg can correct me - I think we handled the $$ to RDKit from Google Summer of 
Code 2018 before we set this up, but it's certainly there to use. You can 
create your own RDKit collective pretty easily too:
https://opencollective.com/open-chemistry

One big benefit is that OpenCollective handles all the legal paperwork and 
accounting.

-Geoff

PS One regret is that I haven't had need of chemfp in house, or I would have 
pushed some $$ towards Andrew.



Re: [Rdkit-discuss] chemfp preprint

2019-03-24 Thread Francois Berenger

On 23/03/2019 04:39, Andrew Dalke wrote:
> Hi RDKit users,
>
>   This week I submitted a paper about chemfp for publication. I also
> submitted a preprint on ChemRxiv, which was just accepted.
>
> For those interested, it's at
> https://chemrxiv.org/articles/The_Chemfp_Project/7877846 .
>
> It's a rather long paper as it covers many aspects about the chemfp
> project, including the FPS and FPB formats, search algorithms, details
> about the different ways to compute a popcount, and memory bandwidth
> and latency bottlenecks. On a non-technical level I also describe some
> of the difficulties I ran into trying to run chemfp as "commercial
> free software."


The part about funding free software is quite interesting (I just 
skimmed through this part of the paper, sorry).


Sometimes, I wish there was a rdkit consortium/NPO (so that donations 
are tax deductible), so that rdkit could be massively funded by all its 
commercial users, and even accepting individual donations.


When you think about Linux, several developers are paid, either by the
Linux Foundation (I think) or by large companies using Linux, to work on
the Linux kernel full-time. I guess that gives them a lot of manpower to
push their open-source project forward and maintain it in the long run.


> Let me know of any corrections or improvements, or any other feedback
> you might have.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com






Re: [Rdkit-discuss] chemfp preprint

2019-03-22 Thread Markus Sitzmann
Yes, we all love ref 57.

-
|  Markus Sitzmann
|  markus.sitzm...@gmail.com

> On 22. Mar 2019, at 20:39, Andrew Dalke wrote:
> 
> Hi RDKit users,
> 
>  This week I submitted a paper about chemfp for publication. I also submitted 
> a preprint on ChemRxiv, which was just accepted.
> 
> For those interested, it's at 
> https://chemrxiv.org/articles/The_Chemfp_Project/7877846 .
> 
> It's a rather long paper as it covers many aspects about the chemfp project, 
> including the FPS and FPB formats, search algorithms, details about the 
> different ways to compute a popcount, and memory bandwidth and latency 
> bottlenecks. On a non-technical level I also describe some of the 
> difficulties I ran into trying to run chemfp as "commercial free software."
> 
> Let me know of any corrections or improvements, or any other feedback you 
> might have.
> 
> Cheers,
> 
> Andrew
> da...@dalkescientific.com
> 
> 
> 
> 


[Rdkit-discuss] chemfp preprint

2019-03-22 Thread Andrew Dalke
Hi RDKit users,

  This week I submitted a paper about chemfp for publication. I also submitted 
a preprint on ChemRxiv, which was just accepted.

For those interested, it's at 
https://chemrxiv.org/articles/The_Chemfp_Project/7877846 .

It's a rather long paper as it covers many aspects about the chemfp project, 
including the FPS and FPB formats, search algorithms, details about the 
different ways to compute a popcount, and memory bandwidth and latency 
bottlenecks. On a non-technical level I also describe some of the difficulties 
I ran into trying to run chemfp as "commercial free software."

Let me know of any corrections or improvements, or any other feedback you might 
have.

Cheers,

Andrew
da...@dalkescientific.com



