RE: [julia-users] Is there a way to use values in a DataFrame directly in computation?
Query.jl does not aim to make working with Nullables easier. The package provides querying capabilities, and is specifically designed to simply pick up whatever support for Nullables there is in julia base. Right now, as a temporary measure, Query.jl defines lots of methods for functions like arithmetic operators (`+` etc.) for Nullables. Without those definitions the package would be close to unusable (in the same way that DataFrames right now is close to unusable).

But I really hope to move these methods out of Query.jl, they don’t belong in that package, instead those methods should be in base (the approach here is what David Gold called the “method extension lifting approach”). I feel strongly that “pushing” the problem of how to deal with Nullables into querying packages is not the right strategy. Instead I would much prefer to see better support for Nullables generally, and then packages like Query.jl can pick that support up.

There are too many situations where using a query package is overkill, but where you will still encounter Nullables (especially now that DataFrames is based on NullableArrays). If we ask folks to use query packages in all of these cases we will have created a conceptually clean but completely impractical system, IMHO. For example, I think the examples from the original email simply need to work before the new DataFrames is tagged. Those kinds of operations are sooo common, I would find it completely impractical to ask folks to use something like Query.jl in such a situation.

I think the path to this is pretty simple: all we have to do is add methods for the common arithmetic operators that work on Nullable types. That is the approach that C# took, and it works really well. Maybe add some methods for Strings. I think once that is covered, most of the common use cases are dealt with and the system would work well in practice. Those new methods could be added to the julia master branch now, and then be backported to julia 0.5.
Once that is done DataFrames could be merged.

Best,
David

From: julia-users@googlegroups.com [mailto:julia-users@googlegroups.com] On Behalf Of John Myles White
Sent: Monday, October 3, 2016 5:05 PM
To: julia-users
Subject: Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?

I think the core problem is that the current API + Nullables is very cumbersome, but the switch to Nullables will hopefully occur nearly simultaneously with the introduction of new APIs that can make Nullables much easier to deal with. David Gold spent the summer working on one approach that is, I think, much better than the current API; David Anthoff also has another approach that is substantially more powerful than the current API. The time between 0.5 and 0.6 may be a little chaotic in this regard, but I think the eventual results will be unequivocally worth the wait.

-- John
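The approach David describes at the top of this message, adding methods for the common arithmetic operators that work on Nullable types, might look roughly like the sketch below. It assumes Julia 0.5, where Nullable is part of Base. The name `lifted_add` is hypothetical and is used only so that the example does not redefine `+` for types it does not own; the actual proposal would add such methods to `+` itself in Base.

```julia
# Sketch only: assumes Julia 0.5, where Nullable is part of Base.
# `lifted_add` is a hypothetical name, not a function from Base or any package.

# Both operands Nullable: the result is null if either input is null.
function lifted_add{T<:Number}(x::Nullable{T}, y::Nullable{T})
    if isnull(x) || isnull(y)
        return Nullable{T}()
    end
    return Nullable(get(x) + get(y))
end

# Mixed Nullable/scalar: wrap the scalar and reuse the method above.
lifted_add{T<:Number}(x::Nullable{T}, y::T) = lifted_add(x, Nullable(y))
lifted_add{T<:Number}(x::T, y::Nullable{T}) = lifted_add(Nullable(x), y)

get(lifted_add(Nullable(3), 1))          # 4
isnull(lifted_add(Nullable{Int}(), 1))   # true
```

The mixed-argument methods are what would let an expression such as df[3,:A] + 1 from the original question work without wrapping the 1 by hand. Whether mixing Nullables and scalars should be allowed at all is one of the open questions raised elsewhere in this thread.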
Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?
This is good news, and I am holding my breath for this to be successful! As someone from a data-rich science (Ecology), a really good way of interacting directly with data is the make-or-break for whether I will be able to persuade my colleagues to make the shift to julia.
Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?
I think the core problem is that the current API + Nullables is very cumbersome, but the switch to Nullables will hopefully occur nearly simultaneously with the introduction of new APIs that can make Nullables much easier to deal with. David Gold spent the summer working on one approach that is, I think, much better than the current API; David Anthoff also has another approach that is substantially more powerful than the current API. The time between 0.5 and 0.6 may be a little chaotic in this regard, but I think the eventual results will be unequivocally worth the wait.

-- John

On Monday, October 3, 2016 at 3:45:42 PM UTC-7, Min-Woong Sohn wrote:
> Thank you. I fear that Nullables will make the DataFrame very difficult to
> use and turn many people away from Julia.
Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?
Thank you. I fear that Nullables will make the DataFrame very difficult to use and turn many people away from Julia.

On Monday, October 3, 2016 at 12:20:32 PM UTC-4, Milan Bouchet-Valat wrote:
> These operations currently work (after loading NullableArrays) if you
> rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first
> two return a Nullable{Bool}, so you need to call get() on the result
> if you want to use them, e.g. with an if. As an alternative, you can use
> isequal().
>
> There are discussions about whether mixing Nullable and scalars
> should be allowed, as well as whether these operations should be moved
> into Julia Base. See in particular
> https://github.com/JuliaStats/NullableArrays.jl/pull/85
> https://github.com/JuliaLang/julia/pull/16988
>
> Anyway, the best approach to work with data frames is probably to use
> frameworks like AbstractQuery.jl and Query.jl, which are not yet
> completely ready to handle Nullable, but should make this easier.
>
> Regards
Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?
On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:
> I am using DataFrames from master branch (with NullableArrays as the default)
> and was wondering how the following should be done:
>
> df = DataFrame()
> df[:A] = NullableArray([1,2,3])
>
> The following are not allowed or return wrong values:
>
> df[1,:A] == 1 # false
> df[1,:A] > 1 # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
> df[3,:A] + 1 # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
>
> How should I get around these issues? Does anybody know if there is a
> plan to support these kinds of computations directly?

These operations currently work (after loading NullableArrays) if you rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first two return a Nullable{Bool}, so you need to call get() on the result if you want to use them, e.g. with an if. As an alternative, you can use isequal().

There are discussions about whether mixing Nullable and scalars should be allowed, as well as whether these operations should be moved into Julia Base. See in particular
https://github.com/JuliaStats/NullableArrays.jl/pull/85
https://github.com/JuliaLang/julia/pull/16988

Anyway, the best approach to work with data frames is probably to use frameworks like AbstractQuery.jl and Query.jl, which are not yet completely ready to handle Nullable, but should make this easier.

Regards
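The workaround described in this reply can be sketched as follows. This is a minimal sketch assuming Julia 0.5 with the NullableArrays package of that period loaded. It indexes a NullableArray directly rather than going through a DataFrame, on the assumption that df[i, :A] returns the same kind of Nullable element. The behaviour of ==, > and + on two Nullables is taken from the reply itself rather than verified independently, so no results are shown for those lines.

```julia
# Sketch only: assumes Julia 0.5 and the NullableArrays package from that era.
using NullableArrays

A = NullableArray([1, 2, 3])
x = A[1]                      # a Nullable{Int64} wrapping 1

# Wrap the scalar so both operands are Nullable; per the reply, the
# comparison then returns a Nullable{Bool} rather than a plain Bool.
eq = x == Nullable(1)
gt = x > Nullable(1)

# Unwrap with get() before using the result in a condition.
if get(eq)
    println("first element equals 1")
end

# isequal() returns a plain Bool, so no unwrapping is needed.
isequal(x, Nullable(1))       # true

# Arithmetic: wrap the scalar, then unwrap the sum when a plain Int is needed.
total = get(A[3] + Nullable(1))
```

Note that get() throws a NullException when the value is null, so code that may see missing entries would check isnull() first or use the two-argument form get(x, default).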
[julia-users] Is there a way to use values in a DataFrame directly in computation?
I am using DataFrames from master branch (with NullableArrays as the default) and was wondering how the following should be done:

    df = DataFrame()
    df[:A] = NullableArray([1,2,3])

The following are not allowed or return wrong values:

    df[1,:A] == 1 # false
    df[1,:A] > 1 # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
    df[3,:A] + 1 # MethodError: no method matching +(::Nullable{Int64}, ::Int64)

How should I get around these issues? Does anybody know if there is a plan to support these kinds of computations directly?