Re: R/arrow update

2018-03-21 Thread Wes McKinney
Cool. For JIRA, the only issue fields we really use are:

Project: This is always Apache Arrow
Issue Type: Select one (in this case it will mostly be New Feature or
Improvement)
Summary: Describe what the issue is. Preferably add "[R]" in the title
to help with inbox filters
Priority: Fine to leave as Major for all issues
Components: add "R" as a component
Fix version: The project version that this issue is planned to be
completed for. For example, the next major release is 0.10.0
Assignee: New issues can be left as Unassigned, or you can assign to yourself

The other fields can be left blank.
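
For illustration only, the field conventions listed above can be written down as a small Python sketch. The `IssueDraft` class, its attribute names, and the example summary are hypothetical; this is not the JIRA API or its schema, just the defaults from this message in code form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IssueDraft:
    """Hypothetical container mirroring the JIRA fields described above."""

    summary: str                              # Summary: what the issue is
    issue_type: str = "Improvement"           # mostly "New Feature" or "Improvement"
    project: str = "Apache Arrow"             # Project: always Apache Arrow
    priority: str = "Major"                   # fine to leave as Major
    components: List[str] = field(default_factory=lambda: ["R"])
    fix_version: Optional[str] = "0.10.0"     # next major release named in the thread
    assignee: Optional[str] = None            # None stands for Unassigned

    def __post_init__(self) -> None:
        # Prefix the summary with "[R]" so inbox filters can match it.
        if not self.summary.startswith("[R]"):
            self.summary = "[R] " + self.summary


# Made-up example summary, not an actual issue.
draft = IssueDraft("Add type factories such as int32()")
print(draft.summary)
```

Every other field stays at its default, which matches the advice that the remaining fields can be left blank.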



Re: R/arrow update

2018-03-20 Thread Romain François
That sounds good. I’ll make a pull request of what I have once I have something 
useful in the readme. 

Things like build are not dealt with at the moment so it might be that this 
only works on macOS or even (don’t think so) only on my 💻. 

As long as it’s clearly established that this is WIP and that it might change
entirely, for sure let’s merge patches to master.

JIRA is new to me; I usually work with GitHub issues, so I’ll probably need
some guidance.

Romain




Re: R/arrow update

2018-03-20 Thread Wes McKinney
hi Romain,

Cool! I would suggest that we proceed in one of two ways:

* Start merging R patches to master (what I would prefer)
* Merge patches into an r-devel branch while the R bindings initiative
is in early stages

I don't really see any benefits to hiding early-stage code in a
branch; the README for R should clearly indicate that the API is
experimental. I think it would be better for the code to start going
into the Arrow project (rather than staying in your personal branch)
for a few reasons:

* More opportunities for the community to participate
* More visible progress / transparency into what is going on
* You will earn karma in the Apache project and be on your way to
becoming a committer
* Opportunities for code review from other C++ developers on use of
the Arrow APIs, and opportunities for improvement
* Incremental IP / licensing oversight (this gets harder when the
patches get bigger)
* Help with roadmapping / enumerating work to be done

On that last note, I would recommend beginning to liberally create
JIRAs as you think of things that need to be done to build first-class
R support for Arrow. JIRA is the simplest way to develop the roadmap
organically; it doesn't need to be anything formal.

Thanks!
Wes

On Tue, Mar 20, 2018 at 12:04 PM, Romain Francois  wrote:
> Hello,
>
> Today is Tuesday, so that's the day I work on porting Arrow to R. This week,
> I've continued some of the work from last week, still following the steps of
> the Python front end as documented here:
> https://arrow.apache.org/docs/python/data.html#type-metadata
>
> Things are starting to materialize, and I'm trying to give it an R feel.
>
> > int32()
> DataType(int32)
> > float64()
> DataType(double)
> > struct( x = int32(), y = float64(), d1 = date32() )
> StructType(struct)
> > schema( x = int32(), y = float64(), d1 = date32() )
> x: int32
> y: double
> d1: date32[day]
>
>
> This is not that interesting, but it sets a nice premise for the future.
>
> Quick ones:
> - Are there examples of uses of pyarrow.union?
> - How does pyarrow.array dispatch to the right array type? And, more
> generally, how do I know what's inside the function?
>
> >>> pa.array([1, 2, None, 3])
>
> [
>   1,
>   2,
>   NA,
>   3
> ]
>
> >>> pa.array
>
>
>
> Romain
>
>
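
On the question above of how to see what is inside a function: a minimal sketch using only the standard library's `inspect` module. The helper `describe` and the toy function `add_one` are hypothetical names introduced for illustration. The sketch assumes that `pa.array` behaves like other compiled callables (pyarrow is, as far as I know, largely written in Cython), in which case `inspect.getsource` cannot return Python source and the code has to be read in the Arrow repository instead.

```python
import inspect


def describe(func):
    """Return the Python source of func, or a short note when none is available."""
    try:
        return inspect.getsource(func)
    except (TypeError, OSError) as exc:
        # Builtins and compiled callables have no retrievable Python source.
        return f"no Python source available for {func!r}: {type(exc).__name__}"


def add_one(x):
    """Toy pure-Python function, used only as a contrast to a builtin."""
    return x + 1


print(describe(add_one))  # source text when this code is run from a file
print(describe(len))      # builtin: falls into the except branch
```

Calling `describe(pa.array)` in a session with pyarrow installed should show which of the two cases applies there.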