Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Ian Joiner
Sure. Let’s actually keep it simple. Just go to
https://arrow.apache.org/docs/index.html and click on “7.0.0”. The problem
will immediately manifest itself.

Ian

On Wednesday, February 9, 2022, Sutou Kouhei  wrote:

> Thanks.
> But we can't attach an image to this mailing list. Could you
> upload it somewhere such as https://gist.github.com/ ?
>
> In
>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 20:51:16 -0500,
>   Ian Joiner wrote:
>
> > The URLs are good. It is the values in the drop-down list that still
> > need to be fixed. Please see the attached photo.
> >
> > Ian
> >
> >
> >
> > On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
> >>
> >> Could you show the URL where this occurs?
> >>
> >> It seems that the following URLs show correct versions:
> >>
> >> * https://arrow.apache.org/docs/6.0/index.html
> >> * https://arrow.apache.org/docs/index.html
> >> * https://arrow.apache.org/docs/dev/index.html
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In
> >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 -0500,
> >>   Ian Joiner wrote:
> >>
> >> > Really thanks!
> >> >
> >> > I do need to mention that versioning in the docs is still not displayed
> >> > properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).
> >> >
> >> > Ian
> >> >
> >> > On Tuesday, February 8, 2022, Sutou Kouhei wrote:
> >> >
> >> >> Homebrew, MSYS2 and RubyGems are done:
> >> >>
> >> >> 1. [done] make the released version as "RELEASED" on JIRA
> >> >> 2. [done] start the new version on JIRA
> >> >> 4. [done] upload source
> >> >> 5. [done] upload binaries
> >> >> 6. [done] update website
> >> >> 7. [done] update Homebrew packages
> >> >> 8. [done] update MSYS2 package
> >> >> 9. [done] upload RubyGems
> >> >> 10. [done] upload JS packages
> >> >> 11. [done] upload C# packages
> >> >> 12. [todo:unassigned] update conda recipes
> >> >> 13. [done] upload wheels/sdist to pypi
> >> >> 14. [todo:kszucs] publish Maven artifacts
> >> >> 15. [todo:nealrichardson] update R packages
> >> >> 16. [todo:ianmcook] update vcpkg port
> >> >> 17. [done] bump versions
> >> >> 18. [done] update tags for Go modules
> >> >> 19. [done] update docs
> >> >>
> >> >> In  mail.gmail.com>
> >> >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> >> >> 23:31:45 +0100,
> >> >>   Krisztián Szűcs  wrote:
> >> >>
> >> >> > I managed to upload the wheels to pypi. Here is the current status
> >> >> > with the updated assignments:
> >> >> >
> >> >> > 1. [done] make the released version as "RELEASED" on JIRA
> >> >> > 2. [done] start the new version on JIRA
> >> >> > 4. [done] upload source
> >> >> > 5. [done] upload binaries
> >> >> > 6. [in-pr] update website
> >> >> > 7. [todo:kou] update Homebrew packages
> >> >> > 8. [todo:kou] update MSYS2 package
> >> >> > 9. [todo:kou] upload RubyGems
> >> >> > 10. [done] upload JS packages
> >> >> > 11. [done] upload C# packages
> >> >> > 12. [todo:unassigned] update conda recipes
> >> >> > 13. [done] upload wheels/sdist to pypi
> >> >> > 14. [todo:kszucs] publish Maven artifacts
> >> >> > 15. [todo:nealrichardson] update R packages
> >> >> > 16. [todo:ianmcook] update vcpkg port
> >> >> > 17. [done] bump versions
> >> >> > 18. [done] update tags for Go modules
> >> >> > 19. [in-pr] update docs
> >> >> >
> >> >> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> >> >> >  wrote:
> >> >> >>
> >> >> >> I will handle the R package submission to CRAN.
> >> >> > Thanks Neal!
> >> >> >>
> >> >> >> Neal
> >> >> >>
> >> >> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei wrote:
> >> >> >>
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
> >> >> > Thanks Kou!
> >> >> >> >
> >> >> >> > 1. [done] make the released version as "RELEASED" on JIRA
> >> >> >> > 2. [done] start the new version on JIRA
> >> >> >> > 4. [done] upload source
> >> >> >> > 5. [done] upload binaries
> >> >> >> > 6. [in-pr] update website
> >> >> >> > 7. [todo:kou] update Homebrew packages
> >> >> >> > 8. [todo:kou] update MSYS2 package
> >> >> >> > 9. [todo:kou] upload RubyGems
> >> >> >> > 10. [done] upload JS packages
> >> >> >> > 11. [done] upload C# packages
> >> >> >> > 12. [todo:unassigned] update conda recipes
> >> >> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> >> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
> >> >> >> > GB. See https://pypi.org/help/#project-size-limit"
> >> >> >> > Filed an issue to increase the project limit, waiting for the
> >> >> >> > response: https://github.com/pypa/pypi-support/issues/1653
> >> >> >> > 14. [todo] publish Maven artifacts
> >> >> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
> >> >> >> > wait for your response before pushing the release button.
> >> >> >> > 15. [todo:unassigned] update R packages
> >> >> >> > 16. [todo:ianmcook] update vcpkg port
> >> >> >> > 17. [done] bump versions
> >> >> 

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Sutou Kouhei
Thanks for confirming them!
I've pressed the "Release" button on
https://repository.apache.org/ .

In 
  "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 21:06:18 
-0800,
  Chao Sun  wrote:

> The Java Maven artifacts look good to me. I was able to upgrade Apache
> Spark to use the staging artifacts and all the tests successfully passed.
> It's also great to see the `arrow-c-data` module finally get published in
> this release!
> 
> Chao
> 
> On Tue, Feb 8, 2022 at 5:51 PM Ian Joiner  wrote:
> 
>> The URLs are good. It is the values in the drop-down list that still
>> need to be fixed. Please see the attached photo.
>>
>> Ian
>>
>>
>>
>> On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
>> >
>> > Could you show the URL where this occurs?
>> >
>> > It seems that the following URLs show correct versions:
>> >
>> > * https://arrow.apache.org/docs/6.0/index.html
>> > * https://arrow.apache.org/docs/index.html
>> > * https://arrow.apache.org/docs/dev/index.html
>> >
>> >
>> > Thanks,
>> > --
>> > kou
>> >
>> > In
>> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 -0500,
>> >   Ian Joiner wrote:
>> >
>> > > Really thanks!
>> > >
>> > > I do need to mention that versioning in the docs is still not displayed
>> > > properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).
>> > >
>> > > Ian
>> > >
>> > > On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
>> > >
>> > >> Homebrew, MSYS2 and RubyGems are done:
>> > >>
>> > >> 1. [done] make the released version as "RELEASED" on JIRA
>> > >> 2. [done] start the new version on JIRA
>> > >> 4. [done] upload source
>> > >> 5. [done] upload binaries
>> > >> 6. [done] update website
>> > >> 7. [done] update Homebrew packages
>> > >> 8. [done] update MSYS2 package
>> > >> 9. [done] upload RubyGems
>> > >> 10. [done] upload JS packages
>> > >> 11. [done] upload C# packages
>> > >> 12. [todo:unassigned] update conda recipes
>> > >> 13. [done] upload wheels/sdist to pypi
>> > >> 14. [todo:kszucs] publish Maven artifacts
>> > >> 15. [todo:nealrichardson] update R packages
>> > >> 16. [todo:ianmcook] update vcpkg port
>> > >> 17. [done] bump versions
>> > >> 18. [done] update tags for Go modules
>> > >> 19. [done] update docs
>> > >>
>> > >> In <cahm19a4se5wf8_fj3mwk-pksjdthe+d_sry3xdnbtjm+jjn...@mail.gmail.com>
>> > >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> > >> 23:31:45 +0100,
>> > >>   Krisztián Szűcs  wrote:
>> > >>
>> > >> > I managed to upload the wheels to pypi. Here is the current status
>> > >> > with the updated assignments:
>> > >> >
>> > >> > 1. [done] make the released version as "RELEASED" on JIRA
>> > >> > 2. [done] start the new version on JIRA
>> > >> > 4. [done] upload source
>> > >> > 5. [done] upload binaries
>> > >> > 6. [in-pr] update website
>> > >> > 7. [todo:kou] update Homebrew packages
>> > >> > 8. [todo:kou] update MSYS2 package
>> > >> > 9. [todo:kou] upload RubyGems
>> > >> > 10. [done] upload JS packages
>> > >> > 11. [done] upload C# packages
>> > >> > 12. [todo:unassigned] update conda recipes
>> > >> > 13. [done] upload wheels/sdist to pypi
>> > >> > 14. [todo:kszucs] publish Maven artifacts
>> > >> > 15. [todo:nealrichardson] update R packages
>> > >> > 16. [todo:ianmcook] update vcpkg port
>> > >> > 17. [done] bump versions
>> > >> > 18. [done] update tags for Go modules
>> > >> > 19. [in-pr] update docs
>> > >> >
>> > >> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>> > >> >  wrote:
>> > >> >>
>> > >> >> I will handle the R package submission to CRAN.
>> > >> > Thanks Neal!
>> > >> >>
>> > >> >> Neal
>> > >> >>
>> > >> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei wrote:
>> > >> >>
>> > >> >> > Hi,
>> > >> >> >
>> > >> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
>> > >> > Thanks Kou!
>> > >> >> >
>> > >> >> > 1. [done] make the released version as "RELEASED" on JIRA
>> > >> >> > 2. [done] start the new version on JIRA
>> > >> >> > 4. [done] upload source
>> > >> >> > 5. [done] upload binaries
>> > >> >> > 6. [in-pr] update website
>> > >> >> > 7. [todo:kou] update Homebrew packages
>> > >> >> > 8. [todo:kou] update MSYS2 package
>> > >> >> > 9. [todo:kou] upload RubyGems
>> > >> >> > 10. [done] upload JS packages
>> > >> >> > 11. [done] upload C# packages
>> > >> >> > 12. [todo:unassigned] update conda recipes
>> > >> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> > >> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
>> > >> >> > GB. See https://pypi.org/help/#project-size-limit"
>> > >> >> > Filed an issue to increase the project limit, waiting for the
>> > >> >> > response: https://github.com/pypa/pypi-support/issues/1653
>> > >> >> > 14. [todo] publish Maven artifacts
>> > >> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
>> > >> >> > wait for your response before pushing the release button.
>> > >> >> > 15. [todo:unassigned] update R packages
>>

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Sutou Kouhei
Thanks.
But we can't attach an image to this mailing list. Could you
upload it somewhere such as https://gist.github.com/ ?

In 
  "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 20:51:16 
-0500,
  Ian Joiner  wrote:

> The URLs are good. It is the values in the drop-down list that still
> need to be fixed. Please see the attached photo.
> 
> Ian
> 
> 
> 
> On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
>>
>> Could you show the URL where this occurs?
>>
>> It seems that the following URLs show correct versions:
>>
>> * https://arrow.apache.org/docs/6.0/index.html
>> * https://arrow.apache.org/docs/index.html
>> * https://arrow.apache.org/docs/dev/index.html
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 
>> -0500,
>>   Ian Joiner  wrote:
>>
>> > Really thanks!
>> >
>> > I do need to mention that versioning in the docs is still not displayed
>> > properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 
>> > (dev)”).
>> >
>> > Ian
>> >
>> > On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
>> >
>> >> Homebrew, MSYS2 and RubyGems are done:
>> >>
>> >> 1. [done] make the released version as "RELEASED" on JIRA
>> >> 2. [done] start the new version on JIRA
>> >> 4. [done] upload source
>> >> 5. [done] upload binaries
>> >> 6. [done] update website
>> >> 7. [done] update Homebrew packages
>> >> 8. [done] update MSYS2 package
>> >> 9. [done] upload RubyGems
>> >> 10. [done] upload JS packages
>> >> 11. [done] upload C# packages
>> >> 12. [todo:unassigned] update conda recipes
>> >> 13. [done] upload wheels/sdist to pypi
>> >> 14. [todo:kszucs] publish Maven artifacts
>> >> 15. [todo:nealrichardson] update R packages
>> >> 16. [todo:ianmcook] update vcpkg port
>> >> 17. [done] bump versions
>> >> 18. [done] update tags for Go modules
>> >> 19. [done] update docs
>> >>
>> >> In 
>> >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> >> 23:31:45 +0100,
>> >>   Krisztián Szűcs  wrote:
>> >>
>> >> > I managed to upload the wheels to pypi. Here is the current status
>> >> > with the updated assignments:
>> >> >
>> >> > 1. [done] make the released version as "RELEASED" on JIRA
>> >> > 2. [done] start the new version on JIRA
>> >> > 4. [done] upload source
>> >> > 5. [done] upload binaries
>> >> > 6. [in-pr] update website
>> >> > 7. [todo:kou] update Homebrew packages
>> >> > 8. [todo:kou] update MSYS2 package
>> >> > 9. [todo:kou] upload RubyGems
>> >> > 10. [done] upload JS packages
>> >> > 11. [done] upload C# packages
>> >> > 12. [todo:unassigned] update conda recipes
>> >> > 13. [done] upload wheels/sdist to pypi
>> >> > 14. [todo:kszucs] publish Maven artifacts
>> >> > 15. [todo:nealrichardson] update R packages
>> >> > 16. [todo:ianmcook] update vcpkg port
>> >> > 17. [done] bump versions
>> >> > 18. [done] update tags for Go modules
>> >> > 19. [in-pr] update docs
>> >> >
>> >> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>> >> >  wrote:
>> >> >>
>> >> >> I will handle the R package submission to CRAN.
>> >> > Thanks Neal!
>> >> >>
>> >> >> Neal
>> >> >>
>> >> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  
>> >> >> wrote:
>> >> >>
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
>> >> > Thanks Kou!
>> >> >> >
>> >> >> > 1. [done] make the released version as "RELEASED" on JIRA
>> >> >> > 2. [done] start the new version on JIRA
>> >> >> > 4. [done] upload source
>> >> >> > 5. [done] upload binaries
>> >> >> > 6. [in-pr] update website
>> >> >> > 7. [todo:kou] update Homebrew packages
>> >> >> > 8. [todo:kou] update MSYS2 package
>> >> >> > 9. [todo:kou] upload RubyGems
>> >> >> > 10. [done] upload JS packages
>> >> >> > 11. [done] upload C# packages
>> >> >> > 12. [todo:unassigned] update conda recipes
>> >> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> >> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
>> >> >> > GB. See https://pypi.org/help/#project-size-limit"
>> >> >> > Filed an issue to increase the project limit, waiting for the
>> >> >> > response: https://github.com/pypa/pypi-support/issues/1653
>> >> >> > 14. [todo] publish Maven artifacts
>> >> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
>> >> >> > wait for your response before pushing the release button.
>> >> >> > 15. [todo:unassigned] update R packages
>> >> >> > 16. [todo:ianmcook] update vcpkg port
>> >> >> > 17. [done] bump versions
>> >> >> > 18. [done] update tags for Go modules
>> >> >> > 19. [in-pr] update docs
>> >> >> >
>> >> >> >
>> >> >> > Thanks,
>> >> >> > --
>> >> >> > kou
>> >> >> >
>> >> >> > In > >> gmail.com>
>> >> >> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> >> >> > 21:38:17 +0100,
>> >> >> >   Krisztián Szűcs  wrote:
>> >> >> >
>> >> >> > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook wrote:
>> >> >> > >>
>> >> >> > >> Thanks Krisztián!
>> >> >> > >>
>> >> >> >

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Chao Sun
The Java Maven artifacts look good to me. I was able to upgrade Apache
Spark to use the staging artifacts and all the tests successfully passed.
It's also great to see the `arrow-c-data` module finally get published in
this release!

Chao

On Tue, Feb 8, 2022 at 5:51 PM Ian Joiner  wrote:

> The URLs are good. It is the values in the drop-down list that still
> need to be fixed. Please see the attached photo.
>
> Ian
>
>
>
> On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
> >
> > Could you show the URL where this occurs?
> >
> > It seems that the following URLs show correct versions:
> >
> > * https://arrow.apache.org/docs/6.0/index.html
> > * https://arrow.apache.org/docs/index.html
> > * https://arrow.apache.org/docs/dev/index.html
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In
> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 -0500,
> >   Ian Joiner wrote:
> >
> > > Really thanks!
> > >
> > > > I do need to mention that versioning in the docs is still not displayed
> > > > properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).
> > >
> > > Ian
> > >
> > > On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
> > >
> > >> Homebrew, MSYS2 and RubyGems are done:
> > >>
> > >> 1. [done] make the released version as "RELEASED" on JIRA
> > >> 2. [done] start the new version on JIRA
> > >> 4. [done] upload source
> > >> 5. [done] upload binaries
> > >> 6. [done] update website
> > >> 7. [done] update Homebrew packages
> > >> 8. [done] update MSYS2 package
> > >> 9. [done] upload RubyGems
> > >> 10. [done] upload JS packages
> > >> 11. [done] upload C# packages
> > >> 12. [todo:unassigned] update conda recipes
> > >> 13. [done] upload wheels/sdist to pypi
> > >> 14. [todo:kszucs] publish Maven artifacts
> > >> 15. [todo:nealrichardson] update R packages
> > >> 16. [todo:ianmcook] update vcpkg port
> > >> 17. [done] bump versions
> > >> 18. [done] update tags for Go modules
> > >> 19. [done] update docs
> > >>
> > >> In <cahm19a4se5wf8_fj3mwk-pksjdthe+d_sry3xdnbtjm+jjn...@mail.gmail.com>
> > >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> > >> 23:31:45 +0100,
> > >>   Krisztián Szűcs  wrote:
> > >>
> > >> > I managed to upload the wheels to pypi. Here is the current status
> > >> > with the updated assignments:
> > >> >
> > >> > 1. [done] make the released version as "RELEASED" on JIRA
> > >> > 2. [done] start the new version on JIRA
> > >> > 4. [done] upload source
> > >> > 5. [done] upload binaries
> > >> > 6. [in-pr] update website
> > >> > 7. [todo:kou] update Homebrew packages
> > >> > 8. [todo:kou] update MSYS2 package
> > >> > 9. [todo:kou] upload RubyGems
> > >> > 10. [done] upload JS packages
> > >> > 11. [done] upload C# packages
> > >> > 12. [todo:unassigned] update conda recipes
> > >> > 13. [done] upload wheels/sdist to pypi
> > >> > 14. [todo:kszucs] publish Maven artifacts
> > >> > 15. [todo:nealrichardson] update R packages
> > >> > 16. [todo:ianmcook] update vcpkg port
> > >> > 17. [done] bump versions
> > >> > 18. [done] update tags for Go modules
> > >> > 19. [in-pr] update docs
> > >> >
> > >> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> > >> >  wrote:
> > >> >>
> > >> >> I will handle the R package submission to CRAN.
> > >> > Thanks Neal!
> > >> >>
> > >> >> Neal
> > >> >>
> > >> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei wrote:
> > >> >>
> > >> >> > Hi,
> > >> >> >
> > >> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
> > >> > Thanks Kou!
> > >> >> >
> > >> >> > 1. [done] make the released version as "RELEASED" on JIRA
> > >> >> > 2. [done] start the new version on JIRA
> > >> >> > 4. [done] upload source
> > >> >> > 5. [done] upload binaries
> > >> >> > 6. [in-pr] update website
> > >> >> > 7. [todo:kou] update Homebrew packages
> > >> >> > 8. [todo:kou] update MSYS2 package
> > >> >> > 9. [todo:kou] upload RubyGems
> > >> >> > 10. [done] upload JS packages
> > >> >> > 11. [done] upload C# packages
> > >> >> > 12. [todo:unassigned] update conda recipes
> > >> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > >> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
> > >> >> > GB. See https://pypi.org/help/#project-size-limit"
> > >> >> > Filed an issue to increase the project limit, waiting for the
> > >> >> > response: https://github.com/pypa/pypi-support/issues/1653
> > >> >> > 14. [todo] publish Maven artifacts
> > >> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
> > >> >> > wait for your response before pushing the release button.
> > >> >> > 15. [todo:unassigned] update R packages
> > >> >> > 16. [todo:ianmcook] update vcpkg port
> > >> >> > 17. [done] bump versions
> > >> >> > 18. [done] update tags for Go modules
> > >> >> > 19. [in-pr] update docs
> > >> >> >
> > >> >> >
> > >> >> > Thanks,
> > >> >> > --
> > >> >> > kou
> > >> >> >
> > >> >> > In  > >> gmail.com>
> > >> >> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10"

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Dewey Dunnington
I'll share a bit more about geospatial extension types that Joris
mentioned. I'm new to the Arrow community and didn't know that there were
any restrictions on metadata values (the C Data interface docs don't seem
to indicate that there are restrictions, or if it's there I missed it!), so
I used the same encoding for the ARROW:extension:metadata that's used to
encode the parent metadata (int32 num_items, then for each item: int32 name_len,
char[name_len], int32 value_len, char[value_len]). I did this
because I needed two key/value pairs (geodesic = true/false; crs =
some_coordinate_reference_system) and already had the code to iterate over
the parent metadata. I'm not saying that it's any pinnacle of elegant code
(still very much a prototype), but it only takes about 30 lines of C to do
this [1].
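
To make the layout described above concrete, here is a minimal sketch in Python (not the C code linked in [1]). It assumes native-endian 32-bit lengths, and the function names and the `crs` value `OGC:CRS84` are hypothetical placeholders chosen only for illustration.

```python
import struct

INT32 = struct.Struct("=i")  # native byte order, 4-byte signed integer


def encode_metadata(pairs):
    """Pack (key, value) byte pairs as: int32 count, then for each pair
    int32 key length, key bytes, int32 value length, value bytes."""
    out = [INT32.pack(len(pairs))]
    for key, value in pairs:
        out.append(INT32.pack(len(key)) + key)
        out.append(INT32.pack(len(value)) + value)
    return b"".join(out)


def _read_chunk(buf, offset):
    """Read one length-prefixed byte string; return (bytes, new offset)."""
    if offset + INT32.size > len(buf):
        raise ValueError("truncated length field")
    (length,) = INT32.unpack_from(buf, offset)
    offset += INT32.size
    if length < 0 or offset + length > len(buf):
        raise ValueError("length field runs past the end of the buffer")
    return bytes(buf[offset:offset + length]), offset + length


def decode_metadata(buf):
    """Inverse of encode_metadata: return a list of (key, value) byte pairs."""
    if len(buf) < INT32.size:
        raise ValueError("buffer too small for the item count")
    (count,) = INT32.unpack_from(buf, 0)
    if count < 0:
        raise ValueError("negative item count")
    offset = INT32.size
    pairs = []
    for _ in range(count):
        key, offset = _read_chunk(buf, offset)
        value, offset = _read_chunk(buf, offset)
        pairs.append((key, value))
    return pairs


# Hypothetical extension metadata with the two keys mentioned above.
blob = encode_metadata([(b"geodesic", b"true"), (b"crs", b"OGC:CRS84")])
pairs = decode_metadata(blob)
```

Because the length prefixes are raw integers, a blob produced this way is generally not a valid UTF-8 string, which is the crux of the binary-versus-string question in this thread.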

I prototyped the extension types for geospatial using the C data interface,
the idea being that a header-only helper file (geoarrow.hpp) could be
distributed, making it an attractive and easy alternative to
well-known binary (WKB) for passing geometries between libraries (e.g.,
GEOS, GDAL, PROJ). Requiring anybody who uses an extension type to also
vendor a JSON parser [2] seems a bit anti-social and restricts where that
extension type is useful, although I understand that this is not the use case
that many might have.

There are definitely reasonable ways to do what I'm trying to do without
resorting to a binary encoding, and JSON could probably even work...I'm
just trying to share the use-case since it seems like this kind of
environment isn't how folks envisioned extension types being used.

[1]
https://github.com/paleolimbot/geoarrow/blob/master/src/internal/geoarrow.hpp#L511-L542
[2] The commonly vendored JSON parser in geospatial libraries is this one:
https://github.com/nlohmann/json

On Tue, Feb 8, 2022 at 7:58 PM Weston Pace  wrote:

> I think I'm +0 but lean slightly towards JSON.
>
> In favor of binary I would guess that most extension types are going
> to have relatively simple parameterization (to the point that
> protobuf/flatbuffers isn't really needed).  For example, the substrate
> consumer PR has five extension types at the moment (e.g. uuid,
> varchar) and only two of them are parameterized and each of these by a
> single int32_t.  It might be interesting to see what kinds of
> extension types the geospatial community uses.
>
> That being said, this sort of parsing isn't really on any kind of
> critical path.  It's very likely that users (not Arrow developers)
> will be creating and working with extension types.  These users are
> likely going to default to JSON (or pickle or XML).  If our "well
> known types" use JSON then it will be more easily recognizable to
> users what is going on.
>
> -Weston
>
> On Tue, Feb 8, 2022 at 8:14 AM Joris Van den Bossche
>  wrote:
> >
> > On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > wrote:
> >
> > > ...
> > >
> > > Wrt binary, imo the challenge is:
> > > * we state that backward incompatible changes to the c data interface
> > > require a new spec [1]
> > >
> >
> > Note that this discussion wouldn't change anything about the C Data
> > Interface spec itself. The discussion is only about the *value* that is put
> > in one of the key-value metadata fields. The C Data Interface spec defines
> > how the metadata needs to be stored, but doesn't specify anything about the
> > actual value of one of the key-value metadata fields.
> >
> >
> > > * we state that the metadata is a binary string [2]
> > > * a valid string is a subset of all valid byte arrays and thus removing
> > > "*string*" from the spec is backward incompatible
> > >
> > > If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> > > we trigger undefined behavior.
> > >
> > > I was a bit surprised by ARROW-15613 - my understanding is that the c++
> > > implementation is not following the spec, and if we at arrow2 were not
> > > checking for utf8, we would be exposing a vulnerability (at least according
> > > to Rust's standards). We just checked it out of luck (it is O(1), so why
> > > not).
> > >
> >
> > Yes, the C++ implementation is indeed not following the spec. See the
> > "[DISCUSS] Binary Values in Key value pairs" thread (
> > https://lists.apache.org/thread/blmj0cgv34dgdxqd3ow60ln68khnz0qr). Let's
> > maybe keep this part of the discussion there?
>


Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Ian Joiner
The URLs are good. It is the values in the drop-down list that still
need to be fixed. Please see the attached photo.

Ian



On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
>
> Could you show the URL where this occurs?
>
> It seems that the following URLs show correct versions:
>
> * https://arrow.apache.org/docs/6.0/index.html
> * https://arrow.apache.org/docs/index.html
> * https://arrow.apache.org/docs/dev/index.html
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 
> -0500,
>   Ian Joiner  wrote:
>
> > Really thanks!
> >
> > I do need to mention that versioning in the docs is still not displayed
> > properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).
> >
> > Ian
> >
> > On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
> >
> >> Homebrew, MSYS2 and RubyGems are done:
> >>
> >> 1. [done] make the released version as "RELEASED" on JIRA
> >> 2. [done] start the new version on JIRA
> >> 4. [done] upload source
> >> 5. [done] upload binaries
> >> 6. [done] update website
> >> 7. [done] update Homebrew packages
> >> 8. [done] update MSYS2 package
> >> 9. [done] upload RubyGems
> >> 10. [done] upload JS packages
> >> 11. [done] upload C# packages
> >> 12. [todo:unassigned] update conda recipes
> >> 13. [done] upload wheels/sdist to pypi
> >> 14. [todo:kszucs] publish Maven artifacts
> >> 15. [todo:nealrichardson] update R packages
> >> 16. [todo:ianmcook] update vcpkg port
> >> 17. [done] bump versions
> >> 18. [done] update tags for Go modules
> >> 19. [done] update docs
> >>
> >> In 
> >>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> >> 23:31:45 +0100,
> >>   Krisztián Szűcs  wrote:
> >>
> >> > I managed to upload the wheels to pypi. Here is the current status
> >> > with the updated assignments:
> >> >
> >> > 1. [done] make the released version as "RELEASED" on JIRA
> >> > 2. [done] start the new version on JIRA
> >> > 4. [done] upload source
> >> > 5. [done] upload binaries
> >> > 6. [in-pr] update website
> >> > 7. [todo:kou] update Homebrew packages
> >> > 8. [todo:kou] update MSYS2 package
> >> > 9. [todo:kou] upload RubyGems
> >> > 10. [done] upload JS packages
> >> > 11. [done] upload C# packages
> >> > 12. [todo:unassigned] update conda recipes
> >> > 13. [done] upload wheels/sdist to pypi
> >> > 14. [todo:kszucs] publish Maven artifacts
> >> > 15. [todo:nealrichardson] update R packages
> >> > 16. [todo:ianmcook] update vcpkg port
> >> > 17. [done] bump versions
> >> > 18. [done] update tags for Go modules
> >> > 19. [in-pr] update docs
> >> >
> >> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> >> >  wrote:
> >> >>
> >> >> I will handle the R package submission to CRAN.
> >> > Thanks Neal!
> >> >>
> >> >> Neal
> >> >>
> >> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
> >> > Thanks Kou!
> >> >> >
> >> >> > 1. [done] make the released version as "RELEASED" on JIRA
> >> >> > 2. [done] start the new version on JIRA
> >> >> > 4. [done] upload source
> >> >> > 5. [done] upload binaries
> >> >> > 6. [in-pr] update website
> >> >> > 7. [todo:kou] update Homebrew packages
> >> >> > 8. [todo:kou] update MSYS2 package
> >> >> > 9. [todo:kou] upload RubyGems
> >> >> > 10. [done] upload JS packages
> >> >> > 11. [done] upload C# packages
> >> >> > 12. [todo:unassigned] update conda recipes
> >> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
> >> >> > GB. See https://pypi.org/help/#project-size-limit"
> >> >> > Filed an issue to increase the project limit, waiting for the
> >> >> > response: https://github.com/pypa/pypi-support/issues/1653
> >> >> > 14. [todo] publish Maven artifacts
> >> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
> >> >> > wait for your response before pushing the release button.
> >> >> > 15. [todo:unassigned] update R packages
> >> >> > 16. [todo:ianmcook] update vcpkg port
> >> >> > 17. [done] bump versions
> >> >> > 18. [done] update tags for Go modules
> >> >> > 19. [in-pr] update docs
> >> >> >
> >> >> >
> >> >> > Thanks,
> >> >> > --
> >> >> > kou
> >> >> >
> >> >> > In  >> gmail.com>
> >> >> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> >> >> > 21:38:17 +0100,
> >> >> >   Krisztián Szűcs  wrote:
> >> >> >
> >> >> > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook wrote:
> >> >> > >>
> >> >> > >> Thanks Krisztián!
> >> >> > >>
> >> >> > >> I will update the vcpkg port.
> >> >> > > Thanks Ian!
> >> >> > >
> >> >> > > Here is the updated todo list:
> >> >> > >
> >> >> > > 1. [done] make the released version as "RELEASED" on JIRA
> >> >> > > 2. [done] start the new version on JIRA
> >> >> > > 4. [done] upload source
> >> >> > > 5. [done] upload binaries
> >> >> > > 6. [in-pr] update website
> >> >> > > 7. [todo:unassigned] update Homeb

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Weston Pace
I think I'm +0 but lean slightly towards JSON.

In favor of binary I would guess that most extension types are going
to have relatively simple parameterization (to the point that
protobuf/flatbuffers isn't really needed).  For example, the substrate
consumer PR has five extension types at the moment (e.g. uuid,
varchar) and only two of them are parameterized and each of these by a
single int32_t.  It might be interesting to see what kinds of
extension types the geospatial community uses.

That being said, this sort of parsing isn't really on any kind of
critical path.  It's very likely that users (not Arrow developers)
will be creating and working with extension types.  These users are
likely going to default to JSON (or pickle or XML).  If our "well
known types" use JSON then it will be more easily recognizable to
users what is going on.
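
As a rough illustration of the JSON option, the sketch below serializes a single integer parameter, in the spirit of the single-int32 parameterization mentioned above. The key name `length` and both helper functions are hypothetical and are not taken from the Substrait consumer PR or from any Arrow API.

```python
import json


def encode_params(params):
    """Serialize extension-type parameters to compact, key-sorted JSON bytes."""
    text = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def decode_params(raw):
    """Parse JSON bytes back into a dict of parameters."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical parameter for a varchar-like extension type.
raw = encode_params({"length": 32})
params = decode_params(raw)
print(raw)  # b'{"length":32}'
```

The resulting value is valid UTF-8 and readable in a debugger or a schema dump, at the cost of requiring a JSON parser in every consumer.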

-Weston

On Tue, Feb 8, 2022 at 8:14 AM Joris Van den Bossche
 wrote:
>
> On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão 
> wrote:
>
> > ...
> >
> > Wrt binary, imo the challenge is:
> > * we state that backward incompatible changes to the c data interface
> > require a new spec [1]
> >
>
> Note that this discussion wouldn't change anything about the C Data
> Interface spec itself. The discussion is only about the *value* that is put
> in one of the key-value metadata fields. The C Data Interface spec defines
> how the metadata needs to be stored, but doesn't specify anything about the
> actual value of one of the key-value metadata fields.
>
>
> > * we state that the metadata is a binary string [2]
> > * a valid string is a subset of all valid byte arrays and thus removing "
> > *string*" from the spec is backward incompatible
> >
> > If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> > we trigger undefined behavior.
> >
> > I was a bit surprised by ARROW-15613 - my understanding is that the c++
> > implementation is not following the spec, and if we at arrow2 were not
> > checking for utf8, we would be exposing a vulnerability (at least according
> > to Rust's standards). We just checked it out of luck (it is O(1), so why
> > not).
> >
>
> Yes, the C++ implementation is indeed not following the spec. See the
> "[DISCUSS] Binary Values in Key value pairs" thread (
> https://lists.apache.org/thread/blmj0cgv34dgdxqd3ow60ln68khnz0qr). Let's
> maybe keep this part of the discussion there?
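
For illustration, a reader that does not want to assume UTF-8 can validate
a metadata value before treating it as text. This is only a sketch in
Python with a hypothetical helper name; it is not how the C++
implementation or arrow2 handle the value.

```python
import struct


def decode_metadata_value(raw: bytes):
    """Return (text, None) if raw is valid UTF-8, otherwise (None, raw).

    Hypothetical helper: the caller decides what to do with values that
    are not text instead of assuming every value is a string.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        return None, raw


# A JSON payload is valid UTF-8 and comes back as text.
text_result = decode_metadata_value(b'{"length": 32}')

# A packed int32 of -1 is four 0xFF bytes, which is not valid UTF-8.
binary_result = decode_metadata_value(struct.pack("<i", -1))
```

A binary-encoded value can therefore fail the check, which is the case a
reader that assumes UTF-8 would need to guard against.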


Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Sutou Kouhei
Could you show the URL where this occurs?

It seems that the following URLs show correct versions:

* https://arrow.apache.org/docs/6.0/index.html
* https://arrow.apache.org/docs/index.html
* https://arrow.apache.org/docs/dev/index.html


Thanks,
-- 
kou

In 
  "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 17:35:47 -0500,
  Ian Joiner  wrote:

> Thanks a lot!
> 
> I do need to mention that versioning in the docs is still not displayed
> properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).
> 
> Ian
> 
> On Tuesday, February 8, 2022, Sutou Kouhei  wrote:
> 
>> Homebrew, MSYS2 and RubyGems are done:
>>
>> 1. [done] mark the released version as "RELEASED" on JIRA
>> 2. [done] start the new version on JIRA
>> 4. [done] upload source
>> 5. [done] upload binaries
>> 6. [done] update website
>> 7. [done] update Homebrew packages
>> 8. [done] update MSYS2 package
>> 9. [done] upload RubyGems
>> 10. [done] upload JS packages
>> 11. [done] upload C# packages
>> 12. [todo:unassigned] update conda recipes
>> 13. [done] upload wheels/sdist to pypi
>> 14. [todo:kszucs] publish Maven artifacts
>> 15. [todo:nealrichardson] update R packages
>> 16. [todo:ianmcook] update vcpkg port
>> 17. [done] bump versions
>> 18. [done] update tags for Go modules
>> 19. [done] update docs
>>
>> In 
>>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> 23:31:45 +0100,
>>   Krisztián Szűcs  wrote:
>>
>> > I managed to upload the wheels to pypi. Here is the current status
>> > with the updated assignments:
>> >
>> > 1. [done] mark the released version as "RELEASED" on JIRA
>> > 2. [done] start the new version on JIRA
>> > 4. [done] upload source
>> > 5. [done] upload binaries
>> > 6. [in-pr] update website
>> > 7. [todo:kou] update Homebrew packages
>> > 8. [todo:kou] update MSYS2 package
>> > 9. [todo:kou] upload RubyGems
>> > 10. [done] upload JS packages
>> > 11. [done] upload C# packages
>> > 12. [todo:unassigned] update conda recipes
>> > 13. [done] upload wheels/sdist to pypi
>> > 14. [todo:kszucs] publish Maven artifacts
>> > 15. [todo:nealrichardson] update R packages
>> > 16. [todo:ianmcook] update vcpkg port
>> > 17. [done] bump versions
>> > 18. [done] update tags for Go modules
>> > 19. [in-pr] update docs
>> >
>> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>> >  wrote:
>> >>
>> >> I will handle the R package submission to CRAN.
>> > Thanks Neal!
>> >>
>> >> Neal
>> >>
>> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
>> > Thanks Kou!
>> >> >
>> >> > 1. [done] mark the released version as "RELEASED" on JIRA
>> >> > 2. [done] start the new version on JIRA
>> >> > 4. [done] upload source
>> >> > 5. [done] upload binaries
>> >> > 6. [in-pr] update website
>> >> > 7. [todo:kou] update Homebrew packages
>> >> > 8. [todo:kou] update MSYS2 package
>> >> > 9. [todo:kou] upload RubyGems
>> >> > 10. [done] upload JS packages
>> >> > 11. [done] upload C# packages
>> >> > 12. [todo:unassigned] update conda recipes
>> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
>> >> > GB. See https://pypi.org/help/#project-size-limit"
>> >> > Filed an issue to increase the project limit, waiting for the
>> >> > response: https://github.com/pypa/pypi-support/issues/1653
>> >> > 14. [todo] publish Maven artifacts
>> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
>> >> > wait for your response before pushing the release button.
>> >> > 15. [todo:unassigned] update R packages
>> >> > 16. [todo:ianmcook] update vcpkg port
>> >> > 17. [done] bump versions
>> >> > 18. [done] update tags for Go modules
>> >> > 19. [in-pr] update docs
>> >> >
>> >> >
>> >> > Thanks,
>> >> > --
>> >> > kou
>> >> >
>> >> > In > gmail.com>
>> >> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> >> > 21:38:17 +0100,
>> >> >   Krisztián Szűcs  wrote:
>> >> >
>> >> > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
>> wrote:
>> >> > >>
>> >> > >> Thanks Krisztián!
>> >> > >>
>> >> > >> I will update the vcpkg port.
>> >> > > Thanks Ian!
>> >> > >
>> >> > > Here is the updated todo list:
>> >> > >
>> >> > > 1. [done] mark the released version as "RELEASED" on JIRA
>> >> > > 2. [done] start the new version on JIRA
>> >> > > 4. [done] upload source
>> >> > > 5. [done] upload binaries
>> >> > > 6. [in-pr] update website
>> >> > > 7. [todo:unassigned] update Homebrew packages
>> >> > > 8. [todo:unassigned] update MSYS2 package
>> >> > > 9. [todo:unassigned] upload RubyGems
>> >> > > 10. [done] upload JS packages
>> >> > > 11. [done] upload C# packages
>> >> > > 12. [todo:unassigned] update conda recipes
>> >> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> >> > > "Project size too large. Limit for project 'pyarrow' total size is
>> 10
>> >> > > GB. See https://pypi.org/help/#project-size-limit"
>> 

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Ian Joiner
Thanks a lot!

I do need to mention that versioning in the docs is still not displayed
properly (7.0 is labelled “6.0 (stable)” while 8.0 is labelled “7.0 (dev)”).

Ian

On Tuesday, February 8, 2022, Sutou Kouhei  wrote:

> Homebrew, MSYS2 and RubyGems are done:
>
> 1. [done] mark the released version as "RELEASED" on JIRA
> 2. [done] start the new version on JIRA
> 4. [done] upload source
> 5. [done] upload binaries
> 6. [done] update website
> 7. [done] update Homebrew packages
> 8. [done] update MSYS2 package
> 9. [done] upload RubyGems
> 10. [done] upload JS packages
> 11. [done] upload C# packages
> 12. [todo:unassigned] update conda recipes
> 13. [done] upload wheels/sdist to pypi
> 14. [todo:kszucs] publish Maven artifacts
> 15. [todo:nealrichardson] update R packages
> 16. [todo:ianmcook] update vcpkg port
> 17. [done] bump versions
> 18. [done] update tags for Go modules
> 19. [done] update docs
>
> In 
>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> 23:31:45 +0100,
>   Krisztián Szűcs  wrote:
>
> > I managed to upload the wheels to pypi. Here is the current status
> > with the updated assignments:
> >
> > 1. [done] mark the released version as "RELEASED" on JIRA
> > 2. [done] start the new version on JIRA
> > 4. [done] upload source
> > 5. [done] upload binaries
> > 6. [in-pr] update website
> > 7. [todo:kou] update Homebrew packages
> > 8. [todo:kou] update MSYS2 package
> > 9. [todo:kou] upload RubyGems
> > 10. [done] upload JS packages
> > 11. [done] upload C# packages
> > 12. [todo:unassigned] update conda recipes
> > 13. [done] upload wheels/sdist to pypi
> > 14. [todo:kszucs] publish Maven artifacts
> > 15. [todo:nealrichardson] update R packages
> > 16. [todo:ianmcook] update vcpkg port
> > 17. [done] bump versions
> > 18. [done] update tags for Go modules
> > 19. [in-pr] update docs
> >
> > On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> >  wrote:
> >>
> >> I will handle the R package submission to CRAN.
> > Thanks Neal!
> >>
> >> Neal
> >>
> >> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
> >>
> >> > Hi,
> >> >
> >> > I'll update/upload Homebrew, MSYS2 and RubyGems.
> > Thanks Kou!
> >> >
> >> > 1. [done] mark the released version as "RELEASED" on JIRA
> >> > 2. [done] start the new version on JIRA
> >> > 4. [done] upload source
> >> > 5. [done] upload binaries
> >> > 6. [in-pr] update website
> >> > 7. [todo:kou] update Homebrew packages
> >> > 8. [todo:kou] update MSYS2 package
> >> > 9. [todo:kou] upload RubyGems
> >> > 10. [done] upload JS packages
> >> > 11. [done] upload C# packages
> >> > 12. [todo:unassigned] update conda recipes
> >> > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> > "Project size too large. Limit for project 'pyarrow' total size is 10
> >> > GB. See https://pypi.org/help/#project-size-limit"
> >> > Filed an issue to increase the project limit, waiting for the
> >> > response: https://github.com/pypa/pypi-support/issues/1653
> >> > 14. [todo] publish Maven artifacts
> >> > Micah did you have a chance to verify the staged maven artifacts? I'd
> >> > wait for your response before pushing the release button.
> >> > 15. [todo:unassigned] update R packages
> >> > 16. [todo:ianmcook] update vcpkg port
> >> > 17. [done] bump versions
> >> > 18. [done] update tags for Go modules
> >> > 19. [in-pr] update docs
> >> >
> >> >
> >> > Thanks,
> >> > --
> >> > kou
> >> >
> >> > In  gmail.com>
> >> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> >> > 21:38:17 +0100,
> >> >   Krisztián Szűcs  wrote:
> >> >
> >> > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
> wrote:
> >> > >>
> >> > >> Thanks Krisztián!
> >> > >>
> >> > >> I will update the vcpkg port.
> >> > > Thanks Ian!
> >> > >
> >> > > Here is the updated todo list:
> >> > >
> >> > > 1. [done] mark the released version as "RELEASED" on JIRA
> >> > > 2. [done] start the new version on JIRA
> >> > > 4. [done] upload source
> >> > > 5. [done] upload binaries
> >> > > 6. [in-pr] update website
> >> > > 7. [todo:unassigned] update Homebrew packages
> >> > > 8. [todo:unassigned] update MSYS2 package
> >> > > 9. [todo:unassigned] upload RubyGems
> >> > > 10. [done] upload JS packages
> >> > > 11. [done] upload C# packages
> >> > > 12. [todo:unassigned] update conda recipes
> >> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> > > "Project size too large. Limit for project 'pyarrow' total size is
> 10
> >> > > GB. See https://pypi.org/help/#project-size-limit"
> >> > > Filed an issue to increase the project limit, waiting for the
> >> > > response: https://github.com/pypa/pypi-support/issues/1653
> >> > > 14. [todo] publish Maven artifacts
> >> > > Micah did you have a chance to verify the staged maven artifacts?
> I'd
> >> > > wait for your response before pushing the release button.
> >> > > 15. [todo:unassigned] update R packages
> >> > > 16. [todo:ianmcook] update vcpkg port
> >> > > 17. [done] bump versions
> >> > > 18. [done] update tags for Go modules

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Chao Sun
I just opened a PR in Spark to test the Arrow upgrade:
https://github.com/apache/spark/pull/35449. Let's see how it goes.

> Technical question: can we roll back releases on Maven?

Hmm, do you mean removing already-published artifacts? I don't think there is a
way: once artifacts are published, they are immutable.

Chao


On Tue, Feb 8, 2022 at 1:17 PM Krisztián Szűcs 
wrote:

> On Tue, Feb 8, 2022 at 10:01 PM Chao Sun  wrote:
> >
> > Thanks, let me verify the artifacts and come back to this thread.
> That would be great, thanks Chao!
> >
> > On Tue, Feb 8, 2022 at 1:00 PM Sutou Kouhei  wrote:
> >
> > > Hi,
> > >
> > > > Just curious what's the progress of publishing Maven artifacts.
> Thanks!
> > >
> > > It seems that Krisztián is waiting for someone's
> > > verification of the staged artifacts:
> > >
> > > https://lists.apache.org/thread/r7cm1zz2qz9r823p5tbxdv0gtr3o6s4r
> > >
> > > > 14. [todo] publish Maven artifacts
> > > > Micah did you have a chance to verify the staged maven artifacts? I'd
> > > > wait for your response before pushing the release button.
> > >
> > > Chao, could you verify the staged artifacts? Or is it OK to
> > > publish the staged artifacts now?
>
> Technical question: can we roll back releases on Maven?
>
> > >
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In  >
> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022
> > > 11:12:29 -0800,
> > >   Chao Sun  wrote:
> > >
> > > > Hi Krisztián,
> > > >
> > > > Just curious what's the progress of publishing Maven artifacts.
> Thanks!
> > > >
> > > > Best,
> > > > Chao
> > > >
> > > > On Thu, Feb 3, 2022 at 2:32 PM Krisztián Szűcs <
> > > szucs.kriszt...@gmail.com>
> > > > wrote:
> > > >
> > > >> I managed to upload the wheels to pypi. Here is the current status
> > > >> with the updated assignments:
> > > >>
> > > >> 1. [done] mark the released version as "RELEASED" on JIRA
> > > >> 2. [done] start the new version on JIRA
> > > >> 4. [done] upload source
> > > >> 5. [done] upload binaries
> > > >> 6. [in-pr] update website
> > > >> 7. [todo:kou] update Homebrew packages
> > > >> 8. [todo:kou] update MSYS2 package
> > > >> 9. [todo:kou] upload RubyGems
> > > >> 10. [done] upload JS packages
> > > >> 11. [done] upload C# packages
> > > >> 12. [todo:unassigned] update conda recipes
> > > >> 13. [done] upload wheels/sdist to pypi
> > > >> 14. [todo:kszucs] publish Maven artifacts
> > > >> 15. [todo:nealrichardson] update R packages
> > > >> 16. [todo:ianmcook] update vcpkg port
> > > >> 17. [done] bump versions
> > > >> 18. [done] update tags for Go modules
> > > >> 19. [in-pr] update docs
> > > >>
> > > >> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> > > >>  wrote:
> > > >> >
> > > >> > I will handle the R package submission to CRAN.
> > > >> Thanks Neal!
> > > >> >
> > > >> > Neal
> > > >> >
> > > >> > On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei 
> > > wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > I'll update/upload Homebrew, MSYS2 and RubyGems.
> > > >> Thanks Kou!
> > > >> > >
> > > >> > > 1. [done] mark the released version as "RELEASED" on JIRA
> > > >> > > 2. [done] start the new version on JIRA
> > > >> > > 4. [done] upload source
> > > >> > > 5. [done] upload binaries
> > > >> > > 6. [in-pr] update website
> > > >> > > 7. [todo:kou] update Homebrew packages
> > > >> > > 8. [todo:kou] update MSYS2 package
> > > >> > > 9. [todo:kou] upload RubyGems
> > > >> > > 10. [done] upload JS packages
> > > >> > > 11. [done] upload C# packages
> > > >> > > 12. [todo:unassigned] update conda recipes
> > > >> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > > >> > > "Project size too large. Limit for project 'pyarrow' total size
> is
> > > 10
> > > >> > > GB. See https://pypi.org/help/#project-size-limit"
> > > >> > > Filed an issue to increase the project limit, waiting for the
> > > >> > > response: https://github.com/pypa/pypi-support/issues/1653
> > > >> > > 14. [todo] publish Maven artifacts
> > > >> > > Micah did you have a chance to verify the staged maven
> artifacts?
> > > I'd
> > > >> > > wait for your response before pushing the release button.
> > > >> > > 15. [todo:unassigned] update R packages
> > > >> > > 16. [todo:ianmcook] update vcpkg port
> > > >> > > 17. [done] bump versions
> > > >> > > 18. [done] update tags for Go modules
> > > >> > > 19. [in-pr] update docs
> > > >> > >
> > > >> > >
> > > >> > > Thanks,
> > > >> > > --
> > > >> > > kou
> > > >> > >
> > > >> > > In  > > pcb70w-nxkpl_y_w5bpl6...@mail.gmail.com
> > > >> >
> > > >> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb
> 2022
> > > >> > > 21:38:17 +0100,
> > > >> > >   Krisztián Szűcs  wrote:
> > > >> > >
> > > >> > > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook <
> i...@ursacomputing.com>
> > > >> wrote:
> > > >> > > >>
> > > >> > > >> Thanks Krisztián!
> > > >> > > >>
> > > >> > > >> I will update the vcpkg port.
> > > >> > > > Thanks Ian!
> > > >> > > >
> > > >> > > > Here is the updated todo list:
> >

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Krisztián Szűcs
On Tue, Feb 8, 2022 at 10:01 PM Chao Sun  wrote:
>
> Thanks, let me verify the artifacts and come back to this thread.
That would be great, thanks Chao!
>
> On Tue, Feb 8, 2022 at 1:00 PM Sutou Kouhei  wrote:
>
> > Hi,
> >
> > > Just curious what's the progress of publishing Maven artifacts. Thanks!
> >
> > It seems that Krisztián is waiting for someone's
> > verification of the staged artifacts:
> >
> > https://lists.apache.org/thread/r7cm1zz2qz9r823p5tbxdv0gtr3o6s4r
> >
> > > 14. [todo] publish Maven artifacts
> > > Micah did you have a chance to verify the staged maven artifacts? I'd
> > > wait for your response before pushing the release button.
> >
> > Chao, could you verify the staged artifacts? Or is it OK to
> > publish the staged artifacts now?

Technical question: can we roll back releases on Maven?

> >
> >
> > Thanks,
> > --
> > kou
> >
> > In 
> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022
> > 11:12:29 -0800,
> >   Chao Sun  wrote:
> >
> > > Hi Krisztián,
> > >
> > > Just curious what's the progress of publishing Maven artifacts. Thanks!
> > >
> > > Best,
> > > Chao
> > >
> > > On Thu, Feb 3, 2022 at 2:32 PM Krisztián Szűcs <
> > szucs.kriszt...@gmail.com>
> > > wrote:
> > >
> > >> I managed to upload the wheels to pypi. Here is the current status
> > >> with the updated assignments:
> > >>
> > >> 1. [done] mark the released version as "RELEASED" on JIRA
> > >> 2. [done] start the new version on JIRA
> > >> 4. [done] upload source
> > >> 5. [done] upload binaries
> > >> 6. [in-pr] update website
> > >> 7. [todo:kou] update Homebrew packages
> > >> 8. [todo:kou] update MSYS2 package
> > >> 9. [todo:kou] upload RubyGems
> > >> 10. [done] upload JS packages
> > >> 11. [done] upload C# packages
> > >> 12. [todo:unassigned] update conda recipes
> > >> 13. [done] upload wheels/sdist to pypi
> > >> 14. [todo:kszucs] publish Maven artifacts
> > >> 15. [todo:nealrichardson] update R packages
> > >> 16. [todo:ianmcook] update vcpkg port
> > >> 17. [done] bump versions
> > >> 18. [done] update tags for Go modules
> > >> 19. [in-pr] update docs
> > >>
> > >> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> > >>  wrote:
> > >> >
> > >> > I will handle the R package submission to CRAN.
> > >> Thanks Neal!
> > >> >
> > >> > Neal
> > >> >
> > >> > On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei 
> > wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > I'll update/upload Homebrew, MSYS2 and RubyGems.
> > >> Thanks Kou!
> > >> > >
> > >> > > 1. [done] mark the released version as "RELEASED" on JIRA
> > >> > > 2. [done] start the new version on JIRA
> > >> > > 4. [done] upload source
> > >> > > 5. [done] upload binaries
> > >> > > 6. [in-pr] update website
> > >> > > 7. [todo:kou] update Homebrew packages
> > >> > > 8. [todo:kou] update MSYS2 package
> > >> > > 9. [todo:kou] upload RubyGems
> > >> > > 10. [done] upload JS packages
> > >> > > 11. [done] upload C# packages
> > >> > > 12. [todo:unassigned] update conda recipes
> > >> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > >> > > "Project size too large. Limit for project 'pyarrow' total size is
> > 10
> > >> > > GB. See https://pypi.org/help/#project-size-limit"
> > >> > > Filed an issue to increase the project limit, waiting for the
> > >> > > response: https://github.com/pypa/pypi-support/issues/1653
> > >> > > 14. [todo] publish Maven artifacts
> > >> > > Micah did you have a chance to verify the staged maven artifacts?
> > I'd
> > >> > > wait for your response before pushing the release button.
> > >> > > 15. [todo:unassigned] update R packages
> > >> > > 16. [todo:ianmcook] update vcpkg port
> > >> > > 17. [done] bump versions
> > >> > > 18. [done] update tags for Go modules
> > >> > > 19. [in-pr] update docs
> > >> > >
> > >> > >
> > >> > > Thanks,
> > >> > > --
> > >> > > kou
> > >> > >
> > >> > > In  > pcb70w-nxkpl_y_w5bpl6...@mail.gmail.com
> > >> >
> > >> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> > >> > > 21:38:17 +0100,
> > >> > >   Krisztián Szűcs  wrote:
> > >> > >
> > >> > > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
> > >> wrote:
> > >> > > >>
> > >> > > >> Thanks Krisztián!
> > >> > > >>
> > >> > > >> I will update the vcpkg port.
> > >> > > > Thanks Ian!
> > >> > > >
> > >> > > > Here is the updated todo list:
> > >> > > >
> > >> > > > 1. [done] mark the released version as "RELEASED" on JIRA
> > >> > > > 2. [done] start the new version on JIRA
> > >> > > > 4. [done] upload source
> > >> > > > 5. [done] upload binaries
> > >> > > > 6. [in-pr] update website
> > >> > > > 7. [todo:unassigned] update Homebrew packages
> > >> > > > 8. [todo:unassigned] update MSYS2 package
> > >> > > > 9. [todo:unassigned] upload RubyGems
> > >> > > > 10. [done] upload JS packages
> > >> > > > 11. [done] upload C# packages
> > >> > > > 12. [todo:unassigned] update conda recipes
> > >> > > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > >> > > > "Project size too large. Limit for project 'pya

Re: [DISCUSS] Further proposals for Flight SQL

2022-02-08 Thread David Li
Thanks for the clarification, Wes!

Kyle - the grant process is outlined here [1] and I can help with this on the 
Arrow PMC side. From your side, you will need to file a grant (either the CCLA 
form or the grant here [2]) and make sure everyone has a CLA on file, then once 
the Apache side has acknowledged everything we can merge.

We could hold the Arrow-side vote now and start the process, aiming to merge 
into a development branch, if that is more convenient for the developers, or we 
can continue iterating on the PR for now.

There's an example of the process for Arrow/Julia [3].

[1]: https://incubator.apache.org/ip-clearance/ip-clearance-template.html
[2]: https://www.apache.org/licenses/contributor-agreements.html
[3]: https://incubator.apache.org/ip-clearance/arrow-julia-library2.html

-David

On Fri, Feb 4, 2022, at 18:10, Wes McKinney wrote:
> hi David,
>
> Yes, I think we need to do an IP clearance for this work. Please let
> me know if I can assist, but it would probably be good for other PMC
> members to familiarize themselves with the process since we are likely
> to receive more large pieces of work that need to go through the
> process in the future!
>
> Thanks,
> Wes
>
> On Wed, Jan 26, 2022 at 12:23 PM David Li  wrote:
>>
>> I'd also like to highlight this new PR which contributes a JDBC driver on 
>> top of Flight SQL and Avatica: https://github.com/apache/arrow/pull/12254
>>
>> One thing I'm not sure of is whether this needs to go through IP clearance? 
>> At ~15k LOC and with development going back to June 2021, it is quite 
>> substantial.
>>
>> -David
>>
>> On Mon, Jan 24, 2022, at 14:14, David Li wrote:
>> > Following up here, I think we've resolved all current comments, so if 
>> > anyone else has feedback, it would be much appreciated. Otherwise, I think 
>> > it would be good to put it to a vote soon, and we can use the 8.0 cycle to 
>> > improve the documentation and see if there's any other work needed for the 
>> > JDBC driver.
>> >
>> > -David
>> >
>> > On Fri, Jan 21, 2022, at 09:09, David Li wrote:
>> > > Following up here, James Duong and Jose Almeida have submitted a set of 
>> > > pull requests proposing a set of additions to Flight SQL to expose more 
>> > > information about supported data types and provide metadata about column 
>> > > types in results. For anyone interested in reviewing the proposals, the 
>> > > pull requests can be found here:
>> > > * https://github.com/apache/arrow/pull/11982
>> > > * https://github.com/apache/arrow/pull/11999
>> > > These PRs include implementations for C++ and Java as well as 
>> > > integration tests.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > > On Fri, Dec 17, 2021, at 17:07, James Duong wrote:
>> > > > Yes, additional metadata would just be using the Field metadata map. 
>> > > > The
>> > > > protocol is the same, we have just pre-defined keys for some fields 
>> > > > that
>> > > > would be used for JDBC column attributes.
>> > > >
>> > > > Our preference would be that we get the currently approved protocol 
>> > > > merged
>> > > > into master first (after completing the integration tests) and then 
>> > > > have a
>> > > > separate vote on the TypeInfo changes. There's significant value in 
>> > > > adding
>> > > > Flight-SQL already and it'd be great to make that available. It's 
>> > > > natural
>> > > > that there will be an ongoing need to add extensions to the protocol 
>> > > > as it
>> > > > gets used in more scenarios. Now that we have a solid foundation, we 
>> > > > can
>> > > > examine further changes on a case-by-case basis.
>> > > >
>> > > > On Thu, Dec 16, 2021 at 2:42 PM David Li  wrote:
>> > > >
>> > > > > Strictly speaking we should have a vote since it is updating the 
>> > > > > format
>> > > > > definition files we already voted on.
>> > > > >
>> > > > > I am curious about what exactly you mean by additional column 
>> > > > > metadata,
>> > > > > but if it's just going to be encoded into the key-value metadata 
>> > > > > then I
>> > > > > don't see a problem there. (As in: it sounds like it fits in the 
>> > > > > Field
>> > > > > class given it's encoded in the Field metadata!)
>> > > > >
>> > > > > -David
>> > > > >
>> > > > > On Thu, Dec 16, 2021, at 16:14, James Duong wrote:
>> > > > > > Hi David,
>> > > > > >
>> > > > > > While working on the JDBC driver on top of Flight SQL and on 
>> > > > > > integration
>> > > > > > tests, we identified a couple of enhancements that were needed.
>> > > > > > 1. The ability to report data type information, as done in this PR:
>> > > > > > https://github.com/apache/arrow/pull/11982. This PR adds another 
>> > > > > > RPC
>> > > > > > request for this information.
>> > > > > > 2. Additional column metadata that's outside of the Schema/Field 
>> > > > > > classes
>> > > > > in
>> > > > > > Arrow (PR pending) when returning Arrow schemas. The planned PR 
>> > > > > > uses the
>> > > > > > Arrow Field's MetadataMap to encode extra metadata rather than 
>> > >

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Chao Sun
Thanks, let me verify the artifacts and come back to this thread.

On Tue, Feb 8, 2022 at 1:00 PM Sutou Kouhei  wrote:

> Hi,
>
> > Just curious what's the progress of publishing Maven artifacts. Thanks!
>
> It seems that Krisztián is waiting for someone's
> verification of the staged artifacts:
>
> https://lists.apache.org/thread/r7cm1zz2qz9r823p5tbxdv0gtr3o6s4r
>
> > 14. [todo] publish Maven artifacts
> > Micah did you have a chance to verify the staged maven artifacts? I'd
> > wait for your response before pushing the release button.
>
> Chao, could you verify the staged artifacts? Or is it OK to
> publish the staged artifacts now?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022
> 11:12:29 -0800,
>   Chao Sun  wrote:
>
> > Hi Krisztián,
> >
> > Just curious what's the progress of publishing Maven artifacts. Thanks!
> >
> > Best,
> > Chao
> >
> > On Thu, Feb 3, 2022 at 2:32 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > wrote:
> >
> >> I managed to upload the wheels to pypi. Here is the current status
> >> with the updated assignments:
> >>
> >> 1. [done] mark the released version as "RELEASED" on JIRA
> >> 2. [done] start the new version on JIRA
> >> 4. [done] upload source
> >> 5. [done] upload binaries
> >> 6. [in-pr] update website
> >> 7. [todo:kou] update Homebrew packages
> >> 8. [todo:kou] update MSYS2 package
> >> 9. [todo:kou] upload RubyGems
> >> 10. [done] upload JS packages
> >> 11. [done] upload C# packages
> >> 12. [todo:unassigned] update conda recipes
> >> 13. [done] upload wheels/sdist to pypi
> >> 14. [todo:kszucs] publish Maven artifacts
> >> 15. [todo:nealrichardson] update R packages
> >> 16. [todo:ianmcook] update vcpkg port
> >> 17. [done] bump versions
> >> 18. [done] update tags for Go modules
> >> 19. [in-pr] update docs
> >>
> >> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
> >>  wrote:
> >> >
> >> > I will handle the R package submission to CRAN.
> >> Thanks Neal!
> >> >
> >> > Neal
> >> >
> >> > On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei 
> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I'll update/upload Homebrew, MSYS2 and RubyGems.
> >> Thanks Kou!
> >> > >
> >> > > 1. [done] mark the released version as "RELEASED" on JIRA
> >> > > 2. [done] start the new version on JIRA
> >> > > 4. [done] upload source
> >> > > 5. [done] upload binaries
> >> > > 6. [in-pr] update website
> >> > > 7. [todo:kou] update Homebrew packages
> >> > > 8. [todo:kou] update MSYS2 package
> >> > > 9. [todo:kou] upload RubyGems
> >> > > 10. [done] upload JS packages
> >> > > 11. [done] upload C# packages
> >> > > 12. [todo:unassigned] update conda recipes
> >> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> > > "Project size too large. Limit for project 'pyarrow' total size is
> 10
> >> > > GB. See https://pypi.org/help/#project-size-limit"
> >> > > Filed an issue to increase the project limit, waiting for the
> >> > > response: https://github.com/pypa/pypi-support/issues/1653
> >> > > 14. [todo] publish Maven artifacts
> >> > > Micah did you have a chance to verify the staged maven artifacts?
> I'd
> >> > > wait for your response before pushing the release button.
> >> > > 15. [todo:unassigned] update R packages
> >> > > 16. [todo:ianmcook] update vcpkg port
> >> > > 17. [done] bump versions
> >> > > 18. [done] update tags for Go modules
> >> > > 19. [in-pr] update docs
> >> > >
> >> > >
> >> > > Thanks,
> >> > > --
> >> > > kou
> >> > >
> >> > > In  pcb70w-nxkpl_y_w5bpl6...@mail.gmail.com
> >> >
> >> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> >> > > 21:38:17 +0100,
> >> > >   Krisztián Szűcs  wrote:
> >> > >
> >> > > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
> >> wrote:
> >> > > >>
> >> > > >> Thanks Krisztián!
> >> > > >>
> >> > > >> I will update the vcpkg port.
> >> > > > Thanks Ian!
> >> > > >
> >> > > > Here is the updated todo list:
> >> > > >
> >> > > > 1. [done] mark the released version as "RELEASED" on JIRA
> >> > > > 2. [done] start the new version on JIRA
> >> > > > 4. [done] upload source
> >> > > > 5. [done] upload binaries
> >> > > > 6. [in-pr] update website
> >> > > > 7. [todo:unassigned] update Homebrew packages
> >> > > > 8. [todo:unassigned] update MSYS2 package
> >> > > > 9. [todo:unassigned] upload RubyGems
> >> > > > 10. [done] upload JS packages
> >> > > > 11. [done] upload C# packages
> >> > > > 12. [todo:unassigned] update conda recipes
> >> > > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> >> > > > "Project size too large. Limit for project 'pyarrow' total size
> is 10
> >> > > > GB. See https://pypi.org/help/#project-size-limit"
> >> > > > Filed an issue to increase the project limit, waiting for the
> >> > > > response: https://github.com/pypa/pypi-support/issues/1653
> >> > > > 14. [todo] publish Maven artifacts
> >> > > > Micah did you have a chance to verify the staged maven artifacts?
> I'd
> >> > > > wait for your response before pushing the release b

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Sutou Kouhei
Hi,

> Just curious what's the progress of publishing Maven artifacts. Thanks!

It seems that Krisztián is waiting for someone's
verification of the staged artifacts:

https://lists.apache.org/thread/r7cm1zz2qz9r823p5tbxdv0gtr3o6s4r

> 14. [todo] publish Maven artifacts
> Micah did you have a chance to verify the staged maven artifacts? I'd
> wait for your response before pushing the release button.

Chao, could you verify the staged artifacts? Or is it OK to
publish the staged artifacts now?


Thanks,
-- 
kou

In 
  "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Tue, 8 Feb 2022 11:12:29 -0800,
  Chao Sun  wrote:

> Hi Krisztián,
> 
> Just curious what's the progress of publishing Maven artifacts. Thanks!
> 
> Best,
> Chao
> 
> On Thu, Feb 3, 2022 at 2:32 PM Krisztián Szűcs 
> wrote:
> 
>> I managed to upload the wheels to pypi. Here is the current status
>> with the updated assignments:
>>
>> 1. [done] mark the released version as "RELEASED" on JIRA
>> 2. [done] start the new version on JIRA
>> 4. [done] upload source
>> 5. [done] upload binaries
>> 6. [in-pr] update website
>> 7. [todo:kou] update Homebrew packages
>> 8. [todo:kou] update MSYS2 package
>> 9. [todo:kou] upload RubyGems
>> 10. [done] upload JS packages
>> 11. [done] upload C# packages
>> 12. [todo:unassigned] update conda recipes
>> 13. [done] upload wheels/sdist to pypi
>> 14. [todo:kszucs] publish Maven artifacts
>> 15. [todo:nealrichardson] update R packages
>> 16. [todo:ianmcook] update vcpkg port
>> 17. [done] bump versions
>> 18. [done] update tags for Go modules
>> 19. [in-pr] update docs
>>
>> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>>  wrote:
>> >
>> > I will handle the R package submission to CRAN.
>> Thanks Neal!
>> >
>> > Neal
>> >
>> > On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
>> >
>> > > Hi,
>> > >
>> > > I'll update/upload Homebrew, MSYS2 and RubyGems.
>> Thanks Kou!
>> > >
>> > > 1. [done] make the released version as "RELEASED" on JIRA
>> > > 2. [done] start the new version on JIRA
>> > > 4. [done] upload source
>> > > 5. [done] upload binaries
>> > > 6. [in-pr] update website
>> > > 7. [todo:kou] update Homebrew packages
>> > > 8. [todo:kou] update MSYS2 package
>> > > 9. [todo:kou] upload RubyGems
>> > > 10. [done] upload JS packages
>> > > 11. [done] upload C# packages
>> > > 12. [todo:unassigned] update conda recipes
>> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> > > "Project size too large. Limit for project 'pyarrow' total size is 10
>> > > GB. See https://pypi.org/help/#project-size-limit";
>> > > Filed an issue to increase the project limit, waiting for the
>> > > response: https://github.com/pypa/pypi-support/issues/1653
>> > > 14. [todo] publish Maven artifacts
>> > > Micah did you have a chance to verify the staged maven artifacts? I'd
>> > > wait for your response before pushing the release button.
>> > > 15. [todo:unassigned] update R packages
>> > > 16. [todo:ianmcook] update vcpkg port
>> > > 17. [done] bump versions
>> > > 18. [done] update tags for Go modules
>> > > 19. [in-pr] update docs
>> > >
>> > >
>> > > Thanks,
>> > > --
>> > > kou
>> > >
>> > > In > >
>> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> > > 21:38:17 +0100,
>> > >   Krisztián Szűcs  wrote:
>> > >
>> > > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
>> wrote:
>> > > >>
>> > > >> Thanks Krisztián!
>> > > >>
>> > > >> I will update the vcpkg port.
>> > > > Thanks Ian!
>> > > >
>> > > > Here is the updated todo list:
>> > > >
>> > > > 1. [done] make the released version as "RELEASED" on JIRA
>> > > > 2. [done] start the new version on JIRA
>> > > > 4. [done] upload source
>> > > > 5. [done] upload binaries
>> > > > 6. [in-pr] update website
>> > > > 7. [todo:unassigned] update Homebrew packages
>> > > > 8. [todo:unassigned] update MSYS2 package
>> > > > 9. [todo:unassigned] upload RubyGems
>> > > > 10. [done] upload JS packages
>> > > > 11. [done] upload C# packages
>> > > > 12. [todo:unassigned] update conda recipes
>> > > > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> > > > "Project size too large. Limit for project 'pyarrow' total size is 10
>> > > > GB. See https://pypi.org/help/#project-size-limit";
>> > > > Filed an issue to increase the project limit, waiting for the
>> > > > response: https://github.com/pypa/pypi-support/issues/1653
>> > > > 14. [todo] publish Maven artifacts
>> > > > Micah did you have a chance to verify the staged maven artifacts? I'd
>> > > > wait for your response before pushing the release button.
>> > > > 15. [todo:unassigned] update R packages
>> > > > 16. [todo:ianmcook] update vcpkg port
>> > > > 17. [done] bump versions
>> > > > 18. [done] update tags for Go modules
>> > > > 19. [in-pr] update docs
>> > > >
>> > > >>
>> > > >> Ian
>> > > >>
>> > > >> On Thu, Feb 3, 2022 at 2:35 PM Krisztián Szűcs
>> > > >>  wrote:
>> > > >> >
>> > > >> > Current status of the post release tasks:
>> > > >> >
>> > > >> > 1. [done] make the released version as

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Sutou Kouhei
Homebrew, MSYS2 and RubyGems are done:

1. [done] make the released version as "RELEASED" on JIRA
2. [done] start the new version on JIRA
4. [done] upload source
5. [done] upload binaries
6. [done] update website
7. [done] update Homebrew packages
8. [done] update MSYS2 package
9. [done] upload RubyGems
10. [done] upload JS packages
11. [done] upload C# packages
12. [todo:unassigned] update conda recipes
13. [done] upload wheels/sdist to pypi
14. [todo:kszucs] publish Maven artifacts
15. [todo:nealrichardson] update R packages
16. [todo:ianmcook] update vcpkg port
17. [done] bump versions
18. [done] update tags for Go modules
19. [done] update docs

In
  "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022 23:31:45 +0100,
  Krisztián Szűcs wrote:

> I managed to upload the wheels to pypi. Here is the current status
> with the updated assignments:
> 
> 1. [done] make the released version as "RELEASED" on JIRA
> 2. [done] start the new version on JIRA
> 4. [done] upload source
> 5. [done] upload binaries
> 6. [in-pr] update website
> 7. [todo:kou] update Homebrew packages
> 8. [todo:kou] update MSYS2 package
> 9. [todo:kou] upload RubyGems
> 10. [done] upload JS packages
> 11. [done] upload C# packages
> 12. [todo:unassigned] update conda recipes
> 13. [done] upload wheels/sdist to pypi
> 14. [todo:kszucs] publish Maven artifacts
> 15. [todo:nealrichardson] update R packages
> 16. [todo:ianmcook] update vcpkg port
> 17. [done] bump versions
> 18. [done] update tags for Go modules
> 19. [in-pr] update docs
> 
> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>  wrote:
>>
>> I will handle the R package submission to CRAN.
> Thanks Neal!
>>
>> Neal
>>
>> On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
>>
>> > Hi,
>> >
>> > I'll update/upload Homebrew, MSYS2 and RubyGems.
> Thanks Kou!
>> >
>> > 1. [done] make the released version as "RELEASED" on JIRA
>> > 2. [done] start the new version on JIRA
>> > 4. [done] upload source
>> > 5. [done] upload binaries
>> > 6. [in-pr] update website
>> > 7. [todo:kou] update Homebrew packages
>> > 8. [todo:kou] update MSYS2 package
>> > 9. [todo:kou] upload RubyGems
>> > 10. [done] upload JS packages
>> > 11. [done] upload C# packages
>> > 12. [todo:unassigned] update conda recipes
>> > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> > "Project size too large. Limit for project 'pyarrow' total size is 10
>> > GB. See https://pypi.org/help/#project-size-limit";
>> > Filed an issue to increase the project limit, waiting for the
>> > response: https://github.com/pypa/pypi-support/issues/1653
>> > 14. [todo] publish Maven artifacts
>> > Micah did you have a chance to verify the staged maven artifacts? I'd
>> > wait for your response before pushing the release button.
>> > 15. [todo:unassigned] update R packages
>> > 16. [todo:ianmcook] update vcpkg port
>> > 17. [done] bump versions
>> > 18. [done] update tags for Go modules
>> > 19. [in-pr] update docs
>> >
>> >
>> > Thanks,
>> > --
>> > kou
>> >
>> > In 
>> >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
>> > 21:38:17 +0100,
>> >   Krisztián Szűcs  wrote:
>> >
>> > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook  wrote:
>> > >>
>> > >> Thanks Krisztián!
>> > >>
>> > >> I will update the vcpkg port.
>> > > Thanks Ian!
>> > >
>> > > Here is the updated todo list:
>> > >
>> > > 1. [done] make the released version as "RELEASED" on JIRA
>> > > 2. [done] start the new version on JIRA
>> > > 4. [done] upload source
>> > > 5. [done] upload binaries
>> > > 6. [in-pr] update website
>> > > 7. [todo:unassigned] update Homebrew packages
>> > > 8. [todo:unassigned] update MSYS2 package
>> > > 9. [todo:unassigned] upload RubyGems
>> > > 10. [done] upload JS packages
>> > > 11. [done] upload C# packages
>> > > 12. [todo:unassigned] update conda recipes
>> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
>> > > "Project size too large. Limit for project 'pyarrow' total size is 10
>> > > GB. See https://pypi.org/help/#project-size-limit";
>> > > Filed an issue to increase the project limit, waiting for the
>> > > response: https://github.com/pypa/pypi-support/issues/1653
>> > > 14. [todo] publish Maven artifacts
>> > > Micah did you have a chance to verify the staged maven artifacts? I'd
>> > > wait for your response before pushing the release button.
>> > > 15. [todo:unassigned] update R packages
>> > > 16. [todo:ianmcook] update vcpkg port
>> > > 17. [done] bump versions
>> > > 18. [done] update tags for Go modules
>> > > 19. [in-pr] update docs
>> > >
>> > >>
>> > >> Ian
>> > >>
>> > >> On Thu, Feb 3, 2022 at 2:35 PM Krisztián Szűcs
>> > >>  wrote:
>> > >> >
>> > >> > Current status of the post release tasks:
>> > >> >
>> > >> > 1. [done] make the released version as "RELEASED" on JIRA
>> > >> > 2. [done] start the new version on JIRA
>> > >> > 4. [done] upload source
>> > >> > 5. [done] upload binaries
>> > >> > 6. [in-pr] update website
>> > >> > 7. [TODO] update Homebrew packages
>> > >> > 8. [T

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-08 Thread Chao Sun
Hi Krisztián,

Just curious what's the progress of publishing Maven artifacts. Thanks!

Best,
Chao

On Thu, Feb 3, 2022 at 2:32 PM Krisztián Szűcs 
wrote:

> I managed to upload the wheels to pypi. Here is the current status
> with the updated assignments:
>
> 1. [done] make the released version as "RELEASED" on JIRA
> 2. [done] start the new version on JIRA
> 4. [done] upload source
> 5. [done] upload binaries
> 6. [in-pr] update website
> 7. [todo:kou] update Homebrew packages
> 8. [todo:kou] update MSYS2 package
> 9. [todo:kou] upload RubyGems
> 10. [done] upload JS packages
> 11. [done] upload C# packages
> 12. [todo:unassigned] update conda recipes
> 13. [done] upload wheels/sdist to pypi
> 14. [todo:kszucs] publish Maven artifacts
> 15. [todo:nealrichardson] update R packages
> 16. [todo:ianmcook] update vcpkg port
> 17. [done] bump versions
> 18. [done] update tags for Go modules
> 19. [in-pr] update docs
>
> On Thu, Feb 3, 2022 at 10:26 PM Neal Richardson
>  wrote:
> >
> > I will handle the R package submission to CRAN.
> Thanks Neal!
> >
> > Neal
> >
> > On Thu, Feb 3, 2022 at 4:11 PM Sutou Kouhei  wrote:
> >
> > > Hi,
> > >
> > > I'll update/upload Homebrew, MSYS2 and RubyGems.
> Thanks Kou!
> > >
> > > 1. [done] make the released version as "RELEASED" on JIRA
> > > 2. [done] start the new version on JIRA
> > > 4. [done] upload source
> > > 5. [done] upload binaries
> > > 6. [in-pr] update website
> > > 7. [todo:kou] update Homebrew packages
> > > 8. [todo:kou] update MSYS2 package
> > > 9. [todo:kou] upload RubyGems
> > > 10. [done] upload JS packages
> > > 11. [done] upload C# packages
> > > 12. [todo:unassigned] update conda recipes
> > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > > "Project size too large. Limit for project 'pyarrow' total size is 10
> > > GB. See https://pypi.org/help/#project-size-limit";
> > > Filed an issue to increase the project limit, waiting for the
> > > response: https://github.com/pypa/pypi-support/issues/1653
> > > 14. [todo] publish Maven artifacts
> > > Micah did you have a chance to verify the staged maven artifacts? I'd
> > > wait for your response before pushing the release button.
> > > 15. [todo:unassigned] update R packages
> > > 16. [todo:ianmcook] update vcpkg port
> > > 17. [done] bump versions
> > > 18. [done] update tags for Go modules
> > > 19. [in-pr] update docs
> > >
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In  >
> > >   "Re: [VOTE] Release Apache Arrow 7.0.0 - RC10" on Thu, 3 Feb 2022
> > > 21:38:17 +0100,
> > >   Krisztián Szűcs  wrote:
> > >
> > > > On Thu, Feb 3, 2022 at 8:58 PM Ian Cook 
> wrote:
> > > >>
> > > >> Thanks Krisztián!
> > > >>
> > > >> I will update the vcpkg port.
> > > > Thanks Ian!
> > > >
> > > > Here is the updated todo list:
> > > >
> > > > 1. [done] make the released version as "RELEASED" on JIRA
> > > > 2. [done] start the new version on JIRA
> > > > 4. [done] upload source
> > > > 5. [done] upload binaries
> > > > 6. [in-pr] update website
> > > > 7. [todo:unassigned] update Homebrew packages
> > > > 8. [todo:unassigned] update MSYS2 package
> > > > 9. [todo:unassigned] upload RubyGems
> > > > 10. [done] upload JS packages
> > > > 11. [done] upload C# packages
> > > > 12. [todo:unassigned] update conda recipes
> > > > 13. [blocked:kszucs] upload wheels/sdist to pypi
> > > > "Project size too large. Limit for project 'pyarrow' total size is 10
> > > > GB. See https://pypi.org/help/#project-size-limit";
> > > > Filed an issue to increase the project limit, waiting for the
> > > > response: https://github.com/pypa/pypi-support/issues/1653
> > > > 14. [todo] publish Maven artifacts
> > > > Micah did you have a chance to verify the staged maven artifacts? I'd
> > > > wait for your response before pushing the release button.
> > > > 15. [todo:unassigned] update R packages
> > > > 16. [todo:ianmcook] update vcpkg port
> > > > 17. [done] bump versions
> > > > 18. [done] update tags for Go modules
> > > > 19. [in-pr] update docs
> > > >
> > > >>
> > > >> Ian
> > > >>
> > > >> On Thu, Feb 3, 2022 at 2:35 PM Krisztián Szűcs
> > > >>  wrote:
> > > >> >
> > > >> > Current status of the post release tasks:
> > > >> >
> > > >> > 1. [done] make the released version as "RELEASED" on JIRA
> > > >> > 2. [done] start the new version on JIRA
> > > >> > 4. [done] upload source
> > > >> > 5. [done] upload binaries
> > > >> > 6. [in-pr] update website
> > > >> > 7. [TODO] update Homebrew packages
> > > >> > 8. [TODO] update MSYS2 package
> > > >> > 9. [TODO] upload RubyGems
> > > >> > 10. [done] upload JS packages
> > > >> > 11. [done] upload C# packages
> > > >> > 12. [TODO] update conda recipes
> > > >> > 13. [kszucs/problem] upload wheels/sdist to pypi
> > > >> > "Project size too large. Limit for project 'pyarrow' total size
> is 10
> > > >> > GB. See https://pypi.org/help/#project-size-limit";
> > > >> > Filed an issue to increase the project limit, waiting for the
> > > >> > response: https://github.com/pypa/pypi-supp

Re: Bearer Token Refresh Task

2022-02-08 Thread David Li
For gRPC, in theory, you can get UNAUTHENTICATED at any time, including after 
the client has gotten some results.

If you need to retry calls, and you want to do it transparently, you'd need a 
gRPC interceptor, yes. (The Flight middleware is not powerful enough to do 
that.) But that should be separable from authentication itself?
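To make the retry idea concrete, here is a minimal, library-agnostic sketch of "refresh the token and retry on UNAUTHENTICATED". It does not use the real gRPC or Flight APIs: `Unauthenticated`, `call_with_token_refresh`, `get_token` and `refresh_token` are all hypothetical names chosen for illustration. An actual implementation would live in a gRPC client interceptor.

```python
class Unauthenticated(Exception):
    """Hypothetical stand-in for a gRPC UNAUTHENTICATED status."""


def call_with_token_refresh(call, get_token, refresh_token, max_retries=1):
    """Run call(token); on Unauthenticated, refresh the token and retry.

    `call`, `get_token` and `refresh_token` are caller-supplied callables
    (assumptions for this sketch, not part of any Arrow or gRPC API).
    """
    token = get_token()
    for attempt in range(max_retries + 1):
        try:
            return call(token)
        except Unauthenticated:
            if attempt == max_retries:
                raise  # give up: the refreshed token was rejected as well
            token = refresh_token()
```

Note that this only covers a failure raised before any result is returned. If UNAUTHENTICATED arrives after part of a stream has been consumed, a blind retry would replay the stream from the start, so the caller would need its own resume or de-duplication logic.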

As a side note, we have far too many auth methods, especially as some are 
misleadingly named (e.g. the "basic" auth has little to no relation with HTTP 
Basic Auth). I suppose a lot of it is just historical stuff that should 
probably be cleaned up, or at least properly documented.

-David

On Tue, Feb 8, 2022, at 13:15, José Almeida wrote:
> Hi guys, We are assuming the Bearer Token Refresh task, which was started
> but now it's been paused for a while (link to original POC)
> <https://github.com/apache/arrow/pull/8780>, and we have some
> concerns about it, such as:
>
> 1 During a Flight Call we can get unauthenticated while consuming a stream
> or just when an operation is started? We were wondering if the creation of
> a wrapper around the StreamObserver is needed.
> 2 Would it be better for an Interceptor to make this Authentication? We
> were basing ourselves on this comment
> 


Bearer Token Refresh Task

2022-02-08 Thread José Almeida
Hi guys, we are taking over the Bearer Token Refresh task, which was started
but has been paused for a while (original POC:
https://github.com/apache/arrow/pull/8780), and we have some questions about
it:

1. During a Flight call, can we become unauthenticated while consuming a
stream, or only when an operation is started? We were wondering whether we
need to create a wrapper around the StreamObserver.
2. Would it be better for an Interceptor to handle this authentication? We
were basing ourselves on this comment



Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão 
wrote:

> ...
>
> Wrt to binary, imo the challenge is:
> * we state that backward incompatible changes to the c data interface
> require a new spec [1]
>

Note that this discussion wouldn't change anything about the C Data
Interface spec itself. The discussion is only about the *value* that is put
in one of the key-value metadata fields. The C Data Interface spec defines
how the metadata needs to be stored, but doesn't specify anything about the
actual value of one of the key-value metadata fields.
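For readers who have not looked at that storage format, the following is a rough sketch of how I understand the `ArrowSchema.metadata` layout described in the C Data Interface docs: an int32 pair count, then for each pair an int32 key length, the key bytes, an int32 value length and the value bytes, with integers in native endianness. The helper names `encode_metadata` and `decode_metadata` are made up for this example, and the sample extension name is only illustrative.

```python
import struct

# Native byte order, standard 4-byte signed integer.
INT32 = struct.Struct("=i")


def encode_metadata(pairs):
    """Pack a list of (key, value) byte strings into one metadata buffer."""
    parts = [INT32.pack(len(pairs))]
    for key, value in pairs:
        parts += [INT32.pack(len(key)), key, INT32.pack(len(value)), value]
    return b"".join(parts)


def decode_metadata(buf):
    """Unpack a metadata buffer back into a list of (key, value) pairs."""
    offset = 0

    def read_int32():
        nonlocal offset
        (n,) = INT32.unpack_from(buf, offset)
        offset += INT32.size
        return n

    def read_bytes(length):
        nonlocal offset
        chunk = bytes(buf[offset:offset + length])
        offset += length
        return chunk

    pairs = []
    for _ in range(read_int32()):
        key = read_bytes(read_int32())
        value = read_bytes(read_int32())
        pairs.append((key, value))
    return pairs
```

The layout itself is length-prefixed, so it can carry any bytes in a value. Whether a value must also be valid UTF-8 is exactly the question discussed in this thread.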


> * we state that the metadata is a binary string [2]
> * a valid string is a subset of all valid byte arrays and thus removing "
> *string*" from the spec is backward incompatible
>
> If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> we trigger undefined behavior.
>
> I was a bit surprised by ARROW-15613 - my understanding is that the c++
> implementation is not following the spec, and if we at arrow2 were not be
> checking for utf8, we would be exposing a vulnerability (at least according
> to Rust's standards). We just checked it out of luck (it is O(1), so why
> not).
>

Yes, the C++ implementation is indeed not following the spec. See the
"[DISCUSS] Binary Values in Key value pairs" thread (
https://lists.apache.org/thread/blmj0cgv34dgdxqd3ow60ln68khnz0qr). Let's
maybe keep this part of the discussion there?


Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Paul Balança
If I may, I would be really interested in being kept in the loop as well. I
have been working on a small library that makes it easy to declare Python types
and automatically get them supported in PyArrow as extension types (and
then benefit from vectorized ops): https://github.com/balancap/arrowbic

The main feature at the moment is support for dataclasses, NumPy arrays
and enums, but I plan to extend it to as many standard Python patterns as
possible.

In short, for now I am storing metadata serialized as JSON, but I would
be happy to move to any standard defined in PyArrow, and also to use the
standard representation for tensors / NumPy arrays.

Thank you!
Paul




On Tue, 8 Feb 2022, 17:57 Micah Kornfield,  wrote:

> >
> > I do not know if we voted on a naming convention, but we may want to
> > reserve a namespace for us (e.g. "arrow").
>
> +1 to calling out in docs that the arrow namespace should be reserved.
> maybe "apache.arrow" to lower the possibility of collisions with people who
> already have extension types? (I don't feel too strongly about this).
>
> Note that we do not have tests on tensor arrays, so testing the extension
> > type on these may be hindered by divergences between implementations. I
> do
> > not think we even have json integration files for them.
>
> Agree, we'll likely need a little more thought on what it means to validate
> extension types (is being able to parse extension metadata sufficient?)
>
> Also, note that Rust's arrow2 supports extension types (tested part of the
> > IPC and c data interface*), and Polars relies on it to allow Python
> generic
> > "object" in its machinery.
>
> I think this is great for having external verification of  specifications,
> but I think for official arrow types, we should be focusing on
> implementations that are under ASF governance.
>
> On Tue, Feb 8, 2022 at 8:32 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Note that we do not have tests on tensor arrays, so testing the extension
> > type on these may be hindered by divergences between implementations. I
> do
> > not think we even have json integration files for them.
> >
> > If the focus is extension types, maybe it would be best to cover types
> > whose physical representations are covered in e.g. IPC or c data
> interface
> > tests.
> >
> > I do not know if we voted on a naming convention, but we may want to
> > reserve a namespace for us (e.g. "arrow").
> >
> > Also, note that Rust's arrow2 supports extension types (tested part of
> the
> > IPC and c data interface*), and Polars relies on it to allow Python
> generic
> > "object" in its machinery.
> >
> > Best,
> > Jorge
> >
> > * pending https://issues.apache.org/jira/browse/ARROW-15613
> >
> >
> >
> > On Tue, Feb 8, 2022, 13:52 Joris Van den Bossche <
> > jorisvandenboss...@gmail.com> wrote:
> >
> > > On Mon, 7 Feb 2022 at 21:02, Rok Mihevc  wrote:
> > >
> > > > To follow up the discussion from the bi-weekly Arrow sync:
> > > >
> > > > - JSON seems the most suitable candidate for the extension metadata.
> > > > E.g.: TensorArray
> > > > {"key": "ARROW:extension:name", "value": "tensor shape=(3,
> > > > 3, 4), strides=(12, 4, 1)>"},
> > > > {"key": "ARROW:extension:metadata", "value": "{'type': 'int64',
> > > > 'shape': [3, 3, 4], 'strides': [12, 4, 1]}"}
> > > >
> > >
> > > I will start a separate thread for the exact encoding of the metadata
> > value
> > > (i.e. JSON or something else) if that's OK. I already started writing
> one
> > > last week anyway, and that keeps things a bit separated.
> > >
> > > For the name of the extension type:
> > > - We might want to use something like "arrow.tensor" to follow the
> > > recommendation at
> > > https://arrow.apache.org/docs/format/Columnar.html#extension-types to
> > use
> > > a
> > > namespace. And so for "well known" extension types that are defined in
> > the
> > > Arrow project itself, I think we can use the "arrow" namespace? (as
> > > example, for the extension types defined in pandas, I used the
> "pandas."
> > > namespace)
> > > - In general, I think it's best to keep the name itself simple, and
> leave
> > > any parametrization out of it (since this is included in the metadata).
> > So
> > > in this case that would be just "tensor" instead of "tensor > > shape=..., ..>".
> > > - Specifically for this extension type, we might want to use something
> > like
> > > "fixed_size_tensor" instead of "tensor", to be able to differentiate in
> > the
> > > future between the tensor type with constant shape vs variable shape (
> > > ARROW-1614  vs
> > > ARROW-8714
> > > ). But that's
> > something
> > > to discuss in the relevant JIRA issue / PR.
> > >
> > > - We want to start with at least one integration test pair. Potential
> > > > candidates are cpp, julia, go, rust.
> > > >
> > >
> > > Rust does not yet seem to support extension types? (
> > > https://github.com/a

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Micah Kornfield
>
> I do not know if we voted on a naming convention, but we may want to
> reserve a namespace for us (e.g. "arrow").

+1 to calling out in docs that the arrow namespace should be reserved.
maybe "apache.arrow" to lower the possibility of collisions with people who
already have extension types? (I don't feel too strongly about this).

> Note that we do not have tests on tensor arrays, so testing the extension
> type on these may be hindered by divergences between implementations. I do
> not think we even have json integration files for them.

Agree, we'll likely need a little more thought on what it means to validate
extension types (is being able to parse extension metadata sufficient?)

> Also, note that Rust's arrow2 supports extension types (tested part of the
> IPC and c data interface*), and Polars relies on it to allow Python generic
> "object" in its machinery.

I think this is great for having external verification of  specifications,
but I think for official arrow types, we should be focusing on
implementations that are under ASF governance.

On Tue, Feb 8, 2022 at 8:32 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Note that we do not have tests on tensor arrays, so testing the extension
> type on these may be hindered by divergences between implementations. I do
> not think we even have json integration files for them.
>
> If the focus is extension types, maybe it would be best to cover types
> whose physical representations are covered in e.g. IPC or c data interface
> tests.
>
> I do not know if we voted on a naming convention, but we may want to
> reserve a namespace for us (e.g. "arrow").
>
> Also, note that Rust's arrow2 supports extension types (tested part of the
> IPC and c data interface*), and Polars relies on it to allow Python generic
> "object" in its machinery.
>
> Best,
> Jorge
>
> * pending https://issues.apache.org/jira/browse/ARROW-15613
>
>
>
> On Tue, Feb 8, 2022, 13:52 Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
> > On Mon, 7 Feb 2022 at 21:02, Rok Mihevc  wrote:
> >
> > > To follow up the discussion from the bi-weekly Arrow sync:
> > >
> > > - JSON seems the most suitable candidate for the extension metadata.
> > > E.g.: TensorArray
> > > {"key": "ARROW:extension:name", "value": "tensor shape=(3,
> > > 3, 4), strides=(12, 4, 1)>"},
> > > {"key": "ARROW:extension:metadata", "value": "{'type': 'int64',
> > > 'shape': [3, 3, 4], 'strides': [12, 4, 1]}"}
> > >
> >
> > I will start a separate thread for the exact encoding of the metadata
> value
> > (i.e. JSON or something else) if that's OK. I already started writing one
> > last week anyway, and that keeps things a bit separated.
> >
> > For the name of the extension type:
> > - We might want to use something like "arrow.tensor" to follow the
> > recommendation at
> > https://arrow.apache.org/docs/format/Columnar.html#extension-types to
> use
> > a
> > namespace. And so for "well known" extension types that are defined in
> the
> > Arrow project itself, I think we can use the "arrow" namespace? (as
> > example, for the extension types defined in pandas, I used the "pandas."
> > namespace)
> > - In general, I think it's best to keep the name itself simple, and leave
> > any parametrization out of it (since this is included in the metadata).
> So
> > in this case that would be just "tensor" instead of "tensor > shape=..., ..>".
> > - Specifically for this extension type, we might want to use something
> like
> > "fixed_size_tensor" instead of "tensor", to be able to differentiate in
> the
> > future between the tensor type with constant shape vs variable shape (
> > ARROW-1614  vs
> > ARROW-8714
> > ). But that's
> something
> > to discuss in the relevant JIRA issue / PR.
> >
> > - We want to start with at least one integration test pair. Potential
> > > candidates are cpp, julia, go, rust.
> > >
> >
> > Rust does not yet seem to support extension types? (
> > https://github.com/apache/arrow-rs/issues/218)
> >
> >
> > > - First well known extension type candidate is TensorArray but other
> > > suggestions are welcome.
> > >
> >
> > Others that I am aware of that have been brought up in the past are UUID
> (
> > ARROW-2152 ), complex
> > numbers (ARROW-638 ,
> this
> > has a PR) and 8-bit boolean values (ARROW-1674
> > ). But I think we
> should
> > mainly look at demand / someone wanting to implement this, and (for you)
> > this seems to be Tensors, so it's fine to focus on that.
> >
> > Joris
> >
> >
> > >
> > > On Tue, Jan 25, 2022 at 10:34 AM Antoine Pitrou 
> > > wrote:
> > > >
> > > >
> > > > Le 25/01/2022 à 10:12, Joris Van den Bossche a écrit :
> > > > > On Sat, 22 Jan 2022 at 20:27, Rok Mihevc 
> > wrote:
> > > > >>
> > > > >> Thanks for the input Weston!

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Micah Kornfield
>
> One possible alternative could be to use the format as specified in the C
> Data Interface for key-value metadata:
>
> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
> (there it is used for the actual key-value metadata of a field, while here
> it is for formatting a single value. But since for this discussion the
> value is also a key-value mapping, the same scheme could be used).
> (since this is a binary format, this assumes that the discussion about
> allowing binary values in the key-value metadata in the IPC format gets
> resolved)

I think it likely depends on the complexity of the metadata.  If your
values are themselves complex, then using something like JSON or another
existing serialization format makes sense (e.g. this could also be
flatbuffers, protobuf).
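As a small illustration of the JSON option, here is a sketch that serializes the tensor parameters mentioned earlier in the thread (type, shape, strides) into a metadata value and parses them back. The function names are hypothetical and the field names simply mirror the example from the thread. Nothing here is an agreed format.

```python
import json


def serialize_tensor_metadata(dtype, shape, strides):
    """Return a JSON string suitable for use as a metadata value."""
    payload = {"type": dtype, "shape": list(shape), "strides": list(strides)}
    # sort_keys gives a stable byte-for-byte representation across producers.
    return json.dumps(payload, sort_keys=True)


def parse_tensor_metadata(value):
    """Parse a metadata value produced by serialize_tensor_metadata."""
    return json.loads(value)
```

One practical detail: the example quoted in the thread uses single quotes inside the value, which a strict JSON parser such as `json.loads` rejects, so a real convention would need double-quoted keys and strings.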

> An alternative approach is to consider ARROW-15613 a bug and do not change
> the spec - require consumers to encode the binary data in a string
> representation like base64.

My sense is that, while onerous, updating the specification is probably going
to be the safest way to avoid breaking existing users.  I would imagine the
process to get C++ compliant again would be:
1.  Add the ability to store arbitrary bytes to the specification.
2.  Start duplicating existing data between the two fields.
3.  At some point later, stop producing non-spec-compliant data in C++.


> I just think it is important that we are consistent between the IPC and the
> c data interface.

+1

On Tue, Feb 8, 2022 at 8:38 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Great questions and write up. Thanks!
>
> imo dragging a JSON reader and writer to read official extension types'
> metadata seems overkill. The c data interface is expected to be quite low
> level. Imo we should aim for a (non-human readable) binary format. For
> non-official, imo you are spot on - use what best fits to the use-case or
> application. If the application is storing other metadata in json, json may
> make sense, in Python pickle is another option, flatbuffers or something
> like that is also ok imo.
>
> Wrt to binary, imo the challenge is:
> * we state that backward incompatible changes to the c data interface
> require a new spec [1]
> * we state that the metadata is a binary string [2]
> * a valid string is a subset of all valid byte arrays and thus removing "
> *string*" from the spec is backward incompatible
>
> If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> we trigger undefined behavior.
>
> I was a bit surprised by ARROW-15613 - my understanding is that the c++
> implementation is not following the spec, and if we at arrow2 were not be
> checking for utf8, we would be exposing a vulnerability (at least according
> to Rust's standards). We just checked it out of luck (it is O(1), so why
> not).
>
> What is the concern with string-encoding binary like base64?
>
> Given that one of our reference implementations is not following the spec
> and there is value in allowing arbitrary bytes on the metadata values, we
> may as well just update the spec to align with the reference
> implementation? If we do that, I would suggest that we do it both in the c
> data interface and the IPC specification, since imo it is quite important
> that an extension can flow all the way through IPC and c data interface.
>
> An alternative approach is to consider ARROW-15613 a bug and do not change
> the spec - require consumers to encode the binary data in a string
> representation like base64.
>
> I just think it is important that we are consistent between the IPC and the
> c data interface.
>
> For reference, Polars uses base64 encoding of Python blobs (pickle,
> pointers, etc.) because we enforce the spec on arrow2.
>
> Best,
> Jorge
>
> [1]
>
> https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
> [2]
>
> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
> [ARROW-15613
> ]
> https://issues.apache.org/jira/browse/ARROW-15613
>


Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Jorge Cardoso Leitão
Hi,

Great questions and write up. Thanks!

imo pulling in a JSON reader and writer just to read official extension types'
metadata seems overkill. The c data interface is expected to be quite low
level. Imo we should aim for a (non-human-readable) binary format. For
non-official types, imo you are spot on - use what best fits the use case or
application. If the application is storing other metadata in json, json may
make sense; in Python, pickle is another option; flatbuffers or something
like that is also ok imo.

Wrt binary, imo the challenge is:
* we state that backward incompatible changes to the c data interface
require a new spec [1]
* we state that the metadata is a binary string [2]
* the valid strings are a subset of all valid byte arrays, and thus
removing "*string*" from the spec is backward incompatible

If we write invalid utf8 to it and a reader assumes utf8 when reading it,
we trigger undefined behavior.
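
To make the concern above concrete, here is a minimal Python sketch of a
reader that validates a metadata value before treating it as a string. It
is an illustration only, not code from any Arrow implementation; the
function name `decode_metadata_value` is hypothetical. Python raises an
error on invalid input rather than exhibiting undefined behavior, but the
check is the same one a reader in a lower-level language would need.

```python
def decode_metadata_value(raw: bytes) -> str:
    """Return the value as text, rejecting bytes that are not valid UTF-8.

    Hypothetical helper: a reader that assumes "string" semantics must
    perform (or be able to rely on) this check before using the value.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"metadata value is not valid UTF-8: {exc}") from exc


# A value that respects the "string" wording of the spec.
text_value = "tensor".encode("utf-8")

# Arbitrary bytes (e.g. a serialized blob) are not necessarily valid UTF-8.
binary_value = b"\xff\xfe\x00blob"

print(decode_metadata_value(text_value))
try:
    decode_metadata_value(binary_value)
except ValueError as err:
    print("rejected:", err)
```

A writer that stores arbitrary bytes in the value forces every reader to
either run such a check or stop assuming the value is a string.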

I was a bit surprised by ARROW-15613 - my understanding is that the c++
implementation is not following the spec, and if we at arrow2 were not
checking for utf8, we would be exposing a vulnerability (at least according
to Rust's standards). We just checked it out of luck (it is O(1), so why
not).

What is the concern with string-encoding binary like base64?

Given that one of our reference implementations is not following the spec
and there is value in allowing arbitrary bytes on the metadata values, we
may as well just update the spec to align with the reference
implementation? If we do that, I would suggest that we do it both in the c
data interface and the IPC specification, since imo it is quite important
that an extension can flow all the way through IPC and c data interface.

An alternative approach is to consider ARROW-15613 a bug and not change
the spec - require consumers to encode the binary data in a string
representation like base64.

I just think it is important that we are consistent between the IPC and the
c data interface.

For reference, Polars uses base64 encoding of Python blobs (pickle,
pointers, etc.) because we enforce the spec on arrow2.
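
As a rough illustration of the base64 approach mentioned above (a sketch
using only the Python standard library, not the actual Polars or arrow2
code; `encode_blob` and `decode_blob` are hypothetical names), a pickled
object can be turned into a plain ASCII string that satisfies a
string-only metadata value:

```python
import base64
import pickle


def encode_blob(obj) -> str:
    """Pickle an object and wrap the bytes in base64 so the result is ASCII."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_blob(text: str):
    """Reverse encode_blob. Only unpickle data from a trusted source."""
    return pickle.loads(base64.b64decode(text))


payload = {"type": "int64", "shape": [3, 3, 4]}
encoded = encode_blob(payload)

# The encoded form is valid UTF-8 by construction (base64 is pure ASCII).
encoded.encode("utf-8")
print(decode_blob(encoded) == payload)
```

The cost is the usual base64 overhead (roughly 4 output characters per 3
input bytes) plus an extra encode/decode step on each side.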

Best,
Jorge

[1]
https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
[2]
https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
[ARROW-15613] https://issues.apache.org/jira/browse/ARROW-15613


Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Jorge Cardoso Leitão
Note that we do not have tests on tensor arrays, so testing the extension
type on these may be hindered by divergences between implementations. I do
not think we even have json integration files for them.

If the focus is extension types, maybe it would be best to cover types
whose physical representations are covered in e.g. IPC or c data interface
tests.

I do not know if we voted on a naming convention, but we may want to
reserve a namespace for us (e.g. "arrow").

Also, note that Rust's arrow2 supports extension types (tested part of the
IPC and c data interface*), and Polars relies on it to allow Python generic
"object" in its machinery.

Best,
Jorge

* pending https://issues.apache.org/jira/browse/ARROW-15613



On Tue, Feb 8, 2022, 13:52 Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Mon, 7 Feb 2022 at 21:02, Rok Mihevc  wrote:
>
> > To follow up the discussion from the bi-weekly Arrow sync:
> >
> > - JSON seems the most suitable candidate for the extension metadata.
> > E.g.: TensorArray
> > {"key": "ARROW:extension:name", "value": "tensor<int64, shape=(3, 3, 4), strides=(12, 4, 1)>"},
> > {"key": "ARROW:extension:metadata", "value": "{'type': 'int64',
> > 'shape': [3, 3, 4], 'strides': [12, 4, 1]}"}
> >
>
> I will start a separate thread for the exact encoding of the metadata value
> (i.e. JSON or something else) if that's OK. I already started writing one
> last week anyway, and that keeps things a bit separated.
>
> For the name of the extension type:
> - We might want to use something like "arrow.tensor" to follow the
> recommendation at
> https://arrow.apache.org/docs/format/Columnar.html#extension-types to use a
> namespace. And so for "well known" extension types that are defined in the
> Arrow project itself, I think we can use the "arrow" namespace? (as
> example, for the extension types defined in pandas, I used the "pandas."
> namespace)
> - In general, I think it's best to keep the name itself simple, and leave
> any parametrization out of it (since this is included in the metadata). So
> in this case that would be just "tensor" instead of "tensor<..., shape=..., ..>".
> - Specifically for this extension type, we might want to use something like
> "fixed_size_tensor" instead of "tensor", to be able to differentiate in the
> future between the tensor type with constant shape vs variable shape
> (ARROW-1614 vs ARROW-8714). But that's something to discuss in the
> relevant JIRA issue / PR.
>
> > - We want to start with at least one integration test pair. Potential
> > candidates are cpp, julia, go, rust.
> >
>
> Rust does not yet seem to support extension types? (
> https://github.com/apache/arrow-rs/issues/218)
>
>
> > - First well known extension type candidate is TensorArray but other
> > suggestions are welcome.
> >
>
> Others that I am aware of that have been brought up in the past are UUID
> (ARROW-2152), complex numbers (ARROW-638, this has a PR) and 8-bit boolean
> values (ARROW-1674). But I think we should mainly look at demand / someone
> wanting to implement this, and (for you) this seems to be Tensors, so it's
> fine to focus on that.
>
> Joris
>
>
> >
> > On Tue, Jan 25, 2022 at 10:34 AM Antoine Pitrou 
> > wrote:
> > >
> > >
> > > Le 25/01/2022 à 10:12, Joris Van den Bossche a écrit :
> > > > On Sat, 22 Jan 2022 at 20:27, Rok Mihevc wrote:
> > > >>
> > > >> Thanks for the input Weston!
> > > >>
> > > >> How about arrow/experimental/format/ExtensionTypes.fbs or
> > > >> arrow/format/ExtensionTypes.fbs for language independent schema and
> > > >> loosely arrow//extensions for implementations?
> > > >>
> > > >> Having machine readable definitions could perhaps be useful for
> > > >> generating implementations in some cases.
> > > >
> > > > Is it useful to put this in a flatbuffer file? Based on the list from
> > > > Weston just below, I think this will mostly contain a *description* of
> > > > those different aspect (a specification of the extension type), and
> > > > there is no data that actually fits in a flatbuffer table? In that
> > > > case a plain text (eg markdown) file seems more fitting?
> > >
> > > I agree this is mostly a plain text (or, rather, reST :-)) specification
> > > task.
> > >
> > > Regards
> > >
> > > Antoine.
> >
>


[Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
Hi all,

There is currently some discussion regarding how we can formalize/document
"well known" extension types (see the "[DISCUSS] New Types (Schema.fbs vs
Extension Types)" thread). There is ongoing work on an extension type to
store arrays / tensors by Rok (
https://issues.apache.org/jira/browse/ARROW-1614), and my colleague Dewey
and myself are looking at extension types for geospatial data.

Often, for an extension type, you will want to store some metadata in the
"ARROW:extension:metadata" field when serializing the type (see the format
docs; an example metadata given there is {'type': 'int8', 'shape': [4, 5]}
for a tensor array). But the question is how exactly to format the data in
this field, assuming the value itself is also some form of key-value
metadata.

Last Wednesday, I raised this question in the Arrow sync call, and copying
from the meeting notes:

- Joris asked how we should store key-value metadata for extension
types as a string; practical options seem limited to JSON or YAML;
JSON seems most reasonable

Also when implementing Arrow extension types in pandas (for some pandas
data types that don't have a direct mapping to an Arrow type), I (naively)
used a json dump because this is simply an easy solution when working in
Python (example).
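
For illustration, a JSON dump along these lines could look like the sketch
below. It is not the actual pandas code: the extension name
"my_package.tensor" is a made-up placeholder and the metadata dict mirrors
the tensor example from the format docs.

```python
import json

# Parameters of the (hypothetical) extension type.
params = {"type": "int8", "shape": [4, 5]}

# sort_keys gives a stable string regardless of dict insertion order.
serialized = json.dumps(params, sort_keys=True)

# The two reserved keys used when serializing an extension type.
field_metadata = {
    "ARROW:extension:name": "my_package.tensor",  # placeholder name
    "ARROW:extension:metadata": serialized,
}

# Reading it back requires a JSON parser on the consuming side.
restored = json.loads(field_metadata["ARROW:extension:metadata"])
print(serialized)
print(restored == params)
```

Note that the single-quoted form shown in the docs example, {'type':
'int8', ...}, is a Python dict repr rather than JSON; `json.loads` rejects
it, so the string has to be produced by a real JSON writer.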

Now, if you have a JSON library available, using JSON for this is indeed
straightforward. But if we want the metadata to also be relatively easy to
parse "by hand", there might be better alternatives?
In https://github.com/paleolimbot/geoarrow/, Dewey has been working on an R
package dealing with some extension types where the core is implemented in
C, and mentioned that dealing with JSON-like metadata would not be trivial
(or at least more complex than what's currently being used there, see
below).

One possible alternative could be to use the format as specified in the C
Data Interface for key-value metadata:
https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
(there it is used for the actual key-value metadata of a field, while here
it is for formatting a single value. But since for this discussion the
value is also a key-value mapping, the same scheme could be used).
(since this is a binary format, this assumes that the discussion about
allowing binary values in the key-value metadata in the IPC format gets
resolved)
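
A minimal sketch of that scheme in Python, to show how little is needed to
parse it by hand. It follows the layout described in the C Data Interface
docs (an int32 pair count, then for each pair an int32 key length, the key
bytes, an int32 value length and the value bytes, all in native
endianness). The function names are hypothetical and the example keys are
only placeholders.

```python
import struct


def encode_kv(pairs: dict) -> bytes:
    """Encode a {bytes: bytes} mapping in the C Data Interface metadata layout."""
    chunks = [struct.pack("=i", len(pairs))]  # int32: number of pairs
    for key, value in pairs.items():
        chunks.append(struct.pack("=i", len(key)))    # int32: key length
        chunks.append(key)
        chunks.append(struct.pack("=i", len(value)))  # int32: value length
        chunks.append(value)
    return b"".join(chunks)


def decode_kv(buf: bytes) -> dict:
    """Decode a buffer produced by encode_kv back into a {bytes: bytes} dict."""
    (count,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = {}
    for _ in range(count):
        items = []
        for _ in range(2):  # first the key, then the value
            (length,) = struct.unpack_from("=i", buf, offset)
            offset += 4
            items.append(bytes(buf[offset:offset + length]))
            offset += length
        pairs[items[0]] = items[1]
    return pairs


encoded = encode_kv({b"type": b"int8"})
print(decode_kv(encoded))
```

The decoder needs only integer reads and byte slicing, which is the appeal
for a C implementation; the drawback is that the result contains embedded
length bytes and is therefore not a valid string in general.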

Thoughts?

Joris


Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Joris Van den Bossche
On Mon, 7 Feb 2022 at 21:02, Rok Mihevc  wrote:

> To follow up the discussion from the bi-weekly Arrow sync:
>
> - JSON seems the most suitable candidate for the extension metadata.
> E.g.: TensorArray
> {"key": "ARROW:extension:name", "value": "tensor<int64, shape=(3, 3, 4), strides=(12, 4, 1)>"},
> {"key": "ARROW:extension:metadata", "value": "{'type': 'int64',
> 'shape': [3, 3, 4], 'strides': [12, 4, 1]}"}
>

I will start a separate thread for the exact encoding of the metadata value
(i.e. JSON or something else) if that's OK. I already started writing one
last week anyway, and that keeps things a bit separated.

For the name of the extension type:
- We might want to use something like "arrow.tensor" to follow the
recommendation at
https://arrow.apache.org/docs/format/Columnar.html#extension-types to use a
namespace. And so for "well known" extension types that are defined in the
Arrow project itself, I think we can use the "arrow" namespace? (as
example, for the extension types defined in pandas, I used the "pandas."
namespace)
- In general, I think it's best to keep the name itself simple, and leave
any parametrization out of it (since this is included in the metadata). So
in this case that would be just "tensor" instead of "tensor<..., shape=..., ..>".
- Specifically for this extension type, we might want to use something like
"fixed_size_tensor" instead of "tensor", to be able to differentiate in the
future between the tensor type with constant shape vs variable shape
(ARROW-1614 vs ARROW-8714). But that's something to discuss in the
relevant JIRA issue / PR.
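
The naming points above could be sketched as follows. This is only an
illustration of the proposal, not an agreed convention: the helper names
are hypothetical, "arrow.fixed_size_tensor" is just the candidate name
floated here, and JSON is used for the metadata value purely as a
stand-in since the encoding is still under discussion.

```python
import json


def make_extension_entries(namespace: str, type_name: str, params: dict) -> dict:
    """Build the two key-value entries: a simple namespaced name, with all
    parametrization kept in the metadata value rather than in the name."""
    return {
        "ARROW:extension:name": f"{namespace}.{type_name}",
        "ARROW:extension:metadata": json.dumps(params),
    }


def namespace_of(name: str):
    """Return the part before the first dot, or None if the name has no namespace."""
    head, sep, _ = name.partition(".")
    return head if sep else None


entries = make_extension_entries(
    "arrow",
    "fixed_size_tensor",
    {"type": "int64", "shape": [3, 3, 4], "strides": [12, 4, 1]},
)
print(entries["ARROW:extension:name"])
print(namespace_of(entries["ARROW:extension:name"]))
```

With this split, a consumer can dispatch on the name alone and only parse
the metadata once it knows which extension type it is dealing with.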

> - We want to start with at least one integration test pair. Potential
> candidates are cpp, julia, go, rust.
>

Rust does not yet seem to support extension types? (
https://github.com/apache/arrow-rs/issues/218)


> - First well known extension type candidate is TensorArray but other
> suggestions are welcome.
>

Others that I am aware of that have been brought up in the past are UUID
(ARROW-2152), complex numbers (ARROW-638, this has a PR) and 8-bit boolean
values (ARROW-1674). But I think we should mainly look at demand / someone
wanting to implement this, and (for you) this seems to be Tensors, so it's
fine to focus on that.

Joris


>
> On Tue, Jan 25, 2022 at 10:34 AM Antoine Pitrou 
> wrote:
> >
> >
> > Le 25/01/2022 à 10:12, Joris Van den Bossche a écrit :
> > > On Sat, 22 Jan 2022 at 20:27, Rok Mihevc  wrote:
> > >>
> > >> Thanks for the input Weston!
> > >>
> > >> How about arrow/experimental/format/ExtensionTypes.fbs or
> > >> arrow/format/ExtensionTypes.fbs for language independent schema and
> > >> loosely arrow//extensions for implementations?
> > >>
> > >> Having machine readable definitions could perhaps be useful for
> > >> generating implementations in some cases.
> > >
> > > Is it useful to put this in a flatbuffer file? Based on the list from
> > > Weston just below, I think this will mostly contain a *description* of
> > > those different aspect (a specification of the extension type), and
> > > there is no data that actually fits in a flatbuffer table? In that
> > > case a plain text (eg markdown) file seems more fitting?
> >
> > I agree this is mostly a plain text (or, rather, reST :-)) specification
> > task.
> >
> > Regards
> >
> > Antoine.
>