Re: [DISCUSS] Release cadence and release vote conventions
Hi,

Sorry for not replying to this thread sooner. I think that the biggest problem is related to our Java package.

We'll be able to resolve the GPG key problem by creating a GPG key used only for the nightly release test. We can share the test GPG key publicly because it's just for testing. It'll work for our binary artifacts and APT/Yum repositories, but it won't work for our Java package. I don't know where the GPG key is used in our Java package...

We'll be able to resolve the Git commit problem by creating a cloned Git repository for the test. It's done in our dev/release/00-prepare-test.rb[1].

[1] https://github.com/apache/arrow/blob/master/dev/release/00-prepare-test.rb#L30

The biggest problem for the Git commit is that our Java package requires an "apache-arrow-${VERSION}" tag on https://github.com/apache/arrow . (Right?) I think that "mvn release:perform" in dev/release/01-perform.sh does so, but I don't know the details of "mvn release:perform"...

More details:

dev/release/00-prepare.sh:
We'll be able to run this automatically when we can resolve the above GPG key problem in our Java package. We can resolve the Git commit problem by creating a cloned Git repository.

dev/release/01-perform.sh:
We'll be able to run this automatically when we can resolve the above Git commit ("apache-arrow-${VERSION}" tag) problem in our Java package.

dev/release/02-source.sh:
We'll be able to run this automatically by creating a GPG key for the nightly release test. We'll use Bintray to upload the RC source archive instead of dist.apache.org. Ah, we need a Bintray API key for this. It must be kept secret.

dev/release/03-binary.sh:
We'll be able to run this automatically by creating a GPG key for the nightly release test. We need a Bintray API key too. We also need to improve this script to support the nightly release test: it uses "XXX-rc" (such as "debian-rc") for the Bintray "package" name, but it should use "XXX-nightly" (such as "debian-nightly") for the nightly release test instead.

dev/release/post-00-release.sh:
We'll be able to skip this.
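The "-rc" versus "-nightly" naming change proposed for the Bintray "package" name can be sketched in a few lines. This is only an illustration of the naming scheme described above; the helper name is hypothetical and not part of the actual release scripts:

```python
def bintray_package(target: str, nightly: bool) -> str:
    """Pick the Bintray "package" name for an upload.

    The release scripts currently hard-code an "-rc" suffix
    (e.g. "debian-rc"); for a nightly release test the proposal
    is to use an "-nightly" suffix instead.
    """
    suffix = "nightly" if nightly else "rc"
    return f"{target}-{suffix}"

print(bintray_package("debian", nightly=False))  # debian-rc
print(bintray_package("debian", nightly=True))   # debian-nightly
```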
dev/release/post-01-upload.sh:
We'll be able to skip this.

dev/release/post-02-binary.sh:
We'll be able to run this automatically by creating Bintray "packages" for the nightly release and using them. We can create "XXX-nightly-release" ("debian-nightly-release") Bintray "packages" and use them instead of the "XXX" ("debian") Bintray "packages". The "debian" Bintray "package": https://bintray.com/apache/debian/ We need to improve this script to support the nightly release.

dev/release/post-03-website.sh:
We'll be able to run this automatically by creating a cloned Git repository for the test. It would be better to have a Web site that shows the generated pages. We could create https://github.com/apache/arrow-site/tree/asf-site/nightly and use it, but I don't like that approach because arrow-site would gain commits day by day. Can we prepare a Web site for this? (arrow-nightly.ursalabs.org?)

dev/release/post-04-rubygems.sh:
We may be able to use GitHub Package Registry[2] to upload RubyGems. We could use the "pre-release" package feature of https://rubygems.org/ but it's not suitable for nightlies; it's for RC or beta releases.

[2] https://github.blog/2019-05-10-introducing-github-package-registry/

dev/release/post-05-js.sh:
We may be able to use GitHub Package Registry[2] to upload npm packages.

dev/release/post-06-csharp.sh:
We may be able to use GitHub Package Registry[2] to upload NuGet packages.

dev/release/post-07-rust.sh:
I don't have any idea, but it must be run automatically. It always failed; I needed to run each command manually.

dev/release/post-08-remove-rc.sh:
We'll be able to skip this.

Thanks,
--
kou

In "Re: [DISCUSS] Release cadence and release vote conventions" on Wed, 31 Jul 2019 15:35:57 -0500, Wes McKinney wrote:
> The PMC member and their GPG keys need to be in the loop at some
> point. The release artifacts can be produced by some kind of CI/CD
> system so long as the PMC member has confidence in the security of
> those artifacts before signing them.
For example, we build the > official binary packages on public CI services and then download and > sign them with Crossbow. I think the same could be done in theory with > the source release but we'd first need to figure out what to do about > the parts that create git commits. > > On Wed, Jul 31, 2019 at 11:23 AM Andy Grove wrote: >> >> To what extent would it be possible to automate the release process via >> CICD? >> >> On Wed, Jul 31, 2019 at 9:19 AM Wes McKinney wrote: >> >> > I think one thing that would help would be improving the >> > reproducibility of the source release process. The RM has to have >> > their machine configured in a particular way for it to work. >> > >> > Before anyone says "Docker" it isn't an easy solution because the >> > release scripts need to be able to create git commits (created by the >> > Maven release plugin) and sign artifacts using the RM's GPG keys. >> > >> > On Sat, Jul 27, 2019 at 10:04 PM Micah Kornfield >> > wrote: >> > > >> > > I just wanted to bump
Re: [DISCUSS] Release cadence and release vote conventions
The PMC member and their GPG keys need to be in the loop at some point. The release artifacts can be produced by some kind of CI/CD system so long as the PMC member has confidence in the security of those artifacts before signing them. For example, we build the official binary packages on public CI services and then download and sign them with Crossbow. I think the same could be done in theory with the source release but we'd first need to figure out what to do about the parts that create git commits. On Wed, Jul 31, 2019 at 11:23 AM Andy Grove wrote: > > To what extent would it be possible to automate the release process via > CICD? > > On Wed, Jul 31, 2019 at 9:19 AM Wes McKinney wrote: > > > I think one thing that would help would be improving the > > reproducibility of the source release process. The RM has to have > > their machine configured in a particular way for it to work. > > > > Before anyone says "Docker" it isn't an easy solution because the > > release scripts need to be able to create git commits (created by the > > Maven release plugin) and sign artifacts using the RM's GPG keys. > > > > On Sat, Jul 27, 2019 at 10:04 PM Micah Kornfield > > wrote: > > > > > > I just wanted to bump this thread. Kou and Krisztián as the last two > > > release managers is there any specific infrastructure that you think > > might > > > have helped? > > > > > > Thanks, > > > Micah > > > > > > On Wed, Jul 17, 2019 at 11:29 PM Micah Kornfield > > > wrote: > > > > > > > I'd can help as well, but not exactly sure where to start. It seems > > like > > > > there are already some JIRAs opened [1] > > > > for improving the release? Could someone more familiar with the > > process > > > > pick out the highest priority ones? Do more need to be opened? 
> > > > > > > > Thanks, > > > > Micah > > > > > > > > [1] > > > > > > https://issues.apache.org/jira/browse/ARROW-2880?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(%22Developer%20Tools%22%2C%20Packaging)%20and%20summary%20~%20Release > > > > > > > > On Sat, Jul 13, 2019 at 7:17 AM Wes McKinney > > wrote: > > > > > > > >> To be effective at improving the life of release managers, the nightly > > > >> release process really should use as close as possible to the same > > > >> scripts that the RM uses to produce the release. Otherwise we could > > > >> have a situation where the nightlies succeed but there is some problem > > > >> that either fails an RC or is unable to be produced at all. > > > >> > > > >> On Sat, Jul 13, 2019 at 9:12 AM Andy Grove > > wrote: > > > >> > > > > >> > I would like to volunteer to help with Java and Rust release process > > > >> work, > > > >> > especially nightly releases. > > > >> > > > > >> > Although I'm not that familiar with the Java implementation of > > Arrow, I > > > >> > have been using Java and Maven for a very long time. > > > >> > > > > >> > Do we envisage a single nightly release process that releases all > > > >> languages > > > >> > simultaneously? or do we want separate process per language, with > > > >> different > > > >> > maintainers? > > > >> > > > > >> > > > > >> > > > > >> > On Wed, Jul 10, 2019 at 8:18 AM Wes McKinney > > > >> wrote: > > > >> > > > > >> > > On Sun, Jul 7, 2019 at 7:40 PM Sutou Kouhei > > > >> wrote: > > > >> > > > > > > >> > > > Hi, > > > >> > > > > > > >> > > > > in future releases we should > > > >> > > > > institute a minimum 24-hour "quiet period" after any community > > > >> > > > > feedback on a release candidate to allow issues to be examined > > > >> > > > > further. > > > >> > > > > > > >> > > > I agree with this. I'll do so when I do a release manager in > > > >> > > > the future. 
> > > >> > > > > > > >> > > > > To be able to release more often, two things have to happen: > > > >> > > > > > > > >> > > > > * More PMC members must engage with the release management > > role, > > > >> > > > > process, and tools > > > >> > > > > * Continued improvements to release tooling to make the > > process > > > >> less > > > >> > > > > painful for the release manager. For example, it seems we may > > > >> want to > > > >> > > > > find a different place than Bintray to host binary artifacts > > > >> > > > > temporarily during release votes > > > >> > > > > > > >> > > > My opinion that we need to build nightly release system. > > > >> > > > > > > >> > > > It uses dev/release/NN-*.sh to build .tar.gz and binary > > > >> > > > artifacts from the .tar.gz. > > > >> > > > It also uses dev/release/verify-release-candidate.* to > > > >> > > > verify build .tar.gz and binary artifacts. > > > >> > > > It also uses dev/release/post-NN-*.sh to do post release > > > >> > > > tasks. (Some tasks such as uploading a package to packaging > > > >> > > > system will be dry-run.) > > > >> > > > > > > >> > > > > > >> > > I agree that having a turn-key release system that's capable of > > > >> > > producing nightly packages is the way to do. That
Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond
+1 (non-binding) On Wed, Jul 31, 2019 at 8:59 AM Uwe L. Korn wrote: > +1 from me. > > I really like the separate versions > > Uwe > > On Tue, Jul 30, 2019, at 2:21 PM, Antoine Pitrou wrote: > > > > +1 from me. > > > > Regards > > > > Antoine. > > > > > > > > On Fri, 26 Jul 2019 14:33:30 -0500 > > Wes McKinney wrote: > > > hello, > > > > > > As discussed on the mailing list thread [1], Micah Kornfield has > > > proposed a version scheme for the project to take effect starting with > > > the 1.0.0 release. See document [2] containing a discussion of the > > > issues involved. > > > > > > To summarize my understanding of the plan: > > > > > > 1. TWO VERSIONS: As of 1.0.0, we establish separate FORMAT and LIBRARY > > > versions. Currently there is only a single version number. > > > > > > 2. SEMANTIC VERSIONING: We follow https://semver.org/ with regards to > > > communicating library API changes. Given the project's pace of > > > evolution, most releases are likely to be MAJOR releases according to > > > SemVer principles. > > > > > > 3. RELEASES: Releases of the project will be named according to the > > > LIBRARY version. A major release may or may not change the FORMAT > > > version. When a LIBRARY version has been released for a new FORMAT > > > version, the latter is considered to be released and official. > > > > > > 4. Each LIBRARY version will have a corresponding FORMAT version. For > > > example, LIBRARY versions 2.0.0 and 3.0.0 may track FORMAT version > > > 1.0.0. The idea is that FORMAT version will change less often than > > > LIBRARY version. > > > > > > 5. BACKWARD COMPATIBILITY GUARANTEE: A newer versioned client library > > > will be able to read any data and metadata produced by an older client > > > library. > > > > > > 6. FORWARD COMPATIBILITY GUARANTEE: An older client library must be > > > able to either read data generated from a new client library or detect > > > that it cannot properly read the data. > > > > > > 7. 
FORMAT MINOR VERSIONS: An increase in the minor version of the > > > FORMAT version, such as 1.0.0 to 1.1.0, indicates that 1.1.0 contains > > > new features not available in 1.0.0. So long as these features are not > > > used (such as a new logical data type), forward compatibility is > > > preserved. > > > > > > 8. FORMAT MAJOR VERSIONS: A change in the FORMAT major version > > > indicates a disruption to these compatibility guarantees in some way. > > > Hopefully we don't have to do this many times in our respective > > > lifetimes. > > > > > > If I've misrepresented some aspect of the proposal it's fine to > > > discuss more and we can start a new vote. > > > > > > Please vote to approve this proposal. I'd like to keep this vote open > > > for 7 days (until Friday August 2) to allow for ample opportunities > > > for the community to have a look. > > > > > > [ ] +1 Adopt these version conventions and compatibility guarantees as > > > of Apache Arrow 1.0.0 > > > [ ] +0 > > > [ ] -1 I disagree because... > > > > > > Here is my vote: +1 > > > > > > Thanks > > > Wes > > > > > > [1]: > https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8@%3Cdev.arrow.apache.org%3E > > > [2]: > https://docs.google.com/document/d/1uBitWu57rDu85tNHn0NwstAbrlYqor9dPFg_7QaE-nc/edit#
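The FORMAT-level guarantees in points 5-8 of the proposal can be sketched as a small compatibility check. This is a hypothetical helper, not part of any Arrow library; it optimistically accepts data from a newer FORMAT minor version, since per point 7 forward compatibility only breaks if new features are actually used:

```python
def format_readable(reader, data):
    """Can a library tracking FORMAT version `reader` read data
    written under FORMAT version `data`?  Versions are (major, minor).

    - Backward compatibility (point 5): older data is always readable.
    - Forward compatibility (points 6-7): data from a newer FORMAT
      minor version is accepted here; callers must still detect
      unknown features (e.g. a new logical type) at read time.
    - A newer FORMAT major version (point 8) breaks the guarantees,
      so it must be rejected.
    """
    reader_major, _reader_minor = reader
    data_major, _data_minor = data
    return data_major <= reader_major

print(format_readable((2, 0), (1, 0)))  # True: newer reader, older data
print(format_readable((1, 0), (1, 1)))  # True: accept, barring new features
print(format_readable((1, 0), (2, 0)))  # False: major bump breaks guarantees
```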
Re: Ursabot configuration within Arrow
We can now reproduce the builds locally (without the need of the web UI) with a single command. To demonstrate, building the master branch and building a pull request require the following commands:

$ ursabot project build 'AMD64 Ubuntu 18.04 C++'
$ ursabot project build -pr 'AMD64 Ubuntu 18.04 C++'

See the output here: https://travis-ci.org/ursa-labs/ursabot/builds/566057077#L988 This effectively means that the builders defined in ursabot can be run directly (with a single command) on machines or CI services which have Docker installed. It also replaces the need for the docker-compose setup. I'm going to write some documentation and prepare the arrow builders for a donation to the arrow codebase (which of course requires a vote). If anyone has a question please don't hesitate to ask! Regards, Krisztian On Tue, Jul 30, 2019 at 4:45 PM Krisztián Szűcs wrote: > Ok, but the configuration movement to arrow is orthogonal to > the local reproducibility feature. Could we proceed with that? > > On Tue, Jul 30, 2019 at 4:38 PM Wes McKinney wrote: > >> I will defer to others to investigate this matter further but I would >> really like to see a concrete and practical path to local >> reproducibility before moving forward on any changes to our current >> CI. >> >> On Tue, Jul 30, 2019 at 7:38 AM Krisztián Szűcs >> wrote: >> > >> > Fixed it and restarted a bunch of builds. >> > >> > On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney >> wrote: >> > >> > > By the way, can you please disable the Buildbot builders that are >> > > causing builds on master to fail? We haven't had a passing build in >> > > over a week.
Until we reconcile the build configurations we shouldn't >> > > be failing contributors' builds >> > > >> > > On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney >> wrote: >> > > > >> > > > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs >> > > > wrote: >> > > > > >> > > > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney > > >> > > wrote: >> > > > > >> > > > > > hi Krisztian, >> > > > > > >> > > > > > Before talking about any code donations or where to run builds, >> I >> > > > > > think we first need to discuss the worrisome situation where we >> have >> > > > > > in some cases 3 (or more) CI configurations for different >> components >> > > > > > in the project. >> > > > > > >> > > > > > Just taking into account out C++ build, we have: >> > > > > > >> > > > > > * A config for Travis CI >> > > > > > * Multiple configurations in Dockerfiles under cpp/ >> > > > > > * A brand new (?) configuration in this third party >> ursa-labs/ursabot >> > > > > > repository >> > > > > > >> > > > > > I note for example that the "AMD64 Conda C++" Buildbot build is >> > > > > > failing while Travis CI is succeeding >> > > > > > >> > > > > > https://ci.ursalabs.org/#builders/66/builds/3196 >> > > > > > >> > > > > > Starting from first principles, at least for Linux-based >> builds, what >> > > > > > I would like to see is: >> > > > > > >> > > > > > * A single build configuration (which can be driven by >> yaml-based >> > > > > > configuration files and environment variables), rather than 3 >> like we >> > > > > > have now. This build configuration should be decoupled from any >> CI >> > > > > > platform, including Travis CI and Buildbot >> > > > > > >> > > > > Yeah, this would be the ideal setup, but I'm afraid the situation >> is a >> > > bit >> > > > > more complicated. >> > > > > >> > > > > TravisCI >> > > > > >> > > > > >> > > > > constructed from a bunch of scripts optimized for travis, this >> setup is >> > > > > slow >> > > > > and hardly compatible with any of the remaining setups. 
>> > > > > I think we should ditch it. >> > > > > >> > > > > The "docker-compose setup" >> > > > > -- >> > > > > >> > > > > Most of the Dockerfiles are part of the docker-compose setup >> we've >> > > > > developed. >> > > > > This might be a good candidate as the tool to centralize around >> our >> > > future >> > > > > setup, mostly because docker-compose is widely used, and we could >> setup >> > > > > buildbot builders (or any other CI's) to execute the sequence of >> > > > > docker-compose >> > > > > build and docker-compose run commands. >> > > > > However docker-compose is not suitable for building and running >> > > > > hierarchical >> > > > > images. This is why we have added Makefile [1] to execute a >> "build" >> > > with a >> > > > > single make command instead of manually executing multiple >> commands >> > > > > involving >> > > > > multiple images (which is error prone). It can also leave a lot of >> > > garbage >> > > > > after both containers and images. >> > > > > Docker-compose shines when one needs to orchestrate multiple >> > > containers and >> > > > > their networks / volumes on the same machine. We made it work >> (with a >> > > > > couple of >> > > > > hacky workarounds) for arrow though. >> > > > > Despite that, I still consider the docker-compose setup a good >> > > solution,
[jira] [Created] (ARROW-6091) Implement parallel execution for limit
Andy Grove created ARROW-6091:
---------------------------------
             Summary: Implement parallel execution for limit
                 Key: ARROW-6091
                 URL: https://issues.apache.org/jira/browse/ARROW-6091
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6087) Implement parallel execution for CSV scan
Andy Grove created ARROW-6087:
---------------------------------
             Summary: Implement parallel execution for CSV scan
                 Key: ARROW-6087
                 URL: https://issues.apache.org/jira/browse/ARROW-6087
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6090) Implement parallel execution for hash aggregate
Andy Grove created ARROW-6090:
---------------------------------
             Summary: Implement parallel execution for hash aggregate
                 Key: ARROW-6090
                 URL: https://issues.apache.org/jira/browse/ARROW-6090
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6089) Implement parallel execution for selection
Andy Grove created ARROW-6089:
---------------------------------
             Summary: Implement parallel execution for selection
                 Key: ARROW-6089
                 URL: https://issues.apache.org/jira/browse/ARROW-6089
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6088) Implement parallel execution for projection
Andy Grove created ARROW-6088:
---------------------------------
             Summary: Implement parallel execution for projection
                 Key: ARROW-6088
                 URL: https://issues.apache.org/jira/browse/ARROW-6088
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6086) Implement parallel execution for parquet scan
Andy Grove created ARROW-6086:
---------------------------------
             Summary: Implement parallel execution for parquet scan
                 Key: ARROW-6086
                 URL: https://issues.apache.org/jira/browse/ARROW-6086
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
Re: New version(s) on JIRA
Ok, I've created it as well. Regards Antoine. On 31/07/2019 at 19:00, Wes McKinney wrote: > Yes, I think we need 0.15.0 for this > > On Wed, Jul 31, 2019 at 10:42 AM Antoine Pitrou wrote: >> >> >> Thanks. I created "2.0.0". >> Will we also need a "0.15.0" for the flatbuffers alignment fix? >> >> Regards >> >> Antoine. >> >> >> On 31/07/2019 at 03:00, Sutou Kouhei wrote: >>> Hi, >>> >>> I think that "2.0.0" is better. Because we'll not release >>> "1.1.0". >>> >>> See also: >>> https://lists.apache.org/thread.html/d0ab931b15e75f745f8ae5a348f6c26a3e1f0bb98dc38a9a2c9888d3@%3Cdev.arrow.apache.org%3E >>> >>> >>> Thanks, >>> -- >>> kou >>> >>> In >>> "New version(s) on JIRA" on Tue, 30 Jul 2019 19:20:00 +0200, >>> Antoine Pitrou wrote: >>> Hi, Should we create a "1.1.0" (and/or "2.0.0") version on JIRA, to start assigning non-urgent issues? Regards Antoine.
Re: Building on Arrow CUDA
Hello Paul, you might want to look into https://github.com/conda-forge/conda-forge.github.io/issues/687 where CUDA support on conda-forge is discussed. I'm not up to date anymore on this, but reading the whole issue should give you the current level of support. Once this is solved, adding CUDA support to the Arrow packages on conda-forge should be really simple (but this issue is the major hurdle). Cheers Uwe On Thu, Jul 25, 2019, at 3:54 PM, Wes McKinney wrote: > hi Paul, > > On Wed, Jul 24, 2019 at 3:07 PM Paul Taylor wrote: > > > > I'm looking at options to replace the custom Arrow logic in cuDF with > > Arrow library calls. What's the recommended way to declare a dependency > > on pyarrow/arrowcpp with CUDA support? > > > > Well, for conda or wheel packages, we are not shipping with the CUDA > extensions enabled yet. So if you want to depend on one of those, you > will have to change that. My understanding is that it's possible to > build CUDA-enabled packages in conda-forge -- that would probably be > your best bet. Does anyone know examples of such packages that are > CUDA-enabled? > > > I see in the docs it says to build from source, but that's only an > > option for an (advanced) end-user. And building/vendoring > > libarrow_cuda.so isn't a great option for a non-Arrow library, because > > someone who does source build Arrow-with-cuda will conflict with the > > version we ship. > > > > Right now we're considering statically linking libarrow_cuda into > > libcudf.so and vendoring Arrow's cuda cython alongside ours, but this > > increases compile times/library size. > > > > Is there a package management solution (like pip/conda install > > pyarrow[cuda]) that I'm missing? If not, should there be?
> > > > You can submit pull requests to > > * https://github.com/conda-forge/arrow-cpp-feedstock > * https://github.com/conda-forge/pyarrow-feedstock > > conda-forge itself can provide guidance at > https://gitter.im/conda-forge/conda-forge.github.io > > > Best, > > > > Paul > > >
Re: New version(s) on JIRA
Yes, I think we need 0.15.0 for this On Wed, Jul 31, 2019 at 10:42 AM Antoine Pitrou wrote: > > > Thanks. I created "2.0.0". > Will we also need a "0.15.0" for the flatbuffers alignment fix? > > Regards > > Antoine. > > > Le 31/07/2019 à 03:00, Sutou Kouhei a écrit : > > Hi, > > > > I think that "2.0.0" is better. Because we'll not release > > "1.1.0". > > > > See also: > > https://lists.apache.org/thread.html/d0ab931b15e75f745f8ae5a348f6c26a3e1f0bb98dc38a9a2c9888d3@%3Cdev.arrow.apache.org%3E > > > > > > Thanks, > > -- > > kou > > > > In > > "New version(s) on JIRA" on Tue, 30 Jul 2019 19:20:00 +0200, > > Antoine Pitrou wrote: > > > >> > >> Hi, > >> > >> Should we create a "1.1.0" (and/or "2.0.0") version on JIRA, to start > >> assigning non-urgent issues? > >> > >> Regards > >> > >> Antoine. > >>
Re: [DISCUSS][Format] FixedSizeList w/ row-length not specified as part of the type
I'm a little confused about the proposal now. If the unknown dimension doesn't have to be the same within a record batch, how would you be able to deduce it with the approach you described (dividing the logical length of the values array by the length of the record batch)? On Wed, Jul 31, 2019 at 8:24 AM Wes McKinney wrote: > I agree this sounds like a good application for ExtensionType. At > minimum, ExtensionType can be used to develop a working version of > what you need to help guide further discussions. > > On Mon, Jul 29, 2019 at 2:29 PM Francois Saint-Jacques > wrote: > > > > Hello, > > > > if each record has a different size, then I suggest to just use a > > Struct<...> where Dim is a struct (or expand in the outer > > struct). You can probably add your own logic with the recently > > introduced ExtensionType [1]. > > > > François > > [1] > https://github.com/apache/arrow/blob/f77c3427ca801597b572fb197b92b0133269049b/cpp/src/arrow/extension_type.h > > > > On Mon, Jul 29, 2019 at 3:15 PM Edward Loper > wrote: > > > > > > The intention is that each individual record could have a different size. > > > This could be consistent within a given batch, but wouldn't need to be. > > > For example, if I wanted to send a 3-channel image, but the image size may > > > vary for each record, then I could use > > > FixedSizeList<FixedSizeList<FixedSizeList<T>[3]>[-1]>[-1]. > > > > > > On Mon, Jul 29, 2019 at 1:18 PM Brian Hulette > wrote: > > > > > > > This isn't really relevant but I feel compelled to point it out - the > > > > FixedSizeList type has actually been in the Arrow spec for a while, but it > > > > was only implemented in JS and Java initially. It was implemented in C++ > > > > just a few months ago. > > > > > > > > > > Thanks for the clarification -- I was going based on the blame history for > > > Layout.rst, but I guess it just didn't get officially documented there > > > until the C++ implementation was added.
> > > > > > -Edward > > > > > > > > > > On Mon, Jul 29, 2019 at 7:01 AM Edward Loper > > > > > > wrote: > > > > > > > > > The FixedSizeList type, which was added to Arrow a few months ago, > is an > > > > > array where each slot contains a fixed-size sequence of values. > It is > > > > > specified as FixedSizeList<T>[N], where T is a child type and N is > a > > > > signed > > > > > int32 that specifies the length of each list. > > > > > > > > > > This is useful for encoding fixed-size tensors. E.g., if I have a > > > > 100x8x10 > > > > > tensor, then I can encode it as > > > > > FixedSizeList<FixedSizeList<FixedSizeList<T>[10]>[8]>[100]. > > > > > > > > > > But I'm also interested in encoding tensors where some dimension > sizes > > > > are > > > > > not known in advance. It seems to me that FixedSizeList could be > > > > extended > > > > > to support this fairly easily, by simply defining that N=-1 means > "each > > > > > array slot has the same length, but that length is not known in > advance." > > > > > So e.g. we could encode a 100x?x10 tensor as > > > > > FixedSizeList<FixedSizeList<FixedSizeList<T>[10]>[-1]>[100]. > > > > > > > > > > Since these N=-1 row-lengths are not encoded in the type, we need > some > > > > way > > > > > to determine what they are. Luckily, every Field in the schema > has a > > > > > corresponding FieldNode in the message; and those FieldNodes can > be used > > > > to > > > > > deduce the row lengths. In particular, the row length must be > equal to > > > > the > > > > > length of the child node divided by the length of the > FixedSizeList.
> > > > E.g., > > > > > if we have a FixedSizeList<byte>[-1] array with the values [[1, > 2], [3, > > > > 4], > > > > > [5, 6]] then the message representation is: > > > > > > > > > > * Length: 3, Null count: 0 > > > > > * Null bitmap buffer: Not required > > > > > * Values array (byte array): > > > > > * Length: 6, Null count: 0 > > > > > * Null bitmap buffer: Not required > > > > > * Value buffer: [1, 2, 3, 4, 5, 6, ] > > > > > > > > > > So we can deduce that the row length is 6/3=2. > > > > > > > > > > It looks to me like it would be fairly easy to add support for > this. > > > > E.g., > > > > > in the FixedSizeListArray constructor in C++, if > list_type()->list_size() > > > > > is -1, then set list_size_ to values.length()/length. There would > be no > > > > > changes to the schema.fbs/message.fbs files -- we would just be > > > > assigning a > > > > > meaning to something that's currently meaningless (having > > > > > FixedSizeList.listSize=-1). > > > > > > > > > > If there's support for adding this to Arrow, then I could put > together a > > > > > PR. > > > > > > > > > > Thanks, > > > > > -Edward > > > > > > > > > > P.S. Apologies if this gets posted twice -- I sent it out a couple > days > > > > ago > > > > > right before subscribing to the mailing list; but I don't see it > on the > > > > > archives, presumably because I wasn't subscribed yet when I sent > it out. > > > > > > > > > >
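The deduction described in the proposal (length of the child node divided by the length of the FixedSizeList array, as in the 6/3=2 example above) can be sketched as follows. This is illustrative logic only, not the actual C++ FixedSizeListArray constructor:

```python
def deduce_list_size(parent_length, values_length):
    """Deduce the unknown list size of a FixedSizeList array with N=-1.

    Per the proposal, the row length equals the length of the child
    values FieldNode divided by the length of the FixedSizeList array
    itself; the division must be exact.
    """
    if parent_length == 0:
        raise ValueError("cannot deduce a list size from a zero-length array")
    size, remainder = divmod(values_length, parent_length)
    if remainder:
        raise ValueError("child length is not a multiple of the array length")
    return size

# The example from the message: [[1, 2], [3, 4], [5, 6]] has
# parent length 3 and child (values) length 6, so the size is 2.
print(deduce_list_size(3, 6))  # 2
```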
[jira] [Created] (ARROW-6085) Create traits for physical query plan
Andy Grove created ARROW-6085:
---------------------------------
             Summary: Create traits for physical query plan
                 Key: ARROW-6085
                 URL: https://issues.apache.org/jira/browse/ARROW-6085
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: Rust, Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove
             Fix For: 0.15.0

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Created] (ARROW-6084) [Python] Support LargeList
Antoine Pitrou created ARROW-6084:
---------------------------------
             Summary: [Python] Support LargeList
                 Key: ARROW-6084
                 URL: https://issues.apache.org/jira/browse/ARROW-6084
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Antoine Pitrou

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
Re: [DISCUSS] Release cadence and release vote conventions
To what extent would it be possible to automate the release process via CICD? On Wed, Jul 31, 2019 at 9:19 AM Wes McKinney wrote: > I think one thing that would help would be improving the > reproducibility of the source release process. The RM has to have > their machine configured in a particular way for it to work. > > Before anyone says "Docker" it isn't an easy solution because the > release scripts need to be able to create git commits (created by the > Maven release plugin) and sign artifacts using the RM's GPG keys. > > On Sat, Jul 27, 2019 at 10:04 PM Micah Kornfield > wrote: > > > > I just wanted to bump this thread. Kou and Krisztián as the last two > > release managers is there any specific infrastructure that you think > might > > have helped? > > > > Thanks, > > Micah > > > > On Wed, Jul 17, 2019 at 11:29 PM Micah Kornfield > > wrote: > > > > > I'd can help as well, but not exactly sure where to start. It seems > like > > > there are already some JIRAs opened [1] > > > for improving the release? Could someone more familiar with the > process > > > pick out the highest priority ones? Do more need to be opened? > > > > > > Thanks, > > > Micah > > > > > > [1] > > > > https://issues.apache.org/jira/browse/ARROW-2880?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(%22Developer%20Tools%22%2C%20Packaging)%20and%20summary%20~%20Release > > > > > > On Sat, Jul 13, 2019 at 7:17 AM Wes McKinney > wrote: > > > > > >> To be effective at improving the life of release managers, the nightly > > >> release process really should use as close as possible to the same > > >> scripts that the RM uses to produce the release. Otherwise we could > > >> have a situation where the nightlies succeed but there is some problem > > >> that either fails an RC or is unable to be produced at all. 
> > >> > > >> On Sat, Jul 13, 2019 at 9:12 AM Andy Grove > wrote: > > >> > > > >> > I would like to volunteer to help with Java and Rust release process > > >> work, > > >> > especially nightly releases. > > >> > > > >> > Although I'm not that familiar with the Java implementation of > Arrow, I > > >> > have been using Java and Maven for a very long time. > > >> > > > >> > Do we envisage a single nightly release process that releases all > > >> languages > > >> > simultaneously? Or do we want a separate process per language, with > > >> different > > >> > maintainers? > > >> > > > >> > > > >> > > > >> > On Wed, Jul 10, 2019 at 8:18 AM Wes McKinney > > >> wrote: > > >> > > > > >> > > On Sun, Jul 7, 2019 at 7:40 PM Sutou Kouhei > > >> wrote: > > >> > > > > > >> > > > Hi, > > >> > > > > > >> > > > > in future releases we should > > >> > > > > institute a minimum 24-hour "quiet period" after any community > > >> > > > > feedback on a release candidate to allow issues to be examined > > >> > > > > further. > > >> > > > > > >> > > > I agree with this. I'll do so when I serve as release manager in > > >> > > > the future. > > >> > > > > > >> > > > > To be able to release more often, two things have to happen: > > >> > > > > > > >> > > > > * More PMC members must engage with the release management > role, > > >> > > > > process, and tools > > >> > > > > * Continued improvements to release tooling to make the > process > > >> less > > >> > > > > painful for the release manager. For example, it seems we may > > >> want to > > >> > > > > find a different place than Bintray to host binary artifacts > > >> > > > > temporarily during release votes > > >> > > > > > >> > > > My opinion is that we need to build a nightly release system. > > >> > > > > > >> > > > It uses dev/release/NN-*.sh to build .tar.gz and binary > > >> > > > artifacts from the .tar.gz. > > >> > > > It also uses dev/release/verify-release-candidate.* to > > >> > > > verify the built .tar.gz and binary artifacts.
> > >> > > > It also uses dev/release/post-NN-*.sh to do post-release > > >> > > > tasks. (Some tasks such as uploading a package to a packaging > > >> > > > system will be dry-run.) > > >> > > > > > >> > > > > >> > > I agree that having a turn-key release system that's capable of > > >> > > producing nightly packages is the way to go. That way any problems > > >> > > that would block a release will come up as they happen rather than > > >> > > piling up until the very end as they do now. > > >> > > > > >> > > > I needed 10 or more changes for dev/release/ to create > > >> > > > 0.14.0 RC0. (Some of them are still in my local stashes. I > > >> > > > don't have time to create pull requests for them > > >> > > > yet, because I postponed some tasks of my main > > >> > > > business. I'll create pull requests after I finish the > > >> > > > postponed tasks of my main business.) > > >> > > > > > >> > > > > >> > > Thanks. I'll follow up on the 0.14.1/0.15.0 thread -- since we > need to > > >> > > release again soon because of problems with 0.14.0, please let us > know > > >> > > what patches will be needed to make another release.
Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond
+1 from me. I really like the separate versions Uwe On Tue, Jul 30, 2019, at 2:21 PM, Antoine Pitrou wrote: > > +1 from me. > > Regards > > Antoine. > > > > On Fri, 26 Jul 2019 14:33:30 -0500 > Wes McKinney wrote: > > hello, > > > > As discussed on the mailing list thread [1], Micah Kornfield has > > proposed a version scheme for the project to take effect starting with > > the 1.0.0 release. See document [2] containing a discussion of the > > issues involved. > > > > To summarize my understanding of the plan: > > > > 1. TWO VERSIONS: As of 1.0.0, we establish separate FORMAT and LIBRARY > > versions. Currently there is only a single version number. > > > > 2. SEMANTIC VERSIONING: We follow https://semver.org/ with regards to > > communicating library API changes. Given the project's pace of > > evolution, most releases are likely to be MAJOR releases according to > > SemVer principles. > > > > 3. RELEASES: Releases of the project will be named according to the > > LIBRARY version. A major release may or may not change the FORMAT > > version. When a LIBRARY version has been released for a new FORMAT > > version, the latter is considered to be released and official. > > > > 4. Each LIBRARY version will have a corresponding FORMAT version. For > > example, LIBRARY versions 2.0.0 and 3.0.0 may track FORMAT version > > 1.0.0. The idea is that FORMAT version will change less often than > > LIBRARY version. > > > > 5. BACKWARD COMPATIBILITY GUARANTEE: A newer versioned client library > > will be able to read any data and metadata produced by an older client > > library. > > > > 6. FORWARD COMPATIBILITY GUARANTEE: An older client library must be > > able to either read data generated from a new client library or detect > > that it cannot properly read the data. > > > > 7. FORMAT MINOR VERSIONS: An increase in the minor version of the > > FORMAT version, such as 1.0.0 to 1.1.0, indicates that 1.1.0 contains > > new features not available in 1.0.0. 
So long as these features are not > > used (such as a new logical data type), forward compatibility is > > preserved. > > > > 8. FORMAT MAJOR VERSIONS: A change in the FORMAT major version > > indicates a disruption to these compatibility guarantees in some way. > > Hopefully we don't have to do this many times in our respective > > lifetimes. > > > > If I've misrepresented some aspect of the proposal it's fine to > > discuss more and we can start a new vote. > > > > Please vote to approve this proposal. I'd like to keep this vote open > > for 7 days (until Friday August 2) to allow ample opportunity > > for the community to have a look. > > > > [ ] +1 Adopt these version conventions and compatibility guarantees as > > of Apache Arrow 1.0.0 > > [ ] +0 > > [ ] -1 I disagree because... > > > > Here is my vote: +1 > > > > Thanks > > Wes > > > > [1]: > > https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8@%3Cdev.arrow.apache.org%3E > > [2]: > > https://docs.google.com/document/d/1uBitWu57rDu85tNHn0NwstAbrlYqor9dPFg_7QaE-nc/edit# > > > > > >
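As a rough, hypothetical sketch of how guarantees 5-8 could translate into a single compatibility check (the function name and the (major, minor) tuple representation are mine, not part of the proposal):

```python
# Hypothetical illustration of the proposed FORMAT-version guarantees.
# Versions are (major, minor) tuples; nothing here is an Arrow API.

def reader_can_handle(reader_format, data_format):
    # Backward compatibility (5): a newer reader reads any older data.
    # Forward compatibility (6-7): within one FORMAT major version an older
    # reader may still read newer-minor data, as long as the new features
    # are unused. A newer FORMAT major version (8) must be refused.
    return data_format[0] <= reader_format[0]
```

Under this reading, only a FORMAT major bump ever breaks existing readers, which is why point 8 hopes such bumps stay rare.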
Re: New version(s) on JIRA
Thanks. I created "2.0.0". Will we also need a "0.15.0" for the Flatbuffers alignment fix? Regards Antoine. On 31/07/2019 03:00, Sutou Kouhei wrote: > Hi, > > I think that "2.0.0" is better, because we'll not release > "1.1.0". > > See also: > https://lists.apache.org/thread.html/d0ab931b15e75f745f8ae5a348f6c26a3e1f0bb98dc38a9a2c9888d3@%3Cdev.arrow.apache.org%3E > > > Thanks, > -- > kou > > In > "New version(s) on JIRA" on Tue, 30 Jul 2019 19:20:00 +0200, > Antoine Pitrou wrote: > >> >> Hi, >> >> Should we create a "1.1.0" (and/or "2.0.0") version on JIRA, to start >> assigning non-urgent issues? >> >> Regards >> >> Antoine. >>
Re: [DISCUSS][Format] FixedSizeList w/ row-length not specified as part of the type
I agree this sounds like a good application for ExtensionType. At minimum, ExtensionType can be used to develop a working version of what you need to help guide further discussions. On Mon, Jul 29, 2019 at 2:29 PM Francois Saint-Jacques wrote: > > Hello, > > if each record has a different size, then I suggest to just use a > Struct> where Dim is a struct (or expand in the outer > struct). You can probably add your own logic with the recently > introduced ExtensionType [1]. > > François > [1] > https://github.com/apache/arrow/blob/f77c3427ca801597b572fb197b92b0133269049b/cpp/src/arrow/extension_type.h > > On Mon, Jul 29, 2019 at 3:15 PM Edward Loper > wrote: > > > > The intention is that each individual record could have a different size. > > This could be consistent within a given batch, but wouldn't need to be. > > For example, if I wanted to send a 3-channel image, but the image size may > > vary for each record, then I could use > > FixedSizeList<FixedSizeList<FixedSizeList<T>[3]>[-1]>[-1]. > > > > On Mon, Jul 29, 2019 at 1:18 PM Brian Hulette wrote: > > > > > This isn't really relevant but I feel compelled to point it out - the > > > FixedSizeList type has actually been in the Arrow spec for a while, but it > > > was only implemented in JS and Java initially. It was implemented in C++ > > > just a few months ago. > > > > > > > Thanks for the clarification -- I was going based on the blame history for > > Layout.rst, but I guess it just didn't get officially documented there > > until the C++ implementation was added. > > > > -Edward > > > > > > > On Mon, Jul 29, 2019 at 7:01 AM Edward Loper > > > wrote: > > > > > > > The FixedSizeList type, which was added to Arrow a few months ago, is an > > > > array where each slot contains a fixed-size sequence of values. It is > > > > specified as FixedSizeList<T>[N], where T is a child type and N is a > > > signed > > > > int32 that specifies the length of each list. > > > > > > > > This is useful for encoding fixed-size tensors.
E.g., if I have a > > > 100x8x10 > > > > tensor, then I can encode it as > > > > FixedSizeList<FixedSizeList<FixedSizeList<T>[10]>[8]>[100]. > > > > > > > > But I'm also interested in encoding tensors where some dimension sizes > > > are > > > > not known in advance. It seems to me that FixedSizeList could be > > > extended > > > > to support this fairly easily, by simply defining that N=-1 means "each > > > > array slot has the same length, but that length is not known in > > > > advance." > > > > So e.g. we could encode a 100x?x10 tensor as > > > > FixedSizeList<FixedSizeList<FixedSizeList<T>[10]>[-1]>[100]. > > > > > > > > Since these N=-1 row-lengths are not encoded in the type, we need some > > > way > > > > to determine what they are. Luckily, every Field in the schema has a > > > > corresponding FieldNode in the message; and those FieldNodes can be used > > > to > > > > deduce the row lengths. In particular, the row length must be equal to > > > the > > > > length of the child node divided by the length of the FixedSizeList. > > > E.g., > > > > if we have a FixedSizeList<byte>[-1] array with the values [[1, 2], [3, > > > 4], > > > > [5, 6]] then the message representation is: > > > > > > > > * Length: 3, Null count: 0 > > > > * Null bitmap buffer: Not required > > > > * Values array (byte array): > > > > * Length: 6, Null count: 0 > > > > * Null bitmap buffer: Not required > > > > * Value buffer: [1, 2, 3, 4, 5, 6, ] > > > > > > > > So we can deduce that the row length is 6/3=2. > > > > > > > > It looks to me like it would be fairly easy to add support for this. > > > E.g., > > > > in the FixedSizeListArray constructor in C++, if > > > > list_type()->list_size() > > > > is -1, then set list_size_ to values.length()/length. There would be no > > > > changes to the schema.fbs/message.fbs files -- we would just be > > > assigning a > > > > meaning to something that's currently meaningless (having > > > > FixedSizeList.listSize=-1).
> > > > > > > > If there's support for adding this to Arrow, then I could put together a > > > > PR. > > > > > > > > Thanks, > > > > -Edward > > > > > > > > P.S. Apologies if this gets posted twice -- I sent it out a couple days > > > ago > > > > right before subscribing to the mailing list; but I don't see it on the > > > > archives, presumably because I wasn't subscribed yet when I sent it out. > > > > > > >
Re: [DISCUSS] Release cadence and release vote conventions
I think one thing that would help would be improving the reproducibility of the source release process. The RM has to have their machine configured in a particular way for it to work. Before anyone says "Docker" it isn't an easy solution because the release scripts need to be able to create git commits (created by the Maven release plugin) and sign artifacts using the RM's GPG keys. On Sat, Jul 27, 2019 at 10:04 PM Micah Kornfield wrote: > > I just wanted to bump this thread. Kou and Krisztián, as the last two > release managers, is there any specific infrastructure that you think might > have helped? > > Thanks, > Micah > > On Wed, Jul 17, 2019 at 11:29 PM Micah Kornfield > wrote: > > > I can help as well, but I'm not exactly sure where to start. It seems like > > there are already some JIRAs opened [1] > > for improving the release. Could someone more familiar with the process > > pick out the highest priority ones? Do more need to be opened? > > > > Thanks, > > Micah > > > > [1] > > https://issues.apache.org/jira/browse/ARROW-2880?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(%22Developer%20Tools%22%2C%20Packaging)%20and%20summary%20~%20Release > > > > On Sat, Jul 13, 2019 at 7:17 AM Wes McKinney wrote: > > > >> To be effective at improving the life of release managers, the nightly > >> release process really should use as close as possible to the same > >> scripts that the RM uses to produce the release. Otherwise we could > >> have a situation where the nightlies succeed but there is some problem > >> that either fails an RC or is unable to be produced at all. > >> > >> On Sat, Jul 13, 2019 at 9:12 AM Andy Grove wrote: > >> > > >> > I would like to volunteer to help with Java and Rust release process > >> work, > >> > especially nightly releases. > >> > > >> > Although I'm not that familiar with the Java implementation of Arrow, I > >> > have been using Java and Maven for a very long time.
> >> > > >> > Do we envisage a single nightly release process that releases all > >> languages > >> > simultaneously? Or do we want a separate process per language, with > >> different > >> > maintainers? > >> > > >> > > >> > On Wed, Jul 10, 2019 at 8:18 AM Wes McKinney > >> wrote: > >> > > >> > > On Sun, Jul 7, 2019 at 7:40 PM Sutou Kouhei > >> wrote: > >> > > > > >> > > > Hi, > >> > > > > >> > > > > in future releases we should > >> > > > > institute a minimum 24-hour "quiet period" after any community > >> > > > > feedback on a release candidate to allow issues to be examined > >> > > > > further. > >> > > > > >> > > > I agree with this. I'll do so when I serve as release manager in > >> > > > the future. > >> > > > > >> > > > > To be able to release more often, two things have to happen: > >> > > > > > >> > > > > * More PMC members must engage with the release management role, > >> > > > > process, and tools > >> > > > > * Continued improvements to release tooling to make the process > >> less > >> > > > > painful for the release manager. For example, it seems we may > >> want to > >> > > > > find a different place than Bintray to host binary artifacts > >> > > > > temporarily during release votes > >> > > > > >> > > > My opinion is that we need to build a nightly release system. > >> > > > > >> > > > It uses dev/release/NN-*.sh to build .tar.gz and binary > >> > > > artifacts from the .tar.gz. > >> > > > It also uses dev/release/verify-release-candidate.* to > >> > > > verify the built .tar.gz and binary artifacts. > >> > > > It also uses dev/release/post-NN-*.sh to do post-release > >> > > > tasks. (Some tasks such as uploading a package to a packaging > >> > > > system will be dry-run.) > >> > > > > >> > > > >> > > I agree that having a turn-key release system that's capable of > >> > > producing nightly packages is the way to go.
That way any problems > >> > > that would block a release will come up as they happen rather than > >> > > piling up until the very end as they do now. > >> > > > >> > > > I needed 10 or more changes for dev/release/ to create > >> > > > 0.14.0 RC0. (Some of them are still in my local stashes. I > >> > > > don't have time to create pull requests for them > >> > > > yet, because I postponed some tasks of my main > >> > > > business. I'll create pull requests after I finish the > >> > > > postponed tasks of my main business.) > >> > > > > >> > > > >> > > Thanks. I'll follow up on the 0.14.1/0.15.0 thread -- since we need to > >> > > release again soon because of problems with 0.14.0, please let us know > >> > > what patches will be needed to make another release. > >> > > > >> > > > If we fix problems related to dev/release/ in our normal > >> > > > development process, the release process will be less painful. > >> > > > > >> > > > The biggest problem for 0.14.0 RC0 is java/pom.xml related: > >> > > > https://github.com/apache/arrow/pull/4717 > >> > > > > >> > > > It was difficult for me because I don't have Java
[jira] [Created] (ARROW-6083) [Java] Refactor Jdbc adapter consume logic
Ji Liu created ARROW-6083: - Summary: [Java] Refactor Jdbc adapter consume logic Key: ARROW-6083 URL: https://issues.apache.org/jira/browse/ARROW-6083 Project: Apache Arrow Issue Type: Improvement Reporter: Ji Liu Assignee: Ji Liu The JDBC adapter's read loop over {{ResultSet}} looks like:

{code:java}
while (rs.next()) {
  for (int i = 1; i <= columnCount; i++) {
    jdbcToFieldVector(
        rs,
        i,
        rs.getMetaData().getColumnType(i),
        rowCount,
        root.getVector(rsmd.getColumnName(i)),
        config);
  }
  rowCount++;
}
{code}

And {{jdbcToFieldVector}} contains a lot of switch-case logic; that is to say, for every single value from the ResultSet we have to evaluate many conditions. I think we could optimize this using a consumer/delegate pattern, like the Avro adapter.
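For readers unfamiliar with the consumer/delegate idea borrowed from the Avro adapter, here is a language-agnostic sketch in Python (all names are hypothetical; the real Java adapter would write into FieldVectors rather than build Python lists): the per-column conversion function is resolved once, outside the row loop, instead of switching on the column type for every value.

```python
# Hypothetical consumers keyed by JDBC column type name.
CONSUMERS = {
    "INTEGER": lambda row, i: int(row[i]),
    "VARCHAR": lambda row, i: str(row[i]),
}

def consume_all(rows, column_types):
    # Dispatch once per column, not once per value.
    consumers = [CONSUMERS[t] for t in column_types]
    return [[c(row, i) for i, c in enumerate(consumers)] for row in rows]
```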
[jira] [Created] (ARROW-6082) [Python] create pa.dictionary() type with non-integer indices type crashes
Joris Van den Bossche created ARROW-6082: Summary: [Python] create pa.dictionary() type with non-integer indices type crashes Key: ARROW-6082 URL: https://issues.apache.org/jira/browse/ARROW-6082 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Joris Van den Bossche For example, if you mix the order of the indices and values types:

{code}
In [1]: pa.dictionary(pa.int8(), pa.string())
Out[1]: DictionaryType(dictionary<values=string, indices=int8, ordered=0>)

In [2]: pa.dictionary(pa.string(), pa.int8())
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0731 14:40:42.748589 26310 type.cc:440] Check failed: is_integer(index_type->id()) dictionary index type should be signed integer
*** Check failure stack trace: ***
Aborted (core dumped)
{code}
[jira] [Created] (ARROW-6081) FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'
David Draper created ARROW-6081: --- Summary: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet' Key: ARROW-6081 URL: https://issues.apache.org/jira/browse/ARROW-6081 Project: Apache Arrow Issue Type: Bug Reporter: David Draper Any idea on how to fix this error?

{code}
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1530, in load_table_from_dataframe
    dataframe.to_parquet(tmppath)
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parquet.py", line 122, in write
    coerce_timestamps=coerce_timestamps, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pyarrow/parquet.py", line 1270, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/usr/local/lib64/python3.6/site-packages/pyarrow/parquet.py", line 426, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 1311, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Nested column branch had multiple children

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/cache/tomcat/temp/interpreter-2169813765840716657.tmp", line 84, in
    client.load_table_from_dataframe(appended_data, table_ref, job_config=job_config).result()
  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1546, in load_table_from_dataframe
    os.remove(tmppath)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'
{code}
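The root cause in the first traceback is `pyarrow.lib.ArrowInvalid: Nested column branch had multiple children`: the temporary parquet file is never written, so the BigQuery client's cleanup `os.remove(tmppath)` then raises `FileNotFoundError`. A hedged diagnostic (the DataFrame is invented for illustration) to spot nested columns before calling `load_table_from_dataframe`:

```python
import pandas as pd

# Invented example: one flat column plus one nested column of the kind
# the parquet writer rejects here.
df = pd.DataFrame({"ok": [1, 2],
                   "nested": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]})

# Flag columns holding dicts/lists so they can be flattened or dropped
# before the BigQuery client attempts the parquet round-trip.
suspect = [c for c in df.columns
           if df[c].map(lambda v: isinstance(v, (dict, list))).any()]
```

Flattening such columns into scalar columns (e.g. one column per nested field) before the load call is one way around the writer limitation.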
[jira] [Created] (ARROW-6080) [Java] Support search operation for BaseRepeatedValueVector
Liya Fan created ARROW-6080: --- Summary: [Java] Support search operation for BaseRepeatedValueVector Key: ARROW-6080 URL: https://issues.apache.org/jira/browse/ARROW-6080 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Liya Fan Assignee: Liya Fan
[jira] [Created] (ARROW-6079) [Java] Implement/test UnionFixedSizeListWriter for FixedSizeListVector
Ji Liu created ARROW-6079: - Summary: [Java] Implement/test UnionFixedSizeListWriter for FixedSizeListVector Key: ARROW-6079 URL: https://issues.apache.org/jira/browse/ARROW-6079 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu Now we have two list vectors: {{ListVector}} and {{FixedSizeListVector}}. {{ListVector}} already has a {{UnionListWriter}} for writing data; however, {{FixedSizeListVector}} doesn't have this yet, and it seems the only way for users to write data is to get the inner vector and set values manually. Implementing a writer for {{FixedSizeListVector}} would be useful in some cases.