[jira] [Created] (ARROW-2786) [JS] Read Parquet files in JavaScript
Wes McKinney created ARROW-2786: --- Summary: [JS] Read Parquet files in JavaScript Key: ARROW-2786 URL: https://issues.apache.org/jira/browse/ARROW-2786 Project: Apache Arrow Issue Type: New Feature Components: JavaScript Reporter: Wes McKinney See question in https://github.com/apache/arrow/issues/2209 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2785) [C++] Crash in json-integration-test
Antoine Pitrou created ARROW-2785: - Summary: [C++] Crash in json-integration-test Key: ARROW-2785 URL: https://issues.apache.org/jira/browse/ARROW-2785 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou This is probably something I keep getting wrong when creating a new environment, but after creating a Python 3.7 conda environment and installing the tool chain, I get the following crash (apparently boost-related): {code} $ ./build-test/debug/json-integration-test [==] Running 2 tests from 1 test case. [--] Global test environment set-up. [--] 2 tests from TestJSONIntegration [ RUN ] TestJSONIntegration.ConvertAndValidate *** Error in `./build-test/debug/json-integration-test': munmap_chunk(): invalid pointer: 0x7ffc22542578 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f4762f257e5] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7f4762f32698] /home/antoine/miniconda3/envs/pyarrow37/lib/libstdc++.so.6(_ZNSsD1Ev+0x15)[0x7f476384cca5] ./build-test/debug/json-integration-test(_ZN5boost10filesystem4pathD1Ev+0x18)[0x694f4a] ./build-test/debug/json-integration-test[0x69205a] ./build-test/debug/json-integration-test(_ZN5arrow3ipc19TestJSONIntegration7mkstempEv+0x2c)[0x69599e] ./build-test/debug/json-integration-test(_ZN5arrow3ipc43TestJSONIntegration_ConvertAndValidate_Test8TestBodyEv+0x3b)[0x69210f] ./build-test/debug/json-integration-test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x65)[0x8759da] ./build-test/debug/json-integration-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x5a)[0x86f65d] ./build-test/debug/json-integration-test(_ZN7testing4Test3RunEv+0xd5)[0x853697] ./build-test/debug/json-integration-test(_ZN7testing8TestInfo3RunEv+0x105)[0x853fef] ./build-test/debug/json-integration-test(_ZN7testing8TestCase3RunEv+0xf4)[0x8546f8] 
./build-test/debug/json-integration-test(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x2ac)[0x85b666] ./build-test/debug/json-integration-test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x65)[0x876eb7] ./build-test/debug/json-integration-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x5a)[0x870327] ./build-test/debug/json-integration-test(_ZN7testing8UnitTest3RunEv+0xc6)[0x85a128] ./build-test/debug/json-integration-test(_Z13RUN_ALL_TESTSv+0x11)[0x6945e6] ./build-test/debug/json-integration-test(main+0xfb)[0x693a2b] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f4762ece830] ./build-test/debug/json-integration-test(_start+0x29)[0x68b4a9] {code}
[jira] [Created] (ARROW-2784) [C++] MemoryMappedFile::WriteAt allow writing past the end
Dimitri Vorona created ARROW-2784: - Summary: [C++] MemoryMappedFile::WriteAt allow writing past the end Key: ARROW-2784 URL: https://issues.apache.org/jira/browse/ARROW-2784 Project: Apache Arrow Issue Type: New Feature Components: C++ Affects Versions: 0.9.0 Reporter: Dimitri Vorona WriteAt is missing a check; this PR adds it.
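To make the kind of check discussed in ARROW-2784 concrete, here is a minimal Python sketch. It assumes the missing check is a bounds check against the size of the mapped region, which is what the summary suggests but the report does not spell out. The function name write_at and its error messages are hypothetical; this is not the Arrow C++ implementation or its API.

```python
import mmap


def write_at(region, position, data):
    """Copy data into region at position; raise instead of writing past the end.

    Hypothetical helper for illustration only, not part of Arrow.
    """
    size = len(region)
    # Reject start offsets that are outside the region.
    if position < 0 or position > size:
        raise ValueError(f"position {position} is outside a region of {size} bytes")
    # Reject writes whose last byte would fall beyond the end of the region.
    if len(data) > size - position:
        raise ValueError(
            f"writing {len(data)} bytes at {position} would pass the end ({size} bytes)"
        )
    region[position:position + len(data)] = data
    return len(data)


if __name__ == "__main__":
    region = mmap.mmap(-1, 16)         # anonymous 16-byte mapping
    write_at(region, 12, b"abcd")      # fits exactly in the last four bytes
    try:
        write_at(region, 13, b"abcd")  # one byte too many
    except ValueError as exc:
        print("rejected:", exc)
    region.close()
```

The check compares the data length with the remaining space (size - position) rather than computing position + len(data) first; in a fixed-width language such as C++ that ordering avoids integer overflow in the comparison.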
[jira] [Created] (ARROW-2783) Importing conda-forge pyarrow fails
Phillip Cloud created ARROW-2783: Summary: Importing conda-forge pyarrow fails Key: ARROW-2783 URL: https://issues.apache.org/jira/browse/ARROW-2783 Project: Apache Arrow Issue Type: Task Components: Python Affects Versions: 0.9.0 Reporter: Phillip Cloud Possibly related to: https://issues.apache.org/jira/projects/ARROW/issues/ARROW-2770 Steps to reproduce: {code} $ conda create -n test python=3 pyarrow -c conda-forge -y $ conda activate test $ python -c 'import pyarrow' {code} This gives: {code} Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/phillip/miniconda3/envs/py36/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module> from pyarrow.lib import cpu_count, set_cpu_count ImportError: libboost_system.so.1.65.1: cannot open shared object file: No such file or directory {code} Downgrading boost to {{1.65.1}} gives a symbol lookup error: {code} $ conda install boost-cpp=1.65.1 -y -c conda-forge $ python -c 'import pyarrow' Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/phillip/miniconda3/envs/py36/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module> from pyarrow.lib import cpu_count, set_cpu_count ImportError: /home/phillip/miniconda3/envs/py36/lib/python3.6/site-packages/pyarrow/../../../libarrow.so.0: undefined symbol: _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcESaINS_9sub_matchISB_12maybe_assignERKSF_ {code} Installing {{pyarrow}} from {{defaults}} and importing it works fine. cc [~kszucs] [~xhochy]
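The first ImportError in ARROW-2783 says the dynamic loader could not open libboost_system.so.1.65.1. A minimal sketch for probing that from Python, using only the standard ctypes module, might look like the following. The helper name can_load is hypothetical and not part of pyarrow. It uses the loader search path of the running interpreter, which may differ from how libarrow.so resolves its own dependencies, so treat the result as a hint rather than a diagnosis.

```python
import ctypes


def can_load(library_name):
    """Try to open a shared library; return (True, None) or (False, loader message).

    Hypothetical helper for illustration only.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        # The loader's own message usually names the file it could not open.
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # Library name taken from the traceback in the report.
    ok, message = can_load("libboost_system.so.1.65.1")
    print("loadable" if ok else f"not loadable: {message}")
```

A check like this only covers the "cannot open shared object file" case. The second failure in the report, an undefined symbol, means the library was found but does not match what libarrow.so was built against, and this probe would not reveal that.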
Re: Recruiting more maintainers for Apache Arrow
Hi, On 02/07/2018 at 15:58, Wes McKinney wrote: > * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html > * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html Very good articles, but I would stress that some of the mechanisms proposed lack metrics in their favour. Two particular examples that I know about: 1) """ I seem to recall Martin van Loewis offering to review one externally contributed patch for every ten other patches reviewed by the submitter. (I can’t find the link, sorry!) This imposes work requirements on would-be contributors that obligate them to contribute substantively to the project maintenance, before their pet feature gets implemented. """ Martin's offer was almost never taken up, although he expressed it many times over many years. I think there are two factors to it: a) Cost. As an occasional contributor, I could understand having to do a review before contributing a patch of mine, but not having to do 5 or more reviews for each patch I contribute. The effort asked is much too high, and you're probably discouraging people who are discovering the project, even before they could get hooked on it. b) Difficulty. It's much more difficult and intimidating to review someone else's PR than to propose your own changes knowing that they will be reviewed by (you are assuming) competent people. So this mechanism is excluding first-time contributors, which is probably *not* what you want. 2) """ Some projects have excellent incubators, like the Python Core Mentorship Program, where people who are interested in applying their effort to recruiting new contributors can do so. """ Actually, it doesn't seem to me that a significant proportion of frequent Python contributors have gone through the core mentorship process. It probably got us a handful of one-time contributions. Pointing to the Python core mentorship program as an "excellent incubator" sounds rather far-fetched to me. 
Generally speaking, there's a limit to the usefulness of hand-holding contributors, especially if your project is rather complex (as Python is), because the blocking point for contributors is *not* that the development mailing-list is a bit intimidating (as was claimed by the people who founded the Python core mentorship program). PS: as a matter of fact, the general rate of contributions to Python has been *decreasing* for years. Regards Antoine.
Re: Recruiting more maintainers for Apache Arrow
Hi folks, I would like to highlight that the challenges we are having are endemic to many parts of the open source world right now. A colleague of mine in the Python world wrote some pieces about this recently: * http://ivory.idyll.org/blog/2018-how-open-is-too-open.html * http://ivory.idyll.org/blog/2018-oss-framework-cpr.html Here are some quotes from those pieces: "This need for constant attention to projects, the sprawling ecosystem of amazing scientific software packages, and the relatively small community of actual maintainers, when combined, lead to the open source sustainability problem in science: we do not have the person power to keep it all running without heroic efforts. And when you couple this with the lack of clear career paths for software maintenance in science, it is clear that we cannot ethically and sustainably recruit more people into open source maintainership." I would say that "heroics" does describe some of the occasional behavior of Arrow maintainers. The trouble with "heroics" (which translates practically speaking to "overwork") is that if sustained for a long period of time, it surely leads to burnout and depression. I can speak from personal experience. On a later point in this quote about "lack of clear career paths for software maintenance", rather than griping about the problem, I decided to do something about it. I have recently created a new organization so that I can a) enable organizations to directly fund Arrow maintenance and b) provide secure full-time employment to Arrow maintainers "Second, the cost of the constant maintenance needs (code, documentation, installation, etc.) on the pool of available effort needs to be taken into account. Contributions of new features that do not come with effort applied to maintenance should be carefully considered - is this new contributor likely to stick around? Can they and will they devote some effort to maintenance? 
If not, maybe those contributions should be deferred in favor of contributions that add maintenance effort to the project, e.g. via partnerships." I see both sides of this argument. I think we need to be more proactive about requesting maintenance help from "extractive" contributors who are mostly "taking" from the project and giving relatively little to support the overall health of the project. "Fourth, there are some interesting governance implications around allowing all or most of the resource appropriators to participate in decision making. I need to dig more into this, but, briefly, I think projects should formally lay out what level of investment and contribution is rewarded with what kind of operational, policy making, and constitutional decision making authority." Apache governance already provides a framework for obtaining decision making authority in a project. Suffice it to say, I would be hesitant to support a new PMC member who has not engaged on project maintenance. - Wes On Mon, Jul 2, 2018 at 7:03 AM, Antoine Pitrou wrote: > > Hi Dimitri, > > On 02/07/2018 at 12:46, Dimitri Vorona wrote: >> Hi Wes, >> >> to contribute an outsider's POV: while it is clear what's expected if you'd >> like to make a PR, it's not at all clear to me where I would start if I >> wanted to help with PR reviews without being heavily involved with the >> community/being a full maintainer. Should I just grab a PR, test it, >> comment on changes? I wouldn't be sure if I were stepping on someone's >> toes, tbh. > > You don't have to manually test a PR, unless you want to be sure about > semantics that are not part of the tests added in the PR (but then it > would be a good idea to mention that the tests don't exercise the > semantics enough :-)). 
> > From my point of view (generally as an open source developer and > maintainer, this isn't specific to Arrow), reviewing is: > > * checking for soundness of concepts (if the PR adds any of them) > * checking for maintainability and readability of code > * checking for smelly coding patterns, possible sources of bugs etc. > * depending on the context, checking for possible performance issues > * any potential problem that your personal expertise may help you detect > > If you're not sure about a comment and hesitate posting it, a good > solution is to phrase it as a question. > > Regards > > Antoine.
[jira] [Created] (ARROW-2782) [Python] Ongoing Travis CI failures in Plasma unit tests
Wes McKinney created ARROW-2782: --- Summary: [Python] Ongoing Travis CI failures in Plasma unit tests Key: ARROW-2782 URL: https://issues.apache.org/jira/browse/ARROW-2782 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Wes McKinney Fix For: 0.10.0 e.g. {code} _ test_use_huge_pages __ @pytest.mark.skipif(not os.path.exists("/mnt/hugepages"), reason="requires hugepage support") def test_use_huge_pages(): import pyarrow.plasma as plasma with plasma.start_plasma_store( plasma_store_memory=2*10**9, plasma_directory="/mnt/hugepages", use_hugepages=True) as (plasma_store_name, p): plasma_client = plasma.connect(plasma_store_name, "", 64) > create_object(plasma_client, 10**8) pyarrow/tests/test_plasma.py:773: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pyarrow/tests/test_plasma.py:79: in create_object seal=seal) pyarrow/tests/test_plasma.py:68: in create_object_with_id memory_buffer = client.create(object_id, data_size, metadata) pyarrow/_plasma.pyx:300: in pyarrow._plasma.PlasmaClient.create check_status(self.client.get().Create(object_id.data, data_size, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > raise PlasmaStoreFull(message) E PlasmaStoreFull: /home/travis/build/apache/arrow/cpp/src/plasma/client.cc:375 code: ReadCreateReply(buffer.data(), buffer.size(), &id, &object, &store_fd, &mmap_size) E object does not fit in the plasma store pyarrow/error.pxi:99: PlasmaStoreFull {code}
Re: Recruiting more maintainers for Apache Arrow
Hi Dimitri, On 02/07/2018 at 12:46, Dimitri Vorona wrote: > Hi Wes, > > to contribute an outsider's POV: while it is clear what's expected if you'd > like to make a PR, it's not at all clear to me where I would start if I > wanted to help with PR reviews without being heavily involved with the > community/being a full maintainer. Should I just grab a PR, test it, > comment on changes? I wouldn't be sure if I were stepping on someone's > toes, tbh. You don't have to manually test a PR, unless you want to be sure about semantics that are not part of the tests added in the PR (but then it would be a good idea to mention that the tests don't exercise the semantics enough :-)). From my point of view (generally as an open source developer and maintainer, this isn't specific to Arrow), reviewing is: * checking for soundness of concepts (if the PR adds any of them) * checking for maintainability and readability of code * checking for smelly coding patterns, possible sources of bugs etc. * depending on the context, checking for possible performance issues * any potential problem that your personal expertise may help you detect If you're not sure about a comment and hesitate posting it, a good solution is to phrase it as a question. Regards Antoine.
Re: Recruiting more maintainers for Apache Arrow
Hi Wes, to contribute an outsider's POV: while it is clear what's expected if you'd like to make a PR, it's not at all clear to me where I would start if I wanted to help with PR reviews without being heavily involved with the community/being a full maintainer. Should I just grab a PR, test it, comment on changes? I wouldn't be sure if I were stepping on someone's toes, tbh. So, in my view it would help if: * there were some kind of informal reviewer assignment system, i.e. I say "I'd like to review this PR", Wes/Uwe/Antoine reply: "sure, give it a shot". This would be mentioned prominently in the contributor guide * afterwards there were some kind of feedback-to-feedback arrangement, although it would increase the workload for the existing maintainers in the short term, of course Cheers, Dimitri. On Sun, Jul 1, 2018 at 1:09 AM Donald E. Foss wrote: > For what it's worth, this email thread and your summary writeup, Wes, are > a significant call to action on their own. > > I've been passive, not by choice, but by policy. Given the significance > and need of this project, I'll see what I can do on my side. It will be at > least a week given the US holiday. > > Donald E. Foss > > > On Jun 30, 2018, at 2:15 PM, Marco Neumann > wrote: > > > > Hey, > > > > first of all, thanks a lot for your, Uwe's, the mergers' and contributors' > > work. Now, to the maintainer problem: > > > > # Arrow as "a library" > > One thing that makes Arrow special is that it is not a single, but many > > libraries (one for each language) and many of them are not only a > > binding to a C/C++ lib, but partly a complete re-implementation of the > > protocol, e.g.: > > > > - C++: one core, but also contains Python specialties > > - Java: another core > > - Rust: yet another core > > - Python: a binding to C++ but also a lot more stuff because of Pandas > > ... 
> > > > And you two are maintaining all of them and I doubt that you have the > > capacities and knowledge to do this at the desired level of quality > > (which is natural, not a personal issue or offense). So this I would > > call "pseudo-maintenance", since you're solely the gatekeeper that does > > some shallow reviewing and has the burden to do the housekeeping and > > the merging. So why accept these language bindings in the first > > place without bringing a core maintainer in place? For example, let's > > say someone proposes a binding to Haskell now. That should not be > > accepted as part of the official Apache implementation without a > > dedicated maintainer (ideally the PR-author would be that person, but > > there may be others who step up). > > > > Right now, it might be too late to remove some of the incomplete / WIP > > implementations that don't have a core maintainer though. > > > > # GitHub > > Another special thing to consider is that Arrow is (ab)using GitHub as > > a code hosting platform. Even as a contributor, this has obvious bad, > > uncool consequences: > > > > - you have yet another issue hosting system to log in to > > - there is yet another information channel to keep track of (this ML > > for example, which has a semi-informative web interface telling you that > > you can only log in using Google but does not tell you how to subscribe to > > the list) > > - links to issues don't work in the known magic way > > - you're merging the PRs by closing them; which is by all means a not > > very nice way because it does not reflect the contributors' work in > > the project overview and personal profiles, but exactly this is a > > large part of the GitHub community (btw: merging PRs without using > > GitHub's merge button IS possible as bors/bors-ng prove) > > > > So as a potential maintainer, this is already a bummer, since I know > > that there are things less comfortable than the system I would get from > > any normal GitHub or GitLab project. 
> > > > I'm not really sure how to solve this or if it should be solved (read > > about the laziness aspect in "Contribution VS Maintenance" below) > > > > # Time / Payment > > Yes, this is indeed a big issue. From what I can tell from the open > > source projects I was involved in, for large contributor crowds > > you normally have full/half-time positions in place for the core > > maintainer (look at the Mozilla projects, the Blender Foundation, Gnome > > / Red Hat). So at one point I think maintaining isn't a part time / > > hobby thing anymore (w/o downgrading the hard work of > > hobby-contributors, quite the contrary). I don't have a link at hand, but I recall > > some discussion about GitHub and its importance for hiring (since > > it acts as a CV) after MS bought it, and some of the responses are > > "doing all this work in your free time is a privilege of wealthy, > > mostly-white men", which without signing this statement in this really > > bare form already shows a problem of the open source world. > > > > # Contribution VS Maintenance > > The very "nice" thing about patch/PR contribution is that you do your
[jira] [Created] (ARROW-2781) [Python] Download boost using curl in manylinux1 image
Uwe L. Korn created ARROW-2781: -- Summary: [Python] Download boost using curl in manylinux1 image Key: ARROW-2781 URL: https://issues.apache.org/jira/browse/ARROW-2781 Project: Apache Arrow Issue Type: Bug Components: Packaging, Python Affects Versions: 0.9.0 Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.10.0 This is the only artifact that we download with {{wget}}, which does not have the level of TLS support needed to talk to the bintray servers.
[jira] [Created] (ARROW-2780) [Go] Run code coverage analysis
Sebastien Binet created ARROW-2780: -- Summary: [Go] Run code coverage analysis Key: ARROW-2780 URL: https://issues.apache.org/jira/browse/ARROW-2780 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Sebastien Binet