[jira] [Created] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.
Uwe L. Korn created ARROW-2535:

Summary: [C++/Python] Provide pre-commit hooks that check flake8 et al.
Key: ARROW-2535
URL: https://issues.apache.org/jira/browse/ARROW-2535
Project: Apache Arrow
Issue Type: Task
Components: C++, Python
Reporter: Uwe L. Korn
Fix For: 0.10.0

We should provide pre-commit hooks that users can optionally install and that check e.g. flake8 and clang-format.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
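To make the request concrete, here is a minimal sketch of what such an optional hook could look like for the flake8 part only. It is not an existing Arrow script: the function names `staged_files`, `python_files` and `main` are made up for illustration, and it assumes `git` and `flake8` are on the PATH.

```python
import subprocess


def staged_files():
    """Return paths that are added, copied or modified in the git index."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        universal_newlines=True,
    )
    return [line for line in out.splitlines() if line]


def python_files(paths):
    """Keep only the paths that flake8 should look at."""
    return [p for p in paths if p.endswith(".py")]


def main():
    """Run flake8 on the staged Python files and return its exit status."""
    targets = python_files(staged_files())
    if not targets:
        return 0  # nothing to check, let the commit proceed
    # A non-zero status from flake8 makes git abort the commit.
    return subprocess.call(["flake8"] + targets)
```

To try it, one would save this as an executable `.git/hooks/pre-commit` file with a Python shebang line and add `raise SystemExit(main())` as its last line. A clang-format check could be added along the same lines by selecting the staged C++ files.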
[jira] [Created] (ARROW-2534) [C++] libarrow.so leaks zlib symbols
Antoine Pitrou created ARROW-2534:

Summary: [C++] libarrow.so leaks zlib symbols
Key: ARROW-2534
URL: https://issues.apache.org/jira/browse/ARROW-2534
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou

I get the following here:

{code:bash}
$ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
{code}
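The same check can be scripted, for instance to run it in CI. The following is only a sketch: `leaked_symbols` is a hypothetical helper, not part of Arrow, and it works on text already produced by `nm -D -C`. It keeps exported text symbols (type `T`) whose name does not contain "arrow", ignoring the linker-provided `_init` and `_fini`.

```python
def leaked_symbols(nm_output, allowed_substring="arrow", ignored=("_init", "_fini")):
    """Return exported text symbols that do not look like Arrow symbols."""
    leaked = []
    for line in nm_output.splitlines():
        # Defined symbols look like "<address> <type> <name>"; the demangled
        # name may itself contain spaces, so split at most twice.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # e.g. undefined symbols, which have no address
        _address, sym_type, name = parts
        if sym_type != "T":
            continue
        if allowed_substring in name or name in ignored:
            continue
        leaked.append(name)
    return leaked


SAMPLE = """\
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
"""

print(leaked_symbols(SAMPLE))  # ['adler32_z', 'crc32_z']
```

On the output quoted in the report, this leaves exactly the two zlib functions.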
Re: AppVeyor queue length measures
Hi,

On 01/05/2018 at 20:35, Uwe L. Korn wrote:
> Hello,
>
> as a heads-up, I have requested with INFRA [1] to enable the Rolling Builds
> feature [2] of AppVeyor. We never look at old commits and very often we look
> at our private AppVeyor instances, so these builds should not matter.

+1, sounds great!

> Another, more drastic measure would be to only build parts of the matrix on
> PRs (all on master).

You mean depending on the PR changes (that's ARROW-2516)? Or a static subset of the matrix?

I wonder why we have NMake builds in the matrix. Are they really important to some people? With Ninja + Visual Studio builds we should already have quite good coverage.

Regards

Antoine.
AppVeyor queue length measures
Hello,

as a heads-up, I have asked INFRA [1] to enable the Rolling Builds feature [2] of AppVeyor. We never look at old commits and very often we look at our private AppVeyor instances, so builds of superseded commits should not matter.

In addition, I'll shortly provide a PR that activates the fast finish flag for our builds [3] so that failing CI entries are stopped immediately.

Another, more drastic measure would be to only build parts of the matrix on PRs (all on master). I hope that Rolling Builds give us an emptier queue, but I'm not sure how this will evolve now that we also have Rust Windows builds. My feeling is that once we break AppVeyor, we break at least 50% of all CI entries. Thus a well-chosen subset of the tested entries should still catch 95% of all problems.

Uwe

[1] https://issues.apache.org/jira/browse/INFRA-16470
[2] https://www.appveyor.com/docs/build-configuration/#rolling-builds
[3] https://issues.apache.org/jira/browse/ARROW-2533
[jira] [Created] (ARROW-2533) [CI] Fast finish failing AppVeyor builds
Uwe L. Korn created ARROW-2533:

Summary: [CI] Fast finish failing AppVeyor builds
Key: ARROW-2533
URL: https://issues.apache.org/jira/browse/ARROW-2533
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn

The main AppVeyor queue takes very long to schedule jobs. One measure to improve this would be to fail a job immediately once a build is broken.
[jira] [Created] (ARROW-2532) [C++] Add chunked builder classes
Antoine Pitrou created ARROW-2532:

Summary: [C++] Add chunked builder classes
Key: ARROW-2532
URL: https://issues.apache.org/jira/browse/ARROW-2532
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou

I think it would be useful to have chunked builders for list, string and binary types. A chunked builder would produce a chunked array as output, circumventing the 32-bit offset limit of those types. There's some special-casing scattered around our NumPy conversion routines right now.
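As an illustration of the idea, and not of any existing Arrow API, a chunked builder for binary values could behave roughly like the sketch below. `ChunkedBinaryBuilder` and its `max_chunk_bytes` parameter are hypothetical names. Each chunk is represented as a plain `(offsets, data)` pair, and a new chunk is started whenever appending a value would push the data size past what a signed 32-bit offset can address.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


class ChunkedBinaryBuilder:
    """Accumulate byte strings into chunks whose offsets fit in 32 bits."""

    def __init__(self, max_chunk_bytes=INT32_MAX):
        self.max_chunk_bytes = max_chunk_bytes
        self._chunks = []
        self._offsets = [0]
        self._data = bytearray()

    def append(self, value):
        if len(value) > self.max_chunk_bytes:
            raise ValueError("single value exceeds the per-chunk capacity")
        # Start a new chunk if this value would overflow the current one.
        if len(self._data) + len(value) > self.max_chunk_bytes:
            self._flush()
        self._data += value
        self._offsets.append(len(self._data))

    def _flush(self):
        # Only emit a chunk if it holds at least one value.
        if len(self._offsets) > 1:
            self._chunks.append((self._offsets, bytes(self._data)))
        self._offsets = [0]
        self._data = bytearray()

    def finish(self):
        """Return the list of (offsets, data) chunks built so far."""
        self._flush()
        chunks, self._chunks = self._chunks, []
        return chunks


# A tiny capacity makes the chunking visible without gigabytes of data.
builder = ChunkedBinaryBuilder(max_chunk_bytes=8)
for v in [b"abc", b"defg", b"hi", b"jklmnop"]:
    builder.append(v)
chunks = builder.finish()
print(len(chunks))  # 3
```

With a chunk capacity of 8 bytes, the four values end up in three chunks, each with offsets that start again at 0. A caller never has to special-case the overflow, which is the point of moving that logic into the builder.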
[jira] [Created] (ARROW-2531) [C++] Update clang-format to 6.0
Uwe L. Korn created ARROW-2531:

Summary: [C++] Update clang-format to 6.0
Key: ARROW-2531
URL: https://issues.apache.org/jira/browse/ARROW-2531
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
Re: [Format] Pointer types / span types
IIUC, the point is to have different logical views over the same data. So you could have e.g. a "sorted" view. You could also have a view spanning a tiny fraction of the original data (you can probably also encode that with a null bitmap, but if most values are nulls that is less efficient).

Regards

Antoine.

On 01/05/2018 at 15:24, Brian Hulette wrote:
> Yeah I see that difference. I guess my question was really - is there a
> reason not to re-arrange the actual list data so that an offset array
> will work?
>
> Perhaps they actually want to be able to specify lists with overlap? Or
> maybe there is meaning to the original order of the list data? I suppose
> that latter option seems more likely.
>
> Brian
>
>
> On 04/30/2018 05:42 PM, Antoine Pitrou wrote:
>> On 30/04/2018 at 23:39, Brian Hulette wrote:
>>> Yes my first reaction to both of these requests is
>>> - would dictionary-encoding work?
>>> - would a List work?
>>>
>>> I think for the former the analogy is more clear, for the latter,
>>> technically a List encodes start and stop indices with an offset array
>>> rather than separate arrays for start and stop indices. Is there a
>>> reason an offset array wouldn't work for the OAMap use-case though?
>> With an offsets array, spans (lists) are contiguous: span N + 1 starts
>> off where span N stops. With separate start/stops array, they needn't
>> be: the logical array can "walk" the physical array in any order.
>>
>> Regards
>>
>> Antoine.
>
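The difference discussed in this thread can be shown with plain Python lists. This is only an illustration of the two encodings, not Arrow or OAMap code, and the helper names `lists_from_offsets` and `lists_from_spans` are made up.

```python
values = [10, 20, 30, 40, 50]


def lists_from_offsets(values, offsets):
    """Decode lists from one offsets array: list i is values[offsets[i]:offsets[i+1]]."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


def lists_from_spans(values, starts, stops):
    """Decode lists from separate start and stop arrays."""
    return [values[a:b] for a, b in zip(starts, stops)]


# With offsets, each list begins where the previous one ended.
contiguous = lists_from_offsets(values, [0, 2, 5])
print(contiguous)  # [[10, 20], [30, 40, 50]]

# With starts/stops, lists may be reordered, overlap, or skip data.
spans = lists_from_spans(values, starts=[3, 0, 1], stops=[5, 2, 4])
print(spans)  # [[40, 50], [10, 20], [20, 30, 40]]
```

The second call shows both cases raised above: the lists come out in a different order than the physical data, and the last two overlap on the value 20, neither of which a single offsets array can express without rearranging or copying the values.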
[jira] [Created] (ARROW-2530) [GLib] Out-of-source build is failed
Kouhei Sutou created ARROW-2530:

Summary: [GLib] Out-of-source build is failed
Key: ARROW-2530
URL: https://issues.apache.org/jira/browse/ARROW-2530
Project: Apache Arrow
Issue Type: Bug
Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
Fix For: 0.10.0
Re: [C++] Policy for "internal" and "detail" namespaces
Hello Antoine,

I don't think we have a policy for that yet, so it's all up to personal preference. My personal preference is normally to call this namespace `arrow::internal::impl`. Alternatively, one could nest it inside the class as `arrow::internal::ThreadPool::Impl`, but that is probably only a good naming scheme for PIMPLs.

Uwe

On Tue, May 1, 2018, at 12:13 PM, Antoine Pitrou wrote:
>
> Hello,
>
> What's our policy for namespaces in C++?
>
> For example, in GitHub PR 1953, I define an API in the "arrow::internal"
> namespace as it's meant for internal use by Arrow, and then I have a
> helper class in the "arrow::internal::detail" namespace as it's really
> an implementation detail of the aforementioned internal API (it's used
> for template instantiation, so it can't go in the .cpp file, sadly).
> Does that sound like the right way to go?
>
> Regards
>
> Antoine.
[C++] Policy for "internal" and "detail" namespaces
Hello,

What's our policy for namespaces in C++?

For example, in GitHub PR 1953, I define an API in the "arrow::internal" namespace as it's meant for internal use by Arrow, and then I have a helper class in the "arrow::internal::detail" namespace as it's really an implementation detail of the aforementioned internal API (it's used for template instantiation, so it can't go in the .cpp file, sadly). Does that sound like the right way to go?

Regards

Antoine.
[jira] [Created] (ARROW-2529) [C++] Update mention of clang-format to 5.0 in the docs
Alessandro Andrioni created ARROW-2529:

Summary: [C++] Update mention of clang-format to 5.0 in the docs
Key: ARROW-2529
URL: https://issues.apache.org/jira/browse/ARROW-2529
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Documentation
Reporter: Alessandro Andrioni

The C++ README.md talks about requiring clang-format 4.0, while the current required version is 5.0.