[jira] [Created] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2535:
--

 Summary: [C++/Python] Provide pre-commit hooks that check flake8 et al.
 Key: ARROW-2535
 URL: https://issues.apache.org/jira/browse/ARROW-2535
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


We should provide pre-commit hooks that users can (optionally) install to 
check, e.g., flake8 and clang-format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2534) [C++] libarrow.so leaks zlib symbols

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2534:
-

 Summary: [C++] libarrow.so leaks zlib symbols
 Key: ARROW-2534
 URL: https://issues.apache.org/jira/browse/ARROW-2534
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I get the following here:

{code:bash}
$ nm -D -C /home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.0.0.0 | \grep ' T ' | \grep -v arrow
0025bc8c T adler32_z
0025c4c9 T crc32_z
002ad638 T _fini
00078ab8 T _init
{code}





Re: AppVeyor queue length measures

2018-05-01 Thread Antoine Pitrou

Hi,

On 01/05/2018 at 20:35, Uwe L. Korn wrote:
> Hello,
> 
> as a heads-up, I have requested with INFRA [1] to enable the Rolling Builds 
> feature [2] of AppVeyor. We never look at old commits and very often we look 
> at our private AppVeyor instances, so these builds should not matter.

+1, sounds great!

> Another, more drastic measure would be to only build parts of the matrix on 
> PRs (all on master).

You mean depending on the PR changes (that's ARROW-2516)?  Or a static
subpart of the matrix?

I wonder why we have NMake builds in the matrix.  Are they really
important to some people?  With Ninja + Visual Studio builds we should
already have quite good coverage.

Regards

Antoine.


AppVeyor queue length measures

2018-05-01 Thread Uwe L. Korn
Hello,

As a heads-up, I have asked INFRA [1] to enable AppVeyor's Rolling Builds 
feature [2]. We never look at builds of old commits, and very often we look at 
our private AppVeyor instances, so these builds should not matter. In addition, 
I'll shortly open a PR that activates the fast finish flag for our builds [3] 
so that failing CI entries are stopped immediately.

Another, more drastic measure would be to build only part of the matrix on PRs 
(and the full matrix on master). I hope that Rolling Builds gives us a shorter 
queue, but I'm not sure how this will evolve now that we also have Rust Windows 
builds. My feeling is that when a change breaks AppVeyor, it breaks at least 50% 
of all CI entries, so a good selection of tested entries should still catch 95% 
of all problems.

Uwe

[1] https://issues.apache.org/jira/browse/INFRA-16470
[2] https://www.appveyor.com/docs/build-configuration/#rolling-builds
[3] https://issues.apache.org/jira/browse/ARROW-2533


[jira] [Created] (ARROW-2533) [CI] Fast finish failing AppVeyor builds

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2533:
--

 Summary: [CI] Fast finish failing AppVeyor builds
 Key: ARROW-2533
 URL: https://issues.apache.org/jira/browse/ARROW-2533
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


The main AppVeyor queue is taking a very long time to schedule jobs. One 
measure to improve this would be to stop a build immediately once one of its 
jobs fails.





[jira] [Created] (ARROW-2532) [C++] Add chunked builder classes

2018-05-01 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2532:
-

 Summary: [C++] Add chunked builder classes
 Key: ARROW-2532
 URL: https://issues.apache.org/jira/browse/ARROW-2532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


I think it would be useful to have chunked builders for list, string and binary 
types. A chunked builder would produce a chunked array as output, circumventing 
the 32-bit offset limit of those types. Right now, there is some special-casing 
for this scattered around our NumPy conversion routines.
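A minimal sketch of the idea in plain C++, assuming a hypothetical `ChunkedStringBuilder` and `StringChunk` (neither is part of Arrow's API): each chunk stores its values with 32-bit offsets, and the builder starts a new chunk whenever the next value would push the current chunk's data past the offset limit, so the result is a list of chunks rather than one overflowing array.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is not Arrow's builder API.
// One chunk of a string column: 32-bit offsets into one contiguous buffer.
struct StringChunk {
  std::vector<int32_t> offsets{0};  // length() + 1 entries, starting at 0
  std::string data;                 // concatenated value bytes
  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

class ChunkedStringBuilder {
 public:
  // By default a chunk may hold as many bytes as a 32-bit offset can address.
  explicit ChunkedStringBuilder(
      int64_t max_chunk_bytes = std::numeric_limits<int32_t>::max())
      : max_chunk_bytes_(max_chunk_bytes), chunks_(1) {
    if (max_chunk_bytes_ < 0 ||
        max_chunk_bytes_ > std::numeric_limits<int32_t>::max()) {
      throw std::invalid_argument("chunk size must fit in a 32-bit offset");
    }
  }

  void Append(const std::string& value) {
    const int64_t size = static_cast<int64_t>(value.size());
    if (size > max_chunk_bytes_) {
      throw std::length_error("value does not fit in a single chunk");
    }
    StringChunk* chunk = &chunks_.back();
    // Start a new chunk instead of overflowing the current chunk's offsets.
    if (static_cast<int64_t>(chunk->data.size()) + size > max_chunk_bytes_) {
      chunks_.emplace_back();
      chunk = &chunks_.back();
    }
    chunk->data += value;
    chunk->offsets.push_back(static_cast<int32_t>(chunk->data.size()));
  }

  // Returns all chunks (together: one logical chunked column) and resets.
  std::vector<StringChunk> Finish() {
    std::vector<StringChunk> out = std::move(chunks_);
    chunks_.assign(1, StringChunk{});
    return out;
  }

 private:
  int64_t max_chunk_bytes_;
  std::vector<StringChunk> chunks_;
};
```

With a deliberately tiny limit such as `ChunkedStringBuilder builder(8)`, appending "abcde", "fgh" and "ij" yields two chunks: "abcdefgh" with offsets {0, 5, 8}, then "ij" with offsets {0, 2}. Each chunk plays the role of one piece of the chunked array the issue describes.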





[jira] [Created] (ARROW-2531) [C++] Update clang-format to 6.0

2018-05-01 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2531:
--

 Summary: [C++] Update clang-format to 6.0
 Key: ARROW-2531
 URL: https://issues.apache.org/jira/browse/ARROW-2531
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn








Re: [Format] Pointer types / span types

2018-05-01 Thread Antoine Pitrou

IIUC, the point is to have different logical views over the same data.
So you could have e.g. a "sorted" view.  You could also have a view
spanning a tiny fraction of the original data (you can probably also
encode that with a null bitmap, but if most values are nulls that is
less efficient).

Regards

Antoine.


On 01/05/2018 at 15:24, Brian Hulette wrote:
> Yeah I see that difference. I guess my question was really - is there a 
> reason not to re-arrange the actual list data so that an offset array 
> will work?
> 
> Perhaps they actually want to be able to specify lists with overlap? Or 
> maybe there is meaning to the original order of the list data? I suppose 
> that latter option seems more likely.
> 
> Brian
> 
> 
> On 04/30/2018 05:42 PM, Antoine Pitrou wrote:
>> On 30/04/2018 at 23:39, Brian Hulette wrote:
>>> Yes my first reaction to both of these requests is
>>> - would dictionary-encoding work?
>>> - would a List work?
>>>
>>> I think for the former the analogy is more clear, for the latter,
>>> technically a List encodes start and stop indices with an offset array
>>> rather than separate arrays for start and stop indices. Is there a
>>> reason an offset array wouldn't work for the OAMap use-case though?
>> With an offsets array, spans (lists) are contiguous: span N + 1 starts
>> off where span N stops.  With separate start/stops array, they needn't
>> be: the logical array can "walk" the physical array in any order.
>>
>> Regards
>>
>> Antoine.
> 
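The difference discussed in this thread, between an offsets array and separate start/stop arrays, can be sketched in plain C++. The struct and function names below are hypothetical and do not correspond to Arrow's format definitions: with offsets, list i is values[offsets[i], offsets[i + 1]), so consecutive lists are contiguous; with start/stop arrays, spans can visit the same physical values in any order, overlap, or skip values, which is what makes reordered (e.g. "sorted") views over unchanged data possible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layouts for illustration; not Arrow's format definitions.

// Offsets layout: list i covers values[offsets[i], offsets[i + 1]).
// List i + 1 always starts where list i stops.
struct OffsetsList {
  std::vector<int32_t> offsets;  // number of lists + 1 entries
  std::vector<int> values;
};

// Start/stop layout: span i covers values[starts[i], stops[i]).
// Spans may come in any order and may overlap or skip physical values.
struct StartStopSpans {
  std::vector<int32_t> starts;
  std::vector<int32_t> stops;      // same size as starts
  const std::vector<int>* values;  // shared physical data, not copied
};

std::vector<int> ListAt(const OffsetsList& list, std::size_t i) {
  return std::vector<int>(list.values.begin() + list.offsets[i],
                          list.values.begin() + list.offsets[i + 1]);
}

std::vector<int> SpanAt(const StartStopSpans& spans, std::size_t i) {
  return std::vector<int>(spans.values->begin() + spans.starts[i],
                          spans.values->begin() + spans.stops[i]);
}
```

For values {1, 2, 3, 4, 5}, offsets {0, 2, 5} can only describe [1, 2] followed by [3, 4, 5]. Starts {2, 0, 1} with stops {5, 2, 4} over the same values describe [3, 4, 5], [1, 2] and [2, 3, 4]: the first two lists in reverse order plus an overlapping span, with no data moved.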


[jira] [Created] (ARROW-2530) [GLib] Out-of-source build is failed

2018-05-01 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2530:
---

 Summary: [GLib] Out-of-source build is failed 
 Key: ARROW-2530
 URL: https://issues.apache.org/jira/browse/ARROW-2530
 Project: Apache Arrow
  Issue Type: Bug
  Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0








Re: [C++] Policy for "internal" and "detail" namespaces

2018-05-01 Thread Uwe L. Korn
Hello Antoine,

I don't think we have a policy for that yet, so it's all down to personal 
preference. Mine is normally to call this namespace `arrow::internal::impl`. 
Alternatively, one could also nest it inside the class, as in 
`arrow::internal::ThreadPool::Impl`, but that is probably only a good naming 
scheme for PIMPLs.

Uwe

On Tue, May 1, 2018, at 12:13 PM, Antoine Pitrou wrote:
> 
> Hello,
> 
> What's our policy for namespaces in C++?
> 
> For example, in GitHub PR 1953, I define an API in the "arrow::internal"
> namespace as it's meant for internal use by Arrow, and then I have a
> helper class in the "arrow::internal::detail" namespace as it's really
> an implementation detail of the aforementioned internal API (it's used
> for template instantiation, so it can't go in the .cpp file, sadly).
> Does that sound like the right way to go?
> 
> Regards
> 
> Antoine.
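For reference, the class-nested PIMPL naming scheme can be sketched as follows. The names `mylib::internal::Counter` and its members are hypothetical (this is not Arrow's `ThreadPool`): the public class only forward-declares a nested `Impl`, and `Counter::Impl` is defined where the member functions are implemented, normally in the .cpp file (both parts are shown in one block here).

```cpp
#include <memory>
#include <string>
#include <utility>

namespace mylib {
namespace internal {

// "Header" part: callers only see a forward declaration of the nested Impl.
class Counter {
 public:
  explicit Counter(std::string name);
  ~Counter();  // defined below, once Impl is a complete type
  void Increment();
  int value() const;
  const std::string& name() const;

 private:
  class Impl;  // hypothetical; follows the Class::Impl naming scheme
  std::unique_ptr<Impl> impl_;
};

// ".cpp" part: the nested implementation class is defined out of line.
class Counter::Impl {
 public:
  explicit Impl(std::string name) : name_(std::move(name)) {}
  std::string name_;
  int value_ = 0;
};

Counter::Counter(std::string name) : impl_(new Impl(std::move(name))) {}
Counter::~Counter() = default;
void Counter::Increment() { ++impl_->value_; }
int Counter::value() const { return impl_->value_; }
const std::string& Counter::name() const { return impl_->name_; }

}  // namespace internal
}  // namespace mylib
```

Nesting keeps the implementation type scoped to its single owner, which is why this scheme suits PIMPLs; a free-standing helper shared by several functions fits better in a sub-namespace such as `impl` or `detail`.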


[C++] Policy for "internal" and "detail" namespaces

2018-05-01 Thread Antoine Pitrou

Hello,

What's our policy for namespaces in C++?

For example, in GitHub PR 1953, I define an API in the "arrow::internal"
namespace as it's meant for internal use by Arrow, and then I have a
helper class in the "arrow::internal::detail" namespace as it's really
an implementation detail of the aforementioned internal API (it's used
for template instantiation, so it can't go in the .cpp file, sadly).
Does that sound like the right way to go?

Regards

Antoine.
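A minimal sketch of the layout described above, with hypothetical names (`mylib`, `SumAll`, `detail::Accumulator`) standing in for the actual code in the PR. Because the internal API is a template, its helper is instantiated in every translation unit that uses it and therefore has to live in the header; the `detail` sub-namespace signals that it is not meant to be called directly.

```cpp
#include <vector>

namespace mylib {
namespace internal {
namespace detail {

// Implementation detail of internal::SumAll. It sits in the header because
// it is instantiated for every element type that SumAll is used with.
template <typename T>
struct Accumulator {
  T total{};
  void Add(const T& value) { total += value; }
};

}  // namespace detail

// Internal API: meant for use inside the library, not by end users.
template <typename T>
T SumAll(const std::vector<T>& values) {
  detail::Accumulator<T> acc;
  for (const T& v : values) acc.Add(v);
  return acc.total;
}

}  // namespace internal
}  // namespace mylib
```

Callers write `mylib::internal::SumAll(v)` and never name `detail::Accumulator`, so the helper can change freely as long as the internal API keeps its contract.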


[jira] [Created] (ARROW-2529) [C++] Update mention of clang-format to 5.0 in the docs

2018-05-01 Thread Alessandro Andrioni (JIRA)
Alessandro Andrioni created ARROW-2529:
--

 Summary: [C++] Update mention of clang-format to 5.0 in the docs
 Key: ARROW-2529
 URL: https://issues.apache.org/jira/browse/ARROW-2529
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Alessandro Andrioni


The C++ README.md says that clang-format 4.0 is required, while the currently 
required version is 5.0.


