[jira] [Created] (ARROW-5459) [Go] implement Stringer for Float16 (array+dtype)

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5459:
--

 Summary: [Go] implement Stringer for Float16 (array+dtype)
 Key: ARROW-5459
 URL: https://issues.apache.org/jira/browse/ARROW-5459
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5460) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5460:
---

 Summary: [Java] Add micro-benchmarks for Float8Vector and 
allocators
 Key: ARROW-5460
 URL: https://issues.apache.org/jira/browse/ARROW-5460
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Over the past few days, we have been working on some performance-related issues. In 
the process, we created a number of performance benchmarks to help us verify the 
results.

We now want to add these micro-benchmarks to the code base, in the hope that 
they will help inform performance-related decisions and guard against 
performance regressions. 





[jira] [Created] (ARROW-5461) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5461:
---

 Summary: [Java] Add micro-benchmarks for Float8Vector and 
allocators
 Key: ARROW-5461
 URL: https://issues.apache.org/jira/browse/ARROW-5461
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Over the past few days, we have been working on some performance-related issues. In 
the process, we created a number of performance benchmarks to help us verify the 
results.

We now want to add these micro-benchmarks to the code base, in the hope that 
they will help inform performance-related decisions and guard against 
performance regressions. 





Re: [DISCUSS] PR Backlog reduction

2019-05-31 Thread Fan Liya
@Antoine Pitrou, you mean the titles of JIRA/PR should be chosen carefully?

Best,
Liya Fan

On Fri, May 31, 2019 at 12:03 AM Antoine Pitrou  wrote:

>
> One of the aspects of the problem is that our tools (Github, JIRA) don't
> allow us to work with categories easily.
>
> Regards
>
> Antoine.
>
>
> Le 30/05/2019 à 15:59, Wes McKinney a écrit :
> > They're complementary. At least in the short term the spreadsheet can
> > help us get our current backlog under control. I'd like to at least be
> > thinking about tools that can help us when patch volume inevitably
> > grows to 2-3 times the current level.
> >
> > On Thu, May 30, 2019 at 12:28 AM Micah Kornfield 
> wrote:
> >>
> >> That sounds great Wes, thank you and your team for taking it on.
> >>
> >> Can you clarify, if you would prefer this approach to the one I
> proposed above (i.e. should I delete the spreadsheet) or are they
> complementary?
> >>
> >> Thanks,
> >> Micah
> >>
> >> On Wed, May 29, 2019 at 12:07 PM Wes McKinney 
> wrote:
> >>>
> >>> On the call today we discussed possibly repurposing the Spark PR
> >>> dashboard application for our use
> >>>
> >>> * https://github.com/databricks/spark-pr-dashboard
> >>> * https://spark-prs.appspot.com/
> >>>
> >>> This is a project that my team could take on this year sometime
> >>>
> >>> On Wed, May 29, 2019 at 4:12 AM Fan Liya  wrote:
> 
>  Sounds like a great idea. I am interested in Java PRs.
> 
>  Best,
>  Liya Fan
> 
>  On Wed, May 29, 2019 at 1:28 PM Micah Kornfield <
> emkornfi...@gmail.com>
>  wrote:
> 
> > Sorry for the delay.  I created
> >
> >
> https://docs.google.com/spreadsheets/d/146lDg11c5ohgVkrOglrb42a1JB0Gm1qBRbnoDlvB8QY/edit#gid=0
> > as
> > simple way to distribute old PRs if you are interested in helping,
> please
> > add a comment under the language and I'll add you.
> >
> > PMC/Committers, I can share edit access if you let me know which
> e-mail
> > account I should grant access to.
> >
> > Thanks,
> > Micah
> >
> > On Tue, May 21, 2019 at 9:22 PM Micah Kornfield <
> emkornfi...@gmail.com>
> > wrote:
> >
> >> I agree on hand curation for now.
> >>
> >>  I'll try to setup a sign up spreadsheet for shepherding old PRs
> and once
> >> that done assign reviewers/ping old PRs.  I expect to have
> something to
> >> share by the weekend.
> >>
> >> On Tuesday, May 21, 2019, Wes McKinney  wrote:
> >>
> >>> I think maintainers or contributors should be responsible for
> closing
> >>> PRs, it also helps with backlog curation (sometimes when a stale
> PR is
> >>> closed the JIRA may also be closed if it's a Won't Fix)
> >>>
> >>> On Tue, May 21, 2019 at 1:12 PM Antoine Pitrou  >
> >>> wrote:
> 
> 
> 
>  Le 21/05/2019 à 20:02, Neal Richardson a écrit :
> > Automatically close stale PRs? https://github.com/probot/stale
> 
>  That doesn't sound like a good idea to me.
> 
>  Regards
> 
>  Antoine.
> >>>
> >>
> >
>


Re: [DISCUSS] PR Backlog reduction

2019-05-31 Thread Antoine Pitrou


Le 31/05/2019 à 10:04, Fan Liya a écrit :
> @Antoine Pitrou, you mean the titles of JIRA/PR should be chosen carefully?

That helps a bit for visual filtering, but visual filtering quickly
becomes inefficient if there are too many issues.

No, I mean having views that only display certain kinds of issues (for
example I'm interested in C++ and Python issues, not so much the other
kinds).

We also currently don't record any interesting information wrt.
priority.  So if you take a look at the "Issues to do" list for 0.14.0,
for example, you get an (unordered?) list of 252 issues, which is
*truncated* by the stupid JIRA software:
https://issues.apache.org/jira/projects/ARROW/versions/12344925

... and when you click on "View in Issue Navigator" to get the
untruncated list, you get an unhelpful text box with a SQL-query and no
easy way to refine your search.  Not to mention the annoying pagination
with tiny links at the bottom middle-left, and the general UI slowness.

Regards

Antoine.


[jira] [Created] (ARROW-5462) [Go] support writing zero-length List

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5462:
--

 Summary: [Go] support writing zero-length List
 Key: ARROW-5462
 URL: https://issues.apache.org/jira/browse/ARROW-5462
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet








[jira] [Created] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-05-31 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5463:
-

 Summary: [Rust] Implement AsRef for Buffer
 Key: ARROW-5463
 URL: https://issues.apache.org/jira/browse/ARROW-5463
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Implement AsRef ArrowNativeType for Buffer





[jira] [Created] (ARROW-5464) [Archery] Bad --benchmark-filter default

2019-05-31 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5464:
-

 Summary: [Archery] Bad --benchmark-filter default
 Key: ARROW-5464
 URL: https://issues.apache.org/jira/browse/ARROW-5464
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








Re: [DISCUSS] PR Backlog reduction

2019-05-31 Thread Wes McKinney
On Fri, May 31, 2019 at 3:21 AM Antoine Pitrou  wrote:
>
>
> Le 31/05/2019 à 10:04, Fan Liya a écrit :
> > @Antoine Pitrou, you mean the titles of JIRA/PR should be chosen carefully?
>
> That helps a bit for visual filtering, but visual filtering quickly
> becomes inefficient if there are too many issues.
>
> No, I mean having views that only display certain kinds of issues (for
> example I'm interested in C++ and Python issues, not so much the other
> kinds).
>
> We also currently don't record any interesting information wrt.
> priority.  So if you take a look at the "Issues to do" list for 0.14.0,
> for example, you get an (unordered?) list of 252 issues, which is
> *truncated* by the stupid JIRA software:
> https://issues.apache.org/jira/projects/ARROW/versions/12344925
>
> ... and when you click on "View in Issue Navigator" to get the
> untruncated list, you get an unhelpful text box with a SQL-query and no
> easy way to refine your search.  Not to mention the annoying pagination
> with tiny links at the bottom middle-left, and the general UI slowness.
>

That is annoying. I always start from the unfiltered Basic issue view with

Project = ARROW
Components = C++, Python

(maybe with Fix Version = 0.14.0)

You can then click "Save As" in the dropdown toward the top of the
page and save this search so you can navigate to it easily next time.
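For reference, the filter described above corresponds roughly to the following JQL (the exact component names depend on the project's JIRA configuration; this is a sketch, not a saved filter from the project):

```
project = ARROW AND component in ("C++", Python) AND fixVersion = 0.14.0
ORDER BY priority DESC
```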

> Regards
>
> Antoine.


[jira] [Created] (ARROW-5465) [Crossbow] Support writing job definition to a file on submit

2019-05-31 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5465:
--

 Summary: [Crossbow] Support writing job definition to a file on 
submit 
 Key: ARROW-5465
 URL: https://issues.apache.org/jira/browse/ARROW-5465
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


In a similar fashion to what archery benchmark does. This is required to consume 
the command's output from a buildbot build step.





[JS] Proposal for numeric vector `from` functions

2019-05-31 Thread Brian Hulette
I think the current behavior of `from` functions on IntVector and
FloatVector can be quite confusing for new arrow users. The current
behavior can be summarized as:
- if the argument is any type of TypedArray (including one of a mismatched
type), create a new vector backed by that array's buffer.
- otherwise, treat it as an iterable of numbers, and convert them as needed
- ... unless we're making an Int64Vector, then treat each input as a 32-bit
number and pack pairs together

This can give users very unexpected results. For example, you might expect
arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0])) to yield a vector
with the values [1,2,3] - but it doesn't, it gives you the integers that
result from re-interpreting that buffer of floating point numbers as
integers.

I put together a notebook with some more examples of this confusing
behavior, compared to TypedArray.from:
https://observablehq.com/d/6aa80e43b5a97361
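The reinterpretation pitfall is not specific to Arrow's JS bindings; the difference between converting values and reinterpreting a buffer can be sketched with Python's stdlib `array` module (a minimal analogy for illustration, not Arrow code):

```python
import array

floats = array.array('f', [1.0, 2.0, 3.0])   # 32-bit IEEE-754 floats

# Converting element by element (what users usually expect from `from`):
converted = [int(x) for x in floats]
print(converted)            # [1, 2, 3]

# Reinterpreting the same bytes as 32-bit integers (analogous to what
# Int32Vector.from currently does with a mismatched TypedArray):
reinterpreted = array.array('i')
reinterpreted.frombytes(floats.tobytes())
print(list(reinterpreted))  # [1065353216, 1073741824, 1077936128]
```

The second output is just the raw bit patterns of 1.0f, 2.0f, and 3.0f read back as integers, which is exactly the kind of surprise described above.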

I'd like to propose that we re-write these from functions with the
following behavior:
- iff the argument is an ArrayBuffer or a TypedArray of the same numeric
type, create a new vector backed by that array's buffer.
- otherwise, treat it as an iterable of numbers and convert to the
appropriate type.
- no exceptions for Int64

If users really want to preserve the current behavior and use a
TypedArray's memory directly without converting, even when the types are
mismatched, they can still just access the underlying ArrayBuffer and pass
that in. So arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0])) would
yield a vector with [1,2,3], but you could still use
arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0]).buffer) to
replicate the current behavior.

Removing the special case for Int64 does make it a little easier to shoot
yourself in the foot by exceeding JS numbers' 53-bit precision, so maybe we
should mitigate that somehow, but I don't think combining pairs of numbers
is the right way to do that. Maybe a warning?
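The 53-bit concern can be demonstrated with any IEEE-754 double; JS numbers use the same representation, so the following Python sketch shows the exact boundary:

```python
# JS numbers are IEEE-754 doubles: integers above 2**53 are not all
# representable, so increments can silently vanish.
big = float(2**53)
print(big + 1 == big)                      # True: 2**53 + 1 rounds back down
print(float(2**53 - 1) - 1 == 2**53 - 2)   # True: below 2**53 arithmetic is exact
```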

What do you all think? If there's consensus on this I'd like to make the
change prior to 0.14 to minimize the number of releases with the current
behavior.

Brian


[jira] [Created] (ARROW-5466) [Java] Combine Java CI builds into a common build with multiple JDKs

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5466:
---

 Summary: [Java] Combine Java CI builds into a common build with 
multiple JDKs
 Key: ARROW-5466
 URL: https://issues.apache.org/jira/browse/ARROW-5466
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Wes McKinney
 Fix For: 0.14.0


The JDK 9 and 11 builds are fast -- 4 minutes each. It would probably be more 
efficient to run all 3 JDK builds in a single build entry.





[jira] [Created] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5468:
--

 Summary: [Go] implement read/write IPC for Timestamp arrays
 Key: ARROW-5468
 URL: https://issues.apache.org/jira/browse/ARROW-5468
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5467:
--

 Summary: [Go] implement read/write IPC for Time32/Time64 arrays
 Key: ARROW-5467
 URL: https://issues.apache.org/jira/browse/ARROW-5467
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5469:
--

 Summary: [Go] implement read/write IPC for Date32/Date64 arrays
 Key: ARROW-5469
 URL: https://issues.apache.org/jira/browse/ARROW-5469
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








Re: Reduced Arrow CI capacity

2019-05-31 Thread Uwe L. Korn



On Fri, May 31, 2019, at 12:11 AM, Antoine Pitrou wrote:
> 
> Le 30/05/2019 à 22:39, Uwe L. Korn a écrit :
> > Hello all,
> > 
> > Krisztián has been lately working on getting Buildbot running for Arrow. 
> > While I have not yet had the time to look at it in detail what would hinder 
> > us using it as the main Linux builder and ditching Travis except for OSX?
> > 
> > Otherwise I have lately made really good experiences with Gitlab CI 
> > connected to Github projects. While they only offer a comparatively small 
> > amount of CI time per month per project (2000 minutes is quite small in the 
> > Arrow case), I enjoyed that you can connect your own builders to their 
> > hosted gitlab.com instance. This would enable us to easily add funded 
> > workers to the project as well as utilise special hardware that we would 
> > not otherwise get in public CI instances. The CI runners ("workers") are 
> > really simple to set up (it took me less than 5 minutes each on Windows and 
> > Linux) and the logs show up in the hosted UI.
> 
> Are there any security issues with running self-hosted workers?
> Another question is whether Gitlab CI is allowed on Github repos owned
> by the Apache Foundation (Azure Pipelines still isn't).


The security implications are the same as with any self-hosted, Docker-based CI: 
there is a chance that people can escape the Docker sandbox and do nasty 
things on the host. Thus we shouldn't store any additional credentials on the 
host beyond what is needed to connect to the GitLab master.

I'm not sure about the requirements from GitLab for the integration. They 
provide a hook for the CI status and a full-blown sync integration. The latter 
really wants all-access, which ASF INFRA won't grant; for the former, we may not 
even need INFRA, but I have to look deeper into that.

Uwe


Re: [DISCUSS] PR Backlog reduction

2019-05-31 Thread Neal Richardson
There are also some "dashboards" (mostly just saved filters, in list view)
here: https://cwiki.apache.org/confluence/display/ARROW/Dashboards

They're not great either, but at least they can give us some common views
to pay attention to.

Neal


On Fri, May 31, 2019 at 5:56 AM Wes McKinney  wrote:

> On Fri, May 31, 2019 at 3:21 AM Antoine Pitrou  wrote:
> >
> >
> > Le 31/05/2019 à 10:04, Fan Liya a écrit :
> > > @Antoine Pitrou, you mean the titles of JIRA/PR should be chosen
> carefully?
> >
> > That helps a bit for visual filtering, but visual filtering quickly
> > becomes inefficient if there are too many issues.
> >
> > No, I mean having views that only display certain kinds of issues (for
> > example I'm interested in C++ and Python issues, not so much the other
> > kinds).
> >
> > We also currently don't record any interesting information wrt.
> > priority.  So if you take a look at the "Issues to do" list for 0.14.0,
> > for example, you get an (unordered?) list of 252 issues, which is
> > *truncated* by the stupid JIRA software:
> > https://issues.apache.org/jira/projects/ARROW/versions/12344925
> >
> > ... and when you click on "View in Issue Navigator" to get the
> > untruncated list, you get an unhelpful text box with a SQL-query and no
> > easy way to refine your search.  Not to mention the annoying pagination
> > with tiny links at the bottom middle-left, and the general UI slowness.
> >
>
> That is annoying. I always start from the unfiltered Basic issue view with
>
> Project = ARROW
> Components = C++, Python
>
> (maybe with Fix Version = 0.14.0)
>
> You can then click "Save As" in the dropdown toward the top of the
> page and save this search so you can navigate to it easily next time.
>
> > Regards
> >
> > Antoine.
>


[jira] [Created] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5470:
--

 Summary: [CI] C++ local filesystem patch breaks Travis R job
 Key: ARROW-5470
 URL: https://issues.apache.org/jira/browse/ARROW-5470
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Neal Richardson
 Fix For: 0.14.0


https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and required 
downstream bindings to be updated. Romain wasn't immediately available to 
update R, so we marked the R job on Travis as an "allowed failure". That 
failure looked like this: 
[https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ library 
built fine, but then the R package failed to build because it didn't line up 
with what's in C++.

Then, the C++ local file system patch 
(https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
though we were still ignoring the R build, which continued to fail. But, it 
started failing differently. Here's what the R build failure looks like on that 
PR, and on master since then: 
[https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ library 
is failing to build, so we're not even getting to the expected R failure.

For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
setup to the R build, and it's still passing. One difference between the two 
jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which sounds 
related to some open R issues, and `boost::filesystem` appears all over the 
error in the R job.





[jira] [Created] (ARROW-5471) [C++][Gandiva]Array offset is ignored in Gandiva projector

2019-05-31 Thread Zeyuan Shang (JIRA)
Zeyuan Shang created ARROW-5471:
---

 Summary: [C++][Gandiva]Array offset is ignored in Gandiva projector
 Key: ARROW-5471
 URL: https://issues.apache.org/jira/browse/ARROW-5471
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Zeyuan Shang


I used the test case in 
[https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25],
 and found an issue when I was using the slice operator {{input_batch[1:]}}. It 
seems that the offset is ignored in the Gandiva projector.
{code:java}
import pyarrow as pa
import pyarrow.gandiva as gandiva

builder = gandiva.TreeExprBuilder()

field_a = pa.field('a', pa.int32())
field_b = pa.field('b', pa.int32())

schema = pa.schema([field_a, field_b])

field_result = pa.field('res', pa.int32())

node_a = builder.make_field(field_a)
node_b = builder.make_field(field_b)

condition = builder.make_function("greater_than", [node_a, node_b],
                                  pa.bool_())
if_node = builder.make_if(condition, node_a, node_b, pa.int32())

expr = builder.make_expression(if_node, field_result)

projector = gandiva.make_projector(
    schema, [expr], pa.default_memory_pool())

a = pa.array([10, 12, -20, 5], type=pa.int32())
b = pa.array([5, 15, 15, 17], type=pa.int32())
e = pa.array([10, 15, 15, 17], type=pa.int32())
input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])

r, = projector.evaluate(input_batch[1:])
print(r)
{code}
If we use the full record batch {{input_batch}}, the expected output is {{[10, 
15, 15, 17]}}. So if we use {{input_batch[1:]}}, the expected output should be 
{{[15, 15, 17]}}, however this script returned {{[10, 15, 15]}}. It seems that 
the projector ignores the offset and always reads from 0.

 

A corresponding issue has been created on GitHub as well: 
[https://github.com/apache/arrow/issues/4420]





[jira] [Created] (ARROW-5472) [Development] Add warning to PR merge tool if no JIRA component is set

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5472:
---

 Summary: [Development] Add warning to PR merge tool if no JIRA 
component is set
 Key: ARROW-5472
 URL: https://issues.apache.org/jira/browse/ARROW-5472
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 0.14.0


This will help with JIRA hygiene (there are over 300 resolved issues this 
moment with no component set)





[jira] [Created] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5473:
---

 Summary: [C++] Build failure on googletest_ep on Windows when 
using Ninja
 Key: ARROW-5473
 URL: https://issues.apache.org/jira/browse/ARROW-5473
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.14.0


I consistently get this error when trying to use Ninja locally:

{code}
-- extracting...
 
src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
 
dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
  file RENAME failed to rename


C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1

  to

C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep

  because: Directory not empty



[179/623] Building CXX object 
src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
ninja: build stopped: subcommand failed.
{code}

I'm running inside the cmder terminal emulator, so it's conceivable that some 
path modifications are causing issues.

The CMake invocation is

{code}
cmake -G "Ninja" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DARROW_BUILD_TESTS=on ^
  -DARROW_CXXFLAGS="/WX /MP" ^
  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON ^
  -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..






Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-31 Thread Wes McKinney
It seems we have an in-flight Parquet C++ patch  (ARROW-3729) that
touches the data types code -- I think we could do the grand search
and replace after that

On Wed, May 29, 2019 at 4:31 PM Chao Sun  wrote:
>
> I'm +1 on the change for the Rust side as well. It probably won't be as
> disruptive as the C++ side.
>
> On Wed, May 29, 2019 at 7:09 AM Wes McKinney  wrote:
>
> > I'm in favor of making the change -- it's slightly disruptive for
> > library-users, but the fix is no more complicated than a
> > search-and-replace. When the C++ project was started, the LogicalType
> > union didn't exist and "LogicalType" seemed like a more appropriate
> > name for ConvertedType.
> >
> > On Wed, May 29, 2019 at 7:11 AM Wes McKinney  wrote:
> > >
> > > You all probably want to join d...@parquet.apache.org and have the
> > > discussion there. From a governance perspective that's where we need
> > > to talk about making breaking changes to the Parquet C++ library
> > >
> > > LogicalType was introduced into the Parquet format in October 2017 to
> > > be a more flexible and future-proof replacement for the original
> > > ConvertedType metadata, see
> > >
> > >
> > https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe
> > >
> > > Support and forward/backwards compatibility for the new LogicalType
> > > union was just developed in PARQUET-1411
> > >
> > >
> > https://github.com/apache/arrow/commit/38b1ddfb7f5def825ac57c8f27ffe5afa7fcb483
> > >
> > > On Wed, May 29, 2019 at 4:44 AM Joris Van den Bossche
> > >  wrote:
> > > >
> > > > Yes, the LogicalType is newer than ConvertedType in the parquet
> > format, and
> > > > was until recently not implemented in parquet-cpp.
> > > > The problem is that originally, the parquet thrift::ConvertedType was
> > > > implemented in parquet-cpp as LogicalType. Now, support is added in
> > > > parquet-cpp for this newer parquet thrift::LogicalType, but the obvious
> > > > name for that in parquet-cpp was already taken. Therefore, it was
> > added as
> > > > parquet::LogicalAnnotation. See this PR for context:
> > > > https://github.com/apache/arrow/pull/4185
> > > >
> > > > So Deepak's question is if we can rename parquet-cpp's
> > parquet::LogicalType
> > > > to parquet::ConvertedType (to match the thrift format naming), so we
> > can
> > > > actually use the logical name parquet::LogicalType instead of
> > > > parquet::LogicalAnnotation for the new implementation.
> > > > And to avoid the confusion we are having here ..
> > > >
> > > > But renaming like that would be a hard break in parquet-cpp for
> > > > libraries depending on it, though. But I can't really assess the impact
> > > > of that.
> > > >
> > > > Best,
> > > > Joris
> > > >
> > > > Op wo 29 mei 2019 om 11:04 schreef Antoine Pitrou  > >:
> > > >
> > > > >
> > > > > Le 29/05/2019 à 10:47, Deepak Majeti a écrit :
> > > > > > "ConvertedType" term is used by the parquet specification below.
> > This
> > > > > type
> > > > > > is used to map client data types to Parquet types.
> > > > > >
> > > > >
> > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L48
> > > > >
> > > > > But apparently there's also "LogicalType":
> > > > >
> > > > >
> > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L315
> > > > >
> > > > > "LogicalType annotations to replace ConvertedType"
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> >


[jira] [Created] (ARROW-5474) [C++] What version of Boost do we require now?

2019-05-31 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5474:
--

 Summary: [C++] What version of Boost do we require now?
 Key: ARROW-5474
 URL: https://issues.apache.org/jira/browse/ARROW-5474
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Antoine Pitrou
 Fix For: 0.14.0


See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One possible 
cause for that error is that the local filesystem patch increased the version 
of boost that we actually require. The boost version (1.54 vs 1.58) was one 
difference between failure and success. 

Another point of confusion was that CMake reported two different versions of 
boost at different times. 

If we require a minimum version of boost, can we document that better, check 
for it more accurately in the build scripts, and fail with a useful message if 
that minimum isn't met? Or something else helpful.

If the actual cause of the failure was something else (e.g. compiler version), 
we should figure that out too.
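One possible shape for such a check (assuming the minimum really is Boost 1.58, which is precisely what this issue needs to establish) is to state the requirement explicitly in CMake so configuration fails early with a clear message; the component list below is illustrative, not the project's actual one:

```cmake
# Hypothetical sketch: fail at configure time if Boost is older than the
# (to-be-confirmed) minimum, instead of failing mid-build.
find_package(Boost 1.58 REQUIRED COMPONENTS system filesystem)
message(STATUS "Found Boost ${Boost_VERSION}")
```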





[jira] [Created] (ARROW-5475) [Python] Add Python binding for arrow::Concatenate

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5475:
---

 Summary: [Python] Add Python binding for arrow::Concatenate
 Key: ARROW-5475
 URL: https://issues.apache.org/jira/browse/ARROW-5475
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0





