[jira] [Created] (ARROW-1589) Fuzzing for certain input formats

2017-09-21 Thread Marco Neumann (JIRA)
Marco Neumann created ARROW-1589:


 Summary: Fuzzing for certain input formats
 Key: ARROW-1589
 URL: https://issues.apache.org/jira/browse/ARROW-1589
 Project: Apache Arrow
  Issue Type: Test
Reporter: Marco Neumann
Assignee: Marco Neumann


The Arrow library should have fuzzing tests for certain input formats, e.g. for 
reading record batches from streams. Malformed input should never crash the 
system; it should report a proper error instead. This could easily be 
implemented, e.g. with [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in 
combination with AddressSanitizer (which Arrow's build system already supports).
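
The property described above (malformed input is rejected with a proper error rather than crashing) can be sketched with a small mutation loop. This is only an illustration in Python using the standard library: it is not libFuzzer, and the length-prefixed format, `read_batches`, and `MalformedStreamError` are hypothetical stand-ins, not Arrow's stream reader.

```python
import random
import struct


class MalformedStreamError(ValueError):
    """Raised by the toy reader for any input it cannot parse."""


def read_batches(data):
    """Parse a toy stream: repeated [uint32 little-endian length][payload]."""
    batches = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 4:
            raise MalformedStreamError("truncated length prefix")
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if length > len(data) - pos:
            raise MalformedStreamError("payload length exceeds remaining input")
        batches.append(data[pos:pos + length])
        pos += length
    return batches


def mutate(data, rng):
    """Flip one bit, drop one byte, or insert one random byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    elif choice == 1 and buf:
        del buf[rng.randrange(len(buf))]
    else:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)


def fuzz(seed_input, iterations=10_000, seed=0):
    """Return how many mutated inputs were rejected with a proper error.

    Any other exception type propagates and fails the run, which is the
    signal a fuzzer is looking for.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            read_batches(candidate)
        except MalformedStreamError:
            rejected += 1
    return rejected


# A well-formed seed input holding two payloads.
valid = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"de"
rejected = fuzz(valid, iterations=2_000)
```

A coverage-guided fuzzer such as libFuzzer explores inputs far more effectively than random mutation; the loop above only shows the contract a fuzz target would check.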



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-1590:
--

 Summary: Flow TS Table method generics
 Key: ARROW-1590
 URL: https://issues.apache.org/jira/browse/ARROW-1590
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Paul Taylor
Assignee: Paul Taylor


The Table method generics should thread the Vector and value types through from 
the call site.





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174580#comment-16174580
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

GitHub user trxcllnt opened a pull request:

https://github.com/apache/arrow/pull/1120

ARROW-1590: [JS] Flow TS Table method generics

This PR fixes the Table generics to infer the types from the call site:

![kapture 2017-09-21 at 4 03 
34](https://user-images.githubusercontent.com/178183/30692953-5b8638d6-9e82-11e7-9d66-b87eb50f0e3f.gif)

@wesm this PR also includes the fixes to the prepublish script I mentioned 
yesterday.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/graphistry/arrow-1 fix-ts-typings

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1120


commit b87f409c3997aa1e935bea805904831fbfef0eb4
Author: Kouhei Sutou 
Date:   2017-09-13T13:09:38Z

ARROW-1529: [GLib] Use Xcode 8.3 on Travis CI

Author: Kouhei Sutou 

Closes #1092 from kou/glib-travis-macos and squashes the following commits:

291808b2 [Kouhei Sutou] [GLib] Use Xcode 8.3 on Travis CI

commit f15a91b0f32d73c450e552ffb5230116dbd2cbf1
Author: Kouhei Sutou 
Date:   2017-09-15T14:47:04Z

ARROW-1537: [C++] Support building with full path install_name on macOS

If you use `@rpath` for install_name (the default), you can use the
DYLD_LIBRARY_PATH environment variable to find libarrow.dylib. But
System Integrity Protection (SIP) prevents the DYLD_LIBRARY_PATH
environment variable from being inherited by subprocesses, which makes
libarrow.dylib difficult to use.

You can use a full-path install_name with the
-DARROW_INSTALL_NAME_RPATH=OFF CMake option. With it, libarrow.dylib
can be found without the DYLD_LIBRARY_PATH environment variable.

Author: Kouhei Sutou 

Closes #1100 from kou/cpp-macos-support-install-name and squashes the 
following commits:

8207ace [Kouhei Sutou] [C++] Support building with full path install_name 
on macOS

commit 37a4f2dc6b59a1a5b09d854827769b944622e67d
Author: Paul Taylor 
Date:   2017-09-21T10:44:43Z

enforce exact dependency package versions

commit 0151cbb76ae77bfb6c46fde9a3bc880e5a228cbe
Author: Paul Taylor 
Date:   2017-09-21T10:45:18Z

fix gulp and prepublish scripts

commit ac6db5e0b76ef0668b28f144d3d242c3a887bedb
Author: Paul Taylor 
Date:   2017-09-21T10:46:07Z

add comments explaining ARROW-1363 reader workaround

commit c7c67fbd3855cab58ca578481d41f746ad1b0da7
Author: Paul Taylor 
Date:   2017-09-21T10:47:21Z

more defensively typed reader internal values

commit 06fa8ae60738f62e1dc5777a48d6d0b091f2443e
Author: Paul Taylor 
Date:   2017-09-21T10:47:58Z

flow table method generics




> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Updated] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1590:
--
Labels: pull-request-available  (was: )

> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Created] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-1591:
--

 Summary: C++: Xcode 9 is not correctly detected
 Key: ARROW-1591
 URL: https://issues.apache.org/jira/browse/ARROW-1591
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.7.0
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.8.0


See https://github.com/ray-project/ray/issues/1000





[jira] [Updated] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1591:
--
Labels: pull-request-available  (was: )

> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174665#comment-16174665
 ] 

ASF GitHub Bot commented on ARROW-1591:
---

GitHub user xhochy opened a pull request:

https://github.com/apache/arrow/pull/1121

ARROW-1591: C++: Xcode 9 is not correctly detected



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xhochy/arrow ARROW-1591

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1121


commit 0b3a11a44ed38c1c9830f4f36350ae10698808ea
Author: Uwe L. Korn 
Date:   2017-09-21T12:32:38Z

ARROW-1591: C++: Xcode 9 is not correctly detected




> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174666#comment-16174666
 ] 

Uwe L. Korn commented on ARROW-1591:


PR: https://github.com/apache/arrow/pull/1121

> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174722#comment-16174722
 ] 

ASF GitHub Bot commented on ARROW-1591:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1121
  
Seems like something went wrong when installing the Sphinx docs 
requirements, causing a NumPy version conflict:

https://travis-ci.org/apache/arrow/jobs/278165398#L9068

 Unrelated to this change


> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174723#comment-16174723
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1120
  
@trxcllnt can you rebase on master? 


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Resolved] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1578.
-
Resolution: Fixed

Issue resolved by pull request 1118
[https://github.com/apache/arrow/pull/1118]

> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These could fail much 
> sooner and free up build slaves





[jira] [Assigned] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1578:
---

Assignee: Wes McKinney

> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These could fail much 
> sooner and free up build slaves





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174733#comment-16174733
 ] 

ASF GitHub Bot commented on ARROW-1591:
---

Github user xhochy commented on the issue:

https://github.com/apache/arrow/pull/1121
  
@wesm any idea on how to fix this?


> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174776#comment-16174776
 ] 

ASF GitHub Bot commented on ARROW-1591:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1121
  
It is probably related to a package update in conda-forge causing an issue 
with the package dependency graph. I will take a look


> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Resolved] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1591.
-
Resolution: Fixed

Issue resolved by pull request 1121
[https://github.com/apache/arrow/pull/1121]

> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Commented] (ARROW-1591) C++: Xcode 9 is not correctly detected

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174779#comment-16174779
 ] 

ASF GitHub Bot commented on ARROW-1591:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1121


> C++: Xcode 9 is not correctly detected
> --
>
> Key: ARROW-1591
> URL: https://issues.apache.org/jira/browse/ARROW-1591
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See https://github.com/ray-project/ray/issues/1000





[jira] [Created] (ARROW-1592) [GLib] Add GArrowUIntArrayBuilder

2017-09-21 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1592:
---

 Summary: [GLib] Add GArrowUIntArrayBuilder
 Key: ARROW-1592
 URL: https://issues.apache.org/jira/browse/ARROW-1592
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.8.0








[jira] [Commented] (ARROW-1592) [GLib] Add GArrowUIntArrayBuilder

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174830#comment-16174830
 ] 

ASF GitHub Bot commented on ARROW-1592:
---

GitHub user kou opened a pull request:

https://github.com/apache/arrow/pull/1122

ARROW-1592: [GLib] Add GArrowUIntArrayBuilder



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kou/arrow glib-add-uint-array-builder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1122


commit 5b597750750388acd5aad1961ff2583e255b87b8
Author: Kouhei Sutou 
Date:   2017-09-21T14:08:04Z

[GLib] Add UIntArrayBuilder




> [GLib] Add GArrowUIntArrayBuilder
> -
>
> Key: ARROW-1592
> URL: https://issues.apache.org/jira/browse/ARROW-1592
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Updated] (ARROW-1592) [GLib] Add GArrowUIntArrayBuilder

2017-09-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1592:
--
Labels: pull-request-available  (was: )

> [GLib] Add GArrowUIntArrayBuilder
> -
>
> Key: ARROW-1592
> URL: https://issues.apache.org/jira/browse/ARROW-1592
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174984#comment-16174984
 ] 

Phillip Cloud commented on ARROW-1588:
--

[~nongli], [~jacq...@dremio.com] mentioned that you and he discussed why 
big-endian might be the optimal choice for byte ordering. Do you remember how 
you determined that to be the case?

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174993#comment-16174993
 ] 

Wes McKinney commented on ARROW-1588:
-

cc [~henryr] [~tarmstrong]

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175019#comment-16175019
 ] 

Tim Armstrong commented on ARROW-1588:
--

Thanks for the CC Wes. Is there some additional context here about the goals? 
Those decisions (16 bytes and big-endian) are not what I'd expect if we were 
optimising for in-memory processing speed on little-endian architectures. 

[~tarasbob] has recently been working on Impala's decimal implementation so may 
have some thoughts too.

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175026#comment-16175026
 ] 

Tim Armstrong commented on ARROW-1588:
--

Parquet encodes some decimals as big-endian values when they are stored as 
FIXED_LEN_BYTE_ARRAY, but Impala byte-swaps them when reading. I always found 
this curious. This was actually a major source of runtime overhead, so we ended 
up with a complicated SIMD byte-swap implementation: 
https://github.com/apache/incubator-impala/blob/2e63752858d71cc745534367a686980e060a8180/be/src/util/bit-util.cc#L210
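
To make the byte-swap concrete, here is a minimal Python sketch of one decimal's unscaled value in both byte orders. The helper names are hypothetical, and the width is fixed at 16 bytes purely for illustration (a Parquet FIXED_LEN_BYTE_ARRAY decimal may be narrower). It is not Impala's SIMD code, only the per-value operation that code has to perform.

```python
def encode_decimal128(unscaled, byteorder):
    """Encode an unscaled decimal value as a 16-byte two's-complement integer."""
    return unscaled.to_bytes(16, byteorder, signed=True)


def byte_swap(value):
    """Reverse the byte order of one fixed-width value."""
    return value[::-1]


unscaled = -1234567  # e.g. -12345.67 stored with scale 2
big = encode_decimal128(unscaled, "big")
little = encode_decimal128(unscaled, "little")

# Swapping the big-endian bytes yields the little-endian layout.
assert byte_swap(big) == little

# A little-endian reader recovers the value only after the swap.
assert int.from_bytes(byte_swap(big), "little", signed=True) == unscaled
```

On a little-endian machine this reversal has to happen for every value read from a big-endian encoding, which is why the choice of in-memory byte order affects processing cost.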

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175056#comment-16175056
 ] 

Wes McKinney commented on ARROW-1588:
-

This seems to support the hypothesis that we should be 16-bytes, little-endian 
in-memory. It would be helpful to get the historical context on why things are 
the way they are in Parquet, though.

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175062#comment-16175062
 ] 

Wes McKinney commented on ARROW-1588:
-

[~tarmstrong] in the not-too-distant future we will be creating a columnar 
function kernel library in C++ to process arrays of contiguous decimals 
in-memory, so we would want operations like {{arr * 2}], where {{arr}} contains 
multiple decimal values in a buffer to evaluate as fast as possible. For the 
moment the only place where Arrow users are doing analytics on Decimals is in 
Dremio https://github.com/dremio/dremio-oss. 
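
As a rough illustration of the kind of operation meant here, the sketch below multiplies every value in a contiguous buffer of 16-byte little-endian decimals by an integer. It is plain Python with hypothetical helper names, not the planned C++ kernel library, and it only shows the memory layout such a kernel would walk over.

```python
WIDTH = 16  # bytes per decimal value, as proposed in the issue


def pack_decimals(values, byteorder="little"):
    """Pack unscaled integers into one contiguous buffer of 16-byte values."""
    return b"".join(v.to_bytes(WIDTH, byteorder, signed=True) for v in values)


def unpack_decimals(buf, byteorder="little"):
    """Read the unscaled integers back out of a contiguous buffer."""
    return [
        int.from_bytes(buf[i:i + WIDTH], byteorder, signed=True)
        for i in range(0, len(buf), WIDTH)
    ]


def multiply_buffer(buf, factor, byteorder="little"):
    """Multiply every 16-byte decimal in buf by an integer factor.

    Raises OverflowError if a result no longer fits in 16 bytes.
    """
    out = bytearray()
    for value in unpack_decimals(buf, byteorder):
        out += (value * factor).to_bytes(WIDTH, byteorder, signed=True)
    return bytes(out)


arr = pack_decimals([150, -275, 0])  # 1.50, -2.75, 0.00 at scale 2
doubled = unpack_decimals(multiply_buffer(arr, 2))
# doubled == [300, -550, 0]
```

If the buffer's byte order matches the machine's, a native kernel can load each value directly; otherwise every load needs a swap first.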

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Comment Edited] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175062#comment-16175062
 ] 

Wes McKinney edited comment on ARROW-1588 at 9/21/17 4:34 PM:
--

[~tarmstrong] in the not-too-distant future we will be creating a columnar 
function kernel library in C++ to process arrays of contiguous decimals 
in-memory, so we would want operations like {{arr * 2}}, where {{arr}} contains 
multiple decimal values in a buffer to evaluate as fast as possible. For the 
moment the only place where Arrow users are doing analytics on Decimals is in 
Dremio https://github.com/dremio/dremio-oss. 


was (Author: wesmckinn):
[~tarmstrong] in the not-too-distant future we will be creating a columnar 
function kernel library in C++ to process arrays of contiguous decimals 
in-memory, so we would want operations like {{arr * 2}], where {{arr}} contains 
multiple decimal values in a buffer to evaluate as fast as possible. For the 
moment the only place where Arrow users are doing analytics on Decimals is in 
Dremio https://github.com/dremio/dremio-oss. 

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Created] (ARROW-1593) [PYTHON] serialize_pandas should pass through the preserve_index keyword

2017-09-21 Thread Tom Augspurger (JIRA)
Tom Augspurger created ARROW-1593:
-

 Summary: [PYTHON] serialize_pandas should pass through the 
preserve_index keyword
 Key: ARROW-1593
 URL: https://issues.apache.org/jira/browse/ARROW-1593
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Assignee: Tom Augspurger
Priority: Minor
 Fix For: 0.8.0


I'm doing some benchmarking of Arrow serialization for dask.distributed to 
serialize dataframes.

Overall things look good compared to the current implementation (using pickle). 
The biggest difference was pickle's ability to use pandas' RangeIndex to avoid 
serializing the entire Index of values when possible.

I suspect that a "range type" isn't in scope for arrow, but in the meantime 
applications using Arrow could detect the `RangeIndex`, and pass {{ 
pyarrow.serialize_pandas(df, preserve_index=False) }} 
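
The size difference described above can be shown with the standard library alone. In this sketch Python's built-in `range` stands in for pandas' RangeIndex (an assumption made so the example needs no third-party packages): a range is described by its start, stop, and step, while a materialized index has to write out every value.

```python
import pickle

n = 1_000_000

# A range is described by (start, stop, step), so its pickle stays small
# no matter how many values it covers.
compact = pickle.dumps(range(n))

# Materializing the same values forces every element to be written out.
materialized = pickle.dumps(list(range(n)))

print(len(compact), len(materialized))

# Both forms round-trip to the same sequence of values.
assert list(pickle.loads(compact)) == pickle.loads(materialized)
```

With pandas, the corresponding check would presumably be `isinstance(df.index, pandas.RangeIndex)` before passing `preserve_index=False`. Dropping the index is only lossless when the receiver can rebuild it, for example when it is the default range starting at 0 with step 1.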





[jira] [Commented] (ARROW-1593) [PYTHON] serialize_pandas should pass through the preserve_index keyword

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175077#comment-16175077
 ] 

Wes McKinney commented on ARROW-1593:
-

That seems reasonable. We could also address the range index by augmenting the 
metadata in http://pandas-docs.github.io/pandas-docs-travis/developer.html

I have spent significantly less time optimizing {{RecordBatch.from_pandas}} 
than {{RecordBatch/Table.to_pandas}}, so there are likely some performance 
improvements that could be made on the ingest-to-Arrow path (e.g. parallel 
column conversions).
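
The idea of parallel column conversions can be sketched as follows. This is a standard-library illustration only: `convert_column` and `convert_table` are hypothetical names, and packing floats into an `array` merely stands in for converting a pandas column to an Arrow array.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def convert_column(values):
    """Stand-in for a per-column conversion: pack floats into a typed buffer."""
    return array("d", values)


def convert_table(columns, max_workers=4):
    """Convert each named column independently on a thread pool."""
    names = list(columns)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Columns do not depend on each other, so each one is its own task.
        converted = pool.map(convert_column, (columns[name] for name in names))
        return dict(zip(names, converted))


table = convert_table({"a": [1.0, 2.0], "b": [3.5, 4.5, 5.5]})
```

In pure Python the global interpreter lock limits the benefit of threads; a real speedup would depend on the per-column conversion running in native code that releases the lock.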

> [PYTHON] serialize_pandas should pass through the preserve_index keyword
> 
>
> Key: ARROW-1593
> URL: https://issues.apache.org/jira/browse/ARROW-1593
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> I'm doing some benchmarking of Arrow serialization for dask.distributed to 
> serialize dataframes.
> Overall things look good compared to the current implementation (using 
> pickle). The biggest difference was pickle's ability to use pandas' 
> RangeIndex to avoid serializing the entire Index of values when possible.
> I suspect that a "range type" isn't in scope for arrow, but in the meantime 
> applications using Arrow could detect the `RangeIndex`, and pass {{ 
> pyarrow.serialize_pandas(df, preserve_index=False) }} 





[jira] [Created] (ARROW-1594) [Python] Enable multi-threaded conversions in Table.from_pandas

2017-09-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1594:
---

 Summary: [Python] Enable multi-threaded conversions in 
Table.from_pandas
 Key: ARROW-1594
 URL: https://issues.apache.org/jira/browse/ARROW-1594
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney








[jira] [Created] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1595:
---

 Summary: [Python] Fix package dependency issues causing build 
failures
 Key: ARROW-1595
 URL: https://issues.apache.org/jira/browse/ARROW-1595
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.8.0


We are installing package requirements for the Python build in two steps, and 
the second step is causing conda to downgrade NumPy, resulting in an ABI 
conflict and broken build. I'm not sure why this suddenly started happening, 
but installing the packages all at once and pinning the NumPy version should 
fix it.

https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Updated] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1595:
--
Labels: pull-request-available  (was: )

> [Python] Fix package dependency issues causing build failures
> -
>
> Key: ARROW-1595
> URL: https://issues.apache.org/jira/browse/ARROW-1595
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are installing package requirements for the Python build in two steps, and 
> the second step is causing conda to downgrade NumPy, resulting in an ABI 
> conflict and broken build. I'm not sure why this suddenly started happening, 
> but installing the packages all at once and pinning the NumPy version should 
> fix it
> https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Commented] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175138#comment-16175138
 ] 

ASF GitHub Bot commented on ARROW-1595:
---

GitHub user wesm opened a pull request:

https://github.com/apache/arrow/pull/1123

ARROW-1595: [Python] Fix package dependency resolution issue causing broken 
builds

One of the dependencies installed in the docs requirements is causing NumPy 
to get downgraded by the SAT solver, and this is then causing an ABI conflict 
with the pyarrow build (which was built with a different version of NumPy). 
This installs everything in one `conda install` call.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wesm/arrow ARROW-1595

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1123


commit 60b05ad5bf24609d07eec0564360b862f84d7863
Author: Wes McKinney 
Date:   2017-09-21T17:17:46Z

Install conda dependencies all at once, pin NumPy version

Change-Id: Ie5866141da967c8bcc6ff4281710b1f66d28b62d




> [Python] Fix package dependency issues causing build failures
> -
>
> Key: ARROW-1595
> URL: https://issues.apache.org/jira/browse/ARROW-1595
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are installing package requirements for the Python build in two steps, and 
> the second step is causing conda to downgrade NumPy, resulting in an ABI 
> conflict and broken build. I'm not sure why this suddenly started happening, 
> but installing the packages all at once and pinning the NumPy version should 
> fix it
> https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Created] (ARROW-1596) [Python] Expand serialization test suite for NumPy arrays

2017-09-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1596:
---

 Summary: [Python] Expand serialization test suite for NumPy arrays
 Key: ARROW-1596
 URL: https://issues.apache.org/jira/browse/ARROW-1596
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.8.0


see 
https://github.com/dask/distributed/blob/master/distributed/protocol/tests/test_numpy.py#L30-L65
 for inspiration





[jira] [Commented] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-21 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175166#comment-16175166
 ] 

Tim Armstrong commented on ARROW-1588:
--

Ah ok, makes sense. I don't know to what extent this applies, but our 
experience is that decimal operations are a lot faster on narrower 4-byte and 
8-byte representations. One reason is that the 4-byte and 8-byte decimal 
values fit in registers and can be manipulated with normal integer operations. 
A more subtle reason is that implementing some operations correctly (at least 
in Impala's implementation) requires temporarily promoting to a wider type, 
e.g. 4 bytes -> 8 bytes or 8 bytes -> 16 bytes. Emulated 128-bit and 256-bit 
operations are pretty slow.
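
To illustrate the promotion point, here is a small sketch that treats a decimal
as an unscaled integer and checks which width a product needs. It is not
Impala's or Arrow's implementation: Python integers are arbitrary precision,
so the 4-byte and 8-byte widths are modeled with explicit range checks, and
the helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def fits(value, limit):
    # True if the unscaled integer fits in a signed type with the given maximum.
    return -limit - 1 <= value <= limit


def multiply_decimal32(a_unscaled, b_unscaled):
    # Both inputs are unscaled integers that fit in 4 bytes.
    assert fits(a_unscaled, INT32_MAX) and fits(b_unscaled, INT32_MAX)
    # The intermediate product always fits in 8 bytes, since |a|, |b| <= 2**31.
    product = a_unscaled * b_unscaled
    assert fits(product, INT64_MAX)
    return product


# 12345.67 * 98765.43 with scale 2: unscaled operands 1234567 and 9876543.
# The unscaled product carries scale 4 (2 + 2).
product = multiply_decimal32(1234567, 9876543)
needs_wider_type = not fits(product, INT32_MAX)
```

Both operands fit in 4 bytes, yet the product does not, which is why the
multiply has to run in the next wider type. The same argument one level up
pushes 8-byte operands into 16-byte arithmetic.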

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes and 
> byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175259#comment-16175259
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user trxcllnt commented on the issue:

https://github.com/apache/arrow/pull/1120
  
@wesm done


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Created] (ARROW-1597) [Packaging] arrow-compute.pc is missing in .deb/.rpm file list

2017-09-21 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1597:
---

 Summary: [Packaging] arrow-compute.pc is missing in .deb/.rpm file 
list
 Key: ARROW-1597
 URL: https://issues.apache.org/jira/browse/ARROW-1597
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
Priority: Minor
 Fix For: 0.7.0








[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175581#comment-16175581
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user StevenMPhillips commented on the issue:

https://github.com/apache/arrow/pull/959
  
Thanks, @BryanCutler, if you want to go ahead and post your suggested 
change, I can merge it.


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175584#comment-16175584
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user StevenMPhillips commented on the issue:

https://github.com/apache/arrow/pull/1119
  
+1 LGTM


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Resolved] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-21 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved ARROW-1347.

Resolution: Fixed

Issue resolved by pull request 1119
[https://github.com/apache/arrow/pull/1119]

> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175589#comment-16175589
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1119


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175618#comment-16175618
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user BryanCutler commented on the issue:

https://github.com/apache/arrow/pull/1119
  
Thanks @StevenMPhillips!


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175633#comment-16175633
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1120
  
There are still some extra commits that should drop out if you rebase onto 
the current apache/master. I rebased some commits on the release branch 
and force-pushed master after 0.7.0, which is why there's an issue.


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175643#comment-16175643
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1120
  
Sweet, thanks!


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175642#comment-16175642
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user trxcllnt commented on the issue:

https://github.com/apache/arrow/pull/1120
  
@wesm ok, I rebased and dropped the extra commits. 


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.





[jira] [Commented] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175649#comment-16175649
 ] 

ASF GitHub Bot commented on ARROW-1595:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1123


> [Python] Fix package dependency issues causing build failures
> -
>
> Key: ARROW-1595
> URL: https://issues.apache.org/jira/browse/ARROW-1595
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are installing package requirements for the Python build in two steps, and 
> the second step is causing conda to downgrade NumPy, resulting in an ABI 
> conflict and broken build. I'm not sure why this suddenly started happening, 
> but installing the packages all at once and pinning the NumPy version should 
> fix it
> https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Commented] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175648#comment-16175648
 ] 

ASF GitHub Bot commented on ARROW-1595:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1123
  
+1. thanks @jreback -- if the build failures reappear we can apply this fix


> [Python] Fix package dependency issues causing build failures
> -
>
> Key: ARROW-1595
> URL: https://issues.apache.org/jira/browse/ARROW-1595
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are installing package requirements for the Python build in two steps, and 
> the second step is causing conda to downgrade NumPy, resulting in an ABI 
> conflict and broken build. I'm not sure why this suddenly started happening, 
> but installing the packages all at once and pinning the NumPy version should 
> fix it
> https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Resolved] (ARROW-1595) [Python] Fix package dependency issues causing build failures

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1595.
-
Resolution: Fixed

Issue resolved by pull request 1123
[https://github.com/apache/arrow/pull/1123]

> [Python] Fix package dependency issues causing build failures
> -
>
> Key: ARROW-1595
> URL: https://issues.apache.org/jira/browse/ARROW-1595
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are installing package requirements for the Python build in two steps, and 
> the second step is causing conda to downgrade NumPy, resulting in an ABI 
> conflict and broken build. I'm not sure why this suddenly started happening, 
> but installing the packages all at once and pinning the NumPy version should 
> fix it
> https://travis-ci.org/apache/arrow/jobs/278202858#L9106





[jira] [Commented] (ARROW-1589) Fuzzing for certain input formats

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175753#comment-16175753
 ] 

Wes McKinney commented on ARROW-1589:
-

Could you clarify what kinds of malformed input you are talking about? I am not 
sure it is a requirement for the stream reader to be able to consistently 
return errors on random bytes input. 

In Arrow we need to distinguish between "can't fail" and "can fail" errors. The 
"can't fail" errors you detect in debug builds with DCHECK assertions. These 
are the kinds of errors that can only occur if the library developer (for 
example, an Arrow Java developer or an Arrow C++ developer) has implemented 
something incorrectly. Unit tests or integration tests must be written to 
exercise relevant code paths to catch these issues. I have found the debug 
assertions are especially helpful when refactoring, and they cost nothing in 
release builds.
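
The distinction can be sketched in Python, where {{assert}} plays a role
similar to DCHECK because it is skipped in optimized runs ({{python -O}}). The
function and exception names below are hypothetical and only illustrate the
two error categories; this is not Arrow code.

```python
class InvalidInputError(Exception):
    """A "can fail" error: bad data arriving from outside the library."""


def parse_length(raw):
    # "Can fail": the caller may hand us anything, so report a proper error.
    if len(raw) != 4:
        raise InvalidInputError(f"expected 4 bytes, got {len(raw)}")
    return int.from_bytes(raw, "little", signed=True)


def slice_buffer(buf, offset, length):
    # "Can't fail": only a bug in our own code could violate this, so it is an
    # assertion. Like DCHECK, `assert` is skipped in optimized runs (python -O).
    assert 0 <= offset and 0 <= length and offset + length <= len(buf)
    return buf[offset:offset + length]
```

The first check stays in every build because the input is outside the
library's control; the second documents an internal invariant and costs
nothing once assertions are disabled.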

In the case of reading record batches from a stream, i.e. according to the 
encapsulated message format described in http://arrow.apache.org/docs/ipc.html, 
if you are able to read the indicated number of metadata bytes from the stream, 
then it is assumed to be a valid Flatbuffer, and the sender is assumed to have 
respected invariants that are detectable in an integration test -- we may do 
some sanity checks of invariants, such as the number of buffers in a record 
batch. The same goes for the message body.
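
A minimal sketch of the "read the indicated number of metadata bytes" step,
assuming a simplified framing of a 4-byte little-endian length prefix followed
by the metadata bytes. The real encapsulated format has more to it (padding
and the message body), the Flatbuffer itself is not parsed here, and
{{read_metadata}} is a hypothetical name.

```python
import io
import struct


def read_metadata(stream):
    # Read the 4-byte little-endian length prefix (assumed framing).
    prefix = stream.read(4)
    if len(prefix) < 4:
        raise EOFError("stream ended before the metadata length")
    (size,) = struct.unpack("<i", prefix)
    if size < 0:
        raise ValueError(f"negative metadata length: {size}")
    # Read exactly `size` bytes; a short read means the stream was truncated.
    metadata = stream.read(size)
    if len(metadata) < size:
        raise EOFError(f"expected {size} metadata bytes, got {len(metadata)}")
    # From here on the bytes are assumed to be a valid Flatbuffer.
    return metadata


good = io.BytesIO(struct.pack("<i", 3) + b"abc")
metadata = read_metadata(good)
```

Truncation and nonsensical lengths are cheap to report as errors at this
level; whether the bytes inside form a well-formed Flatbuffer is a separate
question.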

If a Flatbuffer is truly malformed in a way that cannot be detected with debug 
assertions, I am unsure whether we can protect ourselves from segfaults. The 
sender of a record batch stream must be assumed to be trusted (i.e. you have 
adequate integration tests against it to catch "can't fail" exceptions) to 
proceed with reading a stream at all.

> Fuzzing for certain input formats
> -
>
> Key: ARROW-1589
> URL: https://issues.apache.org/jira/browse/ARROW-1589
> Project: Apache Arrow
>  Issue Type: Test
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for 
> reading record batches from streams. Ideally, malformed input must not crash 
> the system but must report a proper error. This could easily be implemented 
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with 
> address sanitizer (that's already implemented by Arrow's build system).





[jira] [Commented] (ARROW-1589) Fuzzing for certain input formats

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175756#comment-16175756
 ] 

Wes McKinney commented on ARROW-1589:
-

It would be reasonable to implement an {{UntrustedInputStreamReader}} that is 
more robust to random bytes input, for example by using Flatbuffers' generated 
verification functions (see "Access of untrusted buffers" 
https://google.github.io/flatbuffers/md__cpp_usage.html). This will slow down 
IPC reads, but perhaps not by that much.

> Fuzzing for certain input formats
> -
>
> Key: ARROW-1589
> URL: https://issues.apache.org/jira/browse/ARROW-1589
> Project: Apache Arrow
>  Issue Type: Test
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for 
> reading record batches from streams. Ideally, malformed input must not crash 
> the system but must report a proper error. This could easily be implemented 
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with 
> address sanitizer (that's already implemented by Arrow's build system).





[jira] [Updated] (ARROW-1589) [C++] Fuzzing for certain input formats

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1589:

Summary: [C++] Fuzzing for certain input formats  (was: Fuzzing for certain 
input formats)

> [C++] Fuzzing for certain input formats
> ---
>
> Key: ARROW-1589
> URL: https://issues.apache.org/jira/browse/ARROW-1589
> Project: Apache Arrow
>  Issue Type: Test
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for 
> reading record batches from streams. Ideally, malformed input must not crash 
> the system but must report a proper error. This could easily be implemented 
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with 
> address sanitizer (that's already implemented by Arrow's build system).





[jira] [Comment Edited] (ARROW-1589) [C++] Fuzzing for certain input formats

2017-09-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175756#comment-16175756
 ] 

Wes McKinney edited comment on ARROW-1589 at 9/22/17 12:49 AM:
---

It would be reasonable to implement an {{UntrustedMessageReader}} that is more 
robust to random bytes input, for example by using Flatbuffers' generated 
verification functions (see "Access of untrusted buffers" 
https://google.github.io/flatbuffers/md__cpp_usage.html). This will slow down 
IPC reads, but perhaps not by that much.


was (Author: wesmckinn):
It would be reasonable to implement an {{UntrustedInputStreamReader}} that is 
more robust to random bytes input, like using Flatbuffers' generated 
verification functions (see "Access of untrusted buffers" 
https://google.github.io/flatbuffers/md__cpp_usage.html). This will slow down 
IPC reads but perhaps not by that much

> [C++] Fuzzing for certain input formats
> ---
>
> Key: ARROW-1589
> URL: https://issues.apache.org/jira/browse/ARROW-1589
> Project: Apache Arrow
>  Issue Type: Test
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for 
> reading record batches from streams. Ideally, malformed input must not crash 
> the system but must report a proper error. This could easily be implemented 
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with 
> address sanitizer (that's already implemented by Arrow's build system).





[jira] [Resolved] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1590.
-
Resolution: Fixed

Issue resolved by pull request 1120
[https://github.com/apache/arrow/pull/1120]

> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1590) Flow TS Table method generics

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175780#comment-16175780
 ] 

ASF GitHub Bot commented on ARROW-1590:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1120


> Flow TS Table method generics
> -
>
> Key: ARROW-1590
> URL: https://issues.apache.org/jira/browse/ARROW-1590
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>  Labels: pull-request-available
>
> The Table method generics should thread the Vector and value types through 
> from the call site.


