[jira] [Commented] (ARROW-1538) [C++] Support Ubuntu 14.04 in .deb packaging automation

2017-09-20 Thread Rares Vernica (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174290#comment-16174290
 ] 

Rares Vernica commented on ARROW-1538:
--

Thanks for the pointer on glib. I applied the {{bfe65790}} commit on top of the 
{{0.7.0}} tag and I am able to move past that error, but I get some errors 
related to missing files, even while building for Ubuntu {{16.04}}:
{code}
# docker run --rm --tty --volume /arrow-dist/cpp-linux/apt:/host:rw --env DEBUG=yes apache-arrow-ubuntu-16.04 /host/build.sh
...
-- Installing: /build/apache-arrow-0.7.0/debian/tmp/usr/include/arrow/python/type_traits.h
-- Installing: /build/apache-arrow-0.7.0/debian/tmp/usr/lib/x86_64-linux-gnu/pkgconfig/arrow-python.pc
make[2]: Leaving directory '/build/apache-arrow-0.7.0/cpp_build'
dh_auto_install \
  --sourcedirectory=c_glib  \
  --builddirectory=c_glib_build
make[1]: Leaving directory '/build/apache-arrow-0.7.0'
   dh_install
dh_install: libarrow-glib0 missing files: usr/lib/*/libarrow-glib.so.*
dh_install: gir1.2-arrow-1.0 missing files: usr/lib/*/girepository-1.0/
dh_install: libarrow-glib-dev missing files: usr/include/arrow-glib/
dh_install: libarrow-glib-dev missing files: usr/lib/*/libarrow-glib.a
dh_install: libarrow-glib-dev missing files: usr/lib/*/libarrow-glib.so
dh_install: libarrow-glib-dev missing files: usr/lib/*/pkgconfig/arrow-glib.pc
dh_install: libarrow-glib-dev missing files: usr/share/gir-1.0/
dh_install: libarrow-glib-dev missing files: usr/share/arrow-glib/example/
dh_install: libarrow-glib-doc missing files: usr/share/doc/libarrow-glib-doc/arrow-glib/
dh_install: missing files, aborting
debian/rules:12: recipe for target 'binary' failed
make: *** [binary] Error 2
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2
debuild: fatal error at line 1376:
dpkg-buildpackage -rfakeroot -D -us -uc failed
Failed debuild -us -uc
{code}
It seems that {{dh_auto_install}} installs nothing for {{c_glib}}, so none of 
the arrow-glib files that {{dh_install}} expects are present.

> [C++] Support Ubuntu 14.04 in .deb packaging automation
> ---
>
> Key: ARROW-1538
> URL: https://issues.apache.org/jira/browse/ARROW-1538
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174169#comment-16174169
 ] 

ASF GitHub Bot commented on ARROW-1578:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1118


> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These checks could fail 
> much sooner and free up build slaves.





[jira] [Comment Edited] (ARROW-1585) serialize_pandas round trip fails on integer columns

2017-09-20 Thread Tom Augspurger (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174086#comment-16174086
 ] 

Tom Augspurger edited comment on ARROW-1585 at 9/21/17 1:11 AM:


Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 2]}) }} 
(an int, not a string).

Agreed that restricting field names to strings is best. Being able to 
reconstruct the original from the metadata is sufficient.


was (Author: tomaugspurger):
Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 
2]}))).columns }} (an int, not a string).

Agreed that restricting field names to strings is best. Being able to 
reconstruct the original from the metadata is sufficient.

> serialize_pandas round trip fails on integer columns
> 
>
> Key: ARROW-1585
> URL: https://issues.apache.org/jira/browse/ARROW-1585
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> This roundtrip fails, since the Integer column isn't converted to a string 
> after deserializing
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, 2]}))).columns
> Out[3]: Index(['0'], dtype='object')
> {code}
> That should be an {{ Int64Index([0]) }} for the columns.





[jira] [Commented] (ARROW-1585) serialize_pandas round trip fails on integer columns

2017-09-20 Thread Tom Augspurger (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174086#comment-16174086
 ] 

Tom Augspurger commented on ARROW-1585:
---

Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 
2]}))).columns }} (an int, not a string).

Agreed that restricting field names to strings is best. Being able to 
reconstruct the original from the metadata is sufficient.

> serialize_pandas round trip fails on integer columns
> 
>
> Key: ARROW-1585
> URL: https://issues.apache.org/jira/browse/ARROW-1585
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> This roundtrip fails, since the Integer column isn't converted to a string 
> after deserializing
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, 2]}))).columns
> Out[3]: Index(['0'], dtype='object')
> {code}
> That should be an {{ Int64Index([0]) }} for the columns.





[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2017-09-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174080#comment-16174080
 ] 

Wes McKinney commented on ARROW-1581:
-

I don't see why not to put them on PyPI as long as the Apache project does not 
advertise them. It might take a little work to munge the package metadata to do 
this. I have already found the conda nightlies to be incredibly useful.

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>






[jira] [Resolved] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create

2017-09-20 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1500.
-
Resolution: Fixed

Issue resolved by pull request 1116
[https://github.com/apache/arrow/pull/1116]

> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
> -
>
> Key: ARROW-1500
> URL: https://issues.apache.org/jira/browse/ARROW-1500
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Amir Malekpour
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Observed in gcc 5.4.0 release build





[jira] [Commented] (ARROW-1585) serialize_pandas round trip fails on integer columns

2017-09-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174073#comment-16174073
 ] 

Wes McKinney commented on ARROW-1585:
-

You mean integer 0 instead of {{"0"}} for the column name, though, right?

{code}
In [7]: df = pd.DataFrame({"0": [1, 2]})

In [8]: df.columns
Out[8]: Index(['0'], dtype='object')
{code}

We made the decision to coerce non-string column names to strings, but we could 
add metadata to http://pandas-docs.github.io/pandas-docs-travis/developer.html 
that allows the original dtype to be recovered for the simple cases (e.g. 
{{Int64Index}}). cc [~cpcloud]
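
The idea of recovering the original column labels from metadata can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither pyarrow nor pandas, and the {{column_label_types}} key and both helper functions are hypothetical names, not part of the pandas metadata specification.

```python
import json

# Hypothetical helpers: neither these functions nor the
# "column_label_types" key exist in pyarrow or the pandas metadata spec.
_CASTS = {"int": int, "str": str}


def encode_columns(labels):
    """Coerce labels to strings and record each original type name."""
    kinds = [type(label).__name__ for label in labels]
    for kind in kinds:
        if kind not in _CASTS:
            raise TypeError(f"unsupported column label type: {kind}")
    names = [str(label) for label in labels]
    metadata = json.dumps({"column_label_types": kinds})
    return names, metadata


def decode_columns(names, metadata):
    """Rebuild the original labels from the string names and metadata."""
    kinds = json.loads(metadata)["column_label_types"]
    return [_CASTS[kind](name) for name, kind in zip(names, kinds)]


names, metadata = encode_columns([0, "a"])
restored = decode_columns(names, metadata)
print(names)     # ['0', 'a']
print(restored)  # [0, 'a']
```

The stored names stay strings, as decided above, while the metadata is enough to rebuild the integer label in the simple cases.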

> serialize_pandas round trip fails on integer columns
> 
>
> Key: ARROW-1585
> URL: https://issues.apache.org/jira/browse/ARROW-1585
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> This roundtrip fails, since the Integer column isn't converted to a string 
> after deserializing
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, 2]}))).columns
> Out[3]: Index(['0'], dtype='object')
> {code}
> That should be an {{ Int64Index([0]) }} for the columns.





[jira] [Assigned] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name

2017-09-20 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud reassigned ARROW-1586:


Assignee: Phillip Cloud

> [PYTHON] serialize_pandas roundtrip loses columns name
> --
>
> Key: ARROW-1586
> URL: https://issues.apache.org/jira/browse/ARROW-1586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Phillip Cloud
>Priority: Minor
> Fix For: 0.8.0
>
>
> The serialize / deserialize roundtrip loses {{ df.columns.name }}
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], name='col_name'))
> In [4]: df.columns.name
> Out[4]: 'col_name'
> In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name
> {code}
> Is this in scope for pyarrow? I suspect it would require an update to the 
> pandas section of the Schema metadata.





[jira] [Commented] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name

2017-09-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174068#comment-16174068
 ] 

Wes McKinney commented on ARROW-1586:
-

cc [~cpcloud]

> [PYTHON] serialize_pandas roundtrip loses columns name
> --
>
> Key: ARROW-1586
> URL: https://issues.apache.org/jira/browse/ARROW-1586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> The serialize / deserialize roundtrip loses {{ df.columns.name }}
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], name='col_name'))
> In [4]: df.columns.name
> Out[4]: 'col_name'
> In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name
> {code}
> Is this in scope for pyarrow? I suspect it would require an update to the 
> pandas section of the Schema metadata.





[jira] [Commented] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name

2017-09-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174067#comment-16174067
 ] 

Wes McKinney commented on ARROW-1586:
-

Yes, we should preserve this metadata and add this to 
http://pandas-docs.github.io/pandas-docs-travis/developer.html#storing-pandas-dataframe-objects-in-apache-parquet-format.
 Though perhaps we can constrain the name to be a string, or coercible to a 
string? 

> [PYTHON] serialize_pandas roundtrip loses columns name
> --
>
> Key: ARROW-1586
> URL: https://issues.apache.org/jira/browse/ARROW-1586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Priority: Minor
> Fix For: 0.8.0
>
>
> The serialize / deserialize roundtrip loses {{ df.columns.name }}
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], name='col_name'))
> In [4]: df.columns.name
> Out[4]: 'col_name'
> In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name
> {code}
> Is this in scope for pyarrow? I suspect it would require an update to the 
> pandas section of the Schema metadata.





[jira] [Updated] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-20 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud updated ARROW-1588:
-
Component/s: Format

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes, 
> and deciding the byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Updated] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-20 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud updated ARROW-1588:
-
Issue Type: Improvement  (was: Bug)

> [C++/Format] Harden Decimal Format
> --
>
> Key: ARROW-1588
> URL: https://issues.apache.org/jira/browse/ARROW-1588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.7.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
> Fix For: 0.8.0
>
>
> We should finalize and harden the decimal format. The remaining issues are 
> officially writing down the choice of making every decimal value 16 bytes, 
> and deciding the byte order.
> For byte order we'll need to run some benchmarks to compare little endian vs 
> big endian. I plan to work on this over the next week or two.
> [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd 
> like to see addressed here please chime in. 





[jira] [Created] (ARROW-1588) [C++/Format] Harden Decimal Format

2017-09-20 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-1588:


 Summary: [C++/Format] Harden Decimal Format
 Key: ARROW-1588
 URL: https://issues.apache.org/jira/browse/ARROW-1588
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.7.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
 Fix For: 0.8.0


We should finalize and harden the decimal format. The remaining issues are 
officially writing down the choice of making every decimal value 16 bytes, 
and deciding the byte order.

For byte order we'll need to run some benchmarks to compare little endian vs 
big endian. I plan to work on this over the next week or two.

[~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd like 
to see addressed here please chime in. 
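
To make the byte-order question concrete, here is a minimal sketch in plain Python of a fixed 16-byte decimal value in both orders. It assumes the value is stored as a signed two's-complement unscaled integer (for example 12345 for 123.45 at scale 2); that assumption, and the helper names, are illustrative and not taken from the Arrow format documents.

```python
DECIMAL_WIDTH = 16  # bytes per value, the fixed width discussed in this issue


def encode_unscaled(unscaled, byteorder):
    """Encode an unscaled integer as 16 bytes of two's complement.

    Assumption: the decimal is stored as its unscaled integer; the scale
    is tracked elsewhere (e.g. in type metadata).
    """
    return unscaled.to_bytes(DECIMAL_WIDTH, byteorder=byteorder, signed=True)


def decode_unscaled(raw, byteorder):
    """Decode 16 bytes back into the unscaled integer."""
    if len(raw) != DECIMAL_WIDTH:
        raise ValueError(f"expected {DECIMAL_WIDTH} bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


little = encode_unscaled(-12345, "little")
big = encode_unscaled(-12345, "big")

# The two encodings hold the same bytes in opposite order.
print(little == big[::-1])
print(decode_unscaled(little, "little"), decode_unscaled(big, "big"))
```

Either order round-trips the same value, so the choice comes down to the benchmark results mentioned above rather than to expressiveness.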





[jira] [Created] (ARROW-1587) [Format] Add metadata for user-defined logical types

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1587:
---

 Summary: [Format] Add metadata for user-defined logical types
 Key: ARROW-1587
 URL: https://issues.apache.org/jira/browse/ARROW-1587
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Wes McKinney
 Fix For: 0.8.0


While we have the {{custom_metadata}} field at the Field level, it may be useful 
to have proper user-defined type metadata in the `Type` union. This would allow 
us to describe a user-defined type's physical representation (e.g. "Latitude 
longitude is represented by a struct, whose children consist of two doubles") 
in terms of the other, non-user-defined types.

This is more flexible than {{custom_metadata}} because we can leverage the 
existing structure in the Flatbuffers schema for describing the user type.

https://github.com/apache/arrow/blob/master/format/Schema.fbs#L285
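
As a rough illustration of a logical type described by its physical representation, the following plain-Python sketch models the latitude/longitude example as a struct of two doubles. The {{UserDefinedType}} class, its fields, and the little-endian layout are all hypothetical; nothing here reflects the actual Flatbuffers schema.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class UserDefinedType:
    """Hypothetical logical type described by its physical storage."""

    name: str
    children: tuple  # (child name, struct format character) pairs

    @property
    def layout(self):
        # Little-endian struct layout built from the child formats.
        return "<" + "".join(fmt for _, fmt in self.children)

    def pack(self, value):
        """Serialize a dict of child values using the storage layout."""
        return struct.pack(self.layout, *(value[name] for name, _ in self.children))

    def unpack(self, raw):
        """Rebuild the dict of child values from raw bytes."""
        fields = struct.unpack(self.layout, raw)
        return {name: field for (name, _), field in zip(self.children, fields)}


# "Latitude longitude is represented by a struct of two doubles."
LAT_LONG = UserDefinedType(
    name="LatitudeLongitude",
    children=(("latitude", "d"), ("longitude", "d")),
)

raw = LAT_LONG.pack({"latitude": 40.5, "longitude": -74.25})
print(len(raw))              # 16: two 8-byte doubles
print(LAT_LONG.unpack(raw))  # {'latitude': 40.5, 'longitude': -74.25}
```

A reader that does not know the logical name can still interpret the bytes through the declared children, which is the flexibility argued for above.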





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174008#comment-16174008
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user BryanCutler commented on the issue:

https://github.com/apache/arrow/pull/1119
  
Continuation of #959 to use `instanceof` and add a test.  cc @jacques-n 
@wesm @StevenMPhillips 


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174007#comment-16174007
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

GitHub user BryanCutler opened a pull request:

https://github.com/apache/arrow/pull/1119

ARROW-1347: [JAVA] Return consistent child field name for List Vectors

This makes the child fields of ListVector have consistent names of 
`ListVector.DATA_VECTOR_NAME`. Previously, an empty ListVector would have a 
child name of `ZeroVector.name` which is "[DEFAULT]".

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/arrow java-ListVector-child-name-ARROW-1347

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1119


commit 2923a453ef005278db834e85d6fa084dbc3453a3
Author: Steven Phillips 
Date:   2017-08-10T22:15:28Z

ARROW-1347: [JAVA] return consistent child field name for List vectors

commit c240378b3122d95351ad97db78bfb45d34097d61
Author: Bryan Cutler 
Date:   2017-09-20T23:25:28Z

changed to use instanceof and added test




> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Created] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name

2017-09-20 Thread Tom Augspurger (JIRA)
Tom Augspurger created ARROW-1586:
-

 Summary: [PYTHON] serialize_pandas roundtrip loses columns name
 Key: ARROW-1586
 URL: https://issues.apache.org/jira/browse/ARROW-1586
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Priority: Minor
 Fix For: 0.8.0


The serialize / deserialize roundtrip loses {{ df.columns.name }}

{code:python}
In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], name='col_name'))

In [4]: df.columns.name
Out[4]: 'col_name'

In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name
{code}

Is this in scope for pyarrow? I suspect it would require an update to the 
pandas section of the Schema metadata.
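
One possible shape for such a metadata update is sketched below in plain Python. It is an assumption-laden illustration only: the {{column_index_name}} key and the two helpers are hypothetical, and the sketch does not touch pyarrow or pandas.

```python
import json


def pack_columns(names, index_name=None):
    """Return the column names plus metadata carrying the columns' name.

    Hypothetical helper; "column_index_name" is an invented key.
    """
    stored = None if index_name is None else str(index_name)
    return list(names), json.dumps({"column_index_name": stored})


def unpack_columns(names, metadata):
    """Return the column names and the recovered columns' name."""
    return list(names), json.loads(metadata)["column_index_name"]


names, metadata = pack_columns(["a", "b"], index_name="col_name")
restored_names, restored_index_name = unpack_columns(names, metadata)
print(restored_names)       # ['a', 'b']
print(restored_index_name)  # col_name
```

Coercing the name to a string keeps the metadata JSON-serializable, at the cost of not preserving non-string names exactly.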





[jira] [Created] (ARROW-1585) serialize_pandas round trip fails on integer columns

2017-09-20 Thread Tom Augspurger (JIRA)
Tom Augspurger created ARROW-1585:
-

 Summary: serialize_pandas round trip fails on integer columns
 Key: ARROW-1585
 URL: https://issues.apache.org/jira/browse/ARROW-1585
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Priority: Minor
 Fix For: 0.8.0


This roundtrip fails, since the Integer column isn't converted to a string 
after deserializing

{code:python}
In [1]: import pandas as pd
In [2]: import pyarrow as pa

In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, 2]}))).columns
Out[3]: Index(['0'], dtype='object')
{code}

That should be an {{ Int64Index([0]) }} for the columns.





[jira] [Created] (ARROW-1584) [PYTHON] serialize_pandas on empty dataframe

2017-09-20 Thread Tom Augspurger (JIRA)
Tom Augspurger created ARROW-1584:
-

 Summary: [PYTHON] serialize_pandas on empty dataframe
 Key: ARROW-1584
 URL: https://issues.apache.org/jira/browse/ARROW-1584
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Priority: Minor
 Fix For: 0.8.0


This code

{code:python}
import pandas as pd
import pyarrow as pa

pa.serialize_pandas(pd.DataFrame())
{code}

Raises

{code}
---
ArrowNotImplementedError  Traceback (most recent call last)
 in ()
> 1 pa.serialize_pandas(pd.DataFrame())

~/Envs/dask-dev/lib/python3.6/site-packages/pyarrow/ipc.py in serialize_pandas(df)
158 sink = pa.BufferOutputStream()
159 writer = pa.RecordBatchStreamWriter(sink, batch.schema)
--> 160 writer.write_batch(batch)
161 writer.close()
162 return sink.get_result()

pyarrow/ipc.pxi in pyarrow.lib._RecordBatchWriter.write_batch (/Users/travis/build/apache/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:59238)()

pyarrow/error.pxi in pyarrow.lib.check_status (/Users/travis/build/apache/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:8113)()

ArrowNotImplementedError: Unable to convert type: null

{code}

Presumably {{pa.deserialize_pandas}} will need a fix as well.





[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173882#comment-16173882
 ] 

ASF GitHub Bot commented on ARROW-1500:
---

Github user amirma commented on the issue:

https://github.com/apache/arrow/pull/1116
  
Fixed lint errors.


> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
> -
>
> Key: ARROW-1500
> URL: https://issues.apache.org/jira/browse/ARROW-1500
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Amir Malekpour
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Observed in gcc 5.4.0 release build





[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2017-09-20 Thread Robert Nishihara (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173840#comment-16173840
 ] 

Robert Nishihara commented on ARROW-1581:
-

Are you planning on putting these on PyPI? I'd like to do something similar 
with Ray, ideally people would be able to pip install the project from any 
commit. Sort of like https://pypi.python.org/pypi/tf-nightly except with every 
commit, not just the most recent.

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>






[jira] [Commented] (ARROW-1579) Add dockerized test setup to validate Spark integration

2017-09-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173609#comment-16173609
 ] 

Wes McKinney commented on ARROW-1579:
-

I think just the Docker images, and we could run nightly builds at some point 
or simply run all our "ad hoc" integration tests prior to cutting release 
candidates. Basically I don't want to be surprised by an issue when an RC is 
out for a vote

cc [~heimir]

> Add dockerized test setup to validate Spark integration
> ---
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to 
> catch any regressions in the Spark integration





[jira] [Created] (ARROW-1583) Use "travis_retry" function in some key places to reduce CI flakiness

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1583:
---

 Summary: Use "travis_retry" function in some key places to reduce CI flakiness
 Key: ARROW-1583
 URL: https://issues.apache.org/jira/browse/ARROW-1583
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Wes McKinney


There are enough things that can go wrong in our CI due to external package 
registries that we often end up with spurious failures not caused by code 
changes.

For example, here is an NPM registry failure:

https://travis-ci.org/apache/arrow/jobs/277798491#L941

I have seen Maven Central fail or anaconda.org fail in the past, too. Some of 
these package commands that hit external resources could be wrapped in 
{{travis_retry}} to give them another shot at success
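
The retry-on-failure idea can be sketched in Python as follows. This is an analogue for illustration, not a reproduction of {{travis_retry}} itself; the {{retry}} helper, its parameters, and the simulated flaky call are all hypothetical.

```python
import time


def retry(func, attempts=3, delay=0.0):
    """Call func until it succeeds or the attempts are exhausted.

    Hypothetical helper: re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # any failure triggers another attempt
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


calls = {"n": 0}


def flaky_download():
    """Simulate a registry that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("registry unavailable")
    return "ok"


result = retry(flaky_download, attempts=3)
print(result, calls["n"])  # ok 3
```

Only commands that hit external resources would be wrapped this way, so a genuine build or test failure still fails on the first attempt.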





[jira] [Commented] (ARROW-1579) Add dockerized test setup to validate Spark integration

2017-09-20 Thread Bryan Cutler (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173592#comment-16173592
 ] 

Bryan Cutler commented on ARROW-1579:
-

This would be awesome to have!  I'm glad to help out.  Is this just to create 
the docker images or will it also be run as part of CI?

> Add dockerized test setup to validate Spark integration
> ---
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to 
> catch any regressions in the Spark integration





[jira] [Created] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1582:
---

 Summary: [Python] Set up + document nightly conda builds for macOS
 Key: ARROW-1582
 URL: https://issues.apache.org/jira/browse/ARROW-1582
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


It's already been great to be able to test the nightlies on Linux in conda; it 
would be great to be able to do the same on macOS





[jira] [Created] (ARROW-1580) [Python] Instructions for setting up nightly builds on Linux

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1580:
---

 Summary: [Python] Instructions for setting up nightly builds on Linux
 Key: ARROW-1580
 URL: https://issues.apache.org/jira/browse/ARROW-1580
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


cc [~cpcloud]





[jira] [Created] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1581:
---

 Summary: [Python] Set up nightly wheel builds for Linux, macOS
 Key: ARROW-1581
 URL: https://issues.apache.org/jira/browse/ARROW-1581
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney








[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173558#comment-16173558
 ] 

ASF GitHub Bot commented on ARROW-1557:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1117
  
In case it's useful we have nightly dev builds 



> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
> 
>
> Key: ARROW-1557
> URL: https://issues.apache.org/jira/browse/ARROW-1557
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Tom Augspurger
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the length of {{arrays}} and 
> {{names}} matches. I think this should raise with a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)
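
The requested validation can be sketched in plain Python as below. This is not the pyarrow implementation or the change made in the pull request; {{pair_names}} is a hypothetical stand-in that only shows the length check and the {{ValueError}}.

```python
def pair_names(arrays, names):
    """Pair each array with its name, rejecting mismatched lengths.

    Hypothetical stand-in for the check requested in this issue.
    """
    if len(arrays) != len(names):
        raise ValueError(
            f"Length of names ({len(names)}) does not match "
            f"length of arrays ({len(arrays)})"
        )
    return list(zip(names, arrays))


error_message = None
try:
    pair_names([[1, 2], [3, 4]], ["a", "b", "c"])
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(pair_names([[1, 2], [3, 4]], ["a", "b"]))
```

Raising early avoids silently dropping the extra name, which is what the session above shows happening in 0.7.0.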





[jira] [Assigned] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1497:
---

Assignee: Li Jin

> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.
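
A rough illustration of the problem described above, written in Python rather than Java. All class and function names here are hypothetical stand-ins, not the Arrow Java API, and this is not the fix from the pull request; it only contrasts special-casing one type with an instance check against calling a common method on every vector.

```python
class VectorSketch:
    """Hypothetical base class: every vector can record a value count."""

    def __init__(self):
        self.value_count = 0

    def set_value_count(self, count):
        self.value_count = count


class MapVectorSketch(VectorSketch):
    pass


class ListVectorSketch(VectorSketch):
    pass


def read_with_instance_check(vectors, count):
    # Only one complex type is handled; every other type is skipped.
    for vector in vectors:
        if isinstance(vector, MapVectorSketch):
            vector.set_value_count(count)


def read_uniformly(vectors, count):
    # Every vector gets its value count set through the shared method.
    for vector in vectors:
        vector.set_value_count(count)
```

In the first function a list vector keeps a value count of 0 after reading, which is the kind of omission the issue describes; the second function has no per-type branch to forget.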





[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173555#comment-16173555
 ] 

ASF GitHub Bot commented on ARROW-1497:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1067


> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Updated] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1497:

Fix Version/s: 0.8.0

> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Resolved] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1497.
-
Resolution: Fixed

Issue resolved by pull request 1067
[https://github.com/apache/arrow/pull/1067]

> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>  Labels: pull-request-available
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Resolved] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length

2017-09-20 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1557.
-
Resolution: Fixed

Issue resolved by pull request 1117
[https://github.com/apache/arrow/pull/1117]

> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
> 
>
> Key: ARROW-1557
> URL: https://issues.apache.org/jira/browse/ARROW-1557
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Tom Augspurger
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and 
> {{names}} match. I think this should raise a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], 
> names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)





[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173550#comment-16173550
 ] 

ASF GitHub Bot commented on ARROW-1557:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1117


> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
> 
>
> Key: ARROW-1557
> URL: https://issues.apache.org/jira/browse/ARROW-1557
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Tom Augspurger
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and 
> {{names}} match. I think this should raise a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], 
> names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)





[jira] [Created] (ARROW-1579) Add dockerized test setup to validate Spark integration

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1579:
---

 Summary: Add dockerized test setup to validate Spark integration
 Key: ARROW-1579
 URL: https://issues.apache.org/jira/browse/ARROW-1579
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Wes McKinney


cc [~bryanc] -- the goal of this will be to validate master-to-master to catch 
any regressions in the Spark integration





[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173492#comment-16173492
 ] 

ASF GitHub Bot commented on ARROW-1497:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1067
  
+1


> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>  Labels: pull-request-available
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Resolved] (ARROW-1478) [JAVA] clear should release the buffer only if the buffer is not NULL

2017-09-20 Thread Siddharth Teotia (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Teotia resolved ARROW-1478.
-
Resolution: Won't Fix

Not needed.

> [JAVA] clear should release the buffer only if the buffer is not NULL
> -
>
> Key: ARROW-1478
> URL: https://issues.apache.org/jira/browse/ARROW-1478
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>
> In some cases we use a fake allocator in Dremio for the purpose of field 
> materialization only. The buffers of the underlying vectors are not 
> allocated. The fake allocator is a simple implementation of the BufferAllocator 
> interface where almost every method throws an UnsupportedOperation exception and 
> methods like getEmpty() return NULL.
> It is more like a pass-through mechanism that allows us to 
> instantiate a vector using a non-functional allocator, since the constructors 
> in vector code don't allow the allocator itself to be NULL.
> Portions of code where we have this scenario are generic in nature and so 
> have typical methods like close() / clear() which underneath invoke the 
> corresponding methods on vectors.
> The clear() method in BaseDataValueVector releases the data buffer without 
> checking if the buffer is NULL and that's where callers hit NPE.
> We don't see such problems in Arrow unit tests. My guess is that when a 
> vector is instantiated, the buffer is still probably a valid reference 
> returned through allocator.getEmpty() call in the constructor of 
> BaseDataValueVector.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173347#comment-16173347
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user BryanCutler commented on the issue:

https://github.com/apache/arrow/pull/959
  
Wouldn't it be better to use `instanceof`? I could change that and add a 
test for this if @StevenMPhillips is busy


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173320#comment-16173320
 ] 

ASF GitHub Bot commented on ARROW-1347:
---

Github user jacques-n commented on the issue:

https://github.com/apache/arrow/pull/959
  
LGTM +1.


> [JAVA] List null type should use consistent name for inner field
> 
>
> Key: ARROW-1347
> URL: https://issues.apache.org/jira/browse/ARROW-1347
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>  Labels: pull-request-available
>
> The child field for List type has the field name "$data$" in most cases. In 
> the case that there is not a known type for the List, currently the 
> getField() method will return a subfield with name "DEFAULT". We should make 
> this consistent with the rest of the cases.





[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173222#comment-16173222
 ] 

ASF GitHub Bot commented on ARROW-1578:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1118
  
Pretty unsure why https://travis-ci.org/apache/arrow/jobs/277763849 failed


> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These could fail much 
> sooner and free up build slaves





[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173135#comment-16173135
 ] 

ASF GitHub Bot commented on ARROW-1578:
---

GitHub user wesm opened a pull request:

https://github.com/apache/arrow/pull/1118

ARROW-1578: [C++] Run lint checks in Travis CI much earlier at 
before_script stage to fail faster



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wesm/arrow ARROW-1578

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1118


commit 329f01790ad5a16965e744acf54440d7fb010c5e
Author: Wes McKinney 
Date:   2017-09-20T12:58:34Z

Run lint checks before compiling anything. Make cpplint warning

Change-Id: Ib812f49e248540c7283a1e058f26925dbc36af00

commit 28fc3fb07589551959664db31997a9a0d8599b0c
Author: Wes McKinney 
Date:   2017-09-20T13:02:00Z

Typo

Change-Id: Ifeae6a35fc35939bfdaf191b2639b3aee9f27274




> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These could fail much 
> sooner and free up build slaves





[jira] [Updated] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1578:
--
Labels: pull-request-available  (was: )

> [C++/Python] Run lint checks in Travis CI to fail for linting issues as early 
> as possible
> -
>
> Key: ARROW-1578
> URL: https://issues.apache.org/jira/browse/ARROW-1578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The lint checks are run relatively late in the CI process, and a build may 
> fail after holding a worker for ~20 minutes or more. These could fail much 
> sooner and free up build slaves





[jira] [Created] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible

2017-09-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1578:
---

 Summary: [C++/Python] Run lint checks in Travis CI to fail for 
linting issues as early as possible
 Key: ARROW-1578
 URL: https://issues.apache.org/jira/browse/ARROW-1578
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Wes McKinney
 Fix For: 0.8.0


The lint checks are run relatively late in the CI process, and a build may fail 
after holding a worker for ~20 minutes or more. These could fail much sooner 
and free up build slaves





[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173082#comment-16173082
 ] 

ASF GitHub Bot commented on ARROW-1500:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1116
  
This is failing with cpplint warnings

```
/home/travis/build/apache/arrow/cpp/src/arrow/io/file.cc:138:  Line ends in 
whitespace.  Consider deleting these extra spaces.  [whitespace/end_of_line] [4]
/home/travis/build/apache/arrow/cpp/src/arrow/io/file.cc:610:  Line ends in 
whitespace.  Consider deleting these extra spaces.  [whitespace/end_of_line] [4]
```

you can use `make lint` to run the lint checks locally


> [C++] Result of ftruncate ignored in MemoryMappedFile::Create
> -
>
> Key: ARROW-1500
> URL: https://issues.apache.org/jira/browse/ARROW-1500
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Amir Malekpour
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Observed in gcc 5.4.0 release build





[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173065#comment-16173065
 ] 

ASF GitHub Bot commented on ARROW-1497:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1067
  
@siddharthteotia can you take a look at this?

@icexelloss can you change the PR title to start with "ARROW-1497:" (remove 
the brackets). thanks!


> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>  Labels: pull-request-available
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Updated] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1497:
--
Labels: pull-request-available  (was: )

> [Java] JsonFileReader doesn't set value count for some vectors
> --
>
> Key: ARROW-1497
> URL: https://issues.apache.org/jira/browse/ARROW-1497
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>  Labels: pull-request-available
>
> Currently, for complex types, JsonFileReader only sets the value count for 
> NullableMapType via an instance check. This is error-prone and causes issues 
> when reading other complex types:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269
> We should have a better way to do this.





[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173060#comment-16173060
 ] 

ASF GitHub Bot commented on ARROW-1557:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1117
  
`if not K` is probably better, feel free to make that change too


> [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
> 
>
> Key: ARROW-1557
> URL: https://issues.apache.org/jira/browse/ARROW-1557
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
>Reporter: Tom Augspurger
>Assignee: Tom Augspurger
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and 
> {{names}} match. I think this should raise a {{ValueError}}:
> {code}
> In [1]: import pyarrow as pa
> In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], 
> names=['a', 'b', 'c'])
> Out[2]:
> pyarrow.Table
> a: int64
> b: int64
> In [3]: pa.__version__
> Out[3]: '0.7.0'
> {code}
> (This is my first time using JIRA, hopefully I didn't mess up too badly)





[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-09-20 Thread Young-Jun Ko (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172932#comment-16172932
 ] 

Young-Jun Ko commented on ARROW-1555:
-

I think the simplest way to fix this would be to just expose the fs functions 
implemented by `s3fs`, `exists` being one of them. I suppose that's what 
Florian had in mind.

Thanks guys for looking into this!


> [Python] write_to_dataset on s3
> ---
>
> Key: ARROW-1555
> URL: https://issues.apache.org/jira/browse/ARROW-1555
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Young-Jun Ko
>Assignee: Florian Jetter
>Priority: Trivial
> Fix For: 0.8.0
>
>
> When writing an Arrow table to S3, I get a NotImplemented exception.
> The root cause is in _ensure_filesystem and can be reproduced as follows:
> import pyarrow
> import pyarrow.parquet as pqa
> import s3fs
> s3 = s3fs.S3FileSystem()
> pqa._ensure_filesystem(s3).exists("anything")
> It appears that the S3FSWrapper that is instantiated in _ensure_filesystem 
> does not expose the exists method of s3.
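
A minimal sketch of the delegation suggested in the comment above, assuming a wrapper whose base class raises NotImplementedError for methods it does not override. Every class here is a hypothetical stand-in (FakeS3FileSystem plays the role of s3fs.S3FileSystem so the example runs without network access); this is not pyarrow's actual S3FSWrapper and not the fix that was merged.

```python
class BaseFileSystemSketch:
    """Hypothetical base class: unimplemented methods raise."""

    def exists(self, path):
        raise NotImplementedError


class FakeS3FileSystem:
    """Stand-in for s3fs.S3FileSystem, backed by an in-memory set of keys."""

    def __init__(self, keys):
        self._keys = set(keys)

    def exists(self, path):
        return path in self._keys


class WrapperSketch(BaseFileSystemSketch):
    """Hypothetical wrapper that forwards exists() to the wrapped filesystem."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        # Delegate instead of inheriting the base-class NotImplementedError.
        return self.fs.exists(path)


wrapped = WrapperSketch(FakeS3FileSystem(["bucket/data.parquet"]))
found = wrapped.exists("bucket/data.parquet")
missing = wrapped.exists("anything")
```

Without the override, the call on the wrapper would fall through to the base class and raise, which matches the behaviour described in the report.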


