Re: Symbol not found: _PyCObject_Type (MacOS El Capitan, Python 3.6)

2018-05-17 Thread Quang Vu
Hi Wes,

Thank you for your suggestion. Clearing out the CMake temporary files and
rebuilding fixed the issue. I am so glad I have a working dev environment
now. Thanks again!

On Tue, May 15, 2018 at 9:45 PM Wes McKinney  wrote:

> hi Quang -- I recommend clearing out your CMake temporary files after
> making any conda environment changes. If you activate a different
> conda environment, CMake will not know to recompute variables related
> to Python's header files and libraries. So it might have been that you
> invoked CMake with Python 2 activated and later activated Python 3.
>
> - Wes
>
> On Tue, May 15, 2018 at 5:15 AM, Quang Vu  wrote:
> > Yes Antoine, that happens when compiling Arrow under an activated conda
> > environment.
> > Thank you for all the info you are helping me with!
> >
> > Quang.
> >
> > On Mon, May 14, 2018 at 3:34 PM Antoine Pitrou 
> wrote:
> >
> >>
> >> To give a bit more insight: you should compile Arrow with your conda
> >> environment activated, so that it picks the right Python version (3.6.5,
> >> in your case).  If it's still picking the wrong Python version, that
> >> might be a bug.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> Le 14/05/2018 à 20:50, Quang Vu a écrit :
> >> > Thanks Antoine,
> >> >
> >> > I will need to learn more about the compiling process that happens on
> >> > my Mac, to see how that links to Python 2.
> >> > I am not familiar with that process. But this is a good pointer for
> >> > my issue. Thank you for your response to my issue!
> >> >
> >> > Quang.
> >> >
> >> > On Mon, May 14, 2018 at 12:50 PM Antoine Pitrou 
> >> wrote:
> >> >
> >> >>
> >> >> Hi Quang,
> >> >>
> >> >> It sounds like you have compiled Arrow against a Python 2 install but
> >> >> are now trying to use it with Python 3.  This won't work, the same
> >> >> Python version must be used when compiling and when using PyArrow.
> >> >>
> >> >> ("PyCObject" is a Python 2-specific API that doesn't exist anymore in
> >> >> Python 3)
> >> >>
> >> >> Regards
> >> >>
> >> >> Antoine.
> >> >>
> >> >>
> >> >> Le 14/05/2018 à 18:34, Quang Vu a écrit :
> >> >>> Hi Arrow dev,
> >> >>>
> >> >>> I am having trouble installing and setting up my development
> >> >>> environment for Arrow. I wonder if anyone is familiar with the
> >> >>> issue. My system info:
> >> >>> - MacOS 10.11.6 (El Capitan)
> >> >>> - conda 4.5.1
> >> >>> - python 3.6.5
> >> >>> - arrow's current commit: 4b8511
> >> >>>
> >> >>> Installing the Arrow C++ libraries and Parquet both succeed. But
> >> >>> importing `pyarrow` fails:
> >> >>>
> >> >>> $ python -c 'import pyarrow'
> >> >>>
> >> >>> Traceback (most recent call last):
> >> >>>   File "<string>", line 1, in <module>
> >> >>>   File "/Users/myuser/code/arrow/python/pyarrow/__init__.py", line 47, in <module>
> >> >>>     from pyarrow.lib import cpu_count, set_cpu_count
> >> >>> ImportError: dlopen(/Users/myuser/code/arrow/python/pyarrow/lib.cpython-36m-darwin.so, 2): Symbol not found: _PyCObject_Type
> >> >>>   Referenced from: /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
> >> >>>   Expected in: flat namespace
> >> >>>  in /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
> >> >>>
> >> >>> If anyone has a suggestion about what the problem is, please let me
> >> >>> know.
> >> >>> Thanks!
> >> >>>
> >> >>
> >> >
> >>
>
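
For readers hitting the same error, the fix discussed in this thread
(clearing CMake's cached state after switching conda environments, then
re-running CMake) can be sketched as below. This is only an illustration:
the helper name clear_cmake_cache and the idea of passing the build
directory as an argument are assumptions, not part of Arrow's build
scripts. CMakeCache.txt and CMakeFiles/ are the standard locations where
CMake keeps its cached configuration.

```python
import shutil
from pathlib import Path


def clear_cmake_cache(build_dir):
    """Delete CMake's cached configuration from build_dir.

    Hypothetical helper. Returns the names of the entries that were
    actually removed, so the caller can see whether anything was stale.
    """
    build_dir = Path(build_dir)
    removed = []

    # CMakeCache.txt stores cached variables, including the Python
    # headers and libraries located during the first configure run.
    cache_file = build_dir / "CMakeCache.txt"
    if cache_file.is_file():
        cache_file.unlink()
        removed.append(cache_file.name)

    # CMakeFiles/ holds further generated state from the configure step.
    cmake_files = build_dir / "CMakeFiles"
    if cmake_files.is_dir():
        shutil.rmtree(cmake_files)
        removed.append(cmake_files.name)

    return removed
```

After this, CMake has to be invoked again with the intended conda
environment activated so that it recomputes the Python-related variables.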


[jira] [Created] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

2018-05-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2612:
-

 Summary: [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
 Key: ARROW-2612
 URL: https://issues.apache.org/jira/browse/ARROW-2612
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it 
refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace 
qualifier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2611) [Python] Python 2 integer serialization

2018-05-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2611:
-

 Summary: [Python] Python 2 integer serialization
 Key: ARROW-2611
 URL: https://issues.apache.org/jira/browse/ARROW-2611
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.9.0
Reporter: Philipp Moritz


In Python 2, serializing a Python int with pyarrow.serialize and then 
deserializing it returns a {{long}} instead of an integer. Note that this is 
not an issue in Python 3 where the long type does not exist.
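
A regression check for this kind of bug could compare exact types after a
round trip. The sketch below is an assumption about how such a check might
look, not the project's actual test: the helper name
roundtrip_preserves_type is made up, and pickle is used purely as a
stand-in for the serializer under test.

```python
import pickle


def roundtrip_preserves_type(value, serialize, deserialize):
    """Return True if deserialize(serialize(value)) has exactly value's type."""
    restored = deserialize(serialize(value))
    # Compare exact types rather than using isinstance: under Python 2,
    # int and long are distinct types even when the values compare equal.
    return type(restored) is type(value) and restored == value


# pickle is only a stand-in here for the serializer being checked.
print(roundtrip_preserves_type(1, pickle.dumps, pickle.loads))
```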





Re: Language-independent and cross-language docs

2018-05-17 Thread Wes McKinney
+1 on setting up a top-level documentation project. I think that
establishing an information hierarchy to help people understand all
the layers of the project is more important than the choice of the
documentation tool -- for example, if we started with Sphinx and
decided to move later to something else, there are tools to assist with
converting between markup languages (though it would require some
manual fixes).

I'm sort of neutral on combining the current language-specific
documentation projects into a monolithic documentation project. My
prior for this would be that the top-level documentation should
consist of:

* High level overview of the Arrow project: components, languages, and vision
* Columnar specification documents (migrating the current Markdown
documents in format/) and other specification documents
* High level project roadmap and contributor guide
* Guides for maintainers / committers
* Getting started guide for each language

The top-level documentation could direct users to the
language-specific API and usage docs (i.e. like the current Python
Sphinx project)

I'm interested in what people think about how to integrate this
statically-generated content with our current Jekyll-based website.
One could argue that all this top-level documentation could be handled
by Jekyll (or an equivalent static site generator).

- Wes

On Thu, May 17, 2018 at 3:44 PM, Uwe L. Korn  wrote:
> Hello,
>
> I can second that we should move the documentation to a central one. As a C++ 
> and Python contributor at the same time it is always hard to think of where you 
> should document a specific piece. We have a very small C++ documentation and 
> a bit larger Python one. For some features it would though make sense to have 
> them in both. IPC and in-process sharing is also a main part of the Arrow 
> project. Documenting this separately for each language will be a lot of work 
> and probably leave blind spots in each language.
>
> Not everything in each language ecosystem can be directly included in Sphinx 
> but as Sphinx is becoming a very broadly used documentation system, there are 
> many nice converters like Breathe [1] (Doxygen to Sphinx) available.
>
> To directly answer the questions:
>
> - Should we do this at all (i.e. build up a central documentation system)?
>
> Yes
>
> - Should we use Sphinx for it?
>
> Very much in favour. There is probably also a tendency that some people 
> prefer Markdown (I do) but given the feature set of Sphinx, I would very much 
> argue in favour of it.
>
>  - To what extent should our current docs be migrated to Sphinx (apart
>  from the Python docs, which already use Sphinx)?  For example, should
>  the specs (currently standalone pages written in Markdown) be migrated
>  to Sphinx for better cross-referencing and navigation?  What about the
>  C++ tutorial pages?  etc.
>
> I would migrate C++ documentation definitely fully into that but the C++ / 
> Python relation is very tight. There are a lot of topics that either touch 
> two languages or are general to the project, these should also go in there.
>
> - Should we preferably have a single Sphinx doctree, or several
>  independent per-topic / per-language doctrees?
>
> I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will 
> have many shared topics between the different implementations, I would 
> expect that we should have a single documentation with well organized 
> sections.
>
> Also we probably will face the issue that we have documentation on a specific 
> topic and only a small part is different between two 
> implementations/setups/... I really like the Scala/Python tabs in the Spark 
> docs [2]. There is a Sphinx extension that seems to do something similar to this 
> [3]. This could either be used to have documentation on how to construct 
> things where one switches between Ruby and Python or the main issue where I 
> would need it: Setting up the build with slightly different package managers 
> (e.g. conda vs pip in Python).
>
> Uwe
>
> [1]: https://breathe.readthedocs.io/en/latest/
> [2]: 
> http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
> [3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html
>
>
> On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
>>
>> Hi,
>>
>> In the following PR discussion it was mentioned that we currently lack a
>> central documentation system for cross-language topics:
>> https://github.com/apache/arrow/pull/1575#issuecomment-364062240
>>
>> Sphinx looks like a reasonable contender for that purpose.  For those who
>> don't know it, Sphinx is a documentation system initially developed for
>> the Python language, which quickly became widely-used amongst Python
>> projects, and is now being used by non-Python projects as well.  For
>> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
>> kernel online docs are now written using Sphinx
>> 

[jira] [Created] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2610:
--

 Summary: [Java/Python] Add support for dictionary type to 
pyarrow.Field.from_jvm
 Key: ARROW-2610
 URL: https://issues.apache.org/jira/browse/ARROW-2610
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe L. Korn


The DictionaryType is a bit more complex, as it also references the dictionary 
values themselves. This also needs to be integrated into {{pyarrow.Field.from_jvm}}, 
but the work to make DictionaryType work may also depend on 
{{pyarrow.Array.from_jvm}} first supporting non-primitive arrays.





[jira] [Created] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2609:
--

 Summary: [Java/Python] Complex type conversion in 
pyarrow.Field.from_jvm
 Key: ARROW-2609
 URL: https://issues.apache.org/jira/browse/ARROW-2609
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
types. Types like List, Struct or Union that have children in their definition 
are not supported. We should add the needed recursion for these types and 
enable the respective tests.





[jira] [Created] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2607:
--

 Summary: [Java/Python] Support VarCharVector / StringArray in 
pyarrow.Array.from_jvm
 Key: ARROW-2607
 URL: https://issues.apache.org/jira/browse/ARROW-2607
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
{{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
functions to be able to deal with string arrays. There is a currently failing 
unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to verify 
the implementation.





[jira] [Created] (ARROW-2606) [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2606:
--

 Summary: [Java/Python]  Add unit test for pyarrow.decimal128 in 
Array.from_jvm
 Key: ARROW-2606
 URL: https://issues.apache.org/jira/browse/ARROW-2606
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to 
find the correct code to construct Java decimals and fill them into a 
{{DecimalVector}}. Afterwards, we should activate the decimal128 type on 
{{test_jvm_array}} and ensure that we load them correctly from Java into Python.





[jira] [Created] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2605:
--

 Summary: [Java/Python] Add unit test for pyarrow.timeX types in 
Array.from_jvm
 Key: ARROW-2605
 URL: https://issues.apache.org/jira/browse/ARROW-2605
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are 
missing the necessary methods to construct these arrays conveniently on the 
Python side.

Once there is a path to construct {{pyarrow.Array}} instances from a Python 
list of {{datetime.time}} for the various time types, we should activate the 
time types on {{test_jvm_array}} and ensure that we load them correctly from 
Java into Python.





[jira] [Created] (ARROW-2604) [Java] Add method overload for VarCharVector.set(int,String)

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2604:
--

 Summary: [Java] Add method overload for 
VarCharVector.set(int,String)
 Key: ARROW-2604
 URL: https://issues.apache.org/jira/browse/ARROW-2604
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors
Reporter: Uwe L. Korn
 Fix For: 0.10.0


I would have expected that this is a very typical use case but at the moment I 
only see code that first fills a {{VarCharHolder}}. We could also provide this 
as a convenience overload.

Correct me please if I missed a convenience feature. I'm still new to the Java 
side.





Re: Language-independent and cross-language docs

2018-05-17 Thread Uwe L. Korn
Hello,

I can second that we should move the documentation to a central one. As a C++ 
and Python contributor at the same time it is always hard to think of where you 
should document a specific piece. We have a very small C++ documentation and a 
bit larger Python one. For some features it would though make sense to have 
them in both. IPC and in-process sharing is also a main part of the Arrow 
project. Documenting this separately for each language will be a lot of work 
and probably leave blind spots in each language.

Not everything in each language ecosystem can be directly included in Sphinx 
but as Sphinx is becoming a very broadly used documentation system, there are 
many nice converters like Breathe [1] (Doxygen to Sphinx) available.

To directly answer the questions:

- Should we do this at all (i.e. build up a central documentation system)?

Yes
 
- Should we use Sphinx for it?

Very much in favour. There is probably also a tendency that some people prefer 
Markdown (I do) but given the feature set of Sphinx, I would very much argue in 
favour of it.

 - To what extent should our current docs be migrated to Sphinx (apart
 from the Python docs, which already use Sphinx)?  For example, should
 the specs (currently standalone pages written in Markdown) be migrated
 to Sphinx for better cross-referencing and navigation?  What about the
 C++ tutorial pages?  etc.

I would migrate C++ documentation definitely fully into that but the C++ / 
Python relation is very tight. There are a lot of topics that either touch two 
languages or are general to the project, these should also go in there.
 
- Should we preferably have a single Sphinx doctree, or several
 independent per-topic / per-language doctrees?

I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will 
have many shared topics between the different implementations, I would expect 
that we should have a single documentation with well organized sections.

Also we probably will face the issue that we have documentation on a specific 
topic and only a small part is different between two implementations/setups/... 
I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx 
extension that seems to do something similar to this [3]. This could either be 
used to have documentation on how to construct things where one switches 
between Ruby and Python or the main issue where I would need it: Setting up the 
build with slightly different package managers (e.g. conda vs pip in Python).

Uwe

[1]: https://breathe.readthedocs.io/en/latest/
[2]: 
http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
[3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html


On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
> 
> Hi,
> 
> In the following PR discussion it was mentioned that we currently lack a
> central documentation system for cross-language topics:
> https://github.com/apache/arrow/pull/1575#issuecomment-364062240
> 
> Sphinx looks like a reasonable contender for that purpose.  For those who
> don't know it, Sphinx is a documentation system initially developed for
> the Python language, which quickly became widely-used amongst Python
> projects, and is now being used by non-Python projects as well.  For
> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
> kernel online docs are now written using Sphinx
> (https://www.kernel.org/doc/html/latest/index.html).
> 
> Sphinx uses reStructuredText (a.k.a "reST") as its basic markup
> language, but with many extensions.  It allows for structured
> documentation with extensive cross-referencing (even between independent
> Sphinx sites, using the "intersphinx" extension).
> 
> The questions here are:
> 
> - Should we do this at all (i.e. build up a central documentation system)?
> 
> - Should we use Sphinx for it?
> 
> - To what extent should our current docs be migrated to Sphinx (apart
> from the Python docs, which already use Sphinx)?  For example, should
> the specs (currently standalone pages written in Markdown) be migrated
> to Sphinx for better cross-referencing and navigation?  What about the
> C++ tutorial pages?  etc.
> 
> - Should we preferably have a single Sphinx doctree, or several
> independent per-topic / per-language doctrees?
> 
> Regards
> 
> Antoine.
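
To make the two Sphinx extensions mentioned in this thread more concrete
("intersphinx" for cross-referencing between Sphinx sites, and Breathe for
bringing Doxygen output into Sphinx), a top-level conf.py might start out
roughly like the sketch below. The setting names are the ones those
extensions document; the project name "arrow_cpp" and the XML path are
hypothetical placeholders, and none of this reflects an actual Arrow
configuration.

```python
# conf.py (sketch): settings for a single top-level Sphinx project.

extensions = [
    "sphinx.ext.intersphinx",  # cross-references into other Sphinx sites
    "breathe",                 # bridges Doxygen XML output into Sphinx
]

# intersphinx: map a short name to the base URL of another Sphinx site.
# None means "fetch objects.inv from the default location under that URL".
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

# breathe: where to find the Doxygen XML for each project.
# "arrow_cpp" and the path below are hypothetical placeholders.
breathe_projects = {
    "arrow_cpp": "../cpp/apidoc/xml",
}
breathe_default_project = "arrow_cpp"
```

With a single doctree, the per-language sections would share this one
configuration; with several independent doctrees, each would carry its own
conf.py and rely on intersphinx_mapping to link to the others.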


New Arrow PMC Member: Siddharth Teotia

2018-05-17 Thread Wes McKinney
The Project Management Committee (PMC) for Apache Arrow has invited
Siddharth Teotia to become a PMC member and we are pleased to announce
that he has accepted.

Congratulations and welcome, Sidd!


Re: Arrow 1319 [Python] Add additional HDFS filesystem methods

2018-05-17 Thread Wes McKinney
hi Alex,

Yes, please feel free to break this into multiple PRs as each new
filesystem method may require a number of unit tests, and these may be
easier to review in smaller batches.

cheers
Wes

On Mon, May 14, 2018 at 8:44 AM, Alex Hagerman  wrote:
> Hello,
>
> I was reviewing tickets to work on during the sprint days at PyCon and came 
> across 1319.
>
> https://issues.apache.org/jira/browse/ARROW-1319
>
> I was going to pick this up and see what I could do with it. I read the 
> history and wanted to check if there have been any changes that might impact 
> the ticket since the last update in December 2017?
>
> Also would it be ok to break this into multiple PRs? I would like to be able 
> to get some feedback as I add the first few filesystem methods.
>
> Thanks,
> Alex
>


[CI] Rust/C++/Python/Cython coverage published

2018-05-17 Thread Antoine Pitrou

Hi,

As a heads-up, Travis-CI runs now generate code coverage data for the
aforementioned languages (after running the relevant test suite(s)),
and upload it to CodeCov.  You can find an example report here:

https://codecov.io/gh/apache/arrow/list/455318556339ca492d3c02e6c6a297865f647bf7/

Regards

Antoine.


[jira] [Created] (ARROW-2603) [Python] from pandas raises ArrowInvalid for date(time) subclasses

2018-05-17 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-2603:
-

 Summary: [Python] from pandas raises ArrowInvalid for date(time) 
subclasses
 Key: ARROW-2603
 URL: https://issues.apache.org/jira/browse/ARROW-2603
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Florian Jetter
Assignee: Florian Jetter


When converting a pandas dataframe holding subclasses of date/datetime objects, 
arrow raises an {{ArrowInvalid}} exception
{code:java}
import pandas as pd
import pyarrow as pa
import datetime

class MyDate(datetime.date):
    pass

date_array = [MyDate(2000, 1, 1)]
df = pd.DataFrame({"date": pd.Series(date_array, dtype=object)})

table = pa.Table.from_pandas(df){code}
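
Until this is fixed, one possible workaround is to convert such values
back to the exact built-in types before building the DataFrame. The sketch
below assumes the conversion fails only because of the values' exact type,
which this report does not confirm; the helper name to_builtin is made up
for illustration.

```python
import datetime


class MyDate(datetime.date):
    """Same kind of subclass as in the report above."""


def to_builtin(value):
    """Convert date/datetime subclass instances to the exact built-in type."""
    # datetime.datetime is itself a subclass of datetime.date, so it
    # has to be checked first.
    if isinstance(value, datetime.datetime):
        return datetime.datetime(
            value.year, value.month, value.day,
            value.hour, value.minute, value.second, value.microsecond,
            tzinfo=value.tzinfo, fold=value.fold,
        )
    if isinstance(value, datetime.date):
        return datetime.date(value.year, value.month, value.day)
    # Anything else is passed through unchanged.
    return value


date_array = [to_builtin(d) for d in [MyDate(2000, 1, 1)]]
```

The resulting date_array holds plain datetime.date objects and can be
passed to pd.Series as in the snippet above.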





[jira] [Created] (ARROW-2602) [C++/Python] Automate build of development docker container

2018-05-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2602:
--

 Summary: [C++/Python] Automate build of development docker 
container
 Key: ARROW-2602
 URL: https://issues.apache.org/jira/browse/ARROW-2602
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Python
Reporter: Uwe L. Korn
 Fix For: 0.10.0


With 
[https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089]
 we provide a convenience docker container so that one can develop Arrow 
without directly running into the hassles of setting up the development 
toolchain on their own machine.

The current base image is not built automatically as we are waiting for input 
from INFRA on https://issues.apache.org/jira/browse/INFRA-16533

Once we know how to upload continuously to docker hub, we should move the 
Dockerfile appropriately.





[jira] [Created] (ARROW-2601) [Python] MemoryPool bytes_allocated causes seg

2018-05-17 Thread Alex Hagerman (JIRA)
Alex Hagerman created ARROW-2601:


 Summary: [Python] MemoryPool bytes_allocated causes seg
 Key: ARROW-2601
 URL: https://issues.apache.org/jira/browse/ARROW-2601
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Alex Hagerman
 Fix For: 0.10.0


Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 18:21:58) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.


>>> import pyarrow as pa

>>> mp = pa.MemoryPool()
>>> arr = pa.array([1,2,3], memory_pool=mp)
>>> mp.bytes_allocated()

Segmentation fault (core dumped)

I'll dig into this further, but should bytes_allocated be returning anything 
when called like this? Or should it raise NotImplemented?





[jira] [Created] (ARROW-2600) [Python] Add additional LocalFileSystem filesystem methods

2018-05-17 Thread Alex Hagerman (JIRA)
Alex Hagerman created ARROW-2600:


 Summary: [Python] Add additional LocalFileSystem filesystem methods
 Key: ARROW-2600
 URL: https://issues.apache.org/jira/browse/ARROW-2600
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Alex Hagerman
Assignee: Alex Hagerman
 Fix For: 0.10.0


Related to https://issues.apache.org/jira/browse/ARROW-1319 I noticed the 
methods Martin listed are also not part of the LocalFileSystem class.





[jira] [Created] (ARROW-2599) pip install on ARM fails

2018-05-17 Thread Dominykas Mostauskis (JIRA)
Dominykas Mostauskis created ARROW-2599:
---

 Summary: pip install on ARM fails
 Key: ARROW-2599
 URL: https://issues.apache.org/jira/browse/ARROW-2599
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
 Environment: Arch ARM Linux
pip 10.0.1
Python 3.6.5
Reporter: Dominykas Mostauskis


Trying to install pyarrow with pip on ARM fails with `{{Could not find the 
Arrow library. Looked for headers in , and for libs in}}`:

 

{{$ pip install pyarrow --no-build-isolation --user}}

{{[omitted]}}

{{Thread model: posix}}
{{ gcc version 8.1.0 (GCC)}}{{INFOCompiler id: GNU}}
{{ Selected compiler gcc 8.1.0}}
{{ -- Performing Test CXX_SUPPORTS_SSE3}}
{{ -- Performing Test CXX_SUPPORTS_SSE3 - Failed}}
{{ -- Performing Test CXX_SUPPORTS_ALTIVEC}}
{{ -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
{{ Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})}}
{{ -- Build Type: DEBUG}}
{{ -- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/}}
{{ -- Found PythonInterp: /usr/bin/python (found version "3.6.5")}}
{{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
{{ -- Looking for python3.6m}}
{{ -- Found Python lib /usr/lib/libpython3.6m.so}}
{{ -- Found PythonLibs: /usr/lib/libpython3.6m.so}}
{{ -- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include}}
{{ -- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf}}
{{ -- Looking for python3.6m}}
{{ -- Found Python lib /usr/lib/libpython3.6m.so}}
{{ -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")}}
{{ -- Checking for module 'arrow'}}
{{ -- No package 'arrow' found}}
{{ CMake Error at cmake_modules/FindArrow.cmake:130 (message):}}
{{ Could not find the Arrow library. Looked for headers in , and for libs in}}
{{ Call Stack (most recent call first):}}
{{ CMakeLists.txt:197 (find_package)}}





[jira] [Created] (ARROW-2598) [Python] table.to_pandas segfault

2018-05-17 Thread jacques (JIRA)
jacques created ARROW-2598:
--

 Summary: [Python]  table.to_pandas segfault
 Key: ARROW-2598
 URL: https://issues.apache.org/jira/browse/ARROW-2598
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: jacques


Here is a small snippet which produces a segfault:

{noformat}

In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[], []])

In [4]: pq.write_table(
   ...: table=pa.Table.from_arrays([pa_ar],["test"]),
   ...: where="test5.parquet",
   ...: compression="snappy",
   ...: flavor="spark"
   ...: )

In [5]: pq.read_table("test5.parquet")
Out[5]: 
pyarrow.Table
test: list
  child 0, item: null

In [6]: pq.read_table("test5.parquet").to_pydict()
Out[6]: OrderedDict([(u'test', [None, None])])

In [7]: pq.read_table("test5.parquet").to_pandas()
Segmentation fault

 

{noformat}





[jira] [Created] (ARROW-2597) [Plasma] remove UniqueIDHasher

2018-05-17 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2597:


 Summary: [Plasma] remove UniqueIDHasher
 Key: ARROW-2597
 URL: https://issues.apache.org/jira/browse/ARROW-2597
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Reporter: Zhijun Fu








[jira] [Created] (ARROW-2596) [GLib] Use the default value of GTK-Doc

2018-05-17 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2596:
---

 Summary: [GLib] Use the default value of GTK-Doc
 Key: ARROW-2596
 URL: https://issues.apache.org/jira/browse/ARROW-2596
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0





