[jira] [Created] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-04-26 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2516:
--

 Summary: AppVeyor Build Matrix should be specific to the changes 
made in a PR
 Key: ARROW-2516
 URL: https://issues.apache.org/jira/browse/ARROW-2516
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Paddy Horan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2515) Errors with DictionaryArray inside of ListArray or other DictionaryArray

2018-04-26 Thread Brent Kerby (JIRA)
Brent Kerby created ARROW-2515:
--

 Summary: Errors with DictionaryArray inside of ListArray or other 
DictionaryArray
 Key: ARROW-2515
 URL: https://issues.apache.org/jira/browse/ARROW-2515
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Brent Kerby


An exception ("KeyError: 26") is raised when .as_py() is called on elements of 
a ListArray over a DictionaryArray, or of a DictionaryArray with values in a 
DictionaryArray. Here are a couple tests that currently fail:

 
{code:python}
import pyarrow as pa

def test_dictionary_array_1():
    # ListArray whose values are a DictionaryArray
    dict_arr = pa.DictionaryArray.from_arrays([0, 1, 0], ['a', 'b'])
    list_arr = pa.ListArray.from_arrays([0, 2, 3], dict_arr)
    assert list_arr.to_pylist() == [['a', 'b'], ['a']]

def test_dictionary_array_2():
    # DictionaryArray whose dictionary is itself a DictionaryArray
    dict_arr = pa.DictionaryArray.from_arrays([0, 1, 0], ['a', 'b'])
    dict_arr2 = pa.DictionaryArray.from_arrays([0, 1, 2, 1, 0], dict_arr)
    assert dict_arr2.to_pylist() == ['a', 'b', 'a', 'b', 'a']
{code}
It appears that the problem is caused by the function box_scalar in 
scalar.pxi, which does not handle dictionary arrays because we currently 
have no DictionaryValue type. 

 

DictionaryArray.__getitem__ currently works around the lack of a 
DictionaryValue type by dereferencing the index and constructing a scalar 
from the value in the underlying dictionary. In other words, if we have a 
dictionary with int8 indices and string values, then the result of 
__getitem__ will be a StringValue (rather than a DictionaryValue). This works 
in simple cases but not in the more complex scenarios illustrated above.
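
To make the expected results of the two failing tests concrete, here is a 
minimal pure-Python model of dictionary decoding and list offsets. It does not 
use pyarrow, and the helper names decode_dictionary and split_by_offsets are 
made up for this illustration only.

```python
# Pure-Python model of dictionary encoding; no pyarrow required.
# Both helper names are hypothetical and exist only for this sketch.

def decode_dictionary(indices, dictionary):
    """Replace each index with the value it refers to in the dictionary."""
    return [dictionary[i] for i in indices]


def split_by_offsets(offsets, values):
    """Slice a flat sequence into sublists using ListArray-style offsets."""
    return [values[start:stop] for start, stop in zip(offsets, offsets[1:])]


# Dictionary ['a', 'b'] with indices [0, 1, 0]
flat = decode_dictionary([0, 1, 0], ["a", "b"])

# Case 1: a list array over the dictionary-encoded values
nested_list = split_by_offsets([0, 2, 3], flat)

# Case 2: a dictionary array whose dictionary is the first dictionary array
nested_dict = decode_dictionary([0, 1, 2, 1, 0], flat)
```

The values computed here match what the two tests assert, which is what 
to_pylist() should return once nested dictionary values are boxed correctly.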

I have a patch ready, which would add a DictionaryValue type similar to other 
scalar types, resolving these bugs and removing the need for a special-cased 
implementation of DictionaryArray.__getitem__. This DictionaryValue would 
expose two accessor properties, "indices_value" and "dictionary_value", to 
allow access to both the index in the dictionary and the looked-up value. 
Then DictionaryValue.as_py() would simply call .as_py() on the underlying 
dictionary_value. 
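
As a rough illustration of the proposal (not the actual patch, which would 
live in scalar.pxi), a pure-Python stand-in might look like the following. 
The property names come from the description above; the constructor and the 
fallback for plain Python values are assumptions made for this sketch.

```python
class DictionaryValue:
    """Hypothetical scalar for one element of a dictionary-encoded array."""

    def __init__(self, index, dictionary):
        # Assumed constructor: an integer index plus the dictionary it refers to
        self._index = index
        self._dictionary = dictionary

    @property
    def indices_value(self):
        """The index stored in the array for this element."""
        return self._index

    @property
    def dictionary_value(self):
        """The dictionary entry that the index points to."""
        return self._dictionary[self._index]

    def as_py(self):
        value = self.dictionary_value
        # Delegate to the looked-up value when it is itself a scalar wrapper;
        # plain Python objects are returned unchanged (sketch-only fallback).
        return value.as_py() if hasattr(value, "as_py") else value


# Dictionary array with a dictionary array as its dictionary
inner = [DictionaryValue(i, ["a", "b"]) for i in [0, 1, 0]]
outer = [DictionaryValue(i, inner) for i in [0, 1, 2, 1, 0]]
decoded = [v.as_py() for v in outer]
```

Because as_py() delegates to the looked-up value, nesting needs no special 
case: each level resolves its own index and hands off to the next.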





[jira] [Created] (ARROW-2514) [Python] Inferring / converting nested Numpy array is very slow

2018-04-26 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2514:
-

 Summary: [Python] Inferring / converting nested Numpy array is 
very slow
 Key: ARROW-2514
 URL: https://issues.apache.org/jira/browse/ARROW-2514
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Converting a nested Numpy array walks over the Numpy data as Python objects, 
even if the dtype is not "object". This makes it pointlessly slow compared to 
the non-nested case, and even compared to the nested Python list case:

{code:python}
>>> %%timeit data = list(range(1))
...: pa.array(data)
...:
746 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit data = np.arange(1)
...: pa.array(data)
...:
81.1 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)
>>> %%timeit data = [np.arange(1)]
...: pa.array(data)
...:
3.39 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}
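
The difference between the two code paths can be sketched with the standard 
library alone. In the example below, array.array is used as a stand-in for a 
typed Numpy buffer, and the function names are hypothetical; this is only an 
illustration of per-element boxing versus a bulk buffer copy, not Arrow's 
actual conversion code.

```python
from array import array


def convert_boxed(chunks):
    """Slow path: visit every element as a Python object."""
    out = array("q")
    for chunk in chunks:
        for item in chunk:  # each item is boxed into a Python int
            out.append(item)
    return out


def convert_buffered(chunks):
    """Fast path: copy a typed chunk's buffer in one call when possible."""
    out = array("q")
    for chunk in chunks:
        if isinstance(chunk, array) and chunk.typecode == "q":
            out.frombytes(chunk.tobytes())  # bulk copy, no per-item boxing
        else:
            out.extend(chunk)  # fall back for untyped input such as lists
    return out


chunks = [array("q", range(1000)), array("q", range(5))]
boxed = convert_boxed(chunks)
buffered = convert_buffered(chunks)
```

Both functions produce the same values; the second simply checks the element 
type first so that typed input never has to be walked item by item.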





Re: Changing Github activity in JIRA

2018-04-26 Thread Antoine Pitrou


Looks much better now, thank you :-)


On Mon, 23 Apr 2018 11:58:10 -0700
Bryan Cutler  wrote:
> They can do it for us, I filed
> https://issues.apache.org/jira/browse/INFRA-16426
> 
> On Mon, Apr 23, 2018 at 11:43 AM, Wes McKinney  wrote:
> 
> > That sounds like an INFRA-level thing, I'm sure they'll tell us if not
> >
> > On Mon, Apr 23, 2018 at 2:40 PM, Bryan Cutler  wrote:  
> > > Yeah, I understand that we don't want to rely on a third party to store all
> > > of our code discussions.  I took a look at a Beam issue and it had the
> > > Github discussion under "Work Log", so it does seem possible to do that.
> > > I'll send a question to INFRA about it, but is the configuration for this
> > > controlled by Arrow somewhere?
> > >
> > > On Fri, Apr 20, 2018 at 9:44 AM, Wes McKinney wrote:
> > >
> > >> hi Bryan,
> > >>
> > >> We definitely need to persist the GitHub activity on JIRA or a mailing
> > >> list somewhere because stuff on GitHub is not permanent and can be
> > >> deleted (e.g. comments or code reviews can be deleted). We should
> > >> inquire if there's a way to separate it from regular comments on JIRA
> > >> to make it easier for discussions on JIRA
> > >>
> > >> As for the e-mails, it's easy enough to filter out the
> > >> automatically-generated ones by ASF GitHub Bot if you don't want to
> > >> see them
> > >>
> > >> - Wes
> > >>
> > >> On Fri, Apr 20, 2018 at 12:39 PM, Antoine Pitrou 
> > >> wrote:  
> > >> >
> > >> > Hi,
> > >> >
> > >> > I agree with this.  Not receiving e-mail notifications for those would
> > >> > be nice as well (since I typically already receive e-mail notifications
> > >> > from Github for the same activity).
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoine.
> > >> >
> > >> >
> > >> > Le 20/04/2018 à 18:37, Bryan Cutler a écrit :  
> > >> >> Hi All,
> > >> >>
> > >> >> I was just wondering if it was possible to move the Github activity for a
> > >> >> PR into a different tab in the JIRA, like "Work Log?"  Or maybe just stop
> > >> >> posting it all together since the PR link is there?  It is usually a ton of
> > >> >> text and makes it hard to have a discussion in the JIRA or go back and try
> > >> >> to look at certain comments.
> > >> >>
> > >> >> Thanks,
> > >> >> Bryan
> > >> >>  
> > >>  
> >  
> 



[jira] [Created] (ARROW-2513) [Python] DictionaryType should give access to index type and dictionary array

2018-04-26 Thread Marco Neumann (JIRA)
Marco Neumann created ARROW-2513:


 Summary: [Python] DictionaryType should give access to index type 
and dictionary array
 Key: ARROW-2513
 URL: https://issues.apache.org/jira/browse/ARROW-2513
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.9.0
Reporter: Marco Neumann


Currently, only {{ordered}} is mapped from C Type to Python, but index type and 
dictionary array are not accessible from Python.
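
For illustration only, the requested interface could look something like the 
pure-Python stand-in below. The class name and the attribute names 
"index_type" and "dictionary" are assumptions for this sketch, not existing 
pyarrow API; only "ordered" is described above as already exposed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DictionaryTypeSketch:
    """Hypothetical Python-side view of a dictionary type."""

    index_type: str        # assumed attribute, e.g. "int8"
    dictionary: Sequence   # assumed attribute: the dictionary values
    ordered: bool = False  # the one field that is already mapped today


dtype = DictionaryTypeSketch(index_type="int8", dictionary=("a", "b"))
```

With such attributes, Python code could inspect a dictionary type fully 
instead of only reading its "ordered" flag.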


