[jira] [Created] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR
Paddy Horan created ARROW-2516:
--
Summary: AppVeyor Build Matrix should be specific to the changes made in a PR
Key: ARROW-2516
URL: https://issues.apache.org/jira/browse/ARROW-2516
Project: Apache Arrow
Issue Type: Bug
Reporter: Paddy Horan

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2515) Errors with DictionaryArray inside of ListArray or other DictionaryArray
Brent Kerby created ARROW-2515:
--
Summary: Errors with DictionaryArray inside of ListArray or other DictionaryArray
Key: ARROW-2515
URL: https://issues.apache.org/jira/browse/ARROW-2515
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Reporter: Brent Kerby

An exception ("KeyError: 26") is raised when .as_py() is called on elements of a ListArray over a DictionaryArray, or of a DictionaryArray with values in a DictionaryArray. Here are a couple of tests that currently fail:

{code:java}
import pyarrow as pa

def test_dictionary_array_1():
    dict_arr = pa.DictionaryArray.from_arrays([0, 1, 0], ['a', 'b'])
    list_arr = pa.ListArray.from_arrays([0, 2, 3], dict_arr)
    assert list_arr.to_pylist() == [['a', 'b'], ['a']]

def test_dictionary_array_2():
    dict_arr = pa.DictionaryArray.from_arrays([0, 1, 0], ['a', 'b'])
    dict_arr2 = pa.DictionaryArray.from_arrays([0, 1, 2, 1, 0], dict_arr)
    assert dict_arr2.to_pylist() == ['a', 'b', 'a', 'b', 'a']
{code}

The problem appears to be that the function box_scalar in scalar.pxi does not handle dictionary arrays, as we currently have no DictionaryValue type. DictionaryArray.__getitem__ currently works around the lack of a DictionaryValue type by dereferencing the index and constructing a scalar based on the value in the underlying dictionary. In other words, if we have a dictionary with int8 indices and string values, then the result of __getitem__ will be a StringValue (rather than a DictionaryValue). This works in simple cases but not in the more complex scenarios illustrated above.

I have a patch ready, which would add a DictionaryValue type similar to other scalar types, resolving these bugs and removing the need for a special-cased implementation of DictionaryArray.__getitem__. This DictionaryValue would contain a couple of accessor properties, "indices_value" and "dictionary_value", to allow access to both the index in the dictionary and the looked-up value. Then DictionaryValue.as_py() would simply call .as_py() on the underlying dictionary_value.
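The proposed scalar type can be illustrated with a small pure-Python model. This is only a sketch of the idea described in the report, not the patch itself and not pyarrow code: the PlainValue class and the dictionary_scalars helper are hypothetical stand-ins introduced for illustration, and only the names DictionaryValue, indices_value, dictionary_value and as_py come from the issue text.

```python
class PlainValue:
    """Hypothetical stand-in for a simple scalar (e.g. a string or int8 value)."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


class DictionaryValue:
    """Toy model of the proposed scalar: an index plus the dictionary it points into."""

    def __init__(self, index, dictionary):
        self._index = index            # scalar holding the integer index
        self._dictionary = dictionary  # list of scalar-like objects

    @property
    def indices_value(self):
        # The index into the dictionary, still wrapped as a scalar.
        return self._index

    @property
    def dictionary_value(self):
        # The looked-up entry; it may itself be a DictionaryValue.
        return self._dictionary[self._index.as_py()]

    def as_py(self):
        # Delegate to the looked-up value, so nesting resolves recursively.
        return self.dictionary_value.as_py()


def dictionary_scalars(indices, dictionary):
    """Hypothetical helper: one DictionaryValue per index."""
    return [DictionaryValue(PlainValue(i), dictionary) for i in indices]


# Mirrors test_dictionary_array_2: a dictionary whose values are themselves
# dictionary-encoded.
strings = [PlainValue('a'), PlainValue('b')]
inner = dictionary_scalars([0, 1, 0], strings)
outer = dictionary_scalars([0, 1, 2, 1, 0], inner)
result = [value.as_py() for value in outer]
print(result)
```

Because as_py() delegates to whatever dictionary_value returns, a dictionary of dictionary-encoded values needs no special casing in this model, which is the behaviour the report asks for.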
[jira] [Created] (ARROW-2514) [Python] Inferring / converting nested Numpy array is very slow
Antoine Pitrou created ARROW-2514:
--
Summary: [Python] Inferring / converting nested Numpy array is very slow
Key: ARROW-2514
URL: https://issues.apache.org/jira/browse/ARROW-2514
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou

Converting a nested Numpy array walks over the Numpy data as Python objects, even if the dtype is not "object". This makes it pointlessly slow compared to the non-nested case, and even the nested Python list case:

{code:python}
>>> %%timeit data = list(range(1))
...: pa.array(data)
...:
746 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit data = np.arange(1)
...: pa.array(data)
...:
81.1 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)
>>> %%timeit data = [np.arange(1)]
...: pa.array(data)
...:
3.39 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}
Re: Changing Github activity in JIRA
Looks much better now, thank you :-)

On Mon, 23 Apr 2018 11:58:10 -0700
Bryan Cutler wrote:

> They can do it for us, I filed
> https://issues.apache.org/jira/browse/INFRA-16426
>
> On Mon, Apr 23, 2018 at 11:43 AM, Wes McKinney wrote:
>
> > That sounds like an INFRA-level thing, I'm sure they'll tell us if not
> >
> > On Mon, Apr 23, 2018 at 2:40 PM, Bryan Cutler wrote:
> >
> > > Yeah, I understand that we don't want to rely on a third party to store all
> > > of our code discussions. I took a look at a Beam issue and it had the
> > > Github discussion under "Work Log", so it does seem possible to do that.
> > > I'll send a question to INFRA about it, but is the configuration for this
> > > controlled by Arrow somewhere?
> > >
> > > On Fri, Apr 20, 2018 at 9:44 AM, Wes McKinney wrote:
> > >
> > >> hi Bryan,
> > >>
> > >> We definitely need to persist the GitHub activity on JIRA or a mailing
> > >> list somewhere because stuff on GitHub is not permanent and can be
> > >> deleted (e.g. comments or code reviews can be deleted). We should
> > >> inquire if there's a way to separate it from regular comments on JIRA
> > >> to make it easier for discussions on JIRA
> > >>
> > >> As for the e-mails, it's easy enough to filter out the
> > >> automatically-generated ones by ASF GitHub Bot if you don't want to
> > >> see them
> > >>
> > >> - Wes
> > >>
> > >> On Fri, Apr 20, 2018 at 12:39 PM, Antoine Pitrou wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I agree with this. Not receiving e-mail notifications for those would
> > >> > be nice as well (since I typically already receive e-mail notifications
> > >> > from Github for the same activity).
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoine.
> > >> >
> > >> >
> > >> > On 20/04/2018 at 18:37, Bryan Cutler wrote:
> > >> >> Hi All,
> > >> >>
> > >> >> I was just wondering if it was possible to move the Github activity for a
> > >> >> PR into a different tab in the JIRA, like "Work Log?" Or maybe just stop
> > >> >> posting it altogether since the PR link is there? It is usually a ton of
> > >> >> text and makes it hard to have a discussion in the JIRA or go back and try
> > >> >> to look at certain comments.
> > >> >>
> > >> >> Thanks,
> > >> >> Bryan
[jira] [Created] (ARROW-2513) [Python] DictionaryType should give access to index type and dictionary array
Marco Neumann created ARROW-2513:
--
Summary: [Python] DictionaryType should give access to index type and dictionary array
Key: ARROW-2513
URL: https://issues.apache.org/jira/browse/ARROW-2513
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.9.0
Reporter: Marco Neumann

Currently, only {{ordered}} is mapped from C Type to Python, but index type and dictionary array are not accessible from Python.