[jira] [Created] (ARROW-5660) [GLib][CI] Use the latest macOS image and all Homebrew based libraries
Sutou Kouhei created ARROW-5660:
--------------------------------

             Summary: [GLib][CI] Use the latest macOS image and all Homebrew based libraries
                 Key: ARROW-5660
                 URL: https://issues.apache.org/jira/browse/ARROW-5660
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Continuous Integration, GLib
            Reporter: Sutou Kouhei
            Assignee: Sutou Kouhei

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-5659) [C++] Add support for finding OpenSSL installed by Homebrew
Sutou Kouhei created ARROW-5659:
--------------------------------

             Summary: [C++] Add support for finding OpenSSL installed by Homebrew
                 Key: ARROW-5659
                 URL: https://issues.apache.org/jira/browse/ARROW-5659
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Sutou Kouhei
            Assignee: Sutou Kouhei

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-5658) [Java] apache arrow-flight cannot send ListVector
luckily created ARROW-5658:
---------------------------

             Summary: [Java] apache arrow-flight cannot send ListVector
                 Key: ARROW-5658
                 URL: https://issues.apache.org/jira/browse/ARROW-5658
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
    Affects Versions: 0.13.0
         Environment: java8 arrow-java 0.13.0
            Reporter: luckily

I can't transfer data containing a ListVector using apache arrow-flight. The problem description is as follows:

1. I parse an XML file, convert it to the Arrow format, and finally convert it to the Parquet data format. The XML data file is at [http://www.w3school.com.cn/example/xmle/cd_catalog.xml]

2. I created a schema that uses ListVector. The code is as follows:

{code:java}
List list = childrenBuilder.add(ListVector.empty(column.getId().toString(), allocator));
VectorSchemaRoot root = VectorSchemaRoot.of(inVector);
{code}

3. I parse the XML file to get the list data in "cd" and write it through the ListVector API:

{code:java}
ListVector listVector = (ListVector) valueVectors;
List columns = column.getColumns();
Column column1 = columns.get(0);
String name = column1.getId().toString();
UnionListWriter writer = listVector.getWriter();
writer.allocate();
for (int j = 0; j < column1.getColumns().size(); j++) {
    writer.setPosition(j);
    writer.startList();
    writer.list().startList();
    Column column2 = column1.getColumns().get(j);
    List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
    for (int k = 0; k < lst.size(); k++) {
        Map<String, String> stringStringMap = lst.get(k);
        String value = stringStringMap.get(column2.getId().toString());
        switch (column2.getType()) {
            case FLOAT:
                writer.list().float4().writeFloat4(stringConvertFloat(value));
                break;
            case BOOLEAN:
                writer.list().bit().writeBit(stringConvertBoolean(value));
                break;
            case DECIMAL:
                writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
                break;
            case TIMESTAMP:
                writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
                break;
            case INTEGER:
            case BIGINT:
                writer.list().bigInt().writeBigInt(stringConvertLong(value));
                break;
            case VARCHAR:
                VarCharHolder varBinaryHolder = new VarCharHolder();
                varBinaryHolder.start = 0;
                byte[] bytes = value.getBytes();
                ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
                varBinaryHolder.buffer = buffer;
                buffer.writeBytes(bytes);
                varBinaryHolder.end = bytes.length;
                writer.list().varChar().write(varBinaryHolder);
                break;
            default:
                throw new IllegalArgumentException("error: no type!!");
        }
    }
    writer.list().endList();
    writer.endList();
}
{code}

4. After the write is complete, I send to the arrow-flight server. Server code:

{code:java}
@Override
public Callable acceptPut(FlightStream flightStream) {
    return () -> {
        try (VectorSchemaRoot root = flightStream.getRoot()) {
            while (flightStream.next()) {
                VectorSchemaRoot other = null;
                try {
                    logger.info(" Receive message .. size: " + root.getRowCount());
                    other = copyRoot(root);
                    ArrowMessage arrowMessage = new ArrowMessage(other, other.getSchema());
                    spmc.offer(arrowMessage);
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                }
            }
        }
        return Flight.PutResult.parseFrom("ok".getBytes());
    };
}
{code}

But the server did not receive any information; it is an error. Client code:

{code:java}
root = message.getRoot();
// client.close();
FlightClient.ClientStreamListener listener = client.startPut(FlightDescriptor.path(message.getFilename()), root);
listener.putNext();
listener.completed();
client.close();
listener.putNext();
listener.completed();
Flight.PutResult result = listener.getResult();
String s = new
[jira] [Created] (ARROW-5656) [Python] Enable Flight wheels on macOS
Wes McKinney created ARROW-5656:
--------------------------------

             Summary: [Python] Enable Flight wheels on macOS
                 Key: ARROW-5656
                 URL: https://issues.apache.org/jira/browse/ARROW-5656
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.14.0

Follow up to ARROW-3150

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
Joris Van den Bossche created ARROW-5655:
-----------------------------------------

             Summary: [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
                 Key: ARROW-5655
                 URL: https://issues.apache.org/jira/browse/ARROW-5655
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Joris Van den Bossche

Example with {{from_pydict}} (from https://github.com/apache/arrow/pull/4601#issuecomment-503676534):

{code:python}
In [15]: table = pa.Table.from_pydict(
    ...:     {'a': [1, 2, 3], 'b': [3, 4, 5]},
    ...:     schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))

In [16]: table
Out[16]:
pyarrow.Table
a: int64
c: int32

In [17]: table.to_pandas()
Out[17]:
   a  c
0  1  3
1  2  0
2  3  4
{code}

Note that the specified schema 1) has different column names and 2) has a non-default type (int32 vs int64), which leads to corrupted values.

This is partly due to {{Table.from_pydict}} not using the type information in the schema to convert the dictionary items to pyarrow arrays. But it is also {{Table.from_arrays}} that is not correctly casting the arrays to another dtype when the schema specifies one.

An additional question for {{Table.from_pydict}} is whether it actually should override the 'b' key from the dictionary as column 'c' as defined in the schema (this behaviour depends on the order of the dictionary, which is not guaranteed before Python 3.6).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
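The dict-ordering concern in the last paragraph can be sketched without pyarrow: matching columns to the schema by *name* rather than by position makes the result independent of dict order, and turns the silent 'b'-to-'c' rename into an explicit error. The helper below and its `(name, type-label)` schema representation are hypothetical, not pyarrow API:

```python
# Hypothetical sketch (plain Python, no pyarrow): select columns in schema
# order by name, so dict insertion order is irrelevant and a schema field
# that is absent from the data ('c' in the example above) fails loudly
# instead of silently renaming another column.
def columns_for_schema(data, schema):
    columns = []
    for name, typ in schema:  # schema given as (name, type-label) pairs
        if name not in data:
            raise KeyError(f"schema field {name!r} not found in data")
        columns.append((name, typ, data[name]))
    return columns

# Matching names: works regardless of the dict's insertion order.
print(columns_for_schema({"b": [3, 4, 5], "a": [1, 2, 3]},
                         [("a", "int64"), ("b", "int32")]))
```

With this rule, the example from the report (schema field 'c' with data keys 'a' and 'b') would raise a KeyError instead of producing a mislabeled, corrupted column.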
[jira] [Created] (ARROW-5654) [C++] ChunkedArray should validate the types of the arrays
Joris Van den Bossche created ARROW-5654:
-----------------------------------------

             Summary: [C++] ChunkedArray should validate the types of the arrays
                 Key: ARROW-5654
                 URL: https://issues.apache.org/jira/browse/ARROW-5654
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Joris Van den Bossche
             Fix For: 1.0.0

Example from Python, showing that you can currently create a ChunkedArray with incompatible types:

{code:python}
In [8]: a1 = pa.array([1, 2])

In [9]: a2 = pa.array(['a', 'b'])

In [10]: pa.chunked_array([a1, a2])
Out[10]:
[
  [
    1,
    2
  ],
  [
    "a",
    "b"
  ]
]
{code}

So a {{ChunkedArray::Validate}} method can be implemented (which should probably be called by default upon creation?).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
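The suggested {{ChunkedArray::Validate}} boils down to checking that every chunk carries the same type as the first one. A minimal sketch in plain Python, where the dict-based "chunk" records stand in for Arrow arrays (a hypothetical representation, not pyarrow API):

```python
# Minimal sketch of the type check Validate could perform; chunk["type"]
# stands in for the Arrow DataType attached to each array (hypothetical
# representation, not pyarrow API).
def validate_chunks(chunks):
    if not chunks:
        return  # an empty chunk list is trivially valid
    expected = chunks[0]["type"]
    for i, chunk in enumerate(chunks[1:], start=1):
        if chunk["type"] != expected:
            raise TypeError(f"chunk {i} has type {chunk['type']!r}, "
                            f"expected {expected!r}")

validate_chunks([{"type": "int64"}, {"type": "int64"}])  # passes silently
```

Called from the constructor (or a factory function), a check like this would reject the int64/string mix in the example above at creation time instead of letting the invalid ChunkedArray escape.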
[jira] [Created] (ARROW-5653) [CI] Fix cpp docker image
Francois Saint-Jacques created ARROW-5653:
------------------------------------------

             Summary: [CI] Fix cpp docker image
                 Key: ARROW-5653
                 URL: https://issues.apache.org/jira/browse/ARROW-5653
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Francois Saint-Jacques

{code:shell}
make -f Makefile.docker run-cpp
...
54/64 Test #79: arrow-dataset-file_test ..........***Failed    0.04 sec
Running arrow-dataset-file_test, redirecting output into /build/cpp/build/test-logs/arrow-dataset-file_test.txt (attempt 1/1)
/build/cpp/debug/arrow-dataset-file_test: error while loading shared libraries: libbrotlienc.so.1: cannot open shared object file: No such file or directory
/build/cpp/src/arrow/dataset

      Start 80: arrow-flight-test
55/64 Test #80: arrow-flight-test ................***Failed    0.04 sec
Running arrow-flight-test, redirecting output into /build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1)
/build/cpp/debug/arrow-flight-t
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: Failures in Arrow tests run outside of continuous integration
The last Spark integration test timed out. Is there any way to increase the timeout for Crossbow tests?

On Wed, Jun 19, 2019 at 10:05 AM Wes McKinney wrote:
> hi folks,
>
> We have a number of integration tests and extra-CI build
> configurations that are run via Crossbow, and many of them are
> failing:
>
> https://github.com/ursa-labs/crossbow/branches/all?page=1=nightly-480=%E2%9C%93
>
> At some point we would like to notify the dev@ list about these
> failures when they happen (or in a nightly e-mail), but in the
> meantime some of these probably should be fixed (e.g. the Spark
> integration build is failing) before the project can release again
>
> Thanks
> Wes
Failures in Arrow tests run outside of continuous integration
hi folks,

We have a number of integration tests and extra-CI build configurations that are run via Crossbow, and many of them are failing:

https://github.com/ursa-labs/crossbow/branches/all?page=1=nightly-480=%E2%9C%93

At some point we would like to notify the dev@ list about these failures when they happen (or in a nightly e-mail), but in the meantime some of these probably should be fixed (e.g. the Spark integration build is failing) before the project can release again

Thanks
Wes
[jira] [Created] (ARROW-5652) [CI] Fix iwyu docker image
Francois Saint-Jacques created ARROW-5652:
------------------------------------------

             Summary: [CI] Fix iwyu docker image
                 Key: ARROW-5652
                 URL: https://issues.apache.org/jira/browse/ARROW-5652
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Continuous Integration
            Reporter: Francois Saint-Jacques
            Assignee: Francois Saint-Jacques

See [https://travis-ci.org/ursa-labs/crossbow/builds/547691665?utm_source=github_status_medium=notification]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-5651) [Python] Incorrect conversion from strided Numpy array when other type is specified
Fabian Höring created ARROW-5651:
---------------------------------

             Summary: [Python] Incorrect conversion from strided Numpy array when other type is specified
                 Key: ARROW-5651
                 URL: https://issues.apache.org/jira/browse/ARROW-5651
             Project: Apache Arrow
          Issue Type: Improvement
    Affects Versions: 0.12.0
            Reporter: Fabian Höring

In the example below the pyarrow array gives wrong results for strided numpy arrays:

{code}
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> p_s = pd.Series(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>>> pa.array(p_s, type=pa.float64())
[
  1,
  2,
  3,
  4
]
{code}

When copying the numpy array to a new location, it gives the expected output:

{code}
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> p_s = pd.Series(np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2]))
>>> pa.array(p_s, type=pa.float64())
[
  1,
  3,
  5,
  7
]
{code}

Looking at the [code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407], it seems that the number of elements per stride is determined using the target type instead of the initial numpy type. Here the stride is 8 bytes, which corresponds to 2 elements of float32, but the code computes the step with the target type, giving 1 element of float64. It therefore reads the array element by element instead of every 2 elements, until reaching the total number of elements.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
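The stride arithmetic described above can be reproduced with numpy alone (no pyarrow needed): the sliced float32 array advances 8 bytes per element, which is 2 elements when measured in the source float32 type but only 1 "element" when mis-measured in the float64 target type.

```python
import numpy as np

# Demonstrates the stride arithmetic from the description above
# (numpy only; the pyarrow conversion itself is not reproduced here).
base = np.arange(0, 10, dtype=np.float32)
view = base[1:-1:2]                 # array([1., 3., 5., 7.], dtype=float32)

stride = view.strides[0]            # 8 bytes between consecutive elements
print(stride // base.itemsize)                   # 2: step in float32 elements
print(stride // np.dtype(np.float64).itemsize)   # 1: step mis-measured in float64
```

Stepping 1 float32 element at a time through this buffer reads 1.0, 2.0, 3.0, 4.0, which is exactly the wrong output shown in the first snippet.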
[jira] [Created] (ARROW-5650) [Python] Update manylinux dependency versions
Antoine Pitrou created ARROW-5650:
----------------------------------

             Summary: [Python] Update manylinux dependency versions
                 Key: ARROW-5650
                 URL: https://issues.apache.org/jira/browse/ARROW-5650
             Project: Apache Arrow
          Issue Type: Task
          Components: Packaging, Python
            Reporter: Antoine Pitrou
             Fix For: 0.14.0

We should bump the versions of upstream libraries we compile in the manylinux build image.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)