[jira] [Created] (ARROW-5660) [GLib][CI] Use the latest macOS image and all Homebrew based libraries

2019-06-19 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5660:
---

 Summary: [GLib][CI] Use the latest macOS image and all Homebrew 
based libraries
 Key: ARROW-5660
 URL: https://issues.apache.org/jira/browse/ARROW-5660
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, GLib
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5659) [C++] Add support for finding OpenSSL installed by Homebrew

2019-06-19 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5659:
---

 Summary: [C++] Add support for finding OpenSSL installed by 
Homebrew
 Key: ARROW-5659
 URL: https://issues.apache.org/jira/browse/ARROW-5659
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei








[jira] [Created] (ARROW-5658) [JAVA]apache arrow-flight cannot send listvector

2019-06-19 Thread luckily (JIRA)
luckily created ARROW-5658:
--

 Summary: [JAVA]apache arrow-flight cannot send listvector 
 Key: ARROW-5658
 URL: https://issues.apache.org/jira/browse/ARROW-5658
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 0.13.0
 Environment: java8 arrow-java 0.13.0
Reporter: luckily


I can't transfer data that contains a ListVector using Apache arrow-flight. The 
problem description is as follows:
{quote} # I parse an XML file, convert it to the Arrow format, and finally 
convert it to the Parquet data format. The .xml file is available at 
[http://www.w3school.com.cn/example/xmle/cd_catalog.xml]
 # I created a schema that uses ListVector.
The code is shown below:
List list = 
childrenBuilder.add(ListVector.empty(column.getId().toString(),allocator));
VectorSchemaRoot root = VectorSchemaRoot.of(inVector)
 # I parse the XML file to get the list data in "cd" and write it with the ListVector API.
`ListVector listVector = (ListVector) valueVectors;
List<Column> columns = column.getColumns();
Column column1 = columns.get(0);
String name = column1.getId().toString();
UnionListWriter writer = listVector.getWriter();
writer.allocate();
for (int j = 0; j < column1.getColumns().size(); j++) {
    // One outer list entry per child column, with a nested list inside it.
    writer.setPosition(j);
    writer.startList();
    writer.list().startList();
    Column column2 = column1.getColumns().get(j);
    List<Map<String, String>> lst =
        (List<Map<String, String>>) ((Map) val).get(name);

    for (int k = 0; k < lst.size(); k++) {
        Map<String, String> stringStringMap = lst.get(k);
        String value = stringStringMap.get(column2.getId().toString());
        // Pick the nested writer that matches the column type.
        switch (column2.getType()) {
            case FLOAT:
                writer.list().float4().writeFloat4(stringConvertFloat(value));
                break;
            case BOOLEAN:
                writer.list().bit().writeBit(stringConvertBoolean(value));
                break;
            case DECIMAL:
                writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
                break;
            case TIMESTAMP:
                writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
                break;
            case INTEGER:
            case BIGINT:
                writer.list().bigInt().writeBigInt(stringConvertLong(value));
                break;
            case VARCHAR:
                // Copy the string bytes into a freshly allocated buffer.
                VarCharHolder varBinaryHolder = new VarCharHolder();
                varBinaryHolder.start = 0;
                byte[] bytes = value.getBytes();
                ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
                varBinaryHolder.buffer = buffer;
                buffer.writeBytes(bytes);
                varBinaryHolder.end = bytes.length;
                writer.list().varChar().write(varBinaryHolder);
                break;
            default:
                throw new IllegalArgumentException(" error no type !!");
        }
    }
    writer.list().endList();
    writer.endList();
}`

 # After the write is complete, I send the data to the arrow-flight server. Server 
code:
{quote}
{quote}@Override
public Callable acceptPut(FlightStream flightStream) {

 return () -> {

 try (VectorSchemaRoot root = flightStream.getRoot()) {

 while (flightStream.next()) {
 VectorSchemaRoot other = null;
 try {
 logger.info(" Receive message .. size: " + root.getRowCount());
 other = copyRoot(root);
 ArrowMessage arrowMessage = new ArrowMessage(other, other.getSchema());
 spmc.offer(arrowMessage);
 } catch (Exception e) {

 logger.error(e.getMessage(), e);
 }
 }
 }

 return Flight.PutResult.parseFrom("ok".getBytes());
 };

}{quote}
{quote}But the server did not receive any data, which is the error.{quote}
{quote}Client code:{quote}
{quote}root = message.getRoot();
//client.close();
FlightClient.ClientStreamListener listener =
 client.startPut(FlightDescriptor.path(message.getFilename()), root);
listener.putNext();
listener.completed();
client.close();
listener.putNext();
listener.completed();
Flight.PutResult result =
 listener.getResult();
String s = new 

[jira] [Created] (ARROW-5656) [Python] Enable Flight wheels on macOS

2019-06-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5656:
---

 Summary: [Python] Enable Flight wheels on macOS
 Key: ARROW-5656
 URL: https://issues.apache.org/jira/browse/ARROW-5656
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.14.0


Follow up to ARROW-3150





[jira] [Created] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

2019-06-19 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5655:


 Summary: [Python] Table.from_pydict/from_arrays not using types in 
specified schema correctly 
 Key: ARROW-5655
 URL: https://issues.apache.org/jira/browse/ARROW-5655
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Joris Van den Bossche


Example with {{from_pydict}} (from 
https://github.com/apache/arrow/pull/4601#issuecomment-503676534):

{code:python}
In [15]: table = pa.Table.from_pydict(
...: {'a': [1, 2, 3], 'b': [3, 4, 5]},
...: schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))

In [16]: table
Out[16]: 
pyarrow.Table
a: int64
c: int32

In [17]: table.to_pandas()
Out[17]: 
   a  c
0  1  3
1  2  0
2  3  4
{code}

Note that the specified schema 1) has different column names and 2) has a 
non-default type (int32 vs int64), which leads to corrupted values.

This is partly due to {{Table.from_pydict}} not using the type information in 
the schema to convert the dictionary items to pyarrow arrays. In addition, 
{{Table.from_arrays}} does not correctly cast the arrays to another dtype when 
the schema specifies one.

An additional question for {{Table.from_pydict}} is whether it should actually 
map the 'b' key of the dictionary to column 'c' as defined in the schema (this 
behaviour depends on the order of the dictionary, which is not guaranteed below 
Python 3.6).
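
The ordering concern above can be illustrated with a small sketch that selects columns by schema field name instead of by position. This is not pyarrow code: the schema is modelled as a plain list of (name, type) pairs, and {{columns_by_name}} is a hypothetical helper introduced only for illustration.

```python
def columns_by_name(data, schema):
    """Pick columns from `data` by schema field name, never by position.

    `schema` is a list of (name, type_name) pairs; this is an assumption
    made for the sketch, not the pyarrow Schema class.
    """
    missing = [name for name, _ in schema if name not in data]
    if missing:
        # Refuse to silently reuse another key (such as 'b') for the field.
        raise KeyError(f"schema fields missing from data: {missing}")
    return [(name, type_name, data[name]) for name, type_name in schema]


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [3, 4, 5]}
    print(columns_by_name(data, [("a", "int64"), ("b", "int32")]))
    try:
        columns_by_name(data, [("a", "int64"), ("c", "int32")])
    except KeyError as exc:
        print("rejected:", exc)
```

With name-based lookup the result no longer depends on dictionary order, and a schema field such as 'c' that has no matching key is reported instead of being filled from 'b'.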






[jira] [Created] (ARROW-5654) [C++] ChunkedArray should validate the types of the arrays

2019-06-19 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5654:


 Summary: [C++] ChunkedArray should validate the types of the arrays
 Key: ARROW-5654
 URL: https://issues.apache.org/jira/browse/ARROW-5654
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Joris Van den Bossche
 Fix For: 1.0.0


Example from Python, showing that you can currently create a ChunkedArray with 
incompatible types:

{code:python}
In [8]: a1 = pa.array([1, 2])

In [9]: a2 = pa.array(['a', 'b'])

In [10]: pa.chunked_array([a1, a2])
Out[10]:

[
  [
1,
2
  ],
  [
"a",
"b"
  ]
]
{code}

So a {{ChunkedArray::Validate}} method could be implemented (and it should 
probably be called by default upon creation?)
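
As a rough sketch of what such a check could do, the following pure-Python model compares the type of every chunk against the first one. The {{Chunk}} tuple, the string type labels and {{validate_chunks}} are hypothetical names used only for illustration; they are not part of the Arrow C++ or Python API.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """Stand-in for an Arrow array: a type label plus its values."""
    type_name: str
    values: list


def validate_chunks(chunks: List[Chunk]) -> None:
    """Raise TypeError unless all chunks share the type of the first chunk."""
    if not chunks:
        return
    expected = chunks[0].type_name
    for index, chunk in enumerate(chunks[1:], start=1):
        if chunk.type_name != expected:
            raise TypeError(
                f"chunk {index} has type {chunk.type_name}, expected {expected}"
            )


if __name__ == "__main__":
    a1 = Chunk("int64", [1, 2])
    a2 = Chunk("string", ["a", "b"])
    try:
        validate_chunks([a1, a2])
    except TypeError as exc:
        print("rejected:", exc)
```

Calling a check like this from the constructor would reject the example above at creation time, at the cost of one type comparison per chunk.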





[jira] [Created] (ARROW-5653) [CI] Fix cpp docker image

2019-06-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5653:
-

 Summary: [CI] Fix cpp docker image
 Key: ARROW-5653
 URL: https://issues.apache.org/jira/browse/ARROW-5653
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:shell}
make -f Makefile.docker run-cpp
...
54/64 Test #79: arrow-dataset-file_test ***Failed 0.04 sec
Running arrow-dataset-file_test, redirecting output into 
/build/cpp/build/test-logs/arrow-dataset-file_test.txt (attempt 1/1)
/build/cpp/debug/arrow-dataset-file_test: error while loading shared libraries: 
libbrotlienc.so.1: cannot open shared object file: No such file or directory
/build/cpp/src/arrow/dataset

  Start 80: arrow-flight-test
55/64 Test #80: arrow-flight-test ..***Failed 0.04 sec
Running arrow-flight-test, redirecting output into 
/build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1)
/build/cpp/debug/arrow-flight-t
{code}





Re: Failures in Arrow tests run outside of continuous integration

2019-06-19 Thread Bryan Cutler
The last Spark Integration test timed out. Is there any way to increase the
timeout for Crossbow tests?

On Wed, Jun 19, 2019 at 10:05 AM Wes McKinney wrote:

> hi folks,
>
> We have a number of integration tests and extra-CI build
> configurations that are run via Crossbow, and many of them are
> failing:
>
>
> https://github.com/ursa-labs/crossbow/branches/all?page=1=nightly-480=%E2%9C%93
>
> At some point we would like to notify the dev@ list about these
> failures when they happen (or in a nightly e-mail), but in the
> meantime some of these probably should be fixed (e.g. the Spark
> integration build is failing) before the project can release again
>
> Thanks
> Wes
>


Failures in Arrow tests run outside of continuous integration

2019-06-19 Thread Wes McKinney
hi folks,

We have a number of integration tests and extra-CI build
configurations that are run via Crossbow, and many of them are
failing:

https://github.com/ursa-labs/crossbow/branches/all?page=1=nightly-480=%E2%9C%93

At some point we would like to notify the dev@ list about these
failures when they happen (or in a nightly e-mail), but in the
meantime some of these probably should be fixed (e.g. the Spark
integration build is failing) before the project can release again

Thanks
Wes


[jira] [Created] (ARROW-5652) [CI] Fix iwyu docker image

2019-06-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5652:
-

 Summary: [CI] Fix iwyu docker image
 Key: ARROW-5652
 URL: https://issues.apache.org/jira/browse/ARROW-5652
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


See 
[https://travis-ci.org/ursa-labs/crossbow/builds/547691665]





[jira] [Created] (ARROW-5651) [Python] Incorrect conversion from strided Numpy array when other type is specified

2019-06-19 Thread Fabian Höring (JIRA)
Fabian Höring created ARROW-5651:


 Summary: [Python] Incorrect conversion from strided Numpy array 
when other type is specified
 Key: ARROW-5651
 URL: https://issues.apache.org/jira/browse/ARROW-5651
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Fabian Höring


In the example below the pyarrow array gives wrong results for strided numpy 
arrays:

{code}
>> import pyarrow as pa
>> import numpy as np
>> import pandas as pd
>> p_s = pd.Series(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>> pa.array(p_s, type=pa.float64())

[
  1,
  2,
  3,
  4
]
{code}

When the numpy array is copied to a new location, it gives the expected output:

{code}
>> import pyarrow as pa
>> import numpy as np
>> import pandas as pd
>> p_s = pd.Series(np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2]))
>> pa.array(p_s, type=pa.float64())
  
[
  1,
  3,
  5,
  7
]
{code}

Looking at the 
[code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407],
 it seems that the number of elements to step over is determined from the 
target type instead of the original numpy type.

In this case the stride is 8 bytes, which corresponds to 2 float32 elements. 
The code, however, computes the step with the target type, which gives 1 
float64 element, so it reads the array one element at a time instead of every 
2 elements until it reaches the total number of elements.
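
The arithmetic described above can be reproduced with a simplified model that uses only the standard library. It is a sketch of the suspected miscalculation, not the Arrow conversion code; {{read_strided}} and its parameters are hypothetical names chosen for this illustration.

```python
import struct

FLOAT32_SIZE = struct.calcsize("<f")  # 4 bytes
FLOAT64_SIZE = struct.calcsize("<d")  # 8 bytes

# Backing buffer holding float32 values 0.0 .. 9.0,
# mirroring np.arange(0, 10, dtype=np.float32).
buffer = struct.pack("<10f", *(float(i) for i in range(10)))


def read_strided(buf, byte_offset, byte_stride, count, step_itemsize):
    """Read `count` float32 values from `buf`.

    The byte stride is converted into an element step by dividing by
    `step_itemsize`; passing the wrong item size models the reported bug.
    """
    values = struct.unpack_from("<10f", buf)
    start = byte_offset // FLOAT32_SIZE
    step = byte_stride // step_itemsize
    return [values[start + i * step] for i in range(count)]


# The view [1:-1:2] starts at byte offset 4, has a stride of 8 bytes
# and contains 4 elements.
print(read_strided(buffer, 4, 8, 4, FLOAT32_SIZE))  # [1.0, 3.0, 5.0, 7.0]
print(read_strided(buffer, 4, 8, 4, FLOAT64_SIZE))  # [1.0, 2.0, 3.0, 4.0]
```

Dividing the stride by the source item size yields the expected values, while dividing by the target item size reproduces the 1, 2, 3, 4 output shown in the first example.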









[jira] [Created] (ARROW-5650) [Python] Update manylinux dependency versions

2019-06-19 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5650:
-

 Summary: [Python] Update manylinux dependency versions
 Key: ARROW-5650
 URL: https://issues.apache.org/jira/browse/ARROW-5650
 Project: Apache Arrow
  Issue Type: Task
  Components: Packaging, Python
Reporter: Antoine Pitrou
 Fix For: 0.14.0


We should bump the versions of upstream libraries we compile in the manylinux 
build image.


