[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434901#comment-16434901
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

andygrove commented on a change in pull request #1875: ARROW-2435: [Rust] Add 
memory pool abstraction.
URL: https://github.com/apache/arrow/pull/1875#discussion_r180959288
 
 

 ##
 File path: rust/src/memory_pool.rs
 ##
 @@ -0,0 +1,90 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use libc;
+use std::mem;
+use std::cmp;
+
+use super::error::ArrowError;
+use super::error::Result;
+
+const ALIGNMENT: usize = 64;
+
+/// Memory pool for allocating memory. It is also responsible for tracking
+/// memory usage.
+pub trait MemoryPool {
+    /// Allocate memory.
+    /// The implementation should ensure that the allocated memory is aligned.
+    fn allocate(&self, size: usize) -> Result<*mut u8>;
+
+    /// Reallocate memory.
+    /// If the implementation doesn't support reallocating aligned memory, it
+    /// should allocate new memory and copy the old contents into it.
+    fn reallocate(&self, old_size: usize, new_size: usize, pointer: *mut u8) -> Result<*mut u8>;
+
+    /// Free memory.
+    fn free(&self, ptr: *mut u8);
+}
+
+/// Implementation of the memory pool using the libc API.
+#[allow(dead_code)]
+struct LibcMemoryPool;
+
+impl MemoryPool for LibcMemoryPool {
+    fn allocate(&self, size: usize) -> Result<*mut u8> {
+        unsafe {
+            let mut page: *mut libc::c_void = mem::uninitialized();
+            let result = libc::posix_memalign(&mut page, ALIGNMENT, size);
+            match result {
+                0 => Ok(mem::transmute::<*mut libc::c_void, *mut u8>(page)),
+                _ => Err(ArrowError::MemoryError(
+                    "Failed to allocate memory".to_string(),
+                )),
+            }
+        }
+    }
+
+    fn reallocate(&self, old_size: usize, new_size: usize, pointer: *mut u8) -> Result<*mut u8> {
+        unsafe {
+            let old_src = mem::transmute::<*mut u8, *mut libc::c_void>(pointer);
+            let result = self.allocate(new_size)?;
+            let dst = mem::transmute::<*mut u8, *mut libc::c_void>(result);
+            libc::memcpy(dst, old_src, cmp::min(old_size, new_size));
+            libc::free(old_src);
+            Ok(result)
+        }
+    }
+
+    fn free(&self, ptr: *mut u8) {
+        unsafe { libc::free(mem::transmute::<*mut u8, *mut libc::c_void>(ptr)) }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_allocate() {
+        let memory_pool = LibcMemoryPool {};
+
+        for _ in 0..10 {
+            let p = memory_pool.allocate(1024).unwrap();
+            // make sure this is 64-byte aligned
+            assert_eq!(0, (p as usize) % 64);
+        }
+    }
+}
 
 Review comment:
   Could you update this test to also call free()? It would also be good to add 
a test for reallocate.
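The review above asks for free() and reallocate coverage. A minimal sketch of what such a test could check, using `std::alloc` from the standard library as a stand-in for the libc-based pool (the function name and structure here are illustrative, not part of the PR):

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};

const ALIGNMENT: usize = 64;

// Allocate an aligned block, write a marker byte, grow it with realloc,
// and verify the contents survived before freeing. Returns the marker read
// back from the reallocated block.
fn alloc_realloc_free_roundtrip() -> u8 {
    let layout = Layout::from_size_align(1024, ALIGNMENT).unwrap();
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null());
        assert_eq!(0, (p as usize) % ALIGNMENT); // 64-byte aligned

        *p = 42; // marker that reallocation must preserve

        let p2 = realloc(p, layout, 4096);
        assert!(!p2.is_null());
        let marker = *p2;

        // Every allocation is freed, as the review asks; dealloc must be
        // given the new size with the original alignment.
        let grown = Layout::from_size_align(4096, ALIGNMENT).unwrap();
        dealloc(p2, grown);
        marker
    }
}

fn main() {
    assert_eq!(42, alloc_realloc_free_roundtrip());
}
```

The same shape — allocate, mutate, reallocate, check contents, free — would apply to the pool's `reallocate`, which copies `min(old_size, new_size)` bytes.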


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction similar to the C++ API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2448) Segfault when plasma client goes out of scope before buffer.

2018-04-11 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434580#comment-16434580
 ] 

Antoine Pitrou commented on ARROW-2448:
---

Yes. The issue is that we don't control {{PlasmaClient}} lifetime (the user 
allocates it however they want, as the constructor is public). And there's no 
notion of an invalidated buffer.

Also, I don't understand why the buffer still has valid contents after the 
client is destroyed.

> Segfault when plasma client goes out of scope before buffer.
> 
>
> Key: ARROW-2448
> URL: https://issues.apache.org/jira/browse/ARROW-2448
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++), Python
>Reporter: Robert Nishihara
>Priority: Major
>
> The following causes a segfault.
>  
> First start a plasma store with
> {code:java}
> plasma_store -s /tmp/store -m 100{code}
> Then run the following in Python.
> {code}
> import pyarrow.plasma as plasma
> import numpy as np
> client = plasma.connect('/tmp/store', '', 0)
> object_id = client.put(np.zeros(3))
> buf = client.get(object_id)
> del client
> del buf  # This segfaults.{code}
> The backtrace is 
> {code:java}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0xfffc)
>   * frame #0: 0x0001056deaee 
> libplasma.0.dylib`plasma::PlasmaClient::Release(plasma::UniqueID const&) + 142
>     frame #1: 0x0001056de9e9 
> libplasma.0.dylib`plasma::PlasmaBuffer::~PlasmaBuffer() + 41
>     frame #2: 0x0001056dec9f libplasma.0.dylib`arrow::Buffer::~Buffer() + 
> 63
>     frame #3: 0x000106206661 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] std::__1::__shared_count::__release_shared(this=0x0001019b7d20) 
> at memory:3444
>     frame #4: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] 
> std::__1::__shared_weak_count::__release_shared(this=0x0001019b7d20) at 
> memory:3486
>     frame #5: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4412
>     frame #6: 0x000106002b35 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4410
>     frame #7: 0x0001061052c5 lib.cpython-36m-darwin.so`void 
> __Pyx_call_destructor >(x=std::__1::shared_ptr::element_type @ 0x0001019b7d38 
> strong=0 weak=1) at lib.cxx:486
>     frame #8: 0x000106104f93 
> lib.cpython-36m-darwin.so`__pyx_tp_dealloc_7pyarrow_3lib_Buffer(o=0x000100791768)
>  at lib.cxx:107704
>     frame #9: 0x0001069fcd54 
> multiarray.cpython-36m-darwin.so`array_dealloc + 292
>     frame #10: 0x0001000e8daf 
> libpython3.6m.dylib`_PyDict_DelItem_KnownHash + 463
>     frame #11: 0x000100171899 
> libpython3.6m.dylib`_PyEval_EvalFrameDefault + 13321
>     frame #12: 0x0001001791ef 
> libpython3.6m.dylib`_PyEval_EvalCodeWithName + 2447
>     frame #13: 0x00010016e3d4 libpython3.6m.dylib`PyEval_EvalCode + 100
>     frame #14: 0x0001001a3bd6 
> libpython3.6m.dylib`PyRun_InteractiveOneObject + 582
>     frame #15: 0x0001001a350e 
> libpython3.6m.dylib`PyRun_InteractiveLoopFlags + 222
>     frame #16: 0x0001001a33fc libpython3.6m.dylib`PyRun_AnyFileExFlags + 
> 60
>     frame #17: 0x0001001bc835 libpython3.6m.dylib`Py_Main + 3829
>     frame #18: 0x00010df8 python`main + 232
>     frame #19: 0x7fff6cd80015 libdyld.dylib`start + 1
>     frame #20: 0x7fff6cd80015 libdyld.dylib`start + 1{code}
> Basically, the issue is that when the buffer goes out of scope, it calls 
> {{Release}} on the plasma client, but the client has already been deallocated.
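The fix implied above — the buffer must keep its client alive — can be sketched with reference counting (hypothetical `Client`/`Buffer` types; this illustrates the ownership invariant, not the actual plasma code):

```rust
use std::rc::Rc;

// Hypothetical stand-ins for PlasmaClient and PlasmaBuffer.
struct Client;

struct Buffer {
    // The buffer holds a strong reference to its client, so the client
    // cannot be destroyed while any buffer still needs to call Release.
    client: Rc<Client>,
}

// Reproduces the Python ordering: the user drops their client handle first,
// then the buffer. Returns how many strong references kept the client alive
// while the buffer still existed.
fn client_outlives_buffer() -> usize {
    let client = Rc::new(Client);
    let buf = Buffer { client: Rc::clone(&client) };
    drop(client); // analogous to `del client`: drops only the handle
    let still_alive = Rc::strong_count(&buf.client);
    drop(buf); // analogous to `del buf`: releases against a live client
    still_alive
}

fn main() {
    // The buffer's reference kept the client alive, so no use-after-free.
    assert_eq!(1, client_outlives_buffer());
}
```

In C++ terms this corresponds to the buffer holding a `shared_ptr` to the client rather than a raw pointer.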





[jira] [Commented] (ARROW-2195) [Plasma] Segfault when retrieving RecordBatch from plasma store

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434571#comment-16434571
 ] 

ASF GitHub Bot commented on ARROW-2195:
---

robertnishihara commented on issue #1807: ARROW-2195: [Plasma] Return 
auto-releasing buffers
URL: https://github.com/apache/arrow/pull/1807#issuecomment-380595114
 
 
   @pitrou I'm seeing https://issues.apache.org/jira/browse/ARROW-2448 when 
using this PR.
   
   It seems like we need each client to keep track of the buffers that it 
produces and to invalidate them when the client disconnects (or something like 
that). cc @pcmoritz 




> [Plasma] Segfault when retrieving RecordBatch from plasma store
> ---
>
> Key: ARROW-2195
> URL: https://issues.apache.org/jira/browse/ARROW-2195
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++), Python
>Affects Versions: 0.8.0
>Reporter: Philipp Moritz
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> It can be reproduced with the following script:
> {code:python}
> import pyarrow as pa
> import pyarrow.plasma as plasma
> def retrieve1():
>     client = plasma.connect('test', "", 0)
>     key = "keynumber1keynumber1"
>     pid = plasma.ObjectID(bytearray(key, 'UTF-8'))
>     [buff] = client.get_buffers([pid])
>     batch = pa.RecordBatchStreamReader(buff).read_next_batch()
>     print(batch)
>     print(batch.schema)
>     print(batch[0])
>     return batch
> client = plasma.connect('test', "", 0)
> test1 = [1, 12, 23, 3, 21, 34]
> test1 = pa.array(test1, pa.int32())
> batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
> key = "keynumber1keynumber1"
> pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> sink = pa.MockOutputStream()
> stream_writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> stream_writer.write_batch(batch)
> stream_writer.close()
> bff = client.create(pid, sink.size())
> stream = pa.FixedSizeBufferWriter(bff)
> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> writer.write_batch(batch)
> client.seal(pid)
> batch = retrieve1()
> print(batch)
> print(batch.schema)
> print(batch[0])
> {code}
>  
> Preliminary backtrace:
>  
> {code}
> CESS (code=1, address=0x38158)
>     frame #0: 0x00010e6457fc 
> lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
> lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py:
> ->  0x10e6457fc <+28>: movslq (%rdx,%rcx,4), %rdi
>     0x10e645800 <+32>: callq  0x10e698170               ; symbol stub for: 
> PyInt_FromLong
>     0x10e645805 <+37>: testq  %rax, %rax
>     0x10e645808 <+40>: je     0x10e64580c               ; <+44>
> (lldb) bt
>  * thread #1: tid = 0xf1378e, 0x00010e6457fc 
> lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28, 
> queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, 
> address=0x38158)
>   * frame #0: 0x00010e6457fc 
> lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
>     frame #1: 0x00010e5ccd35 lib.so`__Pyx_PyObject_CallNoArg(_object*) + 
> 133
>     frame #2: 0x00010e613b25 
> lib.so`__pyx_pw_7pyarrow_3lib_10ArrayValue_3__repr__(_object*) + 933
>     frame #3: 0x00010c2f83bc libpython2.7.dylib`PyObject_Repr + 60
>     frame #4: 0x00010c35f651 libpython2.7.dylib`PyEval_EvalFrameEx + 22305
> {code}





[jira] [Created] (ARROW-2448) Segfault when plasma client goes out of scope before buffer.

2018-04-11 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-2448:
---

 Summary: Segfault when plasma client goes out of scope before 
buffer.
 Key: ARROW-2448
 URL: https://issues.apache.org/jira/browse/ARROW-2448
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++), Python
Reporter: Robert Nishihara


The following causes a segfault.

 

First start a plasma store with
{code:java}
plasma_store -s /tmp/store -m 100{code}
Then run the following in Python.
{code}
import pyarrow.plasma as plasma
import numpy as np

client = plasma.connect('/tmp/store', '', 0)

object_id = client.put(np.zeros(3))

buf = client.get(object_id)

del client

del buf  # This segfaults.{code}
The backtrace is 
{code:java}
(lldb) bt

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0xfffc)

  * frame #0: 0x0001056deaee 
libplasma.0.dylib`plasma::PlasmaClient::Release(plasma::UniqueID const&) + 142

    frame #1: 0x0001056de9e9 
libplasma.0.dylib`plasma::PlasmaBuffer::~PlasmaBuffer() + 41

    frame #2: 0x0001056dec9f libplasma.0.dylib`arrow::Buffer::~Buffer() + 63

    frame #3: 0x000106206661 
lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
[inlined] std::__1::__shared_count::__release_shared(this=0x0001019b7d20) 
at memory:3444

    frame #4: 0x000106206617 
lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
[inlined] 
std::__1::__shared_weak_count::__release_shared(this=0x0001019b7d20) at 
memory:3486

    frame #5: 0x000106206617 
lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
 at memory:4412

    frame #6: 0x000106002b35 
lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
 at memory:4410

    frame #7: 0x0001061052c5 lib.cpython-36m-darwin.so`void 
__Pyx_call_destructor(x=std::__1::shared_ptr::element_type @ 0x0001019b7d38 
strong=0 weak=1) at lib.cxx:486

    frame #8: 0x000106104f93 
lib.cpython-36m-darwin.so`__pyx_tp_dealloc_7pyarrow_3lib_Buffer(o=0x000100791768)
 at lib.cxx:107704

    frame #9: 0x0001069fcd54 multiarray.cpython-36m-darwin.so`array_dealloc 
+ 292

    frame #10: 0x0001000e8daf libpython3.6m.dylib`_PyDict_DelItem_KnownHash 
+ 463

    frame #11: 0x000100171899 libpython3.6m.dylib`_PyEval_EvalFrameDefault 
+ 13321

    frame #12: 0x0001001791ef libpython3.6m.dylib`_PyEval_EvalCodeWithName 
+ 2447

    frame #13: 0x00010016e3d4 libpython3.6m.dylib`PyEval_EvalCode + 100

    frame #14: 0x0001001a3bd6 
libpython3.6m.dylib`PyRun_InteractiveOneObject + 582

    frame #15: 0x0001001a350e 
libpython3.6m.dylib`PyRun_InteractiveLoopFlags + 222

    frame #16: 0x0001001a33fc libpython3.6m.dylib`PyRun_AnyFileExFlags + 60

    frame #17: 0x0001001bc835 libpython3.6m.dylib`Py_Main + 3829

    frame #18: 0x00010df8 python`main + 232

    frame #19: 0x7fff6cd80015 libdyld.dylib`start + 1

    frame #20: 0x7fff6cd80015 libdyld.dylib`start + 1{code}
Basically, the issue is that when the buffer goes out of scope, it calls 
{{Release}} on the plasma client, but the client has already been deallocated.





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434383#comment-16434383
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

atuldambalkar commented on a change in pull request #1759: ARROW-1780 - [WIP] 
JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector 
Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180855624
 
 

 ##
 File path: 
java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java
 ##
 @@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import java.sql.Connection;
+import java.sql.Statement;
+
+/**
+ * Class to abstract out some common test functionality for testing JDBC to Arrow.
+ */
+public abstract class AbstractJdbcToArrowTest {
+
+    protected void createTestData(Connection conn, Table table) throws Exception {
+
+        Statement stmt = null;
+        try {
+            // create the table and insert the data, and once done drop the table
+            stmt = conn.createStatement();
+            stmt.executeUpdate(table.getCreate());
+
+            for (String insert : table.getData()) {
+                stmt.executeUpdate(insert);
+            }
+
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {
 
 Review comment:
   Thanks @laurentgo for the comments. I should be able to get back to you soon 
with further changes. Our India development team member is still working on the 
test-case-related changes. Let me ping you on Slack for any quick discussion.




> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level, the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with the usual 
> performance benefits. The utility will be very similar to the C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table", as 
> described here: 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow 
> objects/structures. Whether the utility can also push Arrow objects back to 
> the RDBMS needs to be discussed and is out of scope for now. 
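The row-wise-to-columnar conversion described above can be sketched as follows (illustrative Rust with a made-up `Row` type, not the Java adapter itself):

```rust
// A made-up two-column row, standing in for one JDBC ResultSet row.
struct Row {
    id: i32,
    name: String,
}

// Pivot row-wise records into per-column vectors, the Arrow-style layout:
// each row's fields are appended to the vector for its column.
fn rows_to_columns(rows: &[Row]) -> (Vec<i32>, Vec<String>) {
    let mut ids = Vec::with_capacity(rows.len());
    let mut names = Vec::with_capacity(rows.len());
    for r in rows {
        ids.push(r.id);
        names.push(r.name.clone());
    }
    (ids, names)
}

fn main() {
    let rows = vec![
        Row { id: 1, name: "a".to_string() },
        Row { id: 2, name: "b".to_string() },
    ];
    let (ids, names) = rows_to_columns(&rows);
    assert_eq!(vec![1, 2], ids);
    assert_eq!(vec!["a".to_string(), "b".to_string()], names);
}
```

The adapter does the same pivot per JDBC column type, appending each `ResultSet` value into the matching Arrow vector.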





[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434226#comment-16434226
 ] 

ASF GitHub Bot commented on ARROW-2432:
---

BryanCutler commented on a change in pull request #1878: ARROW-2432: [Python] 
Fix Pandas decimal type conversion with None values
URL: https://github.com/apache/arrow/pull/1878#discussion_r180826989
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -1149,19 +1155,15 @@ def test_fixed_size_bytes_does_not_accept_varying_lengths(self):
 
     def test_variable_size_bytes(self):
         s = pd.Series([b'123', b'', b'a', None])
-        arr = pa.Array.from_pandas(s, type=pa.binary())
-        assert arr.type == pa.binary()
         _check_series_roundtrip(s, type_=pa.binary())
 
     def test_binary_from_bytearray(self):
-        s = pd.Series([bytearray(b'123'), bytearray(b''), bytearray(b'a')])
+        s = pd.Series([bytearray(b'123'), bytearray(b''), bytearray(b'a'),
+                       None])
         # Explicitly set type
-        arr = pa.Array.from_pandas(s, type=pa.binary())
-        assert arr.type == pa.binary()
-        # Infer type from bytearrays
-        arr = pa.Array.from_pandas(s)
-        assert arr.type == pa.binary()
         _check_series_roundtrip(s, type_=pa.binary())
+        # Infer type from bytearrays
+        _check_series_roundtrip(s)
 
 Review comment:
   ooops, right - thanks for catching that!




> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>
> Using from_pandas to convert decimals fails if it encounters a {{None}} 
> value. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when the decimal type is specified explicitly. When 
> no type is specified, a segfault occurs.
> This previously worked in 0.8.0.
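The underlying requirement is that the converter treat None as a null slot rather than as a value to parse. A minimal sketch of the Arrow-style representation — a values buffer plus a validity bitmap (illustrative Rust, not pyarrow's implementation):

```rust
// Convert a sequence of optional values into Arrow-style storage: a dense
// values buffer plus a validity bitmap. None becomes a null slot (validity
// bit cleared, placeholder value) instead of a conversion error.
fn to_arrow_like(values: &[Option<f64>]) -> (Vec<f64>, Vec<bool>) {
    let mut data = Vec::with_capacity(values.len());
    let mut validity = Vec::with_capacity(values.len());
    for v in values {
        match v {
            Some(x) => {
                data.push(*x);
                validity.push(true);
            }
            None => {
                data.push(0.0); // placeholder, never read when invalid
                validity.push(false);
            }
        }
    }
    (data, validity)
}

fn main() {
    // Mirrors pd.Series([Decimal('3.14'), None]) from the report above.
    let (data, validity) = to_arrow_like(&[Some(3.14), None]);
    assert_eq!(vec![3.14, 0.0], data);
    assert_eq!(vec![true, false], validity);
}
```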





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434204#comment-16434204
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on issue #1759: ARROW-1780 - [WIP] JDBC Adapter to convert 
Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#issuecomment-380521328
 
 
   @atuldambalkar I can be reached on slack if you need me




> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level, the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with the usual 
> performance benefits. The utility will be very similar to the C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table", as 
> described here: 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from an RDBMS and convert the data into Arrow 
> objects/structures. Whether the utility can also push Arrow objects back to 
> the RDBMS needs to be discussed and is out of scope for now. 





[jira] [Commented] (ARROW-2193) [Plasma] plasma_store has runtime dependency on Boost shared libraries when ARROW_BOOST_USE_SHARED=on

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434186#comment-16434186
 ] 

ASF GitHub Bot commented on ARROW-2193:
---

pitrou commented on issue #1711: WIP ARROW-2193: [C++] Do not depend on Boost 
libraries at runtime in plasma_store
URL: https://github.com/apache/arrow/pull/1711#issuecomment-380518198
 
 
   Should probably close this PR as this issue has been fixed by removing 
regex_boost usage.




> [Plasma] plasma_store has runtime dependency on Boost shared libraries when 
> ARROW_BOOST_USE_SHARED=on
> -
>
> Key: ARROW-2193
> URL: https://issues.apache.org/jira/browse/ARROW-2193
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I'm not sure why, but when I run the pyarrow test suite (for example 
> {{py.test pyarrow/tests/test_plasma.py}}), plasma_store forks endlessly:
> {code:bash}
>  $ ps fuwww
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> [...]
> antoine  27869 12.0  0.4 863208 68976 pts/7S13:41   0:01 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27885 13.0  0.4 863076 68560 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27901 12.1  0.4 863076 68320 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27920 13.6  0.4 863208 68868 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> [etc.]
> {code}





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434167#comment-16434167
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180815237
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.BaseFixedWidthVector;
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.math.BigDecimal;
+
+import java.nio.charset.StandardCharsets;
+import java.sql.Blob;
+import java.sql.Clob;
+import java.sql.Date;
+import java.sql.ResultSet;
+import java.sql.ResultSetMetaData;
+import java.sql.SQLException;
+import java.sql.Time;
+import java.sql.Timestamp;
+import java.sql.Types;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+    private static final int DEFAULT_BUFFER_SIZE = 256;
+
+    /**
+     * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+     *
+     * This method currently performs the following type mapping from JDBC SQL data types to the corresponding Arrow data types:
+     *
+     * CHAR          --> ArrowType.Utf8
+     * NCHAR         --> ArrowType.Utf8
+     * VARCHAR       --> ArrowType.Utf8
+     * NVARCHAR      --> ArrowType.Utf8
+     * LONGVARCHAR   --> ArrowType.Utf8
+     * LONGNVARCHAR  --> ArrowType.Utf8
+     * NUMERIC       --> ArrowType.Decimal(precision, scale)
+     * DECIMAL       --> ArrowType.Decimal(precision, scale)
+     * BIT           --> ArrowType.Bool
+     * TINYINT       --> ArrowType.Int(8, signed)
+     * SMALLINT      --> ArrowType.Int(16, signed)
+     * INTEGER       --> ArrowType.Int(32, signed)
+     * BIGINT        --> ArrowType.Int(64, signed)
+     * REAL          --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * FLOAT         --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+     * DOUBLE        --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+     * BINARY        --> ArrowType.Binary
+     * VARBINARY     --> ArrowType.Binary
+     * LONGVARBINARY --> ArrowType.Binary
+     * DATE          --> ArrowType.Date(DateUnit.MILLISECOND)
+     * TIME          --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+     * TIMESTAMP     --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+     * CLOB          --> ArrowType.Utf8
+     * BLOB          --> ArrowType.Binary
+     *
+     * @param rsmd
+     * @return {@link Schema}
+     * @throws SQLException
+     */
+    public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+        Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't

[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434161#comment-16434161
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180253135
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -200,144 +226,206 @@ public static void jdbcToArrowVectors(ResultSet rs, VectorSchemaRoot root) throw
                 switch (rsmd.getColumnType(i)) {
                     case Types.BOOLEAN:
                     case Types.BIT:
-                        BitVector bitVector = (BitVector) root.getVector(columnName);
-                        bitVector.setSafe(rowCount, rs.getBoolean(i) ? 1 : 0);
-                        bitVector.setValueCount(rowCount + 1);
+                        updateVector((BitVector) root.getVector(columnName),
+                                rs.getBoolean(i), rowCount);
                         break;
                     case Types.TINYINT:
-                        TinyIntVector tinyIntVector = (TinyIntVector) root.getVector(columnName);
-                        tinyIntVector.setSafe(rowCount, rs.getInt(i));
-                        tinyIntVector.setValueCount(rowCount + 1);
+                        updateVector((TinyIntVector) root.getVector(columnName),
+                                rs.getInt(i), rowCount);
                         break;
                     case Types.SMALLINT:
-                        SmallIntVector smallIntVector = (SmallIntVector) root.getVector(columnName);
-                        smallIntVector.setSafe(rowCount, rs.getInt(i));
-                        smallIntVector.setValueCount(rowCount + 1);
+                        updateVector((SmallIntVector) root.getVector(columnName),
+                                rs.getInt(i), rowCount);
                         break;
                     case Types.INTEGER:
-                        IntVector intVector = (IntVector) root.getVector(columnName);
-                        intVector.setSafe(rowCount, rs.getInt(i));
-                        intVector.setValueCount(rowCount + 1);
+                        updateVector((IntVector) root.getVector(columnName),
+                                rs.getInt(i), rowCount);
                         break;
                     case Types.BIGINT:
-                        BigIntVector bigIntVector = (BigIntVector) root.getVector(columnName);
-                        bigIntVector.setSafe(rowCount, rs.getInt(i));
-                        bigIntVector.setValueCount(rowCount + 1);
+                        updateVector((BigIntVector) root.getVector(columnName),
+                                rs.getInt(i), rowCount);
 Review comment:
  Since BIGINT is a 64-bit integer, this should probably use rs.getLong() (maybe add unit tests with large values, both positive and negative?)
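To illustrate the reviewer's concern, here is a self-contained sketch (the class and method names are hypothetical, chosen only for illustration) of why reading a BIGINT column through `rs.getInt()` silently corrupts values outside the 32-bit range, while `rs.getLong()` preserves them:

```java
// Hypothetical stand-ins for the two ways the adapter could read a BIGINT
// column; only the narrowing behavior of int vs long is demonstrated here.
public class BigIntTruncation {
    // What effectively happens if the adapter calls rs.getInt() on a BIGINT:
    // the value is narrowed to 32 bits and the high bits are lost.
    static int viaGetInt(long columnValue) {
        return (int) columnValue;
    }

    // What the reviewer suggests: read the full 64-bit value with rs.getLong().
    static long viaGetLong(long columnValue) {
        return columnValue;
    }

    public static void main(String[] args) {
        long big = 3_000_000_000L; // valid BIGINT, larger than Integer.MAX_VALUE
        System.out.println(viaGetLong(big)); // 3000000000
        System.out.println(viaGetInt(big));  // -1294967296 (silently corrupted)
    }
}
```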


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The
> upstream utility can then work with Arrow objects/structures with the usual
> performance benefits. The utility will be very similar to the C++
> implementation of "Convert a vector of row-wise data into an Arrow table" as
> described here -
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from RDBMS and convert the data into Arrow
> objects/structures. So from that perspective this will read data from RDBMS.
> Whether the utility can push Arrow objects to RDBMS is something that needs
> to be discussed and is out of scope for this utility for now. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434159#comment-16434159
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180811074
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.arrow</groupId>
+        <artifactId>arrow-java-root</artifactId>
+        <version>0.10.0-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>arrow-jdbc</artifactId>
+    <name>Arrow JDBC Adapter</name>
+    <url>http://maven.apache.org</url>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-memory</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-vector</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>18.0</version>
+        </dependency>
+
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.11</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.h2database</groupId>
+            <artifactId>h2</artifactId>
+            <version>1.4.196</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.fasterxml.jackson.dataformat</groupId>
+            <artifactId>jackson-dataformat-yaml</artifactId>
+            <version>2.7.9</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-databind</artifactId>
+            <version>2.7.9</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.google.collections</groupId>
 
 Review comment:
   That seems like a legacy library, before Guava was created...




[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434157#comment-16434157
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180249387
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -62,10 +68,11 @@
             <version>2.7.9</version>
             <scope>test</scope>
         </dependency>
+
         <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>guava</artifactId>
-            <version>18.0</version>
+            <groupId>com.google.collections</groupId>
 
 Review comment:
   isn't that deprecated in favor of guava? (last update is 2009...)
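The fix the reviewers are pointing at is to keep the Guava artifact already used elsewhere in this pom rather than switching to the long-unmaintained google-collections one. A sketch of the retained dependency, with the XML tags reconstructed since the archive stripped them:

```xml
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>18.0</version>
</dependency>
```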




[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434163#comment-16434163
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180811190
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.arrow</groupId>
+        <artifactId>arrow-java-root</artifactId>
+        <version>0.10.0-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>arrow-jdbc</artifactId>
+    <name>Arrow JDBC Adapter</name>
+    <url>http://maven.apache.org</url>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-memory</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-vector</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>18.0</version>
+        </dependency>
+
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.11</version>
 
 Review comment:
   replace with ${dep.junit.version}
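A sketch of the change this comment asks for: reference the JUnit version through the `dep.junit.version` property (assumed here to be defined in the parent `arrow-java-root` pom) instead of hard-coding it. The XML tags are reconstructed, since the archive stripped them:

```xml
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>${dep.junit.version}</version>
    <scope>test</scope>
</dependency>
```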




[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434169#comment-16434169
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180818053
 
 

 ##
 File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.BaseFixedWidthVector;
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.math.BigDecimal;
+
+import java.nio.charset.StandardCharsets;
+import java.sql.Blob;
+import java.sql.Clob;
+import java.sql.Date;
+import java.sql.ResultSet;
+import java.sql.ResultSetMetaData;
+import java.sql.SQLException;
+import java.sql.Time;
+import java.sql.Timestamp;
+import java.sql.Types;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL data types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't 

[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434155#comment-16434155
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180249672
 
 

 ##
 File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java
 ##
 @@ -64,53 +68,48 @@
 * @param connection Database connection to be used. This method will not close the passed connection object. Since the caller has passed
 *   the connection object it's the responsibility of the caller to close or return the connection to the pool.
 * @param query The DB Query to fetch the data.
- * @return
- * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statment objects.
+ * @return Arrow Data Objects {@link VectorSchemaRoot}
+ * @throws SQLException Propagate any SQL Exceptions to the caller after closing any resources opened such as ResultSet and Statement objects.
 */
-public static VectorSchemaRoot sqlToArrow(Connection connection, String query) throws Exception {
-
-    assert connection != null: "JDBC conncetion object can not be null";
-    assert query != null && query.length() > 0: "SQL query can not be null or empty";
-
-    RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+public static VectorSchemaRoot sqlToArrow(Connection connection, String query, RootAllocator rootAllocator) throws SQLException {
+    Preconditions.checkNotNull(connection, "JDBC connection object can not be null");
+    Preconditions.checkArgument(query != null && query.length() > 0, "SQL query can not be null or empty");

-    Statement stmt = null;
-    ResultSet rs = null;
-    try {
-        stmt = connection.createStatement();
-        rs = stmt.executeQuery(query);
-        ResultSetMetaData rsmd = rs.getMetaData();
-        VectorSchemaRoot root = VectorSchemaRoot.create(
-                JdbcToArrowUtils.jdbcToArrowSchema(rsmd), rootAllocator);
-        JdbcToArrowUtils.jdbcToArrowVectors(rs, root);
-        return root;
-    } catch (Exception exc) {
-        // just throw it out after logging
-        throw exc;
-    } finally {
-        if (rs != null) {
-            rs.close();
-        }
-        if (stmt != null) {
-            stmt.close(); // test
-        }
+    try (Statement stmt = connection.createStatement()) {
+        return sqlToArrow(stmt.executeQuery(query), rootAllocator);
     }
 }

 /**
- * This method returns ArrowDataFetcher Object that can be used to fetch and iterate on the data in the given
- * database table.
- *
- * @param connection - Database connection Object
- * @param tableName - Table name from which records will be fetched
+ * For the given JDBC {@link ResultSet}, fetch the data from Relational DB and convert it to Arrow objects.
 *
- * @return ArrowDataFetcher - Instance of ArrowDataFetcher which can be used to get Arrow Vector obejcts by calling its functionality
+ * @param resultSet
+ * @return Arrow Data Objects {@link VectorSchemaRoot}
+ * @throws Exception
 */
-public static ArrowDataFetcher jdbcArrowDataFetcher(Connection connection, String tableName) {
-    assert connection != null: "JDBC conncetion object can not be null";
-    assert tableName != null && tableName.length() > 0: "Table name can not be null or empty";
+public static VectorSchemaRoot sqlToArrow(ResultSet resultSet) throws SQLException {
+    Preconditions.checkNotNull(resultSet, "JDBC ResultSet object can not be null");

-    return new ArrowDataFetcher(connection, tableName);
+    RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+    VectorSchemaRoot root = sqlToArrow(resultSet, rootAllocator);
+    rootAllocator.close();
+    return root;
 }

+/**
+ * For the given JDBC {@link ResultSet}, fetch the data from Relational DB and convert it to Arrow objects.
+ *
+ * @param resultSet
+ * @return Arrow Data Objects {@link VectorSchemaRoot}
+ * @throws Exception
+ */
+public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, RootAllocator rootAllocator) throws SQLException {
 
 Review comment:
   I know I mentioned RootAllocator, but I guess BufferAllocator (which is the 
base interface) would work as well?



[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434168#comment-16434168
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180817457
 
 

 ##
 File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.BaseFixedWidthVector;
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.math.BigDecimal;
+
+import java.nio.charset.StandardCharsets;
+import java.sql.Blob;
+import java.sql.Clob;
+import java.sql.Date;
+import java.sql.ResultSet;
+import java.sql.ResultSetMetaData;
+import java.sql.SQLException;
+import java.sql.Time;
+import java.sql.Timestamp;
+import java.sql.Types;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL data types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws SQLException {
+
+Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't 

[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434164#comment-16434164
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180818360
 
 

 ##
 File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java
 ##
 @@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import java.sql.Connection;
+import java.sql.Statement;
+
+/**
+ * Class to abstract out some common test functionality for testing JDBC to Arrow.
+ */
+public abstract class AbstractJdbcToArrowTest {
+
+protected void createTestData(Connection conn, Table table) throws Exception {
+
+Statement stmt = null;
+try {
+//create the table and insert the data and once done drop the table
+stmt = conn.createStatement();
+stmt.executeUpdate(table.getCreate());
+
+for (String insert: table.getData()) {
+stmt.executeUpdate(insert);
+}
+
+} catch (Exception e) {
+e.printStackTrace();
+} finally {
 
 Review comment:
   you should use `try(with-resources)` construct instead...
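A minimal, self-contained sketch of the construct the reviewer recommends (the `FakeStatement` class is hypothetical, standing in for a JDBC `Statement`): a resource opened in the `try` header is closed automatically, even when the body throws, which replaces the manual `finally` cleanup in `createTestData`:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for java.sql.Statement; JDBC statements implement
    // AutoCloseable, which is all that try-with-resources requires.
    static class FakeStatement implements AutoCloseable {
        void executeUpdate(String sql) { log.add("executed: " + sql); }
        @Override public void close() { log.add("closed"); }
    }

    public static void main(String[] args) {
        try (FakeStatement stmt = new FakeStatement()) {
            stmt.executeUpdate("CREATE TABLE t (id INT)");
        } // stmt.close() runs here automatically, even if the body throws
        System.out.println(log);
    }
}
```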




[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434166#comment-16434166
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180810834
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.arrow</groupId>
+        <artifactId>arrow-java-root</artifactId>
+        <version>0.10.0-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>arrow-jdbc</artifactId>
+    <name>Arrow JDBC Adapter</name>
+    <url>http://maven.apache.org</url>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-memory</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-vector</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>18.0</version>
+        </dependency>
+
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.11</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.h2database</groupId>
+            <artifactId>h2</artifactId>
+            <version>1.4.196</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.fasterxml.jackson.dataformat</groupId>
+            <artifactId>jackson-dataformat-yaml</artifactId>
+            <version>2.7.9</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-databind</artifactId>
+            <version>2.7.9</version>
 
 Review comment:
   replace with ${dep.jackson.version}




[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434156#comment-16434156
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180252798
 
 

 ##
 File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,343 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.nio.charset.Charset;
+import java.sql.*;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link 
ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL 
data types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws 
SQLException {
+
+assert rsmd != null;
+
+//ImmutableList.Builder<Field> fields = ImmutableList.builder();
+List<Field> fields = new ArrayList<>();
+int columnCount = rsmd.getColumnCount();
+for (int i = 1; i <= columnCount; i++) {
+String columnName = rsmd.getColumnName(i);
+switch (rsmd.getColumnType(i)) {
+case Types.BOOLEAN:
+case Types.BIT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Bool()), null));
+break;
+case Types.TINYINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(8, true)), null));
+break;
+case Types.SMALLINT:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(16, true)), null));
+break;
+case Types.INTEGER:
+fields.add(new Field(columnName, FieldType.nullable(new 
ArrowType.Int(32, true)), null));
+break;
+case 
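The switch over java.sql.Types excerpted above is truncated by the mail digest, but the mapping table in the javadoc is complete. The sketch below restates that mapping as a standalone method; it is an illustrative assumption (class and method names are invented, and Arrow types are rendered as plain strings so it runs without the arrow-vector jars):

```java
import java.sql.Types;

// Hypothetical standalone restatement of the JDBC -> Arrow type mapping
// documented in the PR's javadoc; Arrow types are plain strings here.
public class JdbcTypeMapping {
    static String arrowTypeFor(int jdbcType, int precision, int scale) {
        switch (jdbcType) {
            case Types.CHAR:
            case Types.NCHAR:
            case Types.VARCHAR:
            case Types.NVARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
            case Types.CLOB:
                return "Utf8";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "Decimal(" + precision + ", " + scale + ")";
            case Types.BOOLEAN:
            case Types.BIT:
                return "Bool";
            case Types.TINYINT:
                return "Int(8, true)";
            case Types.SMALLINT:
                return "Int(16, true)";
            case Types.INTEGER:
                return "Int(32, true)";
            case Types.BIGINT:
                return "Int(64, true)";
            case Types.REAL:
            case Types.FLOAT:
                return "FloatingPoint(SINGLE)";
            case Types.DOUBLE:
                return "FloatingPoint(DOUBLE)";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:
                return "Binary";
            case Types.DATE:
                return "Date(MILLISECOND)";
            case Types.TIME:
                return "Time(MILLISECOND, 32)";
            case Types.TIMESTAMP:
                return "Timestamp(MILLISECOND, null)";
            default:
                throw new IllegalArgumentException("Unmapped JDBC type: " + jdbcType);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrowTypeFor(Types.INTEGER, 0, 0));   // Int(32, true)
        System.out.println(arrowTypeFor(Types.DECIMAL, 10, 2));  // Decimal(10, 2)
    }
}
```

The real code builds ArrowType/Field instances instead of strings, but the fall-through grouping per case label is the same shape.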

[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434160#comment-16434160
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180810807
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.arrow</groupId>
+    <artifactId>arrow-java-root</artifactId>
+    <version>0.10.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>arrow-jdbc</artifactId>
+  <name>Arrow JDBC Adapter</name>
+  <url>http://maven.apache.org</url>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-memory</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-vector</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>18.0</version>
+    </dependency>
+
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>4.11</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.h2database</groupId>
+      <artifactId>h2</artifactId>
+      <version>1.4.196</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.dataformat</groupId>
+      <artifactId>jackson-dataformat-yaml</artifactId>
+      <version>2.7.9</version>
 
 Review comment:
   replace with ${dep.jackson.version}


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from RDBMS and convert the data into Arrow 
> objects/structures, so from that perspective it only reads data from RDBMS. 
> Whether the utility can also push Arrow objects back to RDBMS still needs to 
> be discussed and is out of scope for now. 





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434165#comment-16434165
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180815032
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -0,0 +1,431 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.adapter.jdbc;
+
+
+import com.google.common.base.Preconditions;
+import org.apache.arrow.vector.BaseFixedWidthVector;
+import org.apache.arrow.vector.BigIntVector;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.SmallIntVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeStampVector;
+import org.apache.arrow.vector.TinyIntVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.types.DateUnit;
+import org.apache.arrow.vector.types.TimeUnit;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+import org.apache.arrow.vector.types.pojo.Schema;
+
+import java.math.BigDecimal;
+
+import java.nio.charset.StandardCharsets;
+import java.sql.Blob;
+import java.sql.Clob;
+import java.sql.Date;
+import java.sql.ResultSet;
+import java.sql.ResultSetMetaData;
+import java.sql.SQLException;
+import java.sql.Time;
+import java.sql.Timestamp;
+import java.sql.Types;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
+import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+
+
+/**
+ * Class that does most of the work to convert JDBC ResultSet data into Arrow 
columnar format Vector objects.
+ *
+ * @since 0.10.0
+ */
+public class JdbcToArrowUtils {
+
+private static final int DEFAULT_BUFFER_SIZE = 256;
+
+/**
+ * Create Arrow {@link Schema} object for the given JDBC {@link 
ResultSetMetaData}.
+ *
+ * This method currently performs the following type mapping from JDBC SQL 
data types to the corresponding Arrow data types.
+ *
+ * CHAR--> ArrowType.Utf8
+ * NCHAR   --> ArrowType.Utf8
+ * VARCHAR --> ArrowType.Utf8
+ * NVARCHAR --> ArrowType.Utf8
+ * LONGVARCHAR --> ArrowType.Utf8
+ * LONGNVARCHAR --> ArrowType.Utf8
+ * NUMERIC --> ArrowType.Decimal(precision, scale)
+ * DECIMAL --> ArrowType.Decimal(precision, scale)
+ * BIT --> ArrowType.Bool
+ * TINYINT --> ArrowType.Int(8, signed)
+ * SMALLINT --> ArrowType.Int(16, signed)
+ * INTEGER --> ArrowType.Int(32, signed)
+ * BIGINT --> ArrowType.Int(64, signed)
+ * REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+ * DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+ * BINARY --> ArrowType.Binary
+ * VARBINARY --> ArrowType.Binary
+ * LONGVARBINARY --> ArrowType.Binary
+ * DATE --> ArrowType.Date(DateUnit.MILLISECOND)
+ * TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
+ * TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
+ * CLOB --> ArrowType.Utf8
+ * BLOB --> ArrowType.Binary
+ *
+ * @param rsmd
+ * @return {@link Schema}
+ * @throws SQLException
+ */
+public static Schema jdbcToArrowSchema(ResultSetMetaData rsmd) throws 
SQLException {
+
+Preconditions.checkNotNull(rsmd, "JDBC ResultSetMetaData object can't 

[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434158#comment-16434158
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180810672
 
 

 ##
 File path: java/adapter/jdbc/pom.xml
 ##
 @@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.arrow</groupId>
+    <artifactId>arrow-java-root</artifactId>
+    <version>0.10.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>arrow-jdbc</artifactId>
+  <name>Arrow JDBC Adapter</name>
+  <url>http://maven.apache.org</url>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-memory</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.arrow</groupId>
+      <artifactId>arrow-vector</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>18.0</version>
 
 Review comment:
   replace with ${dep.guava.version}




> JDBC Adapter for Apache Arrow
> -
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from RDBMS and convert the data into Arrow 
> objects/structures, so from that perspective it only reads data from RDBMS. 
> Whether the utility can also push Arrow objects back to RDBMS still needs to 
> be discussed and is out of scope for now. 





[jira] [Commented] (ARROW-1780) JDBC Adapter for Apache Arrow

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434162#comment-16434162
 ] 

ASF GitHub Bot commented on ARROW-1780:
---

laurentgo commented on a change in pull request #1759: ARROW-1780 - [WIP] JDBC 
Adapter to convert Relational Data objects to Arrow Data Format Vector Objects
URL: https://github.com/apache/arrow/pull/1759#discussion_r180253328
 
 

 ##
 File path: 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
 ##
 @@ -200,144 +226,206 @@ public static void jdbcToArrowVectors(ResultSet rs, 
VectorSchemaRoot root) throw
 switch (rsmd.getColumnType(i)) {
 case Types.BOOLEAN:
 case Types.BIT:
-BitVector bitVector = (BitVector) 
root.getVector(columnName);
-bitVector.setSafe(rowCount, rs.getBoolean(i)? 1: 0);
-bitVector.setValueCount(rowCount + 1);
+updateVector((BitVector)root.getVector(columnName),
+rs.getBoolean(i), rowCount);
 break;
 case Types.TINYINT:
-TinyIntVector tinyIntVector = 
(TinyIntVector)root.getVector(columnName);
-tinyIntVector.setSafe(rowCount, rs.getInt(i));
-tinyIntVector.setValueCount(rowCount + 1);
+updateVector((TinyIntVector)root.getVector(columnName),
+rs.getInt(i), rowCount);
 break;
 case Types.SMALLINT:
-SmallIntVector smallIntVector = 
(SmallIntVector)root.getVector(columnName);
-smallIntVector.setSafe(rowCount, rs.getInt(i));
-smallIntVector.setValueCount(rowCount + 1);
+updateVector((SmallIntVector)root.getVector(columnName),
+rs.getInt(i), rowCount);
 break;
 case Types.INTEGER:
-IntVector intVector = 
(IntVector)root.getVector(columnName);
-intVector.setSafe(rowCount, rs.getInt(i));
-intVector.setValueCount(rowCount + 1);
+updateVector((IntVector)root.getVector(columnName),
+rs.getInt(i), rowCount);
 break;
 case Types.BIGINT:
-BigIntVector bigIntVector = 
(BigIntVector)root.getVector(columnName);
-bigIntVector.setSafe(rowCount, rs.getInt(i));
-bigIntVector.setValueCount(rowCount + 1);
+updateVector((BigIntVector)root.getVector(columnName),
+rs.getLong(i), rowCount);
 break;
 case Types.NUMERIC:
 case Types.DECIMAL:
-DecimalVector decimalVector = 
(DecimalVector)root.getVector(columnName);
-decimalVector.setSafe(rowCount, rs.getBigDecimal(i));
-decimalVector.setValueCount(rowCount + 1);
+updateVector((DecimalVector)root.getVector(columnName),
+rs.getBigDecimal(i), rowCount);
 break;
 case Types.REAL:
 case Types.FLOAT:
-Float4Vector float4Vector = 
(Float4Vector)root.getVector(columnName);
-float4Vector.setSafe(rowCount, rs.getFloat(i));
-float4Vector.setValueCount(rowCount + 1);
+updateVector((Float4Vector)root.getVector(columnName),
+rs.getFloat(i), rowCount);
 break;
 case Types.DOUBLE:
-Float8Vector float8Vector = 
(Float8Vector)root.getVector(columnName);
-float8Vector.setSafe(rowCount, rs.getDouble(i));
-float8Vector.setValueCount(rowCount + 1);
+updateVector((Float8Vector)root.getVector(columnName),
+rs.getDouble(i), rowCount);
 break;
 case Types.CHAR:
 case Types.NCHAR:
 case Types.VARCHAR:
 case Types.NVARCHAR:
 case Types.LONGVARCHAR:
 case Types.LONGNVARCHAR:
-VarCharVector varcharVector = 
(VarCharVector)root.getVector(columnName);
-String value = rs.getString(i) != null ? 
rs.getString(i) : "";
-varcharVector.setIndexDefined(rowCount);
-
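The refactor in the diff above collapses the repeated setSafe/setValueCount pair into small updateVector overloads, one per vector type. A standalone sketch of that pattern is below; MiniVector and the class names are invented stand-ins for Arrow's vector classes, so the example runs without the Arrow jars:

```java
import java.util.ArrayList;
import java.util.List;

// MiniVector is a hypothetical stand-in for an Arrow fixed-width vector,
// used only to illustrate the updateVector helper pattern from the diff.
class MiniVector {
    final List<Integer> values = new ArrayList<>();
    int valueCount = 0;

    // setSafe grows the backing storage as needed, like Arrow's setSafe.
    void setSafe(int index, int value) {
        while (values.size() <= index) {
            values.add(0);
        }
        values.set(index, value);
    }

    void setValueCount(int count) {
        valueCount = count;
    }
}

public class UpdateVectorSketch {
    // One helper replaces the three-line pattern repeated per column type.
    static void updateVector(MiniVector vector, int value, int rowCount) {
        vector.setSafe(rowCount, value);
        vector.setValueCount(rowCount + 1);
    }

    public static void main(String[] args) {
        MiniVector v = new MiniVector();
        updateVector(v, 42, 0);
        updateVector(v, 7, 1);
        System.out.println(v.values + " count=" + v.valueCount);  // [42, 7] count=2
    }
}
```

In the real PR each overload takes a concrete vector type (BitVector, TinyIntVector, ...), which keeps the per-case code in the switch down to a single call.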

[jira] [Resolved] (ARROW-2193) [Plasma] plasma_store has runtime dependency on Boost shared libraries when ARROW_BOOST_USE_SHARED=on

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2193.
---
Resolution: Fixed
  Assignee: Antoine Pitrou  (was: Wes McKinney)

ARROW-2224 removed the reliance on boost_regex:
{code:bash}
$ ldd `which plasma_store`
linux-vdso.so.1 =>  (0x7ffeb2974000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x7f96cd5b3000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
(0x7f96cd22b000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f96ccf22000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
(0x7f96ccd0b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f96cc941000)
/lib64/ld-linux-x86-64.so.2 (0x7f96cd7d)
{code}

> [Plasma] plasma_store has runtime dependency on Boost shared libraries when 
> ARROW_BOOST_USE_SHARED=on
> -
>
> Key: ARROW-2193
> URL: https://issues.apache.org/jira/browse/ARROW-2193
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I'm not sure why, but when I run the pyarrow test suite (for example 
> {{py.test pyarrow/tests/test_plasma.py}}), plasma_store forks endlessly:
> {code:bash}
>  $ ps fuwww
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> [...]
> antoine  27869 12.0  0.4 863208 68976 pts/7S13:41   0:01 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27885 13.0  0.4 863076 68560 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27901 12.1  0.4 863076 68320 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> antoine  27920 13.6  0.4 863208 68868 pts/7S13:41   0:01  \_ 
> /home/antoine/miniconda3/envs/pyarrow/bin/python 
> /home/antoine/arrow/python/pyarrow/plasma_store -s /tmp/plasma_store40209423 
> -m 1
> [etc.]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2247.
---
   Resolution: Fixed
 Assignee: Antoine Pitrou
Fix Version/s: 0.10.0

> [Python] Statically-linking boost_regex in both libarrow and libparquet 
> results in segfault
> ---
>
> Key: ARROW-2247
> URL: https://issues.apache.org/jira/browse/ARROW-2247
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> This is a backtrace loading {{libparquet.so}} on Ubuntu 14.04 using boost 
> 1.66.1 from conda-forge. Both libarrow and libparquet contain {{boost_regex}} 
> statically linked. 
> {code}
> In [1]: import ctypes
> In [2]: ctypes.CDLL('libparquet.so')
> Program received signal SIGSEGV, Segmentation fault.
> 0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (gdb) bt
> #0  0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x7fffed74c1fc in 
> boost::re_detail_106600::cpp_regex_traits_char_layer<char>::init() ()
>from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #2  0x7fffed794803 in 
> boost::object_cache<boost::re_detail_106600::cpp_regex_traits_base<char>, 
> boost::re_detail_106600::cpp_regex_traits_implementation<char> 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base<char> const&, 
> unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #3  0x7fffed79e62b in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::do_assign(char const*, char const*, 
> unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #4  0x7fffee58561b in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::assign (this=0x7fff3780, 
> p1=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  
> p2=0x7fffee60064a "", f=0) at 
> /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x7fffee5855a7 in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::assign (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x7fffee5683f3 in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::basic_regex (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x7fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
> Python Exception <class 'gdb.error'> There is no member named _M_dataplus.: 
> this=0x7fffee8f1fb8 
> , created_by=<optimized out>)
> at ../src/parquet/metadata.cc:452
> #8  0x7fffee41c271 in __cxx_global_var_init.1(void) () at 
> ../src/parquet/metadata.cc:35
> #9  0x7fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
>from /home/wesm/local/lib/libparquet.so
> #10 0x77dea1da in call_init (l=<optimized out>, argc=argc@entry=2, 
> argv=argv@entry=0x7fff5d88, 
> env=env@entry=0x7fff5da0) at dl-init.c:78
> #11 0x77dea2c3 in call_init (env=<optimized out>, argv=<optimized 
> out>, argc=<optimized out>, 
> l=<optimized out>) at dl-init.c:36
> #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fff5d88, 
> env=0x7fff5da0)
> at dl-init.c:126
> {code}
> This seems to be caused by static initializations in libparquet:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34
> We should see if removing these static initializations makes the problem go 
> away. If not, then statically-linking boost_regex in both libraries is not 
> advisable.
> For this reason and more, I really wish that Arrow and Parquet shared a 
> common build system and monorepo structure -- it would make handling these 
> toolchain and build-related issues much simpler. 





[jira] [Commented] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-04-11 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434107#comment-16434107
 ] 

Antoine Pitrou commented on ARROW-2247:
---

ARROW-2224 removed boost-regex usage from libarrow.

> [Python] Statically-linking boost_regex in both libarrow and libparquet 
> results in segfault
> ---
>
> Key: ARROW-2247
> URL: https://issues.apache.org/jira/browse/ARROW-2247
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> This is a backtrace loading {{libparquet.so}} on Ubuntu 14.04 using boost 
> 1.66.1 from conda-forge. Both libarrow and libparquet contain {{boost_regex}} 
> statically linked. 
> {code}
> In [1]: import ctypes
> In [2]: ctypes.CDLL('libparquet.so')
> Program received signal SIGSEGV, Segmentation fault.
> 0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (gdb) bt
> #0  0x7fffed4ad3fb in std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x7fffed74c1fc in 
> boost::re_detail_106600::cpp_regex_traits_char_layer<char>::init() ()
>from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #2  0x7fffed794803 in 
> boost::object_cache<boost::re_detail_106600::cpp_regex_traits_base<char>, 
> boost::re_detail_106600::cpp_regex_traits_implementation<char> 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base<char> const&, 
> unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #3  0x7fffed79e62b in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::do_assign(char const*, char const*, 
> unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #4  0x7fffee58561b in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::assign (this=0x7fff3780, 
> p1=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  
> p2=0x7fffee60064a "", f=0) at 
> /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x7fffee5855a7 in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::assign (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x7fffee5683f3 in boost::basic_regex<char, boost::regex_traits<char, 
> boost::cpp_regex_traits<char> > >::basic_regex (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x7fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
> Python Exception <class 'gdb.error'> There is no member named _M_dataplus.: 
> this=0x7fffee8f1fb8 
> , created_by=<optimized out>)
> at ../src/parquet/metadata.cc:452
> #8  0x7fffee41c271 in __cxx_global_var_init.1(void) () at 
> ../src/parquet/metadata.cc:35
> #9  0x7fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
>from /home/wesm/local/lib/libparquet.so
> #10 0x77dea1da in call_init (l=<optimized out>, argc=argc@entry=2, 
> argv=argv@entry=0x7fff5d88, 
> env=env@entry=0x7fff5da0) at dl-init.c:78
> #11 0x77dea2c3 in call_init (env=<optimized out>, argv=<optimized 
> out>, argc=<optimized out>, 
> l=<optimized out>) at dl-init.c:36
> #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fff5d88, 
> env=0x7fff5da0)
> at dl-init.c:126
> {code}
> This seems to be caused by static initializations in libparquet:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34
> We should see if removing these static initializations makes the problem go 
> away. If not, then statically-linking boost_regex in both libraries is not 
> advisable.
> For this reason and more, I really wish that Arrow and Parquet shared a 
> common build system and monorepo structure -- it would make handling these 
> toolchain and build-related issues much simpler. 





[jira] [Resolved] (ARROW-2224) [C++] Get rid of boost regex usage

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2224.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1880
[https://github.com/apache/arrow/pull/1880]

> [C++] Get rid of boost regex usage
> --
>
> Key: ARROW-2224
> URL: https://issues.apache.org/jira/browse/ARROW-2224
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We're using {{boost::regex}} to parse decimal strings for {{decimal128}} 
> types. We should use {{libre2}} instead.
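The parsing in question extracts the sign, integer, and fractional parts of a decimal literal with a regular expression. A hedged illustration of that technique in Java is below; the pattern and class are inventions for this sketch, not Arrow's actual decimal128 code (which is C++ and, per this issue, should move from boost::regex to RE2):

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative regex-based decimal-string parsing; the pattern below is an
// assumption for this sketch, not the pattern used by Arrow.
public class DecimalParseSketch {
    private static final Pattern DECIMAL =
        Pattern.compile("([-+]?)(\\d+)(?:\\.(\\d+))?");

    static BigDecimal parse(String s) {
        Matcher m = DECIMAL.matcher(s);
        if (!m.matches()) {
            throw new NumberFormatException("not a decimal: " + s);
        }
        // Scale equals the number of fractional digits captured by group 3.
        String frac = m.group(3);
        int scale = frac == null ? 0 : frac.length();
        return new BigDecimal(s).setScale(scale);
    }

    public static void main(String[] args) {
        System.out.println(parse("-123.45"));     // -123.45
        System.out.println(parse("42").scale());  // 0
    }
}
```

The regex only validates and locates the parts; precision and scale then fall out of the captured group lengths, which is the information a decimal128 type needs.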





[jira] [Assigned] (ARROW-2224) [C++] Get rid of boost regex usage

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2224:
-

Assignee: Antoine Pitrou  (was: Phillip Cloud)

> [C++] Get rid of boost regex usage
> --
>
> Key: ARROW-2224
> URL: https://issues.apache.org/jira/browse/ARROW-2224
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Phillip Cloud
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We're using {{boost::regex}} to parse decimal strings for {{decimal128}} 
> types. We should use {{libre2}} instead.





[jira] [Commented] (ARROW-2224) [C++] Get rid of boost regex usage

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434100#comment-16434100
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou closed pull request #1880: ARROW-2224: [C++] Remove boost-regex 
dependency
URL: https://github.com/apache/arrow/pull/1880
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/README.md b/cpp/README.md
index 8018efd9e..daeeade72 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -35,7 +35,6 @@ On Ubuntu/Debian you can install the requirements with:
 ```shell
 sudo apt-get install cmake \
  libboost-dev \
- libboost-regex-dev \
  libboost-filesystem-dev \
  libboost-system-dev
 ```
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 129174c8d..020e0ed44 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -157,11 +157,8 @@ if (ARROW_BOOST_VENDORED)
 
"${BOOST_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}boost_system${CMAKE_STATIC_LIBRARY_SUFFIX}")
   set(BOOST_STATIC_FILESYSTEM_LIBRARY
 
"${BOOST_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}boost_filesystem${CMAKE_STATIC_LIBRARY_SUFFIX}")
-  set(BOOST_STATIC_REGEX_LIBRARY
-
"${BOOST_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}boost_regex${CMAKE_STATIC_LIBRARY_SUFFIX}")
   set(BOOST_SYSTEM_LIBRARY "${BOOST_STATIC_SYSTEM_LIBRARY}")
   set(BOOST_FILESYSTEM_LIBRARY "${BOOST_STATIC_FILESYSTEM_LIBRARY}")
-  set(BOOST_REGEX_LIBRARY "${BOOST_STATIC_REGEX_LIBRARY}")
   if (ARROW_BOOST_HEADER_ONLY)
 set(BOOST_BUILD_PRODUCTS)
 set(BOOST_CONFIGURE_COMMAND "")
@@ -169,12 +166,11 @@ if (ARROW_BOOST_VENDORED)
   else()
 set(BOOST_BUILD_PRODUCTS
   ${BOOST_SYSTEM_LIBRARY}
-  ${BOOST_FILESYSTEM_LIBRARY}
-  ${BOOST_REGEX_LIBRARY})
+  ${BOOST_FILESYSTEM_LIBRARY})
 set(BOOST_CONFIGURE_COMMAND
   "./bootstrap.sh"
   "--prefix=${BOOST_PREFIX}"
-  "--with-libraries=filesystem,system,regex")
+  "--with-libraries=filesystem,system")
 if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG")
   set(BOOST_BUILD_VARIANT "debug")
 else()
@@ -214,19 +210,16 @@ else()
 if (ARROW_BOOST_HEADER_ONLY)
   find_package(Boost REQUIRED)
 else()
-  find_package(Boost COMPONENTS system filesystem regex REQUIRED)
+  find_package(Boost COMPONENTS system filesystem REQUIRED)
   if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG")
 set(BOOST_SHARED_SYSTEM_LIBRARY ${Boost_SYSTEM_LIBRARY_DEBUG})
 set(BOOST_SHARED_FILESYSTEM_LIBRARY ${Boost_FILESYSTEM_LIBRARY_DEBUG})
-set(BOOST_SHARED_REGEX_LIBRARY ${Boost_REGEX_LIBRARY_DEBUG})
   else()
 set(BOOST_SHARED_SYSTEM_LIBRARY ${Boost_SYSTEM_LIBRARY_RELEASE})
set(BOOST_SHARED_FILESYSTEM_LIBRARY ${Boost_FILESYSTEM_LIBRARY_RELEASE})
-set(BOOST_SHARED_REGEX_LIBRARY ${Boost_REGEX_LIBRARY_RELEASE})
   endif()
   set(BOOST_SYSTEM_LIBRARY boost_system_shared)
   set(BOOST_FILESYSTEM_LIBRARY boost_filesystem_shared)
-  set(BOOST_REGEX_LIBRARY boost_regex_shared)
 endif()
   else()
 # Find static boost headers and libs
@@ -235,19 +228,16 @@ else()
 if (ARROW_BOOST_HEADER_ONLY)
   find_package(Boost REQUIRED)
 else()
-  find_package(Boost COMPONENTS system filesystem regex REQUIRED)
+  find_package(Boost COMPONENTS system filesystem REQUIRED)
   if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG")
 set(BOOST_STATIC_SYSTEM_LIBRARY ${Boost_SYSTEM_LIBRARY_DEBUG})
 set(BOOST_STATIC_FILESYSTEM_LIBRARY ${Boost_FILESYSTEM_LIBRARY_DEBUG})
-set(BOOST_STATIC_REGEX_LIBRARY ${Boost_REGEX_LIBRARY_DEBUG})
   else()
 set(BOOST_STATIC_SYSTEM_LIBRARY ${Boost_SYSTEM_LIBRARY_RELEASE})
set(BOOST_STATIC_FILESYSTEM_LIBRARY ${Boost_FILESYSTEM_LIBRARY_RELEASE})
-set(BOOST_STATIC_REGEX_LIBRARY ${Boost_REGEX_LIBRARY_RELEASE})
   endif()
   set(BOOST_SYSTEM_LIBRARY boost_system_static)
   set(BOOST_FILESYSTEM_LIBRARY boost_filesystem_static)
-  set(BOOST_REGEX_LIBRARY boost_regex_static)
 endif()
   endif()
 endif()
@@ -264,11 +254,7 @@ if (NOT ARROW_BOOST_HEADER_ONLY)
   STATIC_LIB "${BOOST_STATIC_FILESYSTEM_LIBRARY}"
   SHARED_LIB "${BOOST_SHARED_FILESYSTEM_LIBRARY}")
 
-  ADD_THIRDPARTY_LIB(boost_regex
-  STATIC_LIB "${BOOST_STATIC_REGEX_LIBRARY}"
-  SHARED_LIB "${BOOST_SHARED_REGEX_LIBRARY}")
-
-  SET(ARROW_BOOST_LIBS boost_system boost_filesystem boost_regex)
+  SET(ARROW_BOOST_LIBS boost_system boost_filesystem)
 endif()
 
 include_directories(SYSTEM ${Boost_INCLUDE_DIR})
diff --git a/cpp/src/arrow/util/CMakeLists.txt 

[jira] [Commented] (ARROW-2182) [Python] ASV benchmark setup does not account for C++ library changing

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434087#comment-16434087
 ] 

ASF GitHub Bot commented on ARROW-2182:
---

pitrou commented on issue #1775: ARROW-2182: [Python] Build C++ libraries in 
benchmarks build step 
URL: https://github.com/apache/arrow/pull/1775#issuecomment-380499491
 
 
   Example benchmark running step here:
   https://travis-ci.org/apache/arrow/jobs/365152266#L7580
   
   The numbers are not very important but it shows `asv run` succeeding on a 
given changeset.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] ASV benchmark setup does not account for C++ library changing
> --
>
> Key: ARROW-2182
> URL: https://issues.apache.org/jira/browse/ARROW-2182
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/blob/master/python/README-benchmarks.md
> Perhaps we could create a helper script that will run all the 
> currently-defined benchmarks for a specific commit, and ensure that we are 
> running against pristine, up-to-date release builds of Arrow (and any other 
> dependencies, like parquet-cpp) at that commit? 
> cc [~pitrou]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2432) [Python] from_pandas fails when converting decimals if have None values

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434083#comment-16434083
 ] 

ASF GitHub Bot commented on ARROW-2432:
---

pitrou commented on a change in pull request #1878: ARROW-2432: [Python] Fix 
Pandas decimal type conversion with None values
URL: https://github.com/apache/arrow/pull/1878#discussion_r180801610
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -1149,19 +1155,15 @@ def test_fixed_size_bytes_does_not_accept_varying_lengths(self):
 
 def test_variable_size_bytes(self):
 s = pd.Series([b'123', b'', b'a', None])
-arr = pa.Array.from_pandas(s, type=pa.binary())
-assert arr.type == pa.binary()
 _check_series_roundtrip(s, type_=pa.binary())
 
 def test_binary_from_bytearray(self):
-s = pd.Series([bytearray(b'123'), bytearray(b''), bytearray(b'a')])
+s = pd.Series([bytearray(b'123'), bytearray(b''), bytearray(b'a'),
+   None])
 # Explicitly set type
-arr = pa.Array.from_pandas(s, type=pa.binary())
-assert arr.type == pa.binary()
-# Infer type from bytearrays
-arr = pa.Array.from_pandas(s)
-assert arr.type == pa.binary()
 _check_series_roundtrip(s, type_=pa.binary())
+# Infer type from bytearrays
+_check_series_roundtrip(s)
 
 Review comment:
   But you should pass `expected_pa_type` here.




> [Python] from_pandas fails when converting decimals if have None values
> ---
>
> Key: ARROW-2432
> URL: https://issues.apache.org/jira/browse/ARROW-2432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>
> Using from_pandas to convert decimals fails if encounters a value of 
> {{None}}. For example:
> {code:java}
> In [1]: import pyarrow as pa
> ...: import pandas as pd
> ...: from decimal import Decimal
> ...:
> In [2]: s_dec = pd.Series([Decimal('3.14'), None])
> In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> ---
> ArrowInvalid Traceback (most recent call last)
>  in ()
> > 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
> array.pxi in pyarrow.lib.Array.from_pandas()
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python 
> object of type NoneType but can only handle these types: decimal.Decimal
> {code}
> The above error is raised when specifying decimal type. When no type is 
> specified, a seg fault happens.
> This previously worked in 0.8.0.





[jira] [Commented] (ARROW-2224) [C++] Get rid of boost regex usage

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434078#comment-16434078
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380497242
 
 
   AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.291




> [C++] Get rid of boost regex usage
> --
>
> Key: ARROW-2224
> URL: https://issues.apache.org/jira/browse/ARROW-2224
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
>
> We're using {{boost::regex}} to parse decimal strings for {{decimal128}} 
> types. We should use {{libre2}} instead.





[jira] [Commented] (ARROW-1280) [C++] Implement Fixed Size List type

2018-04-11 Thread Brian Hulette (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434059#comment-16434059
 ] 

Brian Hulette commented on ARROW-1280:
--

[~xhochy] do you think this would qualify for the "beginner" label and get 
tackled at a hackathon? I would like to see support for FixedSizeList in 
Python/C++, and I wouldn't think it'd be _too_ hard to adapt the List type.

> [C++] Implement Fixed Size List type
> 
>
> Key: ARROW-1280
> URL: https://issues.apache.org/jira/browse/ARROW-1280
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (ARROW-2224) [C++] Get rid of boost regex usage

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2224:
--
Summary: [C++] Get rid of boost regex usage  (was: [C++] Replace boost 
regex usage with libre2)






[jira] [Commented] (ARROW-2097) [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are no errors

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433982#comment-16433982
 ] 

ASF GitHub Bot commented on ARROW-2097:
---

pitrou closed pull request #1883: ARROW-2097: [CI, Python] Reduce Travis-CI 
verbosity
URL: https://github.com/apache/arrow/pull/1883
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh
index a776c4263..8421e5cd3 100755
--- a/ci/travis_script_python.sh
+++ b/ci/travis_script_python.sh
@@ -84,7 +84,7 @@ fi
 export PYARROW_BUILD_TYPE=$ARROW_BUILD_TYPE
 
 pip install -q -r requirements.txt
-python setup.py build_ext --with-parquet --with-plasma --with-orc\
+python setup.py build_ext -q --with-parquet --with-plasma --with-orc\
install -q --single-version-externally-managed --record=record.text
 popd
 
@@ -105,7 +105,7 @@ if [ $TRAVIS_OS_NAME == "linux" ]; then
 fi
 
 PYARROW_PATH=$CONDA_PREFIX/lib/python$PYTHON_VERSION/site-packages/pyarrow
-python -m pytest -vv -r sxX --durations=15 -s $PYARROW_PATH --parquet
+python -m pytest -r sxX --durations=15 $PYARROW_PATH --parquet
 
 if [ "$PYTHON_VERSION" == "3.6" ] && [ $TRAVIS_OS_NAME == "linux" ]; then
   # Build documentation once


 




> [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are 
> no errors
> -
>
> Key: ARROW-2097
> URL: https://issues.apache.org/jira/browse/ARROW-2097
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>
> See https://travis-ci.org/apache/arrow/jobs/33265#L7858. It might be nice 
> to have an environment variable so that this can be toggled on or off, for 
> debugging purposes. See also ARROW-1380





[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433962#comment-16433962
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex 
dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380465437
 
 
   Fair enough.









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433957#comment-16433957
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380464846
 
 
   I took it from here:
   > isdigit and isxdigit are the only standard narrow character classification 
functions that are not affected by the currently installed C locale. although 
some implementations (e.g. Microsoft in 1252 codepage) may classify additional 
single-byte characters as digits. 
   
   http://en.cppreference.com/w/cpp/string/byte/isdigit
   
   Not sure how authoritative that page is.
   
   That said, `static_cast<bool>(std::isdigit(c))` is not very pretty.









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433952#comment-16433952
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex 
dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380464015
 
 
   > It's not like isdigit is more readable anyway
   
   Readability wasn't my original concern, reimplementing a builtin function 
was.
   
   > and apparently it risks being locale-dependent on Windows (which is a can 
of worms).
   
   Is there some documentation on this somewhere? I found the following lines 
in the [`setlocale` documentation for Visual Studio 
2015](https://msdn.microsoft.com/en-us/library/x99tb11d.aspx):
   
   > LC_CTYPE
   The character-handling functions (except isdigit, isxdigit, mbstowcs, and 
mbtowc, which are unaffected).
   
   That suggests `isdigit` is *not* affected by locale. Am I reading something 
wrong?









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433951#comment-16433951
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on a change in pull request #1880: ARROW-2224: [C++] Remove 
boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#discussion_r180765500
 
 

 ##
 File path: cpp/src/arrow/util/decimal.cc
 ##
 @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) {
   }
 }
 
-static const boost::regex DECIMAL_REGEX(
-// sign of the number
-"(?[-+]?)"
-
-// digits around the decimal point
-
"(((?\\d+)\\.(?\\d*)|\\.(?\\d+)"
-")"
+namespace {
 
-// optional exponent
-"([eE](?[-+]?\\d+))?"
+struct DecimalComponents {
+  std::string sign;
+  std::string whole_digits;
+  std::string fractional_digits;
+  std::string exponent_sign;
+  std::string exponent_digits;
+};
 
-// otherwise
-"|"
+inline bool IsSign(char c) { return (c == '-' || c == '+'); }
 
-// we're just an integer
-"(?\\d+)"
+inline bool IsDot(char c) { return c == '.'; }
 
-// or an integer with an exponent
-"(?:[eE](?[-+]?\\d+))?)");
+inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); }
 
-static inline bool is_zero_character(char c) { return c == '0'; }
+inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); }
 
-Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision,
-  int32_t* scale) {
-  if (s.empty()) {
-return Status::Invalid("Empty string cannot be converted to decimal");
+inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) {
+  size_t pos;
+  for (pos = start; pos < size; ++pos) {
+if (!IsDigit(s[pos])) {
+  break;
+}
   }
+  *out = std::string(s + start, pos - start);
+  return pos;
+}
 
-  // case of all zeros
-  if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) {
-if (precision != nullptr) {
-  *precision = 0;
-}
+bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) {
 
 Review comment:
   I mean if the parse function is taking a `string::const_iterator`, it may 
not accept a different kind of iterator. Or we need to make it a template 
function, piling more layers of abstraction without any concrete advantage.









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433942#comment-16433942
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex 
dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380460395
 
 
   @pitrou Sweet. Thanks for doing this.









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433941#comment-16433941
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

cpcloud commented on a change in pull request #1880: ARROW-2224: [C++] Remove 
boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#discussion_r180761749
 
 

 ##
 File path: cpp/src/arrow/util/decimal.cc
 ##
 @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) {
   }
 }
 
-static const boost::regex DECIMAL_REGEX(
-// sign of the number
-"(?[-+]?)"
-
-// digits around the decimal point
-
"(((?\\d+)\\.(?\\d*)|\\.(?\\d+)"
-")"
+namespace {
 
-// optional exponent
-"([eE](?[-+]?\\d+))?"
+struct DecimalComponents {
+  std::string sign;
+  std::string whole_digits;
+  std::string fractional_digits;
+  std::string exponent_sign;
+  std::string exponent_digits;
+};
 
-// otherwise
-"|"
+inline bool IsSign(char c) { return (c == '-' || c == '+'); }
 
-// we're just an integer
-"(?\\d+)"
+inline bool IsDot(char c) { return c == '.'; }
 
-// or an integer with an exponent
-"(?:[eE](?[-+]?\\d+))?)");
+inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); }
 
-static inline bool is_zero_character(char c) { return c == '0'; }
+inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); }
 
-Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision,
-  int32_t* scale) {
-  if (s.empty()) {
-return Status::Invalid("Empty string cannot be converted to decimal");
+inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) {
+  size_t pos;
+  for (pos = start; pos < size; ++pos) {
+if (!IsDigit(s[pos])) {
+  break;
+}
   }
+  *out = std::string(s + start, pos - start);
+  return pos;
+}
 
-  // case of all zeros
-  if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) {
-if (precision != nullptr) {
-  *precision = 0;
-}
+bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) {
 
 Review comment:
   I don't follow how a `view`/`span` class is less abstracted, since it 
would presumably implement the C++ iterator interface, like every 
implementation usually does. However, as I said, I don't think this is 
really worth spending too much time on.









[jira] [Commented] (ARROW-2097) [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are no errors

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433931#comment-16433931
 ] 

ASF GitHub Bot commented on ARROW-2097:
---

pitrou commented on issue #1883: ARROW-2097: [CI, Python] Reduce Travis-CI 
verbosity
URL: https://github.com/apache/arrow/pull/1883#issuecomment-380455429
 
 
   AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.288
   (not that it should be affected)









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433888#comment-16433888
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380443938
 
 
   Also adding a Decimal::FromString benchmark. That benchmark is about 2.5x faster 
with the PR (1.8M items/second, up from 700k items/second here).









[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433867#comment-16433867
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380439980
 
 
   This PR reduces conversion time for decimals from Python to Arrow by ~45%:
   * before:
   ```
   [100.00%] ··· Running convert_builtins.ConvertPyListToArray.time_convert 
  ok
   [100.00%]  
   ========  ============
     type
   --------  ------------
    decimal   177±0.06ms
   ========  ============
   ```
   
   * after:
   ```
   [100.00%] ··· Running convert_builtins.ConvertPyListToArray.time_convert 
  ok
   [100.00%]  
   ========  ============
     type
   --------  ------------
    decimal   101±0.5ms
   ========  ============
   ```
   
   (irrelevant lines removed)









[jira] [Updated] (ARROW-2097) [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are no errors

2018-04-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2097:
--
Labels: pull-request-available  (was: )






[jira] [Commented] (ARROW-2097) [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are no errors

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433773#comment-16433773
 ] 

ASF GitHub Bot commented on ARROW-2097:
---

pitrou opened a new pull request #1883: ARROW-2097: [CI, Python] Reduce 
Travis-CI verbosity
URL: https://github.com/apache/arrow/pull/1883
 
 
   Python tests with Valgrind enabled produce very long output in verbose mode.









[jira] [Updated] (ARROW-2097) [Python] Suppress valgrind stdout/stderr in Travis CI builds when there are no errors

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2097:
--
Component/s: Continuous Integration






[jira] [Commented] (ARROW-2330) [C++] Optimize delta buffer creation with partially finishable array builders

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433765#comment-16433765
 ] 

ASF GitHub Bot commented on ARROW-2330:
---

alendit closed pull request #1769: ARROW-2330: [C++] Optimize delta buffer 
creation with partially finishable array builders
URL: https://github.com/apache/arrow/pull/1769
 
 
   

This is a PR merged from a forked repository. As GitHub hides the
original diff on merge, it is displayed below for the sake of provenance:

diff --git a/cpp/src/arrow/array-test.cc b/cpp/src/arrow/array-test.cc
index fb1bebfca..7f7666bb3 100644
--- a/cpp/src/arrow/array-test.cc
+++ b/cpp/src/arrow/array-test.cc
@@ -536,6 +536,44 @@ TYPED_TEST(TestPrimitiveBuilder, SliceEquality) {
   ASSERT_TRUE(array->RangeEquals(5, 15, 0, slice));
 }
 
+TYPED_TEST(TestPrimitiveBuilder, TestPartialFinish) {
+  const int64_t size = 1000;
+  this->RandomData(size * 2);
+
+  // build an array of all values
+  std::shared_ptr<Array> all_values_array;
+  ASSERT_OK(MakeArray(this->valid_bytes_, this->draws_, size * 2,
+                      this->builder_.get(), &all_values_array));
+
+  for (uint64_t idx = 0; idx < size; ++idx) {
+    if (this->valid_bytes_[idx] > 0) {
+      ASSERT_OK(this->builder_->Append(this->draws_[idx]));
+    } else {
+      ASSERT_OK(this->builder_->AppendNull());
+    }
+  }
+
+  std::shared_ptr<Array> result1;
+  ASSERT_OK(this->builder_->Finish(false, &result1));
+
+  ASSERT_EQ(size, result1->length());
+  ASSERT_TRUE(all_values_array->RangeEquals(0, size, 0, result1));
+
+  for (uint64_t idx = size; idx < size * 2; ++idx) {
+    if (this->valid_bytes_[idx] > 0) {
+      ASSERT_OK(this->builder_->Append(this->draws_[idx]));
+    } else {
+      ASSERT_OK(this->builder_->AppendNull());
+    }
+  }
+
+  std::shared_ptr<Array> result2;
+  ASSERT_OK(this->builder_->Finish(true, &result2));
+
+  ASSERT_EQ(size, result2->length());
+  ASSERT_TRUE(all_values_array->RangeEquals(size, size * 2, 0, result2));
+}
+
 TYPED_TEST(TestPrimitiveBuilder, TestAppendScalar) {
   DECL_T();
 
@@ -1027,6 +1065,27 @@ TEST_F(TestStringBuilder, TestZeroLength) {
   Done();
 }
 
+TEST_F(TestStringBuilder, TestPartialFinish) {
+  StringBuilder builder, builder_expected;
+  ASSERT_OK(builder.Append("foo"));
+  ASSERT_OK(builder_expected.Append("foo"));
+
+  std::shared_ptr<Array> result1, expected1;
+  ASSERT_OK(builder.Finish(false, &result1));
+  ASSERT_OK(builder_expected.Finish(&expected1));
+  ASSERT_EQ(1, result1->length());
+  ASSERT_TRUE(result1->Equals(expected1));
+
+  ASSERT_OK(builder.Append("foo"));
+  ASSERT_OK(builder_expected.Append("foo"));
+  std::shared_ptr<Array> result2, expected2;
+  ASSERT_OK(builder.Finish(false, &result2));
+  ASSERT_OK(builder_expected.Finish(&expected2));
+  ASSERT_EQ(1, result2->length());
+  ASSERT_EQ(1, result2->offset());
+  ASSERT_TRUE(result2->Equals(expected2));
+}
+
 // Binary container type
 // TODO(emkornfield) there should be some way to refactor these to avoid code 
duplicating
 // with String
@@ -1239,6 +1298,27 @@ TEST_F(TestBinaryBuilder, TestZeroLength) {
   Done();
 }
 
+TEST_F(TestBinaryBuilder, TestPartialFinish) {
+  BinaryBuilder builder, builder_expected;
+  ASSERT_OK(builder.Append("foo"));
+  ASSERT_OK(builder_expected.Append("foo"));
+
+  std::shared_ptr<Array> result1, expected1;
+  ASSERT_OK(builder.Finish(false, &result1));
+  ASSERT_OK(builder_expected.Finish(&expected1));
+  ASSERT_EQ(1, result1->length());
+  ASSERT_TRUE(result1->Equals(expected1));
+
+  ASSERT_OK(builder.Append("foo"));
+  ASSERT_OK(builder_expected.Append("foo"));
+  std::shared_ptr<Array> result2, expected2;
+  ASSERT_OK(builder.Finish(false, &result2));
+  ASSERT_OK(builder_expected.Finish(&expected2));
+  ASSERT_EQ(1, result2->length());
+  ASSERT_EQ(1, result2->offset());
+  ASSERT_TRUE(result2->Equals(expected2));
+}
+
 // --
 // Slice tests
 
@@ -1472,6 +1552,26 @@ TEST_F(TestFWBinaryArray, Slice) {
   ASSERT_TRUE(array->RangeEquals(1, 3, 0, slice));
 }
 
+TEST_F(TestFWBinaryArray, TestPartialFinish) {
+  auto type = fixed_size_binary(4);
+  FixedSizeBinaryBuilder builder(type);
+
+  ASSERT_OK(builder.Append("foo"));
+  std::shared_ptr<Array> result1;
+  ASSERT_OK(builder.Finish(false, &result1));
+  ASSERT_EQ(1, result1->length());
+  ASSERT_STREQ("foo", reinterpret_cast<const char*>(
+                          static_cast<const FixedSizeBinaryArray&>(*result1).Value(0)));
+
+  ASSERT_OK(builder.Append("bar"));
+  std::shared_ptr<Array> result2;
+  ASSERT_OK(builder.Finish(&result2));
+  ASSERT_EQ(1, result2->length());
+  ASSERT_EQ(1, result2->offset());
+  ASSERT_STREQ("bar", reinterpret_cast<const char*>(
+                          static_cast<const FixedSizeBinaryArray&>(*result2).Value(0)));
+}
+
 // --
 // AdaptiveInt tests
 
@@ -1603,6 +1703,31 @@ TEST_F(TestAdaptiveIntBuilder, TestAppendVector) {
   ASSERT_TRUE(expected_->Equals(result_));
 }
 

[jira] [Commented] (ARROW-2330) [C++] Optimize delta buffer creation with partially finishable array builders

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433764#comment-16433764
 ] 

ASF GitHub Bot commented on ARROW-2330:
---

alendit commented on issue #1769: ARROW-2330: [C++] Optimize delta buffer 
creation with partially finishable array builders
URL: https://github.com/apache/arrow/pull/1769#issuecomment-380419676
 
 
   Hi Uwe,
   
   I've decided to close this PR until I have a better understanding of the 
possible issues with slices.
   
   Cheers!




> [C++] Optimize delta buffer creation with partially finishable array builders
> -
>
> Key: ARROW-2330
> URL: https://issues.apache.org/jira/browse/ARROW-2330
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Dimitri Vorona
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The main aim of this change is to optimize the building of delta 
> dictionaries. In the current version, delta dictionaries are built using an 
> additional "overflow" buffer, which leads to complicated and potentially 
> error-prone code and subpar performance by doubling the number of lookups.
> I solve this problem by introducing the notion of partially finishable array 
> builders, i.e. builders which are able to retain their state when Finish is 
> called. The interface is based on RecordBatchBuilder::Flush, i.e. Finish is 
> overloaded with the additional signature Finish(bool reset_builder, 
> std::shared_ptr<Array>* out). The resulting Arrays point to the same data 
> buffer with different offsets.
> I'm aware that the change is kind of biggish, but I'd like to discuss it 
> here. The solution makes the code more straightforward, doesn't bloat the 
> code base too much, and leaves the API more or less untouched. Additionally, 
> the new way to make delta dictionaries by using a different call signature 
> for Finish feels cleaner to me.
> I'm looking forward to your critique and improvement ideas.
> The pull request is available at: https://github.com/apache/arrow/pull/1769





[jira] [Created] (ARROW-2447) [C++] Create a device abstraction

2018-04-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2447:
-

 Summary: [C++] Create a device abstraction
 Key: ARROW-2447
 URL: https://issues.apache.org/jira/browse/ARROW-2447
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, GPU
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Right now, a plain Buffer doesn't carry information about where it actually 
lies. That information also cannot be passed around, so you get APIs like 
{{PlasmaClient}} which take or return device number integers, and have 
implementations which hardcode operations on CUDA buffers.

Here is a sketch for a proposed Device abstraction:

{code:c++}
class Device {
  enum DeviceKind { KIND_CPU, KIND_CUDA };

  virtual DeviceKind kind() const;
  //MemoryPool* default_memory_pool() const;
  //std::shared_ptr<Buffer> Allocate(...);
};

class CpuDevice : public Device {};

class CudaDevice : public Device {
  int device_num() const;
};

class Buffer {
  virtual DeviceKind device_kind() const;
  virtual std::shared_ptr<Device> device() const;
  virtual bool on_cpu() const {
    return true;
  }

  const uint8_t* cpu_data() const {
    return on_cpu() ? data() : nullptr;
  }
  uint8_t* cpu_mutable_data() {
    return on_cpu() ? mutable_data() : nullptr;
  }

  virtual CopyToCpu(std::shared_ptr<Buffer> dest) const;
  virtual CopyFromCpu(std::shared_ptr<Buffer> src);
};

class CudaBuffer : public Buffer {
  virtual bool on_cpu() const {
    return false;
  }
};

CopyBuffer(std::shared_ptr<Buffer> dest, const std::shared_ptr<Buffer>& src);
{code}





[jira] [Updated] (ARROW-2446) [C++] SliceBuffer on CudaBuffer should return CudaBuffer

2018-04-11 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2446:
--
Component/s: GPU

> [C++] SliceBuffer on CudaBuffer should return CudaBuffer
> 
>
> Key: ARROW-2446
> URL: https://issues.apache.org/jira/browse/ARROW-2446
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GPU
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> Currently {{SliceBuffer}} on a {{CudaBuffer}} returns a plain {{Buffer}} 
> instance, which is dangerous for unsuspecting consumers.





[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433683#comment-16433683
 ] 

ASF GitHub Bot commented on ARROW-2224:
---

pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency
URL: https://github.com/apache/arrow/pull/1880#issuecomment-380401339
 
 
   Reverted the `std::isdigit` change because of a silly performance warning 
(turned into an error) on MSVC. It's not like `isdigit` is more readable 
anyway, and apparently it risks being locale-dependent on Windows (which is a 
can of worms).




> [C++] Replace boost regex usage with libre2
> ---
>
> Key: ARROW-2224
> URL: https://issues.apache.org/jira/browse/ARROW-2224
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
>
> We're using {{boost::regex}} to parse decimal strings for {{decimal128}} 
> types. We should use {{libre2}} instead.





[jira] [Created] (ARROW-2446) [C++] SliceBuffer on CudaBuffer should return CudaBuffer

2018-04-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2446:
-

 Summary: [C++] SliceBuffer on CudaBuffer should return CudaBuffer
 Key: ARROW-2446
 URL: https://issues.apache.org/jira/browse/ARROW-2446
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Currently {{SliceBuffer}} on a {{CudaBuffer}} returns a plain {{Buffer}} 
instance, which is dangerous for unsuspecting consumers.





[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433673#comment-16433673
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

liurenjie1024 commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-380398449
 
 
   Hi, can anybody help to merge this?




> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction similar to the C++ API.


