[jira] [Assigned] (ARROW-5045) [Rust] Code coverage silently failing in CI

2019-06-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5045:
---

Assignee: Chao Sun

> [Rust] Code coverage silently failing in CI
> ---
>
> Key: ARROW-5045
> URL: https://issues.apache.org/jira/browse/ARROW-5045
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Andy Grove
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
> error: could not execute process `target/kcov-master/build/src/kcov --verify 
> --include-path=/home/travis/build/apache/arrow/rust 
> /home/travis/build/apache/arrow/rust/target/kcov-arrow-f04240306dd653e9 
> /home/travis/build/apache/arrow/rust/target/debug/deps/arrow-f04240306dd653e9`
>  (never executed){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5719) [Java] Support in-place vector sorting

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5719:
--
Labels: pull-request-available  (was: )

> [Java] Support in-place vector sorting
> --
>
> Key: ARROW-5719
> URL: https://issues.apache.org/jira/browse/ARROW-5719
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>
> Support in-place sorting for vectors. An in-place sorter sorts a vector by 
> directly modifying its data, so the input and output vectors are the same 
> object.
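The in-place contract (input and output are the same vector, no second buffer allocated) can be sketched in Python. This is a hypothetical illustration of the idea, not the actual Java implementation proposed in the issue:

```python
def sort_in_place(values):
    """Sort a buffer by mutating it directly; no output vector is allocated."""
    # Simple insertion sort operating on the caller's own buffer.
    for i in range(1, len(values)):
        key = values[i]
        j = i - 1
        while j >= 0 and values[j] > key:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = key
    return values  # same object as the input

v = [3, 1, 2]
assert sort_in_place(v) is v  # input and output are the same vector
assert v == [1, 2, 3]
```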





[jira] [Created] (ARROW-5736) [Format] Support small bit-width indices of sparse tensor

2019-06-25 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-5736:
---

 Summary: [Format] Support small bit-width indices of sparse tensor
 Key: ARROW-5736
 URL: https://issues.apache.org/jira/browse/ARROW-5736
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Kenta Murata
Assignee: Kenta Murata


Adding 32-bit sparse index support is necessary to enable zero-copy data sharing 
with existing systems such as SciPy.
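As a rough illustration (using NumPy rather than the Arrow format itself): halving the index bit width halves the index footprint, and matching the index dtype used by the other system is what makes sharing without a copy possible in the first place:

```python
import numpy as np

# COO-style coordinates for a sparse tensor whose dimensions fit in 32 bits
idx64 = np.array([[0, 5], [2, 7], [4, 1]], dtype=np.int64)
idx32 = idx64.astype(np.int32)

# Half the memory for the index arrays
assert idx32.nbytes == idx64.nbytes // 2

# Zero-copy sharing requires the index dtypes to match exactly;
# a consumer expecting int32 indices cannot reuse an int64 buffer.
assert idx32.dtype == np.dtype('int32')
assert idx32.dtype != idx64.dtype
```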





[jira] [Updated] (ARROW-5726) [Java] Implement a common interface for int vectors

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5726:
--
Labels: pull-request-available  (was: )

> [Java] Implement a common interface for int vectors
> ---
>
> Key: ARROW-5726
> URL: https://issues.apache.org/jira/browse/ARROW-5726
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, _DictionaryEncoder#encode_ uses reflection to pull out the set 
> method and then set values.
> Setting values via reflection is inefficient, and the code structure is 
> inelegant:
> {code:java}
> Method setter = null;
> for (Class c : Arrays.asList(int.class, long.class)) {
>   try {
>     setter = indices.getClass().getMethod("setSafe", int.class, c);
>     break;
>   } catch (NoSuchMethodException e) {
>     // ignore
>   }
> }
> {code}
> Implementing a common interface for int vectors, so the set method can be 
> called directly, seems a better choice.
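The proposal is Java-side, but the shape of the fix can be sketched in Python: instead of probing for a setter at runtime, every index vector implements a common interface, so call sites dispatch directly. Class and method names below are hypothetical illustrations, not the actual Arrow API:

```python
from abc import ABC, abstractmethod

class BaseIntVector(ABC):
    """Common interface so callers need no reflection to set values."""
    @abstractmethod
    def set_safe(self, index, value): ...

class IntVector(BaseIntVector):
    def __init__(self):
        self.data = []

    def set_safe(self, index, value):
        # Grow-on-demand semantics, loosely mirroring Arrow's setSafe
        while len(self.data) <= index:
            self.data.append(0)
        self.data[index] = int(value)

def encode(indices: BaseIntVector, values):
    for i, v in enumerate(values):
        indices.set_safe(i, v)  # direct call, no Method lookup

vec = IntVector()
encode(vec, [7, 8, 9])
assert vec.data == [7, 8, 9]
```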





[jira] [Commented] (ARROW-5735) [C++] Appveyor builds failing persistently in thrift_ep build

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872868#comment-16872868
 ] 

Wes McKinney commented on ARROW-5735:
-

This seems to be Boost-related

{code}
-- Found Boost 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/Boost-1.70.0
--   Requested configuration: QUIET COMPONENTS regex;system;filesystem
-- Found boost_headers 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_headers-1.70.0
-- Found boost_regex 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_regex-1.70.0
--   libboost_regex.lib
-- Adding boost_regex dependencies: headers
-- Found boost_system 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_system-1.70.0
--   libboost_system.lib
-- Adding boost_system dependencies: headers
-- Found boost_filesystem 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_filesystem-1.70.0
--   libboost_filesystem.lib
-- Adding boost_filesystem dependencies: headers
-- Boost 1.58 found.
-- Found Boost components:
   regex;system;filesystem
-- Boost include dir: 
-- Boost libraries: 
{code}

Thrift is failing due to the empty include dir

> [C++] Appveyor builds failing persistently in thrift_ep build
> -
>
> Key: ARROW-5735
> URL: https://issues.apache.org/jira/browse/ARROW-5735
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See
> {code}
> 72/541] Performing configure step for 'thrift_ep'
> FAILED: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure 
> cmd.exe /C "cd /D 
> C:\projects\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build && 
> "C:\Program Files (x86)\CMake\bin\cmake.exe" 
> -DFLEX_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_flex.exe
>  
> -DBISON_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_bison.exe
>  -DZLIB_INCLUDE_DIR= -DWITH_SHARED_LIB=OFF -DWITH_PLUGIN=OFF -DZLIB_LIBRARY= 
> "-DCMAKE_C_COMPILER=C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe" 
> -DCMAKE_CXX_COMPILER=C:/Miniconda36-x64/Scripts/clcache.exe 
> -DCMAKE_BUILD_TYPE=RELEASE "-DCMAKE_C_FLAGS=/DWIN32 /D_WINDOWS /W3  /MD /O2 
> /Ob2 /DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=/DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 
> /DNDEBUG" "-DCMAKE_CXX_FLAGS=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> "-DCMAKE_CXX_FLAGS_RELEASE=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> -DCMAKE_INSTALL_PREFIX=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install
>  
> -DCMAKE_INSTALL_RPATH=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install/lib
>  -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF 
> -DBUILD_TUTORIALS=OFF -DWITH_QT4=OFF -DWITH_C_GLIB=OFF -DWITH_JAVA=OFF 
> -DWITH_PYTHON=OFF -DWITH_HASKELL=OFF -DWITH_CPP=ON -DWITH_STATIC_LIB=ON 
> -DWITH_LIBEVENT=OFF -DWITH_MT=OFF -GNinja 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep && "C:\Program 
> Files (x86)\CMake\bin\cmake.exe" -E touch 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure"
> -- The C compiler identification is MSVC 19.16.27030.1
> -- The CXX compiler identification is MSVC 19.16.27030.1
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- 
> works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe -- 
> works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Parsed Thrift package version: 0.12.0
> -- Parsed Thrift version: 0.12.0 (0.2.0)
> -- Setting C++11 as the default language level.
> -- To specify a different C++ language level, set CMAKE_CXX_STANDARD
> CMake Warning (dev) at build/cmake/DefineOptions.cmake:63 (find_package):
>   Policy CMP0074 is not set: find_package uses _ROOT variables.
>   Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
>   command to set the policy and suppress this warning.
>   

[jira] [Resolved] (ARROW-5661) Support hash functions for decimal in Gandiva

2019-06-25 Thread Pindikura Ravindra (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pindikura Ravindra resolved ARROW-5661.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4618
[https://github.com/apache/arrow/pull/4618]

> Support hash functions for decimal in Gandiva
> -
>
> Key: ARROW-5661
> URL: https://issues.apache.org/jira/browse/ARROW-5661
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5735) [C++] Appveyor builds failing persistently in thrift_ep build

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5735:
--
Labels: pull-request-available  (was: )

> [C++] Appveyor builds failing persistently in thrift_ep build
> -
>
> Key: ARROW-5735
> URL: https://issues.apache.org/jira/browse/ARROW-5735
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> See
> {code}
> 72/541] Performing configure step for 'thrift_ep'
> FAILED: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure 
> cmd.exe /C "cd /D 
> C:\projects\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build && 
> "C:\Program Files (x86)\CMake\bin\cmake.exe" 
> -DFLEX_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_flex.exe
>  
> -DBISON_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_bison.exe
>  -DZLIB_INCLUDE_DIR= -DWITH_SHARED_LIB=OFF -DWITH_PLUGIN=OFF -DZLIB_LIBRARY= 
> "-DCMAKE_C_COMPILER=C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe" 
> -DCMAKE_CXX_COMPILER=C:/Miniconda36-x64/Scripts/clcache.exe 
> -DCMAKE_BUILD_TYPE=RELEASE "-DCMAKE_C_FLAGS=/DWIN32 /D_WINDOWS /W3  /MD /O2 
> /Ob2 /DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=/DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 
> /DNDEBUG" "-DCMAKE_CXX_FLAGS=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> "-DCMAKE_CXX_FLAGS_RELEASE=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> -DCMAKE_INSTALL_PREFIX=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install
>  
> -DCMAKE_INSTALL_RPATH=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install/lib
>  -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF 
> -DBUILD_TUTORIALS=OFF -DWITH_QT4=OFF -DWITH_C_GLIB=OFF -DWITH_JAVA=OFF 
> -DWITH_PYTHON=OFF -DWITH_HASKELL=OFF -DWITH_CPP=ON -DWITH_STATIC_LIB=ON 
> -DWITH_LIBEVENT=OFF -DWITH_MT=OFF -GNinja 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep && "C:\Program 
> Files (x86)\CMake\bin\cmake.exe" -E touch 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure"
> -- The C compiler identification is MSVC 19.16.27030.1
> -- The CXX compiler identification is MSVC 19.16.27030.1
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- 
> works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe -- 
> works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Parsed Thrift package version: 0.12.0
> -- Parsed Thrift version: 0.12.0 (0.2.0)
> -- Setting C++11 as the default language level.
> -- To specify a different C++ language level, set CMAKE_CXX_STANDARD
> CMake Warning (dev) at build/cmake/DefineOptions.cmake:63 (find_package):
>   Policy CMP0074 is not set: find_package uses _ROOT variables.
>   Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
>   command to set the policy and suppress this warning.
>   Environment variable Boost_ROOT is set to:
> C:\Miniconda36-x64\envs\arrow\Library
>   For compatibility, CMake is ignoring the variable.
> Call Stack (most recent call first):
>   CMakeLists.txt:52 (include)
> This warning is for project developers.  Use -Wno-dev to suppress it.
> -- Found Boost 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/Boost-1.70.0
> --   Requested configuration: QUIET
> -- Found boost_headers 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_headers-1.70.0
> -- Boost 1.53 found.
> -- libevent NOT found.
> -- Could NOT find RUN_HASKELL (missing: RUN_HASKELL) 
> -- Could NOT find CABAL (missing: CABAL) 
> -- Looking for arpa/inet.h
> -- Looking for arpa/inet.h - not found
> -- Looking for fcntl.h
> -- Looking for fcntl.h - found
> -- Looking for getopt.h
> -- Looking for getopt.h - not found
> -- Looking for inttypes.h
> -- Looking for inttypes.h - found
> -- Looking for netdb.h
> -- Looking for netdb.h - not found
> -- Looking for netinet/in.h
> -- Looking for netinet/in.h - not 

[jira] [Commented] (ARROW-5735) [C++] Appveyor builds failing persistently in thrift_ep build

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872846#comment-16872846
 ] 

Wes McKinney commented on ARROW-5735:
-

For whatever reason, this seems to be mostly happening on the 
ApacheSoftwareFoundation Appveyor project but not on Appveyor builds on 
contributor forks. Perplexing.

> [C++] Appveyor builds failing persistently in thrift_ep build
> -
>
> Key: ARROW-5735
> URL: https://issues.apache.org/jira/browse/ARROW-5735
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> See
> {code}
> 72/541] Performing configure step for 'thrift_ep'
> FAILED: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure 
> cmd.exe /C "cd /D 
> C:\projects\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build && 
> "C:\Program Files (x86)\CMake\bin\cmake.exe" 
> -DFLEX_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_flex.exe
>  
> -DBISON_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_bison.exe
>  -DZLIB_INCLUDE_DIR= -DWITH_SHARED_LIB=OFF -DWITH_PLUGIN=OFF -DZLIB_LIBRARY= 
> "-DCMAKE_C_COMPILER=C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe" 
> -DCMAKE_CXX_COMPILER=C:/Miniconda36-x64/Scripts/clcache.exe 
> -DCMAKE_BUILD_TYPE=RELEASE "-DCMAKE_C_FLAGS=/DWIN32 /D_WINDOWS /W3  /MD /O2 
> /Ob2 /DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=/DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 
> /DNDEBUG" "-DCMAKE_CXX_FLAGS=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> "-DCMAKE_CXX_FLAGS_RELEASE=/DWIN32 /D_WINDOWS  /GR /EHsc 
> /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
> -DCMAKE_INSTALL_PREFIX=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install
>  
> -DCMAKE_INSTALL_RPATH=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install/lib
>  -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF 
> -DBUILD_TUTORIALS=OFF -DWITH_QT4=OFF -DWITH_C_GLIB=OFF -DWITH_JAVA=OFF 
> -DWITH_PYTHON=OFF -DWITH_HASKELL=OFF -DWITH_CPP=ON -DWITH_STATIC_LIB=ON 
> -DWITH_LIBEVENT=OFF -DWITH_MT=OFF -GNinja 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep && "C:\Program 
> Files (x86)\CMake\bin\cmake.exe" -E touch 
> C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure"
> -- The C compiler identification is MSVC 19.16.27030.1
> -- The CXX compiler identification is MSVC 19.16.27030.1
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
> -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- 
> works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe
> -- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe -- 
> works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Parsed Thrift package version: 0.12.0
> -- Parsed Thrift version: 0.12.0 (0.2.0)
> -- Setting C++11 as the default language level.
> -- To specify a different C++ language level, set CMAKE_CXX_STANDARD
> CMake Warning (dev) at build/cmake/DefineOptions.cmake:63 (find_package):
>   Policy CMP0074 is not set: find_package uses _ROOT variables.
>   Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
>   command to set the policy and suppress this warning.
>   Environment variable Boost_ROOT is set to:
> C:\Miniconda36-x64\envs\arrow\Library
>   For compatibility, CMake is ignoring the variable.
> Call Stack (most recent call first):
>   CMakeLists.txt:52 (include)
> This warning is for project developers.  Use -Wno-dev to suppress it.
> -- Found Boost 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/Boost-1.70.0
> --   Requested configuration: QUIET
> -- Found boost_headers 1.70.0 at 
> C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_headers-1.70.0
> -- Boost 1.53 found.
> -- libevent NOT found.
> -- Could NOT find RUN_HASKELL (missing: RUN_HASKELL) 
> -- Could NOT find CABAL (missing: CABAL) 
> -- Looking for arpa/inet.h
> -- Looking for arpa/inet.h - not found
> -- Looking for fcntl.h
> -- Looking for fcntl.h - found
> -- Looking for getopt.h
> -- Looking for getopt.h - not found
> -- Looking for inttypes.h
> -- Looking for inttypes.h - found
> -- Looking 

[jira] [Created] (ARROW-5735) [C++] Appveyor builds failing persistently in thrift_ep build

2019-06-25 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5735:
---

 Summary: [C++] Appveyor builds failing persistently in thrift_ep 
build
 Key: ARROW-5735
 URL: https://issues.apache.org/jira/browse/ARROW-5735
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.14.0


See

{code}
72/541] Performing configure step for 'thrift_ep'
FAILED: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure 
cmd.exe /C "cd /D 
C:\projects\arrow\cpp\build\thrift_ep-prefix\src\thrift_ep-build && "C:\Program 
Files (x86)\CMake\bin\cmake.exe" 
-DFLEX_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_flex.exe
 
-DBISON_EXECUTABLE=C:/projects/arrow/cpp/build/winflexbison_ep/src/winflexbison_ep-install/win_bison.exe
 -DZLIB_INCLUDE_DIR= -DWITH_SHARED_LIB=OFF -DWITH_PLUGIN=OFF -DZLIB_LIBRARY= 
"-DCMAKE_C_COMPILER=C:/Program Files (x86)/Microsoft Visual 
Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe" 
-DCMAKE_CXX_COMPILER=C:/Miniconda36-x64/Scripts/clcache.exe 
-DCMAKE_BUILD_TYPE=RELEASE "-DCMAKE_C_FLAGS=/DWIN32 /D_WINDOWS /W3  /MD /O2 
/Ob2 /DNDEBUG" "-DCMAKE_C_FLAGS_RELEASE=/DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 
/DNDEBUG" "-DCMAKE_CXX_FLAGS=/DWIN32 /D_WINDOWS  /GR /EHsc 
/D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
"-DCMAKE_CXX_FLAGS_RELEASE=/DWIN32 /D_WINDOWS  /GR /EHsc 
/D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /MD /Od /UNDEBUG" 
-DCMAKE_INSTALL_PREFIX=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install
 
-DCMAKE_INSTALL_RPATH=C:/projects/arrow/cpp/build/thrift_ep/src/thrift_ep-install/lib
 -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF 
-DBUILD_TUTORIALS=OFF -DWITH_QT4=OFF -DWITH_C_GLIB=OFF -DWITH_JAVA=OFF 
-DWITH_PYTHON=OFF -DWITH_HASKELL=OFF -DWITH_CPP=ON -DWITH_STATIC_LIB=ON 
-DWITH_LIBEVENT=OFF -DWITH_MT=OFF -GNinja 
C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep && "C:\Program Files 
(x86)\CMake\bin\cmake.exe" -E touch 
C:/projects/arrow/cpp/build/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure"
-- The C compiler identification is MSVC 19.16.27030.1
-- The CXX compiler identification is MSVC 19.16.27030.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual 
Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe
-- Check for working CXX compiler: C:/Miniconda36-x64/Scripts/clcache.exe -- 
works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Parsed Thrift package version: 0.12.0
-- Parsed Thrift version: 0.12.0 (0.2.0)
-- Setting C++11 as the default language level.
-- To specify a different C++ language level, set CMAKE_CXX_STANDARD
CMake Warning (dev) at build/cmake/DefineOptions.cmake:63 (find_package):
  Policy CMP0074 is not set: find_package uses _ROOT variables.
  Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
  Environment variable Boost_ROOT is set to:
C:\Miniconda36-x64\envs\arrow\Library
  For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
  CMakeLists.txt:52 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.
-- Found Boost 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/Boost-1.70.0
--   Requested configuration: QUIET
-- Found boost_headers 1.70.0 at 
C:/Miniconda36-x64/envs/arrow/Library/lib/cmake/boost_headers-1.70.0
-- Boost 1.53 found.
-- libevent NOT found.
-- Could NOT find RUN_HASKELL (missing: RUN_HASKELL) 
-- Could NOT find CABAL (missing: CABAL) 
-- Looking for arpa/inet.h
-- Looking for arpa/inet.h - not found
-- Looking for fcntl.h
-- Looking for fcntl.h - found
-- Looking for getopt.h
-- Looking for getopt.h - not found
-- Looking for inttypes.h
-- Looking for inttypes.h - found
-- Looking for netdb.h
-- Looking for netdb.h - not found
-- Looking for netinet/in.h
-- Looking for netinet/in.h - not found
-- Looking for signal.h
-- Looking for signal.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for unistd.h
-- Looking for unistd.h - not found
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Looking for sys/ioctl.h
-- Looking for sys/ioctl.h - not found
-- Looking for sys/param.h
-- Looking for sys/param.h - not found
-- Looking for sys/resource.h
-- Looking for sys/resource.h - not found
-- Looking for 

[jira] [Resolved] (ARROW-5702) [C++] parquet::arrow::FileReader::GetSchema()

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5702.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4668
[https://github.com/apache/arrow/pull/4668]

> [C++] parquet::arrow::FileReader::GetSchema()
> -
>
> Key: ARROW-5702
> URL: https://issues.apache.org/jira/browse/ARROW-5702
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Romain François
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The class has 
> {code:c++}
>   /// \brief Return arrow schema by apply selection of column indices.
>   /// \returns error status if passed wrong indices.
>   ::arrow::Status GetSchema(const std::vector& indices,
> std::shared_ptr<::arrow::Schema>* out);
> {code}
> but not a GetSchema() overload that returns the schema for all columns, even 
> though the underlying implementation supports it.





[jira] [Assigned] (ARROW-5702) [C++] parquet::arrow::FileReader::GetSchema()

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5702:
---

Assignee: Romain François

> [C++] parquet::arrow::FileReader::GetSchema()
> -
>
> Key: ARROW-5702
> URL: https://issues.apache.org/jira/browse/ARROW-5702
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Romain François
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The class has 
> {code:c++}
>   /// \brief Return arrow schema by apply selection of column indices.
>   /// \returns error status if passed wrong indices.
>   ::arrow::Status GetSchema(const std::vector& indices,
> std::shared_ptr<::arrow::Schema>* out);
> {code}
> but not a GetSchema() overload that returns the schema for all columns, even 
> though the underlying implementation supports it.





[jira] [Commented] (ARROW-5725) [Crossbow] Port conda recipes to azure pipelines

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872842#comment-16872842
 ] 

Wes McKinney commented on ARROW-5725:
-

[~kszucs] is this necessary for 0.14.0?

> [Crossbow] Port conda recipes to azure pipelines 
> -
>
> Key: ARROW-5725
> URL: https://issues.apache.org/jira/browse/ARROW-5725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Conda-forge builds stopped working. Conda-forge is transitioning to Azure 
> Pipelines, so port the conda crossbow builds to Azure as well, and update the 
> recipes (including Gandiva).





[jira] [Commented] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872841#comment-16872841
 ] 

Wes McKinney commented on ARROW-5630:
-

We'll have to fix this in the next release; I'm not going to have time to debug it.

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: parquet
> Fix For: 1.0.0
>
>
> This is pyarrow 0.13 on Windows.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> def make_table(num_rows):
> typ = pa.list_(pa.field("item", pa.float32(), False))
> return pa.Table.from_arrays([
> pa.array([[0] * (i%10) for i in range(0, num_rows)], type=typ),
> pa.array([[0] * ((i+5)%10) for i in range(0, num_rows)], type=typ)
> ], ['a', 'b'])
> pq.write_table(make_table(100), 'test.parquet')
> pq.read_table('test.parquet')
> {code}
> The last line throws the following exception:
> {noformat}
> ---
> ArrowInvalid  Traceback (most recent call last)
>  in 
> > 1 pq.read_table('full.parquet')
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read_table(source, 
> columns, use_threads, metadata, use_pandas_metadata, memory_map, filesystem)
>1150 return fs.read_parquet(path, columns=columns,
>1151use_threads=use_threads, 
> metadata=metadata,
> -> 1152
> use_pandas_metadata=use_pandas_metadata)
>1153 
>1154 pf = ParquetFile(source, metadata=metadata)
> ~\Anaconda3\lib\site-packages\pyarrow\filesystem.py in read_parquet(self, 
> path, columns, metadata, schema, use_threads, use_pandas_metadata)
> 179  filesystem=self)
> 180 return dataset.read(columns=columns, use_threads=use_threads,
> --> 181 use_pandas_metadata=use_pandas_metadata)
> 182 
> 183 def open(self, path, mode='rb'):
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
>1012 table = piece.read(columns=columns, 
> use_threads=use_threads,
>1013partitions=self.partitions,
> -> 1014
> use_pandas_metadata=use_pandas_metadata)
>1015 tables.append(table)
>1016 
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, partitions, open_file_func, file, use_pandas_metadata)
> 562 table = reader.read_row_group(self.row_group, **options)
> 563 else:
> --> 564 table = reader.read(**options)
> 565 
> 566 if len(self.partition_keys) > 0:
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
> 212 columns, use_pandas_metadata=use_pandas_metadata)
> 213 return self.reader.read_all(column_indices=column_indices,
> --> 214 use_threads=use_threads)
> 215 
> 216 def scan_contents(self, columns=None, batch_size=65536):
> ~\Anaconda3\lib\site-packages\pyarrow\_parquet.pyx in 
> pyarrow._parquet.ParquetReader.read_all()
> ~\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Column 1 named b expected length 932066 but got length 932063
> {noformat}





[jira] [Updated] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5630:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: parquet
> Fix For: 1.0.0
>
>
> This is pyarrow 0.13 on Windows.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> def make_table(num_rows):
> typ = pa.list_(pa.field("item", pa.float32(), False))
> return pa.Table.from_arrays([
> pa.array([[0] * (i%10) for i in range(0, num_rows)], type=typ),
> pa.array([[0] * ((i+5)%10) for i in range(0, num_rows)], type=typ)
> ], ['a', 'b'])
> pq.write_table(make_table(100), 'test.parquet')
> pq.read_table('test.parquet')
> {code}
> The last line throws the following exception:
> {noformat}
> ---
> ArrowInvalid  Traceback (most recent call last)
>  in 
> > 1 pq.read_table('full.parquet')
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read_table(source, 
> columns, use_threads, metadata, use_pandas_metadata, memory_map, filesystem)
>1150 return fs.read_parquet(path, columns=columns,
>1151use_threads=use_threads, 
> metadata=metadata,
> -> 1152
> use_pandas_metadata=use_pandas_metadata)
>1153 
>1154 pf = ParquetFile(source, metadata=metadata)
> ~\Anaconda3\lib\site-packages\pyarrow\filesystem.py in read_parquet(self, 
> path, columns, metadata, schema, use_threads, use_pandas_metadata)
> 179  filesystem=self)
> 180 return dataset.read(columns=columns, use_threads=use_threads,
> --> 181 use_pandas_metadata=use_pandas_metadata)
> 182 
> 183 def open(self, path, mode='rb'):
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
>1012 table = piece.read(columns=columns, 
> use_threads=use_threads,
>1013partitions=self.partitions,
> -> 1014
> use_pandas_metadata=use_pandas_metadata)
>1015 tables.append(table)
>1016 
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, partitions, open_file_func, file, use_pandas_metadata)
> 562 table = reader.read_row_group(self.row_group, **options)
> 563 else:
> --> 564 table = reader.read(**options)
> 565 
> 566 if len(self.partition_keys) > 0:
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
> 212 columns, use_pandas_metadata=use_pandas_metadata)
> 213 return self.reader.read_all(column_indices=column_indices,
> --> 214 use_threads=use_threads)
> 215 
> 216 def scan_contents(self, columns=None, batch_size=65536):
> ~\Anaconda3\lib\site-packages\pyarrow\_parquet.pyx in 
> pyarrow._parquet.ParquetReader.read_all()
> ~\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Column 1 named b expected length 932066 but got length 932063
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5613) [Rust] Fail to compile with unrecognized platform-specific intrinsic function

2019-06-25 Thread Paddy Horan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872839#comment-16872839
 ] 

Paddy Horan commented on ARROW-5613:


[~csun] is this still an issue?

packed_simd should support OSX 
([https://github.com/rust-lang-nursery/packed_simd#platform-support]). 
Could you open an issue there? I would do it myself, but I don't have a Mac 
to provide feedback to the maintainer; I'd just be the middle man.

> [Rust] Fail to compile with unrecognized platform-specific intrinsic function
> -
>
> Key: ARROW-5613
> URL: https://issues.apache.org/jira/browse/ARROW-5613
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Chao Sun
>Priority: Major
>
> I'm testing a project which depends on the Arrow crate. It failed with the 
> following error:
> {code}
> error[E0441]: unrecognized platform-specific intrinsic function: 
> `simd_bitmask`
>--> 
> /Users/sunchao/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/codegen/llvm.rs:100:5
> |
> 100 | crate fn simd_bitmask(value: T) -> U;
> | ^^^
> error: aborting due to previous error
> For more information about this error, try `rustc --explain E0441`.
> error: Could not compile `packed_simd`.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5734) [Python] Dispatch to Table.from_arrays from pyarrow.table factory function

2019-06-25 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5734:
---

 Summary: [Python] Dispatch to Table.from_arrays from pyarrow.table 
factory function
 Key: ARROW-5734
 URL: https://issues.apache.org/jira/browse/ARROW-5734
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 1.0.0


Follow-up work to ARROW-4847



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4847) [Python] Add pyarrow.table factory function that dispatches to various ctors based on type of input

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4847.
-
Resolution: Fixed

Issue resolved by pull request 4601
[https://github.com/apache/arrow/pull/4601]

> [Python] Add pyarrow.table factory function that dispatches to various ctors 
> based on type of input
> ---
>
> Key: ARROW-4847
> URL: https://issues.apache.org/jira/browse/ARROW-4847
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> For example, in {{pyarrow.table(df)}} if {{df}} is a {{pandas.DataFrame}}, 
> then table will dispatch to {{pa.Table.from_pandas}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5732) [C++] macOS builds failing idiosyncratically on master with warnings from pmmintrin.h

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872826#comment-16872826
 ] 

Wes McKinney commented on ARROW-5732:
-

Seems we are not alone; conda-forge is seeing related macOS failures:

https://gitter.im/conda-forge/conda-forge.github.io?at=5d12c0c92a120a0647bec9b9

> [C++] macOS builds failing idiosyncratically on master with warnings from 
> pmmintrin.h
> -
>
> Key: ARROW-5732
> URL: https://issues.apache.org/jira/browse/ARROW-5732
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> This just started happening today and doesn't seem to be related to code 
> changes
> https://travis-ci.org/apache/arrow/jobs/550459162#L3495



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5699) [C++] Optimize parsing of Decimal128 in CSV

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5699.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4669
[https://github.com/apache/arrow/pull/4669]

> [C++] Optimize parsing of Decimal128 in CSV
> ---
>
> Key: ARROW-5699
> URL: https://issues.apache.org/jira/browse/ARROW-5699
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> - Remove multiple string copies in Decimal.FromString()
> - Add unsafe append method to Decimal128Builder



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1279) [Integration][Java] Integration tests for Map type

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1279:

Summary: [Integration][Java] Integration tests for Map type  (was: 
Integration tests for Map type)

> [Integration][Java] Integration tests for Map type
> --
>
> Key: ARROW-1279
> URL: https://issues.apache.org/jira/browse/ARROW-1279
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Integration, Java
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5732) [C++] macOS builds failing idiosyncratically on master with warnings from pmmintrin.h

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872823#comment-16872823
 ] 

Wes McKinney commented on ARROW-5732:
-

Hypothetically the VMs could be missing sse4.2; we could run {{sysctl 
machdep.cpu.features}} to check.

> [C++] macOS builds failing idiosyncratically on master with warnings from 
> pmmintrin.h
> -
>
> Key: ARROW-5732
> URL: https://issues.apache.org/jira/browse/ARROW-5732
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> This just started happening today and doesn't seem to be related to code 
> changes
> https://travis-ci.org/apache/arrow/jobs/550459162#L3495



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5727) [Python] [CI] Install pytest-faulthandler before running tests

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5727:
---

Assignee: Antoine Pitrou

> [Python] [CI] Install pytest-faulthandler before running tests
> --
>
> Key: ARROW-5727
> URL: https://issues.apache.org/jira/browse/ARROW-5727
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Affects Versions: 0.13.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The `faulthandler` module is able to dump a Python stack trace when the 
> process crashes. This can make some CI failures more palatable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
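For reference, the stdlib `faulthandler` behaviour the pytest plugin builds on can be shown in a few lines (stdlib only; nothing pyarrow-specific is assumed):

```python
import faulthandler
import tempfile

# Enable the handler: a fatal signal (SIGSEGV, SIGFPE, SIGABRT, ...) will
# now dump the Python tracebacks of all threads to stderr before exit.
faulthandler.enable()

# The same dump can be triggered on demand; faulthandler writes through a
# real file descriptor, so use a temporary file rather than StringIO.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    out = f.read()

print(out)
```

In CI, the plugin arranges for this dump to fire when a test process crashes, which is what makes native segfaults in the C++ layer diagnosable from the build log.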


[jira] [Resolved] (ARROW-5727) [Python] [CI] Install pytest-faulthandler before running tests

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5727.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4692
[https://github.com/apache/arrow/pull/4692]

> [Python] [CI] Install pytest-faulthandler before running tests
> --
>
> Key: ARROW-5727
> URL: https://issues.apache.org/jira/browse/ARROW-5727
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Affects Versions: 0.13.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The `faulthandler` module is able to dump a Python stack trace when the 
> process crashes. This can make some CI failures more palatable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5728.
-
Resolution: Fixed

Issue resolved by pull request 4694
[https://github.com/apache/arrow/pull/4694]

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java, Python
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5733) [Python] Utilize common toolchain build scripts between manylinux1, manylinux2010

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5733:

Fix Version/s: 1.0.0

> [Python] Utilize common toolchain build scripts between manylinux1, 
> manylinux2010
> -
>
> Key: ARROW-5733
> URL: https://issues.apache.org/jira/browse/ARROW-5733
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> Reduce code duplication by maintaining common dependency build scripts



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5497) [Release] Build and publish R/Java/JS docs

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Priority: Major  (was: Blocker)

> [Release] Build and publish R/Java/JS docs
> --
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, Java, JavaScript, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> Edit: this ticket was originally just about adding the R package docs, but it 
> seems that the JS and Java docs aren't getting built as part of the release 
> process anymore, so that needs to be fixed.
>  
> Original description:
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5257) [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5257:
--
Labels: pull-request-available  (was: )

> [Website] Update site to use "official" Apache Arrow logo, add clearly marked 
> links to logo
> ---
>
> Key: ARROW-5257
> URL: https://issues.apache.org/jira/browse/ARROW-5257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
> Attachments: image-2019-05-03-16-19-05-041.png, 
> image-2019-05-08-13-15-53-694.png
>
>
> See logo at 
> https://docs.google.com/presentation/d/1qmvPpFU7sdm9l6A6LEyI0zIzswGtJW0Sbd_lfHLaXQs/edit#slide=id.g4258234456_0_1
> An unofficial logo lacking the "Apache" name has been making the rounds on the 
> internet, so I think it would be a good idea to update our web properties 
> with the approved logo as discussed on the mailing list
> Whoever does this task -- please make sure to compress the PNG asset of the 
> logo prior to checking in to source control



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5666) [Python] Underscores in partition (string) values are dropped when reading dataset

2019-06-25 Thread Julian de Ruiter (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872692#comment-16872692
 ] 

Julian de Ruiter commented on ARROW-5666:
-

Ideally you would want to preserve the data type of the partition columns, but 
that's going to be hard to do properly if you also want to support other types 
such as dates. Maybe it would be most consistent to just read partition 
columns as a categorical type and let the user handle the details, as this 
makes the code less magic than what happens now.

> [Python] Underscores in partition (string) values are dropped when reading 
> dataset
> --
>
> Key: ARROW-5666
> URL: https://issues.apache.org/jira/browse/ARROW-5666
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Julian de Ruiter
>Priority: Major
>  Labels: parquet
>
> When reading a partitioned dataset, in which the partition column contains 
> string values with underscores, pyarrow seems to be ignoring the underscores 
> in the resulting values.
> For example if I write and then read a dataset as follows:
> {code:java}
> import pyarrow as pa
> import pandas as pd
> df = pd.DataFrame({
>     "year_week": ["2019_2", "2019_3"],
>     "value": [1, 2]
> })
> table = pa.Table.from_pandas(df.head())
> pq.write_to_dataset(table, 'test', partition_cols=["year_week"])
> table2 = pq.ParquetDataset('test').read()
> {code}
> The resulting 'year_week' column in table 2 has lost the underscores:
> {code:java}
> table2[1] # Gives:
>  indices=int32, ordered=0>)>
> [
>   -- dictionary:
>     [
>   20192,
>   20193
>     ]
>   -- indices:
>     [
>   0
>     ],
>   -- dictionary:
>     [
>   20192,
>   20193
>     ]
>   -- indices:
>     [
>   1
>     ]
> ]
> {code}
> Is this intentional behaviour or is this a bug in arrow?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5257) [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo

2019-06-25 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872659#comment-16872659
 ] 

Neal Richardson commented on ARROW-5257:


Yeah, I'm planning to use white there and then put the black version on the 
powered-by page. Planning to do that today, actually. Sound good?

> [Website] Update site to use "official" Apache Arrow logo, add clearly marked 
> links to logo
> ---
>
> Key: ARROW-5257
> URL: https://issues.apache.org/jira/browse/ARROW-5257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
> Attachments: image-2019-05-03-16-19-05-041.png, 
> image-2019-05-08-13-15-53-694.png
>
>
> See logo at 
> https://docs.google.com/presentation/d/1qmvPpFU7sdm9l6A6LEyI0zIzswGtJW0Sbd_lfHLaXQs/edit#slide=id.g4258234456_0_1
> An unofficial logo lacking the "Apache" name has been making the rounds on the 
> internet, so I think it would be a good idea to update our web properties 
> with the approved logo as discussed on the mailing list
> Whoever does this task -- please make sure to compress the PNG asset of the 
> logo prior to checking in to source control



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5257) [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872656#comment-16872656
 ] 

Wes McKinney commented on ARROW-5257:
-

The white logo seems fine. Somewhere we need to give people an obvious path to 
obtaining the black official logo PNG asset, though

> [Website] Update site to use "official" Apache Arrow logo, add clearly marked 
> links to logo
> ---
>
> Key: ARROW-5257
> URL: https://issues.apache.org/jira/browse/ARROW-5257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
> Attachments: image-2019-05-03-16-19-05-041.png, 
> image-2019-05-08-13-15-53-694.png
>
>
> See logo at 
> https://docs.google.com/presentation/d/1qmvPpFU7sdm9l6A6LEyI0zIzswGtJW0Sbd_lfHLaXQs/edit#slide=id.g4258234456_0_1
> An unofficial logo lacking the "Apache" name has been making the rounds on the 
> internet, so I think it would be a good idea to update our web properties 
> with the approved logo as discussed on the mailing list
> Whoever does this task -- please make sure to compress the PNG asset of the 
> logo prior to checking in to source control



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5427) [Python] RangeIndex serialization change implications

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872654#comment-16872654
 ] 

Wes McKinney commented on ARROW-5427:
-

I can have a look at the patch

> [Python] RangeIndex serialization change implications
> -
>
> Key: ARROW-5427
> URL: https://issues.apache.org/jira/browse/ARROW-5427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Joris Van den Bossche
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 0.13, the conversion of a pandas DataFrame's RangeIndex changed: it is no 
> longer serialized as an actual column in the arrow table, but only saved as 
> metadata (in the pandas metadata) (ARROW-1639).
> This change led to a couple of issues:
> - It can sometimes be unpredictable in pandas when you have a RangeIndex and 
> when not. Which means that the resulting schema in arrow can be somewhat 
> unexpected. See ARROW-5104: empty DataFrame has RangeIndex or not depending 
> on how it was created
> - The metadata is not always enough (or not updated) to reconstruct it when 
> the table has been modified / subsetted.  
>   For example, ARROW-5138: retrieving a single row group from parquet file 
> doesn't restore index properly (since the RangeIndex metadata was for the 
> full table, not this subset)
>   And another one, ARROW-5139: empty column selection no longer restores 
> index.
> I think we should decide if we either want to try to fix those (or give an 
> option to avoid those issues), or either close those as "won't fix".
> One idea I had that could potentially alleviate some of those issues:
> - Make it possible for the user to still force actual serialization of the 
> index, always, even if it is a RangeIndex.
> - To not introduce a new option, we could reuse the {{preserve_index}} 
> keyword: change the default to None (which means the current behaviour), and 
> change {{True}} to mean "always serialize" (although this is not fully 
> backwards compatible with 0.13.0 for those users who explicitly specified the 
> keyword).
> I am not sure this is worth the added complexity (although I personally like 
> providing the option where the index is simply always serialized as columns, 
> without surprises). But ideally we decide on it for 0.14, to either fix or 
> close the mentioned issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-06-25 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-5630:
-

Assignee: Francois Saint-Jacques

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> This is pyarrow 0.13 on Windows.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> def make_table(num_rows):
> typ = pa.list_(pa.field("item", pa.float32(), False))
> return pa.Table.from_arrays([
> pa.array([[0] * (i%10) for i in range(0, num_rows)], type=typ),
> pa.array([[0] * ((i+5)%10) for i in range(0, num_rows)], type=typ)
> ], ['a', 'b'])
> pq.write_table(make_table(100), 'test.parquet')
> pq.read_table('test.parquet')
> {code}
> The last line throws the following exception:
> {noformat}
> ---
> ArrowInvalid  Traceback (most recent call last)
>  in 
> > 1 pq.read_table('full.parquet')
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read_table(source, 
> columns, use_threads, metadata, use_pandas_metadata, memory_map, filesystem)
>1150 return fs.read_parquet(path, columns=columns,
>1151use_threads=use_threads, 
> metadata=metadata,
> -> 1152
> use_pandas_metadata=use_pandas_metadata)
>1153 
>1154 pf = ParquetFile(source, metadata=metadata)
> ~\Anaconda3\lib\site-packages\pyarrow\filesystem.py in read_parquet(self, 
> path, columns, metadata, schema, use_threads, use_pandas_metadata)
> 179  filesystem=self)
> 180 return dataset.read(columns=columns, use_threads=use_threads,
> --> 181 use_pandas_metadata=use_pandas_metadata)
> 182 
> 183 def open(self, path, mode='rb'):
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
>1012 table = piece.read(columns=columns, 
> use_threads=use_threads,
>1013partitions=self.partitions,
> -> 1014
> use_pandas_metadata=use_pandas_metadata)
>1015 tables.append(table)
>1016 
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, partitions, open_file_func, file, use_pandas_metadata)
> 562 table = reader.read_row_group(self.row_group, **options)
> 563 else:
> --> 564 table = reader.read(**options)
> 565 
> 566 if len(self.partition_keys) > 0:
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, 
> use_threads, use_pandas_metadata)
> 212 columns, use_pandas_metadata=use_pandas_metadata)
> 213 return self.reader.read_all(column_indices=column_indices,
> --> 214 use_threads=use_threads)
> 215 
> 216 def scan_contents(self, columns=None, batch_size=65536):
> ~\Anaconda3\lib\site-packages\pyarrow\_parquet.pyx in 
> pyarrow._parquet.ParquetReader.read_all()
> ~\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Column 1 named b expected length 932066 but got length 932063
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5728:
--
Labels: pull-request-available  (was: )

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java, Python
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5731) [CI] Turbodbc integration tests are failing

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872642#comment-16872642
 ] 

Wes McKinney commented on ARROW-5731:
-

I agree that we should fix it. But I don't think we should block the release 
over it

> [CI] Turbodbc integration tests are failing 
> 
>
> Key: ARROW-5731
> URL: https://issues.apache.org/jira/browse/ARROW-5731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> Have not investigated yet, build: 
> https://circleci.com/gh/ursa-labs/crossbow/383
> cc [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872638#comment-16872638
 ] 

Wes McKinney commented on ARROW-5728:
-

I'll pin jpype until it can be diagnosed further

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java, Python
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5728:
---

Assignee: Wes McKinney

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java, Python
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5427) [Python] RangeIndex serialization change implications

2019-06-25 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872636#comment-16872636
 ] 

Francois Saint-Jacques commented on ARROW-5427:
---

I think this is related to the dask failure ARROW-5730 . Once I apply this 
patch, it reduces the failures from 9 to 1.

> [Python] RangeIndex serialization change implications
> -
>
> Key: ARROW-5427
> URL: https://issues.apache.org/jira/browse/ARROW-5427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Joris Van den Bossche
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 0.13, the conversion of a pandas DataFrame's RangeIndex changed: it is no 
> longer serialized as an actual column in the arrow table, but only saved as 
> metadata (in the pandas metadata) (ARROW-1639).
> This change led to a couple of issues:
> - It can sometimes be unpredictable in pandas when you have a RangeIndex and 
> when not. Which means that the resulting schema in arrow can be somewhat 
> unexpected. See ARROW-5104: empty DataFrame has RangeIndex or not depending 
> on how it was created
> - The metadata is not always enough (or not updated) to reconstruct it when 
> the table has been modified / subsetted.  
>   For example, ARROW-5138: retrieving a single row group from parquet file 
> doesn't restore index properly (since the RangeIndex metadata was for the 
> full table, not this subset)
>   And another one, ARROW-5139: empty column selection no longer restores 
> index.
> I think we should decide if we either want to try to fix those (or give an 
> option to avoid those issues), or either close those as "won't fix".
> One idea I had that could potentially alleviate some of those issues:
> - Make it possible for the user to still force actual serialization of the 
> index, always, even if it is a RangeIndex.
> - To not introduce a new option, we could reuse the {{preserve_index}} 
> keyword: change the default to None (which means the current behaviour), and 
> change {{True}} to mean "always serialize" (although this is not fully 
> backwards compatible with 0.13.0 for those users who explicitly specified the 
> keyword).
> I am not sure this is worth the added complexity (although I personally like 
> providing the option where the index is simply always serialized as columns, 
> without surprises). But ideally we decide on it for 0.14, to either fix or 
> close the mentioned issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5690) [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5690.
-
Resolution: Fixed

Issue resolved by pull request 4685
[https://github.com/apache/arrow/pull/4685]

> [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing
> -
>
> Key: ARROW-5690
> URL: https://issues.apache.org/jira/browse/ARROW-5690
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If I build macOS arrow wheels with crossbow from the latest master 
> (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing 
> pyarrow gives the following error message:
> {code:java}
> In [1]: import pyarrow
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
> <ipython-input-1-...> in <module>
> ----> 1 import pyarrow
> ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in <module>
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib
>   Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib
>   Reason: image not found{code}
>  





[jira] [Updated] (ARROW-4453) [Python] Create Cython wrappers for SparseTensor

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4453:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [Python] Create Cython wrappers for SparseTensor
> 
>
> Key: ARROW-4453
> URL: https://issues.apache.org/jira/browse/ARROW-4453
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Rok Mihevc
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We should have cython wrappers for [https://github.com/apache/arrow/pull/2546]
> This is related to support for 
> https://issues.apache.org/jira/browse/ARROW-4223 and 
> https://issues.apache.org/jira/browse/ARROW-4224
> I imagine the code would be similar to 
> https://github.com/apache/arrow/blob/5a502d281545402240e818d5fd97a9aaf36363f2/python/pyarrow/array.pxi#L748





[jira] [Updated] (ARROW-5723) [Gandiva][Crossbow] Builds failing

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5723:

Component/s: C++ - Gandiva

> [Gandiva][Crossbow] Builds failing
> --
>
> Key: ARROW-5723
> URL: https://issues.apache.org/jira/browse/ARROW-5723
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Latest builds are failing.





[jira] [Resolved] (ARROW-5723) [Gandiva][Crossbow] Builds failing

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5723.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4688
[https://github.com/apache/arrow/pull/4688]

> [Gandiva][Crossbow] Builds failing
> --
>
> Key: ARROW-5723
> URL: https://issues.apache.org/jira/browse/ARROW-5723
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Latest builds are failing.





[jira] [Resolved] (ARROW-5710) [C++] Allow compiling Gandiva with Ninja on Windows

2019-06-25 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-5710.
---
Resolution: Fixed

Issue resolved by pull request 4679
[https://github.com/apache/arrow/pull/4679]

> [C++] Allow compiling Gandiva with Ninja on Windows
> ---
>
> Key: ARROW-5710
> URL: https://issues.apache.org/jira/browse/ARROW-5710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Affects Versions: 0.13.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-1279) Integration tests for Map type

2019-06-25 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872595#comment-16872595
 ] 

Bryan Cutler commented on ARROW-1279:
-

Moved to 0.14.0 as it would be good to get this into the release, since we have 
already added MapType for C++ and Java. It is not a blocker, though.

> Integration tests for Map type
> --
>
> Key: ARROW-1279
> URL: https://issues.apache.org/jira/browse/ARROW-1279
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Integration, Java
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-1279) Integration tests for Map type

2019-06-25 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-1279:

Fix Version/s: (was: 1.0.0)
   0.14.0

> Integration tests for Map type
> --
>
> Key: ARROW-1279
> URL: https://issues.apache.org/jira/browse/ARROW-1279
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Integration, Java
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-5695) [C#][Release] Run sourcelink test in verify-release-candidate.sh

2019-06-25 Thread Yosuke Shiro (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872593#comment-16872593
 ] 

Yosuke Shiro commented on ARROW-5695:
-

[~kou]

I'll add `test_package_csharp()` to verify-release-candidate.sh.
Is my understanding correct?

 
{code:java}
test_package_csharp() {
  pushd csharp

  dotnet test
  mv dummy.git ../.git
  dotnet pack -c Release
  mv ../.git dummy.git
  ~/.dotnet/tools/sourcelink test artifacts/Apache.Arrow/Release/netstandard1.3/Apache.Arrow.pdb
  ~/.dotnet/tools/sourcelink test artifacts/Apache.Arrow/Release/netcoreapp2.1/Apache.Arrow.pdb

  popd
}
{code}
 

 

> [C#][Release] Run sourcelink test in verify-release-candidate.sh
> 
>
> Key: ARROW-5695
> URL: https://issues.apache.org/jira/browse/ARROW-5695
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Follow up of 
> https://github.com/apache/arrow/pull/4488#pullrequestreview-253110667.





[jira] [Resolved] (ARROW-5492) [R] Add "col_select" argument to read_* functions to read subset of columns

2019-06-25 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5492.
---
Resolution: Fixed

Issue resolved by pull request 4627
[https://github.com/apache/arrow/pull/4627]

> [R] Add "col_select" argument to read_* functions to read subset of columns 
> 
>
> Key: ARROW-5492
> URL: https://issues.apache.org/jira/browse/ARROW-5492
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> read_feather, read_parquet, read_csv_arrow (and read_json, when it exists) 
> should take a `col_select` argument, following the model of 
> [vroom|http://vroom.r-lib.org/articles/vroom.html#column-selection] (readr 
> and base R file readers also support this feature, just much more awkwardly).
> Currently, read_feather has a "columns" argument and none of the other 
> readers expose it. Parquet can certainly support it; cf. 
> {{pyarrow.parquet.read_table.}}
>  





[jira] [Commented] (ARROW-5730) [CI] Dask integration tests are failing

2019-06-25 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872580#comment-16872580
 ] 

Francois Saint-Jacques commented on ARROW-5730:
---

Locally I'm getting:
{code:bash}
_______________________________ ERROR collecting test session _______________________________
opt/conda/lib/python3.6/site-packages/_pytest/config/__init__.py:440: in _importconftest
    return self._conftestpath2mod[conftestpath]
E   KeyError: local('/opt/conda/lib/python3.6/site-packages/pyarrow/tests/conftest.py')

During handling of the above exception, another exception occurred:
opt/conda/lib/python3.6/site-packages/_pytest/config/__init__.py:446: in _importconftest
    mod = conftestpath.pyimport()
opt/conda/lib/python3.6/site-packages/py/_path/local.py:701: in pyimport
    __import__(modname)
opt/conda/lib/python3.6/site-packages/pyarrow/__init__.py:49: in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
E   ImportError: /opt/conda/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5arrow2py14InferArrowTypeEP7_objectbPSt10shared_ptrINS_8DataTypeEE
{code}
 

 

> [CI] Dask integration tests are failing
> ---
>
> Key: ARROW-5730
> URL: https://issues.apache.org/jira/browse/ARROW-5730
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> Have not investigated yet, build: 
> https://circleci.com/gh/ursa-labs/crossbow/387





[jira] [Updated] (ARROW-5731) [CI] Turbodbc integration tests are failing

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5731:

Fix Version/s: (was: 0.14.0)
   1.0.0

> [CI] Turbodbc integration tests are failing 
> 
>
> Key: ARROW-5731
> URL: https://issues.apache.org/jira/browse/ARROW-5731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> Have not investigated yet, build: 
> https://circleci.com/gh/ursa-labs/crossbow/383
> cc [~xhochy]





[jira] [Commented] (ARROW-5731) [CI] Turbodbc integration tests are failing

2019-06-25 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872570#comment-16872570
 ] 

Wes McKinney commented on ARROW-5731:
-

Non-critical for 0.14.0

> [CI] Turbodbc integration tests are failing 
> 
>
> Key: ARROW-5731
> URL: https://issues.apache.org/jira/browse/ARROW-5731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 1.0.0
>
>
> Have not investigated yet, build: 
> https://circleci.com/gh/ursa-labs/crossbow/383
> cc [~xhochy]





[jira] [Resolved] (ARROW-5166) [Python][Parquet] Statistics for uint64 columns may overflow

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5166.
-
Resolution: Fixed

Fixed in 
https://github.com/apache/arrow/commit/74841f5bfcfdeab7d90fda9e469a92aef02a23f3

> [Python][Parquet] Statistics for uint64 columns may overflow
> 
>
> Key: ARROW-5166
> URL: https://issues.apache.org/jira/browse/ARROW-5166
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: python 3.6
> pyarrow 0.13.0
>Reporter: Marco Neumann
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
> Attachments: int64_statistics_overflow.parquet
>
>
> See the attached parquet file, where the statistics max value is smaller than 
> the min value.
> You can roundtrip that file through pandas and store it back to provoke the 
> same bug.





[jira] [Resolved] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set

2019-06-25 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4139.
-
Resolution: Fixed

Issue resolved by pull request 4680
[https://github.com/apache/arrow/pull/4680]

> [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is 
> set
> ---
>
> Key: ARROW-4139
> URL: https://issues.apache.org/jira/browse/ARROW-4139
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Matthew Rocklin
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: parquet, pull-request-available, python
> Fix For: 0.14.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When writing Pandas data to Parquet format and reading it back again I find 
> that that statistics of text columns are stored as byte arrays rather than as 
> unicode text. 
> I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding 
> of how best to manage statistics.  (I'd be quite happy to learn that it was 
> the latter).
> Here is a minimal example
> {code:python}
> import pandas as pd
> df = pd.DataFrame({'x': ['a']})
> df.to_parquet('df.parquet')
> import pyarrow.parquet as pq
> pf = pq.ParquetDataset('df.parquet')
> piece = pf.pieces[0]
> rg = piece.row_group(0)
> md = piece.get_metadata(pq.ParquetFile)
> rg = md.row_group(0)
> c = rg.column(0)
> >>> c
> 
>   file_offset: 63
>   file_path: 
>   physical_type: BYTE_ARRAY
>   num_values: 1
>   path_in_schema: x
>   is_stats_set: True
>   statistics:
> 
>   has_min_max: True
>   min: b'a'
>   max: b'a'
>   null_count: 0
>   distinct_count: 0
>   num_values: 1
>   physical_type: BYTE_ARRAY
>   compression: SNAPPY
>   encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
>   has_dictionary_page: True
>   dictionary_page_offset: 4
>   data_page_offset: 25
>   total_compressed_size: 59
>   total_uncompressed_size: 55
> >>> type(c.statistics.min)
> bytes
> {code}
> My guess is that we would want to store a logical type in the statistics like 
> UNICODE, though I don't have enough experience with Parquet data types to 
> know if this is a good idea or possible.
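The direction the report points toward can be sketched in plain Python. This is a hypothetical `decode_stat` helper for illustration only, not part of pyarrow's actual API: when a BYTE_ARRAY column's ConvertedType is UTF8, its raw byte statistics are known to be UTF-8 text and can be decoded before being exposed to the user.

```python
def decode_stat(value, converted_type):
    """Decode a raw Parquet BYTE_ARRAY statistic to text when the column's
    ConvertedType marks it as UTF8; otherwise return it unchanged.

    Hypothetical helper for illustration -- not pyarrow's real API.
    """
    if converted_type == "UTF8" and isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    return value


print(decode_stat(b"a", "UTF8"))       # the text column's min/max become str
print(decode_stat(b"\x00\x01", None))  # non-UTF8 byte statistics stay as bytes
```

This mirrors what the linked pull request does on the C++/Python side: consult the column's logical/converted type before surfacing `min`/`max`.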





[jira] [Assigned] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5500:
--

Assignee: Neal Richardson

> [R] read_csv_arrow() signature should match readr::read_csv()
> -
>
> Key: ARROW-5500
> URL: https://issues.apache.org/jira/browse/ARROW-5500
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> So that using it is natural for R users. Internally handle all of the logic 
> needed to map those onto csv_convert_options, csv_read_options, and 
> csv_parse_options. And give a useful error message if a user requests a 
> setting that readr supports but arrow does not.





[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()

2019-06-25 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872497#comment-16872497
 ] 

Neal Richardson commented on ARROW-5718:


Makes sense to me.

> [R] Add as_record_batch()
> -
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Minor
> Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
>  changed the API of `record_batch()` and `arrow::table()` such that you could 
> no longer pass in a data.frame to the function, not without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
>  That broke sparklyr integration tests with an opaque `cannot infer type from 
> data` error, and it's unfortunate that there's no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
> In order to follow best practices (cf. the 
> [tibble|https://tibble.tidyverse.org/] package, for example), we should (1) 
> add an {{as_record_batch}} function, which the data.frame method is probably 
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and 
> (2) if a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that we should automatically call as_record_batch(), but in 
> case that is too magical and prevents some legitimate use case, let's hold 
> off for now. It's easier to add magic than remove it.
> Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
> and if that function doesn't exist, fall back to {{record_batch}} (because 
> that means it has an older released version of arrow that doesn't have 
> as_record_batch, so record_batch(df) should work).
> cc [~javierluraschi]





[jira] [Assigned] (ARROW-5718) [R] Add as_record_batch()

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5718:
--

Assignee: Romain François

> [R] Add as_record_batch()
> -
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Minor
> Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
>  changed the API of `record_batch()` and `arrow::table()` such that you could 
> no longer pass in a data.frame to the function, not without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
>  That broke sparklyr integration tests with an opaque `cannot infer type from 
> data` error, and it's unfortunate that there's no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
> In order to follow best practices (cf. the 
> [tibble|https://tibble.tidyverse.org/] package, for example), we should (1) 
> add an {{as_record_batch}} function, which the data.frame method is probably 
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and 
> (2) if a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that we should automatically call as_record_batch(), but in 
> case that is too magical and prevents some legitimate use case, let's hold 
> off for now. It's easier to add magic than remove it.
> Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
> and if that function doesn't exist, fall back to {{record_batch}} (because 
> that means it has an older released version of arrow that doesn't have 
> as_record_batch, so record_batch(df) should work).
> cc [~javierluraschi]





[jira] [Created] (ARROW-5731) [CI] Turbodbc integration tests are failing

2019-06-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5731:
--

 Summary: [CI] Turbodbc integration tests are failing 
 Key: ARROW-5731
 URL: https://issues.apache.org/jira/browse/ARROW-5731
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Krisztian Szucs
 Fix For: 0.14.0


Have not investigated yet, build: https://circleci.com/gh/ursa-labs/crossbow/383

cc [~xhochy]





[jira] [Created] (ARROW-5730) [CI] Dask integration tests are failing

2019-06-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5730:
--

 Summary: [CI] Dask integration tests are failing
 Key: ARROW-5730
 URL: https://issues.apache.org/jira/browse/ARROW-5730
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Krisztian Szucs
 Fix For: 0.14.0


Have not investigated yet, build: https://circleci.com/gh/ursa-labs/crossbow/387





[jira] [Assigned] (ARROW-5724) [R] [CI] AppVeyor build should use ccache

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5724:
--

Assignee: Neal Richardson

> [R] [CI] AppVeyor build should use ccache
> -
>
> Key: ARROW-5724
> URL: https://issues.apache.org/jira/browse/ARROW-5724
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Antoine Pitrou
>Assignee: Neal Richardson
>Priority: Major
>
> It looks like ccache is not installed for the R AppVeyor build. [~npr] [~kou]





[jira] [Resolved] (ARROW-5720) [C++] Create benchmarks for decimal related classes.

2019-06-25 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-5720.

Resolution: Duplicate

Will handle as part of the parent issue.

> [C++] Create benchmarks for decimal related classes.
> 
>
> Key: ARROW-5720
> URL: https://issues.apache.org/jira/browse/ARROW-5720
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5727) [Python] [CI] Install pytest-faulthandler before running tests

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5727:
--
Labels: pull-request-available  (was: )

> [Python] [CI] Install pytest-faulthandler before running tests
> --
>
> Key: ARROW-5727
> URL: https://issues.apache.org/jira/browse/ARROW-5727
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Affects Versions: 0.13.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The `faulthandler` module is able to dump a Python stack trace when the 
> process crashes. This can make some CI failures easier to diagnose.





[jira] [Closed] (ARROW-5729) [Python][Java] ArrowType.Int object has no attribute 'isSigned'

2019-06-25 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-5729.
-
Resolution: Duplicate

> [Python][Java] ArrowType.Int object has no attribute 'isSigned'
> ---
>
> Key: ARROW-5729
> URL: https://issues.apache.org/jira/browse/ARROW-5729
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Python
>Reporter: Krisztian Szucs
>Priority: Blocker
> Fix For: 0.14.0
>
>
> This failure has recently occurred on master and on other PRs: 
> https://travis-ci.org/apache/arrow/jobs/550245616
> Seemingly there aren't any related changes that could cause it.





[jira] [Updated] (ARROW-5714) [JS] Inconsistent behavior in Int64Builder with/without BigNum

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5714:
--
Labels: pull-request-available  (was: )

> [JS] Inconsistent behavior in Int64Builder with/without BigNum
> --
>
> Key: ARROW-5714
> URL: https://issues.apache.org/jira/browse/ARROW-5714
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When the Int64Builder is used in a context without BigNum, appending two 
> numbers combines them into a single Int64:
> {code}
> > v = Arrow.Builder.new({type: new 
> > Arrow.Int64()}).append(1).append(2).finish().toVector()
> > v.get(0)
> Int32Array [ 1, 2 ]
> {code}
> Whereas the same process with BigNum creates two new Int64s.





[jira] [Created] (ARROW-5729) [Python][Java] ArrowType.Int object has no attribute 'isSigned'

2019-06-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5729:
--

 Summary: [Python][Java] ArrowType.Int object has no attribute 
'isSigned'
 Key: ARROW-5729
 URL: https://issues.apache.org/jira/browse/ARROW-5729
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java, Python
Reporter: Krisztian Szucs
 Fix For: 0.14.0


This failure has recently occurred on master and on other PRs: 
https://travis-ci.org/apache/arrow/jobs/550245616
Seemingly there aren't any related changes that could cause it.





[jira] [Commented] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872408#comment-16872408
 ] 

Antoine Pitrou commented on ARROW-5728:
---

It seems this started with the 0.6.3 -> 0.7.0 JPype1 upgrade.

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java, Python
>Reporter: Antoine Pitrou
>Priority: Blocker
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]





[jira] [Updated] (ARROW-5728) [Python] [CI] Travis-CI failures in test_jvm.py

2019-06-25 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5728:
--
Summary: [Python] [CI] Travis-CI failures in test_jvm.py  (was: [Python] 
[Packaging] manylinux1 failures in test_jvm.py)

> [Python] [CI] Travis-CI failures in test_jvm.py
> ---
>
> Key: ARROW-5728
> URL: https://issues.apache.org/jira/browse/ARROW-5728
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Packaging, Python
>Reporter: Antoine Pitrou
>Priority: Blocker
>
> See https://travis-ci.org/apache/arrow/jobs/550245616
> [~xhochy]





[jira] [Resolved] (ARROW-2461) [Python] Build wheels for manylinux2010 tag

2019-06-25 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-2461.

   Resolution: Fixed
Fix Version/s: (was: 0.15.0)
   0.14.0

Issue resolved by pull request 4675
[https://github.com/apache/arrow/pull/4675]

> [Python] Build wheels for manylinux2010 tag
> ---
>
> Key: ARROW-2461
> URL: https://issues.apache.org/jira/browse/ARROW-2461
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Antoine Pitrou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> There is now work in progress on an updated manylinux tag based on CentOS6. 
> We should provide wheels for this tag and the old {{manylinux1}} tag for one 
> release and then switch to the new tag in the release afterwards. This should 
> also enable us to raise the minimum compiler requirement to gcc 4.9 (or 
> higher once conda-forge has migrated to a newer compiler).
> The relevant PEP is https://www.python.org/dev/peps/pep-0571/





[jira] [Created] (ARROW-5728) [Python] [Packaging] manylinux1 failures in test_jvm.py

2019-06-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5728:
-

 Summary: [Python] [Packaging] manylinux1 failures in test_jvm.py
 Key: ARROW-5728
 URL: https://issues.apache.org/jira/browse/ARROW-5728
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java, Packaging, Python
Reporter: Antoine Pitrou


See https://travis-ci.org/apache/arrow/jobs/550245616

[~xhochy]





[jira] [Created] (ARROW-5727) [Python] [CI] Install pytest-faulthandler before running tests

2019-06-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5727:
-

 Summary: [Python] [CI] Install pytest-faulthandler before running 
tests
 Key: ARROW-5727
 URL: https://issues.apache.org/jira/browse/ARROW-5727
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Affects Versions: 0.13.0
Reporter: Antoine Pitrou


The `faulthandler` module is able to dump a Python stack trace when the process 
crashes. This can make some CI failures more palatable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5328) [R] Add shell scripts to do a full package rebuild and test locally

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-5328.

Resolution: Fixed

> [R] Add shell scripts to do a full package rebuild and test locally
> ---
>
> Key: ARROW-5328
> URL: https://issues.apache.org/jira/browse/ARROW-5328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: release
> Fix For: 1.0.0
>
>
> The R package development instructions in 
> https://github.com/apache/arrow/blob/master/r/README.Rmd expect that the 
> developer is working in a particular R-console-centric way, perhaps within 
> RStudio or similar. I think we should have scripts that enable development to 
> be performed entirely on the command line. This would probably already exist, 
> except that our Travis-CI setup is relying on some non-Arrow-specific scripts 
> that live outside of this repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5328) [R] Add shell scripts to do a full package rebuild and test locally

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5328:
---
Fix Version/s: (was: 1.0.0)
   0.14.0

> [R] Add shell scripts to do a full package rebuild and test locally
> ---
>
> Key: ARROW-5328
> URL: https://issues.apache.org/jira/browse/ARROW-5328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: release
> Fix For: 0.14.0
>
>
> The R package development instructions in 
> https://github.com/apache/arrow/blob/master/r/README.Rmd expect that the 
> developer is working in a particular R-console-centric way, perhaps within 
> RStudio or similar. I think we should have scripts that enable development to 
> be performed entirely on the command line. This would probably already exist, 
> except that our Travis-CI setup is relying on some non-Arrow-specific scripts 
> that live outside of this repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (ARROW-5328) [R] Add shell scripts to do a full package rebuild and test locally

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reopened ARROW-5328:


> [R] Add shell scripts to do a full package rebuild and test locally
> ---
>
> Key: ARROW-5328
> URL: https://issues.apache.org/jira/browse/ARROW-5328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: release
> Fix For: 1.0.0
>
>
> The R package development instructions in 
> https://github.com/apache/arrow/blob/master/r/README.Rmd expect that the 
> developer is working in a particular R-console-centric way, perhaps within 
> RStudio or similar. I think we should have scripts that enable development to 
> be performed entirely on the command line. This would probably already exist, 
> except that our Travis-CI setup is relying on some non-Arrow-specific scripts 
> that live outside of this repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5328) [R] Add shell scripts to do a full package rebuild and test locally

2019-06-25 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-5328.

Resolution: Fixed

> [R] Add shell scripts to do a full package rebuild and test locally
> ---
>
> Key: ARROW-5328
> URL: https://issues.apache.org/jira/browse/ARROW-5328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
>  Labels: release
> Fix For: 0.14.0
>
>
> The R package development instructions in 
> https://github.com/apache/arrow/blob/master/r/README.Rmd expect that the 
> developer is working in a particular R-console-centric way, perhaps within 
> RStudio or similar. I think we should have scripts that enable development to 
> be performed entirely on the command line. This would probably already exist, 
> except that our Travis-CI setup is relying on some non-Arrow-specific scripts 
> that live outside of this repository. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3811) [R] struct arrays inference

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3811:
--
Labels: pull-request-available  (was: )

> [R] struct arrays inference
> ---
>
> Key: ARROW-3811
> URL: https://issues.apache.org/jira/browse/ARROW-3811
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5726) [Java] Implement a common interface for int vectors

2019-06-25 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5726:
-

 Summary: [Java] Implement a common interface for int vectors
 Key: ARROW-5726
 URL: https://issues.apache.org/jira/browse/ARROW-5726
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Now in _DictionaryEncoder#encode_, reflection is used to pull out the set method and then set values.

Setting values via reflection is not efficient, and the code structure is not elegant:

{code:java}
Method setter = null;
for (Class c : Arrays.asList(int.class, long.class)) {
  try {
    setter = indices.getClass().getMethod("setSafe", int.class, c);
    break;
  } catch (NoSuchMethodException e) {
    // ignore
  }
}
{code}

Implementing a common interface for int vectors, so that the set method can be invoked directly, seems a better choice.
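For illustration, here is a minimal sketch of what such a common interface might look like (the names IntSettable, setSafeAny, and ToyIntVector are hypothetical, not the actual Arrow API; real Arrow vectors are backed by off-heap buffers rather than Java arrays):

```java
// Hypothetical common interface for integer-typed vectors, so that
// DictionaryEncoder could write index values without reflection.
interface IntSettable {
    // A long parameter covers both int- and long-backed vectors.
    void setSafeAny(int index, long value);
}

// Toy int-backed implementation standing in for an Arrow IntVector.
final class ToyIntVector implements IntSettable {
    private final int[] data;

    ToyIntVector(int capacity) {
        data = new int[capacity];
    }

    @Override
    public void setSafeAny(int index, long value) {
        data[index] = (int) value; // narrowing, as setSafe(int, int) would
    }

    int get(int index) {
        return data[index];
    }
}

class EncoderSketch {
    // The encoder writes dictionary indices through the interface directly,
    // replacing the getMethod("setSafe", ...) reflection lookup above.
    static void writeIndex(IntSettable indices, int position, long dictIndex) {
        indices.setSafeAny(position, dictIndex);
    }

    public static void main(String[] args) {
        ToyIntVector indices = new ToyIntVector(4);
        writeIndex(indices, 0, 42L);
        System.out.println(indices.get(0)); // prints 42
    }
}
```

The point of the sketch is that the encoder depends only on the interface, so each concrete vector type supplies its own non-reflective setter.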



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5725) [Crossbow] Port conda recipes to azure pipelines

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5725:
--
Labels: pull-request-available  (was: )

> [Crossbow] Port conda recipes to azure pipelines 
> -
>
> Key: ARROW-5725
> URL: https://issues.apache.org/jira/browse/ARROW-5725
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Conda forge builds stopped working. CF is transitioning toward azure 
> pipelines so port the conda crossbow builds to azure as well, and update the 
> recipes (including gandiva).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5653) [CI] Fix cpp docker image

2019-06-25 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-5653.
---
Resolution: Duplicate

> [CI] Fix cpp docker image
> -
>
> Key: ARROW-5653
> URL: https://issues.apache.org/jira/browse/ARROW-5653
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:bash}
> make -f Makefile.docker run-cpp
> ...
> 54/64 Test #79: arrow-dataset-file_test ***Failed0.04 sec
> Running arrow-dataset-file_test, redirecting output into 
> /build/cpp/build/test-logs/arrow-dataset-file_test.txt (attempt 1/1)
> /build/cpp/debug/arrow-dataset-file_test: error while loading shared 
> libraries: libbrotlienc.so.1: cannot open shared object file: No such file or 
> directory
> /build/cpp/src/arrow/dataset
>   Start 80: arrow-flight-test
> 55/64 Test #80: arrow-flight-test ..***Failed0.04 sec
> Running arrow-flight-test, redirecting output into 
> /build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1)
> /build/cpp/debug/arrow-flight-t
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3811) [R] struct arrays inference

2019-06-25 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François reassigned ARROW-3811:
--

Assignee: Romain François

> [R] struct arrays inference
> ---
>
> Key: ARROW-3811
> URL: https://issues.apache.org/jira/browse/ARROW-3811
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2136) [Python] Non-nullable schema fields not checked in conversions from pandas

2019-06-25 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-2136.

Resolution: Fixed

Issue resolved by pull request 4683
[https://github.com/apache/arrow/pull/4683]

> [Python] Non-nullable schema fields not checked in conversions from pandas
> --
>
> Key: ARROW-2136
> URL: https://issues.apache.org/jira/browse/ARROW-2136
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Matthew Gilbert
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If you provide a schema with {{nullable=False}} but pass a {{DataFrame}} 
> which in fact has nulls it appears the schema is ignored? I would expect an 
> error here.
> {code}
> import pyarrow as pa
> import pandas as pd
> df = pd.DataFrame({"a":[1.2, 2.1, pd.np.NaN]})
> schema = pa.schema([pa.field("a", pa.float64(), nullable=False)])
> table = pa.Table.from_pandas(df, schema=schema)
> table[0]
> 
> chunk 0: 
> [
>   1.2,
>   2.1,
>   NA
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5718) [R] Add as_record_batch()

2019-06-25 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872287#comment-16872287
 ] 

Romain François commented on ARROW-5718:


I think it's fine if we were to auto splice, i.e.: 

{code:r}
record_batch(mtcars)
{code}

would be the same as 

{code:r}
record_batch(!!!mtcars)
{code}

because the argument is unnamed. This is the direction we'll take in dplyr too, e.g. for summarise and mutate.

However, something like : 

{code:r}
record_batch(x = mtcars)
{code}

will create a struct array, aka a data frame column. 


> [R] Add as_record_batch()
> -
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Minor
> Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
>  changed the API of `record_batch()` and `arrow::table()` such that you could 
> no longer pass in a data.frame to the function, not without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
>  That broke sparklyr integration tests with an opaque `cannot infer type from 
> data` error, and it's unfortunate that there's no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
> In order to follow best practices (cf. the 
> [tibble|https://tibble.tidyverse.org/] package, for example), we should (1) 
> add an {{as_record_batch}} function, which the data.frame method is probably 
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and 
> (2) if a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that we should automatically call as_record_batch(), but in 
> case that is too magical and prevents some legitimate use case, let's hold 
> off for now. It's easier to add magic than remove it.
> Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
> and if that function doesn't exist, fall back to {{record_batch}} (because 
> that means it has an older released version of arrow that doesn't have 
> as_record_batch, so record_batch(df) should work).
> cc [~javierluraschi]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5721) [Rust] Move array related code into a separate module

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5721:
--
Labels: pull-request-available  (was: )

> [Rust] Move array related code into a separate module
> -
>
> Key: ARROW-5721
> URL: https://issues.apache.org/jira/browse/ARROW-5721
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>
> We should move all array related code into a separate module {{array}}, and 
> re-export public interfaces. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5725) [Crossbow] Port conda recipes to azure pipelines

2019-06-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5725:
--

 Summary: [Crossbow] Port conda recipes to azure pipelines 
 Key: ARROW-5725
 URL: https://issues.apache.org/jira/browse/ARROW-5725
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.14.0


Conda forge builds stopped working. CF is transitioning toward azure pipelines 
so port the conda crossbow builds to azure as well, and update the recipes 
(including gandiva).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5724) [R] [CI] AppVeyor build should use ccache

2019-06-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5724:
-

 Summary: [R] [CI] AppVeyor build should use ccache
 Key: ARROW-5724
 URL: https://issues.apache.org/jira/browse/ARROW-5724
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Antoine Pitrou


It looks like ccache is not installed for the R AppVeyor build. [~npr] [~kou]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2298) [Python] Add option to not consider NaN to be null when converting to an integer Arrow type

2019-06-25 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-2298.

Resolution: Fixed

Issue resolved by pull request 4682
[https://github.com/apache/arrow/pull/4682]

> [Python] Add option to not consider NaN to be null when converting to an 
> integer Arrow type
> ---
>
> Key: ARROW-2298
> URL: https://issues.apache.org/jira/browse/ARROW-2298
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Follow-on work to ARROW-2135



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-25 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872219#comment-16872219
 ] 

Liya Fan commented on ARROW-5658:
-

[~liaotian1005] we have found the reason for the failure.

It is caused by an inconsistent schema in the VectorSchemaRoot.

In particular, the schema looks like this at the beginning:

!image-2019-06-25-17-58-09-038.png!

As new data are inserted into the vector, the vector structure changes to something like this:

!image-2019-06-25-17-59-07-352.png!

The change is due to a mechanism called writer promotion. For details, please see the PromotableWriter class.

On the client side, the new vector structure is sent along with the obsolete schema. On the server side, the code tries to decode the vector structure with the out-of-date schema, so it fails without printing any output.

Our solution is to provide a method in VectorSchemaRoot that brings the schema back in sync with the vector structure. To fix the problem in your code, simply call this method after the data is inserted. Please see the modified code here: [^ClientStart.java]
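The failure mode can be modeled with a toy sketch in plain Java (no Arrow dependency; all class and method names below are invented for illustration): the root caches the schema computed at construction time, so a later writer promotion in the vector leaves the cached copy stale until it is explicitly re-synced.

```java
// Toy vector whose declared type can be promoted as data is written,
// mimicking Arrow's PromotableWriter behavior.
class ToyVector {
    String type = "Int"; // starts life as an int vector

    void writeVarChar(String s) {
        // Writing an incompatible value promotes the vector to a union type.
        type = "Union(Int, Utf8)";
    }
}

// Toy root that caches its schema when constructed, like a VectorSchemaRoot
// holding on to the schema it was created with.
class ToyRoot {
    final ToyVector vector;
    String cachedSchema;

    ToyRoot(ToyVector vector) {
        this.vector = vector;
        this.cachedSchema = computeSchema();
    }

    String computeSchema() {
        return "Schema(" + vector.type + ")";
    }

    // The proposed fix: recompute the cached schema from the live vectors.
    void syncSchema() {
        cachedSchema = computeSchema();
    }

    boolean schemaInSync() {
        return cachedSchema.equals(computeSchema());
    }
}

class SchemaDriftDemo {
    public static void main(String[] args) {
        ToyVector v = new ToyVector();
        ToyRoot root = new ToyRoot(v);
        v.writeVarChar("abc");           // writer promotion changes the type
        System.out.println(root.schemaInSync()); // false: stale schema would be sent
        root.syncSchema();               // call after inserting data
        System.out.println(root.schemaInSync()); // true
    }
}
```

In the toy model, as in the bug report, the server only ever sees `cachedSchema`; syncing after all writes is what makes the transmitted schema match the transmitted buffers.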

 

 

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: ClientStart.java, ClientStart.java, ServerStart.java, 
> image-2019-06-25-17-58-09-038.png, image-2019-06-25-17-59-07-352.png, pom.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I can't transfer ListVector data using Apache Arrow Flight. The 
> problem description is as follows:
> {quote} # I parse an xml file, convert it to the arrow format, and finally 
> convert it to the parquet data format. The xml data comes from 
> [http://www.w3school.com.cn/example/xmle/cd_catalog.xml]
>  # I created a schema that uses ListVector.
> The code is as follows:
> List list = 
> childrenBuilder.add(ListVector.empty(column.getId().toString(), allocator));
> VectorSchemaRoot root = VectorSchemaRoot.of(inVector)
>  # Parse the xml file to get the list data in "cd", using the ListVector API:
> ListVector listVector = (ListVector) valueVectors;
> List columns = column.getColumns();
> Column column1 = columns.get(0);
> String name = column1.getId().toString();
> UnionListWriter writer = listVector.getWriter();
> writer.allocate();
> for (int j = 0; j < column1.getColumns().size(); j++) {
>     writer.setPosition(j);
>     writer.startList();
>     writer.list().startList();
>     Column column2 = column1.getColumns().get(j);
>     List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
>     for (int k = 0; k < lst.size(); k++) {
>         Map<String, String> stringStringMap = lst.get(k);
>         String value = stringStringMap.get(column2.getId().toString());
>         switch (column2.getType()) {
>             case FLOAT:
>                 writer.list().float4().writeFloat4(stringConvertFloat(value));
>                 break;
>             case BOOLEAN:
>                 writer.list().bit().writeBit(stringConvertBoolean(value));
>                 break;
>             case DECIMAL:
>                 writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
>                 break;
>             case TIMESTAMP:
>                 writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
>                 break;
>             case INTEGER:
>             case BIGINT:
>                 writer.list().bigInt().writeBigInt(stringConvertLong(value));
>                 break;
>             case VARCHAR:
>                 VarCharHolder varBinaryHolder = new VarCharHolder();
>                 varBinaryHolder.start = 0;
>                 byte[] bytes = value.getBytes();
>                 ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
>                 varBinaryHolder.buffer 

[jira] [Updated] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-25 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-5658:

Attachment: ClientStart.java

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: ClientStart.java, ClientStart.java, ServerStart.java, 
> image-2019-06-25-17-58-09-038.png, image-2019-06-25-17-59-07-352.png, pom.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-25 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-5658:

Attachment: image-2019-06-25-17-59-07-352.png

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: ClientStart.java, ServerStart.java, 
> image-2019-06-25-17-58-09-038.png, image-2019-06-25-17-59-07-352.png, pom.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-25 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-5658:

Attachment: image-2019-06-25-17-58-09-038.png

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: ClientStart.java, ServerStart.java, 
> image-2019-06-25-17-58-09-038.png, pom.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I can't transfer ListVector data using Apache Arrow Flight. The 
> problem description is as follows:
> {quote} # I parse an xml file, convert it to the arrow format, and finally 
> convert it to the parquet data format. The xml data comes from 
> [http://www.w3school.com.cn/example/xmle/cd_catalog.xml]
>  # I created a schema that uses ListVector.
> The code is as follows:
> List list = 
> childrenBuilder.add(ListVector.empty(column.getId().toString(), allocator));
> VectorSchemaRoot root = VectorSchemaRoot.of(inVector)
>  # Parse the xml file to get the list data in "cd", using the ListVector API:
> ListVector listVector = (ListVector) valueVectors;
> List columns = column.getColumns();
> Column column1 = columns.get(0);
> String name = column1.getId().toString();
> UnionListWriter writer = listVector.getWriter();
> writer.allocate();
> for (int j = 0; j < column1.getColumns().size(); j++) {
>     writer.setPosition(j);
>     writer.startList();
>     writer.list().startList();
>     Column column2 = column1.getColumns().get(j);
>     List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
>     for (int k = 0; k < lst.size(); k++) {
>         Map<String, String> stringStringMap = lst.get(k);
>         String value = stringStringMap.get(column2.getId().toString());
>         switch (column2.getType()) {
>             case FLOAT:
>                 writer.list().float4().writeFloat4(stringConvertFloat(value));
>                 break;
>             case BOOLEAN:
>                 writer.list().bit().writeBit(stringConvertBoolean(value));
>                 break;
>             case DECIMAL:
>                 writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
>                 break;
>             case TIMESTAMP:
>                 writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
>                 break;
>             case INTEGER:
>             case BIGINT:
>                 writer.list().bigInt().writeBigInt(stringConvertLong(value));
>                 break;
>             case VARCHAR:
>                 VarCharHolder varBinaryHolder = new VarCharHolder();
>                 varBinaryHolder.start = 0;
>                 byte[] bytes = value.getBytes();
>                 ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
>                 varBinaryHolder.buffer = buffer;
>                 buffer.writeBytes(bytes);
>                 varBinaryHolder.end = bytes.length;
>                 writer.list().varChar().write(varBinaryHolder);
>                 break;
>             default:
>                 throw new IllegalArgumentException(" error no type !!");
>         }
>     }
>     writer.list().endList();
>     writer.endList();
> }
>  4. 
> After the write is complete, I send it to the arrow-flight server. Server 
> code:
> {quote}
> {quote}@Override
> public Callable acceptPut(FlightStream flightStream) {
>   return () -> {
>     try (VectorSchemaRoot root = flightStream.getRoot()) {
>       while (flightStream.next()) {
>         VectorSchemaRoot other = null;
>         try {
>           logger.info(" Receive message .. size: " + root.getRowCount());
>           other = copyRoot(root);

[jira] [Updated] (ARROW-5658) [JAVA] apache arrow-flight cannot send listvector

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5658:
--
Labels: pull-request-available  (was: )

> [JAVA] apache arrow-flight cannot send listvector 
> --
>
> Key: ARROW-5658
> URL: https://issues.apache.org/jira/browse/ARROW-5658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Java
>Affects Versions: 0.13.0
> Environment: java8 arrow-java 0.13.0
>Reporter: luckily
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: ClientStart.java, ServerStart.java, pom.xml
>
>
> I can't transfer ListVector data using Apache arrow-flight. The problem 
> description is as follows:
> {quote} # I parse an XML file, convert it to the Arrow format, and finally 
> convert it to the Parquet data format. The XML source data is at 
> [http://www.w3school.com.cn/example/xmle/cd_catalog.xml]
>  # I created a schema that uses ListVector. The code is shown below:
> {code:java}
> List list = childrenBuilder.add(ListVector.empty(column.getId().toString(), allocator));
> VectorSchemaRoot root = VectorSchemaRoot.of(inVector);
> {code}
>  # I parse the XML file to get the list data in "cd", using the ListVector API:
> {code:java}
> ListVector listVector = (ListVector) valueVectors;
> List columns = column.getColumns();
> Column column1 = columns.get(0);
> String name = column1.getId().toString();
> UnionListWriter writer = listVector.getWriter();
> writer.allocate();
> for (int j = 0; j < column1.getColumns().size(); j++) {
>     writer.setPosition(j);
>     writer.startList();
>     writer.list().startList();
>     Column column2 = column1.getColumns().get(j);
>     List<Map<String, String>> lst = (List<Map<String, String>>) ((Map) val).get(name);
>     for (int k = 0; k < lst.size(); k++) {
>         Map<String, String> stringStringMap = lst.get(k);
>         String value = stringStringMap.get(column2.getId().toString());
>         switch (column2.getType()) {
>             case FLOAT:
>                 writer.list().float4().writeFloat4(stringConvertFloat(value));
>                 break;
>             case BOOLEAN:
>                 writer.list().bit().writeBit(stringConvertBoolean(value));
>                 break;
>             case DECIMAL:
>                 writer.list().decimal().writeDecimal(stringConvertDecimal(value, column2.getScale()));
>                 break;
>             case TIMESTAMP:
>                 writer.list().dateMilli().writeDateMilli(stringConvertTimestamp(value, column2.format.toString()));
>                 break;
>             case INTEGER:
>             case BIGINT:
>                 writer.list().bigInt().writeBigInt(stringConvertLong(value));
>                 break;
>             case VARCHAR:
>                 VarCharHolder varBinaryHolder = new VarCharHolder();
>                 varBinaryHolder.start = 0;
>                 byte[] bytes = value.getBytes();
>                 ArrowBuf buffer = listVector.getAllocator().buffer(bytes.length);
>                 varBinaryHolder.buffer = buffer;
>                 buffer.writeBytes(bytes);
>                 varBinaryHolder.end = bytes.length;
>                 writer.list().varChar().write(varBinaryHolder);
>                 break;
>             default:
>                 throw new IllegalArgumentException("error: no type!");
>         }
>     }
>     writer.list().endList();
>     writer.endList();
> }
> {code}
>  4. After the write is complete, I send the data to the arrow-flight server. 
> Server code:
> {quote}
> {quote}@Override
> public Callable acceptPut(FlightStream flightStream) {
>  return () -> {
>  try (VectorSchemaRoot root = flightStream.getRoot()) {
>  while (flightStream.next()) {
>  VectorSchemaRoot other = null;
>  try {
>  logger.info(" Receive message .. size: " + root.getRowCount());
>  other = copyRoot(root);
>  ArrowMessage arrowMessage = new ArrowMessage(other, other.getSchema());
>  
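Aside: the per-type dispatch in the client code above can be exercised without any Arrow dependency. The sketch below is a hypothetical stand-in for the report's `column2.getType()` switch and `stringConvert*` helpers; the `ColumnType` enum and `convert` method are invented for illustration only.

```java
import java.math.BigDecimal;

public class TypeDispatchSketch {
    // Hypothetical stand-in for the column type enum used in the report.
    enum ColumnType { FLOAT, BOOLEAN, DECIMAL, BIGINT, VARCHAR }

    // Mirrors the shape of the switch in the bug report: convert a raw
    // string cell to the value that would be written through the list writer.
    static Object convert(String value, ColumnType type) {
        switch (type) {
            case FLOAT:   return Float.parseFloat(value);
            case BOOLEAN: return Boolean.parseBoolean(value);
            case DECIMAL: return new BigDecimal(value);
            case BIGINT:  return Long.parseLong(value);
            case VARCHAR: return value;
            default:      throw new IllegalArgumentException("no type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("1985", ColumnType.BIGINT));  // prints 1985
        System.out.println(convert("true", ColumnType.BOOLEAN)); // prints true
    }
}
```

Keeping the conversion separate from the writer calls, as sketched here, also makes the per-type logic unit-testable without allocating any Arrow buffers.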

[jira] [Updated] (ARROW-5723) [Gandiva][Crossbow] Builds failing

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5723:
--
Labels: pull-request-available  (was: )

> [Gandiva][Crossbow] Builds failing
> --
>
> Key: ARROW-5723
> URL: https://issues.apache.org/jira/browse/ARROW-5723
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
>
> Latest builds are failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5555) [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries

2019-06-25 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François resolved ARROW-5555.

Resolution: Fixed

Issue resolved by pull request 4654
[https://github.com/apache/arrow/pull/4654]

> [R] Add install_arrow() function to assist the user in obtaining C++ runtime 
> libraries
> --
>
> Key: ARROW-5555
> URL: https://issues.apache.org/jira/browse/ARROW-5555
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.14.0
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Following ARROW-5488, it will be possible to install the R package without 
> having libarrow installed, but you won't be able to do anything until you do. 
> The error message you get when trying to use the package directs you to call 
> {{install_arrow()}}. 
> This function will at a minimum give a recommendation of steps to take to 
> install the library. In some cases, we may be able to download and install it 
> for the user.





[jira] [Commented] (ARROW-5700) [C++] Failed to open local file: Bad file descriptor

2019-06-25 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872195#comment-16872195
 ] 

Antoine Pitrou commented on ARROW-5700:
---

> Is there any chance that open() is called instead of _wsopen_s(), meaning 
> _WIN32 is not defined?

I don't think so, compilation would probably fail.


> [C++] Failed to open local file: Bad file descriptor
> 
>
> Key: ARROW-5700
> URL: https://issues.apache.org/jira/browse/ARROW-5700
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Windows version 10.0.10586.1106 (th2_release.170904-1742)
>Reporter: Tham
>Priority: Major
>
> I open an output stream to write a parquet file. Here is the code:
> {code:cpp}
> std::shared_ptr<arrow::io::FileOutputStream> outStream;
> arrow::Status err;
> err = arrow::io::FileOutputStream::Open(filePath.toStdString(), false,
>                                         &outStream);
> if (err.code() != arrow::StatusCode::OK) {
>   std::cout << err.message() << std::endl;
> }
> {code}
> Here is the error message I got:
> {code}
> Failed to open local file:  , error: Bad file descriptor
> {code}
> I've got this error only when running on Windows version 10.0.10586.1106 
> (th2_release.170904-1742).
> Any idea?
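Aside: the snippet above uses Arrow's status-return error-handling convention, where failure comes back as a status object instead of an exception. A minimal plain-Java rendering of that pattern, with all names hypothetical and no Arrow dependency:

```java
public class StatusSketch {
    enum Code { OK, IO_ERROR }

    // Minimal stand-in for arrow::Status: an error code plus a message.
    static final class Status {
        final Code code;
        final String message;
        Status(Code code, String message) { this.code = code; this.message = message; }
        boolean ok() { return code == Code.OK; }
    }

    // Hypothetical open() that reports failure through its return value,
    // the same shape as arrow::io::FileOutputStream::Open in the snippet.
    static Status open(String path) {
        if (path == null || path.isEmpty()) {
            return new Status(Code.IO_ERROR, "Failed to open local file: " + path);
        }
        return new Status(Code.OK, "");
    }

    public static void main(String[] args) {
        Status err = open("");
        if (!err.ok()) {  // mirrors `err.code() != arrow::StatusCode::OK`
            System.out.println(err.message);
        }
    }
}
```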





[jira] [Commented] (ARROW-5700) [C++] Failed to open local file: Bad file descriptor

2019-06-25 Thread Tham (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872194#comment-16872194
 ] 

Tham commented on ARROW-5700:
-

bq. I took a look at the CRT source code, and it's mysterious how EBADF 
"Invalid file descriptor" could be returned by [_wsopen_s]
Is there any chance that open() is called instead of _wsopen_s(), meaning 
_WIN32 is not defined?






[jira] [Commented] (ARROW-5700) [C++] Failed to open local file: Bad file descriptor

2019-06-25 Thread Tham (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872189#comment-16872189
 ] 

Tham commented on ARROW-5700:
-

{quote}Does it happen with any output file?
{quote}
I'm not sure. We received this crash report from a customer. After this issue 
crashes our application, every time the application is started again it fails 
immediately with another message (on the same file):
{code}
Failed to open local file: , error: File exists{code}
So we don't know whether it fails with other files as well.

I will try to install some Windows version (at least this specific one) at our 
office and try to debug as you suggested.

{quote}(you could also try to upgrade to a more recent version, as I think the 
code changed slightly){quote}
I will try with the current version first, then upgrade later to see whether 
the issue remains.






[jira] [Commented] (ARROW-5700) [C++] Failed to open local file: Bad file descriptor

2019-06-25 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872159#comment-16872159
 ] 

Antoine Pitrou commented on ARROW-5700:
---

I took a look at the CRT source code, and it's mysterious how EBADF "Invalid 
file descriptor" could be returned by 
[_wsopen_s](https://docs.microsoft.com/fr-fr/cpp/c-runtime-library/reference/sopen-s-wsopen-s?view=vs-2019).

Since you compiled from source, you could probably try to debug with the Arrow 
source code. The function is {{FileOpenWritable}} in 
{{cpp/src/arrow/util/io-util.cc}}.

(you could also try to upgrade to a more recent version, as I think the code 
changed slightly)








[jira] [Created] (ARROW-5723) [Gandiva][Crossbow] Builds failing

2019-06-25 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-5723:
--

 Summary: [Gandiva][Crossbow] Builds failing
 Key: ARROW-5723
 URL: https://issues.apache.org/jira/browse/ARROW-5723
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Praveen Kumar Desabandu
Assignee: Praveen Kumar Desabandu


Latest builds are failing.





[jira] [Commented] (ARROW-5700) [C++] Failed to open local file: Bad file descriptor

2019-06-25 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872147#comment-16872147
 ] 

Antoine Pitrou commented on ARROW-5700:
---

Does it happen with any output file?



