[jira] [Created] (ARROW-8132) [C++] arrow-s3fs-test failing on master

2020-03-16 Hatem Helal (Jira)
Hatem Helal created ARROW-8132:
--

 Summary: [C++] arrow-s3fs-test failing on master
 Key: ARROW-8132
 URL: https://issues.apache.org/jira/browse/ARROW-8132
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Hatem Helal


Log:

[https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/branch/master/job/9dgr7xl635yuwh7y#L1917]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6096) [C++] Remove dependency on boost regex library

2019-08-01 Hatem Helal (JIRA)
Hatem Helal created ARROW-6096:
--

 Summary: [C++] Remove dependency on boost regex library
 Key: ARROW-6096
 URL: https://issues.apache.org/jira/browse/ARROW-6096
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Hatem Helal
Assignee: Hatem Helal


There appears to be only one place where the boost regex library is used:

[cpp/src/parquet/metadata.cc|https://github.com/apache/arrow/blob/eb73b962e42b5ae6983bf026ebf825f1f707e245/cpp/src/parquet/metadata.cc#L32]

I think this can be replaced with the C++11 standard {{<regex>}} library.





[jira] [Created] (ARROW-6061) [C++] Cannot build libarrow without rapidjson

2019-07-29 Hatem Helal (JIRA)
Hatem Helal created ARROW-6061:
--

 Summary: [C++] Cannot build libarrow without rapidjson
 Key: ARROW-6061
 URL: https://issues.apache.org/jira/browse/ARROW-6061
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Hatem Helal
Assignee: Hatem Helal


 
{code:java}
arrow/cpp/src/arrow/json/chunker.cc:25:30: fatal error: rapidjson/reader.h: No such file or directory
 #include "rapidjson/reader.h"

compilation terminated.
{code}





[jira] [Created] (ARROW-5676) [CI] hadolint failing on r/Dockerfile causing Travis "Lint, Release tests" failure

2019-06-21 Hatem Helal (JIRA)
Hatem Helal created ARROW-5676:
--

 Summary: [CI] hadolint failing on r/Dockerfile causing Travis "Lint, Release tests" failure
 Key: ARROW-5676
 URL: https://issues.apache.org/jira/browse/ARROW-5676
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Hatem Helal


See [https://travis-ci.org/apache/arrow/jobs/548674391#L544]





[jira] [Created] (ARROW-5675) [Doc] Fix typo in documentation describing compile/debug workflow on macOS with Xcode IDE

2019-06-21 Hatem Helal (JIRA)
Hatem Helal created ARROW-5675:
--

 Summary: [Doc] Fix typo in documentation describing compile/debug workflow on macOS with Xcode IDE
 Key: ARROW-5675
 URL: https://issues.apache.org/jira/browse/ARROW-5675
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Hatem Helal
Assignee: Hatem Helal


See https://github.com/apache/arrow/pull/4596#discussion_r296093152





[jira] [Created] (ARROW-5638) [C++] cmake fails to generate Xcode project when Gandiva JNI bindings are enabled

2019-06-18 Hatem Helal (JIRA)
Hatem Helal created ARROW-5638:
--

 Summary: [C++] cmake fails to generate Xcode project when Gandiva JNI bindings are enabled
 Key: ARROW-5638
 URL: https://issues.apache.org/jira/browse/ARROW-5638
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Hatem Helal


See the comment containing the error here:

https://github.com/apache/arrow/pull/4596#issuecomment-502954709





[jira] [Created] (ARROW-5632) [Doc] Add some documentation describing compile/debug workflow on macOS with Xcode IDE

2019-06-17 Hatem Helal (JIRA)
Hatem Helal created ARROW-5632:
--

 Summary: [Doc] Add some documentation describing compile/debug workflow on macOS with Xcode IDE
 Key: ARROW-5632
 URL: https://issues.apache.org/jira/browse/ARROW-5632
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Hatem Helal
Assignee: Hatem Helal








[jira] [Created] (ARROW-5608) [C++][parquet] Invalid memory access when using parquet::arrow::ColumnReader

2019-06-14 Hatem Helal (JIRA)
Hatem Helal created ARROW-5608:
--

 Summary: [C++][parquet] Invalid memory access when using parquet::arrow::ColumnReader
 Key: ARROW-5608
 URL: https://issues.apache.org/jira/browse/ARROW-5608
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Hatem Helal
Assignee: Hatem Helal


I've observed occasional crashes when using {{parquet::arrow::ColumnReader}} to iteratively read a fixed number of records. This has been quite tricky to isolate, but compiling the attached version of parquet-arrow-example with ASAN pointed me to an out-of-bounds access at [cpp/src/parquet/arrow/record_reader.cc#L356|https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/record_reader.cc#L356]

ASAN stack trace:
{code:java}
==18666==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00010c1b3038 at pc 0x000108330bdd bp 0x7ffee8d16450 sp 0x7ffee8d15c00
READ of size 198 at 0x00010c1b3038 thread T0
#0 0x108330bdc in __asan_memmove (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x54bdc)
#1 0x107205e96 in parquet::internal::RecordReader::RecordReaderImpl::Reset() algorithm:1828
#2 0x107205813 in parquet::internal::RecordReader::Reset() record_reader.cc:932
#3 0x106faea47 in parquet::arrow::PrimitiveImpl::NextBatch(long long, std::__1::shared_ptr*) reader.cc:1549
#4 0x106f6e69b in parquet::arrow::ColumnReader::NextBatch(long long, std::__1::shared_ptr*) reader.cc:1665
#5 0x106f06afe in read_column_iterative() reader-writer.cc:162
#6 0x106f09e9a in main reader-writer.cc:174
#7 0x7fff79472ed8 in start (libdyld.dylib:x86_64+0x16ed8)
{code}





[jira] [Created] (ARROW-5157) [Website] Add MATLAB to powered by Apache Arrow page

2019-04-10 Hatem Helal (JIRA)
Hatem Helal created ARROW-5157:
--

 Summary: [Website] Add MATLAB to powered by Apache Arrow page
 Key: ARROW-5157
 URL: https://issues.apache.org/jira/browse/ARROW-5157
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Hatem Helal
Assignee: Hatem Helal


MATLAB recently shipped R2019a with built-in support for Apache Parquet files, and we used Arrow in the implementation.





[jira] [Created] (ARROW-4785) [CI] Make Travis CI resilient against GPG errors

2019-03-06 Hatem Helal (JIRA)
Hatem Helal created ARROW-4785:
--

 Summary: [CI] Make Travis CI resilient against GPG errors
 Key: ARROW-4785
 URL: https://issues.apache.org/jira/browse/ARROW-4785
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Hatem Helal


Travis jobs sometimes fail with a GPG error:

{code}
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6B05F25D762E3157
W: Failed to fetch https://packagecloud.io/github/git-lfs/ubuntu/dists/trusty/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6B05F25D762E3157
E: Failed to fetch http://security.ubuntu.com/ubuntu/dists/trusty-security/main/binary-i386/Packages.gz  Hash Sum mismatch
W: Some index files failed to download. They have been ignored, or old ones used instead.
The command "if [ $TRAVIS_OS_NAME == "linux" ]; then
 sudo bash -c "echo -e 'Acquire::Retries 10; Acquire::http::Timeout \"20\";' > /etc/apt/apt.conf.d/99-travis-retry"
 sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
 sudo apt-get update -qq
 fi
 " failed and exited with 100 during .

Your build has been stopped.
{code}

It would be nice to increase the number of retries, the timeout, or both (the job currently sets {{Acquire::Retries 10}} and {{Acquire::http::Timeout "20"}}) to make the Travis jobs more resilient to this seemingly sporadic issue.





[jira] [Created] (ARROW-4661) [C++] Consolidate random string generators for use in benchmarks and unittests

2019-02-22 Hatem Helal (JIRA)
Hatem Helal created ARROW-4661:
--

 Summary: [C++] Consolidate random string generators for use in benchmarks and unittests
 Key: ARROW-4661
 URL: https://issues.apache.org/jira/browse/ARROW-4661
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Hatem Helal
Assignee: Hatem Helal
 Fix For: 0.14.0


This was discussed here:

[https://github.com/apache/arrow/pull/3721]

For testing/benchmarking dictionary encoding, it's useful to control the number of repeated values, and it would also be good to optionally include null values.





[jira] [Created] (ARROW-4260) [Python] test_serialize_deserialize_pandas is failing on OSX with Xcode 6.4

2019-01-14 Hatem Helal (JIRA)
Hatem Helal created ARROW-4260:
--

 Summary: [Python] test_serialize_deserialize_pandas is failing on OSX with Xcode 6.4
 Key: ARROW-4260
 URL: https://issues.apache.org/jira/browse/ARROW-4260
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Hatem Helal


See [https://travis-ci.org/apache/arrow/jobs/479378190#L2427]





[jira] [Created] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-01-04 Hatem Helal (JIRA)
Hatem Helal created ARROW-4156:
--

 Summary: [C++] xcodebuild failure for cmake generated project
 Key: ARROW-4156
 URL: https://issues.apache.org/jira/browse/ARROW-4156
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Hatem Helal
Assignee: Uwe L. Korn


A project generated with the CMake Xcode generator fails to build with xcodebuild as follows:
{code:java}
$ cmake .. -G Xcode -DARROW_PARQUET=ON -DPARQUET_BUILD_EXECUTABLES=ON \
    -DPARQUET_BUILD_EXAMPLES=ON \
    -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 \
    -DCMAKE_BUILD_TYPE=Debug -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 \
    -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON \
    -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1

Libtool xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a normal x86_64
cd /Users/hhelal/Documents/code/arrow/cpp
export MACOSX_DEPLOYMENT_TARGET=10.14
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool -static -arch_only x86_64 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal -filelist /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList -o /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a

PhaseScriptExecution CMake\ PostBuild\ Rules xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
cd /Users/hhelal/Documents/code/arrow/cpp
/bin/sh -c /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
echo "Depend check for xcode"
Depend check for xcode
cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make PostBuild.arrow_objlib.Debug
/bin/rm -f /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
/bin/rm -f /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a

=== BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION (Debug) ===

Check dependencies

Write auxiliary files
write-file /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
chmod 0755 /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh

PhaseScriptExecution CMake\ PostBuild\ Rules xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
cd /Users/hhelal/Documents/code/arrow/cpp
/bin/sh -c /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
echo "Creating symlinks"
Creating symlinks
/usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
CMake Error: cmake_symlink_library: System Error: No such file or directory
CMake Error: cmake_symlink_library: System Error: No such file or directory
make: *** [arrow_shared_buildpart_0] Error 1

** BUILD FAILED **


The following build commands failed:
PhaseScriptExecution CMake\ PostBuild\ Rules xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
(1 failure)

{code}





[jira] [Created] (ARROW-3564) pyarrow: writing version 2.0 parquet format with dictionary encoding enabled

2018-10-19 Hatem Helal (JIRA)
Hatem Helal created ARROW-3564:
--

 Summary: pyarrow: writing version 2.0 parquet format with dictionary encoding enabled
 Key: ARROW-3564
 URL: https://issues.apache.org/jira/browse/ARROW-3564
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.11.0
Reporter: Hatem Helal
 Attachments: example_v1.0_dict_False.parquet, example_v1.0_dict_True.parquet, example_v2.0_dict_False.parquet, example_v2.0_dict_True.parquet, pyarrow_repro.py

Using pyarrow v0.11.0, the following script writes a simple table (lifted from 
the [pyarrow doc|https://arrow.apache.org/docs/python/parquet.html]) to both 
parquet format versions 1.0 and 2.0, with and without dictionary encoding 
enabled.
{code:python}
import itertools

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                   'two': ['foo', 'bar', 'baz'],
                   'three': [True, False, True]},
                  index=list('abc'))

table = pa.Table.from_pandas(df)

use_dict = [True, False]
version = ['1.0', '2.0']

# Write every combination of dictionary encoding and format version.
for tf, v in itertools.product(use_dict, version):
    filename = 'example_v' + v + '_dict_' + str(tf) + '.parquet'
    pq.write_table(table, filename, use_dictionary=tf, version=v)
{code}

Inspecting the written files using 
[parquet-tools|https://github.com/apache/parquet-mr/tree/master/parquet-tools] 
appears to show that dictionary encoding is not used in either of the version 
2.0 files.  Both files report that the columns are encoded using {{PLAIN,RLE}} 
and that the dictionary page offset is zero.  I was expecting that the column 
encoding would include {{RLE_DICTIONARY}}. Attached are the script with repro 
steps and the files that were generated by it.

Below is the output of {{parquet-tools meta}} on the version 2.0 files:
{panel:title=version='2.0', use_dictionary = True}
{code}
% parquet-tools meta example_v2.0_dict_True.parquet
file:  file:.../example_v2.0_dict_True.parquet
creator:   parquet-cpp version 1.5.1-SNAPSHOT
extra: pandas = {"pandas_version": "0.23.4", "index_columns": ["__index_level_0__"], "columns": [{"metadata": null, "field_name": "one", "name": "one", "numpy_type": "float64", "pandas_type": "float64"}, {"metadata": null, "field_name": "three", "name": "three", "numpy_type": "bool", "pandas_type": "bool"}, {"metadata": null, "field_name": "two", "name": "two", "numpy_type": "object", "pandas_type": "bytes"}, {"metadata": null, "field_name": "__index_level_0__", "name": null, "numpy_type": "object", "pandas_type": "bytes"}], "column_indexes": [{"metadata": null, "field_name": null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]}

file schema:   schema
one:   OPTIONAL DOUBLE R:0 D:1
three: OPTIONAL BOOLEAN R:0 D:1
two:   OPTIONAL BINARY R:0 D:1
__index_level_0__: OPTIONAL BINARY R:0 D:1

row group 1:   RC:3 TS:211 OFFSET:4
one:    DOUBLE SNAPPY DO:0 FPO:4 SZ:65/63/0.97 VC:3 ENC:PLAIN,RLE ST:[min: -1.0, max: 2.5, num_nulls: 1]
three:  BOOLEAN SNAPPY DO:0 FPO:142 SZ:36/34/0.94 VC:3 ENC:PLAIN,RLE ST:[min: false, max: true, num_nulls: 0]
two:    BINARY SNAPPY DO:0 FPO:225 SZ:60/58/0.97 VC:3 ENC:PLAIN,RLE ST:[min: 0x626172, max: 0x666F6F, num_nulls: 0]
__index_level_0__:  BINARY SNAPPY DO:0 FPO:328 SZ:50/48/0.96 VC:3 ENC:PLAIN,RLE ST:[min: 0x61, max: 0x63, num_nulls: 0]
{code}
{panel}
{panel:title=version='2.0', use_dictionary = False}
{panel}
|{{% parquet-tools meta example_v2.0_dict_False.parquet}}
{{file:  file:.../example_v2.0_dict_False.parquet}}
{{creator:   parquet-cpp version 1.5.1-SNAPSHOT}}
{{extra: pandas = \{"pandas_version": "0.23.4", "index_columns": 
["__index_level_0__"], "columns": [{"metadata": null, "field_name": "one", 
"name": "one", "numpy_type": "float64", "pandas_type": "float64"}, 
\{"metadata": null, "field_name": "three", "name": "three", "numpy_type": 
"bool", "pandas_type": "bool"}, \{"metadata": null, "field_name": "two", 
"name": "two", "numpy_type": "object", "pandas_type": "bytes"}, \{"metadata": 
null, "field_name": "__index_level_0__", "name": null, "numpy_type": "object", 
"pandas_type": "bytes"}], "column_indexes": [\{"metadata": null, "field_name": 
null, "name": null,