[GitHub] orc pull request #185: ORC-261: [C++] Fix installation to include all header...

2017-11-01 Thread jcrist
GitHub user jcrist opened a pull request:

https://github.com/apache/orc/pull/185

ORC-261: [C++] Fix installation to include all header files

Addresses [issue 
261](https://issues.apache.org/jira/projects/ORC/issues/ORC-261).

- Fixes installation to include all necessary header files in `include/orc`
- Removes a few unnecessary includes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcrist/orc fix-install-includes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #185


commit 92200217641d645cd382f0cab5c8ffc33e62e5b7
Author: Jim Crist 
Date:   2017-11-01T22:20:00Z

Fix installation to include all header files

- Fixes installation to include all necessary header files in
  `include/orc`
- Removes a few unnecessary includes




---


[GitHub] orc pull request #185: ORC-261: [C++] Fix installation to include all header...

2017-11-01 Thread jcrist
Github user jcrist commented on a diff in the pull request:

https://github.com/apache/orc/pull/185#discussion_r148402662
  
--- Diff: tools/src/CMakeLists.txt ---
@@ -10,10 +10,17 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+# TODO:
+# - orc-metadata relies on the protobuf routines, meaning protobuf and
+#   binary_dir/c++/src still need to be included
+# - timezone-dump relies on non-public timezone routines. I *think* this
+#   executable can just be removed, as it looks like it was written for 
testing
+#   alone.
--- End diff --

If these issues can be resolved, then the `include_directories` below can 
just be:

```
include_directories (
   ${PROJECT_SOURCE_DIR}/c++/include
   ${PROJECT_BINARY_DIR}/c++/include
)
```

meaning the executables use the same interface a user of this library would 
get. Whether that means the protobuf headers are included in the build, or the 
interface is expanded to make `orc-metadata` not rely on the protobuf headers 
directly I'm not sure.

Either way this probably doesn't need to be done now, it's just a potential 
hygiene thing.


---


[GitHub] orc issue #185: ORC-261: [C++] Fix installation to include all header files

2017-11-01 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/185
  
Note that this is the first time I've touched C++ code outside undergrad, 
so there's a decent chance I've done something wrong here. Everything seems to 
pass locally for me though.


---


[GitHub] orc issue #170: ORC-207: [C++] Enable users the ability to provide their own...

2017-11-06 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/170
  
I find myself in need of this PR. Looks like this has a few merge 
conflicts, but other than that is this ready to merge?


---


[GitHub] orc issue #170: ORC-207: [C++] Enable users the ability to provide their own...

2017-11-07 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/170
  
Thanks @majetideepak. If you also have a chance, it would be nice to get 
#185 reviewed and merged too. Waiting on both of these so I can use orc in a 
larger cmake project.


---


[GitHub] orc pull request #192: Cleanup cmake scripts

2017-11-21 Thread jcrist
GitHub user jcrist opened a pull request:

https://github.com/apache/orc/pull/192

Cleanup cmake scripts

This was an attempt to fix using local versions of thirdparty libraries (as 
added in https://github.com/apache/orc/pull/170) so that installing doesn't 
pull in every file matching not matching `*.so|*.dylib` in the containing `lib` 
folder (as it does now). However, it kind of spiraled out in scope as more 
issues developed. Apologies for dumping this here without prior discussion.

---

A summary of the backwards compatible changes:

- Cleanup handling of `find_package` scripts to:
  - Set consistent variable names, and also match conventions in other 
apache projects
  - Set consistent variable names in both `*_HOME` specified and vendored 
paths
  - Provide error messages when not found, and only search when `*_HOME` is 
specified
  - Allow specifying `*_HOME` paths in other cmake projects using orc as an 
`ExternalProject`. This allows the wrapping project to more easily specify 
location of third party libraries with orc (e.g. `snappy`).

- Add an option to not build the cpp tests, and only install/search for 
gtest if turned on

- Add an option to not build the tools

---

A few potentially more contentious changes:

- Don't install third party libraries, even when vendored. The current 
behavior installs the thirdparty libs even if not vendored, which feels wrong 
to me. It also accidentally copies every file in the `lib` directory that isn't 
a shared library, which is definitely a bug.

  At a minimum, the thirdparty libraries that aren't vendored shouldn't be 
installed, and the accidental copying fixed. I'd argue that no thirdparty 
library should be installed, as it may clobber existing versions of e.g. 
`protobuf.a`. Downstream projects may handle the linking and distribution of 
thirdparty libraries themselves.

  If people disagree, perhaps a flag `INSTALL_THIRDPARTY_LIBS=on` could be 
added, to allow downstream projects to turn this off?

- Don't install `NOTICE` and `LICENSE`. When running `make install` 
currently, these get installed in the root directory, which is not where they 
should go. Since the header files contain license headers already, and other 
apache projects don't install `NOTICE` and `LICENSE` when running `make 
install`, I think this should be fine?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcrist/orc cleanup-cmake-scripts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #192


commit 28c5b20416359524c971ce7c536d561664aab489
Author: Jim Crist 
Date:   2017-11-20T21:15:15Z

Cleanup cmake

- Add option to build tools (default ON, matching current behavior)
- Add option to build tests (default ON, matching current behavior)
- Cleaned up cmake find-package logic to be more consistent, and match
  conventions in other libraries.
- Don't install third party libraries, even if vendored. This seems to
  match common practice in other libraries. At a minimum third party
  libraries shouldn't be installed if they're not vendored (currently
  they're installed either way).

commit 6630118c0614b5ede27917ab082154a9a7c71f5d
Author: Jim Crist 
Date:   2017-11-20T23:56:53Z

find_package required errors if not found

commit 3dcd68e900a3ea165f3df426dfa154d53c2f906f
Author: Jim Crist 
Date:   2017-11-21T21:28:46Z

Don't install LICENSE or NOTICE

- The header files contain the license headers
- Running make install installs LICENSE and NOTICE into the root
directory, which is not where they should go.

Instead, opt for not installing either, and rely on header licenses to
be sufficient.




---


[GitHub] orc issue #192: Cleanup cmake scripts

2017-11-21 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/192
  
As an example, in a downstream c++ project I'm building `liborc.a` using 
the following `ExternalProject` setup:

```python
set (ORC_CMAKE_ARGS -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_INSTALL_PREFIX=${ORC_PREFIX}
-DBUILD_LIBHDFSPP=OFF
-DBUILD_JAVA=OFF
-DBUILD_TOOLS=OFF
-DBUILD_CPP_TESTS=OFF
-DPROTOBUF_HOME=${PROTOBUF_HOME}
-DLZ4_HOME=${LZ4_HOME}
-DSNAPPY_HOME=${SNAPPY_HOME}
-DZLIB_HOME=${ZLIB_HOME})

ExternalProject_Add(orc_ep
GIT_REPOSITORY "https://github.com/apache/orc";
GIT_TAG ${ORC_VERSION}
BUILD_BYPRODUCTS ${ORC_STATIC_LIB}
CMAKE_ARGS ${ORC_CMAKE_ARGS})
```

where `protobuf`, `lz4`, `snappy`, and `zlib` are all already existing 
dependencies.


---


[GitHub] orc issue #192: Cleanup cmake scripts

2017-11-21 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/192
  
Ping @majetideepak for discussion and review.


---


[GitHub] orc pull request #192: Cleanup cmake scripts

2017-11-22 Thread jcrist
Github user jcrist commented on a diff in the pull request:

https://github.com/apache/orc/pull/192#discussion_r152660084
  
--- Diff: cmake_modules/ThirdpartyToolchain.cmake ---
@@ -10,19 +10,46 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+set (LZ4_VERSION "1.7.5")
+set (SNAPPY_VERSION "1.1.4")
+set (ZLIB_VERSION "1.2.11")
+set (GTEST_VERSION "1.8.0")
+set (PROTOBUF_VERSION "2.6.0")
+
 set (THIRDPARTY_DIR "${CMAKE_BINARY_DIR}/c++/libs/thirdparty")
 
 string(TOUPPER ${CMAKE_BUILD_TYPE} UPPERCASE_BUILD_TYPE)
 
+if (DEFINED ENV{SNAPPY_HOME})
+  set (SNAPPY_HOME "$ENV{SNAPPY_HOME}")
+endif ()
+
+if (DEFINED ENV{ZLIB_HOME})
+  set (ZLIB_HOME "$ENV{ZLIB_HOME}")
+endif ()
+
+if (DEFINED ENV{LZ4_HOME})
+  set (LZ4_HOME "$ENV{LZ4_HOME}")
+endif ()
+
+if (DEFINED ENV{PROTOBUF_HOME})
+  set (PROTOBUF_HOME "$ENV{PROTOBUF_HOME}")
+endif ()
+
+if (DEFINED ENV{GTEST_HOME})
+  set (GTEST_HOME "$ENV{GTEST_HOME}")
+endif ()
+
 # --
 # Snappy
 
-set (SNAPPY_HOME "$ENV{SNAPPY_HOME}")
-find_package (Snappy)
-if (NOT SNAPPY_FOUND)
+if (NOT "${SNAPPY_HOME}" STREQUAL "")
+  find_package (Snappy REQUIRED)
+  set(SNAPPY_VENDORED FALSE)
+else ()
--- End diff --

As I wrote it, we error out if `SNAPPY_HOME` is set but not found (since 
`REQUIRED` is set). I think this makes more sense - as a user if I specify 
`SNAPPY_HOME` I'd rather get an error than silently download and build a new 
version if it fails to find it.


---


[GitHub] orc pull request #192: ORC-266: [C++] Cleanup cmake scripts

2017-11-22 Thread jcrist
Github user jcrist commented on a diff in the pull request:

https://github.com/apache/orc/pull/192#discussion_r152686927
  
--- Diff: cmake_modules/ThirdpartyToolchain.cmake ---
@@ -41,32 +68,35 @@ if (NOT SNAPPY_FOUND)
 LOG_BUILD 1
 LOG_INSTALL 1
 BUILD_BYPRODUCTS "${SNAPPY_STATIC_LIB}")
+
+  set (SNAPPY_VENDORED TRUE)
 endif ()
-include_directories (SYSTEM ${SNAPPY_INCLUDE_DIRS})
+
+include_directories (SYSTEM ${SNAPPY_INCLUDE_DIR})
 add_library (snappy STATIC IMPORTED)
 set_target_properties (snappy PROPERTIES IMPORTED_LOCATION 
${SNAPPY_STATIC_LIB})
-set (SNAPPY_LIBRARIES snappy)
-add_dependencies (snappy snappy_ep)
-install(DIRECTORY ${SNAPPY_PREFIX}/lib DESTINATION .
--- End diff --

Done. Went with `INSTALL_VENDORED_LIBS`, which defaults to on.


---


[GitHub] orc issue #192: ORC-266: [C++] Cleanup cmake scripts

2017-11-22 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/192
  
Opened issue https://issues.apache.org/jira/browse/ORC-266. I think this 
should be good now?


---


[GitHub] orc pull request #193: ORC-267: INSTALL NOTICES and LICENSE to binary tarbal...

2017-12-04 Thread jcrist
Github user jcrist commented on a diff in the pull request:

https://github.com/apache/orc/pull/193#discussion_r154717285
  
--- Diff: CMakeLists.txt ---
@@ -105,6 +105,11 @@ INCLUDE(ThirdpartyToolchain)
 set (EXAMPLE_DIRECTORY ${CMAKE_SOURCE_DIR}/examples)
 
 add_subdirectory(c++)
+
+install(
+  FILES LICENSE NOTICE
+  DESTINATION .)
--- End diff --

If I do `make install`, these just get placed in my root directory. Other 
apache projects (e.g. arrow, parquet-cpp) don't install the `LICENSE` or 
`NOTICE` (the header files all have the license information), so why is this 
needed? This is why I removed these in #192, sorry if there was a good reason 
for installing these.


---


[GitHub] orc issue #193: ORC-267: INSTALL NOTICES and LICENSE to binary tarball

2017-12-04 Thread jcrist
Github user jcrist commented on the issue:

https://github.com/apache/orc/pull/193
  
From http://www.apache.org/dev/apply-license.html#new it looks like they 
recommend storing `LICENSE` and `NOTICE` in the root of the tar archive of a 
distribution (as you've done here). Unfortunately cpack doesn't make it easy to 
distinguish between files in `make install` and `make package`.

Neither `parquet-cpp` nor `arrow` include `LICENSE` or `NOTICE` on `make 
install` or `make package` (`parquet-cpp` only, arrow doesn't use `cpack`). For 
at least the python wrappers for these libraries, the actual release artifacts 
are generated via an external tool that adds in the LICENSE stuff.

This separation of build/package makes sense to me, but may not for this 
project (not sure how y'all handle releases). Perhaps the LICENSE/NOTICE could 
be added to the `tar.gz` file after `make package` manually via another make 
target? Adding to `doc/orc/{NOTICE,LICENSE}` might be equally fine? I'm not 
really sure what's best here, either is fine with me.


---


[GitHub] orc pull request #207: Move install location of LICENSE/NOTICE

2018-01-04 Thread jcrist
GitHub user jcrist opened a pull request:

https://github.com/apache/orc/pull/207

Move install location of LICENSE/NOTICE

Currently these are put in the root directory on `make install`, which is 
not where they should go. Moves them to `share/doc/orc/`, as discussed in 
https://github.com/apache/orc/pull/193#issuecomment-349063277.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcrist/orc move-install-of-license-notice

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #207


commit 09f6cdc10b9f4f0c6f1bc908e163d1b0a0257e9c
Author: Jim Crist 
Date:   2018-01-04T22:23:48Z

Move install location of LICENSE/NOTICE

Currently these are put in the root directory on `make install`, which
is not where they should go. Moves them to `share/doc/orc/`.




---