GitHub user mizitch opened a pull request:

    https://github.com/apache/incubator-beam/pull/1327

    [BEAM-840] Some minor changes and fixes for sorter module. 

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [x] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
    
    ---
    Includes:
    * Limit max memory for ExternalSorter and BufferedExternalSorter to 2047 MB
      to prevent int overflow within Hadoop's sorting library
    * Fix int overflow for large memory values in InMemorySorter
    * Add note about estimated disk use to README.MD
    * Make Hadoop's sorting library put all temp files under the specified
      directory
    * Have Hadoop clean up the temp directory on exit
    * Stop shading Hadoop dependencies. Some context:
    ** The existing shading is broken (modules that depend on this one cannot
       use it successfully).
    ** Hadoop's use of reflection in several places makes shading the
       dependency "in a good way" nearly impossible. It requires a couple of
       rather brittle hacks, and for clients that depend on certain conflicting
       versions of Hadoop, these hacks can mean the shading fails to meet its
       intended goal of preventing conflicts anyway.
    ** From what I can tell, there's no good way to shade this to make it
       universally usable, so leaving it unshaded seems like a reasonable
       default.
    ** Without shading Hadoop, this module can be used successfully from Beam's
       wordcount example (which already has Hadoop dependencies of its own).
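
For readers wondering where the 2047 MB figure comes from: a megabyte count converted to bytes in 32-bit `int` arithmetic wraps around at 2048 MB, since 2048 * 1024 * 1024 equals 2^31. The following is a minimal sketch of that arithmetic and of one possible guard. The class and method names are hypothetical and are not taken from the sorter module or from Hadoop; the actual change in this pull request may be structured differently.

```java
// Sketch only: MemoryLimitSketch and its methods are hypothetical names,
// not code from the Beam sorter module or from Hadoop.
final class MemoryLimitSketch {
  // Largest megabyte count whose byte count still fits in a signed 32-bit int.
  static final int MAX_MEMORY_MB = 2047;

  // Naive conversion in int arithmetic: wraps around for 2048 MB and above.
  static int toBytesUnchecked(int memoryMb) {
    return memoryMb * 1024 * 1024;
  }

  // Rejects out-of-range values, then widens to long before multiplying.
  static long toBytesChecked(int memoryMb) {
    if (memoryMb <= 0 || memoryMb > MAX_MEMORY_MB) {
      throw new IllegalArgumentException(
          "memoryMb must be in (0, " + MAX_MEMORY_MB + "], got " + memoryMb);
    }
    return memoryMb * 1024L * 1024L;
  }

  public static void main(String[] args) {
    System.out.println(toBytesUnchecked(2047)); // 2146435072, still positive
    System.out.println(toBytesUnchecked(2048)); // -2147483648, wrapped around
    System.out.println(toBytesChecked(2047));   // 2146435072
  }
}
```

Validating the limit up front gives callers an immediate, descriptive error instead of a negative buffer size surfacing later inside a library.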

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mizitch/incubator-beam sorter-gcs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1327
    
----
commit d07c4ce9349abac4d0c53223072f1c84a1dc98c6
Author: Mitch Shanklin <mshank...@google.com>
Date:   2016-11-09T22:09:49Z

    Some minor changes and fixes for sorter module.

----

