[ 
https://issues.apache.org/jira/browse/ARROW-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191720#comment-16191720
 ] 

ASF GitHub Bot commented on ARROW-1473:
---------------------------------------

GitHub user siddharthteotia opened a pull request:

    https://github.com/apache/arrow/pull/1163

    ARROW-1473: [JAVA] Initial prototype code hierarchy for review

    cc @jacques-n , @BryanCutler , @icexelloss , @elahrvivaz 
    
    1. **BaseFixedWidthVector abstract class** - can be extended with a 
template to generate all the fixed width vector types. 
    
    2. **FixedValueVectorsPrototype.java** - template similar to what we have 
currently in FixedValueVectors.java. However, all the common functionality has 
been moved from the template to BaseFixedWidthVector, so the template pretty 
much contains only the custom mutator and accessor methods, which are extremely 
complex because there is no way to refactor them partially. 
    
    This is why I was suggesting that we should try to get rid of the template 
for fixed value vectors. 
    
    Another approach could be to use the template for simple types like INT, 
FLOAT4, FLOAT8 since the "if" conditions in template code are probably not very 
complicated for these types. But for types like TIMESTAMP and others, there are 
giant "if" blocks.
    
    3. **PrototypeIntVectorNonCodegen** - Non code generated IntVector class 
that implements IntVector specific functionality by extending 
BaseFixedWidthVector. The get() functions in the Accessor bypass ArrowBuf and 
directly work with the memory address to fish out the value.
    
    4. **BaseVariableWidthVector abstract class** - looking at the code of 
VarCharVector and VarBinaryVector, it seems like the code is 95% the same, with 
only 1 or 2 accessor functions being different. So we can implement all the 
functionality (including mutator and accessor as well) in the base class and 
then have small non code generated subclasses that just have the specific 
functionality for VarChar and VarBinary. We will no longer need the 
VariableLengthVectors.java template.
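
A minimal sketch of the shape item 4 describes, not the code in this pull request. The class names (`SketchVariableWidthVector`, `SketchVarCharVector`, `SketchVarBinaryVector`) are hypothetical, and plain Java arrays stand in for ArrowBuf. The base class owns the offsets, the data, and the shared set/get logic; each subclass adds only its type-specific accessor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical base class: owns the offsets, the data, and all shared logic.
abstract class SketchVariableWidthVector {
    // Value i occupies data[offsets[i] .. offsets[i + 1]).
    private int[] offsets = new int[9];
    private byte[] data = new byte[16];
    private int valueCount = 0;

    // Append a value at the next index; shared by every subclass.
    final void append(byte[] value) {
        if (valueCount + 2 > offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        int start = offsets[valueCount];
        int end = start + value.length;
        while (end > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        System.arraycopy(value, 0, data, start, value.length);
        offsets[valueCount + 1] = end;
        valueCount++;
    }

    // Copy out the raw bytes of one value; shared by every subclass.
    final byte[] getBytes(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
    }

    final int getValueCount() {
        return valueCount;
    }

    // The only piece that differs per type in this sketch.
    abstract Object getObject(int index);
}

// Small hand-written subclass: decodes the bytes as UTF-8 text.
final class SketchVarCharVector extends SketchVariableWidthVector {
    @Override
    String getObject(int index) {
        return new String(getBytes(index), StandardCharsets.UTF_8);
    }
}

// Small hand-written subclass: returns the bytes unchanged.
final class SketchVarBinaryVector extends SketchVariableWidthVector {
    @Override
    byte[] getObject(int index) {
        return getBytes(index);
    }
}
```

In this sketch, covariant return types let `getObject` return `String` from one subclass and `byte[]` from the other without a template.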
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/siddharthteotia/arrow ARROW-1473

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/arrow/pull/1163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1163
    
----
commit b62f99a603a4822756c610e52e4643980d4bebf1
Author: Phillip Cloud <cpcl...@gmail.com>
Date:   2017-09-28T00:58:08Z

    ARROW-1607: [C++] Implement DictionaryBuilder for Decimals
    
    Author: Phillip Cloud <cpcl...@gmail.com>
    
    Closes #1128 from cpcloud/ARROW-1607 and squashes the following commits:
    
    3137566 [Phillip Cloud] ARROW-1607: [C++] Implement DictionaryBuilder for 
Decimals

commit 78181d8ecfce3e658e143210adfc658d6259dec0
Author: Philipp Moritz <pcmor...@gmail.com>
Date:   2017-09-28T13:46:02Z

    ARROW-1609: [Plasma] Xcode 9 compilation workaround
    
    see https://github.com/apache/arrow/pull/1139
    
    Author: Philipp Moritz <pcmor...@gmail.com>
    
    Closes #1144 from pcmoritz/plasma-xcode-9-workaround and squashes the 
following commits:
    
    7642499 [Philipp Moritz] fix on other platforms
    c5cdddd [Philipp Moritz] xcode 9 compilation workaround

commit bcb29d081f26852c2d7b8329f542b1e81bfc9681
Author: Uwe L. Korn <uw...@xhochy.com>
Date:   2017-09-28T13:47:11Z

    ARROW-1620: Python: Download Boost in manylinux1 build from bintray
    
    Author: Uwe L. Korn <uw...@xhochy.com>
    
    Closes #1141 from xhochy/ARROW-1620 and squashes the following commits:
    
    30da182 [Uwe L. Korn] ARROW-1620: Python: Download Boost in manylinux1 
build from bintray

commit 7323677665c5b71203a1e526c0a19eb5520442d0
Author: siddharth <siddha...@dremio.com>
Date:   2017-09-29T13:58:09Z

    ARROW-1618: [JAVA] Reduce Heap Usage (Phase 1)
    
    cc @jacques-n , @icexelloss , @BryanCutler
    
    This is the initial small phase of our attempt to reduce heap usage per 
vector.
    
    As part of the investigation, we realized it's better to address a few 
things as part of subtasks for ARROW-1463. Accordingly, I need to update the 
requirements document w.r.t. heap usage for ARROW-1471.
    
    This patch gets rid of the Release Listener object in AllocationManager, as 
all the logic is implemented as part of AllocationManager itself.
    
    
https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit
    
    Author: siddharth <siddha...@dremio.com>
    
    Closes #1142 from siddharthteotia/ARROW-1618 and squashes the following 
commits:
    
    77151a27 [siddharth] ARROW-1618: Reduce Heap Usage (Phase 1)

commit f8cf91d25eed5aebebb4aaec61fb9b2af9e97bb1
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2017-09-29T15:31:33Z

    ARROW-1619: [Java] Set lastSet in JsonFileReader
    
    When reading a vector in JsonFileReader, lastSet should be set in 
VariableWidthVectors after reading inner vectors or else subsequent operations 
could corrupt the offsets.  This also makes it possible to simplify some of the 
related code.  Additionally, ListVector.lastSet should be explicitly 
initialized to 0, which is its starting offset.
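
To illustrate how a stale lastSet can corrupt offsets, here is a toy model, not Arrow's implementation. `ToyOffsets` and its methods are hypothetical, and the sketch assumes that a write treats every slot between lastSet and the target index as empty and rewrites those offsets.

```java
// Toy model of a variable-width vector's offset bookkeeping (hypothetical).
final class ToyOffsets {
    final int[] offsets;
    int lastSet = -1; // index of the last slot known to hold a value

    ToyOffsets(int capacity) {
        offsets = new int[capacity + 1];
    }

    // Bulk-load offsets the way a reader might, optionally recording lastSet.
    void load(int[] loaded, boolean updateLastSet) {
        System.arraycopy(loaded, 0, offsets, 0, loaded.length);
        if (updateLastSet) {
            lastSet = loaded.length - 2; // index of the last loaded value
        }
    }

    // Write a value of the given byte length at index.
    // Slots after lastSet and before index are assumed empty and are
    // given zero length by copying the previous offset forward.
    void set(int index, int length) {
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
        offsets[index + 1] = offsets[index] + length;
        lastSet = index;
    }
}
```

With offsets `{0, 2, 5, 9}` loaded and lastSet left at its initial value, `set(3, 4)` overwrites the three loaded offsets; with lastSet recorded as 2, the same call only appends the new end offset.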
    
    Author: Bryan Cutler <cutl...@gmail.com>
    
    Closes #1140 from BryanCutler/java-JsonReader-setLast-ARROW-1619 and 
squashes the following commits:
    
    8f97a3db [Bryan Cutler] added test for VarBinaryVector that checks lastSet 
after reading
    70df0cc4 [Bryan Cutler] set lastSet in JsonFileReader and initialize 
lastSet for ListVector

commit d4e09c7654030f459f57d330bec639699eadcef6
Author: Rene Sugar <rene.su...@gmail.com>
Date:   2017-09-29T20:29:38Z

    ARROW-1615 Added BUILD_WARNING_LEVEL and BUILD_WARNING_FLAGS to Setup…
    
    …CxxFlags.cmake
    
    Author: Rene Sugar <rene.su...@gmail.com>
    
    Closes #1145 from renesugar/ARROW-1615 and squashes the following commits:
    
    71a615e3 [Rene Sugar] ARROW-1615 Add -Wno-vla-extension and change 
non-checkin builds back to -Wall
    18958430 [Rene Sugar] ARROW-1615 Add -Wno-cast-align
    5fe4e8e8 [Rene Sugar] ARROW-1615 Move -Wno-shorten-64-to-32 after 
-Wconversion
    9d3c7ec3 [Rene Sugar] ARROW-1615 Identify compiler version for clang-802 
plus more warning entries
    5ebaf86e [Rene Sugar] ARROW-1615 Moved version specific warning entry
    971e61aa [Rene Sugar] ARROW-1615 Fixed version specific warning entry
    6cf24977 [Rene Sugar] ARROW-1615 Added more version specific Clang warning 
entries
    50def439 [Rene Sugar] ARROW-1615 Updated build warning level terminology
    ea906eb4 [Rene Sugar] ARROW-1615 Check compiler version before disabling 
some warnings
    159e1897 [Rene Sugar] ARROW-1615 Include CompilerInfo before SetupCxxFlags 
in arrow/python
    8359c966 [Rene Sugar] ARROW-1615 Added BUILD_WARNING_LEVEL and 
BUILD_WARNING_FLAGS to SetupCxxFlags.cmake

commit 7c616114fb83d02faa5921db0296ca994a9b232b
Author: Wes McKinney <wes.mckin...@twosigma.com>
Date:   2017-09-30T04:01:42Z

    ARROW-1600: [C++] Add Buffer constructor that wraps std::string
    
    Many other libraries interchange binary data with `std::string`. This makes 
it easy to wrap such data in an `arrow::Buffer`.
    
    It may be worth adding a function that creates a buffer from a string, but 
owns its memory.
    
    I also deprecated `arrow::GetBufferFromString`, which shouldn't have been 
public in the first place, since the new ctor is more general
    
    Author: Wes McKinney <wes.mckin...@twosigma.com>
    
    Closes #1147 from wesm/ARROW-1600 and squashes the following commits:
    
    f60f502c [Wes McKinney] Remove TestBuffer fixture
    644bf2b7 [Wes McKinney] Add Buffer ctor that wraps std::string

commit 796129b4f0f714fdb3c4fbf5bc2d2deb55424a84
Author: Wes McKinney <wes.mckin...@twosigma.com>
Date:   2017-09-30T04:02:58Z

    ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not 
originating in pandas
    
    This unifies the ingest path for 1D data into `pyarrow.array`. I added the 
argument `from_pandas` to turn null sentinel checking on or off:
    
    ```
    In [8]: arr = np.random.randn(10000000)
    
    In [9]: arr[::3] = np.nan
    
    In [10]: arr2 = pa.array(arr)
    
    In [11]: arr2.null_count
    Out[11]: 0
    
    In [12]: %timeit arr2 = pa.array(arr)
    The slowest run took 5.43 times longer than the fastest. This could mean 
that an intermediate result is being cached.
    10000 loops, best of 3: 68.4 µs per loop
    
    In [13]: arr2 = pa.array(arr, from_pandas=True)
    
    In [14]: arr2.null_count
    Out[14]: 3333334
    
    In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
    1 loop, best of 3: 228 ms per loop
    ```
    
    When the data is contiguous, it is always zero-copy, but when 
`from_pandas=True` and no null mask is passed, then a null bitmap is 
constructed and populated.
    
    This also permits sequence reads into integers smaller than int64:
    
    ```
    In [17]: pa.array([1, 2, 3, 4], type='i1')
    Out[17]:
    <pyarrow.lib.Int8Array object at 0x7ffa1c1c65e8>
    [
      1,
      2,
      3,
      4
    ]
    ```
    
    Oh, I also added NumPy-like string type aliases:
    
    ```
    In [18]: pa.int32() == 'i4'
    Out[18]: True
    ```
    
    Author: Wes McKinney <wes.mckin...@twosigma.com>
    
    Closes #1146 from wesm/expand-py-array-method and squashes the following 
commits:
    
    1570e525 [Wes McKinney] Code review comments
    d3bbb3c3 [Wes McKinney] Handle type aliases in cast, too
    797f0151 [Wes McKinney] Allow null checking to be skipped with 
from_pandas=False in pyarrow.array
    f2802fc7 [Wes McKinney] Cleaner codepath for numpy->arrow conversions
    587c575a [Wes McKinney] Add direct types sequence converters for more data 
types
    cf40b767 [Wes McKinney] Add type aliases, some unit tests
    7b530e4b [Wes McKinney] Consolidate both sequence and ndarray/Series/Index 
conversion in pyarrow.Array

commit 4e0f799e72048587715cb50abdd8d239f3d46d13
Author: Rene Sugar <rene.su...@gmail.com>
Date:   2017-09-30T16:42:37Z

    ARROW-1626 Add make targets to run the inter-procedural static analys…
    
    …is tool called infer
    
    Author: Rene Sugar <rene.su...@gmail.com>
    
    Closes #1149 from renesugar/infer and squashes the following commits:
    
    8591b5ff [Rene Sugar] ARROW-1626 Add make targets to run the 
inter-procedural static analysis tool called infer

commit 84e5e02fbf412c979387b0a53b0ad0c6d5c5e790
Author: Wes McKinney <wes.mckin...@twosigma.com>
Date:   2017-09-30T18:25:04Z

    ARROW-1624: [C++] Fix build on LLVM 4.0, remove some clang warning 
suppressions
    
    I'm going to quickly make a pass through later today and see how many of 
these warning suppressions I can remove
    
    Author: Wes McKinney <wes.mckin...@twosigma.com>
    
    Closes #1148 from wesm/warning-fixes and squashes the following commits:
    
    e930152a [Wes McKinney] Only build compute modules if -DARROW_COMPUTE=ON
    d6ca7ac0 [Wes McKinney] Slight refactor of CMakeLists.txt to move Arrow 
library setup to src/arrow
    e2c61a2e [Wes McKinney] Use -Wno-unknown-warning-option
    3d2d7265 [Wes McKinney] Fix travis CI script
    6d0d4117 [Wes McKinney] Use BUILD_WARNING_LEVEL in Travis CI
    1bec4a76 [Wes McKinney] Fix some more compiler warnings
    cae05fb5 [Wes McKinney] Fix documentation compiler warnings
    f76b2b93 [Wes McKinney] Fix a bunch of documentation warnings
    ac54e2df [Wes McKinney] Remove some clang warning suppressions, fix warnings
    6935b8c8 [Wes McKinney] Fix compiler warnings with clang-4.0

commit 0c8b861f93884f2868eb631d8fceee3a8b8905ec
Author: Wes McKinney <wes.mckin...@twosigma.com>
Date:   2017-10-02T00:32:53Z

    ARROW-1629: [C++] Add miscellaneous DCHECKs and minor changes based on 
infer tool output
    
    This was an interesting journey through some esoterica. I went through all 
the warnings/errors that infer (fbinfer.com) outputs and made changes if it 
seemed warranted. Some of the checks might be overkill.
    
    See https://gist.github.com/wesm/fc6809e4f4aaef3ecfeb21b8123627bc for a 
summary of actions on each warning
    
    Most of the errors that Infer wasn't happy about were already addressed by 
DCHECKs. It was useful to go through all these cases -- in nearly all cases 
the null references are impossible or would be the result of an error on the 
part of the application programmer. For example: we do not do array bounds 
checking in most cases in production builds, but these bounds checks are 
included in debug builds to assist with catching bugs caused by improper use 
by application developers.
    
    As a matter of convention, we should not use error `Status` to do parameter 
validation or asserting pre-conditions that are the responsibility of the 
library user. If parameter validation is required in binding code (e.g. 
Python), then this validation should happen in the binding layer, not in the 
core C++ library.
    
    There are some other cases where we have a `std::shared_ptr<T>` out 
variable with code like:
    
    ```
    RETURN_NOT_OK(Foo(..., &out));
    out->Method(...);
    ```
    
    Here, infer complains that `out` could contain a null pointer, but our 
contract with developers is that if `Foo` returns successfully, then `out` is 
non-null.
    
    Interestingly, infer doesn't like some stack variables that are bound in 
C++11 lambda expressions. I noted these in the gist with `LAMBDA`.
    
    Author: Wes McKinney <wes.mckin...@twosigma.com>
    
    Closes #1151 from wesm/fix-infer-issues and squashes the following commits:
    
    f285be95 [Wes McKinney] Restore code paths for empty chunked arrays for 
backwards compat
    5aa86ce2 [Wes McKinney] More DCHECK esoterica / tweaks based on infer report
    22c5d361 [Wes McKinney] Address a couple more infer warnings
    75131a6b [Wes McKinney] Some more minor infer fixes
    5ff3e3a5 [Wes McKinney] Compilation fix
    05316ce4 [Wes McKinney] Fix miscellaneous things that infer does not like. 
Make some Python helper functions internal / non-exported

commit a29b0db252f439ff472600c9332006266f601591
Author: Ofek Lev <ofekmeis...@gmail.com>
Date:   2017-09-26T17:11:49Z

    [Python] Update README.md to reflect that wheels are available on all 
platforms
    
    Close #1136
    
    Change-Id: Ib9261972fb720df351931902a1666f23b0a0132f

commit 49e02d27227332b06528816bbf73e434a4e1ebcb
Author: Philipp Moritz <pcmor...@gmail.com>
Date:   2017-10-02T12:09:47Z

    ARROW-1625: [Serialization] Support OrderedDict and defaultdict 
serialization
    
    This PR adds support for OrderedDict and defaultdict using custom 
serialization handlers.
    
    Author: Philipp Moritz <pcmor...@gmail.com>
    
    Closes #1152 from pcmoritz/pydict-exact2 and squashes the following commits:
    
    431e0272 [Philipp Moritz] make cloudpickle optional
    052b1aa9 [Philipp Moritz] I'd prefer this not to be a runtime dependency
    db19ab9b [Philipp Moritz] add tests
    799d983e [Philipp Moritz] do not interpret OrderedDict as dict

commit cf738f57e408333db960e0632524437eb7c62d7f
Author: siddharth <siddha...@dremio.com>
Date:   2017-10-04T17:46:06Z

    ARROW-1473: initial prototype hierarchy

----


> [JAVA] Create Prototype Code Hierarchy (alt A)
> ----------------------------------------------
>
>                 Key: ARROW-1473
>                 URL: https://issues.apache.org/jira/browse/ARROW-1473
>             Project: Apache Arrow
>          Issue Type: Sub-task
>            Reporter: Jacques Nadeau
>              Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
