[jira] [Assigned] (ARROW-6839) [Java] access File Footer custom_metadata

2019-10-09 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-6839:
-

Assignee: Ji Liu

> [Java] access File Footer custom_metadata
> -
>
> Key: ARROW-6839
> URL: https://issues.apache.org/jira/browse/ARROW-6839
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: John Muehlhausen
>Assignee: Ji Liu
>Priority: Minor
>
> Access custom_metadata from ARROW-6836



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6850) [Java] Jdbc converter support Null type

2019-10-11 Thread Ji Liu (Jira)
Ji Liu created ARROW-6850:
-

 Summary: [Java] Jdbc converter support Null type
 Key: ARROW-6850
 URL: https://issues.apache.org/jira/browse/ARROW-6850
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


java.sql.Types.NULL is not supported yet because the Java code previously had 
no NullVector.

This can be implemented once ARROW-1638 (IPC roundtrip for null type) is merged.





[jira] [Created] (ARROW-6853) [Java] Support vector and dictionary encoder use different hasher for calculating hashCode

2019-10-11 Thread Ji Liu (Jira)
Ji Liu created ARROW-6853:
-

 Summary: [Java] Support vector and dictionary encoder use 
different hasher for calculating hashCode
 Key: ARROW-6853
 URL: https://issues.apache.org/jira/browse/ARROW-6853
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


The Hasher interface was introduced in ARROW-5898 and now has two 
implementations ({{MurmurHasher and }}{{SimpleHasher}}); more may be added in 
the future.

Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} only use 
{{SimpleHasher}} to calculate hash codes. This issue lets them use a different 
hasher, including a user-defined one, to suit their own use cases.
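
A minimal sketch of the idea, not Arrow's actual API: the {{ByteHasher}} 
interface, its two implementations and {{HashingComponent}} below are 
hypothetical stand-ins. They only illustrate a component that receives its 
hasher from the caller instead of hard-coding one.

```java
// Hypothetical stand-in for a hasher abstraction; this is NOT the Arrow Hasher interface.
interface ByteHasher {
  int hash(byte[] data);
}

// Simple polynomial hash, used as the default in this sketch.
final class PolyHasher implements ByteHasher {
  @Override
  public int hash(byte[] data) {
    int h = 0;
    for (byte b : data) {
      h = 31 * h + b;
    }
    return h;
  }
}

// A user-defined alternative: 32-bit FNV-1a.
final class Fnv1aHasher implements ByteHasher {
  @Override
  public int hash(byte[] data) {
    int h = 0x811c9dc5;          // FNV offset basis
    for (byte b : data) {
      h ^= (b & 0xff);
      h *= 0x01000193;           // FNV prime
    }
    return h;
  }
}

// A component that accepts any hasher rather than fixing one internally.
final class HashingComponent {
  private final ByteHasher hasher;

  HashingComponent(ByteHasher hasher) {
    this.hasher = hasher;
  }

  // The no-argument constructor keeps the previous default behaviour.
  HashingComponent() {
    this(new PolyHasher());
  }

  int bucketFor(byte[] value, int bucketCount) {
    return Math.floorMod(hasher.hash(value), bucketCount);
  }
}
```

Keeping a default constructor means existing callers see no change, while new 
callers can pass {{new Fnv1aHasher()}} or their own implementation.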





[jira] [Updated] (ARROW-6853) [Java] Support vector and dictionary encoder use different hasher for calculating hashCode

2019-10-11 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6853:
--
Description: 
The Hasher interface was introduced in ARROW-5898 and now has two 
implementations ({{MurmurHasher and SimpleHasher}}); more may be added in the 
future.

Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} only use 
{{SimpleHasher}} to calculate hash codes. This issue lets them use a different 
hasher, including a user-defined one, to suit their own use cases.

  was:
The Hasher interface was introduced in ARROW-5898 and now has two 
implementations ({{MurmurHasher and }}{{SimpleHasher}}); more may be added in 
the future.

Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} only use 
{{SimpleHasher}} to calculate hash codes. This issue lets them use a different 
hasher, including a user-defined one, to suit their own use cases.


> [Java] Support vector and dictionary encoder use different hasher for 
> calculating hashCode
> --
>
> Key: ARROW-6853
> URL: https://issues.apache.org/jira/browse/ARROW-6853
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> The Hasher interface was introduced in ARROW-5898 and now has two 
> implementations ({{MurmurHasher and SimpleHasher}}); more may be added in 
> the future.
> Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} only use 
> {{SimpleHasher}} to calculate hash codes. This issue lets them use a 
> different hasher, including a user-defined one, to suit their own use cases.





[jira] [Resolved] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-10-12 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-6661.
---
Fix Version/s: 0.15.1
   Resolution: Fixed

Issue resolved in [https://github.com/apache/arrow/pull/5470]

> [Java] Implement APIs like slice to enhance VectorSchemaRoot
> 
>
> Key: ARROW-6661
> URL: https://issues.apache.org/jira/browse/ARROW-6661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the Java implementation has no APIs like slice for record batches, 
> unlike C++/Python.
> This issue is to implement slice/getVector/addVector/removeVector.





[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948
 ] 

Ji Liu edited comment on ARROW-6464 at 10/12/19 7:39 AM:
-

Issue resolved in

[https://github.com/apache/arrow/pull/5293]


was (Author: tianchen92):
Issue resolve in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} actually uses 
> {{copyValueSafe}}, which copies memory; we should use the slice API instead.
> Meanwhile, {{splitAndTransfer}} in all classes should perform the index check 
> at the beginning.





[jira] [Resolved] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-6464.
---
Fix Version/s: 0.15.1
   Resolution: Fixed

Issue resolved in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} actually uses 
> {{copyValueSafe}}, which copies memory; we should use the slice API instead.
> Meanwhile, {{splitAndTransfer}} in all classes should perform the index check 
> at the beginning.





[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948
 ] 

Ji Liu edited comment on ARROW-6464 at 10/12/19 7:40 AM:
-

Issue resolved by pull request 5293

[https://github.com/apache/arrow/pull/5293]


was (Author: tianchen92):
Issue resolved in

[https://github.com/apache/arrow/pull/5293]

> [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
> ---
>
> Key: ARROW-6464
> URL: https://issues.apache.org/jira/browse/ARROW-6464
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently {{FixedSizeListVector#splitAndTransfer}} actually uses 
> {{copyValueSafe}}, which copies memory; we should use the slice API instead.
> Meanwhile, {{splitAndTransfer}} in all classes should perform the index check 
> at the beginning.





[jira] [Comment Edited] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot

2019-10-12 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949939#comment-16949939
 ] 

Ji Liu edited comment on ARROW-6661 at 10/12/19 7:40 AM:
-

Issue resolved by pull request 5470

[https://github.com/apache/arrow/pull/5470]


was (Author: tianchen92):
Issue resolved in [https://github.com/apache/arrow/pull/5470]

> [Java] Implement APIs like slice to enhance VectorSchemaRoot
> 
>
> Key: ARROW-6661
> URL: https://issues.apache.org/jira/browse/ARROW-6661
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the Java implementation has no APIs like slice for record batches, 
> unlike C++/Python.
> This issue is to implement slice/getVector/addVector/removeVector.





[jira] [Created] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests

2019-10-13 Thread Ji Liu (Jira)
Ji Liu created ARROW-6871:
-

 Summary: [Java] Enhance TransferPair related parameters check and 
tests
 Key: ARROW-6871
 URL: https://issues.apache.org/jira/browse/ARROW-6871
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


{{TransferPair}}-related parameter checks in different classes have potential 
problems:

i. {{copyValueSafe}} does not check the from index; if from > valueCount, no 
error is raised.

ii. {{splitAndTransferPair}} has no index checks in classes like 
{{VarCharVector}}.

iii. The {{splitAndTransferPair}} index check in classes like UnionVector is 
not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); 
it should check the parameters separately.

iv. Some assert usages should be replaced with {{Preconditions}}.

v. More unit tests should be added to cover corner cases.
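
To illustrate item iii, here is a hedged sketch of why the combined check alone 
is not enough. The class and method names are hypothetical and this is not the 
code from any Arrow vector; the exact bounds chosen (for example, allowing 
startIndex == valueCount with length 0) are an assumption.

```java
// Hypothetical helper, for illustration only; not taken from Arrow.
final class RangeChecks {
  private RangeChecks() {}

  // The combined check on its own, as quoted in item iii.
  static boolean combinedOnly(int startIndex, int length, int valueCount) {
    return startIndex + length <= valueCount;
  }

  // Checks each parameter separately and avoids int overflow in the sum.
  static void checkRange(int startIndex, int length, int valueCount) {
    if (startIndex < 0 || startIndex > valueCount) {
      throw new IllegalArgumentException(
          "Invalid startIndex: " + startIndex + ", valueCount: " + valueCount);
    }
    // startIndex is within [0, valueCount], so the subtraction cannot overflow.
    if (length < 0 || length > valueCount - startIndex) {
      throw new IllegalArgumentException(
          "Invalid length: " + length + ", startIndex: " + startIndex
              + ", valueCount: " + valueCount);
    }
  }
}
```

With valueCount = 10, the combined check accepts both (startIndex = -5, 
length = 3) and (startIndex = Integer.MAX_VALUE, length = 1), the latter 
because the sum overflows to a negative number. The separate checks reject 
both.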





[jira] [Updated] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests

2019-10-13 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6871:
--
Description: 
{{TransferPair}}-related parameter checks in different classes have potential 
problems:

i. {{copyValueSafe}} does not check the from index; if from > valueCount, no 
error is raised.

ii. {{splitAndTransfer}} has no index checks in classes like {{VarCharVector}}.

iii. The {{splitAndTransfer}} index check in classes like UnionVector is not 
correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it 
should check the parameters separately.

iv. Some assert usages should be replaced with {{Preconditions}}.

v. More unit tests should be added to cover corner cases.

  was:
{{TransferPair}}-related parameter checks in different classes have potential 
problems:

i. {{copyValueSafe}} does not check the from index; if from > valueCount, no 
error is raised.

ii. {{splitAndTransferPair}} has no index checks in classes like 
{{VarCharVector}}.

iii. The {{splitAndTransferPair}} index check in classes like UnionVector is 
not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); 
it should check the parameters separately.

iv. Some assert usages should be replaced with {{Preconditions}}.

v. More unit tests should be added to cover corner cases.


> [Java] Enhance TransferPair related parameters check and tests
> --
>
> Key: ARROW-6871
> URL: https://issues.apache.org/jira/browse/ARROW-6871
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> {{TransferPair}}-related parameter checks in different classes have 
> potential problems:
> i. {{copyValueSafe}} does not check the from index; if from > valueCount, no 
> error is raised.
> ii. {{splitAndTransfer}} has no index checks in classes like {{VarCharVector}}.
> iii. The {{splitAndTransfer}} index check in classes like UnionVector is not 
> correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it 
> should check the parameters separately.
> iv. Some assert usages should be replaced with {{Preconditions}}.
> v. More unit tests should be added to cover corner cases.





[jira] [Commented] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests

2019-10-14 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950754#comment-16950754
 ] 

Ji Liu commented on ARROW-6871:
---

Thanks for the reminder. I will also add a benchmark; if there is no 
significant regression, the parameter checks should be added or corrected to 
avoid potential problems.

> [Java] Enhance TransferPair related parameters check and tests
> --
>
> Key: ARROW-6871
> URL: https://issues.apache.org/jira/browse/ARROW-6871
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> {{TransferPair}}-related parameter checks in different classes have 
> potential problems:
> i. {{copyValueSafe}} does not check the from index; if from > valueCount, no 
> error is raised.
> ii. {{splitAndTransfer}} has no index checks in classes like {{VarCharVector}}.
> iii. The {{splitAndTransfer}} index check in classes like UnionVector is not 
> correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it 
> should check the parameters separately.
> iv. Some assert usages should be replaced with {{Preconditions}}.
> v. More unit tests should be added to cover corner cases.





[jira] [Assigned] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2019-10-14 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-6887:
-

Assignee: Ji Liu

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>
> We should create documentation for the library that demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.





[jira] [Commented] (ARROW-2892) [Plasma] Implement interface to get Java arrow objects from Plasma

2019-10-14 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951637#comment-16951637
 ] 

Ji Liu commented on ARROW-2892:
---

[~emkornfi...@gmail.com] I'll take a closer look later.

> [Plasma] Implement interface to get Java arrow objects from Plasma
> --
>
> Key: ARROW-2892
> URL: https://issues.apache.org/jira/browse/ARROW-2892
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma, Java
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently we have a low level interface to access bytes stored in plasma from 
> Java, using the JNI: [https://github.com/apache/arrow/pull/2065/]
>  
> As a followup, we should implement reading (and writing) Java arrow objects 
> from plasma, if possible using zero-copy.
>  





[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2019-10-14 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951647#comment-16951647
 ] 

Ji Liu commented on ARROW-6887:
---

Where should we place the documentation?

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>
> We should create documentation (in restructured text) for the library that 
> demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.





[jira] [Created] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEqualsVisitor StackOverFlow

2019-10-15 Thread Ji Liu (Jira)
Ji Liu created ARROW-6889:
-

 Summary: [Java] ComplexCopier enable FixedSizeList type & fix 
RangeEqualsVisitor StackOverFlow
 Key: ARROW-6889
 URL: https://issues.apache.org/jira/browse/ARROW-6889
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


i. Enable {{ComplexCopier}} to copy {{FixedSizeListVector}} values, and add 
related tests.

ii. Fix the StackOverFlow in {{RangeEqualsVisitor#compareFixedSizeListVectors}}.





[jira] [Created] (ARROW-6898) [Java] Fix potential memory leak in ArrowWriter and several test classes

2019-10-15 Thread Ji Liu (Jira)
Ji Liu created ARROW-6898:
-

 Summary: [Java] Fix potential memory leak in ArrowWriter and 
several test classes
 Key: ARROW-6898
 URL: https://issues.apache.org/jira/browse/ARROW-6898
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


ARROW-6040 fixed the problem that dictionary entries were required in IPC 
streams even when empty; dictionaries are now written only when there is at 
least one batch. As a result, if we write an empty stream and invoke 
ArrowWriter#close, the dictionaries are never closed, which leads to a memory 
leak (they are only closed after the write operation). This is really hard to 
debug; the problem was found by 
{{TestArrowReaderWriter#testEmptyStreamInStreamingIPC}} when I tried to close 
the allocator after the test.

Besides, several test classes have potential memory leaks because they do not 
close the allocator, vectors, buffers, etc.
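
The leak pattern described above can be sketched as follows. This is a 
simplified model under stated assumptions, not the ArrowWriter code: 
{{CountingAllocator}} and {{SketchWriter}} are hypothetical classes that only 
mimic "a resource released after the first write" and "an allocator that 
complains about outstanding buffers on close".

```java
// Hypothetical allocator that counts outstanding buffers, mimicking a leak check on close.
final class CountingAllocator implements AutoCloseable {
  private int outstanding;

  // A buffer whose close() is idempotent.
  final class Buffer implements AutoCloseable {
    private boolean released;

    @Override
    public void close() {
      if (!released) {
        released = true;
        outstanding--;
      }
    }
  }

  Buffer allocate() {
    outstanding++;
    return new Buffer();
  }

  int outstanding() {
    return outstanding;
  }

  @Override
  public void close() {
    if (outstanding != 0) {
      throw new IllegalStateException("leak: " + outstanding + " buffer(s) not released");
    }
  }
}

// Hypothetical writer that holds a dictionary buffer until the first batch is written.
final class SketchWriter implements AutoCloseable {
  private final CountingAllocator.Buffer dictionary;
  private boolean dictionaryWritten;

  SketchWriter(CountingAllocator allocator) {
    this.dictionary = allocator.allocate();
  }

  void writeBatch() {
    if (!dictionaryWritten) {
      dictionaryWritten = true;
      dictionary.close(); // released right after it has been written
    }
  }

  @Override
  public void close() {
    // Release unconditionally: if no batch was ever written, nothing else frees it.
    dictionary.close();
  }
}
```

If {{SketchWriter#close}} did nothing, an empty stream (no call to 
{{writeBatch}}) would leave one buffer outstanding and closing the allocator 
would fail, which is how this kind of leak surfaces in a test.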





[jira] [Created] (ARROW-6912) [Java] Extract a common base class for avro converter consumers

2019-10-16 Thread Ji Liu (Jira)
Ji Liu created ARROW-6912:
-

 Summary: [Java] Extract a common base class for avro converter 
consumers
 Key: ARROW-6912
 URL: https://issues.apache.org/jira/browse/ARROW-6912
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently the Avro converter consumers duplicate some variables and methods; 
this duplication could be eliminated by extracting a common base class.





[jira] [Updated] (ARROW-6912) [Java] Extract a common base class for avro converter consumers

2019-10-17 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6912:
--
Parent: ARROW-5845
Issue Type: Sub-task  (was: Improvement)

> [Java] Extract a common base class for avro converter consumers
> ---
>
> Key: ARROW-6912
> URL: https://issues.apache.org/jira/browse/ARROW-6912
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> Currently the Avro converter consumers duplicate some variables and methods; 
> this duplication could be eliminated by extracting a common base class.





[jira] [Assigned] (ARROW-6930) [Java] Create static factory methods for common array types of testing.

2019-10-17 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-6930:
-

Assignee: Ji Liu

> [Java] Create static factory methods for common array types of testing.
> ---
>
> Key: ARROW-6930
> URL: https://issues.apache.org/jira/browse/ARROW-6930
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Minor
>
> There is a lot of verbosity in the construction of Arrays for testing 
> purposes (multiple lines of setSafe(...) or set(...)).  We should start adding 
> some static factory methods to make test setup clearer and more concise.  A 
> strawman proposal for BigIntVector might look like:
>  
> static BigIntVector create(String name, BufferAllocator allocator, Long... values)
>  
> Usage would be something like:
>  
> try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, null, 456L);
>      BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 1L, null, 0L)) {
>    output = doSomethingWith(input);
>    assertThat(output).isEqualTo(expected);
> }





[jira] [Assigned] (ARROW-6931) [Java] Consider starting to use Google Truth Fluent Assertions library

2019-10-17 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-6931:
-

Assignee: Ji Liu

> [Java] Consider starting to use Google Truth Fluent Assertions library
> --
>
> Key: ARROW-6931
> URL: https://issues.apache.org/jira/browse/ARROW-6931
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>
> This can offer more readable asserts than the limited JUnit assertions.





[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-21 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956607#comment-16956607
 ] 

Ji Liu commented on ARROW-6896:
---

I was a little confused about VectorSchemaRoot before (why not just call it 
RecordBatch?), and while writing documentation recently I found that it differs 
a little from other implementations. For example, for IPC, the Java reader 
always holds the same VectorSchemaRoot and updates it on every call to 
loadNextBatch, whereas the Python side uses different batches in the 
writer/reader. From this perspective, I think treating VectorSchemaRoot as 
something other than a record batch is reasonable. I just wonder why the 
implementation is different in Java?

The addColumn/removeColumn API was introduced by my recent PR (after 0.15), in 
which I regarded VectorSchemaRoot as a 'record batch'. If we reach a consensus 
and want to fix it, I think it is better to include the fix in 0.15.1 so that 
the mistake is not exposed to users.

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 
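
The four steps in the quoted scenario can be reproduced with a small 
self-contained model. {{SketchVector}} and {{SketchRoot}} are hypothetical 
stand-ins, not the Arrow classes; the sketch assumes that addVector returns a 
new root holding the same vector instances as the old one.

```java
// Hypothetical stand-in for a vector that owns memory.
final class SketchVector implements AutoCloseable {
  private boolean closed;

  int get() {
    if (closed) {
      throw new IllegalStateException("vector already released");
    }
    return 42;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical stand-in for a root that manages the life cycle of its vectors.
final class SketchRoot implements AutoCloseable {
  private final java.util.List<SketchVector> vectors;

  SketchRoot(java.util.List<SketchVector> vectors) {
    this.vectors = vectors;
  }

  // Returns a new root that SHARES the existing vectors with this root.
  SketchRoot addVector(SketchVector vector) {
    java.util.List<SketchVector> copy = new java.util.ArrayList<>(vectors);
    copy.add(vector);
    return new SketchRoot(copy);
  }

  SketchVector getVector(int index) {
    return vectors.get(index);
  }

  @Override
  public void close() {
    for (SketchVector v : vectors) {
      v.close();
    }
  }
}
```

After {{oldBatch.close()}}, reading the shared vector through the new root 
throws, while the vector added in step 2 is still readable.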





[jira] [Created] (ARROW-7021) [Java] UnionFixedSizeListWriter decimal type should check writer index

2019-10-29 Thread Ji Liu (Jira)
Ji Liu created ARROW-7021:
-

 Summary: [Java] UnionFixedSizeListWriter decimal type should check 
writer index
 Key: ARROW-7021
 URL: https://issues.apache.org/jira/browse/ARROW-7021
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


{{UnionFixedSizeListWriter}} should check the writer index for the decimal type 
(just as for other types) to ensure that the values written do not exceed 
listSize.

Otherwise, the writer may quietly continue to write data into its underlying 
vector even when writer.idx() > listSize * index.
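
A simplified model of such a check, with hypothetical names and a plain long 
array in place of the real underlying vector; it is a sketch of the idea and 
not the {{UnionFixedSizeListWriter}} implementation.

```java
// Hypothetical writer over fixed-size lists stored back to back in one array.
final class SketchFixedSizeListWriter {
  private final int listSize;
  private final long[] values;
  private int listIndex; // which list is currently being written
  private int written;   // elements written into the current list so far

  SketchFixedSizeListWriter(int listSize, int listCount) {
    this.listSize = listSize;
    this.values = new long[listSize * listCount];
  }

  void startList(int index) {
    this.listIndex = index;
    this.written = 0;
  }

  void write(long value) {
    // Without this check, an extra value would silently land in the next list's slot.
    if (written >= listSize) {
      throw new IllegalStateException(
          "values at index " + listIndex + " exceed listSize " + listSize);
    }
    values[listIndex * listSize + written] = value;
    written++;
  }

  long get(int index, int offset) {
    return values[index * listSize + offset];
  }
}
```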





[jira] [Updated] (ARROW-6930) [Java] Create utility class for common array types of testing

2019-10-29 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6930:
--
Summary: [Java] Create utility class for common array types of testing  
(was: [Java] Create static factory methods for common array types of testing.)

> [Java] Create utility class for common array types of testing
> -
>
> Key: ARROW-6930
> URL: https://issues.apache.org/jira/browse/ARROW-6930
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> There is a lot of verbosity in the construction of Arrays for testing 
> purposes (multiple lines of setSafe(...) or set(...)).  We should start adding 
> some static factory methods to make test setup clearer and more concise.  A 
> strawman proposal for BigIntVector might look like:
>  
> static BigIntVector create(String name, BufferAllocator allocator, Long... values)
>  
> Usage would be something like:
>  
> try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, null, 456L);
>      BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 1L, null, 0L)) {
>    output = doSomethingWith(input);
>    assertThat(output).isEqualTo(expected);
> }





[jira] [Created] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader

2019-10-30 Thread Ji Liu (Jira)
Ji Liu created ARROW-7026:
-

 Summary: [Java] Remove assertions in 
MessageSerializer/vector/writer/reader
 Key: ARROW-7026
 URL: https://issues.apache.org/jira/browse/ARROW-7026
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently assertions exist in many classes, such as 
{{MessageSerializer/JsonReader/JsonWriter/ListVector}}.

i. If the JVM arguments that enable assertions are not specified, these checks 
will be skipped, which can lead to potential problems.

ii. Java errors produced by failed assertions are not caught by traditional 
catch clauses.

To fix this, use {{Preconditions}} instead.
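
Both points can be seen in a small sketch that uses only the JDK. The class 
and method names are hypothetical, and the explicit if/throw stands in for a 
{{Preconditions}}-style call.

```java
// Hypothetical example contrasting an assert with an always-on argument check.
final class AssertVsCheck {
  private AssertVsCheck() {}

  // The assert is skipped entirely unless the JVM runs with -ea / -enableassertions.
  // When enabled and violated, it throws AssertionError, which is an Error.
  static int withAssert(int index, int size) {
    assert index >= 0 && index < size : "index out of range: " + index;
    return index;
  }

  // Always evaluated; a violation throws IllegalArgumentException, which is an Exception.
  static int withCheck(int index, int size) {
    if (index < 0 || index >= size) {
      throw new IllegalArgumentException("index out of range: " + index);
    }
    return index;
  }
}
```

A caller that wraps {{withAssert}} in catch (Exception e) will not catch a 
failed assertion, because AssertionError extends Error rather than Exception; 
and without -ea the invalid index is simply returned.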





[jira] [Commented] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader

2019-10-30 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962768#comment-16962768
 ] 

Ji Liu commented on ARROW-7026:
---

cc [~jnadeau] [~emkornfi...@gmail.com]

> [Java] Remove assertions in MessageSerializer/vector/writer/reader
> --
>
> Key: ARROW-7026
> URL: https://issues.apache.org/jira/browse/ARROW-7026
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> Currently assertions exist in many classes, such as 
> {{MessageSerializer/JsonReader/JsonWriter/ListVector}}.
> i. If the JVM arguments that enable assertions are not specified, these 
> checks will be skipped, which can lead to potential problems.
> ii. Java errors produced by failed assertions are not caught by traditional 
> catch clauses.
> To fix this, use {{Preconditions}} instead.





[jira] [Commented] (ARROW-6931) [Java] Consider starting to use Google Truth Fluent Assertions library

2019-10-30 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963584#comment-16963584
 ] 

Ji Liu commented on ARROW-6931:
---

Should we add the Google Truth Fluent Assertions library as a dependency, so 
that we can choose which one to use in the tests?

Also, does this need a discussion on the ML?

> [Java] Consider starting to use Google Truth Fluent Assertions library
> --
>
> Key: ARROW-6931
> URL: https://issues.apache.org/jira/browse/ARROW-6931
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>
> This can offer more readable asserts than the limited JUnit assertions.





[jira] [Commented] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader

2019-11-04 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966597#comment-16966597
 ] 

Ji Liu commented on ARROW-7026:
---

Besides, some assertions are in the hot path (e.g. the {{VarCharVector}} set 
and get APIs). For those usages, should we use Preconditions or just leave 
them as they are to avoid a potential performance regression?

> [Java] Remove assertions in MessageSerializer/vector/writer/reader
> --
>
> Key: ARROW-7026
> URL: https://issues.apache.org/jira/browse/ARROW-7026
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>
> Currently assertions exist in many classes, such as 
> {{MessageSerializer/JsonReader/JsonWriter/ListVector}}.
> i. If the JVM arguments that enable assertions are not specified, these 
> checks will be skipped, which can lead to potential problems.
> ii. Java errors produced by failed assertions are not caught by traditional 
> catch clauses.
> To fix this, use {{Preconditions}} instead.





[jira] [Created] (ARROW-7152) [Java] Delete useless class DiffFunction

2019-11-12 Thread Ji Liu (Jira)
Ji Liu created ARROW-7152:
-

 Summary: [Java] Delete useless class DiffFunction
 Key: ARROW-7152
 URL: https://issues.apache.org/jira/browse/ARROW-7152
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


{{DiffFunction}} was used in the initial implementation of visitors. Since the 
visitor logic has been refactored, this class is no longer useful.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only

2019-11-13 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6930:
--
Issue Type: New Feature  (was: Improvement)

> [Java] Create utility class for populating vector values used for test 
> purpose only
> ---
>
> Key: ARROW-6930
> URL: https://issues.apache.org/jira/browse/ARROW-6930
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> There is a lot of verbosity in the construction of Arrays for testing 
> purposes (multiple lines of setSafe(...) or set(...)).  We should start adding 
> some static factory methods to make test setup clearer  and more concise.  A 
> strawman proposal for BigIntVector might look like:
>  
> static BigIntVector create(String name, BufferAllocator allocator, Long... 
> values).
>  
> Usage would be something like:
>  
> try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, 
> null, 456L);
>       BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 
> 1L, null, 0L)) {
>    output = doSomethingWith(input);
>    assertThat(output).isEqualTo(expected);
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only

2019-11-13 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6930:
--
Summary: [Java] Create utility class for populating vector values used for 
test purpose only  (was: [Java] Create utility class for common array types of 
testing)

> [Java] Create utility class for populating vector values used for test 
> purpose only
> ---
>
> Key: ARROW-6930
> URL: https://issues.apache.org/jira/browse/ARROW-6930
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> There is a lot of verbosity in the construction of Arrays for testing 
> purposes (multiple lines of setSafe(...) or set(...)).  We should start adding 
> some static factory methods to make test setup clearer  and more concise.  A 
> strawman proposal for BigIntVector might look like:
>  
> static BigIntVector create(String name, BufferAllocator allocator, Long... 
> values).
>  
> Usage would be something like:
>  
> try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, 
> null, 456L);
>       BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 
> 1L, null, 0L)) {
>    output = doSomethingWith(input);
>    assertThat(output).isEqualTo(expected);
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only

2019-11-14 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6930:
--
Description: 
There is a lot of verbosity in the construction of Arrays for testing purposes 
(multiple lines of setSafe(...) or set(...)).
We should add a utility class to make test setup clearer and more 
concise. Note that this class should be located in the arrow-vector test package and 
can be used in other modules' tests by adding the dependency:

{{<dependency>}}
{{<groupId>org.apache.arrow</groupId>}}
{{<artifactId>arrow-vector</artifactId>}}
{{<version>${project.version}</version>}}
{{<classifier>tests</classifier>}}
{{<type>test-jar</type>}}
{{<scope>test</scope>}}
{{</dependency>}}

Usage would be something like:
{quote}try (IntVector vector = new IntVector("vector", allocator)) {
ValueVectorPopulator.setVector(vector, 1, 2, null, 4, 5);
output = doSomethingWith(vector);
assertThat(output).isEqualTo(expected);
}
{quote}

  was:
There is a lot of verbosity in the construction of Arrays for testing purposes 
(multiple lines of setSafe(...) or set(...).  We should start adding some 
static factory methods to make test setup clearer  and more concise.  A 
strawman proposal for BigIntVector might look like:

 

static BigIntVector create(String name, BufferAllocator allocator, Long... 
values).

 

Usage would be something like:

 

try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, 
null, 456L),

      BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 1L, 
null, 0L),) {

   output = doSomethingWith(input);

   assertThat(output).isEqualTo(expected);

}


> [Java] Create utility class for populating vector values used for test 
> purpose only
> ---
>
> Key: ARROW-6930
> URL: https://issues.apache.org/jira/browse/ARROW-6930
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> There is a lot of verbosity in the construction of Arrays for testing 
> purposes (multiple lines of setSafe(...) or set(...)).
> We should add a utility class to make test setup clearer and more 
> concise. Note that this class should be located in the arrow-vector test package and 
> can be used in other modules' tests by adding the dependency:
> {{<dependency>}}
> {{<groupId>org.apache.arrow</groupId>}}
> {{<artifactId>arrow-vector</artifactId>}}
> {{<version>${project.version}</version>}}
> {{<classifier>tests</classifier>}}
> {{<type>test-jar</type>}}
> {{<scope>test</scope>}}
> {{</dependency>}}
> Usage would be something like:
> {quote}try (IntVector vector = new IntVector("vector", allocator)) {
> ValueVectorPopulator.setVector(vector, 1, 2, null, 4, 5);
> output = doSomethingWith(vector);
> assertThat(output).isEqualTo(expected);
> }
> {quote}
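
A rough sketch of how such a varargs populator could work. {{SimpleIntVector}} and {{PopulatorSketch}} are hypothetical stand-ins written for this example; they are not Arrow's {{IntVector}} or the proposed {{ValueVectorPopulator}}, and only illustrate passing null for a null slot.

```java
// Hypothetical stand-in for a nullable int vector; not Arrow's IntVector.
final class SimpleIntVector {
  private int[] values = new int[0];
  private boolean[] valid = new boolean[0];

  void allocate(int count) {
    values = new int[count];
    valid = new boolean[count];
  }

  void set(int index, int value) {
    values[index] = value;
    valid[index] = true;
  }

  void setNull(int index) {
    valid[index] = false;
  }

  boolean isNull(int index) {
    return !valid[index];
  }

  int get(int index) {
    if (isNull(index)) {
      throw new IllegalStateException("value at " + index + " is null");
    }
    return values[index];
  }

  int getValueCount() {
    return values.length;
  }
}

final class PopulatorSketch {
  private PopulatorSketch() {}

  // Populates the vector from varargs; a null argument marks a null slot.
  static void setVector(SimpleIntVector vector, Integer... values) {
    vector.allocate(values.length);
    for (int i = 0; i < values.length; i++) {
      if (values[i] == null) {
        vector.setNull(i);
      } else {
        vector.set(i, values[i]);
      }
    }
  }
}
```

Using boxed {{Integer}} varargs is what allows {{null}} to appear in the argument list; a primitive {{int...}} signature could not express null slots.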



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2019-11-14 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974842#comment-16974842
 ] 

Ji Liu commented on ARROW-6887:
---

It seems the web content ([http://arrow.apache.org/docs/]) has not been updated yet. 
Should we do something to get these docs published? [~wesm] [~emkornfi...@gmail.com]

 

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> We should create documentation (in restructured text) for the library that 
> demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6019) [Java] Port Jdbc and Avro adapter to new directory

2019-11-14 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu closed ARROW-6019.
-
Resolution: Won't Do

> [Java] Port Jdbc and Avro adapter to new directory 
> ---
>
> Key: ARROW-6019
> URL: https://issues.apache.org/jira/browse/ARROW-6019
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> As discussed on the mailing list, adapters are different from native readers.
> This issue is used to track the following tasks:
> i. create a new “contrib” directory and move the Jdbc/Avro adapters to it.
> ii. provide more description.
> iii. change the orc reader's structure to “converter”
> cc [~emkornfi...@gmail.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-1175) [Java] Implement/test dictionary-encoded subfields

2019-11-14 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-1175.
---
Resolution: Fixed

> [Java] Implement/test dictionary-encoded subfields
> --
>
> Key: ARROW-1175
> URL: https://issues.apache.org/jira/browse/ARROW-1175
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Ji Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> We do not have any tests about types like:
> {code}
> List
> {code}
> cc [~julienledem] [~elahrvivaz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-6600) [Java] Implement dictionary-encoded subfields for Union type

2019-11-14 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu closed ARROW-6600.
-
Resolution: Later

> [Java] Implement dictionary-encoded subfields for Union type
> 
>
> Key: ARROW-6600
> URL: https://issues.apache.org/jira/browse/ARROW-6600
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Implement dictionary-encoded subfields for {{Union}} type. Each child vector 
> could be encodable or not.
>  
> Meanwhile, extract common logic into {{DictionaryEncoder}} and refactor 
> List subfield encoding to keep it consistent with the {{Struct/Union}} types.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2019-11-15 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974971#comment-16974971
 ] 

Ji Liu commented on ARROW-6887:
---

I see.

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> We should create documentation (in restructured text) for the library that 
> demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7259) [Java] Support subfield encoder use different hasher

2019-11-25 Thread Ji Liu (Jira)
Ji Liu created ARROW-7259:
-

 Summary: [Java] Support subfield encoder use different hasher
 Key: ARROW-7259
 URL: https://issues.apache.org/jira/browse/ARROW-7259
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently {{ListSubFieldEncoder/StructSubFieldEncoder}} use the default hasher for 
calculating hashCode.

This issue enables them to use a different hasher, or even a user-defined hasher, for 
their own use cases, just like {{DictionaryEncoder}} does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7264) [Java] RangeEqualsVisitor type check is not correct

2019-11-26 Thread Ji Liu (Jira)
Ji Liu created ARROW-7264:
-

 Summary: [Java] RangeEqualsVisitor type check is not correct
 Key: ARROW-7264
 URL: https://issues.apache.org/jira/browse/ARROW-7264
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Affects Versions: 0.15.1
Reporter: Ji Liu
Assignee: Ji Liu


Currently {{RangeEqualsVisitor}} generally checks the type only once and keeps the 
result to avoid repeated type checking, see
{code:java}
typeCompareResult = 
left.getField().getType().equals(right.getField().getType());
{code}
This only compares {{ArrowType}}, and for complex types this may cause 
unexpected behavior. For example, {{List}} and {{List}} with different child fields 
would be considered type-equal, because their child fields are not taken into account.

We should compare the Field here instead. To make this more extensible, we use 
{{TypeEqualsVisitor}} to compare Fields; this way, one can also choose whether 
to check names or metadata.

 

Also provide a test for ListVector to validate this change.
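
A minimal sketch of the difference between the two checks. {{SimpleField}} and {{TypeCheckSketch}} are hypothetical, simplified models written for this example; they are not Arrow's {{Field}}, {{ArrowType}} or {{TypeEqualsVisitor}}, and the type names "List", "Int" and "Utf8" are plain strings used only for illustration.

```java
import java.util.List;

// Hypothetical, simplified model of a field: a name, a type name and child fields.
final class SimpleField {
  final String name;
  final String type;
  final List<SimpleField> children;

  SimpleField(String name, String type, List<SimpleField> children) {
    this.name = name;
    this.type = type;
    this.children = children;
  }
}

final class TypeCheckSketch {
  private TypeCheckSketch() {}

  // Compares only the top-level type, so child fields are ignored.
  static boolean shallowTypeEquals(SimpleField left, SimpleField right) {
    return left.type.equals(right.type);
  }

  // Compares the type recursively; names are checked only when requested.
  static boolean fieldEquals(SimpleField left, SimpleField right, boolean checkName) {
    if (checkName && !left.name.equals(right.name)) {
      return false;
    }
    if (!left.type.equals(right.type)) {
      return false;
    }
    if (left.children.size() != right.children.size()) {
      return false;
    }
    for (int i = 0; i < left.children.size(); i++) {
      if (!fieldEquals(left.children.get(i), right.children.get(i), checkName)) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch two "List" fields with different children pass the shallow check but fail the recursive one, which is the behavior the issue asks for.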



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor StackOverFlow

2019-11-27 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu resolved ARROW-6889.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

> [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor 
> StackOverFlow
> 
>
> Key: ARROW-6889
> URL: https://issues.apache.org/jira/browse/ARROW-6889
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> i. Enable {{ComplexCopier}} to copy {{FixedSizeListVector}} values, and add related 
> tests
> ii. Fix {{RangeEqualsVisitor#compareFixedSizeListVectors}} StackOverFlow



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor StackOverFlow

2019-11-27 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983345#comment-16983345
 ] 

Ji Liu commented on ARROW-6889:
---

Issue resolved by pull request 5660:

[https://github.com/apache/arrow/pull/5660]

> [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor 
> StackOverFlow
> 
>
> Key: ARROW-6889
> URL: https://issues.apache.org/jira/browse/ARROW-6889
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> i. Enable {{ComplexCopier}} to copy {{FixedSizeListVector}} values, and add related 
> tests
> ii. Fix {{RangeEqualsVisitor#compareFixedSizeListVectors}} StackOverFlow



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7284) [Java] ensure java implementation meets clarified dictionary spec

2019-12-01 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-7284:
-

Assignee: Ji Liu

> [Java] ensure java implementation meets clarified dictionary spec
> -
>
> Key: ARROW-7284
> URL: https://issues.apache.org/jira/browse/ARROW-7284
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> see parent issue.
>  
> CC [~tianchen92]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7405) [Java] ListVector isEmpty API is incorrect

2019-12-16 Thread Ji Liu (Jira)
Ji Liu created ARROW-7405:
-

 Summary: [Java] ListVector isEmpty API is incorrect
 Key: ARROW-7405
 URL: https://issues.apache.org/jira/browse/ARROW-7405
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently the {{isEmpty}} API always returns false in 
{{BaseRepeatedValueVector}}, and its subclass {{ListVector}} does not override 
this method.

This leads to incorrect results. For example, a {{ListVector}} with data 
[1,2], null, [], [5,6] should get [false, false, true, false] with this API, 
but now it returns [false, false, false, false].
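
A minimal sketch of the offset comparison such an override could use. {{ListLayoutSketch}} is a hypothetical stand-in written for this example, not Arrow code; reporting a null slot as not empty simply follows the expected result stated above and is an assumption of the sketch.

```java
// Hypothetical, simplified list layout: offsets plus a validity flag per slot.
final class ListLayoutSketch {
  private final int[] offsets;   // length is slot count + 1
  private final boolean[] valid; // false marks a null slot

  ListLayoutSketch(int[] offsets, boolean[] valid) {
    this.offsets = offsets;
    this.valid = valid;
  }

  boolean isNull(int index) {
    return !valid[index];
  }

  // A null slot is reported as not empty (assumption, see lead-in);
  // a non-null slot is empty when its start and end offsets coincide.
  boolean isEmpty(int index) {
    if (isNull(index)) {
      return false;
    }
    return offsets[index] == offsets[index + 1];
  }
}
```

For the data [1,2], null, [], [5,6] the offsets would be 0, 2, 2, 2, 4 with the second slot marked null, so only the third slot has equal start and end offsets on a non-null entry.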



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7406) [Java] NonNullableStructVector#hashCode should pass hasher to child vectors

2019-12-16 Thread Ji Liu (Jira)
Ji Liu created ARROW-7406:
-

 Summary: [Java] NonNullableStructVector#hashCode should pass 
hasher to child vectors
 Key: ARROW-7406
 URL: https://issues.apache.org/jira/browse/ARROW-7406
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


This was introduced by ARROW-6866, which left the hasher parameter unused in 
hashCode(int index, {{ArrowBufHasher}} hasher), so the child vectors 
calculate hashCode using the default hasher, which is not correct. 

This issue should be fixed by passing hasher to child vector when calculating 
hashCode.
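
A minimal sketch of the intended pass-through. {{Hasher}}, {{ChildSketch}} and {{StructSketch}} are hypothetical types written for this example; they stand in for {{ArrowBufHasher}} and the real vectors, and the way the child hashes are combined here is an assumption, not Arrow's implementation.

```java
// Hypothetical hasher interface; stands in for ArrowBufHasher in this sketch.
interface Hasher {
  int hash(int value);
}

// Hypothetical child vector holding plain ints.
final class ChildSketch {
  private final int[] values;

  ChildSketch(int... values) {
    this.values = values;
  }

  // Hashes the value at index with the hasher supplied by the caller.
  int hashCode(int index, Hasher hasher) {
    return hasher.hash(values[index]);
  }
}

// Hypothetical struct-like vector made of child vectors.
final class StructSketch {
  private final ChildSketch[] children;

  StructSketch(ChildSketch... children) {
    this.children = children;
  }

  // Forwards the caller's hasher to every child instead of using a default one.
  int hashCode(int index, Hasher hasher) {
    int hash = 0;
    for (ChildSketch child : children) {
      hash = 31 * hash + child.hashCode(index, hasher);
    }
    return hash;
  }
}
```

Because the hasher reaches every child, two different hashers produce different struct hashes for the same row, which would not be the case if the children fell back to a fixed default.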



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7425) [Java] PromotableWriter support writing FixedSizeList type data

2019-12-18 Thread Ji Liu (Jira)
Ji Liu created ARROW-7425:
-

 Summary: [Java] PromotableWriter support writing FixedSizeList 
type data
 Key: ARROW-7425
 URL: https://issues.apache.org/jira/browse/ARROW-7425
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


We have introduced writer API for {{FixedSizeListVector}} via ARROW-6079, but 
{{PromotableWriter}}’s support for it is incomplete.

For example, using {{UnionListWriter}} we could simply write {{List}} 
type data, but for {{List}} or {{FixedSizeList}} 
it doesn’t work.

This issue is to enhance the {{PromotableWriter}} support for 
{{FixedSizeList}} type and add tests to verify the cases mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7467) [Java] ComplexCopier does incorrect copy for Map nullable info

2019-12-23 Thread Ji Liu (Jira)
Ji Liu created ARROW-7467:
-

 Summary: [Java] ComplexCopier does incorrect copy for Map nullable 
info
 Key: ARROW-7467
 URL: https://issues.apache.org/jira/browse/ARROW-7467
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


The {{MapVector}} and its 'value' vector are nullable, and its {{structVector}} 
and 'key' vector are non-nullable.

However, the {{MapVector}} generated by ComplexCopier has all nullable fields 
which is not correct.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7472) [Java] Fix some incorrect behavior in UnionListWriter

2019-12-24 Thread Ji Liu (Jira)
Ji Liu created ARROW-7472:
-

 Summary: [Java] Fix some incorrect behavior in UnionListWriter
 Key: ARROW-7472
 URL: https://issues.apache.org/jira/browse/ARROW-7472
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently the {{UnionListWriter/UnionFixedSizeListWriter}} {{getField/close}} 
APIs seem incorrect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-7425) [Java] PromotableWriter support writing FixedSizeList type data

2019-12-25 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu closed ARROW-7425.
-
Resolution: Later

> [Java] PromotableWriter support writing FixedSizeList type data
> ---
>
> Key: ARROW-7425
> URL: https://issues.apache.org/jira/browse/ARROW-7425
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have introduced writer API for {{FixedSizeListVector}} via ARROW-6079, but 
> {{PromotableWriter}}’s support for it is incomplete.
> For example, using {{UnionListWriter}} we could simply write {{List}} 
> type data, but for {{List}} or 
> {{FixedSizeList}} it doesn’t work.
> This issue is to enhance the {{PromotableWriter}} support for 
> {{FixedSizeList}} type and add tests to verify the cases mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7490) [Java] Avro converter should convert attributes and props to FieldType metadata

2020-01-01 Thread Ji Liu (Jira)
Ji Liu created ARROW-7490:
-

 Summary: [Java] Avro converter should convert attributes and props 
to FieldType metadata
 Key: ARROW-7490
 URL: https://issues.apache.org/jira/browse/ARROW-7490
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently in the Avro converter, some attributes such as “name” and “size” are used 
when creating vectors, while others are discarded.

Named types such as Record, Enum and Fixed may have attributes like “doc” and 
“aliases”, which should be kept in metadata for potential further use.

Besides, properties are also not converted properly in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7490) [Java] Avro converter should convert attributes and props to FieldType metadata

2020-01-01 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-7490:
--
Parent: ARROW-5845
Issue Type: Sub-task  (was: Bug)

> [Java] Avro converter should convert attributes and props to FieldType 
> metadata
> ---
>
> Key: ARROW-7490
> URL: https://issues.apache.org/jira/browse/ARROW-7490
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently in the Avro converter, some attributes such as “name” and “size” are used 
> when creating vectors, while others are discarded.
> Named types such as Record, Enum and Fixed may have attributes like “doc” and 
> “aliases”, which should be kept in metadata for potential further use.
> Besides, properties are also not converted properly in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-7495:
-

Assignee: Ji Liu

> [Java] Remove "empty" concept from ArrowBuf, replace with custom 
> referencemanager
> -
>
> Key: ARROW-7495
> URL: https://issues.apache.org/jira/browse/ARROW-7495
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> With the introduction of ReferenceManager in the codebase, a 
> separate ArrowBuf is no longer necessary. Instead, one can create a new 
> reference manager that is used for the empty ArrowBuf. For reminder/review, 
> empty ArrowBufs have a special behavior in that they don't actually have any 
> reference counting semantics and always stay at one. This allows us to better 
> troubleshoot unallocated memory than what would otherwise be an NPE after 
> calling ValueVector.clear()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-7494:
-

Assignee: Ji Liu

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
> Fix For: 1.0.0
>
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.
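
A rough sketch of the slice pattern mentioned above, under the assumption that a caller who needs a window into the memory takes a slice instead of the buffer tracking reader/writer positions. {{ChunkSketch}} is a hypothetical class written for this example; it is not {{ArrowBuf}} and does not model reference counting.

```java
// Hypothetical index-free buffer: a view over a byte array with no reader/writer state.
final class ChunkSketch {
  private final byte[] memory;
  private final int offset;
  private final int length;

  ChunkSketch(byte[] memory) {
    this(memory, 0, memory.length);
  }

  private ChunkSketch(byte[] memory, int offset, int length) {
    this.memory = memory;
    this.offset = offset;
    this.length = length;
  }

  // Returns a view of [start, start + len) that shares the same memory; nothing is copied.
  ChunkSketch slice(int start, int len) {
    if (start < 0 || len < 0 || start + len > length) {
      throw new IndexOutOfBoundsException("slice out of range");
    }
    return new ChunkSketch(memory, offset + start, len);
  }

  byte getByte(int index) {
    checkIndex(index);
    return memory[offset + index];
  }

  void setByte(int index, byte value) {
    checkIndex(index);
    memory[offset + index] = value;
  }

  int capacity() {
    return length;
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("index out of range: " + index);
    }
  }
}
```

In this sketch the position information lives in the short-lived slice object held by the caller, so the long-lived buffer itself carries no index fields.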



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-06 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008693#comment-17008693
 ] 

Ji Liu commented on ARROW-7494:
---

??In general, a slice can be used instead of a reader/writer index pattern.??

[~jnadeau], I don't quite understand. Does this mean holding a slice {{ArrowBuf}} 
in {{ArrowBuf}} to replace the reader/writer index? Doesn't this introduce more 
heap memory? Could you please explain a little more, thanks a lot!

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
> Fix For: 1.0.0
>
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-07 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009688#comment-17009688
 ] 

Ji Liu commented on ARROW-7494:
---

Thanks for your clarification, I opened a PR, please help take a look, thanks!

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-09 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012422#comment-17012422
 ] 

Ji Liu commented on ARROW-7494:
---

I ran into a Gandiva Java test failure while working on this issue, and I am not 
familiar with Gandiva. Could you please give detailed guidance on how to run the 
Gandiva Java tests locally? :) thanks! [~pravindra]

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7539) [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-01-09 Thread Ji Liu (Jira)
Ji Liu created ARROW-7539:
-

 Summary: [Java] FieldVector getFieldBuffers API should not set 
reader/writer indices
 Key: ARROW-7539
 URL: https://issues.apache.org/jira/browse/ARROW-7539
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Per discussion 
[https://github.com/apache/arrow/pull/6133#discussion_r364906302].

The fact that we have reader/writer settings in {{getFieldBuffers}} is wrong. 
To clarify, {{getFieldBuffers}} is distinct from {{getBuffers}}. The former 
should be for getting access to underlying data for higher-performance 
algorithms. The latter is for sending the data over the wire. Seems we've mixed 
up use of both.

 

Currently in {{VectorUnloader}}, we use {{getFieldBuffers}} to create the 
{{ArrowRecordBatch}}, which is why we keep writer/reader indices in 
{{getFieldBuffers}}. We should use {{getBuffers}} instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7546) [Java] Use new implementation to concat vectors values in batch

2020-01-10 Thread Ji Liu (Jira)
Ji Liu created ARROW-7546:
-

 Summary: [Java] Use new implementation to concat vectors values in 
batch
 Key: ARROW-7546
 URL: https://issues.apache.org/jira/browse/ARROW-7546
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Per discussion https://github.com/apache/arrow/pull/5945#discussion_r365108806.

In ARROW-7284, we wrote a simple method to concat vectors. However, ARROW-7073 
is about concatenating vector values efficiently. After that PR is merged, we should 
use the new implementation in {{ArrowReader}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-10 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-7549:
-

Assignee: Ji Liu

> [Java] Reorganize Flight modules to keep top level clean/organized
> --
>
> Key: ARROW-7549
> URL: https://issues.apache.org/jira/browse/ARROW-7549
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Major
>
> Let's create a flight parent module and then create the following modules below it:
> flight-core (existing flight module)
> flight-grpc (existing flight-grpc module)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-11 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013409#comment-17013409
 ] 

Ji Liu commented on ARROW-7549:
---

I opened a PR, please help take a look.

> [Java] Reorganize Flight modules to keep top level clean/organized
> --
>
> Key: ARROW-7549
> URL: https://issues.apache.org/jira/browse/ARROW-7549
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Major
>
> Let's create a flight parent module and then create the following modules below it:
> flight-core (existing flight module)
> flight-grpc (existing flight-grpc module)





[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-15 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-7494:
--
Fix Version/s: (was: 0.16.0)
   1.0.0

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Reader and writer index and functionality doesn't belong on a chunk of memory 
> and is due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.
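The slice-based pattern mentioned in the description can be sketched as follows. This is only an illustration under stated assumptions: {{MemoryChunk}} and its methods are hypothetical names, it is backed by a plain byte array, and it is not ArrowBuf's actual API. The point it shows is that a chunk with only absolute access plus a zero-copy {{slice}} needs no reader or writer cursor fields.

```java
// Hypothetical stand-in for a chunk of memory; NOT the real ArrowBuf API.
public final class MemoryChunk {
    private final byte[] data;   // shared backing storage, never copied by slice()
    private final int offset;    // start of this view within data
    private final int length;    // number of bytes visible through this view

    public MemoryChunk(byte[] data) {
        this(data, 0, data.length);
    }

    private MemoryChunk(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** Number of bytes addressable through this view. */
    public int capacity() {
        return length;
    }

    /** Absolute read: no reader index is consulted or advanced. */
    public byte getByte(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", capacity " + length);
        }
        return data[offset + index];
    }

    /** Returns a view of [index, index + sliceLength) that shares the same storage. */
    public MemoryChunk slice(int index, int sliceLength) {
        if (index < 0 || sliceLength < 0 || index > length - sliceLength) {
            throw new IndexOutOfBoundsException(
                "slice(" + index + ", " + sliceLength + "), capacity " + length);
        }
        return new MemoryChunk(data, offset + index, sliceLength);
    }
}
```

A caller that previously advanced a reader index past a header would instead take a slice starting after the header and read from position 0 of that slice.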





[jira] [Created] (ARROW-7713) [Java] TestLeak was put at the wrong location

2020-01-28 Thread Ji Liu (Jira)
Ji Liu created ARROW-7713:
-

 Summary: [Java] TestLeak was put at the wrong location
 Key: ARROW-7713
 URL: https://issues.apache.org/jira/browse/ARROW-7713
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


It seems {{TestLeak.java}} was put in the wrong place; we should move it into 
{{flight-core}}.





[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2020-02-27 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047169#comment-17047169
 ] 

Ji Liu commented on ARROW-6887:
---

It seems the docs we added in this issue are not showing up on the website? 
[~emkornfi...@gmail.com] [~wesm]

[http://arrow.apache.org/docs/java/index.html]

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> We should create documentation (in restructured text) for the library that 
> demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.





[jira] [Commented] (ARROW-6594) [Java] Support logical type encodings from Avro

2020-03-03 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050754#comment-17050754
 ] 

Ji Liu commented on ARROW-6594:
---

[~emkornfi...@gmail.com] Avro also has the logical types uuid and duration, which do not 
overlap with any Arrow type. Should we also support them with 
{{ExtensionTypeVector}}? 

> [Java] Support logical type encodings from Avro
> ---
>
> Key: ARROW-6594
> URL: https://issues.apache.org/jira/browse/ARROW-6594
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: avro, pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Avro supports some logical types that overlap with Arrow logical types 
> ([http://avro.apache.org/docs/current/spec.html#Logical+Types]).
>  
> For the ones that overlap, we should use the appropriate Arrow Logical type 
> array instead of the raw values.
>  
> it potentially makes sense to break this down further into sub-tasks for each 
> logical type.
>  





[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors

2020-03-03 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050940#comment-17050940
 ] 

Ji Liu commented on ARROW-6887:
---

I see

> [Java] Create prose documentation for using ValueVectors
> 
>
> Key: ARROW-6887
> URL: https://issues.apache.org/jira/browse/ARROW-6887
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> We should create documentation (in restructured text) for the library that 
> demonstrates:
> 1.  Basic construction of ValueVectors.  Highlighting:
>     * ValueVector lifecycle
>     * Reading by rows using Readers (mentioning that it is not as efficient 
> as direct access).
>     * Populating with Writers
> 2.  Reading and writing IPC stream format and file formats.





[jira] [Commented] (ARROW-6594) [Java] Support logical type encodings from Avro

2020-03-03 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050952#comment-17050952
 ] 

Ji Liu commented on ARROW-6594:
---

ok, I see, thanks

> [Java] Support logical type encodings from Avro
> ---
>
> Key: ARROW-6594
> URL: https://issues.apache.org/jira/browse/ARROW-6594
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: avro, pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Avro supports some logical types that overlap with Arrow logical types 
> ([http://avro.apache.org/docs/current/spec.html#Logical+Types]).
>  
> For the ones that overlap, we should use the appropriate Arrow Logical type 
> array instead of the raw values.
>  
> it potentially makes sense to break this down further into sub-tasks for each 
> logical type.
>  





[jira] [Created] (ARROW-8019) [Java] Implement vector diff functionality

2020-03-05 Thread Ji Liu (Jira)
Ji Liu created ARROW-8019:
-

 Summary: [Java] Implement vector diff functionality 
 Key: ARROW-8019
 URL: https://issues.apache.org/jira/browse/ARROW-8019
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


On the C++ side, we already have array diff functionality, used for equality checks and 
testing, which makes it easy to see differences between Arrays and reduces debugging 
time. It would be good to do something similar on the Java side for better testing 
facilities.





[jira] [Created] (ARROW-8020) [Java] Implement vector validate functionality

2020-03-06 Thread Ji Liu (Jira)
Ji Liu created ARROW-8020:
-

 Summary: [Java] Implement vector validate functionality 
 Key: ARROW-8020
 URL: https://issues.apache.org/jira/browse/ARROW-8020
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


On the C++ side, we already have array validate functionality, but there is nothing 
similar on the Java side.

This issue is about implementing this functionality.





[jira] [Commented] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector

2020-03-19 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062479#comment-17062479
 ] 

Ji Liu commented on ARROW-8158:
---

Hi, I think one can get the valid data length via 
BaseVariableWidthVector#sizeOfValueBuffer.

[https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L582]

> [Java] Getting length of data buffer and base variable width vector
> ---
>
> Key: ARROW-8158
> URL: https://issues.apache.org/jira/browse/ARROW-8158
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Gaurangi Saxena
>Priority: Minor
>
> For string data buffer and base variable width vector can we have a way to 
> get length of the data? 
> For instance, in ArrowColumnVector in StringAccessor we use 
> stringResult.start and stringResult.end, instead we would like to get length 
> of the data through an exposed function.





[jira] [Assigned] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector

2020-03-19 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-8158:
-

Assignee: Ji Liu

> [Java] Getting length of data buffer and base variable width vector
> ---
>
> Key: ARROW-8158
> URL: https://issues.apache.org/jira/browse/ARROW-8158
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Gaurangi Saxena
>Assignee: Ji Liu
>Priority: Minor
>
> For string data buffer and base variable width vector can we have a way to 
> get length of the data? 
> For instance, in ArrowColumnVector in StringAccessor we use 
> stringResult.start and stringResult.end, instead we would like to get length 
> of the data through an exposed function.





[jira] [Created] (ARROW-8171) Consider pre-allocating memory for fix-width vector in Avro adapter iterator

2020-03-20 Thread Ji Liu (Jira)
Ji Liu created ARROW-8171:
-

 Summary: Consider pre-allocating memory for fix-width vector in 
Avro adapter iterator
 Key: ARROW-8171
 URL: https://issues.apache.org/jira/browse/ARROW-8171
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu








[jira] [Commented] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector

2020-03-20 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063309#comment-17063309
 ] 

Ji Liu commented on ARROW-8158:
---

[~emkornfield] I see. I could add a method like {{getDataLength(int index)}} 
for variable width vectors.

For lists, we already have {{getElementStartIndex/getElementEndIndex}}. Is that 
enough, or do we still need to add a method like {{getElementLength}}?
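
A minimal sketch of what such a per-element length accessor might compute, assuming the standard Arrow variable-width layout in which the offsets buffer holds valueCount + 1 entries and element i occupies bytes offsets[i] up to offsets[i + 1] of the data buffer. The class and method names below are hypothetical and operate on a plain int array rather than on Arrow's actual vector classes.

```java
// Hypothetical helper; it models the offsets buffer as an int[] for illustration only.
public final class VarWidthLengths {
    private VarWidthLengths() {}

    /**
     * Byte length of element {@code index}, given an offsets array with
     * valueCount + 1 entries.
     */
    public static int dataLength(int[] offsets, int index) {
        if (index < 0 || index + 1 >= offsets.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return offsets[index + 1] - offsets[index];
    }

    /** Total number of valid data bytes covered by the first valueCount elements. */
    public static int totalDataLength(int[] offsets, int valueCount) {
        if (valueCount < 0 || valueCount >= offsets.length) {
            throw new IndexOutOfBoundsException("valueCount " + valueCount);
        }
        return offsets[valueCount] - offsets[0];
    }
}
```

With this layout an empty value is simply two equal adjacent offsets, so a length accessor needs no special case for it.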

> [Java] Getting length of data buffer and base variable width vector
> ---
>
> Key: ARROW-8158
> URL: https://issues.apache.org/jira/browse/ARROW-8158
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Gaurangi Saxena
>Assignee: Ji Liu
>Priority: Minor
>
> For string data buffer and base variable width vector can we have a way to 
> get length of the data? 
> For instance, in ArrowColumnVector in StringAccessor we use 
> stringResult.start and stringResult.end, instead we would like to get length 
> of the data through an exposed function.





[jira] [Assigned] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-12 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reassigned ARROW-5579:
-

Assignee: Ji Liu

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5584) Add import for link reference in FieldReader javadoc

2019-06-13 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5584:
-

 Summary: Add import for link reference in FieldReader javadoc
 Key: ARROW-5584
 URL: https://issues.apache.org/jira/browse/ARROW-5584
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Ji Liu
Assignee: Ji Liu


The link reference ({{ValueVector}}) in the FieldReader javadoc has no import.





[jira] [Updated] (ARROW-5584) Add import for link reference in FieldReader javadoc

2019-06-13 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5584:
--
Component/s: Java

> Add import for link reference in FieldReader javadoc
> 
>
> Key: ARROW-5584
> URL: https://issues.apache.org/jira/browse/ARROW-5584
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Trivial
>
> The link reference ({{ValueVector}}) in the FieldReader javadoc has no import.





[jira] [Created] (ARROW-5587) Add more maven style check for Java code

2019-06-13 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5587:
-

 Summary: Add more maven style check for Java code
 Key: ARROW-5587
 URL: https://issues.apache.org/jira/browse/ARROW-5587
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Ji Liu
Assignee: Ji Liu


Add more Maven style checks for Java code, such as unused imports and redundant 
modifiers. This will improve the quality of the code.





[jira] [Updated] (ARROW-5587) [Java] Add more maven style check for Java code

2019-06-13 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5587:
--
Component/s: Java

> [Java] Add more maven style check for Java code
> ---
>
> Key: ARROW-5587
> URL: https://issues.apache.org/jira/browse/ARROW-5587
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> Add more Maven style checks for Java code, such as unused imports and redundant 
> modifiers. This will improve the quality of the code.





[jira] [Updated] (ARROW-5587) [Java] Add more maven style check for Java code

2019-06-13 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5587:
--
Summary: [Java] Add more maven style check for Java code  (was: Add more 
maven style check for Java code)

> [Java] Add more maven style check for Java code
> ---
>
> Key: ARROW-5587
> URL: https://issues.apache.org/jira/browse/ARROW-5587
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> Add more Maven style checks for Java code, such as unused imports and redundant 
> modifiers. This will improve the quality of the code.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-17 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866277#comment-16866277
 ] 

Ji Liu commented on ARROW-5579:
---

[~emkornfi...@gmail.com] I am afraid the above PR did not really shade the 
flatbuffer dependency. I am not very familiar with 
maven-shade-plugin ([https://maven.apache.org/plugins/maven-shade-plugin/]), so 
correct me if I am wrong.

If we want to shade a dependency, we should follow these steps:
 # Use maven-shade-plugin and add include tags for flatbuffer.
 # Add relocations to rename the package.

In the above PR, we use  tag and seems this plugin will not process 
this dependency?

If we use relocations to rename packages, it causes a new problem that I don't 
know how to solve:

_/org/apache/arrow/vector/types/pojo/ArrowType.java:[239,46] error: incompatible types: 
com.google.flatbuffers.FlatBufferBuilder cannot be converted to 
arrow.format.com.google.flatbuffers.FlatBufferBuilder_

It seems the direct flatbuffer dependency in arrow-vector is not compatible with 
the renamed dependency in arrow-format.

What do you think?

 

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-17 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866301#comment-16866301
 ] 

Ji Liu commented on ARROW-5579:
---

[https://github.com/tianchen92/arrow/commits/ARROW-5579-new]

[~emkornfi...@gmail.com] Here is my test branch, many thanks!

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-19 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868186#comment-16868186
 ] 

Ji Liu edited comment on ARROW-5579 at 6/20/19 2:25 AM:


[~emkornfi...@gmail.com] Any update on this? I have some new thoughts; I 
tried two different approaches:

1. Add an arrow-shaded module and relocate the package, then add this dependency to 
arrow-format instead of flatbuffers. However, the flatbuffer-generated code still 
imports the original package path, which causes errors.

2. Relocate the package only in arrow-format, since the other modules 
depend on it, and replace imports such as "com.google.flatbuffers.*" with the 
relocated path, such as "arrow.format.com.google.flatbuffers.*". This way all 
arrow-related modules use the same flatbuffers and will not conflict with user 
applications. With this approach the Maven build works fine, but IntelliJ does not 
seem to support resolving shaded dependencies (see 
[https://youtrack.jetbrains.com/issue/IDEA-93855]), so running tests manually still 
fails, which remains a big problem. :(

New test code can be seen in my branch. Thanks


was (Author: tianchen92):
[~emkornfi...@gmail.com] Any update on this? I have some new thoughts; I 
tried two different approaches:

1. Add an arrow-shaded module and relocate the package, then add this dependency to 
arrow-format instead of flatbuffers. However, the flatbuffer-generated code still 
imports the original package path, which causes errors.

2. Relocate the package only in arrow-format, since the other modules 
depend on it, and replace imports such as "com.google.flatbuffers.*" with the 
relocated path, such as "arrow.format.com.google.flatbuffers.*". This way all 
arrow-related modules use the same flatbuffers and will not conflict with user 
applications. With this approach the Maven build works fine, but IntelliJ does not 
seem to support resolving shaded dependencies (see 
[https://youtrack.jetbrains.com/issue/IDEA-93855]), so running tests manually still 
fails, which remains a big problem. :(

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-19 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868186#comment-16868186
 ] 

Ji Liu commented on ARROW-5579:
---

[~emkornfi...@gmail.com] Any update on this? I have some new thoughts; I 
tried two different approaches:

1. Add an arrow-shaded module and relocate the package, then add this dependency to 
arrow-format instead of flatbuffers. However, the flatbuffer-generated code still 
imports the original package path, which causes errors.

2. Relocate the package only in arrow-format, since the other modules 
depend on it, and replace imports such as "com.google.flatbuffers.*" with the 
relocated path, such as "arrow.format.com.google.flatbuffers.*". This way all 
arrow-related modules use the same flatbuffers and will not conflict with user 
applications. With this approach the Maven build works fine, but IntelliJ does not 
seem to support resolving shaded dependencies (see 
[https://youtrack.jetbrains.com/issue/IDEA-93855]), so running tests manually still 
fails, which remains a big problem. :(

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868324#comment-16868324
 ] 

Ji Liu commented on ARROW-5579:
---

[~emkornfi...@gmail.com] Sure, it seems your solution is similar to the second 
approach. 

I have submitted a PR for reverting.

Maven build works fine with my branch 
([https://github.com/tianchen92/arrow/commits/ARROW-5579-new]).

But how do we solve the problem of IntelliJ not supporting shaded dependencies? 
This is a breaking change, since developers can no longer run tests locally. :(

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868353#comment-16868353
 ] 

Ji Liu commented on ARROW-5579:
---

Thanks for your reminder [~emkornfi...@gmail.com], I have just tried this and 
it works fine.

Meanwhile I found a new work-around has similar effect, if we just remove 
arrow-format from parent pom just like adapter and gandiva, 
and it works fine.

Do you think this is a reasonable solution, since developers won't need to do 
anything?

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Issue Comment Deleted] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5579:
--
Comment: was deleted

(was: Thanks for your reminder [~emkornfi...@gmail.com], I have just tried this 
and it works fine.

Meanwhile I found a new work-around has similar effect, if we just remove 
arrow-format from parent pom just like adapter and gandiva, 
and it works fine.

Do you think this is a reasonable solution, since developers won't need to do 
anything?)

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868324#comment-16868324
 ] 

Ji Liu edited comment on ARROW-5579 at 6/21/19 3:05 AM:


[~emkornfi...@gmail.com] Sure, it seems your solution is similar to the second 
approach. 

I have submitted a PR for reverting.

Maven build works fine with my branch 
([https://github.com/tianchen92/arrow/commits/ARROW-5579-new2]).

But how do we solve the problem of IntelliJ not supporting shaded dependencies? 
This is a breaking change, since developers can no longer run tests locally. :(


was (Author: tianchen92):
[~emkornfi...@gmail.com] Sure, seems your solution similar with the second 
approach. 

I have submitted a PR for reverting.

Maven build works fine with my branch 
([https://github.com/tianchen92/arrow/commits/ARROW-5579-new]).

But how to solve the problem that Intellij not supporting shaded dependency? 
This is a break change since developers cannot run tests locally anymore:(

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866301#comment-16866301
 ] 

Ji Liu edited comment on ARROW-5579 at 6/21/19 3:05 AM:


[https://github.com/tianchen92/arrow/commits/ARROW-5579-new2]

[~emkornfi...@gmail.com] Here is my test branch, many thanks!


was (Author: tianchen92):
[https://github.com/tianchen92/arrow/commits/ARROW-5579-new]

[~emkornfi...@gmail.com] Here is my test branch, many thanks!

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869189#comment-16869189
 ] 

Ji Liu commented on ARROW-5579:
---

[~emkornfi...@gmail.com] The revert PR is merged. I opened a new 
PR ([https://github.com/apache/arrow/pull/4629]) and the Travis build has passed.

However, the IED issue does not always seem to be fixed by the work-around. I think we 
still need to do some work before it can be merged.

If there is no reasonable solution, then we should discuss it on the mailing list to 
see if anyone has different thoughts.

Thanks!

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869199#comment-16869199
 ] 

Ji Liu commented on ARROW-5579:
---

yes, as we discussed before.

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency

2019-06-20 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869199#comment-16869199
 ] 

Ji Liu edited comment on ARROW-5579 at 6/21/19 5:39 AM:


yes, as we discussed before, sorry it should be 'IDE':)


was (Author: tianchen92):
yes, as we discussed before.

> [Java] shade flatbuffer dependency
> --
>
> Key: ARROW-5579
> URL: https://issues.apache.org/jira/browse/ARROW-5579
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Pindikura Ravindra
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] 
>  
> After some [discussion|https://github.com/google/flatbuffers/issues/5368] 
> with the Flatbuffers maintainer, it appears that FB generated code is not 
> guaranteed to be compatible with _any other_ version of the runtime library 
> other than the exact same version of the flatc used to compile it.
> This makes depending on flatbuffers in a library (like arrow) quite risky, as 
> if an app depends on any other version of FB, either directly or 
> transitively, it's likely the versions will clash at some point and you'll 
> see undefined behaviour at runtime.
> Shading the dependency looks to me the best way to avoid this.





[jira] [Created] (ARROW-5672) [Java] Refactor redundant method modifier

2019-06-21 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5672:
-

 Summary: [Java] Refactor redundant method modifier
 Key: ARROW-5672
 URL: https://issues.apache.org/jira/browse/ARROW-5672
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Ji Liu
Assignee: Ji Liu








[jira] [Created] (ARROW-5705) [Java] Optimize BaseValueVector#computeCombinedBufferSize logic

2019-06-24 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5705:
-

 Summary: [Java] Optimize BaseValueVector#computeCombinedBufferSize 
logic
 Key: ARROW-5705
 URL: https://issues.apache.org/jira/browse/ARROW-5705
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently, BaseValueVector#computeCombinedBufferSize computes the validity buffer 
size as follows:

_roundUp8(getValidityBufferSizeFromCount(valueCount))_

which can be expanded to 

_((((valueCount + 7) >> 3) + 7) / 8) * 8_

There seems to be no need to compute bufferSize first; the expression above could be 
replaced with:

_(valueCount + 63) / 64 * 8_

In this way, the performance of _computeCombinedBufferSize_ would be improved. 
Performance test:

Before:

BaseValueVectorBenchmarks.testComputeCombinedBufferSize avgt 5 4083.180 ± 180.363 ns/op

After:

BaseValueVectorBenchmarks.testComputeCombinedBufferSize avgt 5 3808.635 ± 162.347 ns/op
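
The equivalence of the two expressions can be checked with a small standalone sketch. The class and method names below are hypothetical and are not taken from the Arrow code base; the sketch only assumes that the validity size is ceil(valueCount / 8) bytes and that roundUp8 rounds up to the next multiple of 8.

```java
// Sketch only: hypothetical names, not the actual Arrow implementation.
final class ValiditySizeSketch {

  // Bytes needed to hold one validity bit per value: ceil(valueCount / 8).
  static int validityBytes(int valueCount) {
    return (valueCount + 7) >> 3;
  }

  // Rounds a byte count up to the next multiple of 8.
  static int roundUp8(int size) {
    return ((size + 7) / 8) * 8;
  }

  // Two-step form described in the issue.
  static int original(int valueCount) {
    return roundUp8(validityBytes(valueCount));
  }

  // Proposed single expression: ceil(valueCount / 64) * 8.
  static int optimized(int valueCount) {
    return (valueCount + 63) / 64 * 8;
  }

  public static void main(String[] args) {
    // Exhaustively compare both forms over a range of non-negative counts.
    for (int n = 0; n <= 1_000_000; n++) {
      if (original(n) != optimized(n)) {
        throw new AssertionError("mismatch at valueCount=" + n);
      }
    }
    System.out.println("both forms agree for valueCount in [0, 1000000]");
  }
}
```

Both forms rely on the identity ceil(ceil(n / 8) / 8) = ceil(n / 64) for non-negative n. Values of valueCount close to Integer.MAX_VALUE would overflow in either form and are not covered by this check.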

 





[jira] [Created] (ARROW-5706) [Java] Remove type conversion in getValidityBufferValueCapacity

2019-06-24 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5706:
-

 Summary: [Java] Remove type conversion in 
getValidityBufferValueCapacity
 Key: ARROW-5706
 URL: https://issues.apache.org/jira/browse/ARROW-5706
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


The current implementation of getValidityBufferValueCapacity is:

(int) (validityBuffer.capacity() * 8L)

There seems to be no need to convert the value to long and then back to int; it can 
simply be replaced with:

validityBuffer.capacity() * 8

VariableWidthVectorBenchmarks#getValueCapacity shows the performance:

Before:

avgt 5 5.731 ± 0.160 ns/op

After:

avgt 5 5.124 ± 0.125 ns/op
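
As a sanity check that the change does not alter results, here is a minimal sketch comparing the two forms. It assumes that capacity() returns an int, which the issue text does not state explicitly; the class and method names are hypothetical.

```java
// Sketch only: assumes the capacity is an int; names are hypothetical.
final class CapacityCastSketch {

  // Form described as the current implementation: widen, multiply, narrow.
  static int viaLong(int capacityBytes) {
    return (int) (capacityBytes * 8L);
  }

  // Proposed form: plain int multiplication.
  static int viaInt(int capacityBytes) {
    return capacityBytes * 8;
  }

  public static void main(String[] args) {
    int[] samples = {0, 1, 8, 1024, 1 << 28, Integer.MAX_VALUE};
    for (int c : samples) {
      // Narrowing a long keeps the low 32 bits, which is exactly what a
      // wrapping int multiplication produces, so the two always match.
      if (viaLong(c) != viaInt(c)) {
        throw new AssertionError("mismatch at capacity=" + c);
      }
    }
    System.out.println("both forms agree on all samples");
  }
}
```

Under that assumption the two forms agree even when the product overflows, so the cast removal is a pure simplification and not a behaviour change.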





[jira] [Created] (ARROW-5726) [Java] Implement a common interface for int vectors

2019-06-25 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5726:
-

 Summary: [Java] Implement a common interface for int vectors
 Key: ARROW-5726
 URL: https://issues.apache.org/jira/browse/ARROW-5726
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Currently, _DictionaryEncoder#encode_ uses reflection to look up the set method 
and then set values. 

Setting values by reflection is inefficient, and the resulting code is not 
elegant, for example:

    Method setter = null;
    for (Class c : Arrays.asList(int.class, long.class)) {
      try {
        setter = indices.getClass().getMethod("setSafe", int.class, c);
        break;
      } catch (NoSuchMethodException e) {
        // ignore
      }
    }

Implementing a common interface for int vectors, so that values can be set 
directly without reflection, seems a good choice.
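
A rough sketch of the idea, using toy array-backed classes instead of real Arrow vectors. The interface name IntVectorLike, the accessor getValueAsLong, and both toy classes are hypothetical; only the method name setWithPossibleTruncate is borrowed from the follow-up issue ARROW-5812.

```java
// Sketch only: toy types standing in for Arrow's integer vectors.
interface IntVectorLike {
  // Stores the value at index, narrowing it to the vector's width if needed.
  void setWithPossibleTruncate(int index, long value);

  // Hypothetical accessor used here only to read values back.
  long getValueAsLong(int index);
}

// Toy 32-bit vector.
final class ToyIntVector implements IntVectorLike {
  private final int[] data;

  ToyIntVector(int capacity) {
    data = new int[capacity];
  }

  @Override
  public void setWithPossibleTruncate(int index, long value) {
    data[index] = (int) value; // may truncate
  }

  @Override
  public long getValueAsLong(int index) {
    return data[index];
  }
}

// Toy 64-bit vector.
final class ToyBigIntVector implements IntVectorLike {
  private final long[] data;

  ToyBigIntVector(int capacity) {
    data = new long[capacity];
  }

  @Override
  public void setWithPossibleTruncate(int index, long value) {
    data[index] = value;
  }

  @Override
  public long getValueAsLong(int index) {
    return data[index];
  }
}

final class CommonInterfaceSketch {
  // The encoder can write indices through the interface, with no reflection.
  static void fillIndices(IntVectorLike indices, long[] values) {
    for (int i = 0; i < values.length; i++) {
      indices.setWithPossibleTruncate(i, values[i]);
    }
  }

  public static void main(String[] args) {
    long[] encoded = {0, 1, 1, 2};
    IntVectorLike narrow = new ToyIntVector(encoded.length);
    IntVectorLike wide = new ToyBigIntVector(encoded.length);
    fillIndices(narrow, encoded);
    fillIndices(wide, encoded);
    System.out.println(narrow.getValueAsLong(3) + " " + wide.getValueAsLong(3));
  }
}
```

With such an interface the caller no longer needs to know the concrete index vector type, and the virtual call replaces both the method lookup and the reflective invocation.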





[jira] [Closed] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month

2019-06-26 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu closed ARROW-5435.
-
Resolution: Invalid

> [Java] IntervalYearVector#getObject should return Period with both year and 
> month
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> IntervalYearVector#getObject currently returns a Period with only the month 
> field set. However, this vector stores an interval of years and months as a 
> total month count (e.g. 2 years and 3 months is stored as 27), so it should 
> return a Period with both years and months. 
> In the example above, it currently returns Period(27 months); I think it 
> should return Period(2 years, 3 months).
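
The difference between the two representations can be illustrated with java.time.Period. This is an assumption made for illustration only: the issue does not say which Period class the vector returns, and the class and helper names below are hypothetical.

```java
import java.time.Period;

// Sketch only: uses java.time.Period as a stand-in; names are hypothetical.
final class IntervalYearSketch {

  // Splits a total month count into whole years plus remaining months.
  static Period fromTotalMonths(int totalMonths) {
    return Period.of(totalMonths / 12, totalMonths % 12, 0);
  }

  public static void main(String[] args) {
    Period monthsOnly = Period.ofMonths(27);
    Period split = fromTotalMonths(27);

    System.out.println(monthsOnly); // prints P27M
    System.out.println(split);      // prints P2Y3M

    // Period.equals compares the fields one by one, so the two differ
    // until the months-only value is normalized.
    System.out.println(monthsOnly.equals(split));              // prints false
    System.out.println(monthsOnly.normalized().equals(split)); // prints true
  }
}
```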





[jira] [Updated] (ARROW-5435) [Java] add test for IntervalYearVector#getAsStringBuilder

2019-06-26 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5435:
--
Summary: [Java] add test for IntervalYearVector#getAsStringBuilder  (was: 
[Java] IntervalYearVector#getObject should return Period with both year and 
month)

> [Java] add test for IntervalYearVector#getAsStringBuilder
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> IntervalYearVector#getObject currently returns a Period with only the month 
> field set. However, this vector stores an interval of years and months as a 
> total month count (e.g. 2 years and 3 months is stored as 27), so it should 
> return a Period with both years and months. 
> In the example above, it currently returns Period(27 months); I think it 
> should return Period(2 years, 3 months).





[jira] [Reopened] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month

2019-06-26 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu reopened ARROW-5435:
---

> [Java] IntervalYearVector#getObject should return Period with both year and 
> month
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> IntervalYearVector#getObject currently returns a Period with only the month 
> field set. However, this vector stores an interval of years and months as a 
> total month count (e.g. 2 years and 3 months is stored as 27), so it should 
> return a Period with both years and months. 
> In the example above, it currently returns Period(27 months); I think it 
> should return Period(2 years, 3 months).





[jira] [Updated] (ARROW-5435) [Java] add test for IntervalYearVector#getAsStringBuilder

2019-06-26 Thread Ji Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-5435:
--
Description: (was: IntervalYearVector#getObject currently returns a Period 
with only the month field set. However, this vector stores an interval of years 
and months as a total month count (e.g. 2 years and 3 months is stored as 27), 
so it should return a Period with both years and months. 

In the example above, it currently returns Period(27 months); I think it 
should return Period(2 years, 3 months).)

> [Java] add test for IntervalYearVector#getAsStringBuilder
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5812) [Java] Refactor method name and param type in BaseIntVector

2019-06-30 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5812:
-

 Summary: [Java] Refactor method name and param type in 
BaseIntVector
 Key: ARROW-5812
 URL: https://issues.apache.org/jira/browse/ARROW-5812
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Ji Liu
Assignee: Ji Liu


Change the method signature to _void setWithPossibleTruncate(int index, long value);_ 
for better generality.





[jira] [Created] (ARROW-5814) [Java] Implement a HashMap for DictionaryEncoder

2019-07-01 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5814:
-

 Summary: [Java] Implement a  HashMap for 
DictionaryEncoder
 Key: ARROW-5814
 URL: https://issues.apache.org/jira/browse/ARROW-5814
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Ji Liu
Assignee: Ji Liu


As a follow-up to 
[ARROW-5726|https://issues.apache.org/jira/browse/ARROW-5726], implement a 
Map for DictionaryEncoder to reduce boxing/unboxing operations.

Benchmark:
DictionaryEncodeHashMapBenchmarks.testHashMap: avgt  5  31151.345 ± 1661.878 ns/op
DictionaryEncodeHashMapBenchmarks.testDictionaryEncodeHashMap: avgt  5  15549.902 ± 771.647 ns/op
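
To illustrate how a map with primitive int values avoids boxing, here is a minimal sketch. It uses open addressing with linear probing, which is a choice made for brevity and not necessarily the design used in Arrow; the class name and the MISSING sentinel are hypothetical.

```java
// Sketch only: an Object -> int map that never boxes its values.
// Null keys and removal are not supported.
final class ObjectIntMapSketch {
  static final int MISSING = -1; // returned when the key is absent

  private Object[] keys = new Object[16]; // length is always a power of two
  private int[] values = new int[16];
  private int size;

  void put(Object key, int value) {
    // Keep the load factor at or below 50% so probing always terminates.
    if ((size + 1) * 2 > keys.length) {
      resize();
    }
    int slot = findSlot(keys, key);
    if (keys[slot] == null) {
      keys[slot] = key;
      size++;
    }
    values[slot] = value;
  }

  int get(Object key) {
    int slot = findSlot(keys, key);
    return keys[slot] == null ? MISSING : values[slot];
  }

  int size() {
    return size;
  }

  // Linear probing: returns the slot holding key, or the first empty slot.
  private static int findSlot(Object[] table, Object key) {
    int mask = table.length - 1;
    int h = key.hashCode();
    int slot = (h ^ (h >>> 16)) & mask;
    while (table[slot] != null && !table[slot].equals(key)) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  // Doubles the table and re-inserts every existing entry.
  private void resize() {
    Object[] oldKeys = keys;
    int[] oldValues = values;
    keys = new Object[oldKeys.length * 2];
    values = new int[oldKeys.length * 2];
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != null) {
        int slot = findSlot(keys, oldKeys[i]);
        keys[slot] = oldKeys[i];
        values[slot] = oldValues[i];
      }
    }
  }

  public static void main(String[] args) {
    ObjectIntMapSketch dictionary = new ObjectIntMapSketch();
    dictionary.put("foo", 0);
    dictionary.put("bar", 1);
    System.out.println(dictionary.get("bar") + " " + dictionary.get("baz"));
  }
}
```

Compared with a java.util.HashMap keyed by Object with Integer values, lookups here return an int directly, so no Integer objects are allocated or unwrapped on the encoding path.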





[jira] [Created] (ARROW-5821) [Java] Support compact fixed-width vectors

2019-07-02 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5821:
-

 Summary: [Java] Support compact fixed-width vectors
 Key: ARROW-5821
 URL: https://issues.apache.org/jira/browse/ARROW-5821
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Ji Liu
Assignee: Ji Liu


In the shuffle stage of some applications, FixedWidthVectors may hold very little 
non-null data.
In this case, serializing the vectors directly is not a good choice. Generally, we can 
compact the vector so that it holds only the non-null values, and create a BitVector 
to track the indices of the non-null values so that it can be deserialized 
properly.
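
The compaction idea can be sketched with plain Java collections. Here a java.util.BitSet stands in for the BitVector and a Long[] with null entries stands in for a sparse fixed-width vector; all names are hypothetical and this is not the Arrow API.

```java
import java.util.Arrays;
import java.util.BitSet;

// Sketch only: plain-Java stand-ins for a sparse fixed-width vector.
final class CompactVectorSketch {
  final BitSet nonNull; // bit i is set when position i holds a value
  final long[] packed;  // the non-null values, in their original order
  final int length;     // logical length of the original vector

  private CompactVectorSketch(BitSet nonNull, long[] packed, int length) {
    this.nonNull = nonNull;
    this.packed = packed;
    this.length = length;
  }

  // Keeps only the non-null values and records where they came from.
  static CompactVectorSketch compact(Long[] values) {
    BitSet nonNull = new BitSet(values.length);
    long[] packed = new long[values.length];
    int count = 0;
    for (int i = 0; i < values.length; i++) {
      if (values[i] != null) {
        nonNull.set(i);
        packed[count++] = values[i];
      }
    }
    return new CompactVectorSketch(nonNull, Arrays.copyOf(packed, count), values.length);
  }

  // Rebuilds the original layout from the bitmap and the packed values.
  Long[] expand() {
    Long[] out = new Long[length];
    int next = 0;
    for (int i = nonNull.nextSetBit(0); i >= 0; i = nonNull.nextSetBit(i + 1)) {
      out[i] = packed[next++];
    }
    return out;
  }

  public static void main(String[] args) {
    Long[] sparse = {null, null, 7L, null, null, null, 9L, null};
    CompactVectorSketch compacted = compact(sparse);
    System.out.println(compacted.packed.length + " values kept out of " + compacted.length);
    System.out.println(Arrays.equals(sparse, compacted.expand()));
  }
}
```

Only the bitmap and the packed values would need to be serialized, which saves space whenever most slots are null.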



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

