[jira] [Assigned] (ARROW-6839) [Java] access File Footer custom_metadata
[ https://issues.apache.org/jira/browse/ARROW-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-6839: - Assignee: Ji Liu > [Java] access File Footer custom_metadata > - > > Key: ARROW-6839 > URL: https://issues.apache.org/jira/browse/ARROW-6839 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: John Muehlhausen >Assignee: Ji Liu >Priority: Minor > > Access custom_metadata from ARROW-6836 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6850) [Java] Jdbc converter support Null type
Ji Liu created ARROW-6850: - Summary: [Java] Jdbc converter support Null type Key: ARROW-6850 URL: https://issues.apache.org/jira/browse/ARROW-6850 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu java.sql.Types.NULL is not supported yet because the Java code previously had no NullVector. This could be implemented after ARROW-1638 (IPC roundtrip for null type) is merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6853) [Java] Support vector and dictionary encoder use different hasher for calculating hashCode
Ji Liu created ARROW-6853: - Summary: [Java] Support vector and dictionary encoder use different hasher for calculating hashCode Key: ARROW-6853 URL: https://issues.apache.org/jira/browse/ARROW-6853 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu The Hasher interface was introduced in ARROW-5898 and now has two implementations ({{MurmurHasher}} and {{SimpleHasher}}); there could be more in the future. Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} use only {{SimpleHasher}} for calculating hashCode. This issue enables them to use a different hasher, or even a user-defined hasher, for their own use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
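The idea described in ARROW-6853 can be sketched roughly as follows. This is a minimal, self-contained illustration and not Arrow code: `ByteHasher`, `SimpleByteHasher`, `Fnv1aByteHasher`, and `HashableColumn` are hypothetical names invented for this sketch, and FNV-1a is used only as a convenient second implementation (it is not the hasher named in the issue). The point shown is that the per-value hashCode takes the hasher as a parameter, with an overload that falls back to a default.

```java
/** Hypothetical hasher abstraction for this sketch; not Arrow's real interface. */
interface ByteHasher {
  int hash(byte[] data, int offset, int length);
}

/** Simple polynomial hash, playing the role of the default hasher. */
final class SimpleByteHasher implements ByteHasher {
  @Override
  public int hash(byte[] data, int offset, int length) {
    int h = 0;
    for (int i = offset; i < offset + length; i++) {
      h = 31 * h + data[i];
    }
    return h;
  }
}

/** 32-bit FNV-1a, used here only as a second, different implementation. */
final class Fnv1aByteHasher implements ByteHasher {
  @Override
  public int hash(byte[] data, int offset, int length) {
    int h = 0x811c9dc5;
    for (int i = offset; i < offset + length; i++) {
      h ^= (data[i] & 0xff);
      h *= 0x01000193;
    }
    return h;
  }
}

/** Toy fixed-width column whose per-value hash accepts any hasher. */
final class HashableColumn {
  private static final ByteHasher DEFAULT_HASHER = new SimpleByteHasher();

  private final byte[] buffer;
  private final int width;

  HashableColumn(byte[] buffer, int width) {
    if (width <= 0 || buffer.length % width != 0) {
      throw new IllegalArgumentException("width must be positive and divide the buffer length");
    }
    this.buffer = buffer.clone();
    this.width = width;
  }

  int getValueCount() {
    return buffer.length / width;
  }

  /** Hashes the value at index with the default hasher. */
  int hashCode(int index) {
    return hashCode(index, DEFAULT_HASHER);
  }

  /** Hashes the value at index with a caller-supplied hasher. */
  int hashCode(int index, ByteHasher hasher) {
    if (index < 0 || index >= getValueCount()) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    return hasher.hash(buffer, index * width, width);
  }
}
```

Because `ByteHasher` has a single method, a caller could also pass a lambda as a user-defined hasher, which is the kind of flexibility the issue asks for.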
[jira] [Updated] (ARROW-6853) [Java] Support vector and dictionary encoder use different hasher for calculating hashCode
[ https://issues.apache.org/jira/browse/ARROW-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6853: -- Description: The Hasher interface was introduced in ARROW-5898 and now has two implementations ({{MurmurHasher and SimpleHasher}}); there could be more in the future. Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} use only {{SimpleHasher}} for calculating hashCode. This issue enables them to use a different hasher, or even a user-defined hasher, for their own use cases. was: The Hasher interface was introduced in ARROW-5898 and now has two implementations ({{MurmurHasher and }}{{SimpleHasher}}); there could be more in the future. Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} use only {{SimpleHasher}} for calculating hashCode. This issue enables them to use a different hasher, or even a user-defined hasher, for their own use cases. > [Java] Support vector and dictionary encoder use different hasher for > calculating hashCode > -- > > Key: ARROW-6853 > URL: https://issues.apache.org/jira/browse/ARROW-6853 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > The Hasher interface was introduced in ARROW-5898 and now has two > implementations ({{MurmurHasher and SimpleHasher}}); there could be more in > the future. > Currently {{ValueVector#hashCode}} and {{DictionaryHashTable}} use only > {{SimpleHasher}} for calculating hashCode. This issue enables them to use a > different hasher, or even a user-defined hasher, for their own use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot
[ https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-6661. --- Fix Version/s: 0.15.1 Resolution: Fixed Issue resolved in [https://github.com/apache/arrow/pull/5470] > [Java] Implement APIs like slice to enhance VectorSchemaRoot > > > Key: ARROW-6661 > URL: https://issues.apache.org/jira/browse/ARROW-6661 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently the Java implementation has no APIs like slice for record batches, > unlike C++/Python. > This issue is to implement slice/getVector/addVector/removeVector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948 ] Ji Liu edited comment on ARROW-6464 at 10/12/19 7:39 AM: - Issue resolved in [https://github.com/apache/arrow/pull/5293] was (Author: tianchen92): Issue resolve in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-6464. --- Fix Version/s: 0.15.1 Resolution: Fixed Issue resolved in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6464) [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API
[ https://issues.apache.org/jira/browse/ARROW-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949948#comment-16949948 ] Ji Liu edited comment on ARROW-6464 at 10/12/19 7:40 AM: - Issue resolved by pull request 5293 [https://github.com/apache/arrow/pull/5293] was (Author: tianchen92): Issue resolved in [https://github.com/apache/arrow/pull/5293] > [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API > --- > > Key: ARROW-6464 > URL: https://issues.apache.org/jira/browse/ARROW-6464 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 3h > Remaining Estimate: 0h > > Currently {{FixedSizeListVector#splitAndTransfer}} actually uses > {{copyValueSafe}}, which copies memory; we should use the slice API instead. > Meanwhile, {{splitAndTransfer}} in all classes should perform the index check at > the beginning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6661) [Java] Implement APIs like slice to enhance VectorSchemaRoot
[ https://issues.apache.org/jira/browse/ARROW-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949939#comment-16949939 ] Ji Liu edited comment on ARROW-6661 at 10/12/19 7:40 AM: - Issue resolved by pull request 5470 [https://github.com/apache/arrow/pull/5470] was (Author: tianchen92): Issue resolved in [https://github.com/apache/arrow/pull/5470] > [Java] Implement APIs like slice to enhance VectorSchemaRoot > > > Key: ARROW-6661 > URL: https://issues.apache.org/jira/browse/ARROW-6661 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.15.1 > > Time Spent: 2h > Remaining Estimate: 0h > > Currently the Java implementation has no APIs like slice for record batches, > unlike C++/Python. > This issue is to implement slice/getVector/addVector/removeVector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests
Ji Liu created ARROW-6871: - Summary: [Java] Enhance TransferPair related parameters check and tests Key: ARROW-6871 URL: https://issues.apache.org/jira/browse/ARROW-6871 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu {{TransferPair}} related parameter checks in different classes have potential problems: i. {{copyValueSafe}} does not check the from index; if from > valueCount, no error is shown. ii. {{splitAndTransferPair}} has no indices check in classes like {{VarCharVector}}. iii. The {{splitAndTransferPair}} indices check in classes like UnionVector is not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it should check the params separately. iv. Some assert usages should be replaced with {{Preconditions}}. v. More unit tests should be added to cover corner cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
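To illustrate why the combined range check quoted in item iii is insufficient, here is a minimal sketch in plain Java. `TransferChecks`, `combinedCheck`, and `checkSplitRange` are hypothetical names made up for this illustration, not Arrow methods, and the exact bounds chosen (for example, allowing startIndex == valueCount with length 0) are an assumption of the sketch rather than what the project settled on.

```java
/** Hypothetical helper for this sketch; not part of Arrow. */
final class TransferChecks {
  private TransferChecks() {}

  /** The combined condition quoted in the issue. It accepts some invalid inputs. */
  static boolean combinedCheck(int startIndex, int length, int valueCount) {
    return startIndex + length <= valueCount;
  }

  /**
   * Validates each parameter separately, then the range.
   * The range test is written as a subtraction so it cannot overflow int.
   */
  static void checkSplitRange(int startIndex, int length, int valueCount) {
    if (startIndex < 0 || startIndex > valueCount) {
      throw new IllegalArgumentException(
          "startIndex " + startIndex + " is outside [0, " + valueCount + "]");
    }
    if (length < 0 || length > valueCount - startIndex) {
      throw new IllegalArgumentException(
          "length " + length + " is outside [0, " + (valueCount - startIndex) + "]");
    }
  }
}
```

With valueCount = 10, the combined condition holds for startIndex = -5, length = 3 (because -2 <= 10) and also for startIndex = Integer.MAX_VALUE, length = 1 (because the sum overflows to a negative number), while the separate checks reject both.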
[jira] [Updated] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests
[ https://issues.apache.org/jira/browse/ARROW-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6871: -- Description: {{TransferPair}} related parameter checks in different classes have potential problems: i. {{copyValueSafe}} does not check the from index; if from > valueCount, no error is shown. ii. {{splitAndTransfer}} has no indices check in classes like {{VarCharVector}}. iii. The {{splitAndTransfer}} indices check in classes like UnionVector is not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it should check the params separately. iv. Some assert usages should be replaced with {{Preconditions}}. v. More unit tests should be added to cover corner cases. was: {{TransferPair}} related parameter checks in different classes have potential problems: i. {{copyValueSafe}} does not check the from index; if from > valueCount, no error is shown. ii. {{splitAndTransferPair}} has no indices check in classes like {{VarCharVector}}. iii. The {{splitAndTransferPair}} indices check in classes like UnionVector is not correct (Preconditions.checkArgument(startIndex + length <= valueCount)); it should check the params separately. iv. Some assert usages should be replaced with {{Preconditions}}. v. More unit tests should be added to cover corner cases. > [Java] Enhance TransferPair related parameters check and tests > -- > > Key: ARROW-6871 > URL: https://issues.apache.org/jira/browse/ARROW-6871 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > {{TransferPair}} related parameter checks in different classes have potential > problems: > i. {{copyValueSafe}} does not check the from index; if from > valueCount, no error > is shown. > ii. {{splitAndTransfer}} has no indices check in classes like {{VarCharVector}}. > iii. The {{splitAndTransfer}} indices check in classes like UnionVector is not > correct (Preconditions.checkArgument(startIndex + length <= valueCount)); > it should check the params separately. > iv. Some assert usages should be replaced with {{Preconditions}}. > v. More unit tests should be added to cover corner cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6871) [Java] Enhance TransferPair related parameters check and tests
[ https://issues.apache.org/jira/browse/ARROW-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950754#comment-16950754 ] Ji Liu commented on ARROW-6871: --- Thanks for the reminder. I will also add a benchmark; if there is not much regression, the parameter checks should be added or corrected to avoid potential problems. > [Java] Enhance TransferPair related parameters check and tests > -- > > Key: ARROW-6871 > URL: https://issues.apache.org/jira/browse/ARROW-6871 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > {{TransferPair}} related parameter checks in different classes have potential > problems: > i. {{copyValueSafe}} does not check the from index; if from > valueCount, no error > is shown. > ii. {{splitAndTransfer}} has no indices check in classes like {{VarCharVector}}. > iii. The {{splitAndTransfer}} indices check in classes like UnionVector is not > correct (Preconditions.checkArgument(startIndex + length <= valueCount)); > it should check the params separately. > iv. Some assert usages should be replaced with {{Preconditions}}. > v. More unit tests should be added to cover corner cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-6887: - Assignee: Ji Liu > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > > We should create documentation for the library that demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-2892) [Plasma] Implement interface to get Java arrow objects from Plasma
[ https://issues.apache.org/jira/browse/ARROW-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951637#comment-16951637 ] Ji Liu commented on ARROW-2892: --- [~emkornfi...@gmail.com] I'll take a closer look later. > [Plasma] Implement interface to get Java arrow objects from Plasma > -- > > Key: ARROW-2892 > URL: https://issues.apache.org/jira/browse/ARROW-2892 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma, Java >Reporter: Philipp Moritz >Priority: Major > > Currently we have a low level interface to access bytes stored in plasma from > Java, using the JNI: [https://github.com/apache/arrow/pull/2065/] > > As a followup, we should implement reading (and writing) Java arrow objects > from plasma, if possible using zero-copy. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951647#comment-16951647 ] Ji Liu commented on ARROW-6887: --- Where should we place the documentation? > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > > We should create documentation (in restructured text) for the library that > demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEqualsVisitor StackOverflow
Ji Liu created ARROW-6889: - Summary: [Java] ComplexCopier enable FixedSizeList type & fix RangeEqualsVisitor StackOverflow Key: ARROW-6889 URL: https://issues.apache.org/jira/browse/ARROW-6889 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu i. Enable {{ComplexCopier}} to copy {{FixedSizeListVector}} values, and add related tests. ii. Fix the stack overflow in {{RangeEqualsVisitor#compareFixedSizeListVectors}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6898) [Java] Fix potential memory leak in ArrowWriter and several test classes
Ji Liu created ARROW-6898: - Summary: [Java] Fix potential memory leak in ArrowWriter and several test classes Key: ARROW-6898 URL: https://issues.apache.org/jira/browse/ARROW-6898 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu ARROW-6040 fixed the problem that dictionary entries were required in IPC streams even when empty; the fix writes dictionaries only when there is at least one batch. As a result, if we write an empty stream and invoke ArrowWriter#close, the dictionaries are not closed, leading to a memory leak (they are only closed after the write operation), which is really hard to debug. This problem was found by {{TestArrowReaderWriter#testEmptyStreamInStreamingIPC}} when I tried to close the allocator after the test. Besides, several test classes have potential memory leaks because they do not close the allocator/vector/buf etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6912) [Java] Extract a common base class for avro converter consumers
Ji Liu created ARROW-6912: - Summary: [Java] Extract a common base class for avro converter consumers Key: ARROW-6912 URL: https://issues.apache.org/jira/browse/ARROW-6912 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently Avro converter consumers have some common variables and methods which could be eliminated by extracting a common class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6912) [Java] Extract a common base class for avro converter consumers
[ https://issues.apache.org/jira/browse/ARROW-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6912: -- Parent: ARROW-5845 Issue Type: Sub-task (was: Improvement) > [Java] Extract a common base class for avro converter consumers > --- > > Key: ARROW-6912 > URL: https://issues.apache.org/jira/browse/ARROW-6912 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > Currently Avro converter consumers have some common variables and methods > which could be eliminated by extracting a common class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6930) [Java] Create static factory methods for common array types of testing.
[ https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-6930: - Assignee: Ji Liu > [Java] Create static factory methods for common array types of testing. > --- > > Key: ARROW-6930 > URL: https://issues.apache.org/jira/browse/ARROW-6930 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Minor > > There is a lot of verbosity in the construction of Arrays for testing > purposes (multiple lines of setSafe(...) or set(...). We should start adding > some static factory methods to make test setup clearer and more concise. A > strawman proposal for BigIntVector might look like: > > static BigIntVector create(String name, BufferAllocator allocator, Long... > values). > > Usage would be something like: > > try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, > null, 456L), > BigIntVector expected = BigIntVectorCreate("sample_data", allocator, > 1L, null, 0L),) { > output = doSomethingWith(input); > assertThat(output).isEqualTo(expected); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
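The strawman factory in ARROW-6930 can be sketched without any Arrow dependency as follows. `NullableLongColumn` is a hypothetical stand-in for a nullable 64-bit vector, invented for this illustration; unlike the signature proposed in the issue it takes no allocator, because the toy class keeps its data in plain Java arrays. The sketch only shows how a varargs `Long...` factory lets a test state values and nulls on one line.

```java
/** Hypothetical stand-in for a nullable 64-bit integer vector; not an Arrow class. */
final class NullableLongColumn {
  private final String name;
  private final long[] values;
  private final boolean[] valid;

  private NullableLongColumn(String name, long[] values, boolean[] valid) {
    this.name = name;
    this.values = values;
    this.valid = valid;
  }

  /** Static factory: a null argument marks the slot as null. */
  static NullableLongColumn create(String name, Long... input) {
    long[] values = new long[input.length];
    boolean[] valid = new boolean[input.length];
    for (int i = 0; i < input.length; i++) {
      if (input[i] != null) {
        values[i] = input[i];
        valid[i] = true;
      }
    }
    return new NullableLongColumn(name, values, valid);
  }

  String getName() {
    return name;
  }

  int getValueCount() {
    return values.length;
  }

  boolean isNull(int index) {
    return !valid[index];
  }

  long get(int index) {
    if (!valid[index]) {
      throw new IllegalStateException("value at index " + index + " is null");
    }
    return values[index];
  }
}
```

A call such as `NullableLongColumn.create("sample_data", 1235L, null, 456L)` replaces three separate set calls plus a value-count update in the toy model.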
[jira] [Assigned] (ARROW-6931) [Java] Consider starting to use Google Truth Fluent Assertions library
[ https://issues.apache.org/jira/browse/ARROW-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-6931: - Assignee: Ji Liu > [Java] Consider starting to use Google Truth Fluent Assertions library > -- > > Key: ARROW-6931 > URL: https://issues.apache.org/jira/browse/ARROW-6931 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > > This can offer more readable asserts than the limited JUnit assertions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors
[ https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956607#comment-16956607 ] Ji Liu commented on ARROW-6896: --- I was a little confused about VectorSchemaRoot before (why not just call it RecordBatch?), and while writing documentation recently I found that it differs a little from other implementations. For example, for IPC, the Java reader always holds the same vector schema root and updates it on every call to loadNextBatch, whereas the Python side uses different batches in the writer/reader. From this perspective, I think it is reasonable that VectorSchemaRoot != a record batch. I just wonder why the implementation is different in Java. The addColumn/removeColumn API was introduced by my recent PR (after 0.15), in which I regarded it as a 'record batch'. If we finally reach consensus and want to fix it, I think it is better to include the fix in 0.15.1, so that the mistake is not exposed to users. > [Java] Vector schema root should not share vectors > -- > > Key: ARROW-6896 > URL: https://issues.apache.org/jira/browse/ARROW-6896 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Vector schema root should not share vectors. Otherwise, unexpected behavior > would happen. > Please note that VectorSchemaRoot is not just a container for vectors, it is > also a resource (it implements the AutoCloseable interface), and it manages > the life cycle of its inner vectors. > When two VectorSchemaRoots share vectors, something unexpected may happen. > Consider the following scenario, which is frequently encountered in a SQL > engine. > 1. We create a batch: > VectorSchemaRoot oldBatch = ... > 2. We add a vector to it, which results in a new batch > VectorSchemaRoot newBatch = oldBatch.addVector(vector); > 3. We are done with the old batch, and release the resource > oldBatch.close(); > 4. We continue to use the new batch, but get an exception, because some > inner vectors have been released by the old batch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
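The four-step scenario in the ARROW-6896 description can be reproduced with a small toy model. The classes below (`SharedColumn`, `ColumnBatch`, and the method `addColumnSharing`) are hypothetical and exist only for this sketch; they do not use or mirror Arrow's real classes beyond the one property that matters here, namely that closing a batch closes every column it references.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy column that tracks whether its resources were released. */
final class SharedColumn implements AutoCloseable {
  private final String name;
  private boolean closed;

  SharedColumn(String name) {
    this.name = name;
  }

  boolean isClosed() {
    return closed;
  }

  /** Fails once the column has been released, like a read from a freed buffer would. */
  int read() {
    if (closed) {
      throw new IllegalStateException("column " + name + " has already been released");
    }
    return 0;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Toy batch that owns the life cycle of its columns. */
final class ColumnBatch implements AutoCloseable {
  private final List<SharedColumn> columns;

  ColumnBatch(List<SharedColumn> columns) {
    this.columns = new ArrayList<>(columns);
  }

  /** Returns a new batch that shares the existing columns (the problematic design). */
  ColumnBatch addColumnSharing(SharedColumn extra) {
    List<SharedColumn> all = new ArrayList<>(columns);
    all.add(extra);
    return new ColumnBatch(all);
  }

  SharedColumn getColumn(int index) {
    return columns.get(index);
  }

  @Override
  public void close() {
    for (SharedColumn column : columns) {
      column.close();
    }
  }
}
```

In this model, closing the old batch releases the column that the new batch still references, so a later `read()` through the new batch throws, while the column added in step 2 remains usable.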
[jira] [Created] (ARROW-7021) [Java] UnionFixedSizeListWriter decimal type should check writer index
Ji Liu created ARROW-7021: - Summary: [Java] UnionFixedSizeListWriter decimal type should check writer index Key: ARROW-7021 URL: https://issues.apache.org/jira/browse/ARROW-7021 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu {{UnionFixedSizeListWriter}} should check the writer index for the decimal type (just as it does for other types) to ensure that the values written do not exceed listSize. Otherwise, the writer may quietly continue to write data into its underlying vector even when writer.idx() > listSize * index. -- This message was sent by Atlassian Jira (v8.3.4#803005)
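The kind of guard ARROW-7021 asks for can be sketched with a toy writer. `BoundedListWriter` is a hypothetical class written for this illustration and is not how `UnionFixedSizeListWriter` is implemented; it stores long values rather than decimals, and it only demonstrates refusing a write once the current fixed-size list already holds listSize values.

```java
/** Toy fixed-size-list writer; hypothetical, not Arrow's implementation. */
final class BoundedListWriter {
  private final int listSize;
  private final long[] data;
  private int listIndex;            // index of the list currently being written
  private int writtenInCurrentList; // values written into that list so far

  BoundedListWriter(int listSize, int listCount) {
    if (listSize <= 0 || listCount < 0) {
      throw new IllegalArgumentException("listSize must be positive and listCount non-negative");
    }
    this.listSize = listSize;
    this.data = new long[listSize * listCount];
  }

  /** Writes one value, rejecting writes that would exceed listSize for the current list. */
  void write(long value) {
    if (writtenInCurrentList >= listSize) {
      throw new IllegalStateException(
          "list " + listIndex + " already holds " + listSize + " values");
    }
    int position = listIndex * listSize + writtenInCurrentList;
    if (position >= data.length) {
      throw new IllegalStateException("no capacity left for list " + listIndex);
    }
    data[position] = value;
    writtenInCurrentList++;
  }

  /** Finishes the current list and moves on to the next one. */
  void endList() {
    listIndex++;
    writtenInCurrentList = 0;
  }

  long get(int list, int offset) {
    return data[list * listSize + offset];
  }
}
```

Without the first check in `write`, a third value written into a list of size 2 would silently land in the slot that belongs to the next list.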
[jira] [Updated] (ARROW-6930) [Java] Create utility class for common array types of testing
[ https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6930: -- Summary: [Java] Create utility class for common array types of testing (was: [Java] Create static factory methods for common array types of testing.) > [Java] Create utility class for common array types of testing > - > > Key: ARROW-6930 > URL: https://issues.apache.org/jira/browse/ARROW-6930 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > There is a lot of verbosity in the construction of Arrays for testing > purposes (multiple lines of setSafe(...) or set(...). We should start adding > some static factory methods to make test setup clearer and more concise. A > strawman proposal for BigIntVector might look like: > > static BigIntVector create(String name, BufferAllocator allocator, Long... > values). > > Usage would be something like: > > try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, > null, 456L), > BigIntVector expected = BigIntVectorCreate("sample_data", allocator, > 1L, null, 0L),) { > output = doSomethingWith(input); > assertThat(output).isEqualTo(expected); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader
Ji Liu created ARROW-7026: - Summary: [Java] Remove assertions in MessageSerializer/vector/writer/reader Key: ARROW-7026 URL: https://issues.apache.org/jira/browse/ARROW-7026 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently assertions exist in many classes like {{MessageSerializer/JsonReader/JsonWriter/ListVector}} etc. i. If the JVM argument that enables assertions (-ea) is not specified, these checks are skipped, which can lead to potential problems. ii. Java errors produced by failed assertions are not caught by traditional catch clauses. To fix this, use {{Preconditions}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
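The two points made in ARROW-7026 can be illustrated with a short plain-Java sketch. `ArgChecks` and its methods are hypothetical names for this illustration; `checkArgument` here is a hand-written helper in the spirit of a Preconditions-style check, not the project's own `Preconditions` class.

```java
/** Hypothetical helper for this sketch; not Arrow's Preconditions class. */
final class ArgChecks {
  private ArgChecks() {}

  /** Always evaluated, regardless of JVM flags. */
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Guarded by assert: the check only runs when the JVM is started with -ea. */
  static int getWithAssert(int[] values, int index) {
    assert index >= 0 && index < values.length : "index out of range: " + index;
    return values[index];
  }

  /** Guarded by an explicit check that cannot be switched off. */
  static int getWithCheck(int[] values, int index) {
    checkArgument(index >= 0 && index < values.length, "index out of range: " + index);
    return values[index];
  }

  /** True if a plain catch (Exception e) clause would intercept the throwable. */
  static boolean caughtByCatchException(Throwable t) {
    return t instanceof Exception;
  }
}
```

`AssertionError` extends `Error` rather than `Exception`, so `caughtByCatchException(new AssertionError())` is false, while the `IllegalArgumentException` thrown by `getWithCheck` is an ordinary runtime exception that callers can catch.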
[jira] [Commented] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader
[ https://issues.apache.org/jira/browse/ARROW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962768#comment-16962768 ] Ji Liu commented on ARROW-7026: --- cc [~jnadeau] [~emkornfi...@gmail.com] > [Java] Remove assertions in MessageSerializer/vector/writer/reader > -- > > Key: ARROW-7026 > URL: https://issues.apache.org/jira/browse/ARROW-7026 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > Currently assertions exist in many classes like > {{MessageSerializer/JsonReader/JsonWriter/ListVector}} etc. > i. If the JVM argument that enables assertions (-ea) is not specified, these checks are skipped, which can lead to > potential problems. > ii. Java errors produced by failed assertions are not caught by traditional > catch clauses. > To fix this, use {{Preconditions}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6931) [Java] Consider starting to use Google Truth Fluent Assertions library
[ https://issues.apache.org/jira/browse/ARROW-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963584#comment-16963584 ] Ji Liu commented on ARROW-6931: --- Should we add the Google Truth Fluent Assertions library as a dependency, so that we can choose which one to use in tests? Also, does this need a discussion on the ML? > [Java] Consider starting to use Google Truth Fluent Assertions library > -- > > Key: ARROW-6931 > URL: https://issues.apache.org/jira/browse/ARROW-6931 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > > This can offer more readable asserts than the limited JUnit assertions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader
[ https://issues.apache.org/jira/browse/ARROW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966597#comment-16966597 ] Ji Liu commented on ARROW-7026: --- Besides, some assertions are in the hot path (e.g. the {{VarCharVector}} set and get APIs). For those usages, should we use Preconditions, or just ignore them to avoid a potential performance regression? > [Java] Remove assertions in MessageSerializer/vector/writer/reader > -- > > Key: ARROW-7026 > URL: https://issues.apache.org/jira/browse/ARROW-7026 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > > Currently assertions exist in many classes like > {{MessageSerializer/JsonReader/JsonWriter/ListVector}} etc. > i. If the JVM argument that enables assertions (-ea) is not specified, these checks are skipped, which can lead to > potential problems. > ii. Java errors produced by failed assertions are not caught by traditional > catch clauses. > To fix this, use {{Preconditions}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7152) [Java] Delete useless class DiffFunction
Ji Liu created ARROW-7152: - Summary: [Java] Delete useless class DiffFunction Key: ARROW-7152 URL: https://issues.apache.org/jira/browse/ARROW-7152 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu {{DiffFunction}} was used in the initial implementation of the visitors. Since the visitor logic has been refactored, this class is no longer useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only
[ https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6930: -- Issue Type: New Feature (was: Improvement) > [Java] Create utility class for populating vector values used for test > purpose only > --- > > Key: ARROW-6930 > URL: https://issues.apache.org/jira/browse/ARROW-6930 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > > There is a lot of verbosity in the construction of Arrays for testing > purposes (multiple lines of setSafe(...) or set(...). We should start adding > some static factory methods to make test setup clearer and more concise. A > strawman proposal for BigIntVector might look like: > > static BigIntVector create(String name, BufferAllocator allocator, Long... > values). > > Usage would be something like: > > try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, > null, 456L), > BigIntVector expected = BigIntVectorCreate("sample_data", allocator, > 1L, null, 0L),) { > output = doSomethingWith(input); > assertThat(output).isEqualTo(expected); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only
[ https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6930: -- Summary: [Java] Create utility class for populating vector values used for test purpose only (was: [Java] Create utility class for common array types of testing) > [Java] Create utility class for populating vector values used for test > purpose only > --- > > Key: ARROW-6930 > URL: https://issues.apache.org/jira/browse/ARROW-6930 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > > There is a lot of verbosity in the construction of Arrays for testing > purposes (multiple lines of setSafe(...) or set(...). We should start adding > some static factory methods to make test setup clearer and more concise. A > strawman proposal for BigIntVector might look like: > > static BigIntVector create(String name, BufferAllocator allocator, Long... > values). > > Usage would be something like: > > try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, > null, 456L), > BigIntVector expected = BigIntVectorCreate("sample_data", allocator, > 1L, null, 0L),) { > output = doSomethingWith(input); > assertThat(output).isEqualTo(expected); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6930) [Java] Create utility class for populating vector values used for test purpose only
[ https://issues.apache.org/jira/browse/ARROW-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-6930: -- Description: There is a lot of verbosity in the construction of Arrays for testing purposes (multiple lines of setSafe(...) or set(...)). We should start adding a utility class to make test setup clearer and more concise. Note that this class should be located in the arrow-vector test package and can be used in other modules' tests by adding a dependency: {{<dependency>}} {{<groupId>org.apache.arrow</groupId>}} {{<artifactId>arrow-vector</artifactId>}} {{<version>${project.version}</version>}} {{<classifier>tests</classifier>}} {{<type>test-jar</type>}} {{<scope>test</scope>}} {{</dependency>}} Usage would be something like: {quote}try (IntVector vector = new IntVector("vector", allocator)) { ValueVectorPopulator.setVector(vector, 1, 2, null, 4, 5); output = doSomethingWith(input); assertThat(output).isEqualTo(expected); } {quote} was: There is a lot of verbosity in the construction of Arrays for testing purposes (multiple lines of setSafe(...) or set(...). We should start adding some static factory methods to make test setup clearer and more concise. A strawman proposal for BigIntVector might look like: static BigIntVector create(String name, BufferAllocator allocator, Long... values). Usage would be something like: try (BigIntVector input = BigIntVectorCreate("sample_data", allocator, 1235L, null, 456L), BigIntVector expected = BigIntVectorCreate("sample_data", allocator, 1L, null, 0L),) { output = doSomethingWith(input); assertThat(output).isEqualTo(expected); } > [Java] Create utility class for populating vector values used for test > purpose only > --- > > Key: ARROW-6930 > URL: https://issues.apache.org/jira/browse/ARROW-6930 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 11h 10m > Remaining Estimate: 0h > > There is a lot of verbosity in the construction of Arrays for testing > purposes (multiple lines of setSafe(...) or set(...)). > We should start adding a utility class to make test setup clearer and more > concise. Note that this class should be located in the arrow-vector test package and > can be used in other modules' tests by adding a dependency: > {{<dependency>}} > {{<groupId>org.apache.arrow</groupId>}} > {{<artifactId>arrow-vector</artifactId>}} > {{<version>${project.version}</version>}} > {{<classifier>tests</classifier>}} > {{<type>test-jar</type>}} > {{<scope>test</scope>}} > {{</dependency>}} > Usage would be something like: > {quote}try (IntVector vector = new IntVector("vector", allocator)) { > ValueVectorPopulator.setVector(vector, 1, 2, null, 4, 5); > output = doSomethingWith(input); > assertThat(output).isEqualTo(expected); > } > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
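The one-call population style proposed in ARROW-6930 can be illustrated without the Arrow dependency. The following is only a sketch: {{SimpleIntVector}} and {{PopulatorSketch}} are hypothetical stand-ins invented for this illustration, not Arrow classes, and the real utility operates on actual {{ValueVector}} instances. It shows how a varargs parameter of boxed values lets a {{null}} argument mark a null slot.

```java
/** Hypothetical stand-in for a nullable int vector: one value and one validity flag per slot. */
final class SimpleIntVector {
    private int[] values = new int[0];
    private boolean[] valid = new boolean[0];

    void allocate(int count) {
        values = new int[count];
        valid = new boolean[count];
    }

    void set(int index, int value) {
        values[index] = value;
        valid[index] = true;
    }

    void setNull(int index) {
        valid[index] = false;
    }

    boolean isNull(int index) {
        return !valid[index];
    }

    int get(int index) {
        if (!valid[index]) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values[index];
    }

    int getValueCount() {
        return values.length;
    }
}

/** Hypothetical test helper that replaces a run of set(...) calls with a single call. */
final class PopulatorSketch {
    private PopulatorSketch() {}

    /** Fills the vector from boxed values; a null argument becomes a null slot. */
    static void setVector(SimpleIntVector vector, Integer... values) {
        vector.allocate(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                vector.setNull(i);
            } else {
                vector.set(i, values[i]);
            }
        }
    }
}
```

With this shape, a test can set up its input in one line, for example {{PopulatorSketch.setVector(vector, 1, 2, null, 4, 5)}}, instead of five separate set calls.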
[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974842#comment-16974842 ] Ji Liu commented on ARROW-6887: --- Seems the web content([http://arrow.apache.org/docs/]) is not updated yet, should we do something to make this docs work? [~wesm] [~emkornfi...@gmail.com] > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > We should create documentation (in restructured text) for the library that > demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6019) [Java] Port Jdbc and Avro adapter to new directory
[ https://issues.apache.org/jira/browse/ARROW-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu closed ARROW-6019. - Resolution: Won't Do > [Java] Port Jdbc and Avro adapter to new directory > --- > > Key: ARROW-6019 > URL: https://issues.apache.org/jira/browse/ARROW-6019 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > As discussed in mail list, adapters are different from native reader. > This issue is used to track these issues: > i. create new “contrib” directory and move Jdbc/Avro adapter to it. > ii. provide more description. > iii. change orc readers structure to “converter" > cc [~emkornfi...@gmail.com] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-1175) [Java] Implement/test dictionary-encoded subfields
[ https://issues.apache.org/jira/browse/ARROW-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-1175. --- Resolution: Fixed > [Java] Implement/test dictionary-encoded subfields > -- > > Key: ARROW-1175 > URL: https://issues.apache.org/jira/browse/ARROW-1175 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Wes McKinney >Assignee: Ji Liu >Priority: Major > Fix For: 1.0.0 > > > We do not have any tests about types like: > {code} > List > {code} > cc [~julienledem] [~elahrvivaz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6600) [Java] Implement dictionary-encoded subfields for Union type
[ https://issues.apache.org/jira/browse/ARROW-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu closed ARROW-6600. - Resolution: Later > [Java] Implement dictionary-encoded subfields for Union type > > > Key: ARROW-6600 > URL: https://issues.apache.org/jira/browse/ARROW-6600 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Implement dictionary-encoded subfields for {{Union}} type. Each child vector > could be encodable or not. > > Meanwhile extra common logic into {{DictionaryEncoder}} as well as refactor > List subfield encoding to keep consistent with {{Struct/Union}} type. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974971#comment-16974971 ] Ji Liu commented on ARROW-6887: --- I see. > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > We should create documentation (in restructured text) for the library that > demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7259) [Java] Support subfield encoder use different hasher
Ji Liu created ARROW-7259: - Summary: [Java] Support subfield encoder use different hasher Key: ARROW-7259 URL: https://issues.apache.org/jira/browse/ARROW-7259 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently {{ListSubFieldEncoder/StructSubFieldEncoder}} use the default hasher for calculating hashCode. This issue enables them to use a different hasher, or even a user-defined one, for their own use cases, just as {{DictionaryEncoder}} does. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7264) [Java] RangeEqualsVisitor type check is not correct
Ji Liu created ARROW-7264: - Summary: [Java] RangeEqualsVisitor type check is not correct Key: ARROW-7264 URL: https://issues.apache.org/jira/browse/ARROW-7264 Project: Apache Arrow Issue Type: Bug Components: Java Affects Versions: 0.15.1 Reporter: Ji Liu Assignee: Ji Liu Currently {{RangeEqualsVisitor}} generally checks the type only once and keeps the result to avoid repeated type checking, see {code:java} typeCompareResult = left.getField().getType().equals(right.getField().getType()); {code} This compares only the {{ArrowType}}, which may cause unexpected behavior for complex types: for example, {{List}} and {{List}} would be considered type-equal because their child fields are not taken into account. We should compare the Field here instead and, to make it more extensible, use {{TypeEqualsVisitor}} to compare Fields; this way, one can choose whether to check names or metadata as well. Also provide a test for ListVector to validate this change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
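The difference between comparing only the top-level type and comparing the whole field tree, as described in ARROW-7264, can be sketched in plain Java. This is an illustration under stated assumptions, not the {{TypeEqualsVisitor}} implementation: {{SketchField}} and {{TypeCheckSketch}} are hypothetical names, types are reduced to plain strings, and the "Int" and "Varchar" child types mentioned in the usage note are chosen here purely as an example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified field: a type name plus child fields (names and metadata omitted). */
final class SketchField {
    final String typeName;
    final List<SketchField> children;

    SketchField(String typeName, SketchField... children) {
        this.typeName = typeName;
        this.children = Arrays.asList(children);
    }
}

final class TypeCheckSketch {
    private TypeCheckSketch() {}

    /** Shallow check: looks only at the top-level type, ignoring children. */
    static boolean shallowEquals(SketchField a, SketchField b) {
        return a.typeName.equals(b.typeName);
    }

    /** Deep check: additionally requires every child field to match, recursively. */
    static boolean deepEquals(SketchField a, SketchField b) {
        if (!a.typeName.equals(b.typeName) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!deepEquals(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For two list fields whose children are "Int" and "Varchar" respectively, {{shallowEquals}} reports a match while {{deepEquals}} does not, which is the kind of mismatch the issue describes.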
[jira] [Resolved] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor StackOverFlow
[ https://issues.apache.org/jira/browse/ARROW-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu resolved ARROW-6889. --- Fix Version/s: 1.0.0 Resolution: Fixed > [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor > StackOverFlow > > > Key: ARROW-6889 > URL: https://issues.apache.org/jira/browse/ARROW-6889 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > i. Enable {{ComplexCopier}} copy {{FixedSizeListVector}} value, add related > tests > ii. Fix {{RangeEqualsVisitor#compareFixedSizeListVectors}} StackOverFlow -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6889) [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor StackOverFlow
[ https://issues.apache.org/jira/browse/ARROW-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983345#comment-16983345 ] Ji Liu commented on ARROW-6889: --- Issue resolved by pull request 5660: [https://github.com/apache/arrow/pull/5660] > [Java] ComplexCopier enable FixedSizeList type & fix RangeEualsVisitor > StackOverFlow > > > Key: ARROW-6889 > URL: https://issues.apache.org/jira/browse/ARROW-6889 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > i. Enable {{ComplexCopier}} copy {{FixedSizeListVector}} value, add related > tests > ii. Fix {{RangeEqualsVisitor#compareFixedSizeListVectors}} StackOverFlow -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7284) [Java] ensure java implementation meets clarified dictionary spec
[ https://issues.apache.org/jira/browse/ARROW-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-7284: - Assignee: Ji Liu > [Java] ensure java implementation meets clarified dictionary spec > - > > Key: ARROW-7284 > URL: https://issues.apache.org/jira/browse/ARROW-7284 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Fix For: 1.0.0 > > > see parent issue. > > CC [~tianchen92] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7405) [Java] ListVector isEmpty API is incorrect
Ji Liu created ARROW-7405: - Summary: [Java] ListVector isEmpty API is incorrect Key: ARROW-7405 URL: https://issues.apache.org/jira/browse/ARROW-7405 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently the {{isEmpty}} API always returns false in {{BaseRepeatedValueVector}}, and its subclass {{ListVector}} does not override this method. This leads to incorrect results: for example, a {{ListVector}} with data [1,2], null, [], [5,6] should return [false, false, true, false] from this API, but it currently returns [false, false, false, false]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
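The expected behavior in ARROW-7405 can be worked through with a plain-Java model of a list vector's offset and validity buffers. This is a sketch, not the {{ListVector}} code: {{ListEmptinessSketch}} is a hypothetical name, the buffers are modeled as ordinary arrays, and reporting a null slot as not empty is an assumption taken from the example in the issue description.

```java
final class ListEmptinessSketch {
    private ListEmptinessSketch() {}

    /**
     * offsets has (slot count + 1) entries; slot i covers child elements
     * offsets[i] (inclusive) to offsets[i + 1] (exclusive). valid[i] is false for a null slot.
     */
    static boolean isEmpty(int[] offsets, boolean[] valid, int index) {
        // Assumption from the issue's example: a null slot is reported as not empty.
        if (!valid[index]) {
            return false;
        }
        // A non-null slot is empty when its start and end offsets coincide.
        return offsets[index + 1] == offsets[index];
    }

    /** Applies isEmpty to every slot. */
    static boolean[] isEmptyPerSlot(int[] offsets, boolean[] valid) {
        boolean[] result = new boolean[valid.length];
        for (int i = 0; i < valid.length; i++) {
            result[i] = isEmpty(offsets, valid, i);
        }
        return result;
    }
}
```

For the data [1,2], null, [], [5,6], the offsets would be {0, 2, 2, 2, 4} and the validity flags {true, false, true, true}, so only the third slot has equal start and end offsets while being non-null.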
[jira] [Created] (ARROW-7406) [Java] NonNullableStructVector#hashCode should pass hasher to child vectors
Ji Liu created ARROW-7406: - Summary: [Java] NonNullableStructVector#hashCode should pass hasher to child vectors Key: ARROW-7406 URL: https://issues.apache.org/jira/browse/ARROW-7406 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu This was introduced by ARROW-6866, which made the hasher parameter useless in hashCode(int index, {{ArrowBufHasher}} hasher): the child vectors calculate hashCode using the default hasher, which is not correct. This should be fixed by passing the hasher to the child vectors when calculating hashCode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
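The fix described in ARROW-7406 amounts to forwarding the caller's hasher to each child instead of letting children fall back to a default. A minimal plain-Java sketch of that idea follows; all type names ({{SketchHasher}}, {{SketchVector}}, {{SketchIntVector}}, {{SketchStructVector}}) are hypothetical, and the {{31 * hash + childHash}} combination is an assumption made for this illustration rather than the formula Arrow uses.

```java
/** Hypothetical hasher abstraction standing in for a pluggable hash function. */
interface SketchHasher {
    int hash(int value);
}

/** Hypothetical vector abstraction exposing a hasher-aware hashCode. */
interface SketchVector {
    int hashCode(int index, SketchHasher hasher);
}

/** Leaf vector: hashes its value with whatever hasher the caller supplies. */
final class SketchIntVector implements SketchVector {
    private final int[] values;

    SketchIntVector(int... values) {
        this.values = values;
    }

    @Override
    public int hashCode(int index, SketchHasher hasher) {
        return hasher.hash(values[index]);
    }
}

/** Struct-like vector: combines child hashes for the same row. */
final class SketchStructVector implements SketchVector {
    private final SketchVector[] children;

    SketchStructVector(SketchVector... children) {
        this.children = children;
    }

    @Override
    public int hashCode(int index, SketchHasher hasher) {
        int hash = 0;
        for (SketchVector child : children) {
            // Forward the caller's hasher; ignoring it here would make the parameter useless.
            hash = 31 * hash + child.hashCode(index, hasher);
        }
        return hash;
    }
}
```

Because the hasher reaches the leaves, two different hashers produce different struct hashes for the same row, which would not happen if the children silently used a default.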
[jira] [Created] (ARROW-7425) [Java] PromotableWriter support writing FixedSizeList type data
Ji Liu created ARROW-7425: - Summary: [Java] PromotableWriter support writing FixedSizeList type data Key: ARROW-7425 URL: https://issues.apache.org/jira/browse/ARROW-7425 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu We have introduced writer API for {{FixedSizeListVector}} via ARROW-6079, but {{PromotableWriter}}’s support for it is incomplete. For example, using {{UnionListWriter}} we could simply write {{List}} type data, but for {{List}} or {{FixedSizeList}} it doesn’t work. This issue is about to enhance the {{PromotableWriter}} support for {{FixedSizeList}} type and add tests to verify the cases mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7467) [Java] ComplexCopier does incorrect copy for Map nullable info
Ji Liu created ARROW-7467: - Summary: [Java] ComplexCopier does incorrect copy for Map nullable info Key: ARROW-7467 URL: https://issues.apache.org/jira/browse/ARROW-7467 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu The {{MapVector}} and its 'value' vector are nullable, and its {{structVector}} and 'key' vector are non-nullable. However, the {{MapVector}} generated by ComplexCopier has all nullable fields which is not correct. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7472) [Java] Fix some incorrect behavior in UnionListWriter
Ji Liu created ARROW-7472: - Summary: [Java] Fix some incorrect behavior in UnionListWriter Key: ARROW-7472 URL: https://issues.apache.org/jira/browse/ARROW-7472 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently the {{UnionListWriter/UnionFixedSizeListWriter}} {{getField/close}} APIs seem incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-7425) [Java] PromotableWriter support writing FixedSizeList type data
[ https://issues.apache.org/jira/browse/ARROW-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu closed ARROW-7425. - Resolution: Later > [Java] PromotableWriter support writing FixedSizeList type data > --- > > Key: ARROW-7425 > URL: https://issues.apache.org/jira/browse/ARROW-7425 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We have introduced writer API for {{FixedSizeListVector}} via ARROW-6079, but > {{PromotableWriter}}’s support for it is incomplete. > For example, using {{UnionListWriter}} we could simply write {{List}} > type data, but for {{List}} or > {{FixedSizeList}} it doesn’t work. > This issue is about to enhance the {{PromotableWriter}} support for > {{FixedSizeList}} type and add tests to verify the cases mentioned above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7490) [Java] Avro converter should convert attributes and props to FieldType metadata
Ji Liu created ARROW-7490: - Summary: [Java] Avro converter should convert attributes and props to FieldType metadata Key: ARROW-7490 URL: https://issues.apache.org/jira/browse/ARROW-7490 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently in Avro converter, some attributes are used when creating vectors such as “name”, “size” etc, others are discarded. For named type like Record, Enum and Fixed, they may have attributes like “doc” “aliased” which should keep in metadata for potential further use. Besides, properties are also not converted properly in some cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7490) [Java] Avro converter should convert attributes and props to FieldType metadata
[ https://issues.apache.org/jira/browse/ARROW-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-7490: -- Parent: ARROW-5845 Issue Type: Sub-task (was: Bug) > [Java] Avro converter should convert attributes and props to FieldType > metadata > --- > > Key: ARROW-7490 > URL: https://issues.apache.org/jira/browse/ARROW-7490 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently in Avro converter, some attributes are used when creating vectors > such as “name”, “size” etc, others are discarded. > For named type like Record, Enum and Fixed, they may have attributes like > “doc” “aliased” which should keep in metadata for potential further use. > Besides, properties are also not converted properly in some cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager
[ https://issues.apache.org/jira/browse/ARROW-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-7495: - Assignee: Ji Liu > [Java] Remove "empty" concept from ArrowBuf, replace with custom > referencemanager > - > > Key: ARROW-7495 > URL: https://issues.apache.org/jira/browse/ARROW-7495 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Major > Fix For: 1.0.0 > > > With the introduction of ReferenceManager in the codebase, the need for a > separate ArrowBuf is no longer necessary. Instead, once can create a new > reference manager that is used for the empty ArrowBuf. For reminder/review, > empty arrowbufs have a special behavior in that they don't actually have any > reference counting semantics and always stay at one. This allow us to better > troubleshoot unallocated memory than what would otherwise be an NPE after > calling ValueVector.clear() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-7494: - Assignee: Ji Liu > [Java] Remove reader index and writer index from ArrowBuf > - > > Key: ARROW-7494 > URL: https://issues.apache.org/jira/browse/ARROW-7494 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Critical > Fix For: 1.0.0 > > > Reader and writer index and functionality doesn't belong on a chunk of memory > and is due to inheritance from ByteBuf. As part of removing ByteBuf > inheritance, we should also remove reader and writer indexes from ArrowBuf > functionality. It wastes heap memory for rare utility. In general, a slice > can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008693#comment-17008693 ] Ji Liu commented on ARROW-7494: --- ??In general, a slice can be used instead of a reader/writer index pattern.?? [~jnadeau], I don't quite understand. Does this mean holding a sliced {{ArrowBuf}} inside an {{ArrowBuf}} to replace the reader/writer index? Wouldn't this use more heap memory? Could you please explain a little more? Thanks a lot! > [Java] Remove reader index and writer index from ArrowBuf > - > > Key: ARROW-7494 > URL: https://issues.apache.org/jira/browse/ARROW-7494 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Critical > Fix For: 1.0.0 > > > Reader and writer index and functionality doesn't belong on a chunk of memory > and is due to inheritance from ByteBuf. As part of removing ByteBuf > inheritance, we should also remove reader and writer indexes from ArrowBuf > functionality. It wastes heap memory for rare utility. In general, a slice > can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
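The "slice instead of a reader/writer index" idea from the issue description can be illustrated with {{java.nio.ByteBuffer}}, used here only as an analogy: this is not the {{ArrowBuf}} API, and {{SliceSketch}} is a hypothetical helper. The sketch shows a caller creating a short-lived view over a byte range on demand, so the parent buffer itself does not need to carry cursor state.

```java
import java.nio.ByteBuffer;

final class SliceSketch {
    private SliceSketch() {}

    /** Returns a view of bytes [start, start + length) that shares memory with the parent. */
    static ByteBuffer sliceOf(ByteBuffer parent, int start, int length) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the parent's own state is left untouched.
        ByteBuffer view = parent.duplicate();
        view.clear();
        view.limit(start + length);
        view.position(start);
        // slice() re-bases the view so that index 0 maps to 'start' in the parent.
        return view.slice();
    }
}
```

A consumer that needs to read a region sequentially can advance through the returned view and then discard it; writes through the view are visible in the parent because no bytes are copied.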
[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009688#comment-17009688 ] Ji Liu commented on ARROW-7494: --- Thanks for your clarification, I opened a PR, please help take a look, thanks! > [Java] Remove reader index and writer index from ArrowBuf > - > > Key: ARROW-7494 > URL: https://issues.apache.org/jira/browse/ARROW-7494 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Reader and writer index and functionality doesn't belong on a chunk of memory > and is due to inheritance from ByteBuf. As part of removing ByteBuf > inheritance, we should also remove reader and writer indexes from ArrowBuf > functionality. It wastes heap memory for rare utility. In general, a slice > can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012422#comment-17012422 ] Ji Liu commented on ARROW-7494: --- I ran into a Gandiva Java test failure while working on this issue, and I am not familiar with Gandiva. Could you please give detailed guidance on how to run the Gandiva Java tests locally? :) Thanks! [~pravindra] > [Java] Remove reader index and writer index from ArrowBuf > - > > Key: ARROW-7494 > URL: https://issues.apache.org/jira/browse/ARROW-7494 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 0.16.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Reader and writer index and functionality doesn't belong on a chunk of memory > and is due to inheritance from ByteBuf. As part of removing ByteBuf > inheritance, we should also remove reader and writer indexes from ArrowBuf > functionality. It wastes heap memory for rare utility. In general, a slice > can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7539) [Java] FieldVector getFieldBuffers API should not set reader/writer indices
Ji Liu created ARROW-7539: - Summary: [Java] FieldVector getFieldBuffers API should not set reader/writer indices Key: ARROW-7539 URL: https://issues.apache.org/jira/browse/ARROW-7539 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Per discussion [https://github.com/apache/arrow/pull/6133#discussion_r364906302]. The fact that we have reader/writer settings in {{getFieldBuffers}} is wrong. To clarify, {{getFieldBuffers}} is distinct from {{getBuffers}}. The former should be for getting access to underlying data for higher-performance algorithms. The latter is for sending the data over the wire. It seems we've mixed up the use of the two. Currently {{VectorUnloader}} uses {{getFieldBuffers}} to create the {{ArrowRecordBatch}}, which is why we keep setting writer/reader indices in {{getFieldBuffers}}; it should use {{getBuffers}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7546) [Java] Use new implementation to concat vectors values in batch
Ji Liu created ARROW-7546: - Summary: [Java] Use new implementation to concat vectors values in batch Key: ARROW-7546 URL: https://issues.apache.org/jira/browse/ARROW-7546 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu Per discussion https://github.com/apache/arrow/pull/5945#discussion_r365108806. In ARROW-7284, we wrote a simple method to concat vectors. However, ARROW-7073 is about concatenating vector values efficiently; after that PR is merged, we should use the new implementation in {{ArrowReader}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized
[ https://issues.apache.org/jira/browse/ARROW-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-7549: - Assignee: Ji Liu > [Java] Reorganize Flight modules to keep top level clean/organized > -- > > Key: ARROW-7549 > URL: https://issues.apache.org/jira/browse/ARROW-7549 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Major > > Lets create a flight parent module and then create the following below: > flight-core (existing flight module) > flight-grpc (existing flight-grpc module) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized
[ https://issues.apache.org/jira/browse/ARROW-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013409#comment-17013409 ] Ji Liu commented on ARROW-7549: --- I opened a PR, please help take a look. > [Java] Reorganize Flight modules to keep top level clean/organized > -- > > Key: ARROW-7549 > URL: https://issues.apache.org/jira/browse/ARROW-7549 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Major > > Lets create a flight parent module and then create the following below: > flight-core (existing flight module) > flight-grpc (existing flight-grpc module) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-7494: -- Fix Version/s: (was: 0.16.0) 1.0.0 > [Java] Remove reader index and writer index from ArrowBuf > - > > Key: ARROW-7494 > URL: https://issues.apache.org/jira/browse/ARROW-7494 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Jacques Nadeau >Assignee: Ji Liu >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Reader and writer index and functionality doesn't belong on a chunk of memory > and is due to inheritance from ByteBuf. As part of removing ByteBuf > inheritance, we should also remove reader and writer indexes from ArrowBuf > functionality. It wastes heap memory for rare utility. In general, a slice > can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7713) [Java] TastLeak was put at the wrong location
Ji Liu created ARROW-7713: - Summary: [Java] TastLeak was put at the wrong location Key: ARROW-7713 URL: https://issues.apache.org/jira/browse/ARROW-7713 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Ji Liu Assignee: Ji Liu It seems {{TestLeak.java}} was put in the wrong place; we should move it into {{flight-core}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047169#comment-17047169 ] Ji Liu commented on ARROW-6887: --- Seems the docs we added in this issue didn't work in website? [~emkornfi...@gmail.com] [~wesm] [http://arrow.apache.org/docs/java/index.html] > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.16.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > We should create documentation (in restructured text) for the library that > demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6594) [Java] Support logical type encodings from Avro
[ https://issues.apache.org/jira/browse/ARROW-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050754#comment-17050754 ] Ji Liu commented on ARROW-6594: --- [~emkornfi...@gmail.com] Avro also has logical type uuid and duration which not overlap with Arrow type, should we also support them with {{ExtensionTypeVector}}? > [Java] Support logical type encodings from Avro > --- > > Key: ARROW-6594 > URL: https://issues.apache.org/jira/browse/ARROW-6594 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: avro, pull-request-available > Fix For: 0.16.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Avro supports some logical types that overlap with Arrow logical types > ([http://avro.apache.org/docs/current/spec.html#Logical+Types) > |http://avro.apache.org/docs/current/spec.html#Logical+Types] > > For the ones that overlap, we should use the appropriate Arrow Logical type > array instead of the raw values. > > it potentially makes sense to break this down further into sub-tasks for each > logical type. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6887) [Java] Create prose documentation for using ValueVectors
[ https://issues.apache.org/jira/browse/ARROW-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050940#comment-17050940 ] Ji Liu commented on ARROW-6887: --- I see > [Java] Create prose documentation for using ValueVectors > > > Key: ARROW-6887 > URL: https://issues.apache.org/jira/browse/ARROW-6887 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.16.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > We should create documentation (in restructured text) for the library that > demonstrates: > 1. Basic construction of ValueVectors. Highlighting: > * ValueVector lifecycle > * Reading by rows using Readers (mentioning that it is not as efficient > as direct access). > * Populating with Writers > 2. Reading and writing IPC stream format and file formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6594) [Java] Support logical type encodings from Avro
[ https://issues.apache.org/jira/browse/ARROW-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050952#comment-17050952 ] Ji Liu commented on ARROW-6594: --- ok, I see, thanks > [Java] Support logical type encodings from Avro > --- > > Key: ARROW-6594 > URL: https://issues.apache.org/jira/browse/ARROW-6594 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Micah Kornfield >Assignee: Ji Liu >Priority: Major > Labels: avro, pull-request-available > Fix For: 0.16.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Avro supports some logical types that overlap with Arrow logical types > ([http://avro.apache.org/docs/current/spec.html#Logical+Types) > |http://avro.apache.org/docs/current/spec.html#Logical+Types] > > For the ones that overlap, we should use the appropriate Arrow Logical type > array instead of the raw values. > > it potentially makes sense to break this down further into sub-tasks for each > logical type. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8019) [Java] Implement vector diff functionality
Ji Liu created ARROW-8019: - Summary: [Java] Implement vector diff functionality Key: ARROW-8019 URL: https://issues.apache.org/jira/browse/ARROW-8019 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu On the C++ side, we already have array diff functionality, used for equality checks and testing, which makes it easy to see differences between Arrays and reduces debugging time. It would be good to have something similar on the Java side for better testing facilities. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8020) [Java] Implement vector validate functionality
Ji Liu created ARROW-8020: - Summary: [Java] Implement vector validate functionality Key: ARROW-8020 URL: https://issues.apache.org/jira/browse/ARROW-8020 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu On the C++ side, we already have array validate functionality, but there is nothing similar on the Java side. This issue is about implementing it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector
[ https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062479#comment-17062479 ] Ji Liu commented on ARROW-8158: --- Hi, I think one could get valid data length by BaseVariableWidthVector#sizeOfValueBuffer. [https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L582] > [Java] Getting length of data buffer and base variable width vector > --- > > Key: ARROW-8158 > URL: https://issues.apache.org/jira/browse/ARROW-8158 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Gaurangi Saxena >Priority: Minor > > For string data buffer and base variable width vector can we have a way to > get length of the data? > For instance, in ArrowColumnVector in StringAccessor we use > stringResult.start and stringResult.end, instead we would like to get length > of the data through an exposed function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector
[ https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-8158: - Assignee: Ji Liu > [Java] Getting length of data buffer and base variable width vector > --- > > Key: ARROW-8158 > URL: https://issues.apache.org/jira/browse/ARROW-8158 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Gaurangi Saxena >Assignee: Ji Liu >Priority: Minor > > For string data buffer and base variable width vector can we have a way to > get length of the data? > For instance, in ArrowColumnVector in StringAccessor we use > stringResult.start and stringResult.end, instead we would like to get length > of the data through an exposed function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8171) Consider pre-allocating memory for fix-width vector in Avro adapter iterator
Ji Liu created ARROW-8171: - Summary: Consider pre-allocating memory for fix-width vector in Avro adapter iterator Key: ARROW-8171 URL: https://issues.apache.org/jira/browse/ARROW-8171 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8158) [Java] Getting length of data buffer and base variable width vector
[ https://issues.apache.org/jira/browse/ARROW-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063309#comment-17063309 ] Ji Liu commented on ARROW-8158: --- [~emkornfield] I see. I could add a method like {{getDataLength(int index)}} for variable width vectors. For lists, we already have {{getElementStartIndex/getElementEndIndex}}; is that enough, or do we still need to add a method like getElementLength? > [Java] Getting length of data buffer and base variable width vector > --- > > Key: ARROW-8158 > URL: https://issues.apache.org/jira/browse/ARROW-8158 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Gaurangi Saxena >Assignee: Ji Liu >Priority: Minor > > For string data buffer and base variable width vector can we have a way to > get length of the data? > For instance, in ArrowColumnVector in StringAccessor we use > stringResult.start and stringResult.end, instead we would like to get length > of the data through an exposed function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
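A rough sketch of what a per-element length accessor could compute, assuming the usual variable-width layout in which an offsets array with n + 1 entries delimits each value inside one contiguous data buffer. Plain Java arrays stand in for Arrow buffers here; the class name and the signature of getDataLength are hypothetical and are not the Arrow API.

```java
import java.nio.charset.StandardCharsets;

final class VariableWidthLengths {
    // Length in bytes of the value at `index`: end offset minus start offset.
    // `offsets` is assumed to have one more entry than there are values.
    static int getDataLength(int[] offsets, int index) {
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        // Three values ("arrow", "", "java") stored back to back.
        byte[] data = "arrowjava".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 5, 5, 9};

        for (int i = 0; i < offsets.length - 1; i++) {
            int start = offsets[i];
            int length = getDataLength(offsets, i);
            String value = new String(data, start, length, StandardCharsets.UTF_8);
            System.out.println(i + ": length=" + length + " value=\"" + value + "\"");
        }
    }
}
```

With such an accessor a caller would no longer need to subtract a start position from an end position itself, which is what the reporter asks for.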
[jira] [Assigned] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-5579: - Assignee: Ji Liu > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5584) Add import for link reference in FieldReader javadoc
Ji Liu created ARROW-5584: - Summary: Add import for link reference in FieldReader javadoc Key: ARROW-5584 URL: https://issues.apache.org/jira/browse/ARROW-5584 Project: Apache Arrow Issue Type: Bug Reporter: Ji Liu Assignee: Ji Liu Link reference(ValueVector) in FieldReader javadoc has no import. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5584) Add import for link reference in FieldReader javadoc
[ https://issues.apache.org/jira/browse/ARROW-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5584: -- Component/s: Java > Add import for link reference in FieldReader javadoc > > > Key: ARROW-5584 > URL: https://issues.apache.org/jira/browse/ARROW-5584 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Trivial > > Link reference(ValueVector) in FieldReader javadoc has no import. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5587) Add more maven style check for Java code
Ji Liu created ARROW-5587: - Summary: Add more maven style check for Java code Key: ARROW-5587 URL: https://issues.apache.org/jira/browse/ARROW-5587 Project: Apache Arrow Issue Type: Improvement Reporter: Ji Liu Assignee: Ji Liu Add more maven style check for java code, such as unused imports, redundant modifier, etc. In this way, the quality of code will be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5587) [Java] Add more maven style check for Java code
[ https://issues.apache.org/jira/browse/ARROW-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5587: -- Component/s: Java > [Java] Add more maven style check for Java code > --- > > Key: ARROW-5587 > URL: https://issues.apache.org/jira/browse/ARROW-5587 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > Add more maven style check for java code, such as unused imports, redundant > modifier, etc. In this way, the quality of code will be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5587) [Java] Add more maven style check for Java code
[ https://issues.apache.org/jira/browse/ARROW-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5587: -- Summary: [Java] Add more maven style check for Java code (was: Add more maven style check for Java code) > [Java] Add more maven style check for Java code > --- > > Key: ARROW-5587 > URL: https://issues.apache.org/jira/browse/ARROW-5587 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > Add more maven style check for java code, such as unused imports, redundant > modifier, etc. In this way, the quality of code will be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866277#comment-16866277 ] Ji Liu commented on ARROW-5579: --- [~emkornfi...@gmail.com] I am afraid the above PR did not really shade the flatbuffer dependency. I am not very familiar with maven-shade-plugin ([https://maven.apache.org/plugins/maven-shade-plugin/]), so correct me if I am wrong. If we want to shade a dependency, we should follow these steps: # Use maven-shade-plugin and add include tags for flatbuffer # Add relocations to rename the package. In the above PR, we use tag and it seems this plugin will not process this dependency? If we use relocations to rename packages, it causes a new problem that I don't know how to solve: _/org/apache/arrow/vector/types/pojo/ArrowType.java:[239,46] error: incompatible types: com.google.flatbuffers.FlatBufferBuilder cannot be converted to arrow.format.com.google.flatbuffers.FlatBufferBuilder_ It seems the direct flatbuffer dependency in arrow-vector is not compatible with the renamed dependency in arrow-format. What do you think? > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. 
> This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866301#comment-16866301 ] Ji Liu commented on ARROW-5579: --- [https://github.com/tianchen92/arrow/commits/ARROW-5579-new] [~emkornfi...@gmail.com] Here is my test branch, many thanks! > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868186#comment-16868186 ] Ji Liu edited comment on ARROW-5579 at 6/20/19 2:25 AM: [~emkornfi...@gmail.com] Any update for this? I do have some new thoughts, I tried two different ways: 1、add arrow-shaded module and relocate package, then add this dependency in arrow-format instead of flatbuffers, but in flatbuffer-generated code the import is still origin package path and cause error. 2、Another approach is relocate package only in arrow-format since other modules depends on this and replace "com.google.flatbuffers.*" such imports with new relocated path such as "arrow.format.com.google.flatbuffers.*", in this way all arrow-related modules use the same flatbuffers and will not conflict with user applications. In this way, maven build fine but seems Intellij not support resolve shaded dependencies(see [https://youtrack.jetbrains.com/issue/IDEA-93855]), so manually run tests still cause errors which is still a big problem.:( New test code can be seen in my branch. Thanks was (Author: tianchen92): [~emkornfi...@gmail.com] Any update for this? I do have some new thoughts, I tried two different ways: 1、add arrow-shaded module and relocate package, then add this dependency in arrow-format instead of flatbuffers, but in flatbuffer-generated code the import is still origin package path and cause error. 2、Another approach is relocate package only in arrow-format since other modules depends on this and replace "com.google.flatbuffers.*" such imports with new relocated path such as "arrow.format.com.google.flatbuffers.*", in this way all arrow-related modules use the same flatbuffers and will not conflict with user applications. 
In this way, maven build fine but seems Intellij not support resolve shaded dependencies(see [https://youtrack.jetbrains.com/issue/IDEA-93855]), so manually run tests still cause errors which is still a big problem.:( > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868186#comment-16868186 ] Ji Liu commented on ARROW-5579: --- [~emkornfi...@gmail.com] Any update on this? I have some new thoughts. I tried two different approaches: 1. Add an arrow-shaded module that relocates the package, then depend on it in arrow-format instead of flatbuffers. However, the flatbuffer-generated code still imports the original package path, which causes errors. 2. Relocate the package only in arrow-format, since the other modules depend on it, and replace imports such as "com.google.flatbuffers.*" with the relocated path, such as "arrow.format.com.google.flatbuffers.*". This way all arrow-related modules use the same flatbuffers and will not conflict with user applications. With this approach the maven build works fine, but it seems Intellij does not support resolving shaded dependencies (see [https://youtrack.jetbrains.com/issue/IDEA-93855]), so manually running tests still causes errors, which is still a big problem. :( > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. 
> This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868324#comment-16868324 ] Ji Liu commented on ARROW-5579: --- [~emkornfi...@gmail.com] Sure, it seems your solution is similar to the second approach. I have submitted a PR for the revert. The Maven build works fine with my branch ([https://github.com/tianchen92/arrow/commits/ARROW-5579-new]). But how do we solve the problem of Intellij not supporting shaded dependencies? This is a breaking change, since developers can no longer run tests locally :( > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868353#comment-16868353 ] Ji Liu commented on ARROW-5579: --- Thanks for the reminder [~emkornfi...@gmail.com], I have just tried this and it works fine. Meanwhile I found a new workaround that has a similar effect: if we just remove arrow-format from the parent pom, like adapter and gandiva, it works fine. Do you think this is a reasonable solution, since developers won't need to do anything? > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5579: -- Comment: was deleted (was: Thanks for your reminder [~emkornfi...@gmail.com], I have just tried this and it works fine. Meanwhile I found a new work-around has similar effect, if we just remove arrow-format from parent pom just like adapter and gandiva, and it works fine. Do you think this is this is reasonable solution since developers won't do anything?) > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868324#comment-16868324 ] Ji Liu edited comment on ARROW-5579 at 6/21/19 3:05 AM: [~emkornfi...@gmail.com] Sure, seems your solution similar with the second approach. I have submitted a PR for reverting. Maven build works fine with my branch ([https://github.com/tianchen92/arrow/commits/ARROW-5579-new2]). But how to solve the problem that Intellij not supporting shaded dependency? This is a break change since developers cannot run tests locally anymore:( was (Author: tianchen92): [~emkornfi...@gmail.com] Sure, seems your solution similar with the second approach. I have submitted a PR for reverting. Maven build works fine with my branch ([https://github.com/tianchen92/arrow/commits/ARROW-5579-new]). But how to solve the problem that Intellij not supporting shaded dependency? This is a break change since developers cannot run tests locally anymore:( > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. 
> Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866301#comment-16866301 ] Ji Liu edited comment on ARROW-5579 at 6/21/19 3:05 AM: [https://github.com/tianchen92/arrow/commits/ARROW-5579-new2] [~emkornfi...@gmail.com] Here is my test branch, many thanks! was (Author: tianchen92): [https://github.com/tianchen92/arrow/commits/ARROW-5579-new] [~emkornfi...@gmail.com] Here is my test branch, many thanks! > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869189#comment-16869189 ] Ji Liu commented on ARROW-5579: --- [~emkornfi...@gmail.com] The revert PR is merged. I opened a new PR ([https://github.com/apache/arrow/pull/4629]) and Travis has passed. However, the work-around for the IED issue does not seem to always work. I think we still need to do some work before it can be merged. If there is no reasonable solution, we should discuss it on the mailing list to see if anyone has different thoughts. Thanks! > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869199#comment-16869199 ] Ji Liu commented on ARROW-5579: --- yes, as we discussed before. > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5579) [Java] shade flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869199#comment-16869199 ] Ji Liu edited comment on ARROW-5579 at 6/21/19 5:39 AM: yes, as we discussed before, sorry it should be 'IDE':) was (Author: tianchen92): yes, as we discussed before. > [Java] shade flatbuffer dependency > -- > > Key: ARROW-5579 > URL: https://issues.apache.org/jira/browse/ARROW-5579 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Pindikura Ravindra >Assignee: Ji Liu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Reported in a [github issue|[https://github.com/apache/arrow/issues/4489]] > > After some [discussion|https://github.com/google/flatbuffers/issues/5368] > with the Flatbuffers maintainer, it appears that FB generated code is not > guaranteed to be compatible with _any other_ version of the runtime library > other than the exact same version of the flatc used to compile it. > This makes depending on flatbuffers in a library (like arrow) quite risky, as > if an app depends on any other version of FB, either directly or > transitively, it's likely the versions will clash at some point and you'll > see undefined behaviour at runtime. > Shading the dependency looks to me the best way to avoid this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5672) [Java] Refactor redundant method modifier
Ji Liu created ARROW-5672: - Summary: [Java] Refactor redundant method modifier Key: ARROW-5672 URL: https://issues.apache.org/jira/browse/ARROW-5672 Project: Apache Arrow Issue Type: Sub-task Reporter: Ji Liu Assignee: Ji Liu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5705) [Java] Optimize BaseValueVector#computeCombinedBufferSize logic
Ji Liu created ARROW-5705: - Summary: [Java] Optimize BaseValueVector#computeCombinedBufferSize logic Key: ARROW-5705 URL: https://issues.apache.org/jira/browse/ARROW-5705 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently, BaseValueVector#computeCombinedBufferSize computes the validity buffer size as follows: _roundUp8(getValidityBufferSizeFromCount(valueCount))_ which can be expanded to _((((valueCount + 7) >> 3) + 7) / 8) * 8_ There seems to be no need to compute the buffer size first; the expression above could be replaced with: _(valueCount + 63) / 64 * 8_ This would improve the performance of _computeCombinedBufferSize_. Performance test: Before: BaseValueVectorBenchmarks.testComputeCombinedBufferSize avgt 5 4083.180 ± 180.363 ns/op After: BaseValueVectorBenchmarks.testComputeCombinedBufferSize avgt 5 3808.635 ± 162.347 ns/op -- This message was sent by Atlassian JIRA (v7.6.3#76005)
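The equivalence claimed in ARROW-5705 can be checked mechanically. The following is only a sketch: the class and helper names are hypothetical and merely mirror the expressions quoted in the message, not the actual Arrow source. Note that in Java `+` binds tighter than `>>`, so the shift has to be parenthesized when the two-step form is written out as a single expression.

```java
// Sketch only: helper names are hypothetical and mirror the expressions
// quoted in the message, not the real Arrow implementation.
final class ValidityBufferSizeSketch {

    // Bytes needed to hold one validity bit per value: ceil(valueCount / 8).
    static int validityBytes(int valueCount) {
        return (valueCount + 7) >> 3;
    }

    // Round a byte count up to the next multiple of 8.
    static int roundUp8(int size) {
        return ((size + 7) / 8) * 8;
    }

    // Two-step form: compute the byte count first, then round it up.
    static int twoStep(int valueCount) {
        return roundUp8(validityBytes(valueCount));
    }

    // One-step form proposed in the message: 8 * ceil(valueCount / 64).
    static int oneStep(int valueCount) {
        return (valueCount + 63) / 64 * 8;
    }

    public static void main(String[] args) {
        // Compare both forms over a range of non-negative value counts.
        for (int n = 0; n <= 1_000_000; n++) {
            if (twoStep(n) != oneStep(n)) {
                throw new AssertionError("mismatch at valueCount=" + n);
            }
        }
        System.out.println("both forms agree for 0..1000000");
    }
}
```

Both forms compute 8 * ceil(valueCount / 64), which is why they agree for every non-negative count that does not overflow the intermediate additions.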
[jira] [Created] (ARROW-5706) [Java] Remove type conversion in getValidityBufferValueCapacity
Ji Liu created ARROW-5706: - Summary: [Java] Remove type conversion in getValidityBufferValueCapacity Key: ARROW-5706 URL: https://issues.apache.org/jira/browse/ARROW-5706 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Ji Liu Assignee: Ji Liu The current implementation of getValidityBufferValueCapacity is: (int) (validityBuffer.capacity() * 8L) There seems to be no need to convert the value to long and then back to int; it can simply be replaced with: validityBuffer.capacity() * 8 VariableWidthVectorBenchmarks#getValueCapacity shows the performance: Before: avgt 5 5.731 ± 0.160 ns/op After: avgt 5 5.124 ± 0.125 ns/op -- This message was sent by Atlassian JIRA (v7.6.3#76005)
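A small sketch of why the two expressions in ARROW-5706 are interchangeable, assuming (as the message implies) that `capacity()` returns an `int`. The class and method names are hypothetical. Narrowing a `long` to `int` keeps the low 32 bits, and `int` multiplication wraps modulo 2^32, so both forms yield the same bits even when the product overflows.

```java
// Sketch only: assumes the buffer capacity is an int, as implied by the message.
final class ValidityCapacitySketch {

    // Form quoted in the message: widen to long, multiply, narrow back to int.
    static int viaLong(int capacityBytes) {
        return (int) (capacityBytes * 8L);
    }

    // Proposed form: plain int multiplication.
    static int viaInt(int capacityBytes) {
        return capacityBytes * 8;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 1024, 1 << 28, Integer.MAX_VALUE};
        for (int c : samples) {
            if (viaLong(c) != viaInt(c)) {
                throw new AssertionError("mismatch at capacity=" + c);
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

Neither form protects against overflow for capacities of 2^28 bytes or more; the change only removes the widening and narrowing conversions.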
[jira] [Created] (ARROW-5726) [Java] Implement a common interface for int vectors
Ji Liu created ARROW-5726: - Summary: [Java] Implement a common interface for int vectors Key: ARROW-5726 URL: https://issues.apache.org/jira/browse/ARROW-5726 Project: Apache Arrow Issue Type: New Feature Components: Java Reporter: Ji Liu Assignee: Ji Liu Currently, _DictionaryEncoder#encode_ uses reflection to look up the set method and then set values. Setting values by reflection is inefficient, and the code structure is inelegant, for example: Method setter = null; for (Class c : Arrays.asList(int.class, long.class)) { try { setter = indices.getClass().getMethod("setSafe", int.class, c); break; } catch (NoSuchMethodException e) { /* ignore */ } } Implementing a common interface for int vectors, so that the set method can be called directly, seems a good choice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
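To illustrate the idea in ARROW-5726, here is a minimal, self-contained sketch of replacing the reflective lookup with a shared interface. All type names below are hypothetical toy stand-ins, not Arrow classes; the method name `setWithPossibleTruncate(int, long)` is the one mentioned in ARROW-5812, and `getValueAsLong` is an assumption added only so the result can be read back.

```java
// Hypothetical common interface for integer-typed vectors (toy stand-in).
interface IntVectorSketch {
    // Store value at index, truncating to the vector's width if necessary.
    void setWithPossibleTruncate(int index, long value);

    // Assumed accessor, used here only to read results back.
    long getValueAsLong(int index);
}

// Toy 8-bit vector backed by a byte array.
final class ByteVectorSketch implements IntVectorSketch {
    private final byte[] data;

    ByteVectorSketch(int size) {
        data = new byte[size];
    }

    @Override
    public void setWithPossibleTruncate(int index, long value) {
        data[index] = (byte) value; // keeps only the low 8 bits
    }

    @Override
    public long getValueAsLong(int index) {
        return data[index];
    }
}

// Toy 64-bit vector backed by a long array.
final class LongVectorSketch implements IntVectorSketch {
    private final long[] data;

    LongVectorSketch(int size) {
        data = new long[size];
    }

    @Override
    public void setWithPossibleTruncate(int index, long value) {
        data[index] = value; // no truncation needed
    }

    @Override
    public long getValueAsLong(int index) {
        return data[index];
    }
}

final class IndexWriterSketch {
    // Writes indices through the interface: no reflection, no per-type branching.
    static void writeIndices(IntVectorSketch indices, long[] values) {
        for (int i = 0; i < values.length; i++) {
            indices.setWithPossibleTruncate(i, values[i]);
        }
    }
}
```

With this shape, an encoder only needs a reference of the interface type, and the call is an ordinary interface dispatch rather than a `Method.invoke`.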
[jira] [Closed] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month
[ https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu closed ARROW-5435. - Resolution: Invalid > [Java] IntervalYearVector#getObject should return Period with both year and > month > - > > Key: ARROW-5435 > URL: https://issues.apache.org/jira/browse/ARROW-5435 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > IntervalYearVector#getObject currently returns a Period with only the month field set. > However, this vector stores an interval of years and months (e.g. 2 years and 3 > months is stored as 27 total months), so it should return a Period with both > years and months (currently only months is assigned). > In the example above, it currently returns Period(27 months); I think it > should return Period(2 years, 3 months). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
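The difference described in ARROW-5435 can be reproduced with `java.time.Period` alone. This is a sketch with a hypothetical class name; it does not use or reflect the actual IntervalYearVector code.

```java
import java.time.Period;

// Sketch only: shows the two Period shapes discussed in the issue.
final class IntervalYearPeriodSketch {

    // Period carrying only a month count, e.g. 27 months.
    static Period monthsOnly(int totalMonths) {
        return Period.ofMonths(totalMonths);
    }

    // Period split into years and months, e.g. 2 years and 3 months.
    static Period yearsAndMonths(int totalMonths) {
        return Period.of(totalMonths / 12, totalMonths % 12, 0);
    }

    public static void main(String[] args) {
        Period a = monthsOnly(27);
        Period b = yearsAndMonths(27);
        System.out.println(a);               // ISO-8601 form of the months-only period
        System.out.println(b);               // ISO-8601 form of the split period
        System.out.println(a.equals(b));     // Period.equals compares each field separately
        System.out.println(a.normalized());  // normalized() folds months into years
    }
}
```

Because `Period.equals` compares years, months and days field by field, the two representations are not equal even though `toTotalMonths()` is 27 for both; `normalized()` converts the months-only form into the years-and-months form.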
[jira] [Updated] (ARROW-5435) [Java] add test for IntervalYearVector#getAsStringBuilder
[ https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5435: -- Summary: [Java] add test for IntervalYearVector#getAsStringBuilder (was: [Java] IntervalYearVector#getObject should return Period with both year and month) > [Java] add test for IntervalYearVector#getAsStringBuilder > - > > Key: ARROW-5435 > URL: https://issues.apache.org/jira/browse/ARROW-5435 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > IntervalYearVector#getObject currently returns a Period with only the month field set. > However, this vector stores an interval of years and months (e.g. 2 years and 3 > months is stored as 27 total months), so it should return a Period with both > years and months (currently only months is assigned). > In the example above, it currently returns Period(27 months); I think it > should return Period(2 years, 3 months). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month
[ https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reopened ARROW-5435: --- > [Java] IntervalYearVector#getObject should return Period with both year and > month > - > > Key: ARROW-5435 > URL: https://issues.apache.org/jira/browse/ARROW-5435 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > IntervalYearVector#getObject currently returns a Period with only the month field set. > However, this vector stores an interval of years and months (e.g. 2 years and 3 > months is stored as 27 total months), so it should return a Period with both > years and months (currently only months is assigned). > In the example above, it currently returns Period(27 months); I think it > should return Period(2 years, 3 months). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5435) [Java] add test for IntervalYearVector#getAsStringBuilder
[ https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu updated ARROW-5435: -- Description: (was: IntervalYearVector#getObject today return Period with specific month. However, this vector stores interval (years and months, e.g. 2 years and 3 months is stored as 27(total months)), it should return Period with both years and months(now only months is assigned). As shown in the example above, now it return Period(27 months), I think it should return Period(2 years, 3 months).) > [Java] add test for IntervalYearVector#getAsStringBuilder > - > > Key: ARROW-5435 > URL: https://issues.apache.org/jira/browse/ARROW-5435 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5812) [Java] Refactor method name and param type in BaseIntVector
Ji Liu created ARROW-5812: - Summary: [Java] Refactor method name and param type in BaseIntVector Key: ARROW-5812 URL: https://issues.apache.org/jira/browse/ARROW-5812 Project: Apache Arrow Issue Type: Improvement Reporter: Ji Liu Assignee: Ji Liu Change the method to _void setWithPossibleTruncate(int index, long value);_ for better generality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5814) [Java] Implement a HashMap for DictionaryEncoder
Ji Liu created ARROW-5814: - Summary: [Java] Implement a HashMap for DictionaryEncoder Key: ARROW-5814 URL: https://issues.apache.org/jira/browse/ARROW-5814 Project: Apache Arrow Issue Type: Improvement Reporter: Ji Liu Assignee: Ji Liu As a follow-up to [ARROW-5726|https://issues.apache.org/jira/browse/ARROW-5726], implement a map for DictionaryEncoder to reduce boxing/unboxing operations. Benchmark: DictionaryEncodeHashMapBenchmarks.testHashMap: avgt 5 31151.345 ± 1661.878 ns/op DictionaryEncodeHashMapBenchmarks.testDictionaryEncodeHashMap: avgt 5 15549.902 ± 771.647 ns/op -- This message was sent by Atlassian JIRA (v7.6.3#76005)
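The boxing overhead mentioned in ARROW-5814 comes from storing dictionary indices as `Integer` objects in a `java.util.HashMap`. A minimal sketch of the alternative, a map from object keys to primitive `int` values, might look as follows. The class name and the open-addressing design are assumptions made for illustration; this is not the implementation that was merged into Arrow.

```java
import java.util.Objects;

// Hypothetical map from object keys to primitive int values (no Integer boxing).
// Uses open addressing with linear probing; capacity is always a power of two.
final class ObjectToIntMapSketch<K> {
    static final int MISSING = -1;

    private Object[] keys = new Object[16];
    private int[] values = new int[16];
    private int size;

    // Associates key with value, replacing any previous value.
    void put(K key, int value) {
        Objects.requireNonNull(key, "key");
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5
        }
        insert(key, value);
    }

    // Returns the stored value, or MISSING if the key is absent.
    int get(Object key) {
        if (key == null) {
            return MISSING;
        }
        int i = indexFor(key, keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return MISSING;
    }

    int size() {
        return size;
    }

    private void insert(Object key, int value) {
        int i = indexFor(key, keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) & (keys.length - 1);
        }
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    private static int indexFor(Object key, int capacity) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (capacity - 1);
    }

    // Doubles the table and re-inserts every existing entry.
    private void grow() {
        Object[] oldKeys = keys;
        int[] oldValues = values;
        keys = new Object[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                insert(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

Returning a sentinel such as `MISSING` instead of `null` is what lets `get` stay free of boxing; it assumes stored indices are never negative, which holds for dictionary positions.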
[jira] [Created] (ARROW-5821) [Java] Support compact fixed-width vectors
Ji Liu created ARROW-5821: - Summary: [Java] Support compact fixed-width vectors Key: ARROW-5821 URL: https://issues.apache.org/jira/browse/ARROW-5821 Project: Apache Arrow Issue Type: New Feature Reporter: Ji Liu Assignee: Ji Liu In the shuffle stage of some applications, FixedWidthVectors may contain very little non-null data. In this case, serializing the vectors directly is not a good choice. Instead, we can compact the vector so that it holds only non-null values, and create a BitVector to track the indices of the non-null values so that the vector can be deserialized properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
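The compaction described in ARROW-5821 can be sketched without any Arrow types. In the example below, a `java.util.BitSet` stands in for the BitVector and an `Integer[]` with `null` entries stands in for a sparse fixed-width vector; the class name and layout are hypothetical and only illustrate the round trip.

```java
import java.util.Arrays;
import java.util.BitSet;

// Sketch only: compact form of a sparse int vector.
final class CompactIntVectorSketch {
    final int valueCount;      // number of slots in the original vector
    final BitSet validity;     // bit i set means slot i is non-null
    final int[] nonNullValues; // non-null values in slot order

    private CompactIntVectorSketch(int valueCount, BitSet validity, int[] nonNullValues) {
        this.valueCount = valueCount;
        this.validity = validity;
        this.nonNullValues = nonNullValues;
    }

    // Keeps only the non-null values and records their positions in a bit set.
    static CompactIntVectorSketch compact(Integer[] values) {
        BitSet validity = new BitSet(values.length);
        int[] packed = new int[values.length];
        int n = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                validity.set(i);
                packed[n++] = values[i];
            }
        }
        return new CompactIntVectorSketch(values.length, validity, Arrays.copyOf(packed, n));
    }

    // Rebuilds the original sparse layout from the bit set and packed values.
    Integer[] expand() {
        Integer[] out = new Integer[valueCount];
        int next = 0;
        for (int i = validity.nextSetBit(0); i >= 0; i = validity.nextSetBit(i + 1)) {
            out[i] = nonNullValues[next++];
        }
        return out;
    }
}
```

For a vector with k non-null slots out of n, the compact form carries k values plus roughly n bits, instead of n values plus n bits.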