[jira] [Resolved] (ARROW-9016) [Java] Remove direct references to Netty/Unsafe Allocators

2020-06-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved ARROW-9016.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7329
[https://github.com/apache/arrow/pull/7329]

> [Java] Remove direct references to Netty/Unsafe Allocators
> --
>
> Key: ARROW-9016
> URL: https://issues.apache.org/jira/browse/ARROW-9016
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As part of ARROW-8230 this removes direct references to Netty and Unsafe 
> Allocation managers in the `DefaultAllocationManagerOption`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-8695) [Java] remove references to PlatformDependent in memory module

2020-06-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved ARROW-8695.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7101
[https://github.com/apache/arrow/pull/7101]

> [Java] remove references to PlatformDependent in memory module
> --
>
> Key: ARROW-8695
> URL: https://issues.apache.org/jira/browse/ARROW-8695
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Part of breaking ARROW-8230 into smaller chunks. First remove NettyUtils 
> references from 'pure' arrow-memory classes before breaking netty classes out





[jira] [Resolved] (ARROW-9015) [Java] Make BaseAllocator package private

2020-06-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved ARROW-9015.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7328
[https://github.com/apache/arrow/pull/7328]

> [Java] Make BaseAllocator package private
> -
>
> Key: ARROW-9015
> URL: https://issues.apache.org/jira/browse/ARROW-9015
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As part of the netty work in ARROW-8230 it became clear that BaseAllocator 
> should be package private





[jira] [Commented] (ARROW-7779) [Format] Enable integration tests for dictionaries-within-dictionaries

2020-04-17 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086034#comment-17086034
 ] 

Jacques Nadeau commented on ARROW-7779:
---

I don't like what it would require in terms of complexity on the processing side.

> [Format] Enable integration tests for dictionaries-within-dictionaries
> --
>
> Key: ARROW-7779
> URL: https://issues.apache.org/jira/browse/ARROW-7779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all 
> implementations





[jira] [Commented] (ARROW-7779) [Format] Enable integration tests for dictionaries-within-dictionaries

2020-04-17 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085972#comment-17085972
 ] 

Jacques Nadeau commented on ARROW-7779:
---

It feels like a mistake to think of dictionary-encoding as a type. I don't 
think the example you gave was in the spirit of what we intended when we 
defined dictionary encoding. Why not just say it isn't supported? 

In this example we'd say: if you want to dictionary encode the lists, you can't 
dictionary encode the values in the lists as well. At least for now. 

> [Format] Enable integration tests for dictionaries-within-dictionaries
> --
>
> Key: ARROW-7779
> URL: https://issues.apache.org/jira/browse/ARROW-7779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all 
> implementations





[jira] [Commented] (ARROW-7779) [Format] Enable integration tests for dictionaries-within-dictionaries

2020-04-17 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085959#comment-17085959
 ] 

Jacques Nadeau commented on ARROW-7779:
---

Hey [~wesm] can you expound on what you mean by dictionaries within 
dictionaries? Are you talking about dictionary encoding or something else? Are 
you talking about a map where the key or value is a map?

> [Format] Enable integration tests for dictionaries-within-dictionaries
> --
>
> Key: ARROW-7779
> URL: https://issues.apache.org/jira/browse/ARROW-7779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format, Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all 
> implementations





[jira] [Commented] (ARROW-7840) [Java] [Integration] Java executables fail

2020-02-12 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035488#comment-17035488
 ] 

Jacques Nadeau commented on ARROW-7840:
---

[~ravindra], can you have Prudhvi take a look at this?

> [Java] [Integration] Java executables fail
> --
>
> Key: ARROW-7840
> URL: https://issues.apache.org/jira/browse/ARROW-7840
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Integration, Java
>Reporter: Antoine Pitrou
>Priority: Critical
> Fix For: 1.0.0
>
>
> When trying to run integration tests using {{docker-compose run 
> conda-integration}}, I always get failures during the Java tests:
> {code}
> RuntimeError: Command failed: ['java', 
> '-Dio.netty.tryReflectionSetAccessible=true', '-cp', 
> '/arrow/java/tools/target/arrow-tools-1.0.0-SNAPSHOT-jar-with-dependencies.jar',
>  'org.apache.arrow.tools.StreamToFile', 
> '/tmp/tmpqbkrmpo1/e75ed336_simple.producer_file_as_stream', 
> '/tmp/tmpqbkrmpo1/e75ed336_simple.consumer_stream_as_file']
> With output:
> --
> 15:57:01.194 [main] DEBUG 
> io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the 
> default logging framework
> 15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
> -Dio.netty.leakDetection.level: simple
> 15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
> -Dio.netty.leakDetection.targetRecords: 4
> 15:57:01.208 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> -Dio.netty.noUnsafe: false
> 15:57:01.209 [main] DEBUG io.netty.util.internal.PlatformDependent0 - Java 
> version: 8
> 15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> sun.misc.Unsafe.theUnsafe: available
> 15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> sun.misc.Unsafe.copyMemory: available
> 15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> java.nio.Buffer.address: available
> 15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - direct 
> buffer constructor: available
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> java.nio.Bits.unaligned: available, true
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior 
> to Java9
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
> java.nio.DirectByteBuffer.<init>(long, int): available
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> sun.misc.Unsafe: available
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
> 15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> -Dio.netty.bitMode: 64 (sun.arch.data.model)
> 15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> -Dio.netty.noPreferDirect: false
> 15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> -Dio.netty.maxDirectMemory: 11252269056 bytes
> 15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
> -Dio.netty.uninitializedArrayAllocationThreshold: -1
> 15:57:01.213 [main] DEBUG io.netty.util.internal.CleanerJava6 - 
> java.nio.ByteBuffer.cleaner(): available
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.numHeapArenas: 48
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.numDirectArenas: 48
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.pageSize: 8192
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.maxOrder: 11
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.chunkSize: 16777216
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.tinyCacheSize: 512
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.smallCacheSize: 256
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.normalCacheSize: 64
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.cacheTrimInterval: 8192
> 15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
> -Dio.netty.allocator.useCacheForAllThreads: true
> 15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
> -Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
> -Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 15:57:01.228 [main] DEBUG io.netty.buffe

[jira] [Comment Edited] (ARROW-7744) [Java] Implement Flight JDBC Driver

2020-02-03 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029327#comment-17029327
 ] 

Jacques Nadeau edited comment on ARROW-7744 at 2/3/20 9:59 PM:
---

Given my previous experience with these APIs, I suggest you use Avatica as the 
basis for this rather than implementing by hand. I noticed you haven't done 
that in your WIP. Was it something you considered?


was (Author: jnadeau):
Given my previous experience with these APIs, I suggest you use Avatica as the 
basis for this rather than implementing by hand. I noticed you have done that 
yet. Was it something you considered?

> [Java] Implement Flight JDBC Driver
> ---
>
> Key: ARROW-7744
> URL: https://issues.apache.org/jira/browse/ARROW-7744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a Java developer, I would like the ability to use JDBC to interact with 
> Flight servers. For example, there is now an example in the Arrow repo to run 
> a Flight server wrapping DataFusion and it supports executing SQL against CSV 
> and Parquet files. I would like to be able to call this from Java.
> A flight Arrow JDBC driver would also then simplify developing integrations 
> with other Apache projects, such as building a Spark V2 Data Source or a 
> Drill storage plugin. It would also be directly usable from many BI tools.
> I propose that the class name of the driver should be 
> "org.apache.arrow.jdbc.Driver" and the connection string should be 
> "jdbc:arrow://host:port?[properties]". I'm purposely leaving "flight" out of 
> these because I don't think it makes sense to support multiple protocols now 
> that we have flight and it is easier for users to remember "arrow" rather 
> than needing to know about the protocol. This is easy to change if there are 
> objections.
> JDBC is designed around sending queries as strings and then receiving 
> results. These strings could be SQL queries, JSON-encoded query plans, or 
> something else. The JDBC driver will not make any assumptions about the 
> format or dialect of these strings. Queries would be executed using the 
> "DoGet" method.
> The JDBC metadata functionality for reading schema information could possibly 
> use ListFlights but I haven't looked into this part yet.
> I do expect that this JDBC driver will serve as a base that could be extended 
> to add specific functionality for different Flight servers rather than 
> attempt to support them all.
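
The proposed connection string could be taken apart along the following lines. This is only a minimal sketch: the class name ArrowJdbcUrl, the port number and the property names in the usage note are hypothetical and not part of any Arrow API; only the URL shape "jdbc:arrow://host:port?[properties]" comes from the proposal.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that splits "jdbc:arrow://host:port?[properties]" into its parts. */
final class ArrowJdbcUrl {
  private static final String JDBC_PREFIX = "jdbc:";
  private static final String URL_PREFIX = "jdbc:arrow://";

  final String host;
  final int port;
  final Map<String, String> properties;

  private ArrowJdbcUrl(String host, int port, Map<String, String> properties) {
    this.host = host;
    this.port = port;
    this.properties = properties;
  }

  static ArrowJdbcUrl parse(String url) {
    if (url == null || !url.startsWith(URL_PREFIX)) {
      throw new IllegalArgumentException("Not an arrow JDBC URL: " + url);
    }
    // Drop the "jdbc:" prefix so java.net.URI sees "arrow://host:port?key=value".
    URI uri = URI.create(url.substring(JDBC_PREFIX.length()));
    if (uri.getHost() == null || uri.getPort() < 0) {
      throw new IllegalArgumentException("Host and port are required: " + url);
    }
    // Split the optional query string into key/value properties, keeping their order.
    Map<String, String> props = new LinkedHashMap<>();
    String query = uri.getQuery();
    if (query != null && !query.isEmpty()) {
      for (String pair : query.split("&")) {
        int eq = pair.indexOf('=');
        String key = eq < 0 ? pair : pair.substring(0, eq);
        String value = eq < 0 ? "" : pair.substring(eq + 1);
        props.put(key, value);
      }
    }
    return new ArrowJdbcUrl(uri.getHost(), uri.getPort(), props);
  }
}
```

For instance, ArrowJdbcUrl.parse("jdbc:arrow://localhost:50051?user=andy") would yield the host, the port and a one-entry property map that a driver could then hand to a Flight client.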





[jira] [Commented] (ARROW-7744) [Java] Implement Flight JDBC Driver

2020-02-03 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029327#comment-17029327
 ] 

Jacques Nadeau commented on ARROW-7744:
---

Given my previous experience with these APIs, I suggest you use Avatica as the 
basis for this rather than implementing by hand. I noticed you haven't done that 
yet. Was it something you considered?

> [Java] Implement Flight JDBC Driver
> ---
>
> Key: ARROW-7744
> URL: https://issues.apache.org/jira/browse/ARROW-7744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a Java developer, I would like the ability to use JDBC to interact with 
> Flight servers. For example, there is now an example in the Arrow repo to run 
> a Flight server wrapping DataFusion and it supports executing SQL against CSV 
> and Parquet files. I would like to be able to call this from Java.
> A flight Arrow JDBC driver would also then simplify developing integrations 
> with other Apache projects, such as building a Spark V2 Data Source or a 
> Drill storage plugin. It would also be directly usable from many BI tools.
> I propose that the class name of the driver should be 
> "org.apache.arrow.jdbc.Driver" and the connection string should be 
> "jdbc:arrow://host:port?[properties]". I'm purposely leaving "flight" out of 
> these because I don't think it makes sense to support multiple protocols now 
> that we have flight and it is easier for users to remember "arrow" rather 
> than needing to know about the protocol. This is easy to change if there are 
> objections.
> JDBC is designed around sending queries as strings and then receiving 
> results. These strings could be SQL queries, JSON-encoded query plans, or 
> something else. The JDBC driver will not make any assumptions about the 
> format or dialect of these strings. Queries would be executed using the 
> "DoGet" method.
> The JDBC metadata functionality for reading schema information could possibly 
> use ListFlights but I haven't looked into this part yet.
> I do expect that this JDBC driver will serve as a base that could be extended 
> to add specific functionality for different Flight servers rather than 
> attempt to support them all.





[jira] [Commented] (ARROW-7622) [Format] Mark Tensor and SparseTensor fields required

2020-01-20 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019750#comment-17019750
 ] 

Jacques Nadeau commented on ARROW-7622:
---

+0.5 from me. I agree with the thinking but don't actually use this; otherwise 
it would be +1.

> [Format] Mark Tensor and SparseTensor fields required
> -
>
> Key: ARROW-7622
> URL: https://issues.apache.org/jira/browse/ARROW-7622
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Format
>Affects Versions: 0.15.1
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Tensor and SparseTensor parts of the format are currently marked 
> experimental. This presumably means that they are still allowed to change 
> (and indeed they did change one month ago, in ARROW-4225). 
> I suggest we take the opportunity to mark some fields required in 
> {{Tensor.fbs}} and {{SparseTensor.fbs}}, to make input validation more robust.
> cc [~mrkn], [~jacques]  and [~wesm] for opinions.





[jira] [Commented] (ARROW-7610) [Java] Finish support for 64 bit int allocations

2020-01-19 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019056#comment-17019056
 ] 

Jacques Nadeau commented on ARROW-7610:
---

Per comments on the mailing list, I think we should use the existing Netty 
facades for this. 

[PlatformDependent.allocateMemory(long)|https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L383]
[PlatformDependent.freeMemory(long)|https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L387]

These delegate to Unsafe and also ensure that -XX:MaxDirectMemorySize is 
respected.

I don't think we need a new allocator. Within the existing allocator, we can 
simply check the requested size and use this path when it exceeds the int range.
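
The size check suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not Arrow code: SizeDispatchingAllocator and both allocation paths are hypothetical placeholders supplied by the caller. In a real implementation the large path might wrap the PlatformDependent.allocateMemory(long) facade linked above.

```java
import java.util.function.LongFunction;

/**
 * Hypothetical sketch: a single allocator that chooses an allocation path by size.
 * Neither path is real Arrow code; both are supplied by the caller.
 */
final class SizeDispatchingAllocator<B> {
  private final LongFunction<B> intSizedPath; // stands in for the existing allocation path
  private final LongFunction<B> largePath;    // stands in for a path built on allocateMemory(long)

  SizeDispatchingAllocator(LongFunction<B> intSizedPath, LongFunction<B> largePath) {
    this.intSizedPath = intSizedPath;
    this.largePath = largePath;
  }

  /** True when the requested size can be represented as a non-negative int. */
  static boolean fitsInInt(long size) {
    return size >= 0 && size <= Integer.MAX_VALUE;
  }

  /** Routes the request to the existing path, or to the large path when the size exceeds an int. */
  B allocate(long size) {
    if (size < 0) {
      throw new IllegalArgumentException("negative size: " + size);
    }
    return fitsInInt(size) ? intSizedPath.apply(size) : largePath.apply(size);
  }
}
```

Callers keep using one allocator; only requests larger than Integer.MAX_VALUE bytes take the second path.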

> [Java] Finish support for 64 bit int allocations 
> -
>
> Key: ARROW-7610
> URL: https://issues.apache.org/jira/browse/ARROW-7610
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Liya Fan
>Priority: Major
>
> 1.  Add an allocator capable of allocating more than 2GB of data.
> 2.  Do an end-to-end round-trip test on a larger vector/record batch size.





[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-10 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7549:
-

 Summary: [Java] Reorganize Flight modules to keep top level 
clean/organized
 Key: ARROW-7549
 URL: https://issues.apache.org/jira/browse/ARROW-7549
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau


Let's create a flight parent module and then create the following modules beneath it:

flight-core (existing flight module)
flight-grpc (existing flight-grpc module)






[jira] [Commented] (ARROW-6799) [C++] Plasma JNI component links to flatbuffers::flatbuffers (unnecessarily?)

2020-01-09 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012146#comment-17012146
 ] 

Jacques Nadeau commented on ARROW-6799:
---

I just opened ARROW-7534 to create a contrib module and demote these things to 
it. I think that if we don't have development or user engagement on a 
contrib module for two quarters (or something like that), it should probably be 
thought of as "no longer used/maintained".

> [C++] Plasma JNI component links to flatbuffers::flatbuffers (unnecessarily?)
> -
>
> Key: ARROW-6799
> URL: https://issues.apache.org/jira/browse/ARROW-6799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Does not appear to be tested in CI. Originally reported at 
> https://github.com/apache/arrow/issues/5575





[jira] [Commented] (ARROW-6103) [Java] Do we really want to use the maven release plugin?

2020-01-09 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012144#comment-17012144
 ] 

Jacques Nadeau commented on ARROW-6103:
---

I believe this is based on recommendations from the ASF. Someone would need to 
work out what the plugin does and replicate that in order to remove it.

> [Java] Do we really want to use the maven release plugin?
> -
>
> Key: ARROW-6103
> URL: https://issues.apache.org/jira/browse/ARROW-6103
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Java
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> For reference .. I'm filing this issue to track investigation work around 
> this ..
> {code:java}
> The biggest problem for the Git commit is our Java package
> requires "apache-arrow-${VERSION}" tag on
> https://github.com/apache/arrow . (Right?)
> I think that "mvn release:perform" in
> dev/release/01-perform.sh does so but I don't know the
> details of "mvn release:perform"...{code}





[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-09 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Fix Version/s: 1.0.0

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to make inner providers available without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191
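
The unwrap method suggested in the description could look roughly like this. It is a sketch only: SketchBuf is a hypothetical stand-in, not the real ArrowBuf, and the backing "provider" object is whatever an allocator implementation chooses to expose.

```java
/** Hypothetical stand-in for a buffer type; not the real ArrowBuf. */
final class SketchBuf {
  // Allocator-specific backing object. The buffer API never names its concrete type.
  private final Object provider;

  SketchBuf(Object provider) {
    this.provider = provider;
  }

  /**
   * Returns the backing object as the requested type, so callers that know the
   * provider can reach it without the buffer API depending on that provider.
   */
  <T> T unwrap(Class<T> clazz) {
    if (clazz.isInstance(provider)) {
      return clazz.cast(provider);
    }
    throw new UnsupportedOperationException(
        "Backing object is not a " + clazz.getName());
  }
}
```

A Netty-aware caller would then ask for the Netty type explicitly, while code that only uses the buffer itself never sees a Netty class on its classpath. The same idea is used by java.sql.Wrapper.unwrap in JDBC.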





[jira] [Updated] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7534:
--
Component/s: Java

> Create a new java/contrib module
> 
>
> Key: ARROW-7534
> URL: https://issues.apache.org/jira/browse/ARROW-7534
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Major
>
> To better clarify the status of java sub-modules, create a contrib module and 
> move the following modules underneath it.
> * algorithm
> * adapter
> * plasma





[jira] [Created] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7534:
-

 Summary: Create a new java/contrib module
 Key: ARROW-7534
 URL: https://issues.apache.org/jira/browse/ARROW-7534
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau
Assignee: Liya Fan


To better clarify the status of java sub-modules, create a contrib module and 
move the following modules underneath it.

* algorithm
* adapter
* plasma





[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the Java memory package

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7533:
-

 Summary: [Java] Move ArrowBufPointer out of the Java memory 
package
 Key: ARROW-7533
 URL: https://issues.apache.org/jira/browse/ARROW-7533
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau
Assignee: Liya Fan


The memory package is focused on memory access and management. ArrowBufPointer 
should be moved to the algorithm package as it isn't core to the Arrow memory 
management primitives. I would further suggest that it is an anti-pattern.





[jira] [Commented] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-06 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009168#comment-17009168
 ] 

Jacques Nadeau commented on ARROW-7494:
---

What I'm saying: 

Reader and writer indexes are typically used to decide what portion of data needs 
to be written to a socket. For example, if you have a buffer that is 10 bytes 
but you only want to write bytes 2..4 to the socket, you'll need to set the 
reader index to 2 and the writer index to 4. By doing so, you can then hand 
that buffer to someone and they will only write the two selected bytes.

In the case of ArrowBuf, we have the concept of slices. As such, in cases where 
we're currently relying on reader/writer indexes to write the correct bytes to an 
I/O location, we should change the code to get the appropriate slice of the 
underlying buffer instead and simply write the whole thing. For example, I 
believe getBuffers() changes reader/writer indexes to solve this: if you call 
getBuffers() on an int vector with a value count of 1, I believe we set the 
writer index to 4 bytes but return the whole large buffer. Instead, here we 
want to return the slice of bytes 0..4.
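
To make the slice idea concrete, here is a small sketch using java.nio.ByteBuffer as a stand-in, since it offers both index-style state (position/limit) and slices. The class and method names are hypothetical, and nothing here is ArrowBuf code.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration using java.nio.ByteBuffer in place of ArrowBuf. */
final class SliceInsteadOfIndexes {
  /**
   * Returns an independent view of bytes [start, end) of the buffer.
   * The original buffer's own position and limit are left untouched.
   */
  static ByteBuffer slice(ByteBuffer buffer, int start, int end) {
    ByteBuffer view = buffer.duplicate(); // shares content, has its own indexes
    view.limit(end);
    view.position(start);
    return view.slice();                  // position 0, limit (end - start)
  }

  /** A consumer that writes "the whole thing": every remaining byte it is handed. */
  static byte[] bytesToWrite(ByteBuffer buffer) {
    byte[] out = new byte[buffer.remaining()];
    buffer.get(out);
    return out;
  }
}
```

With a 10-byte buffer, slice(buffer, 2, 4) hands the consumer a 2-byte view, so the consumer needs no knowledge of reader or writer indexes and the shared buffer carries no mutable cursor state.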


> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Ji Liu
>Priority: Critical
> Fix For: 1.0.0
>
>
> Reader and writer index functionality doesn't belong on a chunk of memory 
> and is present only due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.





[jira] [Commented] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-05 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008400#comment-17008400
 ] 

Jacques Nadeau commented on ARROW-4526:
---

[~fan_li_ya], can you do this as a series of separate PRs so it is more 
manageable to review?

Here is a suggested set of PRs:

# Remove all references to Netty from ArrowBuf, BufferAllocator, 
ReferenceManager.
# Move Netty memory manager into a separate module such that the basic 
allocator does not depend on it.
# Move ArrowBuf into the Arrow package.

Some other smaller steps may also make sense as you start working on it.

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to make inner providers available without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191





[jira] [Closed] (ARROW-6896) [Java] Vector schema root should not share vectors

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau closed ARROW-6896.
-
Resolution: Invalid

Per discussion here, this issue doesn't seem like it is a problem.

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 
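
The reported scenario can be reproduced with a toy model. ToyVector and ToyRoot below are hypothetical stand-ins written for illustration; they do not reproduce the actual VectorSchemaRoot implementation, and the issue itself was closed as invalid.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a vector that owns a releasable resource. */
final class ToyVector implements AutoCloseable {
  private boolean released;

  int get() {
    if (released) {
      throw new IllegalStateException("vector already released");
    }
    return 42; // placeholder value
  }

  @Override
  public void close() {
    released = true;
  }
}

/** Toy stand-in for a root: closing the root closes every vector it holds. */
final class ToyRoot implements AutoCloseable {
  private final List<ToyVector> vectors;

  ToyRoot(List<ToyVector> vectors) {
    this.vectors = vectors;
  }

  /** Returns a new root that shares the existing vectors and adds one more. */
  ToyRoot addVector(ToyVector vector) {
    List<ToyVector> combined = new ArrayList<>(vectors);
    combined.add(vector);
    return new ToyRoot(combined);
  }

  ToyVector getVector(int index) {
    return vectors.get(index);
  }

  @Override
  public void close() {
    for (ToyVector vector : vectors) {
      vector.close();
    }
  }
}
```

In this model, closing the old root releases the vectors that the new root still holds, so reading a shared vector through the new root fails while the newly added vector still works.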





[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7494:
--
Component/s: Java

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.0.0
>
>
> Reader and writer index functionality doesn't belong on a chunk of memory 
> and is present only due to inheritance from ByteBuf. As part of removing ByteBuf 
> inheritance, we should also remove reader and writer indexes from ArrowBuf 
> functionality. It wastes heap memory for rare utility. In general, a slice 
> can be used instead of a reader/writer index pattern.
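
To illustrate the last sentence, here is a minimal sketch of reading through a slice instead of a mutable reader index. It uses java.nio.ByteBuffer as a stand-in for a chunk of memory, since ArrowBuf's final API is not described in this issue. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

final class SliceInsteadOfIndex {
    /**
     * Sums `count` ints starting at byte `offset`. The caller's buffer state
     * is never touched: we carve out a slice and use absolute reads on it.
     */
    static int sumInts(ByteBuffer chunk, int offset, int count) {
        ByteBuffer dup = chunk.duplicate();          // independent position/limit
        dup.position(offset);
        dup.limit(offset + count * Integer.BYTES);
        ByteBuffer view = dup.slice();               // zero-based window over the range
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += view.getInt(i * Integer.BYTES);   // absolute read, no cursor
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer chunk = ByteBuffer.allocate(16);
        for (int i = 0; i < 4; i++) {
            chunk.putInt(i * Integer.BYTES, i + 1);  // stores 1, 2, 3, 4
        }
        System.out.println(sumInts(chunk, 4, 3));    // sums 2 + 3 + 4
    }
}
```

The window is described by (offset, length) at the call site, so the memory chunk itself carries no cursor state.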



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7494:
--
Priority: Critical  (was: Major)

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
>
> Reader and writer indexes and their functionality don't belong on a chunk of 
> memory; they exist only because of inheritance from ByteBuf. As part of 
> removing ByteBuf inheritance, we should also remove reader and writer indexes 
> from ArrowBuf. They waste heap memory and are rarely useful. In general, a 
> slice can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7495:
--
Component/s: Java

> [Java] Remove "empty" concept from ArrowBuf, replace with custom 
> referencemanager
> -
>
> Key: ARROW-7495
> URL: https://issues.apache.org/jira/browse/ARROW-7495
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.0.0
>
>
> With the introduction of ReferenceManager in the codebase, a separate 
> ArrowBuf is no longer necessary. Instead, one can create a new 
> reference manager that is used for the empty ArrowBuf. As a reminder, 
> empty ArrowBufs are special in that they have no real reference-counting 
> semantics and their count always stays at one. This lets us troubleshoot 
> unallocated memory better than the NPE we would otherwise get after 
> calling ValueVector.clear().
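
A rough sketch of the idea follows, under loud assumptions: ToyRefManager, ToyCountingRefManager, ToyEmptyRefManager and ToyBuf are hypothetical names invented here, and the interface is much smaller than whatever Arrow's ReferenceManager actually exposes. The point is only that "empty" can be a reference manager whose count never moves, rather than a special buffer type.

```java
// Hypothetical, simplified interface; not Arrow's ReferenceManager.
interface ToyRefManager {
    int getRefCount();
    void retain();
    boolean release();   // true if this call freed the memory
}

// Ordinary counting behavior for real allocations.
final class ToyCountingRefManager implements ToyRefManager {
    private int count = 1;

    public int getRefCount() { return count; }

    public void retain() { count++; }

    public boolean release() {
        if (count == 0) {
            throw new IllegalStateException("already released");
        }
        return --count == 0;
    }
}

// Manager for the empty buffer: no counting semantics, always reports one.
final class ToyEmptyRefManager implements ToyRefManager {
    static final ToyEmptyRefManager INSTANCE = new ToyEmptyRefManager();

    private ToyEmptyRefManager() {}

    public int getRefCount() { return 1; }

    public void retain() { /* nothing to track */ }

    public boolean release() { return false; /* never freed */ }
}

// One buffer class for both cases; only the manager differs.
final class ToyBuf {
    static final ToyBuf EMPTY = new ToyBuf(ToyEmptyRefManager.INSTANCE, 0);

    final ToyRefManager refs;
    final long capacity;

    ToyBuf(ToyRefManager refs, long capacity) {
        this.refs = refs;
        this.capacity = capacity;
    }
}
```

With this shape, a cleared vector can point at ToyBuf.EMPTY instead of null, so repeated release calls are harmless rather than an NPE.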



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Priority: Critical  (was: Major)

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191
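
For readers unfamiliar with the unwrap pattern mentioned above, here is a minimal sketch of how such a method might look. ToyUnwrappableBuf and ToyNettyBacking are hypothetical names, and throwing on a type mismatch is an assumption; the issue does not say whether a mismatch should throw or return null.

```java
// Hypothetical provider-specific object that a buffer might wrap.
final class ToyNettyBacking {
    final String name = "netty-like";
}

// Sketch of a buffer that hides its provider behind unwrap().
final class ToyUnwrappableBuf {
    private final Object backing;   // provider-specific, not part of the public type

    ToyUnwrappableBuf(Object backing) {
        this.backing = backing;
    }

    /** Returns the inner provider if it is of the requested type. */
    <T> T unwrap(Class<T> clazz) {
        if (clazz.isInstance(backing)) {
            return clazz.cast(backing);
        }
        // Assumption: a mismatch is reported by throwing.
        throw new UnsupportedOperationException("not backed by " + clazz.getName());
    }
}
```

Callers that know they run on a particular provider can ask for it explicitly, while the buffer's public signature mentions no provider type.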



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6799) [C++] Plasma JNI component links to flatbuffers::flatbuffers (unnecessarily?)

2020-01-03 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007834#comment-17007834
 ] 

Jacques Nadeau commented on ARROW-6799:
---

Should we just delete the plasma jni implementation? It seems like it was never 
finished and no one is maintaining it?

> [C++] Plasma JNI component links to flatbuffers::flatbuffers (unnecessarily?)
> -
>
> Key: ARROW-6799
> URL: https://issues.apache.org/jira/browse/ARROW-6799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Does not appear to be tested in CI. Originally reported at 
> https://github.com/apache/arrow/issues/5575



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-7155) [Java][CI] add maven wrapper to make setup process simple

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau closed ARROW-7155.
-
Resolution: Not A Problem

> [Java][CI] add maven wrapper to make setup process simple
> -
>
> Key: ARROW-7155
> URL: https://issues.apache.org/jira/browse/ARROW-7155
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 1.0.0
> Environment: Linux/Windows/Mac
>Reporter: Wang GaoXiang
>Priority: Trivial
>  Labels: beginner, newbie, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/takari/maven-wrapper] is a good tool that helps make 
> the setup process simpler for newcomers. 
> What do others say about the Maven wrapper?
> [https://stackoverflow.com/questions/38723833/what-is-the-purpose-of-mvnw-and-mvnw-cmd-files]
>  
> Some good examples using maven wrapper:
> [https://github.com/spring-projects/spring-boot]
> [https://github.com/apache/camel]
> [https://github.com/wgx731/dr-spring]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7494:
--
Labels:   (was: pre-1.0)

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.0.0
>
>
> Reader and writer indexes and their functionality don't belong on a chunk of 
> memory; they exist only because of inheritance from ByteBuf. As part of 
> removing ByteBuf inheritance, we should also remove reader and writer indexes 
> from ArrowBuf. They waste heap memory and are rarely useful. In general, a 
> slice can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Fix Version/s: 1.0.0

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pre-1.0
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Labels:   (was: pre-1.0)

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7495:
--
Labels:   (was: pre-1.0)

> [Java] Remove "empty" concept from ArrowBuf, replace with custom 
> referencemanager
> -
>
> Key: ARROW-7495
> URL: https://issues.apache.org/jira/browse/ARROW-7495
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.0.0
>
>
> With the introduction of ReferenceManager in the codebase, a separate 
> ArrowBuf is no longer necessary. Instead, one can create a new 
> reference manager that is used for the empty ArrowBuf. As a reminder, 
> empty ArrowBufs are special in that they have no real reference-counting 
> semantics and their count always stays at one. This lets us troubleshoot 
> unallocated memory better than the NPE we would otherwise get after 
> calling ValueVector.clear().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7495:
--
Fix Version/s: 1.0.0

> [Java] Remove "empty" concept from ArrowBuf, replace with custom 
> referencemanager
> -
>
> Key: ARROW-7495
> URL: https://issues.apache.org/jira/browse/ARROW-7495
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pre-1.0
> Fix For: 1.0.0
>
>
> With the introduction of ReferenceManager in the codebase, a separate 
> ArrowBuf is no longer necessary. Instead, one can create a new 
> reference manager that is used for the empty ArrowBuf. As a reminder, 
> empty ArrowBufs are special in that they have no real reference-counting 
> semantics and their count always stays at one. This lets us troubleshoot 
> unallocated memory better than the NPE we would otherwise get after 
> calling ValueVector.clear().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-7494:
--
Fix Version/s: 1.0.0

> [Java] Remove reader index and writer index from ArrowBuf
> -
>
> Key: ARROW-7494
> URL: https://issues.apache.org/jira/browse/ARROW-7494
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pre-1.0
> Fix For: 1.0.0
>
>
> Reader and writer indexes and their functionality don't belong on a chunk of 
> memory; they exist only because of inheritance from ByteBuf. As part of 
> removing ByteBuf inheritance, we should also remove reader and writer indexes 
> from ArrowBuf. They waste heap memory and are rarely useful. In general, a 
> slice can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7495:
-

 Summary: [Java] Remove "empty" concept from ArrowBuf, replace with 
custom referencemanager
 Key: ARROW-7495
 URL: https://issues.apache.org/jira/browse/ARROW-7495
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau


With the introduction of ReferenceManager in the codebase, a separate 
ArrowBuf is no longer necessary. Instead, one can create a new 
reference manager that is used for the empty ArrowBuf. As a reminder, 
empty ArrowBufs are special in that they have no real reference-counting 
semantics and their count always stays at one. This lets us troubleshoot 
unallocated memory better than the NPE we would otherwise get after 
calling ValueVector.clear().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7494:
-

 Summary: [Java] Remove reader index and writer index from ArrowBuf
 Key: ARROW-7494
 URL: https://issues.apache.org/jira/browse/ARROW-7494
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau


Reader and writer indexes and their functionality don't belong on a chunk of 
memory; they exist only because of inheritance from ByteBuf. As part of 
removing ByteBuf inheritance, we should also remove reader and writer indexes 
from ArrowBuf. They waste heap memory and are rarely useful. In general, a 
slice can be used instead of a reader/writer index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007823#comment-17007823
 ] 

Jacques Nadeau commented on ARROW-4526:
---

As part of this, ArrowBuf should move into the Arrow packages from the Netty 
packages

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pre-1.0
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Labels: pre-1.0  (was: 1.)

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: pre-1.0
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Labels: 1.  (was: )

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>  Labels: 1.
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2020-01-03 Thread Jacques Nadeau (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-4526:
--
Issue Type: Improvement  (was: New Feature)

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations with Netty as one possible option. We should remove hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) 
> method instead to allow access to inner providers without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7342) [Java] offset buffer for vector of variable-width type with zero value count is empty

2019-12-06 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990318#comment-16990318
 ] 

Jacques Nadeau commented on ARROW-7342:
---

I'm not sure if there is a "right" (or at least well-specified) way. The Java 
perspective is empty vectors shouldn't have any data. There is no point in only 
having one offset since it doesn't mean anything. This also means communicating 
an empty vector on the wire is zero data as opposed to having to communicate 4 
bytes of useless data.

> [Java] offset buffer for vector of variable-width type with zero value count 
> is empty
> -
>
> Key: ARROW-7342
> URL: https://issues.apache.org/jira/browse/ARROW-7342
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steve M. Kim
>Priority: Major
>
> I am reporting what I think might be two related bugs in 
> {{org.apache.arrow.vector.BaseVariableWidthVector}}
>  # The offset buffer is initialized as empty. I expect it to have 4 
> bytes that represent the integer zero.
>  # The {{getBufferSize}} method returns 0 when value count is zero, instead 
> of 4.
> Compare to the pyarrow implementation, which I believe correctly populates 
> the offset buffer:
> {code:java}
> >>> import pyarrow as pa
> >>> array = pa.array([], type=pa.binary())
> >>> array 
> []
> >>> print([b.hex().decode() for b in array.buffers()])
> ['', '', '']
> {code}
>  
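
The two positions in this thread can be put side by side in a small sketch. It builds an int32 offset buffer for a list of variable-width values: n values need n + 1 offsets, and the only question is what to emit when n is zero. ToyOffsetBuffers and the padEmpty flag are hypothetical, not part of BaseVariableWidthVector.

```java
import java.nio.ByteBuffer;
import java.util.List;

final class ToyOffsetBuffers {
    /**
     * Builds the offset buffer for `values`.
     * padEmpty = true  : zero values still yield one offset (4 bytes holding 0),
     *                    which is what the reporter expects.
     * padEmpty = false : zero values yield a zero-length buffer, which is the
     *                    Java behavior described in the comment.
     */
    static ByteBuffer buildOffsets(List<byte[]> values, boolean padEmpty) {
        if (values.isEmpty() && !padEmpty) {
            return ByteBuffer.allocate(0);
        }
        ByteBuffer offsets = ByteBuffer.allocate((values.size() + 1) * Integer.BYTES);
        int end = 0;
        offsets.putInt(end);              // first offset is always 0
        for (byte[] value : values) {
            end += value.length;          // running end position in the data buffer
            offsets.putInt(end);
        }
        offsets.flip();                   // make the written bytes readable
        return offsets;
    }
}
```

For non-empty input both modes agree; they differ only by the single 4-byte zero offset in the empty case.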



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7272) [C++][Java] JNI bridge between RecordBatch and VectorSchemaRoot

2019-11-27 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983847#comment-16983847
 ] 

Jacques Nadeau commented on ARROW-7272:
---

bq. The C data interface should be the preferred way to achieve this. It 
requires implementing on the Java side, though.

Should it be? Why not use flatbuffer since that is already supported on both 
sides of the boundary?

> [C++][Java] JNI bridge between RecordBatch and VectorSchemaRoot
> ---
>
> Key: ARROW-7272
> URL: https://issues.apache.org/jira/browse/ARROW-7272
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> Given a C++ std::shared_ptr<RecordBatch>, retrieve it in Java as a 
> VectorSchemaRoot class. Gandiva already offers a similar facility but with raw 
> buffers. It would be convenient if users could call C++ that yields 
> RecordBatch and retrieve it in a seamless fashion.
> This would remove one roadblock of using C++ dataset facility in Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7198) [Java] Allow a user to provide an alternative "chunk" allocator

2019-11-17 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7198:
-

 Summary: [Java] Allow a user to provide an alternative "chunk" 
allocator
 Key: ARROW-7198
 URL: https://issues.apache.org/jira/browse/ARROW-7198
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau


Right now, the Arrow Java libraries have two options:
- Have accounted memory using the Netty allocator.
- Have unaccounted memory using your own allocator.

I'd like to add a third option where you can use the existing accounting but 
decide where the chunks of memory come from.
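
One way to picture the third option is an allocator that keeps the accounting but delegates the actual allocation to a pluggable source. The sketch below is only an illustration under stated assumptions: ToyAccountingAllocator and its chunkSource parameter are made-up names, ByteBuffer stands in for a chunk of memory, and none of this reflects how the Arrow allocator is actually structured.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

final class ToyAccountingAllocator {
    private final long limit;
    private final IntFunction<ByteBuffer> chunkSource;  // user decides where memory comes from
    private long allocated;

    ToyAccountingAllocator(long limit, IntFunction<ByteBuffer> chunkSource) {
        this.limit = limit;
        this.chunkSource = chunkSource;
    }

    /** Accounts for the request first, then asks the pluggable source for the chunk. */
    ByteBuffer allocate(int size) {
        if (allocated + size > limit) {
            throw new IllegalStateException(
                "limit exceeded: " + (allocated + size) + " > " + limit);
        }
        ByteBuffer chunk = chunkSource.apply(size);
        allocated += size;
        return chunk;
    }

    /** Returns the chunk's bytes to the accounting; the source owns actual freeing. */
    void release(ByteBuffer chunk) {
        allocated -= chunk.capacity();
    }

    long getAllocated() { return allocated; }
}
```

A caller could pass ByteBuffer::allocateDirect, ByteBuffer::allocate, or any custom function as the chunk source without changing the accounting logic.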



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types

2019-10-30 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963271#comment-16963271
 ] 

Jacques Nadeau commented on ARROW-7017:
---

Isn't it possible to use the LLVM paths to generate objects at compile time for 
people who don't want runtime compilation (and want to avoid the LLVM 
dependency)? My thought is we have people working on a bunch of expressions in 
Gandiva already so why not make that useful elsewhere rather than having two 
implementations of things like add, cast, etc.

In other words, can we have a single development push even though there are two 
targets (compile time and runtime compilation)?

> [C++] Refactor AddKernel to support other operations and types
> --
>
> Key: ARROW-7017
> URL: https://issues.apache.org/jira/browse/ARROW-7017
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Compute
>Reporter: Francois Saint-Jacques
>Priority: Major
>  Labels: analytics
>
> * Should avoid using builders (and/or NULLs) since the output shape is known 
> at compute time.
>  * Should be refactored to support other operations, e.g. Subtraction, 
> Multiplication.
>  * Should have an overflow, underflow detection mode.
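
As a language-neutral illustration of the first and third bullets, here is a small sketch written in Java rather than C++ (the actual kernel lives in the C++ compute module, and nothing here mirrors its API). ToyAddKernel and the checkOverflow flag are hypothetical. The output array is preallocated because its length is known up front, and an optional mode detects overflow instead of wrapping.

```java
final class ToyAddKernel {
    /**
     * Element-wise int32 addition.
     * checkOverflow = true  : throws ArithmeticException on overflow or underflow.
     * checkOverflow = false : wraps around silently (two's complement).
     */
    static int[] add(int[] left, int[] right, boolean checkOverflow) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] out = new int[left.length];   // shape known up front, so no builder is needed
        for (int i = 0; i < left.length; i++) {
            out[i] = checkOverflow
                ? Math.addExact(left[i], right[i])
                : left[i] + right[i];
        }
        return out;
    }
}
```

Supporting other operations would mean swapping the per-element expression, for example Math.subtractExact or Math.multiplyExact in the checked mode.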



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types

2019-10-28 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961565#comment-16961565
 ] 

Jacques Nadeau commented on ARROW-7017:
---

What's the thinking of building these a second time here as opposed to just 
adding utility methods over Gandiva for specific patterns? My experience is 
that it is very rare that people only need to do a single expression.

> [C++] Refactor AddKernel to support other operations and types
> --
>
> Key: ARROW-7017
> URL: https://issues.apache.org/jira/browse/ARROW-7017
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Compute
>Reporter: Francois Saint-Jacques
>Priority: Major
>  Labels: analytics
>
> * Should avoid using builders (and/or NULLs) since the output shape is known 
> at compute time.
>  * Should be refactored to support other operations, e.g. Subtraction, 
> Multiplication.
>  * Should have an overflow, underflow detection mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-22 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957219#comment-16957219
 ] 

Jacques Nadeau commented on ARROW-6896:
---

{quote}Just wonder why the implementation is different in Java?
{quote}
That is an interesting question. The Java code is focused on multitenant 
pipelines. A core tenet of the Java library was that no batches were allowed to 
have more than 2^16 records. I actually wish that I had held a stronger line on 
this in the original format specification (...spilt milk...). The Java library 
doesn't enforce this but we actually do in Dremio. It is done for several 
reasons: keeping the pipeline moving, minimizing issues of memory 
fragmentation, allows keeping a full batch within cache, etc. One of the key 
needs was also to support a pipelined operation using runtime code generation. 
Having a set of references that you can keep grabbing addresses from as the 
pipeline progresses is what a vector container's purpose is in the Java 
library. VectorSchemaRoot is really a poor man's version of what we call 
VectorContainer 
(https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/record/VectorContainer.java) 
in the Dremio codebase. One other design consideration: the Java library is 
focused on minimizing gc churn and small heap sizes so you can get high 
performance out of a JVM without spending large amounts of time dealing with gc 
(cpu time or human tuning time).

I think the other libraries were not as focused on pipelining and multitenancy 
when initially constructed, thus the differences in implementation. In many 
cases, I think the initial uses in other languages were frequently single 
monolithic operations. In those cases, it might make sense to create a single 
vector to data type rather than a bunch of smaller batches. In that scenario, 
you don't have to worry about fragmentation and once all the data is loaded, 
you can do many operations against the same data. It's a very different pattern 
from a system running many independent operations concurrently.

 

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors; it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-20 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955722#comment-16955722
 ] 

Jacques Nadeau commented on ARROW-6896:
---

I think you're mixing up the design of the library. VectorSchemaRoot != a 
record batch.

VectorSchemaRoot is a container that can hold batches, not a batch in itself. 
Think of it more like a pipe than the water in the pipe. Batches flow through 
vector schema root as part of a pipeline. See how it is used in the flight 
tests to better understand. A single VectorSchemaRoot is created based on a 
known schema and then data is populated over and over into the same schema root 
in a stream of batches. At any one point a VectorSchemaRoot may have data or 
may have no data (say it was transferred downstream or not yet populated).

It's easy to share vectors in Arrow and make mistakes. That's the nature of 
systems programming that does manual reference counting. I'm not exactly 
understanding how your change reduces those dangers in any meaningful way.
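
The "pipe, not water" description can be sketched with a toy model. ToyPipeRoot and ToyPipeDemo are hypothetical and are not Arrow's VectorSchemaRoot. The sketch only shows one fixed-schema container being created once and then repeatedly loaded, consumed and emptied as batches stream through it.

```java
// Toy model, NOT Arrow's VectorSchemaRoot: one container reused for every batch.
final class ToyPipeRoot {
    private final int[] values;   // storage allocated once for the known schema
    private int rowCount;         // 0 means "no data in the pipe right now"

    ToyPipeRoot(int capacity) {
        values = new int[capacity];
    }

    // Populate the same storage with the next batch.
    void load(int[] batch) {
        if (batch.length > values.length) {
            throw new IllegalArgumentException("batch too large");
        }
        System.arraycopy(batch, 0, values, 0, batch.length);
        rowCount = batch.length;
    }

    // Models data having been transferred downstream.
    void clear() { rowCount = 0; }

    int getRowCount() { return rowCount; }

    long sum() {
        long total = 0;
        for (int i = 0; i < rowCount; i++) {
            total += values[i];
        }
        return total;
    }
}

final class ToyPipeDemo {
    // Streams every batch through the same root instead of creating one root per batch.
    static long sumStream(ToyPipeRoot root, int[][] batches) {
        long total = 0;
        for (int[] batch : batches) {
            root.load(batch);
            total += root.sum();
            root.clear();
        }
        return total;
    }
}
```

At any point between clear() and the next load() the root holds no data, which matches the description above of a root that may or may not be populated.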

 

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 





[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-16 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952941#comment-16952941
 ] 

Jacques Nadeau commented on ARROW-6896:
---

I don't understand your comments. Since sharing vectors between 
VectorSchemaRoots is an anti-pattern, why would we try to change code for it?

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 





[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-15 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952513#comment-16952513
 ] 

Jacques Nadeau commented on ARROW-6896:
---

You shouldn't be sharing a vector between two batches. You should have separate 
vectors in each batch that were created as a transfer pair and then transfer 
the data from one batch to the other when you want to move it. 
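
As a rough illustration of why a transfer is safer than sharing, the following 
plain-Java sketch models ownership moving from one vector to another. ToyBuffer 
and ToyVector are hypothetical classes written for this example; they are not 
Arrow's ArrowBuf, ValueVector, or TransferPair, and real Arrow uses reference 
counting rather than a simple flag.

```java
/** Toy buffer with an explicit released flag. Not Arrow's ArrowBuf. */
final class ToyBuffer {
    final int[] data;
    boolean released;

    ToyBuffer(int[] data) {
        this.data = data;
    }
}

/** Toy vector that owns at most one buffer and releases it on close. */
final class ToyVector implements AutoCloseable {
    private ToyBuffer buffer;  // null when this vector owns nothing

    ToyVector() {
    }

    ToyVector(int[] values) {
        this.buffer = new ToyBuffer(values);
    }

    /** Move ownership of the buffer to target; this vector ends up empty. */
    void transferTo(ToyVector target) {
        target.close();              // target drops whatever it held before
        target.buffer = this.buffer; // exactly one owner after the move
        this.buffer = null;
    }

    int get(int index) {
        if (buffer == null || buffer.released) {
            throw new IllegalStateException("no live buffer");
        }
        return buffer.data[index];
    }

    boolean isEmpty() {
        return buffer == null;
    }

    @Override
    public void close() {
        if (buffer != null) {
            buffer.released = true;
            buffer = null;
        }
    }
}
```

After src.transferTo(dst), closing src no longer affects dst, because src owns 
nothing. Had the two vectors simply pointed at the same buffer, closing either 
one would have released the data under the other.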

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 





[jira] [Commented] (ARROW-6896) [Java] Vector schema root should not share vectors

2019-10-15 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952475#comment-16952475
 ] 

Jacques Nadeau commented on ARROW-6896:
---

I disagree with the issue here. We should probably add a better description of 
reference count semantics but having the container close its children makes 
sense. We depend on this functionality quite a bit.

Generally speaking, Vectors are things that shouldn't be handed around, they 
should be transferred, which has clear reference management semantics. The 
design was based on the AttributeSource design pattern 
(https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/util/AttributeSource.html) 
in Lucene where you create an object once and then pass many separate pieces 
of data through it to minimize heap churn and pointer/reference management. I 
think if you're hitting the problem you describe, you're misunderstanding the 
goals of the codebase.

> [Java] Vector schema root should not share vectors
> --
>
> Key: ARROW-6896
> URL: https://issues.apache.org/jira/browse/ARROW-6896
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Vector schema root should not share vectors. Otherwise, unexpected behavior 
> would happen. 
> Please note that VectorSchemaRoot is not just a container for vectors, it is 
> also a resource (it implements the AutoCloseable interface), and it manages 
> the life cycle of its inner vectors.
> When two VectorSchemaRoots share vectors, something unexpected may happen. 
> Consider the following scenario, which is frequently encountered in a SQL 
> engine.
> 1. We create a batch:
> VectorSchemaRoot oldBatch = ...
> 2. We add a vector to it, which results in a new batch
> VectorSchemaRoot newBatch = oldBatch.addVector(vector);
> 3. We are done with the old batch, and release the resource
> oldBatch.close();
> 4. We continue to use the new batch, but get an exception, because some 
> inner vectors have been released by the old batch. 





[jira] [Commented] (ARROW-6509) [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure

2019-09-15 Thread Jacques Nadeau (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930176#comment-16930176
 ] 

Jacques Nadeau commented on ARROW-6509:
---

[~pprudhvi], per discussion offline, can you look to solve this? 

> [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure
> 
>
> Key: ARROW-6509
> URL: https://issues.apache.org/jira/browse/ARROW-6509
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Antoine Pitrou
>Assignee: Prudhvi Porandla
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This seems to happen more or less frequently on the Python - Java build (with 
> jpype enabled).
> See warnings and errors starting from 
> https://travis-ci.org/apache/arrow/jobs/583069089#L6662



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-5957) [C++][Gandiva] Implement div function in Gandiva

2019-07-16 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886423#comment-16886423
 ] 

Jacques Nadeau commented on ARROW-5957:
---

Can you make sure to specify the planned function signature and impl for each 
function in the Jira description?

> [C++][Gandiva] Implement div function in Gandiva
> 
>
> Key: ARROW-5957
> URL: https://issues.apache.org/jira/browse/ARROW-5957
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-5842) [Java] Revise the semantic of lastSet in ListVector

2019-07-04 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878626#comment-16878626
 ] 

Jacques Nadeau commented on ARROW-5842:
---

If you want to propose changing the semantics of anything like this, please 
discuss on the list first. The current code is working well on tens of thousands 
of systems. 

> [Java] Revise the semantic of lastSet in ListVector
> ---
>
> Key: ARROW-5842
> URL: https://issues.apache.org/jira/browse/ARROW-5842
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The {{lastSet}} member in ListVector seems misleading. According to the name, 
> it should refer to the last index that is actually set. However, from the 
> context of the code, it actually means the next index that will be set.
> We fix this problem, and make it consistent with the {{lastSet}} in 
> {{BaseVariableWidthVector}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5821) [Java] Support compact fixed-width vectors

2019-07-02 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877350#comment-16877350
 ] 

Jacques Nadeau commented on ARROW-5821:
---

If I understand the ask correctly, it seems like you are proposing to change 
the format to have a new integer representation that doesn't support 
constant-time lookup. I'm not sure now is the right time to do something like this.

> [Java] Support compact fixed-width vectors
> --
>
> Key: ARROW-5821
> URL: https://issues.apache.org/jira/browse/ARROW-5821
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> In the shuffle stage of some applications, FixedWidthVectors may have very 
> little non-null data.
> In this case, directly serializing vectors is not a good choice. Generally we 
> can compact the vector so that it holds only non-null values and create a 
> BitVector to track the indices of the non-null values so that it can be 
> deserialized properly.
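
The compaction described in the issue, and the lookup concern raised in the 
comment, can be sketched in plain Java as follows. This is only an illustration 
under stated assumptions: CompactInts is a hypothetical class, java.util.BitSet 
stands in for Arrow's BitVector, and nothing here reflects the Arrow format.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy compact representation: a validity bitmap plus only the non-null values. */
final class CompactInts {
    final BitSet validity;   // bit i set means logical slot i is non-null
    final int[] values;      // only the non-null values, in slot order
    final int length;        // logical slot count

    private CompactInts(BitSet validity, int[] values, int length) {
        this.validity = validity;
        this.values = values;
        this.length = length;
    }

    /** Drop the null slots and remember where the non-null ones were. */
    static CompactInts compact(Integer[] slots) {
        BitSet validity = new BitSet(slots.length);
        int[] packed = new int[slots.length];
        int n = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                validity.set(i);
                packed[n++] = slots[i];
            }
        }
        return new CompactInts(validity, Arrays.copyOf(packed, n), slots.length);
    }

    /**
     * Random access needs the rank of slot i (number of set bits before it),
     * so this lookup is linear in i here rather than constant time.
     */
    Integer get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        if (!validity.get(i)) {
            return null;
        }
        int rank = validity.get(0, i).cardinality();
        return values[rank];
    }

    /** Rebuild the original sparse layout, as a deserializer would. */
    Integer[] expand() {
        Integer[] out = new Integer[length];
        int next = 0;
        for (int i = validity.nextSetBit(0); i >= 0; i = validity.nextSetBit(i + 1)) {
            out[i] = values[next++];
        }
        return out;
    }
}
```

The get method shows the trade-off: the packed layout saves space when most 
slots are null, but an index no longer maps directly to an offset.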





[jira] [Commented] (ARROW-5815) [Java] Support swap functionality for fixed-width vectors

2019-07-02 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876959#comment-16876959
 ] 

Jacques Nadeau commented on ARROW-5815:
---

I'm generally opposed to mutating vectors once they've been written at the API 
level so I'm not sure it is a good idea to add this method. It also would be 
weird since it is asymmetric, in that other vector types may not be able to 
support the behavior. 

> [Java] Support swap functionality for fixed-width vectors
> -
>
> Key: ARROW-5815
> URL: https://issues.apache.org/jira/browse/ARROW-5815
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Support swapping data elements for fixed-width vectors.





[jira] [Commented] (ARROW-5063) [Java] FlightClient should not create a child allocator

2019-06-21 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869732#comment-16869732
 ] 

Jacques Nadeau commented on ARROW-5063:
---

I think we're trying to solve this the wrong way. The idea that a client has its 
own allocator that is closed as it closes out to make sure it doesn't leak any 
memory is a good thing.
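
A minimal sketch of the leak check a per-client child allocator gives you, in 
plain Java. ToyAllocator is a hypothetical class for illustration; it is not 
Arrow's BufferAllocator, and it only tracks byte counts rather than real memory.

```java
/** Toy hierarchical allocator that only does accounting. Not Arrow's BufferAllocator. */
final class ToyAllocator implements AutoCloseable {
    private final String name;
    private final ToyAllocator parent;   // null for the root
    private long allocated;              // bytes currently outstanding at this level

    ToyAllocator(String name) {
        this(name, null);
    }

    private ToyAllocator(String name, ToyAllocator parent) {
        this.name = name;
        this.parent = parent;
    }

    ToyAllocator newChild(String childName) {
        return new ToyAllocator(childName, this);
    }

    /** Record an allocation here and in every ancestor. */
    void allocate(long bytes) {
        for (ToyAllocator a = this; a != null; a = a.parent) {
            a.allocated += bytes;
        }
    }

    /** Record a release here and in every ancestor. */
    void release(long bytes) {
        if (bytes > allocated) {
            throw new IllegalStateException("released more than allocated");
        }
        for (ToyAllocator a = this; a != null; a = a.parent) {
            a.allocated -= bytes;
        }
    }

    long allocated() {
        return allocated;
    }

    /** Closing with outstanding bytes is reported as a leak attributed to this allocator. */
    @Override
    public void close() {
        if (allocated != 0) {
            throw new IllegalStateException(name + " leaked " + allocated + " bytes");
        }
    }
}
```

Because the child is closed together with the client, any memory the client 
failed to release is reported against that client instead of surfacing much 
later at the root.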

> [Java] FlightClient should not create a child allocator
> ---
>
> Key: ARROW-5063
> URL: https://issues.apache.org/jira/browse/ARROW-5063
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I ran into a problem when testing out Flight using the ExampleFlightServer 
> with InMemoryStore producer. 
> A client will iterate over endpoints and locations to get the streams, and 
> the example creates a new client for each location. The only way to close the 
> allocator in the FlightClient is to close the FlightClient, which also closes 
> the read channel.  If the location is the same for each FlightStream (as is 
> the case for the InMemoryStore), then it seems like grpc will reuse the 
> channel, so closing one read client will shutdown the channel and the 
> remaining FlightStreams cannot be read.
> If an allocator was created by the owner of the FlightClient, then the client 
> would not need to close it and this problem would be avoided. I believe other 
> Flight classes do not create child allocators either, so this change would be 
> consistent.





[jira] [Comment Edited] (ARROW-5224) [Java] Add APIs for supporting directly serialize/deserialize ValueVector

2019-05-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838260#comment-16838260
 ] 

Jacques Nadeau edited comment on ARROW-5224 at 5/13/19 5:16 AM:


What is the major downside of wrapping in a batch? It seems like we should 
probably just do that and not introduce new APIs & protocols.


was (Author: jnadeau):
What is the major downside of wrapping in a batch? It seems like we should 
probably just do that and not introduce new APIs.

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -
>
> Key: ARROW-5224
> URL: https://issues.apache.org/jira/browse/ARROW-5224
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way 
> to do this is to put a single FieldVector in a VectorSchemaRoot and 
> convert it to an ArrowRecordBatch, and likewise for deserialization. 
> Providing a utility class for this may be better. I know all 
> serialization should follow the IPC format so that data can be shared between 
> different Arrow implementations. But for users who only use the Java API and 
> want to do some further optimization, this seems to be no problem and we could 
> provide them with one more option.
> This may benefit Java users who only use ValueVector rather 
> than IPC classes such as ArrowRecordBatch:
>  * We could do some shuffle optimizations, such as compression and 
> encoding algorithms for numerical types, which could greatly improve performance.
>  * Serialize/deserialize with the actual buffer size within the vector, since 
> the buffer size is a power of 2, which is bigger than really needed.
>  * Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make it 
> user-friendly.
>  





[jira] [Commented] (ARROW-5305) [Java] Refactor null slot verification onto a single method in the parent class

2019-05-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838261#comment-16838261
 ] 

Jacques Nadeau commented on ARROW-5305:
---

Let's verify that this doesn't cause a performance regression via assembly 
inspection.

> [Java] Refactor null slot verification onto a single method in the parent 
> class 
> 
>
> Key: ARROW-5305
> URL: https://issues.apache.org/jira/browse/ARROW-5305
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> After https://github.com/apache/arrow/pull/4288 is checked in there is an 
> opportunity to refactor the code to one place instead of having the same 
> logic across all vector classes.





[jira] [Commented] (ARROW-5224) [Java] Add APIs for supporting directly serialize/deserialize ValueVector

2019-05-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838260#comment-16838260
 ] 

Jacques Nadeau commented on ARROW-5224:
---

What is the major downside of wrapping in a batch? It seems like we should 
probably just do that and not introduce new APIs.

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -
>
> Key: ARROW-5224
> URL: https://issues.apache.org/jira/browse/ARROW-5224
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way 
> to do this is to put a single FieldVector in a VectorSchemaRoot and 
> convert it to an ArrowRecordBatch, and likewise for deserialization. 
> Providing a utility class for this may be better. I know all 
> serialization should follow the IPC format so that data can be shared between 
> different Arrow implementations. But for users who only use the Java API and 
> want to do some further optimization, this seems to be no problem and we could 
> provide them with one more option.
> This may benefit Java users who only use ValueVector rather 
> than IPC classes such as ArrowRecordBatch:
>  * We could do some shuffle optimizations, such as compression and 
> encoding algorithms for numerical types, which could greatly improve performance.
>  * Serialize/deserialize with the actual buffer size within the vector, since 
> the buffer size is a power of 2, which is bigger than really needed.
>  * Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make it 
> user-friendly.
>  





[jira] [Commented] (ARROW-5207) [Java] add APIs to support vector reuse

2019-05-07 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835296#comment-16835296
 ] 

Jacques Nadeau commented on ARROW-5207:
---

Is this an important optimization? Given that we already pool the memory itself 
at the lower layers it seems like the reuse here wouldn't be an impactful 
optimization.

> [Java] add APIs to support vector reuse
> ---
>
> Key: ARROW-5207
> URL: https://issues.apache.org/jira/browse/ARROW-5207
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some scenarios we hope that a ValueVector can be reused to reduce creation 
> overhead. This is very common in the shuffle stage: there is no need to create 
> a ValueVector or realloc buffers every time. Suppose that the recordCount of 
> the ValueVector and the capacity of its buffers are written in the stream; when 
> we deserialize it, we can simply judge whether realloc is needed through 
> dataLength.
> My proposal is to add APIs in ValueVector to handle this logic; otherwise 
> users have to implement it themselves if they want reuse, which is not 
> user-friendly. 
> If you agree with this, I would like to take this ticket. Thanks





[jira] [Commented] (ARROW-5264) [Java] Allow enabling/disabling boundary checking dynamically in the code

2019-05-07 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834699#comment-16834699
 ] 

Jacques Nadeau commented on ARROW-5264:
---

I don't fully understand the problem. If Flink is loading the libraries and 
is leaving these checks on, you probably should avoid turning off bounds 
checking across the board since Flink may want to depend on it. In this case 
you should probably be loading inside a classloader with your own copy of the 
classes, and in that situation you can set the config you wish before loading. 
If Flink isn't using the libraries and you're initializing them, I don't 
understand why you don't have control over initialization order and can't set 
that first.

Anyway, I'm fine with adding an environment variable option to set the static 
final in addition to a system property. I don't understand in what situations 
you have access to one but not the other, but I'm okay with it even then.
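
A minimal sketch of a static final flag that can be set from either a system 
property or an environment variable. Everything named here is an assumption made 
for illustration: ToyBoundsChecking, the property name, the environment variable 
name, and the precedence order are all hypothetical and are not Arrow's actual 
BoundsChecking implementation.

```java
/** Toy flag holder. Names and precedence are hypothetical, not Arrow's BoundsChecking. */
final class ToyBoundsChecking {
    static final String PROPERTY = "toy.enable_bounds_checks";
    static final String ENV_VAR = "TOY_ENABLE_BOUNDS_CHECKS";

    /**
     * Resolved once when the class is initialized. Because the field is static
     * final, the JIT can typically treat it as a constant and drop the dead branch.
     */
    static final boolean BOUNDS_CHECKING_ENABLED =
        resolve(System.getProperty(PROPERTY), System.getenv(ENV_VAR));

    private ToyBoundsChecking() {
    }

    /** System property wins, then the environment variable, then the safe default (enabled). */
    static boolean resolve(String property, String env) {
        if (property != null) {
            return Boolean.parseBoolean(property);
        }
        if (env != null) {
            return Boolean.parseBoolean(env);
        }
        return true;
    }

    /** Example accessor guarded by the flag. */
    static int checkedGet(int[] data, int index) {
        if (BOUNDS_CHECKING_ENABLED && (index < 0 || index >= data.length)) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }
}
```

Either source has to be in place before the class is first initialized, which is 
the initialization-order point made above: once the static final is set, it 
cannot change for the lifetime of that classloader.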

> [Java] Allow enabling/disabling boundary checking dynamically in the code
> -
>
> Key: ARROW-5264
> URL: https://issues.apache.org/jira/browse/ARROW-5264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The flag BoundsChecking#BOUNDS_CHECKING_ENABLED determines if boundary 
> checking is enabled/disabled in vector/arrow buffer APIs. 
> It has significant performance implications, since boundary checking is a 
> frequent operation.
> This issue addresses 2 problems with the flag for boundary checking in Java API:
> 1. This flag is final and initialized in a static block. That means, the only 
> reliable way to override it is in the JVM command line, by providing some 
> system properties. However, for some scenarios, it is difficult or even 
> impossible to get access to the JVM command line. Therefore, it is desirable 
> to provide a way to override it dynamically in the program code. 
> 2. There is an old and a new system property for this flag. To disable 
> boundary checking, both the old and new properties must be set to true, which 
> is undesirable:
>  !screenshot-1.png! 





[jira] [Commented] (ARROW-5207) [Java] add APIs to support vector reuse

2019-05-07 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834451#comment-16834451
 ] 

Jacques Nadeau commented on ARROW-5207:
---

To clarify, are you wanting to reuse the same ArrowBuf within a Vector? Vectors 
are already reusable (with different underlying memory). If so, maybe we should 
introduce this in a different way where it is purpose built, e.g. reuseMemory. 
For example, during reuse, there are some types of memory you need to clear. 
Also, I don't understand why you are setting valuecount and datalength. If 
you're reusing, you need to fit within whatever you have and then use 
setValueCount at the end. The reason we should make this more formal is because 
Vectors are mostly thought about as created, sealed, read. This isn't formally 
enforced by the API but I think we should try to keep the spirit of this in new 
APIs we introduce. If your pattern is different, we should formalize it 
specifically as reuse rather than something more low-level as your current PR 
proposes.

> [Java] add APIs to support vector reuse
> ---
>
> Key: ARROW-5207
> URL: https://issues.apache.org/jira/browse/ARROW-5207
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some scenarios we hope that a ValueVector can be reused to reduce creation 
> overhead. This is very common in the shuffle stage: there is no need to create 
> a ValueVector or realloc buffers every time. Suppose that the recordCount of 
> the ValueVector and the capacity of its buffers are written in the stream; when 
> we deserialize it, we can simply judge whether realloc is needed through 
> dataLength.
> My proposal is to add APIs in ValueVector to handle this logic; otherwise 
> users have to implement it themselves if they want reuse, which is not 
> user-friendly. 
> If you agree with this, I would like to take this ticket. Thanks





[jira] [Commented] (ARROW-5264) [Java] Allow enabling/disabling boundary checking dynamically in the code

2019-05-06 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833818#comment-16833818
 ] 

Jacques Nadeau commented on ARROW-5264:
---

I believe #1 and #2 are entirely reliable and cover most cases. #1 solves for 
most systems. #2 solves for where you're a plugin in another system since as a 
plugin writer, you can control which operations are run and thus which classes 
get loaded when.

> [Java] Allow enabling/disabling boundary checking dynamically in the code
> -
>
> Key: ARROW-5264
> URL: https://issues.apache.org/jira/browse/ARROW-5264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The flag BoundsChecking#BOUNDS_CHECKING_ENABLED determines if boundary 
> checking is enabled/disabled in vector/arrow buffer APIs. 
> It has significant performance implications, since boundary checking is a 
> frequent operation.
> This issue addresses 2 problems with the flag for boundary checking in Java API:
> 1. This flag is final and initialized in a static block. That means, the only 
> reliable way to override it is in the JVM command line, by providing some 
> system properties. However, for some scenarios, it is difficult or even 
> impossible to get access to the JVM command line. Therefore, it is desirable 
> to provide a way to override it dynamically in the program code. 
> 2. There is an old and a new system property for this flag. To disable 
> boundary checking, both the old and new properties must be set to true, which 
> is undesirable:
>  !screenshot-1.png! 





[jira] [Commented] (ARROW-5264) [Java] Allow enabling/disabling boundary checking dynamically in the code

2019-05-06 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833663#comment-16833663
 ] 

Jacques Nadeau commented on ARROW-5264:
---

This was done on purpose to ensure the JVM only has to optimize one version of 
the code, thus increasing the likelihood of JIT optimizations. I don't see a 
good reason to change this without proof of a similar optimization behavior 
with an alternative approach.

> [Java] Allow enabling/disabling boundary checking dynamically in the code
> -
>
> Key: ARROW-5264
> URL: https://issues.apache.org/jira/browse/ARROW-5264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The flag BoundsChecking#BOUNDS_CHECKING_ENABLED determines if boundary 
> checking is enabled/disabled in vector/arrow buffer APIs. 
> It has significant performance implications, since boundary checking is a 
> frequent operation.
> This issue addresses 2 problems with the flag for boundary checking in Java API:
> 1. This flag is final and initialized in a static block. That means, the only 
> reliable way to override it is in the JVM command line, by providing some 
> system properties. However, for some scenarios, it is difficult or even 
> impossible to get access to the JVM command line. Therefore, it is desirable 
> to provide a way to override it dynamically in the program code. 
> 2. There is an old and a new system property for this flag. To disable 
> boundary checking, both the old and new properties must be set to true, which 
> is undesirable:
>  !screenshot-1.png! 





[jira] [Commented] (ARROW-3978) [C++] Implement hashing, dictionary-encoding for StructArray

2019-04-24 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825451#comment-16825451
 ] 

Jacques Nadeau commented on ARROW-3978:
---

Here is some info about what we found worked well. Note that it doesn't go into 
a lot of detail about the pivot algorithm beyond the basic concepts of fixed 
and variable vectors.

[https://docs.google.com/document/d/1Yk6IvDL28IzEjqcqSkFdevRyMrC8_kwzEatHvcOnawM/edit]

 

Main ideas around pivot: 
 * Separate fixed and variable data and keep each contiguous.
 * Coalesce bits for nullability and values together at the start of the data 
structure (saves space, increases the likelihood of an early mismatch).
 * Include the length of the variable data in the fixed container to reduce the 
likelihood of jumping to the variable container.
 * Have specialized cases that look at the actual existence of nulls for each 
word and fork behavior based on that to improve performance of the common case 
where things are mostly null or not null.
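
The first three ideas in the list above can be sketched roughly as follows. This 
is a simplified illustration only, not the Dremio Pivots code linked below: 
ToyPivot, its 12-byte row layout, and the two-column input are assumptions made 
for the example, and the null-specialized fast paths from the last bullet are 
not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy row-wise pivot of one nullable int column and one nullable string column. */
final class ToyPivot {
    /** Per row: 4 bytes of validity bits, 4 bytes int value, 4 bytes variable length. */
    static final int ROW_WIDTH = 12;

    final ByteBuffer fixed;   // one ROW_WIDTH block per row, contiguous
    final byte[] variable;    // all variable-width bytes, contiguous

    private ToyPivot(ByteBuffer fixed, byte[] variable) {
        this.fixed = fixed;
        this.variable = variable;
    }

    static ToyPivot pivot(Integer[] ints, String[] strings) {
        if (ints.length != strings.length) {
            throw new IllegalArgumentException("columns must have the same row count");
        }
        ByteBuffer fixed = ByteBuffer.allocate(ints.length * ROW_WIDTH);
        ByteArrayOutputStream var = new ByteArrayOutputStream();
        for (int r = 0; r < ints.length; r++) {
            int bits = (ints[r] != null ? 1 : 0) | (strings[r] != null ? 2 : 0);
            byte[] s = strings[r] == null
                ? new byte[0]
                : strings[r].getBytes(StandardCharsets.UTF_8);
            fixed.putInt(bits);                          // validity bits first, so mismatches show early
            fixed.putInt(ints[r] != null ? ints[r] : 0); // fixed-width value
            fixed.putInt(s.length);                      // variable length kept in the fixed block
            var.write(s, 0, s.length);
        }
        fixed.flip();
        return new ToyPivot(fixed, var.toByteArray());
    }

    /**
     * Cheap pre-check: true when the fixed parts of two rows are byte-identical.
     * Full key equality would still need to compare the variable bytes.
     */
    boolean fixedEquals(int rowA, int rowB) {
        for (int i = 0; i < ROW_WIDTH; i++) {
            if (fixed.get(rowA * ROW_WIDTH + i) != fixed.get(rowB * ROW_WIDTH + i)) {
                return false;
            }
        }
        return true;
    }
}
```

Rows whose validity bits, fixed values, or variable lengths differ are rejected 
from the fixed block alone, so the variable buffer is only touched when the 
fixed parts already match.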

The latest code for the Arrow pivot algorithms specifically that we use can be 
found here:

Pivots: 
[https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java]

Unpivots: 
[https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Unpivots.java]

Hash Table: 
[https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/LBlockHashTable.java]

We'd be happy to donate this code/algo to the community as it would probably 
serve as a good foundation.

 

Note the doc is probably somewhat out of date with the actual implementation as 
it was written early on in development.

 

> [C++] Implement hashing, dictionary-encoding for StructArray
> 
>
> Key: ARROW-3978
> URL: https://issues.apache.org/jira/browse/ARROW-3978
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This is a central requirement for hash-aggregations such as
> {code}
> SELECT AGG_FUNCTION(expr)
> FROM table
> GROUP BY expr1, expr2, ...
> {code}
> The materialized keys in the GROUP BY section form a struct, which can be 
> incrementally hashed to produce dictionary codes suitable for computing 
> aggregates or any other purpose. 
> There are a few subtasks related to this, such as efficiently constructing a 
> record (that can be hashed quickly) to identify each "row" in the struct. 
> Maybe we should start with that first
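
To make the "struct of keys to dictionary codes" idea in the quoted description concrete, here is a minimal sketch that assigns a dense integer code to each distinct combination of GROUP BY key values. It is written in Java for illustration only (the issue itself targets C++), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assigns a dense integer code to each distinct tuple of key values.
final class StructDictionaryEncoder {
  private final Map<List<Object>, Integer> codes = new HashMap<>();
  private final List<List<Object>> dictionary = new ArrayList<>();

  // Returns the existing code for this key tuple, or assigns the next free one.
  int encode(Object... keyValues) {
    // Copy the array so later changes by the caller cannot corrupt the map key.
    List<Object> key = Arrays.asList(keyValues.clone());
    Integer code = codes.get(key);
    if (code == null) {
      code = dictionary.size();
      codes.put(key, code);
      dictionary.add(key);
    }
    return code;
  }

  // Number of distinct key tuples seen so far.
  int distinctCount() {
    return dictionary.size();
  }
}
```

The returned codes can index per-group aggregate state; a production implementation would hash a compact row representation rather than boxed objects.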



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-1005) [JAVA] NullableDecimalVector.setSafe(int, byte[]...) throws UnsupportedOperationException

2019-04-17 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved ARROW-1005.
---
Resolution: Invalid

I think this is so old and the code has gone through so many iterations that 
whatever it is, let's not worry about it.

> [JAVA] NullableDecimalVector.setSafe(int, byte[]...) throws 
> UnsupportedOperationException
> -
>
> Key: ARROW-1005
> URL: https://issues.apache.org/jira/browse/ARROW-1005
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
>






[jira] [Commented] (ARROW-5062) [Java] Shade Java Guava dependency for Flight

2019-04-03 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809310#comment-16809310
 ] 

Jacques Nadeau commented on ARROW-5062:
---

Hey [~bryanc], we should have a second shaded version as opposed to adding 
guava to the existing one. 

> [Java] Shade Java Guava dependency for Flight
> -
>
> Key: ARROW-5062
> URL: https://issues.apache.org/jira/browse/ARROW-5062
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Guava dependency in the Java Flight module can interfere if using Flight 
> in an application that relies on an older version of Guava.  We can shade the 
> usage in Flight to prevent this.





[jira] [Commented] (ARROW-5062) [Java] Shade Java Guava dependency for Flight

2019-03-28 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804397#comment-16804397
 ] 

Jacques Nadeau commented on ARROW-5062:
---

Let's just add a second artifact with guava shaded as well (rather than replace 
the existing one). I would prefer to avoid doubling the loading/storing of a 
large artifact where we don't need to.

> [Java] Shade Java Guava dependency for Flight
> -
>
> Key: ARROW-5062
> URL: https://issues.apache.org/jira/browse/ARROW-5062
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
>
> The Guava dependency in the Java Flight module can interfere if using Flight 
> in an application that relies on an older version of Guava.  We can shade the 
> usage in Flight to prevent this.





[jira] [Commented] (ARROW-2501) [Java] Remove Jackson from compile-time dependencies for arrow-vector

2019-03-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790993#comment-16790993
 ] 

Jacques Nadeau commented on ARROW-2501:
---

I think the use of Jackson is limited to a very small piece of non-core 
functionality. We should move that functionality out of the vector module to a 
place where people can use it only if they need to (if we even need that 
functionality). Several core classes are labeled with Jackson but they 
shouldn't need to be (Field, Schema, DictionaryEncoding). JsonFileReader and 
Writer need this stuff but that should really be separate from vector since 
that isn't focused on a real use case from my perspective (or maybe even just 
move to tests). 

> [Java] Remove Jackson from compile-time dependencies for arrow-vector
> -
>
> Key: ARROW-2501
> URL: https://issues.apache.org/jira/browse/ARROW-2501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.9.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I would like to upgrade Jackson to the latest version (2.9.5). If there are 
> no objections I will create a PR (it is literally just changing the version 
> number in the pom - no code changes required).





[jira] [Commented] (ARROW-4810) [Format][C++] Add "LargeList" type with 64-bit offsets

2019-03-10 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788994#comment-16788994
 ] 

Jacques Nadeau commented on ARROW-4810:
---

I'm -1 on this (and any format change) unless there are first-class 
implementations in both Java and C++. I think it is important to avoid having 
features that are only in one of our reference implementations.

> [Format][C++] Add "LargeList" type with 64-bit offsets
> --
>
> Key: ARROW-4810
> URL: https://issues.apache.org/jira/browse/ARROW-4810
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentioned in https://github.com/apache/arrow/issues/3845





[jira] [Commented] (ARROW-4651) [Format] Flight Location should be more flexible than a (host, port) pair

2019-02-26 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778191#comment-16778191
 ] 

Jacques Nadeau commented on ARROW-4651:
---

Yes, there are many libraries for parsing a URI. I think we should start by 
supporting protocols that we have reference implementations for.

> [Format] Flight Location should be more flexible than a (host, port) pair
> -
>
> Key: ARROW-4651
> URL: https://issues.apache.org/jira/browse/ARROW-4651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Format
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> The more future-proof solution is probably to define a URI format. gRPC 
> already has something like that, though we might want to define our own 
> format:
> https://grpc.io/grpc/cpp/md_doc_naming.html





[jira] [Created] (ARROW-4669) [Java] No Bounds checking on ArrowBuf.slice

2019-02-23 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-4669:
-

 Summary: [Java] No Bounds checking on ArrowBuf.slice
 Key: ARROW-4669
 URL: https://issues.apache.org/jira/browse/ARROW-4669
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jacques Nadeau


While reviewing some code I realized that there is no bounds checking on 
ArrowBuf slicing. Example negative test case that should pass but is currently 
failing can be found here: 

[https://gist.github.com/jacques-n/737c26b7016ed29dc710d4aba617340e]

It may be that this doesn't cause more problems because the index checks do 
exist on memory access but fixing this would make it much easier to understand 
where a code mistake was made.
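
A minimal sketch of the kind of check being asked for, using a hypothetical stand-in class rather than the real ArrowBuf (whose fields and method signatures are not reproduced here):

```java
// Hypothetical stand-in for a buffer view; not the real ArrowBuf class.
final class BoundedBuf {
  final long offset;   // start of this view within the underlying memory
  final long capacity; // number of bytes visible through this view

  BoundedBuf(long offset, long capacity) {
    this.offset = offset;
    this.capacity = capacity;
  }

  // Returns a sub-view, failing at slice time instead of at first memory access.
  BoundedBuf slice(long index, long length) {
    // Written as "length > capacity - index" so that index + length cannot overflow.
    if (index < 0 || length < 0 || index > capacity || length > capacity - index) {
      throw new IndexOutOfBoundsException(
          String.format("slice(index=%d, length=%d) is outside capacity %d",
              index, length, capacity));
    }
    return new BoundedBuf(offset + index, length);
  }
}
```

Failing in slice() points the stack trace at the call that computed the bad range, which is the debugging benefit described above.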





[jira] [Commented] (ARROW-4651) [Format] Flight Location should be more flexible than a (host, port) pair

2019-02-21 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774323#comment-16774323
 ] 

Jacques Nadeau commented on ARROW-4651:
---

Inflexible and opinionated can be good when defining a format. Flexibility 
means that implementations don't work with each other. (There are several 
places where we already have that problem across our bindings :(.)

I'm all for adding flexibility for real things we want to support assuming as 
part of that we're including support for those items in at least the C++ and 
Java libraries.
 * If you're arguing to change the protocol to a string field and define a 
formal URI scheme that only supports host + port right now, I'd be in support 
of that.
 * If you want to extend that to add support for unix domain sockets and the 
supporting impls, that sounds good as well.
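
As a sketch of what a string location with a deliberately narrow URI scheme could look like, the example below accepts only two schemes and rejects everything else. The scheme names "grpc+tcp" and "grpc+unix" and the class name are assumptions made for illustration; nothing in this thread defines them.

```java
import java.net.URI;

// Hypothetical parsed form of a Flight location string.
final class LocationSketch {
  final String scheme;
  final String host; // null for unix domain sockets
  final int port;    // -1 for unix domain sockets
  final String path; // null for host/port locations

  private LocationSketch(String scheme, String host, int port, String path) {
    this.scheme = scheme;
    this.host = host;
    this.port = port;
    this.path = path;
  }

  // Parses a location, accepting only the schemes this sketch knows about.
  static LocationSketch parse(String location) {
    URI uri = URI.create(location);
    String scheme = uri.getScheme();
    if ("grpc+tcp".equals(scheme)) {
      if (uri.getHost() == null || uri.getPort() < 0) {
        throw new IllegalArgumentException("host and port are required: " + location);
      }
      return new LocationSketch(scheme, uri.getHost(), uri.getPort(), null);
    }
    if ("grpc+unix".equals(scheme)) {
      if (uri.getPath() == null || uri.getPath().isEmpty()) {
        throw new IllegalArgumentException("socket path is required: " + location);
      }
      return new LocationSketch(scheme, null, -1, uri.getPath());
    }
    // Staying strict keeps implementations interoperable.
    throw new IllegalArgumentException("unsupported scheme: " + scheme);
  }
}
```

Rejecting unknown schemes outright reflects the "inflexible and opinionated" stance argued for above: a location is either understood by every implementation or refused.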

> [Format] Flight Location should be more flexible than a (host, port) pair
> -
>
> Key: ARROW-4651
> URL: https://issues.apache.org/jira/browse/ARROW-4651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Format
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> The more future-proof solution is probably to define a URI format. gRPC 
> already has something like that, though we might want to define our own 
> format:
> https://grpc.io/grpc/cpp/md_doc_naming.html





[jira] [Commented] (ARROW-4651) [Format] Flight Location should be more flexible than a (host, port) pair

2019-02-21 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774301#comment-16774301
 ] 

Jacques Nadeau commented on ARROW-4651:
---

Generally, I think it is best to avoid adding stuff that we "might" use. I'd 
also like us to introduce features with reference implementations. Are there 
specific features someone is building that are blocked by this?

> [Format] Flight Location should be more flexible than a (host, port) pair
> -
>
> Key: ARROW-4651
> URL: https://issues.apache.org/jira/browse/ARROW-4651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Format
>Affects Versions: 0.12.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> The more future-proof solution is probably to define a URI format. gRPC 
> already has something like that, though we might want to define our own 
> format:
> https://grpc.io/grpc/cpp/md_doc_naming.html





[jira] [Commented] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2019-02-18 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771221#comment-16771221
 ] 

Jacques Nadeau commented on ARROW-3191:
---

Hey [~siddteotia], any progress on this? It would be great to be able to start 
using the Memory abstraction for several things. For example, being able to put 
an ArrowBuf in an ArrowBuf while overloading the release() semantics becomes 
possible.

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.13.0
>
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).
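
The class quoted above is given only as "an interface similar to"; as written it would not compile, because the final fields are never assigned. A compilable variant might look like the sketch below. The constructor, the ExternalMemory subclass and its release callback are assumptions added for illustration, not part of the proposal.

```java
// Compilable variant of the Memory class from the issue description.
abstract class Memory {
  protected final int length;
  protected final long address;

  protected Memory(long address, int length) {
    this.address = address;
    this.length = length;
  }

  protected abstract void release();
}

// Wraps off-heap memory that no Arrow allocator owns. There is no accounting
// here; the caller supplies whatever cleanup the memory needs.
final class ExternalMemory extends Memory {
  private final Runnable onRelease;
  private boolean released;

  ExternalMemory(long address, int length, Runnable onRelease) {
    super(address, length);
    this.onRelease = onRelease;
  }

  @Override
  protected void release() {
    // Guard against double release so the cleanup runs at most once.
    if (!released) {
      released = true;
      onRelease.run();
    }
  }

  boolean isReleased() {
    return released;
  }
}
```

A caller that has mapped memory itself would pass its own unmap or free routine as the callback.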





[jira] [Created] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2019-02-10 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-4526:
-

 Summary: [Java] Remove Netty references from ArrowBuf and move 
Allocator out of vector package
 Key: ARROW-4526
 URL: https://issues.apache.org/jira/browse/ARROW-4526
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


Arrow currently has a hard dependency on Netty and exposes this in public APIs. 
This shouldn't be the case. There could be many allocator implementations with 
Netty as one possible option. We should remove the hard dependency between 
arrow-vector and Netty, instead creating a trivial allocator. ArrowBuf should 
probably expose a <T> T unwrap(Class<T> clazz) method instead, so that inner 
providers remain available without a hard reference. This should also include 
drastically reducing the number of methods on ArrowBuf, as right now it includes 
every method from ByteBuf but many of those are not very useful or appropriate.
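
A minimal sketch of the unwrap idea, assuming a hypothetical BufSketch class that holds its provider-specific object as a plain Object so that no provider type appears in the public API:

```java
// Hypothetical buffer wrapper; not the real ArrowBuf.
final class BufSketch {
  // Provider-specific backing object (for example a Netty buffer). Typed as
  // Object so this class has no compile-time dependency on the provider.
  private final Object inner;

  BufSketch(Object inner) {
    this.inner = inner;
  }

  // Gives callers access to the backing object only if they name its type.
  <T> T unwrap(Class<T> clazz) {
    if (clazz.isInstance(inner)) {
      return clazz.cast(inner);
    }
    throw new UnsupportedOperationException("Cannot unwrap to " + clazz.getName());
  }
}
```

Code that needs the provider type asks for it explicitly, for example `buf.unwrap(java.nio.ByteBuffer.class)` when the backing object is a ByteBuffer; everything else compiles without the provider on the classpath.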

This work should come after we do the simpler ARROW-3191





[jira] [Commented] (ARROW-4298) [Java] Building Flight fails with OpenJDK 11

2019-01-19 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747203#comment-16747203
 ] 

Jacques Nadeau commented on ARROW-4298:
---

I'm actually hopping on a plane momentarily and will be unavailable for the 
next week. [~laurentgo], can you take a look? 

> [Java] Building Flight fails with OpenJDK 11
> 
>
> Key: ARROW-4298
> URL: https://issues.apache.org/jira/browse/ARROW-4298
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: FlightRPC, Java
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Priority: Major
>
> Building flight fails with
> {code:java}
> [INFO] --- maven-compiler-plugin:3.6.2:compile (default-compile) @ 
> arrow-flight ---
> [INFO] Compiling 39 source files to 
> /Users/uwe/Development/arrow-repos-1/arrow/java/flight/target/classes
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /Users/uwe/Development/arrow-repos-1/arrow/java/flight/target/generated-sources/protobuf/org/apache/arrow/flight/impl/FlightServiceGrpc.java:[26,17]
>  error: cannot find symbol
> symbol: class Generated
> location: package javax.annotation
> [INFO] 1 error{code}
> To fix this, I added the following dependency to {{flight/pom.xml}}:
> {code:java}
> <dependency>
>   <groupId>javax.annotation</groupId>
>   <artifactId>javax.annotation-api</artifactId>
>   <version>1.3.2</version>
> </dependency>{code}
> This then passed the compile step but failed later with:
> {code:java}
> [INFO] --- maven-dependency-plugin:3.0.1:analyze-only (analyze) @ 
> arrow-flight ---
> [WARNING] Unused declared dependencies found:
> [WARNING] javax.annotation:javax.annotation-api:jar:1.3.2:compile
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) 
> on project arrow-flight: Dependency problems found -> [Help 1]{code}





[jira] [Commented] (ARROW-4213) [Flight] C++ and Java implementations are incompatible

2019-01-09 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738694#comment-16738694
 ] 

Jacques Nadeau commented on ARROW-4213:
---

I'm not sure on DoGet. I can imagine situations where it should be unnecessary 
to re-request a ticket but possibly a schema has changed since the ticket was 
generated. I can be convinced not to include the schema, but it also seems nice 
to include. In part this is nice because the node that got the flight info 
message may not be the same one that ultimately consumes the ticket. In those 
cases, being able to give someone a ticket, and having that alone be enough to 
reconstruct a stream of arrow records, seems desirable: basically, making a 
flight stream self-describing. I think of it as similar to a Parquet or an Avro 
file, where the files in a directory may all be the same but it is still nice to 
have each file describe itself.

With regards to the IPC message, I mostly disagree with [~wesmckinn]'s 
perspective. The Flatbuffers definition of schema is a public serialization, so 
from the format perspective it seems weird to add an arbitrary envelope. Some 
of the codebases may have chosen to hide that structure, but I think those are 
code construction/style choices as opposed to something that the format 
instructs/defines.

> [Flight] C++ and Java implementations are incompatible
> --
>
> Key: ARROW-4213
> URL: https://issues.apache.org/jira/browse/ARROW-4213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC
>Reporter: David Li
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> A C++ client cannot request streams from a Java service, nor can it decode 
> the schema from GetFlightInfo.
> Schema: in Java, GetFlightInfo encodes the schema directly via flatbuffers. 
> C++ expects it to be encoded as an IPC message. This isn't a problem in Java 
> as a method exists to decode such schemas, but in C++ the API for reading 
> such a schema isn't really exposed. I'm willing to submit a patch for this, 
> but it's not clear to me which scheme is preferred.
> Streams: in Java, DoGet starts with an ArrowMessage containing a schema. C++ 
> does not expect this and segfaults when it tries to decode the message as a 
> record batch. Based on the presentations I've seen, I think C++ is in the 
> wrong here; I have a patch to fix this that I could clean up and submit.





[jira] [Comment Edited] (ARROW-4003) [Gandiva][Java] Safeguard jvm before loading the gandiva library

2018-12-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718648#comment-16718648
 ] 

Jacques Nadeau edited comment on ARROW-4003 at 12/12/18 9:06 AM:
-

This seems like overkill. I'm not aware of any other native library packaged 
for Java that does this. Why do you think this is necessary?


was (Author: jnadeau):
This seems like overkill. I'm not aware of any other native library packaged 
for Java stuff that does this. Why do you think is necessary?

> [Gandiva][Java] Safeguard jvm before loading the gandiva library
> 
>
> Key: ARROW-4003
> URL: https://issues.apache.org/jira/browse/ARROW-4003
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>
> Today we load the gandiva library always when trying to use the jni bridge, 
> but we have run into issues causing the jvm to crash in untested paths.
> The proposal is to load the library in a separate process first and, if it 
> works, only then load it in the current process.
> This would be done only once at startup/first load.
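
For readers weighing the proposal quoted above, this is roughly what "load it in a separate process first" could involve. It is a sketch under stated assumptions: the probe main class (a tiny program that loads the native library and exits 0) is hypothetical and not shown, and the class and method names are invented for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class LibraryProbe {
  // Builds a command that runs the given probe class in a fresh JVM
  // taken from the same Java installation as the current process.
  static List<String> probeCommand(String probeMainClass, String classPath) {
    String javaBin = System.getProperty("java.home")
        + File.separator + "bin" + File.separator + "java";
    return Arrays.asList(javaBin, "-cp", classPath, probeMainClass);
  }

  // Returns true only if the child JVM exits with status 0 within the timeout.
  // A native crash in the child shows up as a non-zero exit status here
  // instead of taking down the calling JVM.
  static boolean loadsCleanly(List<String> command, long timeoutSeconds)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).inheritIO().start();
    if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
      process.destroyForcibly();
      return false;
    }
    return process.exitValue() == 0;
  }
}
```

The cost is a JVM startup on first load, which is part of why the comment above questions whether the safeguard is worth it.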





[jira] [Commented] (ARROW-4003) [Gandiva][Java] Safeguard jvm before loading the gandiva library

2018-12-12 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718648#comment-16718648
 ] 

Jacques Nadeau commented on ARROW-4003:
---

This seems like overkill. I'm not aware of any other native library packaged 
for Java stuff that does this. Why do you think this is necessary?

> [Gandiva][Java] Safeguard jvm before loading the gandiva library
> 
>
> Key: ARROW-4003
> URL: https://issues.apache.org/jira/browse/ARROW-4003
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>
> Today we load the gandiva library always when trying to use the jni bridge, 
> but we have run into issues causing the jvm to crash in untested paths.
> The proposal is to load the library in a separate process first and, if it 
> works, only then load it in the current process.
> This would be done only once at startup/first load.





[jira] [Created] (ARROW-3887) [Java][Gandiva] Expose Dremio build and tests as new optional container/test

2018-11-26 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-3887:
-

 Summary: [Java][Gandiva] Expose Dremio build and tests as new 
optional container/test
 Key: ARROW-3887
 URL: https://issues.apache.org/jira/browse/ARROW-3887
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau
Assignee: Praveen Kumar Desabandu


Dremio uses Arrow Java and Gandiva extensively and could provide additional 
test coverage for the project. We should find a way to expose the downstream 
build of Dremio as an optional build so major changes can better be evaluated 
against downstream effects.

 

[~praveenbingo], assigning to you for now but let's figure out who at Dremio 
can pick this up.





[jira] [Commented] (ARROW-3701) [Gandiva] Add support for decimal operations

2018-11-05 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675804#comment-16675804
 ] 

Jacques Nadeau commented on ARROW-3701:
---

Thanks for the explanations/pointers Antoine.

This suggests to me that we need to look more closely at when we use C++ as a 
basis for IR. In cases like decimal operations it seems like it might be better 
to write operations directly in LLVM IR, instead of using front-end 
translation, since it seems like most of the operations are primitive. Thoughts?
{quote}only the most common ones (8, 16, 32, 64) will be generally available
{quote}
Is this documented somewhere?

We may have to pick an optimal path for the common Linux/modern CPU case and 
then a lowest common denominator fallback :(

 

 

> [Gandiva] Add support for decimal operations
> 
>
> Key: ARROW-3701
> URL: https://issues.apache.org/jira/browse/ARROW-3701
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>
> To begin with, will add support for 128-bit decimals. There are two parts :
>  # llvm_generator needs to understand decimal types (value, precision, scale)
>  # code decimal operations : add/subtract/multiply/divide/mod/..
>  ** This will be c++ code that can be pre-compiled to emit IR code





[jira] [Commented] (ARROW-3701) [Gandiva] Add support for decimal operations

2018-11-05 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675728#comment-16675728
 ] 

Jacques Nadeau commented on ARROW-3701:
---

{quote}However, we'll need to keep a standards-compliant fallback path with two 
int64 for compilers which don't implement a 128-bit integer.
{quote}
I'm confused by this comment. Aren't these functions going to be compiled into 
LLVM IR at build time, such that the target platform needs to be supported by 
LLVM (not the compiling platform)? In that case, wouldn't we be able to stop 
worrying about compilers on different platforms for these operations? We could 
just generate the IR on one platform, right? (Assuming that we can use pure 
IR, which I believe is the most optimal pattern.)

From an IR perspective, we probably want to map down to LLVM's int128 
operations since LLVM has that support within its IR and it allows future 
optimizations to be clean (and LLVM to target/compile as appropriate), right?

It seems like we should have a very clear delineation in Gandiva between code 
that is compiled to IR versus code that is compiled for execution (I think 
Impala uses -ir.cc to identify the former).
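
For context on the "two int64" fallback mentioned in the quoted comment, here is a minimal sketch of 128-bit addition on a (high, low) pair of 64-bit values. It is written in Java purely for illustration (the actual Gandiva code would be C++ or LLVM IR), and the class name is hypothetical.

```java
// A 128-bit two's-complement integer held as two 64-bit halves.
final class Int128Sketch {
  final long hi; // signed high 64 bits
  final long lo; // low 64 bits, interpreted as unsigned

  Int128Sketch(long hi, long lo) {
    this.hi = hi;
    this.lo = lo;
  }

  // Adds two values, wrapping on 128-bit overflow.
  static Int128Sketch add(Int128Sketch a, Int128Sketch b) {
    long lo = a.lo + b.lo;
    // If the unsigned sum is smaller than an operand, the low half wrapped
    // around and a carry must go into the high half.
    long carry = Long.compareUnsigned(lo, a.lo) < 0 ? 1 : 0;
    long hi = a.hi + b.hi + carry;
    return new Int128Sketch(hi, lo);
  }
}
```

With a native int128 type in LLVM IR this whole function collapses to a single add instruction, which is the appeal of mapping to LLVM's own 128-bit operations.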

> [Gandiva] Add support for decimal operations
> 
>
> Key: ARROW-3701
> URL: https://issues.apache.org/jira/browse/ARROW-3701
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>
> To begin with, will add support for 128-bit decimals. There are two parts :
>  # llvm_generator needs to understand decimal types (value, precision, scale)
>  # code decimal operations : add/subtract/multiply/divide/mod/..
>  ** This will be c++ code that can be pre-compiled to emit IR code





[jira] [Commented] (ARROW-3443) [Java] Flight reports memory leaks in TestBasicOperation

2018-10-04 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638841#comment-16638841
 ] 

Jacques Nadeau commented on ARROW-3443:
---

Sorry I haven't gotten to this yet. We can mark the test Ignore so it stops 
causing noise. We probably need to enhance the test to report more information 
on the leak than it is doing now. Not sure anything else would help debugging.

> [Java] Flight reports memory leaks in TestBasicOperation
> 
>
> Key: ARROW-3443
> URL: https://issues.apache.org/jira/browse/ARROW-3443
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.11.0
>
>
> While running the release verification scripts on Ubuntu 16.04, I get the 
> following error in one of the flight tests:
> {code}
> [INFO] Running org.apache.arrow.flight.TestBasicOperation
> 63 6F 6F 6C 20 74 68 69 6E 67
> get
> put
> hello
> world
> 63 6F 6F 6C 20 74 68 69 6E 67
> [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.131 
> s - in org.apache.arrow.flight.TestBasicOperation
> [INFO] Running org.apache.arrow.flight.example.TestExampleServer
> Starting server.
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.234 
> s <<< FAILURE! - in org.apache.arrow.flight.example.TestExampleServer
> [ERROR] putStream(org.apache.arrow.flight.example.TestExampleServer)  Time 
> elapsed: 0.222 s  <<< ERROR!
> java.lang.IllegalStateException:
> Memory was leaked by query. Memory leaked: (66)
> Allocator(flight-server) 0/66/134/9223372036854775807 (res/actual/peak/limit)
> at 
> org.apache.arrow.flight.example.TestExampleServer.after(TestExampleServer.java:66)
> [INFO] Running org.apache.arrow.flight.perf.TestPerf
> Transferred 1 records totaling 32 bytes at 87,592919 mb/s. 
> 2870244,784388 record/s. 700,971181 batch/s.
> Transferred 1 records totaling 32 bytes at 121,977665 mb/s. 
> 3996964,136267 record/s. 976,138581 batch/s.
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 59.966 s <<< FAILURE! - in org.apache.arrow.flight.perf.TestPerf
> [ERROR] throughput(org.apache.arrow.flight.perf.TestPerf)  Time elapsed: 
> 59.964 s  <<< ERROR!
> java.lang.IllegalStateException:
> Memory was leaked by query. Memory leaked: (133120)
> Allocator(perf-server) 0/133120/267264/9223372036854775807 
> (res/actual/peak/limit)
> at org.apache.arrow.flight.perf.TestPerf.throughput(TestPerf.java:112)
> {code}





[jira] [Commented] (ARROW-3328) [Flight] Allow for optional unique flight identifier to be sent with FlightGetInfo

2018-09-25 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628257#comment-16628257
 ] 

Jacques Nadeau commented on ARROW-3328:
---

My inclination is if you want to see specific metrics, you could do that with a 
ticket (which should(could?) be single use).

> [Flight] Allow for optional unique flight identifier to be sent with 
> FlightGetInfo
> --
>
> Key: ARROW-3328
> URL: https://issues.apache.org/jira/browse/ARROW-3328
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> There could either be
> * A global identifier for the entire flight
> * Endpoint-specific identifiers
> A client could use these unique identifiers to perform other kinds of actions. 
> An example would be retrieving logs or statistics about a get -- you could 
> see time spent writing the dataset to gRPC or time spent constructing the 
> dataset before handing off to the gRPC write layer





[jira] [Commented] (ARROW-2501) [Java] Upgrade Jackson to 2.9.5

2018-09-15 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616516#comment-16616516
 ] 

Jacques Nadeau commented on ARROW-2501:
---

I think we should change the jira to be removing Jackson from the compile time 
dependencies of arrow vector.

> [Java] Upgrade Jackson to 2.9.5
> ---
>
> Key: ARROW-2501
> URL: https://issues.apache.org/jira/browse/ARROW-2501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.9.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.11.0
>
>
> I would like to upgrade Jackson to the latest version (2.9.5). If there are 
> no objections I will create a PR (it is literally just changing the version 
> number in the pom - no code changes required).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{code}
public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release();
}
{code}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
 {{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
 {{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
>  {{  protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
{{  protected final int length;}}
{{  protected final long address;}}
{{   protected abstract void release(); }}
{{ }}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
> {{protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{  protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
 {{  protected final int length;}}
 {{  protected final long address;}}
{{protected abstract void release();}}
{{}}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
>  {{  protected final int length;}}
>  {{  protected final long address;}}
> {{  protected abstract void release();}}
> {{}}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-3191:
--
Description: 
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

{{public abstract class Memory  {}}
{{  protected final int length;}}
{{  protected final long address;}}
{{   protected abstract void release(); }}
{{ }}}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).

  was:
Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release(); 
}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).


> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {{public abstract class Memory  {}}
> {{  protected final int length;}}
> {{  protected final long address;}}
> {{   protected abstract void release(); }}
> {{ }}}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Created] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-3191:
-

 Summary: [Java] Add support for ArrowBuf to point to arbitrary 
memory.
 Key: ARROW-3191
 URL: https://issues.apache.org/jira/browse/ARROW-3191
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release(); 
}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).
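
A minimal sketch of the proposal above, in plain Java with no Arrow dependency. The constructor on Memory, the ExternalMemory subclass, and the SketchBuf stand-in are all hypothetical names added for illustration; they are not existing Arrow classes, and the real ArrowBuf and ledger code may be structured quite differently.

```java
// Hypothetical sketch: none of these types are actual Arrow classes.
abstract class Memory {
  protected final int length;    // size of the region in bytes
  protected final long address;  // raw off-heap address of the region

  // Constructor added so the final fields can be assigned (assumption).
  protected Memory(long address, int length) {
    this.address = address;
    this.length = length;
  }

  // Invoked when the wrapping buffer no longer needs the region.
  protected abstract void release();
}

// A region owned by some external system; release() only records the call.
final class ExternalMemory extends Memory {
  boolean released = false;

  ExternalMemory(long address, int length) {
    super(address, length);
  }

  @Override
  protected void release() {
    released = true; // a real implementation might unmap or free here
  }
}

// Stand-in for a buffer that may or may not take part in allocator accounting.
final class SketchBuf {
  private final Memory external;  // null when the buffer is allocator-managed
  private long accountedBytes;    // bytes charged to an imaginary allocator

  private SketchBuf(Memory external, long accountedBytes) {
    this.external = external;
    this.accountedBytes = accountedBytes;
  }

  static SketchBuf wrap(Memory memory) {
    return new SketchBuf(memory, 0L);
  }

  static SketchBuf managed(long bytes) {
    return new SketchBuf(null, bytes);
  }

  // Slow path: the conditional lives here rather than in get/set fast paths.
  // Returns the number of accounted bytes that changed owner.
  long transferOwnership() {
    if (external != null) {
      return 0L; // accounting is a no-op for arbitrary memory
    }
    long moved = accountedBytes;
    accountedBytes = 0L;
    return moved;
  }

  void close() {
    if (external != null) {
      external.release();
    }
  }
}
```

The point of the sketch is only where the branch sits: transfer and close pay for one extra check, while reads and writes against the address would not need to know which kind of memory backs the buffer.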





[jira] [Commented] (ARROW-2892) [Plasma] Implement interface to get Java arrow objects from Plasma

2018-08-07 Thread Jacques Nadeau (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571825#comment-16571825
 ] 

Jacques Nadeau commented on ARROW-2892:
---

Arrow Java manages all memory using ArrowBuf. This would probably require:
 * Enhancing the BufferAllocator and ArrowBuf to support wrapping an existing 
allocated slice of memory.
 * Creating an interface to expose get/put of ArrowRecordBatch.
 * Some way of viewing stream information, etc.

 

> [Plasma] Implement interface to get Java arrow objects from Plasma
> --
>
> Key: ARROW-2892
> URL: https://issues.apache.org/jira/browse/ARROW-2892
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently we have a low level interface to access bytes stored in plasma from 
> Java, using the JNI: [https://github.com/apache/arrow/pull/2065/]
>  
> As a followup, we should implement reading (and writing) Java arrow objects 
> from plasma, if possible using zero-copy.
>  





[jira] [Commented] (ARROW-2517) [Java] Add list writer

2018-05-05 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464827#comment-16464827
 ] 

Jacques Nadeau commented on ARROW-2517:
---

The big question here, from my perspective, is how we want to represent this, 
since you could have mixed decimals, for example decimal(30,2) and 
decimal(20,4). This means the union concept needs to support multiple 
scales/precisions somehow.
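
To make the mixed-decimal concern concrete, here is a small sketch that uses only java.math.BigDecimal. The class and method names are hypothetical and nothing here is an Arrow API; it just shows that values arriving in one list can each carry a different precision and scale.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Arrow.
final class MixedDecimalSketch {

  // Formats a value's type using the precision and scale BigDecimal reports.
  static String describe(BigDecimal value) {
    return "decimal(" + value.precision() + "," + value.scale() + ")";
  }

  // Returns true only if every value has the same scale as the first one.
  static boolean hasUniformScale(List<BigDecimal> values) {
    int expected = values.get(0).scale();
    for (BigDecimal value : values) {
      if (value.scale() != expected) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<BigDecimal> values =
        Arrays.asList(new BigDecimal("12345.67"), new BigDecimal("1.2345"));
    for (BigDecimal value : values) {
      System.out.println(value + " -> " + describe(value));
    }
    System.out.println("uniform scale: " + hasUniformScale(values));
  }
}
```

A writer that infers scale from the first value would have to either rescale later values (for example with BigDecimal.setScale) or reject them, which is the representation question raised above.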

> [Java] Add list writer
> ---
>
> Key: ARROW-2517
> URL: https://issues.apache.org/jira/browse/ARROW-2517
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>
> Apache Arrow has a writer interface for lists of decimals, but no 
> implementation. The new writer will follow the current interface and will 
> infer the scale from BigDecimal or DecimalHolder where it can.





[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448502#comment-16448502
 ] 

Jacques Nadeau commented on ARROW-2498:
---

Sounds good to me. Let's flip the switch :)

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Commented] (ARROW-1935) Download page must not link to snapshots / nightly builds

2017-12-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296105#comment-16296105
 ] 

Jacques Nadeau commented on ARROW-1935:
---

Fixed. Will close jira once the source for the site is merged into the arrow 
repo. (The generated pages have already been committed to the serving website.)

> Download page must not link to snapshots / nightly builds
> -
>
> Key: ARROW-1935
> URL: https://issues.apache.org/jira/browse/ARROW-1935
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: http://arrow.apache.org/install/
>Reporter: Sebb
>
> Nightly builds / snapshots which are not formal releases must not be linked 
> from the main download page.
> Such builds have not been voted on and should only be used by project 
> developers who should be made aware that the code is without any guarantees.
> Nightly builds are not formal ASF releases, and must not be promoted to the 
> general public.
> See [1] second para. The second sentence states:
> "Do not include any links on the project website that might encourage 
> non-developers to download and use nightly builds, snapshots, release 
> candidates, or any other similar package."
> [1] http://www.apache.org/dev/release.html#what



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1936) Broken links to signatures/hashes etc

2017-12-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296103#comment-16296103
 ] 

Jacques Nadeau commented on ARROW-1936:
---

I fixed the hash links. Still need to fix keys/add verification info.

> Broken links to signatures/hashes etc
> -
>
> Key: ARROW-1936
> URL: https://issues.apache.org/jira/browse/ARROW-1936
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: http://arrow.apache.org/install/
>Reporter: Sebb
>
> Links to KEYS, sigs and hashes must use the ASF host, not the mirrors, as 
> such files are deliberately not mirrored.
> i.e. use
> http://www.apache.org/dist/arrow/KEYS
> https://www.apache.org/dist/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.sha512
> etc.
> The download page needs to include a link to the KEYS file (the asc file is 
> no use without it) and should provide details of how to check sigs and 
> hashes, for example:
> https://www.apache.org/dyn/closer.cgi#verify
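
As an illustration of the hash half of that verification (signature checking against KEYS is not covered), here is a minimal Java sketch. The class name and the assumption that a .sha512 file starts with the hex digest, optionally followed by a file name, are mine; the actual file layout should be checked before relying on this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing a download against a published SHA-512.
final class Sha512Sketch {

  // Returns the lowercase hex SHA-512 digest of the given bytes.
  static String sha512Hex(byte[] data) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-512 is not available on this JVM", e);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest(data)) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  // Assumption: the checksum text begins with the hex digest, possibly
  // followed by whitespace and a file name; only the first token is compared.
  static boolean matches(byte[] data, String checksumFileContent) {
    String expected = checksumFileContent.trim().split("\\s+")[0];
    return sha512Hex(data).equalsIgnoreCase(expected);
  }

  public static void main(String[] args) {
    byte[] data = "example artifact bytes".getBytes(StandardCharsets.UTF_8);
    String published = sha512Hex(data) + "  example.tar.gz";
    System.out.println("hash matches: " + matches(data, published));
  }
}
```

In practice the bytes would come from the downloaded tarball and the expected value from the .sha512 file fetched from the ASF host rather than a mirror, as the issue requests.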





[jira] [Resolved] (ARROW-1932) [Website] Update site for 0.8.0

2017-12-18 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved ARROW-1932.
---
Resolution: Done

> [Website] Update site for 0.8.0
> ---
>
> Key: ARROW-1932
> URL: https://issues.apache.org/jira/browse/ARROW-1932
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Wes McKinney
>Assignee: Jacques Nadeau
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Assigned] (ARROW-1815) [Java] Rename MapVector to StructVector

2017-11-29 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned ARROW-1815:
-

Assignee: Bryan Cutler

> [Java] Rename MapVector to StructVector
> ---
>
> Key: ARROW-1815
> URL: https://issues.apache.org/jira/browse/ARROW-1815
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>Assignee: Bryan Cutler
> Fix For: 0.8.0
>
>






[jira] [Assigned] (ARROW-1719) [Java] Clear warning message for accessor/mutator methods that throws "UnsupportedOperationException" in new vector classes

2017-11-29 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned ARROW-1719:
-

Assignee: Li Jin

> [Java] Clear warning message for accessor/mutator methods that throws 
> "UnsupportedOperationException" in new vector classes
> ---
>
> Key: ARROW-1719
> URL: https://issues.apache.org/jira/browse/ARROW-1719
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
>Assignee: Li Jin
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1864) [Java] Upgrade Netty to 4.1.x

2017-11-29 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271089#comment-16271089
 ] 

Jacques Nadeau commented on ARROW-1864:
---

Probably only a moderate amount of work. It's possible they made some things 
private that could cause problems but I'm not aware of anything offhand.

I probably can't get to it in the next week or two. If someone else wants to 
pick it up, switch the pom, and then identify the issues, I'd be happy to try 
to help give guidance on how to address any issues found.

> [Java] Upgrade Netty to 4.1.x
> -
>
> Key: ARROW-1864
> URL: https://issues.apache.org/jira/browse/ARROW-1864
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory
>Reporter: Shixiong Zhu
> Fix For: 0.8.0
>
>
> The Netty community will declare Netty 4.0.x as EOL at the beginning of the 
> second quarter of 2018: https://github.com/netty/netty/issues/7439
> It would be great if Arrow could migrate to Netty 4.1.x soon. This is the 
> only blocker for Spark to migrate to Netty 4.1.x.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

