[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized
Jacques Nadeau created ARROW-7549: - Summary: [Java] Reorganize Flight modules to keep top level clean/organized Key: ARROW-7549 URL: https://issues.apache.org/jira/browse/ARROW-7549 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Jacques Nadeau Let's create a flight parent module and then create the following modules beneath it: flight-core (existing flight module) flight-grpc (existing flight-grpc module) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7534) Create a new java/contrib module
Jacques Nadeau created ARROW-7534: - Summary: Create a new java/contrib module Key: ARROW-7534 URL: https://issues.apache.org/jira/browse/ARROW-7534 Project: Apache Arrow Issue Type: Task Reporter: Jacques Nadeau Assignee: Liya Fan To better clarify the status of java sub-modules, create a contrib module and move the following modules underneath it. * algorithm * adapter * plasma -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the java the memory package
Jacques Nadeau created ARROW-7533: - Summary: [Java] Move ArrowBufPointer out of the java the memory package Key: ARROW-7533 URL: https://issues.apache.org/jira/browse/ARROW-7533 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Jacques Nadeau Assignee: Liya Fan The memory package is focused on memory access and management. ArrowBufPointer should be moved to the algorithm package as it isn't core to the Arrow memory management primitives. I would further suggest that it is an anti-pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager
Jacques Nadeau created ARROW-7495: - Summary: [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager Key: ARROW-7495 URL: https://issues.apache.org/jira/browse/ARROW-7495 Project: Apache Arrow Issue Type: Task Reporter: Jacques Nadeau With the introduction of ReferenceManager in the codebase, a separate ArrowBuf for the empty case is no longer necessary. Instead, one can create a new reference manager that is used for the empty ArrowBuf. For reminder/review, empty ArrowBufs have a special behavior in that they don't actually have any reference counting semantics and always stay at one. This allows us to better troubleshoot unallocated memory than what would otherwise be an NPE after calling ValueVector.clear() -- This message was sent by Atlassian Jira (v8.3.4#803005)
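The idea described in ARROW-7495 can be sketched roughly as follows. This is an illustration only: RefManagerSketch, CountingRefManager and EmptyRefManager are hypothetical names invented for this example, and the interface is a trimmed-down stand-in rather than the real Arrow ReferenceManager API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, trimmed-down interface; not the real Arrow ReferenceManager. */
interface RefManagerSketch {
  int getRefCount();

  void retain();

  /** Returns true when the count reaches zero and the memory can be freed. */
  boolean release();
}

/** Ordinary counting behaviour, as used for buffers that own memory. */
final class CountingRefManager implements RefManagerSketch {
  private final AtomicInteger count = new AtomicInteger(1);

  @Override
  public int getRefCount() {
    return count.get();
  }

  @Override
  public void retain() {
    count.incrementAndGet();
  }

  @Override
  public boolean release() {
    return count.decrementAndGet() == 0;
  }
}

/** Manager for the empty buffer: no counting semantics, the count always reads one. */
enum EmptyRefManager implements RefManagerSketch {
  INSTANCE;

  @Override
  public int getRefCount() {
    return 1;
  }

  @Override
  public void retain() {
    // Intentionally a no-op: there is nothing to count.
  }

  @Override
  public boolean release() {
    return false; // The empty buffer is never freed.
  }
}
```

With this shape, the buffer class itself needs no special "empty" branch; the difference in behaviour lives entirely in which manager it was given.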
[jira] [Created] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf
Jacques Nadeau created ARROW-7494: - Summary: [Java] Remove reader index and writer index from ArrowBuf Key: ARROW-7494 URL: https://issues.apache.org/jira/browse/ARROW-7494 Project: Apache Arrow Issue Type: Task Reporter: Jacques Nadeau Reader and writer indexes and their functionality don't belong on a chunk of memory; they are there only because of inheritance from ByteBuf. As part of removing ByteBuf inheritance, we should also remove reader and writer indexes from ArrowBuf functionality. They waste heap memory for rare utility. In general, a slice can be used instead of a reader/writer index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7198) [Java] Allow a user to provide an alternative "chunk" allocator
Jacques Nadeau created ARROW-7198: - Summary: [Java] Allow a user to provide an alternative "chunk" allocator Key: ARROW-7198 URL: https://issues.apache.org/jira/browse/ARROW-7198 Project: Apache Arrow Issue Type: Task Components: Java Reporter: Jacques Nadeau Assignee: Jacques Nadeau Right now, the Arrow Java libraries have two options: - Have accounted memory using the Netty allocator. - Have unaccounted memory using your own allocator. I'd like to add a third option where you can use the existing accounting but decide where the chunks of memory come from. -- This message was sent by Atlassian Jira (v8.3.4#803005)
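A minimal sketch of the "third option" from ARROW-7198, keeping the accounting in one place while letting the caller decide where chunks come from. AccountingAllocatorSketch and its methods are hypothetical names for illustration, and java.nio.ByteBuffer stands in for whatever chunk type a real implementation would use.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntFunction;

/** Hypothetical allocator: accounting is fixed, the chunk source is pluggable. */
final class AccountingAllocatorSketch {
  private final long limit;
  private final AtomicLong allocated = new AtomicLong();
  private final IntFunction<ByteBuffer> chunkSource;

  AccountingAllocatorSketch(long limit, IntFunction<ByteBuffer> chunkSource) {
    this.limit = limit;
    this.chunkSource = chunkSource;
  }

  /** Reserves the bytes in the accounting first, then asks the chunk source for memory. */
  ByteBuffer allocate(int size) {
    long after = allocated.addAndGet(size);
    if (after > limit) {
      allocated.addAndGet(-size); // Roll back the reservation.
      throw new IllegalStateException("Allocation of " + size + " bytes exceeds limit " + limit);
    }
    try {
      return chunkSource.apply(size);
    } catch (RuntimeException e) {
      allocated.addAndGet(-size); // The chunk source failed, so nothing is held.
      throw e;
    }
  }

  /** Returns the chunk's bytes to the accounting; freeing is left to the chunk's owner. */
  void release(ByteBuffer chunk) {
    allocated.addAndGet(-chunk.capacity());
  }

  long getAllocatedMemory() {
    return allocated.get();
  }
}
```

Usage would look like new AccountingAllocatorSketch(1024, ByteBuffer::allocateDirect), with any other int-to-ByteBuffer function substituted for the chunk source.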
[jira] [Created] (ARROW-4669) [Java] No Bounds checking on ArrowBuf.slice
Jacques Nadeau created ARROW-4669: - Summary: [Java] No Bounds checking on ArrowBuf.slice Key: ARROW-4669 URL: https://issues.apache.org/jira/browse/ARROW-4669 Project: Apache Arrow Issue Type: Bug Reporter: Jacques Nadeau While reviewing some code I realized that there is no bounds checking on ArrowBuf slicing. Example negative test case that should pass but is currently failing can be found here: [https://gist.github.com/jacques-n/737c26b7016ed29dc710d4aba617340e] It may be that this doesn't cause more problems because the index checks do exist on memory access but fixing this would make it much easier to understand where a code mistake was made. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
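To illustrate the kind of check ARROW-4669 asks for, here is a small sketch of a slice that fails at slice time rather than at a later memory access. CheckedChunk is a hypothetical class backed by a byte array; it is not ArrowBuf and is not taken from the linked gist.

```java
/** Hypothetical stand-in for a chunk of memory; not the real ArrowBuf. */
final class CheckedChunk {
  private final byte[] backing; // Storage shared by a chunk and all of its slices.
  private final int offset;     // Start of this view within the backing storage.
  private final int length;     // Number of bytes visible through this view.

  CheckedChunk(int capacity) {
    this(new byte[capacity], 0, capacity);
  }

  private CheckedChunk(byte[] backing, int offset, int length) {
    this.backing = backing;
    this.offset = offset;
    this.length = length;
  }

  int capacity() {
    return length;
  }

  /** Returns a view sharing the same memory, rejecting out-of-range requests immediately. */
  CheckedChunk slice(int index, int sliceLength) {
    // Long arithmetic avoids int overflow when adding index and sliceLength.
    if (index < 0 || sliceLength < 0 || (long) index + sliceLength > length) {
      throw new IndexOutOfBoundsException(
          "slice(" + index + ", " + sliceLength + ") is outside capacity " + length);
    }
    return new CheckedChunk(backing, offset + index, sliceLength);
  }

  byte getByte(int index) {
    checkIndex(index);
    return backing[offset + index];
  }

  void setByte(int index, byte value) {
    checkIndex(index);
    backing[offset + index] = value;
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("index " + index + " is outside capacity " + length);
    }
  }
}
```

The point of checking in slice() is that the exception is raised where the mistaken offsets were computed, not later at whichever access happens to touch the bad range.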
[jira] [Created] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package
Jacques Nadeau created ARROW-4526: - Summary: [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package Key: ARROW-4526 URL: https://issues.apache.org/jira/browse/ARROW-4526 Project: Apache Arrow Issue Type: New Feature Reporter: Jacques Nadeau Arrow currently has a hard dependency on Netty and exposes this in public APIs. This shouldn't be the case. There could be many allocator implementations with Netty as one possible option. We should remove the hard dependency between arrow-vector and Netty, instead creating a trivial allocator. ArrowBuf should probably expose a <T> T unwrap(Class<T> clazz) method instead, to make inner providers available without a hard reference. This should also include drastically reducing the number of methods on ArrowBuf, as right now it includes every method from ByteBuf, but many of those are not very useful or appropriate. This work should come after we do the simpler ARROW-3191 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
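The unwrap idea in ARROW-4526 could look something like the sketch below. UnwrappableBuf is a hypothetical class made up for this example, and java.nio.ByteBuffer merely plays the role of an allocator-specific backing object; the real signature and failure behaviour were not settled in the issue.

```java
import java.nio.ByteBuffer;

/** Hypothetical buffer wrapper; the class name and field are illustrative only. */
final class UnwrappableBuf {
  private final Object provider; // Allocator-specific backing object, e.g. a Netty buffer.

  UnwrappableBuf(Object provider) {
    this.provider = provider;
  }

  /**
   * Exposes the backing object only to callers that name its type, so the public API of
   * this class carries no compile-time reference to any particular allocator library.
   */
  <T> T unwrap(Class<T> clazz) {
    if (clazz.isInstance(provider)) {
      return clazz.cast(provider);
    }
    // Throwing here is an assumption of this sketch; returning null would be another option.
    throw new UnsupportedOperationException(
        "Backing provider is not a " + clazz.getName());
  }
}
```

A caller that knows it is running on a particular allocator would write buf.unwrap(ByteBuffer.class); everyone else never sees the backing type.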
[jira] [Created] (ARROW-3887) [Java][Gandiva] Expose Dremio build and tests as new optional container/test
Jacques Nadeau created ARROW-3887: - Summary: [Java][Gandiva] Expose Dremio build and tests as new optional container/test Key: ARROW-3887 URL: https://issues.apache.org/jira/browse/ARROW-3887 Project: Apache Arrow Issue Type: New Feature Reporter: Jacques Nadeau Assignee: Praveen Kumar Desabandu Dremio uses Arrow Java and Gandiva extensively and could provide additional test coverage for the project. We should find a way to expose the downstream build of Dremio as an optional build so major changes can better be evaluated against downstream effects. [~praveenbingo], assigning to you for now but let's figure out who at Dremio can pick this up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.
Jacques Nadeau created ARROW-3191: - Summary: [Java] Add support for ArrowBuf to point to arbitrary memory. Key: ARROW-3191 URL: https://issues.apache.org/jira/browse/ARROW-3191 Project: Apache Arrow Issue Type: New Feature Reporter: Jacques Nadeau Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This is because in many cases we want to be able to support hierarchical accounting of memory and the ability to transfer memory ownership between separate allocators within the same hierarchy. At the same time, there are definitely times where someone might want to map some amount of arbitrary off-heap memory. In these situations they should still be able to use ArrowBuf. I propose we have a new ArrowBuf constructor that takes an input that subclasses an interface similar to: public abstract class Memory { protected final int length; protected final long address; protected abstract void release(); } We then make it so all the memory transfer semantics and accounting behavior are noops for this type of memory. The target of this work will be to make sure that all the fast paths continue to be efficient but some of the other paths like transfer can include a conditional (either directly or through alternative implementations of things like ledger). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-1477) Create Benchmarking Suite for final ValueVector updates
Jacques Nadeau created ARROW-1477: - Summary: Create Benchmarking Suite for final ValueVector updates Key: ARROW-1477 URL: https://issues.apache.org/jira/browse/ARROW-1477 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1476) [JAVA] Implement final ValueVector updates
Jacques Nadeau created ARROW-1476: - Summary: [JAVA] Implement final ValueVector updates Key: ARROW-1476 URL: https://issues.apache.org/jira/browse/ARROW-1476 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1475) [JAVA] Create Benchmarking Suite for prototypes
Jacques Nadeau created ARROW-1475: - Summary: [JAVA] Create Benchmarking Suite for prototypes Key: ARROW-1475 URL: https://issues.apache.org/jira/browse/ARROW-1475 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1474) [JAVA] Create Prototype Code Hierarchy (alt B)
Jacques Nadeau created ARROW-1474: - Summary: [JAVA] Create Prototype Code Hierarchy (alt B) Key: ARROW-1474 URL: https://issues.apache.org/jira/browse/ARROW-1474 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1472) [JAVA] Design updated ValueVector Object Hierarchy
Jacques Nadeau created ARROW-1472: - Summary: [JAVA] Design updated ValueVector Object Hierarchy Key: ARROW-1472 URL: https://issues.apache.org/jira/browse/ARROW-1472 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1473) [JAVA] Create Prototype Code Hierarchy (alt A)
Jacques Nadeau created ARROW-1473: - Summary: [JAVA] Create Prototype Code Hierarchy (alt A) Key: ARROW-1473 URL: https://issues.apache.org/jira/browse/ARROW-1473 Project: Apache Arrow Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1463) [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code
Jacques Nadeau created ARROW-1463: - Summary: [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code Key: ARROW-1463 URL: https://issues.apache.org/jira/browse/ARROW-1463 Project: Apache Arrow Issue Type: Improvement Reporter: Jacques Nadeau The templates used in the java package are very high maintenance and the if conditions are hard to track. As started in the discussion here: https://github.com/apache/arrow/pull/1012, I'd like to propose that we modify the structure of the internal value vectors and code generation dynamics. Create new abstract base vectors: BaseFixedVector, BaseVariableVector, BaseNullableVector. For each of these, implement all the basic functionality of a vector without using templating. Evaluate whether to use code generation to generate specific specializations of this functionality for each type where needed for performance purposes (probably constrained to mutator and accessor set/get methods). Giant and complex if conditions in the templates are actually worse from my perspective than a small amount of hand-written duplicated code, since templates are much harder to work with. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1045) [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.*
Jacques Nadeau created ARROW-1045: - Summary: [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.* Key: ARROW-1045 URL: https://issues.apache.org/jira/browse/ARROW-1045 Project: Apache Arrow Issue Type: Bug Reporter: Jacques Nadeau Custom metadata for Arrow Schema and Arrow Fields is lost if a user translates to/from the Java implementation's POJO helper objects. Conversion to/from the Flatbuf schema should be lossless. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1005) NullableDecimalVector.set(int, byte[]...) throws UnsupportedOperationException
Jacques Nadeau created ARROW-1005: - Summary: NullableDecimalVector.set(int, byte[]...) throws UnsupportedOperationException Key: ARROW-1005 URL: https://issues.apache.org/jira/browse/ARROW-1005 Project: Apache Arrow Issue Type: Bug Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-801) [JAVA] Provide direct access to underlying buffer memory addresses in consistent way without generating garbage or large amount indirections
Jacques Nadeau created ARROW-801: Summary: [JAVA] Provide direct access to underlying buffer memory addresses in consistent way without generating garbage or large amount indirections Key: ARROW-801 URL: https://issues.apache.org/jira/browse/ARROW-801 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Jacques Nadeau When working with Arrow vectors recently, we observed a situation where our time was dominated by calls to getFieldBuffers() to be able to retrieve memory addresses (22s out of 26s total for a piece of code). We should provide a direct mechanism to access this data so we can avoid all the extra indirection and object creation. A proposal: getBitAddress(), getDataAddress(), getOffsetAddress(). These methods would be made available on the FieldVector interface and simply throw UnsupportedOperationException where not supported. Unsupported operations: data for list type; offset for fixed-width types; data and offset for struct type; data for union type. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
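The ARROW-801 proposal can be sketched as below. AddressableVectorSketch and FixedWidthVectorSketch are hypothetical names; the three method names come from the proposal, but using Java default methods for the throwing behaviour is an assumption of this example, not something the issue specifies.

```java
/** Hypothetical interface; the real FieldVector has many more methods. */
interface AddressableVectorSketch {
  /** Address of the validity (bit) buffer. */
  default long getBitAddress() {
    throw new UnsupportedOperationException("No validity buffer address for this vector type");
  }

  /** Address of the data buffer. */
  default long getDataAddress() {
    throw new UnsupportedOperationException("No data buffer address for this vector type");
  }

  /** Address of the offset buffer. */
  default long getOffsetAddress() {
    throw new UnsupportedOperationException("No offset buffer address for this vector type");
  }
}

/** A fixed-width vector has validity and data buffers but no offsets. */
final class FixedWidthVectorSketch implements AddressableVectorSketch {
  private final long bitAddress;
  private final long dataAddress;

  FixedWidthVectorSketch(long bitAddress, long dataAddress) {
    this.bitAddress = bitAddress;
    this.dataAddress = dataAddress;
  }

  @Override
  public long getBitAddress() {
    return bitAddress; // Plain field read: no object creation, no indirection.
  }

  @Override
  public long getDataAddress() {
    return dataAddress;
  }

  // getOffsetAddress() keeps the throwing default: offsets are unsupported for fixed width.
}
```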
[jira] [Created] (ARROW-649) Explore a Weld/Arrow converter
Jacques Nadeau created ARROW-649: Summary: Explore a Weld/Arrow converter Key: ARROW-649 URL: https://issues.apache.org/jira/browse/ARROW-649 Project: Apache Arrow Issue Type: New Feature Reporter: Jacques Nadeau [~matei] and the Stanford team have just open sourced Weld. It would be interesting to evaluate how we could move Arrow data to Weld's internal representation. Weld is here: https://github.com/weld-project/weld -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-485) [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe
[ https://issues.apache.org/jira/browse/ARROW-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822969#comment-15822969 ] Jacques Nadeau commented on ARROW-485: -- There should be better documentation on this. In order to use vectors, the correct order of operations is: 1. Call allocateNew() to allocate memory for the vector. 2. Set one or more values using getMutator().setSafe(i, val). Note that positions have to be monotonically increasing, although indices may be skipped. 3. Call setValueCount(n), where n is the number of valid indices in the vector. 4. Read or serialize the data. I believe that if you follow these operations, you will not have a problem here. I'm guessing you're trying to use a vector before allocating (1). > [Java] Users are required to initialize VariableLengthVectors.offsetVector > before calling VariableLengthVectors.mutator.getSafe > > > Key: ARROW-485 > URL: https://issues.apache.org/jira/browse/ARROW-485 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Li Jin >Priority: Minor > > https://github.com/apache/arrow/blob/master/java/vector/src/main/codegen/templates/VariableLengthVectors.java#L492 > Here VariableLengthVectors.getMutator().setSafe() calls: > {code} > offsetVector.getAccessor().get(index) > {code} > however, index 0 of offsetVector (which is always 0) is not initialized by > VariableLengthVectors. > As a result, a user of VariableLengthVectors needs to manually initialize > the class by calling: > {code} > VariableLengthVectors.getOffsetVector().getMutator().setSafe(0, 0) > {code} > I wonder if this is necessary or should VariableLengthVectors initialize this > for the user -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-413) DATE type is not specified clearly
[ https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749143#comment-15749143 ] Jacques Nadeau commented on ARROW-413: -- I agree. I think it should be timezone-less. Basically the same semantics of java.time.Local[Date|DateTime|Time] > DATE type is not specified clearly > -- > > Key: ARROW-413 > URL: https://issues.apache.org/jira/browse/ARROW-413 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Affects Versions: 0.1.0 >Reporter: Uwe L. Korn > > Currently the DATE type is not specified anywhere and needs to be documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-413) DATE type is not specified clearly
[ https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749135#comment-15749135 ] Jacques Nadeau commented on ARROW-413: -- We found it useful in java land as many of the prebuilt libraries use this construct. It makes doing date math much less work. Example: org.joda.time.LocalDate and the joda-derived JDK8+ java.time.LocalDate > DATE type is not specified clearly > -- > > Key: ARROW-413 > URL: https://issues.apache.org/jira/browse/ARROW-413 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Affects Versions: 0.1.0 >Reporter: Uwe L. Korn > > Currently the DATE type is not specified anywhere and needs to be documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-413) DATE type is not specified clearly
[ https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736269#comment-15736269 ] Jacques Nadeau commented on ARROW-413: -- I'm more inclined to keeping the "rounded" long format. This is due to the common use of this pattern in libraries (rather than having to convert when operating against). This is different from Parquet in that Parquet can go for compactness. My $0.02. Anybody feel strongly in other ways? > DATE type is not specified clearly > -- > > Key: ARROW-413 > URL: https://issues.apache.org/jira/browse/ARROW-413 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Affects Versions: 0.1.0 >Reporter: Uwe L. Korn > > Currently the DATE type is not specified anywhere and needs to be documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (ARROW-401) [Java] Floating point vectors should do an approximate comparison in integration tests
[ https://issues.apache.org/jira/browse/ARROW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717215#comment-15717215 ] Jacques Nadeau edited comment on ARROW-401 at 12/3/16 2:46 AM: --- This is what I've used elsewhere before: {code} boolean evaluateEquality(Float f1, Float f2) { if(f1.isNaN()){ return f2.isNaN(); } if(f1.isInfinite()){ return f2.isInfinite(); } if ((f1 + f2) / 2 != 0) { return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-6; } else { return !(f1 != 0); } } {code} {code} boolean evaluateEquality(Double f1, Double f2) { if(f1.isNaN()){ return f2.isNaN(); } if(f1.isInfinite()){ return f2.isInfinite(); } if ((f1 + f2) / 2 != 0) { return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-12; } else { return !(f1 != 0); } } } {code} was (Author: jnadeau): This is what I've used elsewhere before: {code} boolean evaluateEquality(Float f1, Float f2) { if(f1.isNaN()){ return f2.isNaN(); } if(f1.isInfinite()){ return f2.isInfinite(); } if ((f1 + f2) / 2 != 0) { return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-6; } else { return !(f1 != 0); } } {code} {code} @Override boolean evaluateEquality(Double f1, Double f2) { if(f1.isNaN()){ return f2.isNaN(); } if(f1.isInfinite()){ return f2.isInfinite(); } if ((f1 + f2) / 2 != 0) { return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-12; } else { return !(f1 != 0); } } } {code} > [Java] Floating point vectors should do an approximate comparison in > integration tests > -- > > Key: ARROW-401 > URL: https://issues.apache.org/jira/browse/ARROW-401 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Wes McKinney >Assignee: Julien Le Dem >Priority: Blocker > > Floating point precision rears its ugly head: > {code} > Incompatible files > Different values in column: > Field{name=float64_nullable, type=FloatingPoint{2}, children=[], > layout=TypeLayout{[{width=1,type=VALIDITY}, {width=64,type=DATA}]}} at index > 1: 912.41402 != 912.414 > 10:23:45.863 [main] ERROR 
org.apache.arrow.tools.Integration - Incompatible > files > java.lang.IllegalArgumentException: Different values in column: > Field{name=float64_nullable, type=FloatingPoint{2}, children=[], > layout=TypeLayout{[{width=1,type=VALIDITY}, {width=64,type=DATA}]}} at index > 1: 912.41402 != 912.414 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
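As a companion to the snippet quoted in the ARROW-401 comment, here is a sketch of an approximate comparison for integration tests. ApproxEqualsSketch and the relTol parameter are hypothetical names. It differs from the quoted code in two ways, both of which are choices made for this sketch: it scales by the larger magnitude rather than the signed mean, so that two negative values are not trivially reported as equal, and it treats infinities as equal only when they have the same sign.

```java
/** Hypothetical helper for approximate floating-point comparison in tests. */
final class ApproxEqualsSketch {
  private ApproxEqualsSketch() {}

  /** Returns true when a and b agree to within the relative tolerance relTol. */
  static boolean approxEquals(double a, double b, double relTol) {
    if (Double.isNaN(a) || Double.isNaN(b)) {
      return Double.isNaN(a) && Double.isNaN(b); // NaN matches only NaN.
    }
    if (Double.isInfinite(a) || Double.isInfinite(b)) {
      return a == b; // Infinities match only when they have the same sign.
    }
    if (a == b) {
      return true; // Exact match, including 0.0 and -0.0.
    }
    // Scale by the larger magnitude, which is always positive here.
    double scale = Math.max(Math.abs(a), Math.abs(b));
    return Math.abs(a - b) <= relTol * scale;
  }

  /** Float overload: widen to double and compare with the same rule. */
  static boolean approxEquals(float a, float b, double relTol) {
    return approxEquals((double) a, (double) b, relTol);
  }
}
```

The tolerance is left to the caller; the quoted code used 1.0E-6 for floats and 1.0E-12 for doubles.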
[jira] [Commented] (ARROW-384) Align Java and C++ RecordBatch data and metadata layout
[ https://issues.apache.org/jira/browse/ARROW-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685289#comment-15685289 ] Jacques Nadeau commented on ARROW-384: -- +1 on this approach. > Align Java and C++ RecordBatch data and metadata layout > --- > > Key: ARROW-384 > URL: https://issues.apache.org/jira/browse/ARROW-384 > Project: Apache Arrow > Issue Type: Bug >Reporter: Julien Le Dem > > layout on C++ side: > {noformat} > > {noformat} > and on the java side: > {noformat} > > {noformat} > In the file format the footer has a Block info that contains the metadata > length. > https://github.com/apache/arrow/blob/f082b17323354dc2af31f39c15c58b995ba08360/format/File.fbs#L36 > See: > https://github.com/apache/arrow/pull/211#issuecomment-262080545 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-295) Create DOAP File
Jacques Nadeau created ARROW-295: Summary: Create DOAP File Key: ARROW-295 URL: https://issues.apache.org/jira/browse/ARROW-295 Project: Apache Arrow Issue Type: Bug Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-273) Lists use unsigned offset vectors instead of signed (as defined in the spec)
[ https://issues.apache.org/jira/browse/ARROW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438658#comment-15438658 ] Jacques Nadeau commented on ARROW-273: -- I vote yes. > Lists use unsigned offset vectors instead of signed (as defined in the spec) > > > Key: ARROW-273 > URL: https://issues.apache.org/jira/browse/ARROW-273 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Julien Le Dem > > The List vector defines its offset vector as UInt4Vector (unsigned int32). > According to the Arrow spec it should be a signed int32. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-270) [Format] Define more generic Interval logical type
[ https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435655#comment-15435655 ] Jacques Nadeau commented on ARROW-270: -- This matches DAY_TIME I believe. The difference is that we are currently fixed to four bytes, right? > [Format] Define more generic Interval logical type > -- > > Key: ARROW-270 > URL: https://issues.apache.org/jira/browse/ARROW-270 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Wes McKinney > > Per discussion in > https://github.com/apache/arrow/commit/e7e399db5fc6913e67426514279f81766a0778d2#commitcomment-18711366, > we can create an {{Interval}} type with a unit to be more general. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-270) [Format] Define more generic Interval logical type
[ https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435147#comment-15435147 ] Jacques Nadeau commented on ARROW-270: -- IntervalUnit seems fine to me. As far as timestamp/decimal, I'm not inclined to change. I think most of the processing engines and storage formats that we work with use epoch in either millis, micros or nanos. > [Format] Define more generic Interval logical type > -- > > Key: ARROW-270 > URL: https://issues.apache.org/jira/browse/ARROW-270 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Wes McKinney > > Per discussion in > https://github.com/apache/arrow/commit/e7e399db5fc6913e67426514279f81766a0778d2#commitcomment-18711366, > we can create an {{Interval}} type with a unit to be more general. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-81) C++: Add a Category nested type
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428316#comment-15428316 ] Jacques Nadeau commented on ARROW-81: - Can you guys provide two small example datasets in JSON format here? > C++: Add a Category nested type > --- > > Key: ARROW-81 > URL: https://issues.apache.org/jira/browse/ARROW-81 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney > > A Category (or "factor") is a dictionary-encoded array whose dictionary has > semantic meaning. The data consists of > - An array of integer "codes" > - A child array of some other type, known as the "categories" or "levels" of > the array. Typically there is an "ordered" boolean flag indicating whether > the order of the categories is meaningful. > Category/factor types are used in a number of common statistical analyses. > See, for example, > http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It is a > basic requirement for Python and R, at least, as Arrow C++ consumers, to have > this type. Separately, we should consider what is necessary to be able to > transmit category data in IPCs -- possibly an expansion of the Arrow format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-260) TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky
[ https://issues.apache.org/jira/browse/ARROW-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420079#comment-15420079 ] Jacques Nadeau commented on ARROW-260: -- Note, it is probably still good to move these three tests into a separate class and put a disclaimer at the top about the parameter. > TestValueVector.testFixedVectorReallocation and > testVariableVectorReallocation are flaky > > > Key: ARROW-260 > URL: https://issues.apache.org/jira/browse/ARROW-260 > Project: Apache Arrow > Issue Type: Test > Components: Java - Vectors >Reporter: Julien Le Dem >Assignee: Jihoon Son > > The Travis-ci build has failed several times on these tests. > It looks like they often throw OOME. > stacktrace below: > {noformat} > testFixedVectorReallocation(org.apache.arrow.vector.TestValueVector) Time > elapsed: 0.174 sec <<< ERROR! > java.lang.Exception: Unexpected exception, > expected but > was > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:69) > at > io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195) > at > io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62) > at > org.apache.arrow.memory.AllocationManager.(AllocationManager.java:79) > at > org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238) > at 
org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190) > at > org.apache.arrow.vector.UInt4Vector.allocateBytes(UInt4Vector.java:189) > at org.apache.arrow.vector.UInt4Vector.allocateNew(UInt4Vector.java:171) > at > org.apache.arrow.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:106) > testVariableVectorReallocation(org.apache.arrow.vector.TestValueVector) Time > elapsed: 0.148 sec <<< ERROR! > java.lang.Exception: Unexpected exception, > expected but > was > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:69) > at > io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195) > at > io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62) > at > org.apache.arrow.memory.AllocationManager.(AllocationManager.java:79) > at > org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190) > at > org.apache.arrow.vector.VarCharVector.allocateNew(VarCharVector.java:364) > at > org.apache.arrow.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:163) > Results : > Tests in error: > 
TestValueVector.testFixedVectorReallocation » Unexpected exception, > expected<... > TestValueVector.testVariableVectorReallocation » Unexpected exception, > expect... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-260) TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky
[ https://issues.apache.org/jira/browse/ARROW-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420078#comment-15420078 ] Jacques Nadeau commented on ARROW-260: -- I'm fine with setting the surefire option in the default execution for now. > TestValueVector.testFixedVectorReallocation and > testVariableVectorReallocation are flaky > > > Key: ARROW-260 > URL: https://issues.apache.org/jira/browse/ARROW-260 > Project: Apache Arrow > Issue Type: Test > Components: Java - Vectors >Reporter: Julien Le Dem >Assignee: Jihoon Son > > The Travis-ci build has failed several times on these tests. > It looks like they often throw OOME. > stacktrace below: > {noformat} > testFixedVectorReallocation(org.apache.arrow.vector.TestValueVector) Time > elapsed: 0.174 sec <<< ERROR! > java.lang.Exception: Unexpected exception, > expected but > was > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:69) > at > io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195) > at > io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62) > at > org.apache.arrow.memory.AllocationManager.(AllocationManager.java:79) > at > org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220) > at 
org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190) > at > org.apache.arrow.vector.UInt4Vector.allocateBytes(UInt4Vector.java:189) > at org.apache.arrow.vector.UInt4Vector.allocateNew(UInt4Vector.java:171) > at > org.apache.arrow.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:106) > testVariableVectorReallocation(org.apache.arrow.vector.TestValueVector) Time > elapsed: 0.148 sec <<< ERROR! > java.lang.Exception: Unexpected exception, > expected but > was > at java.nio.Bits.reserveMemory(Bits.java:658) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:69) > at > io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155) > at > io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195) > at > io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62) > at > org.apache.arrow.memory.AllocationManager.(AllocationManager.java:79) > at > org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220) > at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190) > at > org.apache.arrow.vector.VarCharVector.allocateNew(VarCharVector.java:364) > at > org.apache.arrow.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:163) > Results : > Tests in error: > TestValueVector.testFixedVectorReallocation » Unexpected exception, > expected<... 
> TestValueVector.testVariableVectorReallocation » Unexpected exception, > expect... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ARROW-64) Add zsh support to C++ build scripts
[ https://issues.apache.org/jira/browse/ARROW-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated ARROW-64:
--------------------------------
    Assignee: Uwe L. Korn

> Add zsh support to C++ build scripts
> ------------------------------------
>
>                 Key: ARROW-64
>                 URL: https://issues.apache.org/jira/browse/ARROW-64
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Uwe L. Korn
>            Assignee: Uwe L. Korn
>
> All scripts that have to be sourced during development currently only support
> bash. This patch adds zsh support.
[jira] [Commented] (ARROW-62) Format: Are the nulls bits 0 or 1 for null values?
[ https://issues.apache.org/jira/browse/ARROW-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192525#comment-15192525 ]

Jacques Nadeau commented on ARROW-62:
-------------------------------------

I consider the bitmap to be a validity map as opposed to a null map. I've also seen a couple of places where it is nice to zero out values that are null using the zero in the bitmap without a condition... although I can't remember where we took advantage of this previously.

> Format: Are the nulls bits 0 or 1 for null values?
> --------------------------------------------------
>
>                 Key: ARROW-62
>                 URL: https://issues.apache.org/jira/browse/ARROW-62
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Format
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> As brought up by Dan Robinson on the mailing list (thank you for catching
> this!), there is an inconsistency between the format documents and the
> imported ValueVectors code in how nulls are represented -- since I drafted
> these format documents initially I'll take the blame for the inconsistency,
> but:
> * Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
> * The format document currently states the opposite (values are null if the
> bit is set)
> I can see arguments both ways, but one argument for the ValueVectors style is
> that values must be explicitly set to be non-null, versus uninitialized
> values being accidentally interpreted as being non-null. When initializing a
> bitmap, one can {{memset}} the bits to 0, then set them to 1 when non-null
> values are appended during construction.
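
For readers following the ARROW-62 thread, the validity-map convention discussed above (bit 0 means null, bit 1 means non-null) can be illustrated with a small Java sketch. This is not the Arrow or Drill ValueVectors implementation: the class `ValidityBitmapSketch`, its method names, and the least-significant-bit-first ordering within each byte are all assumptions made for illustration. It shows a zero-initialized bitmap in which every slot starts out null, and a read that zeroes out null slots by masking with the validity bit instead of branching.

```java
/**
 * Hypothetical sketch of a validity bitmap: bit = 1 means the slot holds a
 * value, bit = 0 means the slot is null. Not the actual Arrow implementation.
 */
public class ValidityBitmapSketch {
    private final byte[] validity; // one bit per slot
    private final int[] values;    // slot contents; may be stale for null slots

    public ValidityBitmapSketch(int capacity) {
        // Java zero-fills new arrays, which plays the role of memset(bits, 0, n):
        // every slot is null until it is explicitly set.
        this.validity = new byte[(capacity + 7) / 8];
        this.values = new int[capacity];
    }

    /** Stores a value and marks the slot as valid (bit = 1). */
    public void set(int index, int value) {
        values[index] = value;
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    /** Marks the slot as null (bit = 0) without touching the stored value. */
    public void setNull(int index) {
        validity[index >> 3] &= (byte) ~(1 << (index & 7));
    }

    /** Returns the validity bit for the slot: 1 if valid, 0 if null. */
    public int validBit(int index) {
        return (validity[index >> 3] >> (index & 7)) & 1;
    }

    public boolean isNull(int index) {
        return validBit(index) == 0;
    }

    /**
     * Branch-free read: -validBit is all ones for a valid slot and zero for a
     * null slot, so masking yields the value or 0 without a condition.
     */
    public int getOrZero(int index) {
        return values[index] & -validBit(index);
    }
}
```

With this convention a freshly allocated vector reports every slot as null, so forgetting to set a bit can hide a value but cannot expose uninitialized data as valid, which is the argument made in the issue description.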