[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-10 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7549:
-

 Summary: [Java] Reorganize Flight modules to keep top level 
clean/organized
 Key: ARROW-7549
 URL: https://issues.apache.org/jira/browse/ARROW-7549
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau


Let's create a flight parent module and then create the following modules under it:

flight-core (existing flight module)
flight-grpc (existing flight-grpc module)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7534:
-

 Summary: Create a new java/contrib module
 Key: ARROW-7534
 URL: https://issues.apache.org/jira/browse/ARROW-7534
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau
Assignee: Liya Fan


To better clarify the status of java sub-modules, create a contrib module and 
move the following modules underneath it.

* algorithm
* adapter
* plasma





[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the java the memory package

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7533:
-

 Summary: [Java] Move ArrowBufPointer out of the java the memory 
package
 Key: ARROW-7533
 URL: https://issues.apache.org/jira/browse/ARROW-7533
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau
Assignee: Liya Fan


The memory package is focused on memory access and management. ArrowBufPointer 
should be moved to the algorithm package, as it isn't core to the Arrow memory 
management primitives. I would further suggest that it is an anti-pattern.





[jira] [Created] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7495:
-

 Summary: [Java] Remove "empty" concept from ArrowBuf, replace with 
custom referencemanager
 Key: ARROW-7495
 URL: https://issues.apache.org/jira/browse/ARROW-7495
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau


With the introduction of ReferenceManager in the codebase, a separate "empty" 
ArrowBuf is no longer needed. Instead, one can create a new reference manager 
that is used for the empty ArrowBuf. As a reminder, empty ArrowBufs have a 
special behavior in that they don't actually have any reference counting 
semantics and always stay at one. This allows us to better troubleshoot 
unallocated memory than what would otherwise be an NPE after calling 
ValueVector.clear().
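
A minimal sketch of the idea, assuming a simplified, hypothetical RefManager 
interface (it is not Arrow's actual ReferenceManager API): the manager used 
for the empty buffer ignores retain/release and always reports a count of one.

```java
final class EmptyRefManagerSketch {
  /** Hypothetical, simplified reference manager; not Arrow's actual interface. */
  interface RefManager {
    int getRefCount();
    void retain();
    boolean release(); // true when the count reaches zero
  }

  /** Ordinary manager: the count moves with retain/release. */
  static final class CountingRefManager implements RefManager {
    private int count = 1;

    @Override public int getRefCount() { return count; }
    @Override public void retain() { count++; }
    @Override public boolean release() {
      if (count == 0) {
        throw new IllegalStateException("already released");
      }
      return --count == 0;
    }
  }

  /** Manager for the empty buffer: retain/release are no-ops, the count stays at one. */
  static final RefManager EMPTY = new RefManager() {
    @Override public int getRefCount() { return 1; }
    @Override public void retain() { }
    @Override public boolean release() { return false; }
  };

  public static void main(String[] args) {
    EMPTY.retain();
    EMPTY.release();
    EMPTY.release();
    System.out.println("empty buffer ref count: " + EMPTY.getRefCount());
  }
}
```

With this shape the buffer class itself needs no "is empty" branch; the special 
behavior lives entirely in the manager it was given.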





[jira] [Created] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7494:
-

 Summary: [Java] Remove reader index and writer index from ArrowBuf
 Key: ARROW-7494
 URL: https://issues.apache.org/jira/browse/ARROW-7494
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau


Reader and writer indexes and their functionality don't belong on a chunk of 
memory; they are only there due to inheritance from ByteBuf. As part of removing 
ByteBuf inheritance, we should also remove reader and writer indexes from 
ArrowBuf. They waste heap memory for rare utility. In general, a slice can 
be used instead of a reader/writer index pattern.
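
As an illustration of the slice pattern, here is a small sketch that uses 
java.nio.ByteBuffer purely as a stand-in for ArrowBuf (the class and method 
names are made up for this example). The caller reads through a window with 
absolute indices and never touches a cursor on the shared buffer.

```java
import java.nio.ByteBuffer;

final class SliceInsteadOfIndexSketch {
  /** Sums count ints starting at byte offset, reading through a slice with absolute indices. */
  static int sumInts(ByteBuffer buf, int offset, int count) {
    ByteBuffer dup = buf.duplicate();          // same bytes, independent position/limit
    dup.position(offset);
    dup.limit(offset + count * Integer.BYTES);
    ByteBuffer window = dup.slice();           // index 0 of the window maps to offset
    int sum = 0;
    for (int i = 0; i < count; i++) {
      sum += window.getInt(i * Integer.BYTES); // absolute read, no cursor involved
    }
    return sum;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    for (int i = 0; i < 4; i++) {
      buf.putInt(i * Integer.BYTES, i + 1);    // absolute writes
    }
    System.out.println(sumInts(buf, 4, 2));
  }
}
```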





[jira] [Created] (ARROW-7198) [Java] Allow a user to provide an alternative "chunk" allocator

2019-11-17 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7198:
-

 Summary: [Java] Allow a user to provide an alternative "chunk" 
allocator
 Key: ARROW-7198
 URL: https://issues.apache.org/jira/browse/ARROW-7198
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau


Right now, the Arrow Java libraries have two options:
- Have accounted memory using the Netty allocator.
- Have unaccounted memory using your own allocator.

I'd like to add a third option where you can use the existing accounting but 
decide where the chunks of memory come from.
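
One possible shape for that third option, as a rough sketch only: the 
accounting stays in the allocator while the bytes come from a pluggable 
source. The class name and the IntFunction hook are assumptions made for this 
example, not an existing Arrow API.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

final class PluggableChunkAllocatorSketch {
  private final long limit;
  private final IntFunction<ByteBuffer> chunkSource; // hypothetical hook: where the bytes come from
  private long allocated;

  PluggableChunkAllocatorSketch(long limit, IntFunction<ByteBuffer> chunkSource) {
    this.limit = limit;
    this.chunkSource = chunkSource;
  }

  /** Checks the limit, obtains the chunk from the pluggable source, then records it. */
  ByteBuffer allocate(int size) {
    if (size < 0) {
      throw new IllegalArgumentException("negative size: " + size);
    }
    if (allocated + size > limit) {
      throw new IllegalStateException("limit of " + limit + " bytes exceeded");
    }
    ByteBuffer chunk = chunkSource.apply(size);
    allocated += size;
    return chunk;
  }

  /** Returns the chunk's bytes to the accounting. */
  void release(ByteBuffer chunk) {
    allocated -= chunk.capacity();
  }

  long getAllocated() {
    return allocated;
  }

  public static void main(String[] args) {
    // Same accounting, two different chunk sources.
    PluggableChunkAllocatorSketch direct =
        new PluggableChunkAllocatorSketch(1024, ByteBuffer::allocateDirect);
    PluggableChunkAllocatorSketch heap =
        new PluggableChunkAllocatorSketch(1024, ByteBuffer::allocate);
    direct.allocate(256);
    heap.allocate(128);
    System.out.println(direct.getAllocated() + " " + heap.getAllocated());
  }
}
```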





[jira] [Created] (ARROW-4669) [Java] No Bounds checking on ArrowBuf.slice

2019-02-23 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-4669:
-

 Summary: [Java] No Bounds checking on ArrowBuf.slice
 Key: ARROW-4669
 URL: https://issues.apache.org/jira/browse/ARROW-4669
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jacques Nadeau


While reviewing some code I realized that there is no bounds checking on 
ArrowBuf slicing. An example negative test case that should pass but is 
currently failing can be found here: 

[https://gist.github.com/jacques-n/737c26b7016ed29dc710d4aba617340e]

It may be that this doesn't cause more problems because the index checks do 
exist on memory access but fixing this would make it much easier to understand 
where a code mistake was made.
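
The kind of check being asked for could look like the sketch below. The helper 
name and signature are hypothetical; the point is only that the slice request 
is validated against the capacity up front, so the failure is reported where 
the bad slice is created rather than at a later memory access.

```java
final class CheckedSliceSketch {
  /** Throws if [index, index + length) does not fit inside [0, capacity). */
  static void checkSliceBounds(long capacity, long index, long length) {
    // Written as index > capacity - length so that index + length cannot overflow.
    if (index < 0 || length < 0 || index > capacity - length) {
      throw new IndexOutOfBoundsException(
          String.format("slice(index=%d, length=%d) is outside capacity %d", index, length, capacity));
    }
  }

  public static void main(String[] args) {
    checkSliceBounds(10, 2, 8);   // fits
    try {
      checkSliceBounds(10, 5, 6); // runs one byte past the end
    } catch (IndexOutOfBoundsException e) {
      System.out.println(e.getMessage());
    }
  }
}
```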





[jira] [Created] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2019-02-10 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-4526:
-

 Summary: [Java] Remove Netty references from ArrowBuf and move 
Allocator out of vector package
 Key: ARROW-4526
 URL: https://issues.apache.org/jira/browse/ARROW-4526
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


Arrow currently has a hard dependency on Netty and exposes this in public APIs. 
This shouldn't be the case. There could be many allocator implementations with 
Netty as one possible option. We should remove the hard dependency between 
arrow-vector and Netty, instead creating a trivial allocator. ArrowBuf should 
probably expose a <T> T unwrap(Class<T> clazz) method instead to make inner 
providers available without a hard reference. This should also include 
drastically reducing the number of methods on ArrowBuf, as right now it includes 
every method from ByteBuf, but many of those are not very useful or appropriate.

This work should come after we do the simpler ARROW-3191
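
A sketch of how such an unwrap method might behave, using a hypothetical Buf 
wrapper rather than the real ArrowBuf: the public type never names the 
provider's class, and callers that know the provider can still reach it.

```java
import java.nio.ByteBuffer;

final class UnwrapSketch {
  /** Hypothetical buffer wrapper; backing is whatever object the provider allocated. */
  static final class Buf {
    private final Object backing;

    Buf(Object backing) {
      this.backing = backing;
    }

    /** Returns the backing object if it is of the requested type, otherwise throws. */
    <T> T unwrap(Class<T> clazz) {
      if (clazz.isInstance(backing)) {
        return clazz.cast(backing);
      }
      throw new UnsupportedOperationException("Cannot unwrap to " + clazz.getName());
    }
  }

  public static void main(String[] args) {
    Buf buf = new Buf(ByteBuffer.allocate(8));
    ByteBuffer inner = buf.unwrap(ByteBuffer.class);
    System.out.println(inner.capacity());
  }
}
```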





[jira] [Created] (ARROW-3887) [Java][Gandiva] Expose Dremio build and tests as new optional container/test

2018-11-26 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-3887:
-

 Summary: [Java][Gandiva] Expose Dremio build and tests as new 
optional container/test
 Key: ARROW-3887
 URL: https://issues.apache.org/jira/browse/ARROW-3887
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau
Assignee: Praveen Kumar Desabandu


Dremio uses Arrow Java and Gandiva extensively and could provide additional 
test coverage for the project. We should find a way to expose the downstream 
build of Dremio as an optional build so major changes can better be evaluated 
against downstream effects.

 

[~praveenbingo], assigning to you for now but let's figure out who at Dremio 
can pick this up.





[jira] [Created] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2018-09-07 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-3191:
-

 Summary: [Java] Add support for ArrowBuf to point to arbitrary 
memory.
 Key: ARROW-3191
 URL: https://issues.apache.org/jira/browse/ARROW-3191
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


Right now ArrowBuf can only point to memory managed by an Arrow Allocator. This 
is because in many cases we want to be able to support hierarchical accounting 
of memory and the ability to transfer memory ownership between separate 
allocators within the same hierarchy.

At the same time, there are definitely times where someone might want to map 
some amount of arbitrary off-heap memory. In these situations they should still 
be able to use ArrowBuf.

I propose we have a new ArrowBuf constructor that takes an input that 
subclasses an interface similar to:

public abstract class Memory  {
  protected final int length;
  protected final long address;
  protected abstract void release(); 
}

We then make it so all the memory transfer semantics and accounting behavior 
are noops for this type of memory. The target of this work will be to make sure 
that all the fast paths continue to be efficient but some of the other paths 
like transfer can include a conditional (either directly or through alternative 
implementations of things like ledger).
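
To make the proposal a little more concrete, here is a sketch of one subclass 
of the Memory shape above. The constructor, the ExternalMemory name and the 
Runnable callback are assumptions added for illustration; the address is just 
a number here, where real code would obtain it from whoever owns the memory.

```java
final class ForeignMemorySketch {
  /** Mirrors the shape proposed above, with a constructor added for the sketch. */
  abstract static class Memory {
    protected final int length;
    protected final long address;

    protected Memory(long address, int length) {
      this.address = address;
      this.length = length;
    }

    protected abstract void release();
  }

  /** Memory owned by someone else: release only runs the owner's callback. */
  static final class ExternalMemory extends Memory {
    private final Runnable onRelease;

    ExternalMemory(long address, int length, Runnable onRelease) {
      super(address, length);
      this.onRelease = onRelease;
    }

    @Override
    protected void release() {
      onRelease.run(); // no allocator accounting is touched
    }
  }

  public static void main(String[] args) {
    Memory m = new ExternalMemory(4096L, 64, () -> System.out.println("owner notified"));
    m.release();
  }
}
```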





[jira] [Created] (ARROW-1477) Create Benchmarking Suite for final ValueVector updates

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1477:
-

 Summary: Create Benchmarking Suite for final ValueVector updates
 Key: ARROW-1477
 URL: https://issues.apache.org/jira/browse/ARROW-1477
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1476) [JAVA] Implement final ValueVector updates

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1476:
-

 Summary: [JAVA] Implement final ValueVector updates
 Key: ARROW-1476
 URL: https://issues.apache.org/jira/browse/ARROW-1476
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1475) [JAVA] Create Benchmarking Suite for prototypes

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1475:
-

 Summary: [JAVA] Create Benchmarking Suite for prototypes
 Key: ARROW-1475
 URL: https://issues.apache.org/jira/browse/ARROW-1475
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1474) [JAVA] Create Prototype Code Hierarchy (alt B)

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1474:
-

 Summary: [JAVA] Create Prototype Code Hierarchy (alt B)
 Key: ARROW-1474
 URL: https://issues.apache.org/jira/browse/ARROW-1474
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1472) [JAVA] Design updated ValueVector Object Hierarchy

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1472:
-

 Summary: [JAVA] Design updated ValueVector Object Hierarchy
 Key: ARROW-1472
 URL: https://issues.apache.org/jira/browse/ARROW-1472
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1473) [JAVA] Create Prototype Code Hierarchy (alt A)

2017-09-06 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1473:
-

 Summary: [JAVA] Create Prototype Code Hierarchy (alt A)
 Key: ARROW-1473
 URL: https://issues.apache.org/jira/browse/ARROW-1473
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-1463) [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code

2017-09-05 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1463:
-

 Summary: [JAVA] Restructure ValueVector hierarchy to minimize 
compile-time generated code
 Key: ARROW-1463
 URL: https://issues.apache.org/jira/browse/ARROW-1463
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Jacques Nadeau


The templates used in the Java package are very high maintenance and the if 
conditions are hard to track. As started in the discussion here: 
https://github.com/apache/arrow/pull/1012, I'd like to propose that we modify 
the structure of the internal value vectors and code generation dynamics.

Create new abstract base vectors:

BaseFixedVector
BaseVariableVector
BaseNullableVector

For each of these, implement all the basic functionality of a vector without 
using templating.

Evaluate whether to use code generation to generate specific specializations of 
this functionality for each type where needed for performance purposes 
(probably constrained to mutator and accessor set/get methods). Giant and 
complex if conditions in the templates are actually worse from my perspective 
than a small amount of hand written duplicated code since templates are much 
harder to work with. 
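
A rough sketch of the direction described above, under the assumption that a 
plain ByteBuffer stands in for the real data buffer and that the method names 
are illustrative only: the shared logic is written once by hand in the base 
class, and the per-type class carries only the get/set specializations.

```java
import java.nio.ByteBuffer;

final class BaseVectorSketch {
  /** Shared, hand-written logic for all fixed-width types. */
  abstract static class BaseFixedVector {
    protected final int typeWidth;
    protected ByteBuffer data = ByteBuffer.allocate(0);
    protected int valueCount;

    BaseFixedVector(int typeWidth) {
      this.typeWidth = typeWidth;
    }

    void allocateNew(int capacity) {
      data = ByteBuffer.allocate(capacity * typeWidth);
      valueCount = 0;
    }

    int getValueCapacity() {
      return data.capacity() / typeWidth;
    }

    void setValueCount(int n) {
      valueCount = n;
    }

    int getValueCount() {
      return valueCount;
    }
  }

  /** Type-specific part: only the accessor/mutator specializations. */
  static final class IntVector extends BaseFixedVector {
    IntVector() {
      super(Integer.BYTES);
    }

    void set(int index, int value) {
      data.putInt(index * typeWidth, value);
    }

    int get(int index) {
      return data.getInt(index * typeWidth);
    }
  }

  public static void main(String[] args) {
    IntVector v = new IntVector();
    v.allocateNew(4);
    v.set(2, 42);
    v.setValueCount(3);
    System.out.println(v.get(2));
  }
}
```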





[jira] [Created] (ARROW-1045) [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.*

2017-05-17 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1045:
-

 Summary: [JAVA] Add support for custom metadata in 
org.apache.arrow.vector.types.pojo.*
 Key: ARROW-1045
 URL: https://issues.apache.org/jira/browse/ARROW-1045
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jacques Nadeau


Custom metadata for Arrow Schema and Arrow Fields is lost if a user translates 
to/from the Java implementations pojo helper objects. Conversion to/from the 
Flatbuf schema should be lossless.





[jira] [Created] (ARROW-1005) NullableDecimalVector.set(int, byte[]...) throws UnsupportedOperationException

2017-05-10 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-1005:
-

 Summary: NullableDecimalVector.set(int, byte[]...) throws 
UnsupportedOperationException
 Key: ARROW-1005
 URL: https://issues.apache.org/jira/browse/ARROW-1005
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jacques Nadeau








[jira] [Created] (ARROW-801) [JAVA] Provide direct access to underlying buffer memory addresses in consistent way without generating garbage or large amount indirections

2017-04-10 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-801:


 Summary: [JAVA] Provide direct access to underlying buffer memory 
addresses in consistent way without generating garbage or large amount 
indirections
 Key: ARROW-801
 URL: https://issues.apache.org/jira/browse/ARROW-801
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Jacques Nadeau


When working with Arrow vectors recently, we observed a situation where our 
time was dominated by calls to getFieldBuffers() to be able to retrieve memory 
addresses (22s out of 26s total for a piece of code). We should provide a 
direct mechanism to access this data so we can avoid all the extra indirection 
and object creation. 

A proposal:
getBitAddress();
getDataAddress();
getOffsetAddress();

These interfaces would be made available at the FieldVector interface and 
simply throw UnsupportedOperationException where not supported.

Unsupported Operations: 
data for list type
offset for fixed width types
data and offset for struct type
data for union type
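
A sketch of how this could be laid out. Using default methods that throw is a 
choice made for this example, and the interface and class names are 
hypothetical; the addresses are plain numbers standing in for real buffer 
addresses.

```java
final class AddressAccessSketch {
  /** Hypothetical slice of the FieldVector interface carrying the proposed accessors. */
  interface FieldVectorAddresses {
    long getBitAddress();

    default long getDataAddress() {
      throw new UnsupportedOperationException("no data buffer for this vector type");
    }

    default long getOffsetAddress() {
      throw new UnsupportedOperationException("no offset buffer for this vector type");
    }
  }

  /** Fixed-width vector: has validity bits and data, but no offsets. */
  static final class FixedWidthVector implements FieldVectorAddresses {
    private final long bitAddress;
    private final long dataAddress;

    FixedWidthVector(long bitAddress, long dataAddress) {
      this.bitAddress = bitAddress;
      this.dataAddress = dataAddress;
    }

    @Override public long getBitAddress() { return bitAddress; }
    @Override public long getDataAddress() { return dataAddress; }
  }

  public static void main(String[] args) {
    FieldVectorAddresses v = new FixedWidthVector(100L, 200L);
    System.out.println(v.getBitAddress() + " " + v.getDataAddress());
    try {
      v.getOffsetAddress();
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Each call returns a primitive long, so no intermediate list or wrapper object 
is created on the hot path.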





[jira] [Created] (ARROW-649) Explore a Weld/Arrow converter

2017-03-17 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-649:


 Summary: Explore a Weld/Arrow converter
 Key: ARROW-649
 URL: https://issues.apache.org/jira/browse/ARROW-649
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Jacques Nadeau


[~matei] and the Stanford team have just open sourced Weld. It would be 
interesting to evaluate how we could move Arrow data to Weld's internal 
representation.

Weld is here: https://github.com/weld-project/weld





[jira] [Commented] (ARROW-485) [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe

2017-01-14 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822969#comment-15822969
 ] 

Jacques Nadeau commented on ARROW-485:
--

There should be better documentation on this. In order to use vectors, the 
correct order of operations is:

1. Call allocateNew() to allocate memory for the vector.
2. Set one or more values using getMutator().setSafe(i, val). Note that 
positions must be monotonically increasing, but indices may be skipped.
3. Call setValueCount(n), where n is the number of valid indices in the vector.
4. Read or serialize the data.

I believe that if you follow these operations, you will not have a problem 
here. I'm guessing you're trying to use a vector before allocating (1).
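
The order of operations above can be sketched with a toy class. ToyIntVector 
is invented for this example and is not the Arrow API; it only mimics the 
lifecycle so that the effect of skipping step 1 is visible.

```java
import java.util.Arrays;

final class VectorLifecycleSketch {
  /** Toy stand-in for a vector; not the Arrow API. */
  static final class ToyIntVector {
    private int[] data;      // null until allocateNew() is called
    private int valueCount;

    void allocateNew() {
      data = new int[4];
      valueCount = 0;
    }

    /** Writes a value, growing the backing array if the index is past the end. */
    void setSafe(int index, int value) {
      if (data == null) {
        throw new IllegalStateException("call allocateNew() first");
      }
      if (index >= data.length) {
        data = Arrays.copyOf(data, Math.max(index + 1, data.length * 2));
      }
      data[index] = value;
    }

    void setValueCount(int n) {
      if (data == null) {
        throw new IllegalStateException("call allocateNew() first");
      }
      if (n > data.length) {
        data = Arrays.copyOf(data, n);
      }
      valueCount = n;
    }

    int get(int index) {
      if (index < 0 || index >= valueCount) {
        throw new IndexOutOfBoundsException("index " + index + ", valueCount " + valueCount);
      }
      return data[index];
    }
  }

  public static void main(String[] args) {
    ToyIntVector v = new ToyIntVector();
    v.allocateNew();        // 1. allocate
    v.setSafe(0, 7);        // 2. set values at increasing positions (skips allowed)
    v.setSafe(5, 9);
    v.setValueCount(6);     // 3. declare how many indices are valid
    System.out.println(v.get(0) + " " + v.get(5)); // 4. read
  }
}
```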


> [Java] Users are required to initialize VariableLengthVectors.offsetVector 
> before calling VariableLengthVectors.mutator.getSafe 
> 
>
> Key: ARROW-485
> URL: https://issues.apache.org/jira/browse/ARROW-485
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Li Jin
>Priority: Minor
>
> https://github.com/apache/arrow/blob/master/java/vector/src/main/codegen/templates/VariableLengthVectors.java#L492
> Here VariableLengthVectors.getMutator().setSafe() calls:
> {code}
> offsetVector.getAccessor().get(index)
> {code}
>  however, index 0 of offsetVector (which is always 0) is not initialized by 
> VariableLengthVectors.
> As a result, user of the VariableLengthVectors needs to manually initialize 
> the class by calling:
> {code}
> VariableLengthVectors.getOffsetVector().getMutator().setSafe(0, 0)
> {code}
> I wonder if this is necessary or should VariableLengthVectors initialize this 
> for the user





[jira] [Commented] (ARROW-413) DATE type is not specified clearly

2016-12-14 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749143#comment-15749143
 ] 

Jacques Nadeau commented on ARROW-413:
--

I agree. I think it should be timezone-less. Basically the same semantics as 
java.time.Local[Date|DateTime|Time].
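
A small sketch of what timezone-less semantics look like with 
java.time.LocalDate. The two encodings shown (days since the epoch, and 
milliseconds that are a whole multiple of one day) are assumptions for the 
example, not a statement of what the Arrow format specifies.

```java
import java.time.LocalDate;

final class DateSemanticsSketch {
  static final long MILLIS_PER_DAY = 86_400_000L;

  /** Days since 1970-01-01 to a timezone-less date. */
  static LocalDate fromEpochDays(long days) {
    return LocalDate.ofEpochDay(days);
  }

  /** Milliseconds since the epoch to a timezone-less date; floorDiv keeps pre-1970 values correct. */
  static LocalDate fromEpochMillis(long millis) {
    return LocalDate.ofEpochDay(Math.floorDiv(millis, MILLIS_PER_DAY));
  }

  /** A timezone-less date to day-aligned milliseconds since the epoch. */
  static long toEpochMillis(LocalDate date) {
    return date.toEpochDay() * MILLIS_PER_DAY;
  }

  public static void main(String[] args) {
    LocalDate d = fromEpochMillis(toEpochMillis(LocalDate.of(2016, 12, 14)));
    System.out.println(d);
  }
}
```

No time zone appears anywhere in the conversion, which is the point: the same 
stored value names the same calendar date for every reader.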

> DATE type is not specified clearly
> --
>
> Key: ARROW-413
> URL: https://issues.apache.org/jira/browse/ARROW-413
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Affects Versions: 0.1.0
>Reporter: Uwe L. Korn
>
> Currently the DATE type is not specified anywhere and needs to be documented.





[jira] [Commented] (ARROW-413) DATE type is not specified clearly

2016-12-14 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749135#comment-15749135
 ] 

Jacques Nadeau commented on ARROW-413:
--

We found it useful in java land as many of the prebuilt libraries use this 
construct. It makes doing date math much less work. Example: 
org.joda.time.LocalDate and the joda-derived JDK8+ java.time.LocalDate





[jira] [Commented] (ARROW-413) DATE type is not specified clearly

2016-12-09 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736269#comment-15736269
 ] 

Jacques Nadeau commented on ARROW-413:
--

I'm more inclined to keep the "rounded" long format. This is due to the 
common use of this pattern in libraries (rather than having to convert when 
operating against them). This is different from Parquet in that Parquet can go 
for compactness. My $0.02.

Anybody feel strongly in other ways?





[jira] [Comment Edited] (ARROW-401) [Java] Floating point vectors should do an approximate comparison in integration tests

2016-12-02 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717215#comment-15717215
 ] 

Jacques Nadeau edited comment on ARROW-401 at 12/3/16 2:46 AM:
---

This is what I've used elsewhere before:
{code}
boolean evaluateEquality(Float f1, Float f2) {
  if (f1.isNaN()) {
    return f2.isNaN();
  }

  if (f1.isInfinite()) {
    return f2.isInfinite();
  }

  if ((f1 + f2) / 2 != 0) {
    return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-6;
  } else {
    return !(f1 != 0);
  }
}
{code}

{code}
boolean evaluateEquality(Double f1, Double f2) {
  if (f1.isNaN()) {
    return f2.isNaN();
  }

  if (f1.isInfinite()) {
    return f2.isInfinite();
  }

  if ((f1 + f2) / 2 != 0) {
    return Math.abs(f1 - f2) / ((f1 + f2) / 2) < 1.0E-12;
  } else {
    return !(f1 != 0);
  }
}
{code}
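
One caveat with the comparison quoted above: the difference is divided by the 
signed mean, so when the mean is negative the ratio is negative and any two 
values compare as equal; infinities of opposite sign also compare as equal. 
The sketch below is a possible variant, not the code that went into the 
integration tests. The method name and the tolerance parameter are assumptions.

```java
final class ApproxEqualsSketch {
  /** Relative comparison scaled by the larger magnitude, so the sign of the inputs does not matter. */
  static boolean approxEquals(double a, double b, double relTol) {
    if (Double.isNaN(a) || Double.isNaN(b)) {
      return Double.isNaN(a) && Double.isNaN(b);
    }
    if (Double.isInfinite(a) || Double.isInfinite(b)) {
      return a == b; // only the same signed infinity matches
    }
    if (a == b) {
      return true;   // also covers 0.0 versus -0.0
    }
    double scale = Math.max(Math.abs(a), Math.abs(b));
    return Math.abs(a - b) <= relTol * scale;
  }

  public static void main(String[] args) {
    System.out.println(approxEquals(912.41402, 912.414, 1.0E-4));
    System.out.println(approxEquals(-1.0, -2.0, 1.0E-6));
  }
}
```

The tolerance is left to the caller because the right value depends on how 
many digits the writer on the other side preserves.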


> [Java] Floating point vectors should do an approximate comparison in 
> integration tests
> --
>
> Key: ARROW-401
> URL: https://issues.apache.org/jira/browse/ARROW-401
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Julien Le Dem
>Priority: Blocker
>
> Floating point precision rears its ugly head:
> {code}
> Incompatible files
> Different values in column:
> Field{name=float64_nullable, type=FloatingPoint{2}, children=[], 
> layout=TypeLayout{[{width=1,type=VALIDITY}, {width=64,type=DATA}]}} at index 
> 1: 912.41402 != 912.414
> 10:23:45.863 [main] ERROR org.apache.arrow.tools.Integration - Incompatible 
> files
> java.lang.IllegalArgumentException: Different values in column:
> Field{name=float64_nullable, type=FloatingPoint{2}, children=[], 
> layout=TypeLayout{[{width=1,type=VALIDITY}, {width=64,type=DATA}]}} at index 
> 1: 912.41402 != 912.414
> {code}





[jira] [Commented] (ARROW-384) Align Java and C++ RecordBatch data and metadata layout

2016-11-21 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685289#comment-15685289
 ] 

Jacques Nadeau commented on ARROW-384:
--

+1 on this approach. 

> Align Java and C++ RecordBatch data and metadata layout
> ---
>
> Key: ARROW-384
> URL: https://issues.apache.org/jira/browse/ARROW-384
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Julien Le Dem
>
> layout on C++ side:
> {noformat}
>   
> {noformat}
> and on the java side:
> {noformat}
>  
> {noformat}
> In the file format the footer has a Block info that contains the metadata 
> length.
> https://github.com/apache/arrow/blob/f082b17323354dc2af31f39c15c58b995ba08360/format/File.fbs#L36
> See:
> https://github.com/apache/arrow/pull/211#issuecomment-262080545





[jira] [Created] (ARROW-295) Create DOAP File

2016-09-18 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created ARROW-295:


 Summary: Create DOAP File
 Key: ARROW-295
 URL: https://issues.apache.org/jira/browse/ARROW-295
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Jacques Nadeau








[jira] [Commented] (ARROW-273) Lists use unsigned offset vectors instead of signed (as defined in the spec)

2016-08-26 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438658#comment-15438658
 ] 

Jacques Nadeau commented on ARROW-273:
--

I vote yes.

> Lists use unsigned offset vectors instead of signed (as defined in the spec)
> 
>
> Key: ARROW-273
> URL: https://issues.apache.org/jira/browse/ARROW-273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Julien Le Dem
>
> The List vector defines its offset vector as UInt4Vector (unsigned int32).
> According to the Arrow spec it should be a signed int32.





[jira] [Commented] (ARROW-270) [Format] Define more generic Interval logical type

2016-08-24 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435655#comment-15435655
 ] 

Jacques Nadeau commented on ARROW-270:
--

This matches DAY_TIME I believe. The difference is that we are currently fixed 
to four bytes, right?

> [Format] Define more generic Interval logical type
> --
>
> Key: ARROW-270
> URL: https://issues.apache.org/jira/browse/ARROW-270
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Wes McKinney
>
> Per discussion in 
> https://github.com/apache/arrow/commit/e7e399db5fc6913e67426514279f81766a0778d2#commitcomment-18711366,
>  we can create an {{Interval}} type with a unit to be more general.





[jira] [Commented] (ARROW-270) [Format] Define more generic Interval logical type

2016-08-24 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435147#comment-15435147
 ] 

Jacques Nadeau commented on ARROW-270:
--

IntervalUnit seems fine to me.

As far as timestamp/decimal, I'm not inclined to change. I think most of the 
processing engines and storage formats that we work with use epoch in either 
millis, micros or nanos.





[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428316#comment-15428316
 ] 

Jacques Nadeau commented on ARROW-81:
-

Can you guys provide two small example datasets in JSON format here?

> C++: Add a Category nested type
> ---
>
> Key: ARROW-81
> URL: https://issues.apache.org/jira/browse/ARROW-81
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> A Category (or "factor") is a dictionary-encoded array whose dictionary has 
> semantic meaning. The data consists of
> - An array of integer "codes"
> - A child array of some other type, known as the "categories" or "levels" of 
> the array. Typically there is an "ordered" boolean flag indicating whether 
> the order of the categories is meaningful.
> Category/factor types are used in a number of common statistical analyses. 
> See, for example, 
> http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It is a 
> basic requirement for Python and R, at least, as Arrow C++ consumers, to have 
> this type. Separately, we should consider what is necessary to be able to 
> transmit category data in IPCs -- possibly an expansion of the Arrow format. 





[jira] [Commented] (ARROW-260) TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky

2016-08-13 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420079#comment-15420079
 ] 

Jacques Nadeau commented on ARROW-260:
--

Note, it is probably still good to move these three tests into a separate class 
and put a disclaimer at the top about the parameter.

> TestValueVector.testFixedVectorReallocation and 
> testVariableVectorReallocation are flaky
> 
>
> Key: ARROW-260
> URL: https://issues.apache.org/jira/browse/ARROW-260
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java - Vectors
>Reporter: Julien Le Dem
>Assignee: Jihoon Son
>
> The Travis CI build has failed several times on these tests.
> It looks like they often throw OOME.
> Stack trace below:
> {noformat}
> testFixedVectorReallocation(org.apache.arrow.vector.TestValueVector)  Time 
> elapsed: 0.174 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected but 
> was
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
>   at 
> io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62)
>   at 
> org.apache.arrow.memory.AllocationManager.<init>(AllocationManager.java:79)
>   at 
> org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190)
>   at 
> org.apache.arrow.vector.UInt4Vector.allocateBytes(UInt4Vector.java:189)
>   at org.apache.arrow.vector.UInt4Vector.allocateNew(UInt4Vector.java:171)
>   at 
> org.apache.arrow.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:106)
> testVariableVectorReallocation(org.apache.arrow.vector.TestValueVector)  Time 
> elapsed: 0.148 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected but 
> was
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
>   at 
> io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62)
>   at 
> org.apache.arrow.memory.AllocationManager.<init>(AllocationManager.java:79)
>   at 
> org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190)
>   at 
> org.apache.arrow.vector.VarCharVector.allocateNew(VarCharVector.java:364)
>   at 
> org.apache.arrow.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:163)
> Results :
> Tests in error: 
>   TestValueVector.testFixedVectorReallocation »  Unexpected exception, 
> expected<...
>   TestValueVector.testVariableVectorReallocation »  Unexpected exception, 
> expect...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-260) TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky

2016-08-13 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420078#comment-15420078
 ] 

Jacques Nadeau commented on ARROW-260:
--

I'm fine with setting the surefire option in the default execution for now.

> TestValueVector.testFixedVectorReallocation and 
> testVariableVectorReallocation are flaky
> 
>
> Key: ARROW-260
> URL: https://issues.apache.org/jira/browse/ARROW-260
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java - Vectors
>Reporter: Julien Le Dem
>Assignee: Jihoon Son
>
> The Travis CI build has failed several times on these tests.
> They often appear to throw an OutOfMemoryError (OOME).
> Stack trace below:
> {noformat}
> testFixedVectorReallocation(org.apache.arrow.vector.TestValueVector)  Time 
> elapsed: 0.174 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected but 
> was
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
>   at 
> io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62)
>   at 
> org.apache.arrow.memory.AllocationManager.<init>(AllocationManager.java:79)
>   at 
> org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190)
>   at 
> org.apache.arrow.vector.UInt4Vector.allocateBytes(UInt4Vector.java:189)
>   at org.apache.arrow.vector.UInt4Vector.allocateNew(UInt4Vector.java:171)
>   at 
> org.apache.arrow.vector.TestValueVector.testFixedVectorReallocation(TestValueVector.java:106)
> testVariableVectorReallocation(org.apache.arrow.vector.TestValueVector)  Time 
> elapsed: 0.148 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected but 
> was
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
>   at 
> io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
>   at 
> io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:155)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195)
>   at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62)
>   at 
> org.apache.arrow.memory.AllocationManager.<init>(AllocationManager.java:79)
>   at 
> org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:238)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:220)
>   at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:190)
>   at 
> org.apache.arrow.vector.VarCharVector.allocateNew(VarCharVector.java:364)
>   at 
> org.apache.arrow.vector.TestValueVector.testVariableVectorReallocation(TestValueVector.java:163)
> Results :
> Tests in error: 
>   TestValueVector.testFixedVectorReallocation »  Unexpected exception, 
> expected<...
>   TestValueVector.testVariableVectorReallocation »  Unexpected exception, 
> expect...
> {noformat}





[jira] [Updated] (ARROW-64) Add zsh support to C++ build scripts

2016-03-19 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated ARROW-64:

Assignee: Uwe L. Korn

> Add zsh support to C++ build scripts
> 
>
> Key: ARROW-64
> URL: https://issues.apache.org/jira/browse/ARROW-64
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>
> All scripts that have to be sourced during development currently only support 
> bash. This patch adds zsh support.





[jira] [Commented] (ARROW-62) Format: Are the nulls bits 0 or 1 for null values?

2016-03-13 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192525#comment-15192525
 ] 

Jacques Nadeau commented on ARROW-62:
-

I consider the bitmap to be a validity map as opposed to a null map. I've also 
seen a couple of places where it is nice to zero out values that are null using 
the zero in the bitmap without a condition... although I can't remember where 
we took advantage of this previously.
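
A minimal sketch of what "zero out values that are null ... without a condition" could look like. The class and method names are hypothetical and are not taken from the Arrow or Drill code. The sketch also assumes one validity bit per slot, packed least-significant-bit first within each byte, which is a choice made here for illustration only.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class ValidityMask {
    // Returns 1 if slot i is valid (non-null), 0 if it is null.
    // Assumes bits are packed least-significant-bit first within each byte.
    static int validityBit(byte[] validity, int i) {
        return (validity[i >> 3] >> (i & 7)) & 1;
    }

    // Zeroes every slot whose validity bit is 0, using a mask instead of a branch.
    static void zeroNulls(int[] values, byte[] validity) {
        for (int i = 0; i < values.length; i++) {
            // -1 (all bits set) keeps a valid value; -0 == 0 clears a null one.
            values[i] &= -validityBit(validity, i);
        }
    }

    public static void main(String[] args) {
        int[] values = {7, 8, 9, 10};
        byte[] validity = {0b0101}; // slots 0 and 2 are valid, 1 and 3 are null
        zeroNulls(values, validity);
        System.out.println(Arrays.toString(values));
    }
}
```

With a validity map the bit can be used directly as the mask; with a null map (1 meaning null) it would have to be inverted first.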

> Format: Are the nulls bits 0 or 1 for null values?
> --
>
> Key: ARROW-62
> URL: https://issues.apache.org/jira/browse/ARROW-62
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> As brought up by Dan Robinson on the mailing list (thank you for catching 
> this!), there is an inconsistency in the format documents in the 
> representation of nulls with the ValueVectors code import -- since I drafted 
> these format documents initially I'll take the blame for the inconsistency, 
> but:
> * Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
> * The format document currently states the opposite (values are null if the 
> bit is set)
> I can see arguments both ways, but one argument for the ValueVectors style is 
> that values must be explicitly set to be non-null, versus uninitialized 
> values being accidentally interpreted as being non-null. When initializing a 
> bitmap, one can {{memset}} the bits to 0, then set them to 1 when non-null 
> values are appended during construction.
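
The construction pattern described above (zero the bitmap, then set a bit to 1 only when a non-null value is appended) might be sketched as follows. This is an illustration under stated assumptions, not the ValueVectors implementation: the builder class, its methods, and the least-significant-bit-first packing are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical builder, for illustration only.
class ValidityBitmapBuilder {
    private final int[] values;
    private final byte[] bits;
    private int count = 0;

    ValidityBitmapBuilder(int capacity) {
        values = new int[capacity];
        bits = new byte[(capacity + 7) / 8];
        // Java arrays are already zeroed; the explicit fill mirrors memset(bits, 0, n).
        Arrays.fill(bits, (byte) 0);
    }

    // Appending a real value is the only operation that marks a slot non-null.
    void append(int value) {
        values[count] = value;
        bits[count >> 3] |= (byte) (1 << (count & 7));
        count++;
    }

    // Appending a null leaves the bit at 0 and only advances the position.
    void appendNull() {
        count++;
    }

    boolean isNull(int i) {
        return ((bits[i >> 3] >> (i & 7)) & 1) == 0;
    }
}
```

Slots that are never written stay null in this scheme, which is the safety argument made in the description: a value has to be set explicitly before it can be read as non-null.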


