Re: [VOTE] Release Apache Arrow 0.6.0 - RC1
+1 (binding)

On Sun, Aug 13, 2017 at 11:05 AM, Arun K. Subramaniyan <sarunkar...@gmail.com> wrote:
> +1 for the release.
>
> On Sat, Aug 12, 2017 at 4:33 PM, Wes McKinney wrote:
> > We've been updating it right after the release (and on the website) up
> > until now. It's a manual update; it would be nice to automate this in
> > a future release.
> >
> > On Sat, Aug 12, 2017 at 12:26 PM, Jacques Nadeau wrote:
> > > It looks like the CHANGELOG.md hasn't been updated. On purpose?
> > >
> > > On Fri, Aug 11, 2017 at 2:51 PM, Wes McKinney wrote:
> > >> +1 (binding)
> > >>
> > >> - Ran Java, C++, Python unit tests on Linux, including Plasma tests
> > >>   (-DARROW_PLASMA=on, and --with-plasma in Python)
> > >> - Ran integration tests (Java vs C++)
> > >> - Ran C++ and Python tests (including Parquet) on Windows Visual
> > >>   Studio 2015 (using the new script in ARROW-1348)
> > >> - Ran C GLib-Ruby unit tests
> > >>
> > >> On Fri, Aug 11, 2017 at 5:43 PM, Wes McKinney wrote:
> > >> > Hello all,
> > >> >
> > >> > I'd like to propose the 2nd release candidate (rc1) of Apache
> > >> > Arrow version 0.6.0. This is a major release consisting of 90
> > >> > resolved JIRAs [1].
> > >> >
> > >> > The source release rc1 is hosted at [2].
> > >> >
> > >> > This release candidate is based on commit
> > >> > b17333482ea1da3728538bc912b1053ba70ed2e7 [3]
> > >> >
> > >> > Please note this also includes the Plasma Object Store, the IP
> > >> > for which was recently cleared by vote on the Arrow dev mailing
> > >> > list and related vote on the Apache Incubator general mailing
> > >> > list. See [4] for more details. The LICENSE.txt has been updated
> > >> > to include additional vendored dependency code that was imported
> > >> > as part of this process.
> > >> >
> > >> > As compared with RC0, this contains 2 Java library dependency fixes
> > >> > and includes the Plasma source tree, which had previously been
> > >> > excluded.
> > >> >
> > >> > The vote will be open for at least 72 hours.
> > >> >
> > >> > [ ] +1 Release this as Apache Arrow 0.6.0
> > >> > [ ] +0
> > >> > [ ] -1 Do not release this as Apache Arrow 0.6.0 because...
> > >> >
> > >> > Thanks,
> > >> > Wes
> > >> >
> > >> > How to validate a release signature:
> > >> > https://httpd.apache.org/dev/verification.html
> > >> >
> > >> > [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.6.0
> > >> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.6.0-rc1/
> > >> > [3]: https://github.com/apache/arrow/tree/b17333482ea1da3728538bc912b1053ba70ed2e7
> > >> > [4]: http://incubator.apache.org/ip-clearance/arrow-plasma-object-store.html
[jira] [Created] (ARROW-1347) List null type should use consistent name for inner field
Steven Phillips created ARROW-1347: -- Summary: List null type should use consistent name for inner field Key: ARROW-1347 URL: https://issues.apache.org/jira/browse/ARROW-1347 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips The child field for List type has the field name "$data$" in most cases. In the case that there is not a known type for the List, currently the getField() method will return a subfield with name "DEFAULT". We should make this consistent with the rest of the cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1192) [JAVA] Improve splitAndTransfer performance for List and Union vectors
Steven Phillips created ARROW-1192: -- Summary: [JAVA] Improve splitAndTransfer performance for List and Union vectors Key: ARROW-1192 URL: https://issues.apache.org/jira/browse/ARROW-1192 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Most vector implementations slice the underlying buffer for splitAndTransfer, but ListVector and UnionVector copy data into a new buffer. We should enhance these to use slice as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
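The difference between copying and slicing that ARROW-1192 describes can be sketched with plain `java.nio.ByteBuffer` standing in for Arrow's buffer type. This is an illustration under that assumption only: `SliceVsCopy`, `copyRange`, and `sliceRange` are hypothetical names, and the real `splitAndTransfer` code in ListVector and UnionVector is not shown here.

```java
import java.nio.ByteBuffer;

// Illustration only: java.nio.ByteBuffer stands in for Arrow's buffer class.
class SliceVsCopy {
    // Copying approach: allocate a new buffer and move `length` bytes into it.
    static ByteBuffer copyRange(ByteBuffer src, int start, int length) {
        ByteBuffer view = src.duplicate();
        view.limit(start + length);
        view.position(start);
        ByteBuffer dst = ByteBuffer.allocate(length);
        dst.put(view);   // copies the selected bytes
        dst.flip();      // make the copy readable from index 0
        return dst;
    }

    // Slicing approach: return a view that shares memory with `src`; no bytes move.
    static ByteBuffer sliceRange(ByteBuffer src, int start, int length) {
        ByteBuffer view = src.duplicate();
        view.limit(start + length);
        view.position(start);
        return view.slice();
    }
}
```

The slice costs the same regardless of `length`, while the copy grows with the number of bytes transferred, which is presumably the performance gap the issue is about.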
[jira] [Created] (ARROW-1191) [JAVA] Implement getField() method for the complex readers
Steven Phillips created ARROW-1191: -- Summary: [JAVA] Implement getField() method for the complex readers Key: ARROW-1191 URL: https://issues.apache.org/jira/browse/ARROW-1191 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The getField() method is not implemented for UnionReader, NullableMapReaderImpl, SingleMapReaderImpl, and UnionListReader. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1112) [JAVA] Set lastSet for VarLength and List vectors when loading
Steven Phillips created ARROW-1112: -- Summary: [JAVA] Set lastSet for VarLength and List vectors when loading Key: ARROW-1112 URL: https://issues.apache.org/jira/browse/ARROW-1112 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1111) [JAVA] Make aligning buffers optional, and allow -1 for unknown null count
Steven Phillips created ARROW-1111: -- Summary: [JAVA] Make aligning buffers optional, and allow -1 for unknown null count Key: ARROW-1111 URL: https://issues.apache.org/jira/browse/ARROW-1111 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1110) [JAVA] make union vector naming consistent
Steven Phillips created ARROW-1110: -- Summary: [JAVA] make union vector naming consistent Key: ARROW-1110 URL: https://issues.apache.org/jira/browse/ARROW-1110 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1109) [JAVA] transferOwnership fails when readerIndex is not 0
Steven Phillips created ARROW-1109: -- Summary: [JAVA] transferOwnership fails when readerIndex is not 0 Key: ARROW-1109 URL: https://issues.apache.org/jira/browse/ARROW-1109 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1108) Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
Steven Phillips created ARROW-1108: -- Summary: Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory() Key: ARROW-1108 URL: https://issues.apache.org/jira/browse/ARROW-1108 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-1107) [JAVA] NullableMapVector getField() should return nullable type
Steven Phillips created ARROW-1107: -- Summary: [JAVA] NullableMapVector getField() should return nullable type Key: ARROW-1107 URL: https://issues.apache.org/jira/browse/ARROW-1107 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: [VOTE] Release Apache Arrow 0.4.1 - rc0
+1

On Wed, Jun 7, 2017 at 2:09 PM, Julien Le Dem wrote:
> +1
> I validated the signature and build + ran tests for java and c++
>
> On Tue, Jun 6, 2017 at 7:27 PM, Wes McKinney wrote:
> > Hello all,
> >
> > I'd like to propose the 1st release candidate (rc0) of Apache
> > Arrow version 0.4.1. This is a bug fix release consisting of 30
> > resolved JIRAs [1].
> >
> > The source release rc0 is hosted at [2].
> >
> > This release candidate is based on commit
> > 46315431aeda3b6968b3ac4c1087f6d41052b99d
> >
> > The vote will be open for ~72 hours, ending 22:30 Eastern US Time on Friday
> > June 9, 2017.
> >
> > [ ] +1 Release this as Apache Arrow 0.4.1
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.4.1 because...
> >
> > Thanks,
> > Wes
> >
> > How to validate a release signature:
> > https://httpd.apache.org/dev/verification.html
> >
> > [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.1
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.4.1-rc0/
> > [3]: https://github.com/apache/arrow/tree/46315431aeda3b6968b3ac4c1087f6d41052b99d
>
> --
> Julien
[jira] [Created] (ARROW-895) Nullable variable length vector lastSet not set correctly
Steven Phillips created ARROW-895: - Summary: Nullable variable length vector lastSet not set correctly Key: ARROW-895 URL: https://issues.apache.org/jira/browse/ARROW-895 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips Fix For: 0.3.0 The ARROW-875 fix was incomplete. I discovered some issues with the change after it was merged. The lastSet variable needs to be adjusted in the fillEmpties() method, and lastSet needs to be set when using the copyFrom methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-888) BitVector transfer() does not transfer ownership
Steven Phillips created ARROW-888: - Summary: BitVector transfer() does not transfer ownership Key: ARROW-888 URL: https://issues.apache.org/jira/browse/ARROW-888 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips When buffers are transferred from one vector to another, the ownership of the buffers needs to be transferred to the target vector's allocator. This is done in all of the other vectors, but BitVector, which is not generated using the freemarker templates, does not have this code implemented. This causes memory accounting to be incorrect. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
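To make the accounting problem in ARROW-888 concrete, here is a minimal sketch of what moving ownership between allocators means. The `Allocator` and `Buf` classes below are hypothetical stand-ins written for this example, not Arrow's `BufferAllocator` or `ArrowBuf`, and the field names are assumptions.

```java
// Hypothetical types for illustration; not Arrow's allocator or buffer classes.
class OwnershipSketch {
    static final class Allocator {
        long allocatedBytes; // bytes currently charged to this allocator
    }

    static final class Buf {
        final int size;
        Allocator owner;

        Buf(int size, Allocator owner) {
            this.size = size;
            this.owner = owner;
            owner.allocatedBytes += size; // charge the creating allocator
        }

        // Move the charge for this buffer from the current owner to `target`.
        void transferOwnership(Allocator target) {
            owner.allocatedBytes -= size;
            target.allocatedBytes += size;
            owner = target;
        }
    }
}
```

If a transfer hands the buffer to the target vector but skips the `transferOwnership` step, the source allocator stays charged for memory it no longer holds, which is the incorrect accounting the issue reports.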
[jira] [Created] (ARROW-875) Nullable variable length vector fillEmpties() fills an extra value
Steven Phillips created ARROW-875: - Summary: Nullable variable length vector fillEmpties() fills an extra value Key: ARROW-875 URL: https://issues.apache.org/jira/browse/ARROW-875 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips The fillEmpties() method is needed to update the offset vector in between non-null values. But it fills the current value which we are trying to set as well, when this is unnecessary. In fact, in the case of setValueCount(), this can result in an unnecessary reAlloc() call. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
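The behaviour ARROW-875 asks for can be sketched as follows. The sketch assumes the usual variable-length layout in which `offsets[i]` and `offsets[i + 1]` delimit value `i`, and that `lastSet` is the index of the last value written; the class and method signature are hypothetical and do not reproduce the generated Arrow vector code.

```java
// Sketch under assumptions: offsets[i]..offsets[i + 1] delimit value i,
// and lastSet is the index of the most recently written value.
class FillEmptiesSketch {
    // Propagates the end offset across skipped (null) slots lastSet+1 .. index-1.
    // Slot `index` itself is left alone because the caller is about to write it.
    static void fillEmpties(int[] offsets, int lastSet, int index) {
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i]; // empty value: start == end
        }
    }
}
```

Using `i <= index` as the loop bound instead would also write `offsets[index + 1]`, the extra value the issue describes, and in the `setValueCount()` case that write can land past the current capacity and force a reallocation.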
[jira] [Created] (ARROW-792) Allow loading/unloading vectors without using FieldNodes
Steven Phillips created ARROW-792: - Summary: Allow loading/unloading vectors without using FieldNodes Key: ARROW-792 URL: https://issues.apache.org/jira/browse/ARROW-792 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The information stored in FieldNode structure is not strictly necessary for serializing/deserializing vectors. We should allow loading/unloading of vectors without it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-791) Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
Steven Phillips created ARROW-791: - Summary: Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory() Key: ARROW-791 URL: https://issues.apache.org/jira/browse/ARROW-791 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Most of the methods related to memory accounting in ArrowBuf have special handling for the case when the buffer is the empty buffer instance. This check is missing in these two methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-790) Fix getField() for NullableMapVector
Steven Phillips created ARROW-790: - Summary: Fix getField() for NullableMapVector Key: ARROW-790 URL: https://issues.apache.org/jira/browse/ARROW-790 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Needs to call super.getField() and return a nullable version of that field. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-789) Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
Steven Phillips created ARROW-789: - Summary: Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire Key: ARROW-789 URL: https://issues.apache.org/jira/browse/ARROW-789 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips We should be able to call setValueCount() on vectors that have been loaded from external buffers -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-617) Time type is not specified clearly
[ https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926482#comment-15926482 ] Steven Phillips commented on ARROW-617: --- I personally strongly favor simplicity, so if it were up to me, I would choose to go with a single physical type, even though that means we would have to resolve compatibility issues with existing code. I would actually go a step further and ask if we really need all 4 (nano/micro/milli/seconds). Couldn't we just store nanoseconds in all cases? Are we concerned about the cost of conversion? > Time type is not specified clearly > -- > > Key: ARROW-617 > URL: https://issues.apache.org/jira/browse/ARROW-617 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Reporter: Julien Le Dem > > 2 options: > - Use 64 bits for microseconds and nanoseconds, 32 bits for other units > - Use 64 bits for everything > The latter is simpler to implement, the former saves space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
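The space trade-off in the quoted options comes down to simple arithmetic. The sketch below assumes the Time type holds a time-of-day value (elapsed time since midnight), which the thread does not state explicitly; `TimeWidthSketch` and `fitsInInt32` are names made up for this example.

```java
import java.util.concurrent.TimeUnit;

// Assumption: a Time value is a time of day, so its maximum is one day in the chosen unit.
class TimeWidthSketch {
    // True if a full day expressed in `unit` fits in a signed 32-bit integer.
    static boolean fitsInInt32(TimeUnit unit) {
        long perDay = unit.convert(1, TimeUnit.DAYS);
        return perDay <= Integer.MAX_VALUE;
    }
}
```

Under that assumption a day is 86,400 seconds or 86,400,000 milliseconds, both below 2,147,483,647, while 86,400,000,000 microseconds and 86,400,000,000,000 nanoseconds need 64 bits. Storing nanoseconds in all cases would therefore always use the 64-bit width.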
[jira] [Created] (ARROW-347) Add method to pass CallBack when creating a transfer pair
Steven Phillips created ARROW-347: - Summary: Add method to pass CallBack when creating a transfer pair Key: ARROW-347 URL: https://issues.apache.org/jira/browse/ARROW-347 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Steven Phillips Assignee: Steven Phillips When calling the getTransferPair method of a NullableMapVector, we pass the current vector's callback to the newly created vector. This is wrong, as the new vector needs to have its own callback. Whoever is using the target vector should have a handle on the callBack to deal with schema changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-326) ComplexWriter should initialize nested writers when container vector is already populated
Steven Phillips created ARROW-326: - Summary: ComplexWriter should initialize nested writers when container vector is already populated Key: ARROW-326 URL: https://issues.apache.org/jira/browse/ARROW-326 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips It's possible and sometimes useful to reuse a nested vector that was populated in a previous ComplexWriter. The new ComplexWriter should be aware of the fields that are present in the vector. As it is right now, if a particular column were determined to be a specific type (or a union type), but the new writer finds a new type, the original type may be thrown out. What should happen is that the type should be promoted to union (or have a new subtype added to the union field). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-257) Add a typeids Vector to Union type
[ https://issues.apache.org/jira/browse/ARROW-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514043#comment-15514043 ]

Steven Phillips commented on ARROW-257:
---
I don't understand the purpose or benefit of this change. Could you give a concrete example of where this would be useful?

> Add a typeids Vector to Union type
> --
>
> Key: ARROW-257
> URL: https://issues.apache.org/jira/browse/ARROW-257
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
>
> {noformat}
> enum UnionMode:int { Sparse, Dense }
> table Union {
>   mode: UnionMode;
>   typeIds: [Int32]; // optional, describes typeid of each child.
> }
> {noformat}
> The idea is to enable providing an id different from the child offset (the
> default).
> This enables an optimization where we use predefined ids when constructing
> the type vector of the union but want the children to be only the actually
> used types.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
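One possible reading of the quoted proposal, sketched in Java: when `typeIds` is absent, the id stored in the union's type vector is simply the child's position, and when it is present, each child carries its own id and a lookup is needed. The class and method below are hypothetical and only illustrate that mapping; they are not part of the Arrow format or library.

```java
// Hypothetical helper illustrating the typeIds mapping from the quoted description.
class UnionTypeIdsSketch {
    // typeIds[i] is the id assigned to child i; null means "id == child offset".
    // Returns the child index for `typeId`, or -1 if no child uses that id.
    static int childIndex(int[] typeIds, int typeId) {
        if (typeIds == null) {
            return typeId; // default: the id is the child offset
        }
        for (int i = 0; i < typeIds.length; i++) {
            if (typeIds[i] == typeId) {
                return i;
            }
        }
        return -1;
    }
}
```

For instance, a union that only ever saw two types could keep two children with predefined ids such as 5 and 9, rather than allocating a child for every id up to 9.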
[jira] [Created] (ARROW-277) Flatbuf serialization fails for Timestamp type
Steven Phillips created ARROW-277: - Summary: Flatbuf serialization fails for Timestamp type Key: ARROW-277 URL: https://issues.apache.org/jira/browse/ARROW-277 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips

Caused By (java.lang.AssertionError) FlatBuffers: object serialization must not be nested.
    com.google.flatbuffers.FlatBufferBuilder.notNested():293
    com.google.flatbuffers.FlatBufferBuilder.startVector():239
    com.google.flatbuffers.FlatBufferBuilder.createString():266
    org.apache.arrow.vector.types.pojo.ArrowType$Timestamp.getType():463
    org.apache.arrow.vector.types.pojo.Field.getField():63
    org.apache.arrow.vector.types.pojo.Schema.getSchema():41

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-269) UnionVector getBuffers method does not include typevector
Steven Phillips created ARROW-269: - Summary: UnionVector getBuffers method does not include typevector Key: ARROW-269 URL: https://issues.apache.org/jira/browse/ARROW-269 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Only the internalMapVector's buffers are returned currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-265) Negative decimal values have wrong padding
Steven Phillips created ARROW-265: - Summary: Negative decimal values have wrong padding Key: ARROW-265 URL: https://issues.apache.org/jira/browse/ARROW-265 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Pad negative values with 1 and not 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
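"Pad with 1" here means sign extension: a negative two's-complement value widened to a fixed size must be filled with 1 bits (0xFF bytes), not zeros. The sketch below shows this with `java.math.BigInteger`, assuming a big-endian byte layout; the class name, method name, and layout are assumptions for illustration and are not taken from the Arrow decimal code.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Sketch assuming big-endian two's-complement storage of the unscaled value.
class DecimalPaddingSketch {
    // Widens `unscaled` to exactly `width` bytes, extending the sign bit.
    static byte[] pad(BigInteger unscaled, int width) {
        byte[] raw = unscaled.toByteArray(); // minimal big-endian two's complement
        if (raw.length > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " bytes");
        }
        byte[] out = new byte[width];
        // Negative values are padded with 0xFF (all 1 bits), non-negative with 0x00.
        byte fill = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, width - raw.length, fill);
        System.arraycopy(raw, 0, out, width - raw.length, raw.length);
        return out;
    }
}
```

Padding -2 with zeros instead would yield the bytes 00 00 00 FE, which decode as 254 rather than -2.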
[jira] [Created] (ARROW-259) Use flatbuffer fields in java implementation
Steven Phillips created ARROW-259: - Summary: Use flatbuffer fields in java implementation Key: ARROW-259 URL: https://issues.apache.org/jira/browse/ARROW-259 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips The value vectors in the java implementation should switch to using the Field and types as defined in the flatbuffer spec. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-241) Implement splitAndTransfer for UnionVector
Steven Phillips created ARROW-241: - Summary: Implement splitAndTransfer for UnionVector Key: ARROW-241 URL: https://issues.apache.org/jira/browse/ARROW-241 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips This method was never implemented and is currently a no-op. We should at least do the naive "copy" version of the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [JAVA] Figuring out whats shifted from Drill/Java
I am currently working on a patch that addresses this, as well as removing some of the residual code from Drill that isn't really needed in Arrow (such as the Drill types, MaterializedField, etc.). I will be posting this within a few days.

On Tue, Jun 7, 2016 at 5:54 PM, Leif Walsh wrote:
> I am also interested in this.
>
> On Tue, Jun 7, 2016 at 17:37 Holden Karau wrote:
> > Hi Everyone,
> >
> > I'm looking to help get started with Arrow & Spark and to that end I'd like
> > to start with getting the Java implementation closer to the spec / C
> > implementation. I'm wondering what places people know the differences are
> > between the two?
> >
> > Cheers,
> >
> > Holden :)
>
> --
> Cheers,
> Leif
[jira] [Created] (ARROW-51) Move ValueVector test from Drill project
Steven Phillips created ARROW-51: Summary: Move ValueVector test from Drill project Key: ARROW-51 URL: https://issues.apache.org/jira/browse/ARROW-51 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips There are some simple tests that should be moved from the Drill project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-46) Port DRILL-4410 to Arrow
Steven Phillips created ARROW-46: Summary: Port DRILL-4410 to Arrow Key: ARROW-46 URL: https://issues.apache.org/jira/browse/ARROW-46 Project: Apache Arrow Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips This fixes a bug in ListVector which causes OversizeAllocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)