[jira] [Created] (ARROW-2789) [JS] Minor DataFrame improvements
Brian Hulette created ARROW-2789: Summary: [JS] Minor DataFrame improvements Key: ARROW-2789 URL: https://issues.apache.org/jira/browse/ARROW-2789 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Brian Hulette Assignee: Brian Hulette * deprecate count() in favor of a readonly length member (implemented with a getter in FilterdDataFrame) * Add an iterator to FilteredDataFrame -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2788) [Plasma] Defining Delete semantics
Philipp Moritz created ARROW-2788: - Summary: [Plasma] Defining Delete semantics Key: ARROW-2788 URL: https://issues.apache.org/jira/browse/ARROW-2788 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz We should define what the semantics of Plasma's Delete operation is, especially in the presence of errors (object in use is deleted, non-existing object is deleted). My current take on this is the following: Delete should be a hint to the store to delete, so if the object is not present, it should be a no-op. If an object that is in use is deleted, the store should delete it as soon as the reference count goes to zero (it would also be ok, but less desirable in my opinion, to not delete it). I think this is a good application of the "Defining errors away" from John Ousterhouts book (A Philosophy of Software Design). Please comment in this thread if you have different opinions so we can discuss! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2787) Memory Issue passing table from python to c++ via cython
Joseph Toth created ARROW-2787: -- Summary: Memory Issue passing table from python to c++ via cython Key: ARROW-2787 URL: https://issues.apache.org/jira/browse/ARROW-2787 Project: Apache Arrow Issue Type: Bug Components: Integration Affects Versions: 0.9.0 Environment: clang6 Reporter: Joseph Toth I wanted to create a simple example of reading a table in Python and pass it to C++, but I'm doing something wrong or there is a memory issue. When the table gets to C++ and I print out column names it also prints out a lot of junk and what looks like pydocs. Let me know if you need any more info. Thanks! *demo.py* ??import numpy?? ??from psy.automl import cyth?? ??import pandas as pd?? ??from absl import app?? ??def main(argv):?? ?? sup = pd.DataFrame({?? ?? 'int': [1, 2],?? ?? 'str': ['a', 'b']?? ?? })?? ?? table = pa.Table.from_pandas(sup)?? ?? cyth.c_t(table)?? *??cyth.pyx??* ??import pandas as pd?? ??import pyarrow as pa?? ??from pyarrow.lib cimport *?? ??cdef extern from "cyth.h" namespace "psy":?? ??void t(shared_ptr[CTable])?? ??def c_t(obj):?? # ??These print work?? ??# for i in range(obj.num_columns):?? ??# print(obj.column(i).name?? ?? cdef shared_ptr[CTable] tbl = pyarrow_unwrap_table(obj)?? ?? t(tbl)?? *cyth.h* ??#include ?? ??#include ?? ??#include "arrow/api.h"?? ??#include "arrow/python/api.h"?? ??#include "Python.h"?? ??namespace psy {?? ??void t(std::shared_ptr pytable) {?? ??// This works?? ?? std::cout << "NUM" << pytable->num_columns();?? ??// This prints a lot of garbage?? ?? for(int i = 0; i < pytable->num_columns(); i++) {?? ?? std::cout << pytable->column(i)->name();?? ?? }?? ??}?? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Uninitialized buffer memory leads to buffer warnings
hi Dimitri, Zeroing the padding is probably a good idea; I'd be interested to look at the diff to see how many code paths are impacted. We already have a number of places where we are zeroing buffers but as you have found it is not 100% consistent. - Wes On Tue, Jul 3, 2018 at 10:13 AM, Dimitri Vorona wrote: > Hi, > > I misunderstood the issue: the zeroing of the padding is correctly handled > by the writers, as long as they are careful not to read beyond the > specified size of the buffers (as opposed to its capacity). > > The valgrind warning came from the way AppendNull(s) is implemented in some > of the builders: it just sets the nullmap and increments lengths, but > doesn't set the data, which leads to uninitialized memory chunks, where > nulls where inserted. I'll send a PR which addresses it. > > I'm still not sure if we should add zero the padding anyway, since the > buffers can be shared in any number of ways. Zeroing the last couple of > bytes shouldn't have any performance impact and could save a headache in > the future. > > Cheers. > > On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou wrote: > >> >> Hi, >> >> Writing uninitialized memory risks leaking private data. This might >> lead to security issues. >> >> I'd go for option 2. Option 3 sounds much more costly (we would be >> zero-initializing large memory areas instead of small padding areas). >> >> Regards >> >> Antoine. >> >> >> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit : >> > Hi all, >> > >> > currently, running json-integration-test with valgrind leads to the >> > following warning: "Syscall param write(buf) points to uninitialised >> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its >> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. >> setting >> > all values to not-null by default. >> > >> > Since having tests pass valgrind might be desirable for the CI, I think >> we >> > should fix this warning. There are a couple of possibilities: >> > >> > 1. Add suppression. The specs doesn't require padding to have a specific >> > value, so we might consider it to be false positive >> > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal >> > implementations. >> > 3. Generally zero-initialize memory in PoolBuffer. Might be too >> expensive. >> > >> > Or course there could be number of other options, includeing "do >> nothing". >> > If we settle on a best option, I can make a PR. >> > >> > Cheers, >> > Dimitri. >> > >>
Re: Uninitialized buffer memory leads to buffer warnings
Hi, I misunderstood the issue: the zeroing of the padding is correctly handled by the writers, as long as they are careful not to read beyond the specified size of the buffers (as opposed to its capacity). The valgrind warning came from the way AppendNull(s) is implemented in some of the builders: it just sets the nullmap and increments lengths, but doesn't set the data, which leads to uninitialized memory chunks, where nulls where inserted. I'll send a PR which addresses it. I'm still not sure if we should add zero the padding anyway, since the buffers can be shared in any number of ways. Zeroing the last couple of bytes shouldn't have any performance impact and could save a headache in the future. Cheers. On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou wrote: > > Hi, > > Writing uninitialized memory risks leaking private data. This might > lead to security issues. > > I'd go for option 2. Option 3 sounds much more costly (we would be > zero-initializing large memory areas instead of small padding areas). > > Regards > > Antoine. > > > Le 03/07/2018 à 13:11, Dimitri Vorona a écrit : > > Hi all, > > > > currently, running json-integration-test with valgrind leads to the > > following warning: "Syscall param write(buf) points to uninitialised > > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its > > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. > setting > > all values to not-null by default. > > > > Since having tests pass valgrind might be desirable for the CI, I think > we > > should fix this warning. There are a couple of possibilities: > > > > 1. Add suppression. The specs doesn't require padding to have a specific > > value, so we might consider it to be false positive > > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal > > implementations. > > 3. Generally zero-initialize memory in PoolBuffer. Might be too > expensive. > > > > Or course there could be number of other options, includeing "do > nothing". > > If we settle on a best option, I can make a PR. > > > > Cheers, > > Dimitri. > > >
Re: Uninitialized buffer memory leads to buffer warnings
Hi, Writing uninitialized memory risks leaking private data. This might lead to security issues. I'd go for option 2. Option 3 sounds much more costly (we would be zero-initializing large memory areas instead of small padding areas). Regards Antoine. Le 03/07/2018 à 13:11, Dimitri Vorona a écrit : > Hi all, > > currently, running json-integration-test with valgrind leads to the > following warning: "Syscall param write(buf) points to uninitialised > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting > all values to not-null by default. > > Since having tests pass valgrind might be desirable for the CI, I think we > should fix this warning. There are a couple of possibilities: > > 1. Add suppression. The specs doesn't require padding to have a specific > value, so we might consider it to be false positive > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal > implementations. > 3. Generally zero-initialize memory in PoolBuffer. Might be too expensive. > > Or course there could be number of other options, includeing "do nothing". > If we settle on a best option, I can make a PR. > > Cheers, > Dimitri. >
Uninitialized buffer memory leads to buffer warnings
Hi all, currently, running json-integration-test with valgrind leads to the following warning: "Syscall param write(buf) points to uninitialised byte(s)". This is caused by PrimitiveBufferBuilder not initializing its data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting all values to not-null by default. Since having tests pass valgrind might be desirable for the CI, I think we should fix this warning. There are a couple of possibilities: 1. Add suppression. The specs doesn't require padding to have a specific value, so we might consider it to be false positive 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal implementations. 3. Generally zero-initialize memory in PoolBuffer. Might be too expensive. Or course there could be number of other options, includeing "do nothing". If we settle on a best option, I can make a PR. Cheers, Dimitri.