[jira] [Created] (ARROW-2789) [JS] Minor DataFrame improvements

2018-07-03 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2789:


 Summary: [JS] Minor DataFrame improvements
 Key: ARROW-2789
 URL: https://issues.apache.org/jira/browse/ARROW-2789
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette


* deprecate count() in favor of a readonly length member (implemented with a 
getter in FilterdDataFrame)
* Add an iterator to FilteredDataFrame



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2788) [Plasma] Defining Delete semantics

2018-07-03 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2788:
-

 Summary: [Plasma] Defining Delete semantics
 Key: ARROW-2788
 URL: https://issues.apache.org/jira/browse/ARROW-2788
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


We should define what the semantics of Plasma's Delete operation is, especially 
in the presence of errors (object in use is deleted, non-existing object is 
deleted).

My current take on this is the following:

Delete should be a hint to the store to delete, so if the object is not 
present, it should be a no-op. If an object that is in use is deleted, the 
store should delete it as soon as the reference count goes to zero (it would 
also be ok, but less desirable in my opinion, to not delete it).

I think this is a good application of the "Defining errors away" from John 
Ousterhouts book (A Philosophy of Software Design).

Please comment in this thread if you have different opinions so we can discuss!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2787) Memory Issue passing table from python to c++ via cython

2018-07-03 Thread Joseph Toth (JIRA)
Joseph Toth created ARROW-2787:
--

 Summary: Memory Issue passing table from python to c++ via cython
 Key: ARROW-2787
 URL: https://issues.apache.org/jira/browse/ARROW-2787
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.9.0
 Environment: clang6
Reporter: Joseph Toth


I wanted to create a simple example of reading a table in Python and pass it to 
C++, but I'm doing something wrong or there is a memory issue. When the table 
gets to C++ and I print out column names it also prints out a lot of junk and 
what looks like pydocs. Let me know if you need any more info. Thanks!

 

*demo.py*

??import numpy??
??from psy.automl import cyth??
??import pandas as pd??
??from absl import app??

??def main(argv):??
?? sup = pd.DataFrame({??
?? 'int': [1, 2],??
?? 'str': ['a', 'b']??
?? })??
?? table = pa.Table.from_pandas(sup)??
?? cyth.c_t(table)??

 

*??cyth.pyx??*

??import pandas as pd??
??import pyarrow as pa??
??from pyarrow.lib cimport *??

??cdef extern from "cyth.h" namespace "psy":??
??void t(shared_ptr[CTable])??

??def c_t(obj):??
 # ??These print work??
??# for i in range(obj.num_columns):??
??# print(obj.column(i).name??
?? cdef shared_ptr[CTable] tbl = pyarrow_unwrap_table(obj)??
?? t(tbl)??
*cyth.h*

??#include ??
??#include ??

??#include "arrow/api.h"??
??#include "arrow/python/api.h"??
??#include "Python.h"??

??namespace psy {??

??void t(std::shared_ptr pytable) {??

??// This works??
?? std::cout << "NUM" << pytable->num_columns();??

??// This prints a lot of garbage??
?? for(int i = 0; i < pytable->num_columns(); i++) {??
?? std::cout << pytable->column(i)->name();??
?? }??
??}??

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Wes McKinney
hi Dimitri,

Zeroing the padding is probably a good idea; I'd be interested to look
at the diff to see how many code paths are impacted. We already have a
number of places where we are zeroing buffers but as you have found it
is not 100% consistent.

- Wes

On Tue, Jul 3, 2018 at 10:13 AM, Dimitri Vorona
 wrote:
> Hi,
>
> I misunderstood the issue: the zeroing of the padding is correctly handled
> by the writers, as long as they are careful not to read beyond the
> specified size of the buffers (as opposed to its capacity).
>
> The valgrind warning came from the way AppendNull(s) is implemented in some
> of the builders: it just sets the nullmap and increments lengths, but
> doesn't set the data, which leads to uninitialized memory chunks, where
> nulls where inserted. I'll send a PR which addresses it.
>
> I'm still not sure if we should add zero the padding anyway, since the
> buffers can be shared in any number of ways. Zeroing the last couple of
> bytes shouldn't have any performance impact and could save a headache in
> the future.
>
> Cheers.
>
> On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou  wrote:
>
>>
>> Hi,
>>
>> Writing uninitialized memory risks leaking private data.  This might
>> lead to security issues.
>>
>> I'd go for option 2.  Option 3 sounds much more costly (we would be
>> zero-initializing large memory areas instead of small padding areas).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
>> > Hi all,
>> >
>> > currently, running json-integration-test with valgrind leads to the
>> > following warning: "Syscall param write(buf) points to uninitialised
>> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
>> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e.
>> setting
>> > all values to not-null by default.
>> >
>> > Since having tests pass valgrind might be desirable for the CI, I think
>> we
>> > should fix this warning. There are a couple of possibilities:
>> >
>> > 1. Add suppression. The specs doesn't require padding to have a specific
>> > value, so we might consider it to be false positive
>> > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
>> > implementations.
>> > 3. Generally zero-initialize memory in PoolBuffer. Might be too
>> expensive.
>> >
>> > Or course there could be number of other options, includeing "do
>> nothing".
>> > If we settle on a best option, I can make a PR.
>> >
>> > Cheers,
>> > Dimitri.
>> >
>>


Re: Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Dimitri Vorona
Hi,

I misunderstood the issue: the zeroing of the padding is correctly handled
by the writers, as long as they are careful not to read beyond the
specified size of the buffers (as opposed to its capacity).

The valgrind warning came from the way AppendNull(s) is implemented in some
of the builders: it just sets the nullmap and increments lengths, but
doesn't set the data, which leads to uninitialized memory chunks, where
nulls where inserted. I'll send a PR which addresses it.

I'm still not sure if we should add zero the padding anyway, since the
buffers can be shared in any number of ways. Zeroing the last couple of
bytes shouldn't have any performance impact and could save a headache in
the future.

Cheers.

On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou  wrote:

>
> Hi,
>
> Writing uninitialized memory risks leaking private data.  This might
> lead to security issues.
>
> I'd go for option 2.  Option 3 sounds much more costly (we would be
> zero-initializing large memory areas instead of small padding areas).
>
> Regards
>
> Antoine.
>
>
> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
> > Hi all,
> >
> > currently, running json-integration-test with valgrind leads to the
> > following warning: "Syscall param write(buf) points to uninitialised
> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e.
> setting
> > all values to not-null by default.
> >
> > Since having tests pass valgrind might be desirable for the CI, I think
> we
> > should fix this warning. There are a couple of possibilities:
> >
> > 1. Add suppression. The specs doesn't require padding to have a specific
> > value, so we might consider it to be false positive
> > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
> > implementations.
> > 3. Generally zero-initialize memory in PoolBuffer. Might be too
> expensive.
> >
> > Or course there could be number of other options, includeing "do
> nothing".
> > If we settle on a best option, I can make a PR.
> >
> > Cheers,
> > Dimitri.
> >
>


Re: Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Antoine Pitrou


Hi,

Writing uninitialized memory risks leaking private data.  This might
lead to security issues.

I'd go for option 2.  Option 3 sounds much more costly (we would be
zero-initializing large memory areas instead of small padding areas).

Regards

Antoine.


Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
> Hi all,
> 
> currently, running json-integration-test with valgrind leads to the
> following warning: "Syscall param write(buf) points to uninitialised
> byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
> data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting
> all values to not-null by default.
> 
> Since having tests pass valgrind might be desirable for the CI, I think we
> should fix this warning. There are a couple of possibilities:
> 
> 1. Add suppression. The specs doesn't require padding to have a specific
> value, so we might consider it to be false positive
> 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
> implementations.
> 3. Generally zero-initialize memory in PoolBuffer. Might be too expensive.
> 
> Or course there could be number of other options, includeing "do nothing".
> If we settle on a best option, I can make a PR.
> 
> Cheers,
> Dimitri.
> 


Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Dimitri Vorona
Hi all,

currently, running json-integration-test with valgrind leads to the
following warning: "Syscall param write(buf) points to uninitialised
byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting
all values to not-null by default.

Since having tests pass valgrind might be desirable for the CI, I think we
should fix this warning. There are a couple of possibilities:

1. Add suppression. The specs doesn't require padding to have a specific
value, so we might consider it to be false positive
2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
implementations.
3. Generally zero-initialize memory in PoolBuffer. Might be too expensive.

Or course there could be number of other options, includeing "do nothing".
If we settle on a best option, I can make a PR.

Cheers,
Dimitri.