Re: arrow for game engine / graphics workloads?

2020-12-14 Thread Robert Bigelow
Yes, I guess I’d end up taking a similar path to Arrow in this regard. I think 
I have some homework to do, to see whether I can use the Arrow format to model 
some things like meshes, scene graph layout, etc. If that is a good fit, it 
makes sense to use Arrow. Even if it isn’t a perfect fit, I like the idea of 
having the data in a more malleable or neutral form. Thank you for pointing 
that out.

> On Dec 14, 2020, at 11:46 AM, Wes McKinney  wrote:
> 
> Arrow only uses Flatbuffers to serialize metadata, *not* data.
> 
> On Mon, Dec 14, 2020 at 1:39 PM Robert Bigelow
>  wrote:
>> 
>> This is an excellent point. I could use Flatbuffers directly to define any 
>> custom format needed by the engine. The engine itself would need to use the 
>> same principles the Arrow devs have, which I guess is true of any 
>> data-intensive system. Thanks for your response!
>> 
>>> On Dec 14, 2020, at 11:24 AM, Lee, David  
>>> wrote:
>>> 
>>> Arrow uses flatbuffers under the hood.
>>> 
>>> https://google.github.io/flatbuffers/
>>> 
>>> FlatBuffers is an efficient cross platform serialization library for C++, 
>>> C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, 
>>> Rust and Swift. It was originally created at Google for game development 
>>> and other performance-critical applications.
>>> 
>>> 
>>> -----Original Message-----
>>> From: Robert Bigelow 
>>> Sent: Sunday, December 13, 2020 1:00 PM
>>> To: dev@arrow.apache.org
>>> Subject: arrow for game engine / graphics workloads?
>>> 
>>> External Email: Use caution with links and attachments
>>> 
>>> 
>>> Dear Arrow devs,
>>> 
>>> I’m writing a game engine in Swift, and the next system to design is the 
>>> resource manager / asset database. Arrow seems like an attractive option 
>>> for the heart of an engine, since many of the performance goals stated for 
>>> analytic workloads are shared by real-time rendering. Data layout is 
>>> extremely important, and I’d like to be able to feed the renderer without 
>>> chasing pointers. So far my plan is to create a custom format for the asset 
>>> database that will be used at runtime. I plan on having a tool that 
>>> traverses a scene graph write this format, then have the resource manager 
>>> use mmap to load assets. Arrow seems like a good fit for such a format, as 
>>> (if I understand correctly) only the Schema needs to be deserialized before 
>>> the data would be available, and it could be used to back streaming APIs.
>>> 
>>> Do you know of any work being done with Arrow in the real-time rendering or 
>>> game engine space? Would the API presented by Arrow be a good fit, assuming 
>>> I’d mostly need to expose buffers of typed data to feed the renderer?
>>> 
>>> One final question assuming none of your answers have dissuaded me. Would 
>>> the C/glib Arrow library be reasonable, since Swift can import C headers? 
>>> It seems that no intrepid Swift developers have started a native Swift 
>>> implementation yet.
>>> 
>>> Thanks for your time, and kudos for your work on the Arrow project. It is 
>>> very impressive!
>>> 
>>> Cheers,
>>> Rob Bigelow
>>> 
>>> 
>>> This message may contain information that is confidential or privileged. If 
>>> you are not the intended recipient, please advise the sender immediately 
>>> and delete this message. See 
>>> http://www.blackrock.com/corporate/compliance/email-disclaimers for further 
>>> information.  Please refer to 
>>> http://www.blackrock.com/corporate/compliance/privacy-policy for more 
>>> information about BlackRock’s Privacy Policy.
>>> 
>>> 
>>> For a list of BlackRock's office addresses worldwide, see 
>>> http://www.blackrock.com/corporate/about-us/contacts-locations.
>>> 
>>> © 2020 BlackRock, Inc. All rights reserved.
>> 



Re: arrow for game engine / graphics workloads?

2020-12-14 Thread Wes McKinney
Arrow only uses Flatbuffers to serialize metadata, *not* data.

On Mon, Dec 14, 2020 at 1:39 PM Robert Bigelow
 wrote:
>
> This is an excellent point. I could use Flatbuffers directly to define any 
> custom format needed by the engine. The engine itself would need to use the 
> same principles the Arrow devs have, which I guess is true of any 
> data-intensive system. Thanks for your response!
>
> > On Dec 14, 2020, at 11:24 AM, Lee, David  
> > wrote:
> >
> > Arrow uses flatbuffers under the hood.
> >
> > https://google.github.io/flatbuffers/
> >
> > FlatBuffers is an efficient cross platform serialization library for C++, 
> > C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, 
> > Rust and Swift. It was originally created at Google for game development 
> > and other performance-critical applications.
> >
> >
> > -----Original Message-----
> > From: Robert Bigelow 
> > Sent: Sunday, December 13, 2020 1:00 PM
> > To: dev@arrow.apache.org
> > Subject: arrow for game engine / graphics workloads?
> >
> > Dear Arrow devs,
> >
> > I’m writing a game engine in Swift, and the next system to design is the 
> > resource manager / asset database. Arrow seems like an attractive option 
> > for the heart of an engine, since many of the performance goals stated for 
> > analytic workloads are shared by real-time rendering. Data layout is 
> > extremely important, and I’d like to be able to feed the renderer without 
> > chasing pointers. So far my plan is to create a custom format for the asset 
> > database that will be used at runtime. I plan on having a tool that 
> > traverses a scene graph write this format, then have the resource manager 
> > use mmap to load assets. Arrow seems like a good fit for such a format, as 
> > (if I understand correctly) only the Schema needs to be deserialized before 
> > the data would be available, and it could be used to back streaming APIs.
> >
> > Do you know of any work being done with Arrow in the real-time rendering or 
> > game engine space? Would the API presented by Arrow be a good fit, assuming 
> > I’d mostly need to expose buffers of typed data to feed the renderer?
> >
> > One final question assuming none of your answers have dissuaded me. Would 
> > the C/glib Arrow library be reasonable, since Swift can import C headers? 
> > It seems that no intrepid Swift developers have started a native Swift 
> > implementation yet.
> >
> > Thanks for your time, and kudos for your work on the Arrow project. It is 
> > very impressive!
> >
> > Cheers,
> > Rob Bigelow
> >
> >
>


Re: arrow for game engine / graphics workloads?

2020-12-14 Thread Robert Bigelow
This is an excellent point. I could use Flatbuffers directly to define any 
custom format needed by the engine. The engine itself would need to use the 
same principles the Arrow devs have, which I guess is true of any 
data-intensive system. Thanks for your response!

> On Dec 14, 2020, at 11:24 AM, Lee, David  
> wrote:
> 
> Arrow uses flatbuffers under the hood.
> 
> https://google.github.io/flatbuffers/
> 
> FlatBuffers is an efficient cross platform serialization library for C++, C#, 
> C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust 
> and Swift. It was originally created at Google for game development and other 
> performance-critical applications.
> 
> 
> -----Original Message-----
> From: Robert Bigelow  
> Sent: Sunday, December 13, 2020 1:00 PM
> To: dev@arrow.apache.org
> Subject: arrow for game engine / graphics workloads?
> 
> Dear Arrow devs,
> 
> I’m writing a game engine in Swift, and the next system to design is the 
> resource manager / asset database. Arrow seems like an attractive option for 
> the heart of an engine, since many of the performance goals stated for 
> analytic workloads are shared by real-time rendering. Data layout is 
> extremely important, and I’d like to be able to feed the renderer without 
> chasing pointers. So far my plan is to create a custom format for the asset 
> database that will be used at runtime. I plan on having a tool that traverses 
> a scene graph write this format, then have the resource manager use mmap to 
> load assets. Arrow seems like a good fit for such a format, as (if I 
> understand correctly) only the Schema needs to be deserialized before the 
> data would be available, and it could be used to back streaming APIs.
> 
> Do you know of any work being done with Arrow in the real-time rendering or 
> game engine space? Would the API presented by Arrow be a good fit, assuming 
> I’d mostly need to expose buffers of typed data to feed the renderer?
> 
> One final question assuming none of your answers have dissuaded me. Would the 
> C/glib Arrow library be reasonable, since Swift can import C headers? It 
> seems that no intrepid Swift developers have started a native Swift 
> implementation yet.
> 
> Thanks for your time, and kudos for your work on the Arrow project. It is 
> very impressive!
> 
> Cheers,
> Rob Bigelow
> 
> 



Arrow 3.0 release dashboard

2020-12-14 Thread Neal Richardson
Hi all,
I've created
https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release,
cloned from our previous release dashboards, for our upcoming release. A
few things I'd like to draw your attention to:

* Blockers: there are 5 currently. I'm not certain that they all are
blockers, nor am I sure that there aren't other blockers. I know we've had
some issues lately with python wheel builds, and I've added a couple of
known issues, but there may be others. If you know of other release
blockers, please note them.
* There are over 300 unstarted issues tagged for 3.0.0. Given holiday
schedules and the release target in the first week of January, that's at
least 200 too many. Please review your backlogs and start bumping things
out of scope. I've added a 4.0.0 version that you can move things to, if
you want.
* Release process improvements: there was some discussion a few weeks ago
about what could be done to make it easier to cut releases (
https://lists.apache.org/thread.html/r2257603c91336c042633f2bc480f34f58c02d9c67ff1ee1a212933b5%40%3Cdev.arrow.apache.org%3E).
A few tasks were mentioned in that thread. If anyone is going to take
action, now would be a good time.

Thanks,
Neal


Re: Python: Bad address when rewriting file

2020-12-14 Thread Antoine Pitrou


Hi Rares,

Ok, so here is the explanation.  `pa.ipc.open_stream` opens the given
file memory-mapped, so the buffers read from the file are zero-copy
views into the mapping.  But then you rewrite the file from scratch, so
those zero-copy buffers end up pointing into invalid memory.  Hence the
"Bad address" error you're getting (the underlying errno mnemonic for
error code 14 is EFAULT).

If you need to rewrite the *same* file, you should disable memory
mapping.  For example, you can use
`pyarrow.ipc.open_stream(pyarrow.OSFile(fn))`, which will create a
regular file object.

Or you can arrange to not rewrite the same file.  For example you could
write to a temporary file, close it, and then move it to the original
location.
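
The temporary-file approach can be sketched with plain file I/O (the
helper name `rewrite_atomically` is my own invention; with Arrow you
would open a RecordBatchStreamWriter on the temporary path inside the
callback):

```python
import os
import tempfile

def rewrite_atomically(path, write_fn):
    """Write new contents to a temporary file in the same directory,
    then atomically replace the original, so existing readers never
    see a half-written file and keep their mapping of the old inode."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, 'wb') as f:
            write_fn(f)              # e.g. write the IPC stream here
        os.replace(tmp_path, path)   # atomic rename over the original
    except BaseException:
        os.remove(tmp_path)
        raise
```

`os.replace` is atomic on POSIX filesystems when source and destination
are on the same filesystem, which is why the temporary file is created
in the same directory as the target.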

Regards

Antoine.


Le 14/12/2020 à 20:03, Rares Vernica a écrit :
> Hi Antoine,
> 
> Here is a repro for this issue:
> 
> import pyarrow
> 
> fn = '/tmp/foo'
> 
> # Data
> data = [
>     pyarrow.array(range(1000)),
>     pyarrow.array(range(1000))
> ]
> batch = pyarrow.record_batch(data, names=['f0', 'f1'])
> 
> # File Prep
> writer = pyarrow.ipc.RecordBatchStreamWriter(fn, batch.schema)
> writer.write_batch(batch)
> writer.close()
> 
> # Read
> reader = pyarrow.open_stream(fn)
> tbl = reader.read_all()
> 
> # Rewrite
> writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
> batches = tbl.to_batches(max_chunksize=200)
> writer.write_table(pyarrow.Table.from_batches(batches))
> writer.close()
> 
> 
>> python3 foo.py
> Traceback (most recent call last):
>   File "foo.py", line 24, in <module>
> writer.write_table(pyarrow.Table.from_batches(batches))
>   File "pyarrow/ipc.pxi", line 237, in
> pyarrow.lib._CRecordBatchWriter.write_table
>   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
> OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
> address
> 
> Cheers,
> Rares
> 
> 
> On Mon, Dec 14, 2020 at 12:30 AM Antoine Pitrou  wrote:
> 
>>
>> Hello Rares,
>>
>> Is there a complete reproducer that we may try out?
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 14/12/2020 à 06:52, Rares Vernica a écrit :
>>> Hello,
>>>
>>> As part of a test, I'm reading a record batch from an Arrow file,
>>> re-batching the data in smaller batches, and writing back the result to
>> the
>>> same file. I'm getting an unexpected Bad address error and I wonder what
>> am
>>> I doing wrong?
>>>
>>> reader = pyarrow.open_stream(fn)
>>> tbl = reader.read_all()
>>>
>>> writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
>>> batches = tbl.to_batches(max_chunksize=200)
>>> writer.write_table(pyarrow.Table.from_batches(batches))
>>> writer.close()
>>>
>>> Traceback (most recent call last):
>>>   File "tests/foo.py", line 10, in <module>
>>> writer.write_table(pyarrow.Table.from_batches(batches))
>>>   File "pyarrow/ipc.pxi", line 237, in
>>> pyarrow.lib._CRecordBatchWriter.write_table
>>>   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
>>> OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
>>> address
>>>
>>> Do I need to "close" the reader or open the writer differently?
>>>
>>> I'm using PyArrow 0.16.0 and Python 3.8.2.
>>>
>>> Thank you!
>>> Rares
>>>
>>
> 


RE: arrow for game engine / graphics workloads?

2020-12-14 Thread Lee, David
Arrow uses flatbuffers under the hood.

https://google.github.io/flatbuffers/

FlatBuffers is an efficient cross platform serialization library for C++, C#, 
C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust 
and Swift. It was originally created at Google for game development and other 
performance-critical applications.


-----Original Message-----
From: Robert Bigelow  
Sent: Sunday, December 13, 2020 1:00 PM
To: dev@arrow.apache.org
Subject: arrow for game engine / graphics workloads?

Dear Arrow devs,

I’m writing a game engine in Swift, and the next system to design is the 
resource manager / asset database. Arrow seems like an attractive option for 
the heart of an engine, since many of the performance goals stated for analytic 
workloads are shared by real-time rendering. Data layout is extremely 
important, and I’d like to be able to feed the renderer without chasing 
pointers. So far my plan is to create a custom format for the asset database 
that will be used at runtime. I plan to have a tool that traverses a scene 
graph and writes this format, then have the resource manager use mmap to load 
assets. Arrow seems like a good fit for such a format, since (if I understand 
correctly) only the Schema needs to be deserialized before the data becomes 
available, and it could be used to back streaming APIs.

Do you know of any work being done with Arrow in the real-time rendering or 
game engine space? Would the API presented by Arrow be a good fit, assuming I’d 
mostly need to expose buffers of typed data to feed the renderer?

One final question, assuming none of your answers have dissuaded me: would the 
C/GLib Arrow library be reasonable, since Swift can import C headers? It seems 
that no intrepid Swift developers have started a native Swift implementation 
yet.

Thanks for your time, and kudos for your work on the Arrow project. It is very 
impressive!

Cheers,
Rob Bigelow




Re: Python: Bad address when rewriting file

2020-12-14 Thread Rares Vernica
Hi Antoine,

Here is a repro for this issue:

import pyarrow

fn = '/tmp/foo'

# Data
data = [
    pyarrow.array(range(1000)),
    pyarrow.array(range(1000))
]
batch = pyarrow.record_batch(data, names=['f0', 'f1'])

# File Prep
writer = pyarrow.ipc.RecordBatchStreamWriter(fn, batch.schema)
writer.write_batch(batch)
writer.close()

# Read
reader = pyarrow.open_stream(fn)
tbl = reader.read_all()

# Rewrite
writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
batches = tbl.to_batches(max_chunksize=200)
writer.write_table(pyarrow.Table.from_batches(batches))
writer.close()


> python3 foo.py
Traceback (most recent call last):
  File "foo.py", line 24, in <module>
writer.write_table(pyarrow.Table.from_batches(batches))
  File "pyarrow/ipc.pxi", line 237, in
pyarrow.lib._CRecordBatchWriter.write_table
  File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
address

Cheers,
Rares


On Mon, Dec 14, 2020 at 12:30 AM Antoine Pitrou  wrote:

>
> Hello Rares,
>
> Is there a complete reproducer that we may try out?
>
> Regards
>
> Antoine.
>
>
> Le 14/12/2020 à 06:52, Rares Vernica a écrit :
> > Hello,
> >
> > As part of a test, I'm reading a record batch from an Arrow file,
> > re-batching the data in smaller batches, and writing back the result to
> the
> > same file. I'm getting an unexpected Bad address error and I wonder what
> am
> > I doing wrong?
> >
> > reader = pyarrow.open_stream(fn)
> > tbl = reader.read_all()
> >
> > writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
> > batches = tbl.to_batches(max_chunksize=200)
> > writer.write_table(pyarrow.Table.from_batches(batches))
> > writer.close()
> >
> > Traceback (most recent call last):
> >   File "tests/foo.py", line 10, in <module>
> > writer.write_table(pyarrow.Table.from_batches(batches))
> >   File "pyarrow/ipc.pxi", line 237, in
> > pyarrow.lib._CRecordBatchWriter.write_table
> >   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
> > OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
> > address
> >
> > Do I need to "close" the reader or open the writer differently?
> >
> > I'm using PyArrow 0.16.0 and Python 3.8.2.
> >
> > Thank you!
> > Rares
> >
>


[javascript] cant get timestamps in arrow 2.0

2020-12-14 Thread Andrew Clancy
Hi,

I have a simple Feather file created via pandas to_feather with a
datetime64[ns] column, and I cannot get timestamps in JavaScript with
apache-arrow@2.0.0.

See this notebook:
https://observablehq.com/@nite/apache-arrow-timestamp-investigation

I'm guessing I'm missing something. Does anyone have suggestions, or
decent examples of reading a file created in pandas? I've seen examples
from apache-arrow@0.3.1 where dates are stored as an array of 2 ints.

File was created with:

import pandas as pd
df = pd.read_parquet('sample.parquet')
df.to_feather('sample-seconds.feather')

Final Q: I'm assuming this is the best place for this question? Happy to
post elsewhere if there is a better forum, or to file a JIRA ticket if
that's more appropriate.

Thanks!
Andy


[Rust] [Proposal] Add guidelines about usage of `unsafe`

2020-12-14 Thread Andrew Lamb
I would like to draw people's attention to the following
proposal documenting the acceptable use of `unsafe` Rust in the Rust Arrow
implementation:

https://github.com/apache/arrow/pull/8901

I wanted to increase the visibility of this proposal as it has implications
for future contributions.

Andrew


Re: pass input args directly to kernel

2020-12-14 Thread Wes McKinney
Also, do not feel the need to be constrained by the structures that
are currently defined.

On Mon, Dec 14, 2020 at 4:33 AM Antoine Pitrou  wrote:
>
>
> Hi,
>
> If you set `can_execute_chunkwise = false` on the kernel options, you
> should see the whole chunked array.
>
> Regards
>
> Antoine.
>
>
> Le 14/12/2020 à 11:27, Yibo Cai a écrit :
> > The current kernel framework divides inputs (e.g. arrays, chunked arrays) 
> > into batches and feeds them to the kernel code.
> > Does it make sense to pass input args directly to the kernel?
> > I'm writing a quantile kernel and need to allocate a buffer to record all 
> > inputs, then find the nth element at the end. For a chunked array, input 
> > arrives chunk by chunk, so the kernel doesn't know the total buffer size to 
> > allocate up front. It would be convenient if the raw chunked-array input 
> > were visible to the kernel.
> > Or are there better ways to achieve this? Thanks.
> >


Re: pass input args directly to kernel

2020-12-14 Thread Antoine Pitrou


Hi,

If you set `can_execute_chunkwise = false` on the kernel options, you
should see the whole chunked array.

Regards

Antoine.


Le 14/12/2020 à 11:27, Yibo Cai a écrit :
> The current kernel framework divides inputs (e.g. arrays, chunked arrays) 
> into batches and feeds them to the kernel code.
> Does it make sense to pass input args directly to the kernel?
> I'm writing a quantile kernel and need to allocate a buffer to record all 
> inputs, then find the nth element at the end. For a chunked array, input 
> arrives chunk by chunk, so the kernel doesn't know the total buffer size to 
> allocate up front. It would be convenient if the raw chunked-array input 
> were visible to the kernel.
> Or are there better ways to achieve this? Thanks.
> 


pass input args directly to kernel

2020-12-14 Thread Yibo Cai

The current kernel framework divides inputs (e.g. arrays, chunked arrays) into 
batches and feeds them to the kernel code.
Does it make sense to pass input args directly to the kernel?
I'm writing a quantile kernel and need to allocate a buffer to record all 
inputs, then find the nth element at the end. For a chunked array, input 
arrives chunk by chunk, so the kernel doesn't know the total buffer size to 
allocate up front. It would be convenient if the raw chunked-array input were 
visible to the kernel.
Or are there better ways to achieve this? Thanks.


Re: Python: Bad address when rewriting file

2020-12-14 Thread Antoine Pitrou


Hello Rares,

Is there a complete reproducer that we may try out?

Regards

Antoine.


Le 14/12/2020 à 06:52, Rares Vernica a écrit :
> Hello,
> 
> As part of a test, I'm reading a record batch from an Arrow file,
> re-batching the data in smaller batches, and writing back the result to the
> same file. I'm getting an unexpected Bad address error and I wonder what am
> I doing wrong?
> 
> reader = pyarrow.open_stream(fn)
> tbl = reader.read_all()
> 
> writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
> batches = tbl.to_batches(max_chunksize=200)
> writer.write_table(pyarrow.Table.from_batches(batches))
> writer.close()
> 
> Traceback (most recent call last):
>   File "tests/foo.py", line 10, in <module>
> writer.write_table(pyarrow.Table.from_batches(batches))
>   File "pyarrow/ipc.pxi", line 237, in
> pyarrow.lib._CRecordBatchWriter.write_table
>   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
> OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
> address
> 
> Do I need to "close" the reader or open the writer differently?
> 
> I'm using PyArrow 0.16.0 and Python 3.8.2.
> 
> Thank you!
> Rares
>