Re: arrow for game engine / graphics workloads?
Yes, I guess I'd end up taking a similar path to Arrow in this regard. I think
I have some homework to do, to see whether I can use the Arrow format to model
things like meshes, scene graph layout, etc. If that is a good fit, it makes
sense to use Arrow. Even if it isn't a perfect fit, I like the idea of having
the data in a more malleable or neutral form. Thank you for pointing that out.

> On Dec 14, 2020, at 11:46 AM, Wes McKinney wrote:
>
> Arrow only uses Flatbuffers to serialize metadata, *not* data.
Re: arrow for game engine / graphics workloads?
Arrow only uses Flatbuffers to serialize metadata, *not* data.

On Mon, Dec 14, 2020 at 1:39 PM Robert Bigelow wrote:
>
> This is an excellent point. I could use Flatbuffers directly to define any
> custom format needed by the engine. The engine itself would need to use the
> same principles the Arrow devs have, which I guess is true of any
> data-intensive system. Thanks for your response!
Re: arrow for game engine / graphics workloads?
This is an excellent point. I could use Flatbuffers directly to define any
custom format needed by the engine. The engine itself would need to use the
same principles the Arrow devs have, which I guess is true of any
data-intensive system. Thanks for your response!

> On Dec 14, 2020, at 11:24 AM, Lee, David wrote:
>
> Arrow uses flatbuffers under the hood.
>
> https://google.github.io/flatbuffers/
>
> FlatBuffers is an efficient cross-platform serialization library for C++,
> C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python,
> Rust and Swift. It was originally created at Google for game development
> and other performance-critical applications.
Arrow 3.0 release dashboard
Hi all,

I've created
https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release, cloned
from our previous release dashboards, for our upcoming release. A few things
I'd like to draw your attention to:

* Blockers: there are 5 currently. I'm not certain that they all are
  blockers, nor am I sure that there aren't other blockers. I know we've had
  some issues lately with Python wheel builds, and I've added a couple of
  known issues, but there may be others. If you know of other release
  blockers, please note them.

* There are over 300 unstarted issues tagged for 3.0.0. Given holiday
  schedules and the release target in the first week of January, that's at
  least 200 too many. Please review your backlogs and start bumping things
  out of scope. I've added a 4.0.0 version that you can move things to, if
  you want.

* Release process improvements: there was some discussion a few weeks ago
  about what could be done to make it easier to cut releases
  (https://lists.apache.org/thread.html/r2257603c91336c042633f2bc480f34f58c02d9c67ff1ee1a212933b5%40%3Cdev.arrow.apache.org%3E).
  A few tasks were mentioned in that thread. If anyone is going to take
  action, now would be a good time.

Thanks,
Neal
Re: Python: Bad address when rewriting file
Hi Rares,

Ok, so here is the explanation. `pa.ipc.open_stream` will open the given file
memory-mapped, so the buffers read from the file are zero-copy. But then you
rewrite the file from scratch, so those zero-copy buffers become invalid
memory. Hence the "Bad address" error you're getting (the underlying errno
mnemonic for error code 14 is EFAULT).

If you need to rewrite the *same* file, you should disable memory mapping.
For example, you can use `pyarrow.ipc.open_stream(pyarrow.OSFile(fn))`, which
will read through a regular file object. Or you can arrange not to rewrite
the same file: for example, write to a temporary file, close it, and then
move it to the original location.

Regards

Antoine.
RE: arrow for game engine / graphics workloads?
Arrow uses flatbuffers under the hood.

https://google.github.io/flatbuffers/

FlatBuffers is an efficient cross-platform serialization library for C++, C#,
C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust
and Swift. It was originally created at Google for game development and other
performance-critical applications.

-----Original Message-----
From: Robert Bigelow
Sent: Sunday, December 13, 2020 1:00 PM
To: dev@arrow.apache.org
Subject: arrow for game engine / graphics workloads?

Dear Arrow devs,

I'm writing a game engine in Swift, and the next system to design is the
resource manager / asset database. Arrow seems like an attractive option for
the heart of an engine, since many of the performance goals stated for
analytic workloads are shared by real-time rendering. Data layout is
extremely important, and I'd like to be able to feed the renderer without
chasing pointers. So far my plan is to create a custom format for the asset
database that will be used at runtime. I plan on having a tool that traverses
a scene graph write this format, then have the resource manager use mmap to
load assets. Arrow seems like a good fit for such a format, as (if I
understand correctly) only the Schema needs to be deserialized before the
data would be available, and it could be used to back streaming APIs.

Do you know of any work being done with Arrow in the real-time rendering or
game engine space? Would the API presented by Arrow be a good fit, assuming
I'd mostly need to expose buffers of typed data to feed the renderer?

One final question, assuming none of your answers have dissuaded me. Would
the C/GLib Arrow library be reasonable, since Swift can import C headers? It
seems that no intrepid Swift developers have started a native Swift
implementation yet.

Thanks for your time, and kudos for your work on the Arrow project. It is
very impressive!

Cheers,
Rob Bigelow

This message may contain information that is confidential or privileged. If
you are not the intended recipient, please advise the sender immediately and
delete this message. See
http://www.blackrock.com/corporate/compliance/email-disclaimers for further
information. Please refer to
http://www.blackrock.com/corporate/compliance/privacy-policy for more
information about BlackRock's Privacy Policy.

For a list of BlackRock's office addresses worldwide, see
http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2020 BlackRock, Inc. All rights reserved.
Re: Python: Bad address when rewriting file
Hi Antoine,

Here is a repro for this issue:

import pyarrow

fn = '/tmp/foo'

# Data
data = [
    pyarrow.array(range(1000)),
    pyarrow.array(range(1000))
]
batch = pyarrow.record_batch(data, names=['f0', 'f1'])

# File Prep
writer = pyarrow.ipc.RecordBatchStreamWriter(fn, batch.schema)
writer.write_batch(batch)
writer.close()

# Read
reader = pyarrow.open_stream(fn)
tbl = reader.read_all()

# Rewrite
writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
batches = tbl.to_batches(max_chunksize=200)
writer.write_table(pyarrow.Table.from_batches(batches))
writer.close()

> python3 foo.py
Traceback (most recent call last):
  File "foo.py", line 24, in <module>
    writer.write_table(pyarrow.Table.from_batches(batches))
  File "pyarrow/ipc.pxi", line 237, in
    pyarrow.lib._CRecordBatchWriter.write_table
  File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
address

Cheers,
Rares

On Mon, Dec 14, 2020 at 12:30 AM Antoine Pitrou wrote:
>
> Hello Rares,
>
> Is there a complete reproducer that we may try out?
>
> Regards
>
> Antoine.
[javascript] can't get timestamps in arrow 2.0
Hi,

I have a simple Feather file created via pandas `to_feather` with a
datetime64[ns] column, and cannot get timestamps out in JavaScript with
apache-arrow@2.0.0. See this notebook:
https://observablehq.com/@nite/apache-arrow-timestamp-investigation

I'm guessing I'm missing something. Has anyone got any suggestions, or decent
examples of reading a file created in pandas? I've seen examples from
apache-arrow@0.3.1 where dates were stored as an array of 2 ints.

The file was created with:

import pandas as pd
df = pd.read_parquet('sample.parquet')
df.to_feather('sample-seconds.feather')

Final question: I'm assuming this is the best place for this question? Happy
to post elsewhere if there are other forums, or if this should be a JIRA
ticket.

Thanks!
Andy
[Rust] [Proposal] Add guidelines about usage of `unsafe`
I would like to draw people's attention to the following proposal documenting the acceptable use of `unsafe` Rust in the Rust Arrow implementation: https://github.com/apache/arrow/pull/8901 I wanted to increase the visibility of this proposal as it has implications for future contributions. Andrew
Re: pass input args directly to kernel
Also, do not feel the need to be constrained by the structures that are
currently defined.

On Mon, Dec 14, 2020 at 4:33 AM Antoine Pitrou wrote:
>
> Hi,
>
> If you set `can_execute_chunkwise = false` on the kernel options, you
> should see the whole chunked array.
>
> Regards
>
> Antoine.
Re: pass input args directly to kernel
Hi,

If you set `can_execute_chunkwise = false` on the kernel options, you should
see the whole chunked array.

Regards

Antoine.
pass input args directly to kernel
The current kernel framework divides inputs (e.g. arrays, chunked arrays)
into batches and feeds them to kernel code. Does it make sense to pass the
input args directly to the kernel?

I'm writing a quantile kernel, which needs to allocate a buffer to record all
inputs and select the nth element at the end. For a chunked array, input is
received chunk by chunk, so the kernel doesn't know the total buffer size to
allocate up front. It would be convenient if the raw chunked array input were
visible to the kernel. Or are there better ways to achieve this? Thanks.
Re: Python: Bad address when rewriting file
Hello Rares,

Is there a complete reproducer that we may try out?

Regards

Antoine.

Le 14/12/2020 à 06:52, Rares Vernica a écrit :
> Hello,
>
> As part of a test, I'm reading a record batch from an Arrow file,
> re-batching the data into smaller batches, and writing the result back to
> the same file. I'm getting an unexpected Bad address error and I wonder
> what I am doing wrong?
>
> reader = pyarrow.open_stream(fn)
> tbl = reader.read_all()
>
> writer = pyarrow.ipc.RecordBatchStreamWriter(fn, tbl.schema)
> batches = tbl.to_batches(max_chunksize=200)
> writer.write_table(pyarrow.Table.from_batches(batches))
> writer.close()
>
> Traceback (most recent call last):
>   File "tests/foo.py", line 10, in <module>
>     writer.write_table(pyarrow.Table.from_batches(batches))
>   File "pyarrow/ipc.pxi", line 237, in
>     pyarrow.lib._CRecordBatchWriter.write_table
>   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
> OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
> address
>
> Do I need to "close" the reader or open the writer differently?
>
> I'm using PyArrow 0.16.0 and Python 3.8.2.
>
> Thank you!
> Rares