[jira] [Commented] (ARROW-5069) [C++] Implement direct support for shared memory arrow columns

Rok Mihevc (Jira) Tue, 10 Jan 2023 23:48:42 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662091#comment-17662091
 ]


Rok Mihevc commented on ARROW-5069:
-----------------------------------

This issue has been migrated to [issue 
#21559|https://github.com/apache/arrow/issues/21559] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Implement direct support for shared memory arrow columns
> --------------------------------------------------------------
>
>                 Key: ARROW-5069
>                 URL: https://issues.apache.org/jira/browse/ARROW-5069
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>         Environment: Linux
>            Reporter: Dimitris Lekkas
>            Priority: Major
>              Labels: perfomance, proposal
>
> I think an option to memory-map columns into shared memory would be 
> valuable. The option would be triggered when specific metadata is supplied. 
> Since many Arrow-backed data frames are used for machine learning, we could 
> benefit from treating the data that will be fed to GPUs/FPGAs (most likely 
> the columns' data buffers) differently. To enable this change we would need 
> to address the following issues:
> First, each column needs to hold an integer value representing its 
> associated file descriptor. The application developer could retrieve the 
> file name from the file descriptor (e.g. on Linux by resolving 
> /proc/self/fd/<fd>; fstat alone returns metadata, not the name) and tell 
> another application to reference that file, or tell an FPGA to DMA that 
> memory area.
> We also need to support a configurable buffer alignment (restricted to 
> powers of 2, of course) when calling arrow::AllocateBuffer(). In the current 
> implementation the alignment is fixed at 64 bytes, and changing that value 
> requires recompilation [1].
> To justify the above suggestion: major FPGA vendors (e.g. Xilinx) benefit 
> heavily from page-aligned buffers, since their device memory uses a 4 KB 
> page size [2]. In particular, Xilinx warns users who attempt to memcpy a 
> non-page-aligned buffer from CPU memory to the FPGA's memory [3].
> Wouldn't it be nice if we could call from_pandas() and then have our columns 
> memory-mapped to shared memory, so that FPGAs could DMA that memory and 
> accelerate the workload? If there is already a workaround to achieve that, 
> I would like more info on it.
> I am open to discussing any suggestions, improvements, or concerns.
>  
> [1]: 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L40]
> [2]: 
> [https://forums.xilinx.com/t5/SDAccel/memory-alignment-when-allocating-emmory-in-SDAccel/td-p/887593]
> [3]: [https://forums.aws.amazon.com/thread.jspa?messageID=884615&tstart=0]
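
The file-descriptor idea in the description above could be sketched roughly as follows. This is a minimal Linux-only illustration, not Arrow code: SharedColumn, MapColumnFile, PathFromFd and ReleaseColumn are hypothetical names, and the sketch assumes the column's data buffer is backed by a file the caller names (a path under /dev/shm would give shared memory). mmap() returns page-aligned addresses, so the mapping also meets the page-alignment requirement discussed in the issue.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Hypothetical holder for one column's data buffer (not an Arrow type).
struct SharedColumn {
  int fd = -1;            // descriptor of the backing file
  void* data = nullptr;   // start of the mapping (page-aligned by mmap)
  std::size_t size = 0;   // mapping length in bytes
};

// Create (or open) the backing file, size it, and map it MAP_SHARED so
// that other processes mapping the same file see the same bytes.
// Returns a SharedColumn with fd == -1 on failure.
SharedColumn MapColumnFile(const std::string& path, std::size_t size) {
  SharedColumn col;
  int fd = open(path.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd < 0) return col;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return col;
  }
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    close(fd);
    return col;
  }
  col.fd = fd;
  col.data = p;
  col.size = size;
  return col;
}

// Linux-specific: recover the path behind a descriptor by reading the
// /proc/self/fd/<fd> symlink. Returns an empty string on failure.
std::string PathFromFd(int fd) {
  const std::string link = "/proc/self/fd/" + std::to_string(fd);
  char buf[4096];
  ssize_t n = readlink(link.c_str(), buf, sizeof(buf) - 1);
  if (n < 0) return std::string();
  return std::string(buf, static_cast<std::size_t>(n));
}

// Unmap the buffer and close the descriptor; the file itself is left in
// place so that another process (or a device driver) can still open it.
void ReleaseColumn(SharedColumn& col) {
  if (col.data != nullptr) munmap(col.data, col.size);
  if (col.fd >= 0) close(col.fd);
  col = SharedColumn{};
}
```

A producer would call MapColumnFile() with a path of its choosing, fill col.data, and pass PathFromFd(col.fd) to the consumer, which maps the same file on its side.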

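The request for a runtime-selectable, power-of-two alignment can be illustrated without touching Arrow itself. The sketch below uses plain POSIX posix_memalign(); IsPowerOfTwo and AllocateAligned are hypothetical helpers, not part of Arrow's API, and the 4096-byte value in the usage note is only the page size mentioned in the issue.

```cpp
#include <stdlib.h>

#include <cstddef>

// True when `alignment` is a non-zero power of two.
bool IsPowerOfTwo(std::size_t alignment) {
  return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

// Hypothetical helper: allocate `size` bytes aligned to `alignment`.
// posix_memalign() requires a power of two that is also a multiple of
// sizeof(void*); anything else is rejected up front.
// Returns nullptr on invalid alignment or allocation failure.
// Release the memory with free().
void* AllocateAligned(std::size_t size, std::size_t alignment) {
  if (!IsPowerOfTwo(alignment) || alignment < sizeof(void*)) return nullptr;
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) return nullptr;
  return ptr;
}
```

Calling AllocateAligned(n, 4096) yields a page-aligned buffer on systems with 4 KB pages, whereas AllocateAligned(n, 96) returns nullptr because 96 is not a power of two.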


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
