Re: [DISCUSS] Updating Arrow's "elevator pitch" on web properties

Wes McKinney Sat, 21 Oct 2017 13:34:19 -0700

Thanks Julian, I like the changes.

For the last part I agree listing languages is good; we would do well
to include JavaScript and Ruby in that list. Hopefully the list will
keep growing longer!


On Sat, Oct 21, 2017 at 4:20 PM, Julian Hyde <jh...@apache.org> wrote:
> Your proposed version is definitely an improvement.
>
>> "Apache Arrow is a cross-language development platform for in-memory
>> structured data access and analytics. It specifies a standardized
>> language-independent columnar memory format for flat and hierarchical
>> data, with support for zero-copy streaming messaging and interprocess
>> communication. It also provides computational libraries for efficient
>> in-memory analytics on modern hardware."
>
> I propose a few tweaks:
>
> Simplify sentence 1 to
>
>   Apache Arrow is a cross-language development platform for in-memory
>   data.
>
> This is easier to parse, captures the gist, and the other parts are covered
> in later sentences.
>
> To me, the cache-efficient format is more fundamentally important than
> streaming and IPC (you can build the latter). Therefore I’d change
> sentence 2 to
>
>   It specifies a standardized language-independent columnar memory
>   format for flat and hierarchical data, organized for efficient analytic
>   operations on modern hardware.
>
> Which leaves sentence 3 as
>
>   It also provides computational libraries for zero-copy streaming
>   messaging and interprocess communication.
>
> And add sentence 4,
>
>   Languages supported include C and C++, Java, and Python.
>
> Julian
>
>> On Oct 21, 2017, at 10:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> I believe we would benefit from modified language to describe the
>> nature and scope of the Arrow project.
>>
>> Currently, our GitHub project description (and what we use in release
>> announcements) states:
>>
>> "Apache Arrow is a columnar in-memory analytics layer designed to
>> accelerate big data. It houses a set of canonical in-memory
>> representations of flat and hierarchical data along with multiple
>> language-bindings for structure manipulation. It also provides IPC and
>> common algorithm implementations."
>>
>> I think this could perhaps be restated in the following way:
>>
>> "Apache Arrow is a cross-language development platform for in-memory
>> structured data access and analytics. It specifies a standardized
>> language-independent columnar memory format for flat and hierarchical
>> data, with support for zero-copy streaming messaging and interprocess
>> communication. It also provides computational libraries for efficient
>> in-memory analytics on modern hardware."
>>
>> It is true that we have been mostly focused on hardening the details
>> of the Arrow format and related issues around messaging and IPC, which
>> are necessary for everything else we may contemplate building in the
>> future. Since I plan to be building a library of computational tools
>> in C++ for the native code community (Python, Ruby, R, etc.), I think
>> it would be a good idea to clearly state that building general purpose
>> analytics implementations (i.e. the sorts of things you find in "data
>> frame libraries" like pandas) is part of the mission of the project.
>>
>> Feedback on the above, and on how we could do a better job
>> representing our past, present, and future community goals, would be
>> appreciated.
>>
>> Thanks
>> Wes
>
