Thanks Julian, I like the changes. For the last part I agree listing languages is good; we would do well to include JavaScript and Ruby in that list. Hopefully the list will keep growing longer!
On Sat, Oct 21, 2017 at 4:20 PM, Julian Hyde <[email protected]> wrote:
> Your proposed version is definitely an improvement.
>
>> "Apache Arrow is a cross-language development platform for in-memory
>> structured data access and analytics. It specifies a standardized
>> language-independent columnar memory format for flat and hierarchical
>> data, with support for zero-copy streaming messaging and interprocess
>> communication. It also provides computational libraries for efficient
>> in-memory analytics on modern hardware.”
>
> I propose a few tweaks:
>
> Simplify sentence 1 to
>
> Apache Arrow is a cross-language development platform for in-memory
> data.
>
> This is easier to parse, captures the gist, and the other parts are covered
> in later sentences.
>
> To me, the cache-efficient format is more fundamental important than
> streaming and IPC (you can build the latter). Therefore I’d change
> sentence 2 to
>
> It specifies a standardized language-independent columnar memory
> format for flat and hierarchical data, organized for efficient analytic
> operations on modern hardware.
>
> Which leaves sentence 3 as
>
> It also provides computational libraries for zero-copy streaming
> messaging and interprocess communication.
>
> And add sentence 4,
>
> Languages supported include C and C++, Java, and Python.
>
> Julian
>
>> On Oct 21, 2017, at 10:58 AM, Wes McKinney <[email protected]> wrote:
>>
>> I believe we would benefit from modified language to describe the
>> nature and scope of the Arrow project.
>>
>> Currently, our GitHub project description (and what we use in release
>> announcements) states:
>>
>> "Apache Arrow is a columnar in-memory analytics layer designed to
>> accelerate big data. It houses a set of canonical in-memory
>> representations of flat and hierarchical data along with multiple
>> language-bindings for structure manipulation. It also provides IPC and
>> common algorithm implementations."
>>
>> I think this could be perhaps restated in the following way:
>>
>> "Apache Arrow is a cross-language development platform for in-memory
>> structured data access and analytics. It specifies a standardized
>> language-independent columnar memory format for flat and hierarchical
>> data, with support for zero-copy streaming messaging and interprocess
>> communication. It also provides computational libraries for efficient
>> in-memory analytics on modern hardware."
>>
>> It is true that we have been mostly focused on hardening the details
>> of the Arrow format and related issues around messaging and IPC, which
>> are necessary for everything else we may contemplate building in the
>> future. Since I plan to be building a library of computational tools
>> in C++ for the native code community (Python, Ruby, R, etc.), I think
>> it would be a good idea to clearly state that building general purpose
>> analytics implementations (i.e. the sorts of things you find in "data
>> frame libraries" like pandas) is part of the mission of the project.
>>
>> Feedback on the above would be appreciated how we could do a better
>> job representing our past, present, and future community goals.
>>
>> Thanks
>> Wes
>
