Hi Michael,

How much data are you sending over the wire to JavaScript? For grouping,
you might be able to send the original ungrouped table as an Arrow buffer
and use a new JavaScript library from the University of Washington called
arquero (https://github.com/uwdata/arquero), which natively supports Arrow.
Here are a few Observable notebooks to get you started:

https://observablehq.com/@uwdata/arquero-roll-up-and-drill-down
https://observablehq.com/@uwdata/arquero-and-apache-arrow

Regards,
Naveen


On Wed, Mar 3, 2021 at 2:00 PM Michael Lavina <michael.lav...@factset.com>
wrote:

> Hey Weston,
>
> Do you have any public code examples I could take a look at? This does
> sound very related to what I am doing.
>
> One particular question I have related to grouping is how you define
> row-grouping. Column grouping is fairly simple, I think: you can just
> define a Struct that tells you how columns of data are grouped. But how
> would you go about grouping rows of data? For example:
>
> User Table
>
> First Name | Last Name | Country | State | City | Occupation
>
> // some data
>
> I have thought of basically two ways to do this. One is to send a metadata
> array, i.e. a groupBy such as [country, state, city], that denotes how the
> data should be grouped; the algorithm itself is simple. But then you would
> need to store some mapping from a given rowIndex to its child rows based on
> that algorithm, and I think this would require all the data to be
> available to do the grouping.
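
A minimal sketch of that first approach, assuming plain row objects rather than Arrow vectors. The `UserRow` shape, the `buildGroupIndex` helper, and the "/"-joined path key are all made up for illustration; none of them come from Arrow JS or arquero. It also shows why the whole table has to be available: the index is built in one pass over every row.

```typescript
// Hypothetical row shape mirroring the User Table columns in the question.
type UserRow = {
  firstName: string;
  lastName: string;
  country: string;
  state: string;
  city: string;
  occupation: string;
};

type GroupKey = "country" | "state" | "city";

// Maps a group path such as "US/NY" to the indices of the rows under it.
// Every prefix of the groupBy list gets its own entry, so any level can be
// expanded without rescanning the table. Assumes values never contain "/".
function buildGroupIndex(
  rows: UserRow[],
  groupBy: GroupKey[],
): Map<string, number[]> {
  const index = new Map<string, number[]>();
  rows.forEach((row, rowIndex) => {
    const path: string[] = [];
    for (const key of groupBy) {
      path.push(row[key]);
      const id = path.join("/");
      const members = index.get(id) ?? [];
      members.push(rowIndex);
      index.set(id, members);
    }
  });
  return index;
}

// Made-up sample data.
const users: UserRow[] = [
  { firstName: "Ada", lastName: "Lee", country: "US", state: "NY", city: "New York", occupation: "Engineer" },
  { firstName: "Bo", lastName: "Kim", country: "US", state: "NY", city: "Albany", occupation: "Analyst" },
  { firstName: "Cy", lastName: "Roy", country: "CA", state: "ON", city: "Toronto", occupation: "Designer" },
  { firstName: "Di", lastName: "Fox", country: "US", state: "CA", city: "San Francisco", occupation: "Writer" },
];

const groupIndex = buildGroupIndex(users, ["country", "state", "city"]);
```

Looking up `groupIndex.get("US/NY")` then gives the row indices to show when that state is expanded.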
>
> The other way is defining the structure of the data maybe something like
> (this could be entirely wrong I am new to Arrow sorry)
> list<struct<country, list<struct<state, list<struct<city,
> list<struct<firstName, lastName, occupation>>>>>>>>
>
> but basically the idea would be if you were to retrieve the data for a
> given index of let’s say a state it would return all the cities and vectors
> of data related to that given state.
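
For comparison, the nested layout can be mirrored in plain TypeScript types to see what "retrieve by index" would look like. This is only a sketch: the child field names `states`, `cities`, and `people` are assumptions (the struct above does not name them), and `citiesForState` is a hypothetical helper, not an Arrow JS call.

```typescript
// Hypothetical mirror of
// list<struct<country, list<struct<state, list<struct<city, list<struct<...>>>>>>>>
type Person = { firstName: string; lastName: string; occupation: string };
type CityGroup = { city: string; people: Person[] };
type StateGroup = { state: string; cities: CityGroup[] };
type CountryGroup = { country: string; states: StateGroup[] };

// Made-up sample data, already grouped by the producer.
const nested: CountryGroup[] = [
  {
    country: "US",
    states: [
      {
        state: "NY",
        cities: [
          { city: "New York", people: [{ firstName: "Ada", lastName: "Lee", occupation: "Engineer" }] },
          { city: "Albany", people: [{ firstName: "Bo", lastName: "Kim", occupation: "Analyst" }] },
        ],
      },
    ],
  },
];

// Returns the cities (with their people) under one state, addressed by
// position. An out-of-range index yields an empty list.
function citiesForState(
  data: CountryGroup[],
  countryIndex: number,
  stateIndex: number,
): CityGroup[] {
  const state = data[countryIndex]?.states[stateIndex];
  return state ? state.cities : [];
}

const nyCities = citiesForState(nested, 0, 0).map((c) => c.city);
```

The trade-off against a flat table plus a groupBy list is that the grouping is fixed by whoever produced the data; regrouping on the client means rebuilding the structure.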
>
> I also don’t know if this is a limitation of my understanding of Arrow or
> of the Arrow JS library; this might be something very easy that I am just
> not seeing.
>
> -Michael
>
> From: Weston Pace <weston.p...@gmail.com>
> Date: Friday, February 26, 2021 at 9:34 PM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Cc: Michael Lavina <michael.lav...@factset.com>
> Subject: Re: [JS] Exploring usage of apache arrow at my company for
> complex table rendering
>
> I used Arrow for this purpose in the past.  I don't have much to add
> but just a few thoughts off the top of my head...
>
> * The line between data and metadata can be blurry - For most
> measurements we were able to store the "expected distribution" as
> metadata (e.g. this measurement should have an expected value of 10
> +/- 3) and that could be used for drawing limit lines.  For some
> measurements however the common practice in place was to store the
> upper/lower limit as separate columns because they often changed
> depending on the various independent variables.  In that case the same
> "concept" (limit) might be stored in data or metadata.
>
> * Distinction between "data" and a "chart" - For us, we introduced a
> separate representation called the "chart" between the data and the
> rendering layer.  So using that limit line example before if we wanted
> to plot a histogram of some column then we would create a bar chart
> from the column.  This bar chart itself was also an array of numbers
> but, since these arrays were much smaller (one per bin, hard limit to
> bin count in the thousands based on # of pixels in display), and the
> structure was much more deeply nested, we ended up just using JSON for
> charts.  The "limit" metadata belonged to the data and it was
> translated into a vertical line element as part of the chart.
>
> * Processing layer - For us it was too expensive to send the data
> across the Internet for display.  So the conversion from data -> chart
> happened within the datacenter close to the actual data.  The JS UI was
> simply responsible for chart -> pixels (well, SVG).  It sounds like
> you plan on doing the processing in JS.  This can work, I'm just
> tossing out alternatives to think about.  You can even have a hybrid
> model where some initial filtering happens in the datacenter and then
> chart calculation / rendering happens in JS.
>
> * Expressions for group/split - Arrow expressions / compute are
> starting to become available (and more work is being done on in-arrow
> query engines).  These can be very helpful for things like grouping or
> splitting.  For example, if you want to plot two line charts, one for
> model X and one for model Y then you can define your split using
> expressions.  Unfortunately, these are pretty big features and I don't
> think they are in the JS library.  However, the existing C++/Rust work
> could serve as examples for how you might want to tackle this.  You
> will need a fair amount of compute to go from data to chart
> (histograms, averages, standard deviations, etc.).  In my case I used
> pandas pretty extensively for this since the Arrow compute features
> didn't exist yet.  There are some JS libraries for this (e.g. d3) so
> you can probably investigate that avenue as well.
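
To make the data -> chart step concrete, here is a rough sketch in plain TypeScript with no libraries. The `BarChart` and `ExpectedDistribution` shapes and the `toBarChart`, `mean`, and `stdDev` helpers are invented for illustration and are not from d3 or Arrow. The idea is that the limit metadata ("10 +/- 3") travels with the data and becomes vertical line positions in a small, JSON-friendly chart object.

```typescript
// Hypothetical shapes: limit metadata on the data side, chart on the UI side.
type ExpectedDistribution = { expected: number; tolerance: number };
type BarChart = { binEdges: number[]; counts: number[]; limitLines: number[] };

// Bins the values into equal-width bins and turns the limit metadata into
// the x positions of two vertical lines.
function toBarChart(
  values: number[],
  binCount: number,
  limit: ExpectedDistribution,
): BarChart {
  if (values.length === 0 || binCount < 1) {
    throw new Error("need at least one value and one bin");
  }
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Fall back to a width of 1 when every value is identical.
  const width = (max - min) / binCount || 1;
  const counts = new Array<number>(binCount).fill(0);
  for (const v of values) {
    // The maximum value would land one past the end, so clamp it.
    const bin = Math.min(binCount - 1, Math.floor((v - min) / width));
    counts[bin] += 1;
  }
  const binEdges: number[] = [];
  for (let i = 0; i <= binCount; i++) {
    binEdges.push(min + i * width);
  }
  return {
    binEdges,
    counts,
    limitLines: [limit.expected - limit.tolerance, limit.expected + limit.tolerance],
  };
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation.
function stdDev(values: number[]): number {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Made-up measurements with an expected value of 10 +/- 3.
const measurements = [7, 9, 10, 10, 11, 13];
const chart = toBarChart(measurements, 3, { expected: 10, tolerance: 3 });
const chartJson = JSON.stringify(chart);
```

The resulting object has one count per bin regardless of how many rows went in, which is why shipping it as JSON is cheap.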
>
> On Fri, Feb 26, 2021 at 12:05 PM Paul Taylor <ptay...@apache.org> wrote:
> >
> > Hi Michael,
> >
> > The answer to your question about metadata will likely be
> > application-specific.
> >
> > For small amounts of metadata (i.e. communicating a bounding box of
> > included geometry), there isn't much room for optimization, so a string
> > could be fine.
> >
> > For larger amounts of metadata (or other constraints, like if the metadata
> > needs to be constantly modified independent of the data), custom encodings
> > or a second service and/or arrow table of the metadata could be the way to
> > go.
> >
> > The metadata keys/values are UTF-8 strings, so nothing should prevent you
> > from stuffing a base64-encoded protobuf in there.
> >
> > As for whether the library is maintained -- yes it is, but lately I've only
> > had time to work on bug fixes or features required to maintain parity with
> > the spec and other libs.
> >
> > I will be using Arrow JS in my work again soon, and that could justify more
> > "quality of life" improvements again, but without other maintainers jumping
> > in to contribute or needing it for my work, those things don't get done.
> >
> > I'd be happy to do a call with you or your team to give a short overview
> > and introduction to the JS lib. You can also email me directly or in the
> > #arrow-js channel on the-asf.slack.com with any questions.
> >
> > Best,
> > Paul
> >
> > On Fri, Feb 26, 2021 at 1:47 PM Michael Lavina <michael.lav...@factset.com>
> > wrote:
> >
> > > Hey Neal,
> > >
> > > Thanks for the response and I am glad I am using this correctly. I have
> > > never really used email servers so hopefully this works.
> > >
> > > That’s exactly what I was thinking of doing: creating a standard
> > > metadata schema built on top of Apache Arrow with some predefined user
> > > types.
> > >
> > > I guess I was just wondering if I was trying to use a screwdriver as a
> > > hammer. It can work because we are using the metadata and that could be
> > > anything, but maybe, like you said, we should be creating a separate
> > > standard entirely for defining the schema to render tables instead of
> > > defining it within Arrow.
> > >
> > > Does it defeat the value of Arrow if we are sending the data using
> > > buffers and streams alongside a giant string of stringified metadata,
> > > when I could maybe define the metadata in protobuf binary separately?
> > >
> > > In addition, I was curious: with all these visualization tools, has
> > > someone already developed a standard metadata for Arrow to help with
> > > rendering? Stuff like how to denote grouping of data, relationships
> > > between columns, and hidden information.
> > >
> > > -Michael
> > >
> > > From: Neal Richardson <neal.p.richard...@gmail.com>
> > > Date: Friday, February 26, 2021 at 1:38 PM
> > > To: dev <dev@arrow.apache.org>
> > > Subject: Re: [JS] Exploring usage of apache arrow at my company for
> > > complex table rendering
> > >
> > > The Arrow IPC specification allows for custom metadata in both the Schema
> > > and the individual Fields:
> > >
> > > https://arrow.apache.org/docs/format/Columnar.html#schema-message
> > >
> > > Might that work for you? Another alternative would be to track your
> > > metadata in a separate object outside of the Arrow data.
> > >
> > > Neal
> > >
> > > On Fri, Feb 26, 2021 at 5:02 AM Michael Lavina <michael.lav...@factset.com>
> > > wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > Some background. My name is Michael and I work at FactSet, which if you
> > > > use Arrow you may have heard of because one of our architects did a talk
> > > > on using Arrow and Dremio.
> > > >
> > > > https://hello.dremio.com/eliminate-data-transfer-bottlenecks-with-apache-arrow-flight.html?utm_medium=social-free&utm_source=linkedin&utm_term=na&utm_content=na&utm_campaign=eliminate-data-transfer-bottlenecks-with-apache-arrow-flight
> > > >
> > > > His team has decided to use Arrow as a tabular data interchange format.
> > > > Other teams are doing other things. We are working on standardizing our
> > > > tabular data interchange format at our company.
> > > >
> > > > We have our own open-sourced columnar based schema defined in protobuf.
> > > >
> > > > https://github.com/factset/stachschema
> > > >
> > > > We looked into Apache Arrow a few years ago, but decided not to use it
> > > > as it was not mature enough at the time and we had two specific
> > > > requirements:
> > > >
> > > > 1) We needed this data not just for analytics but for rendering as well,
> > > > and rendering requires a lot more complicated information, such as
> > > > understanding the type of data and the relationships between data, i.e.
> > > > grouping.
> > > >
> > > > 2) We need SDKs that support TypeScript/JavaScript, both browser and
> > > > Node, and that support both creating and consuming Arrow.
> > > >
> > > > Now that Apache Arrow is more mature and stabilized, i.e. the schema and
> > > > SDKs are post 1.x, we are looking into it again.
> > > >
> > > >    1. We are thinking of defining specific metadata, in a similar way to
> > > >    what we do for STACH, that lets us define some rendering specifics,
> > > >    e.g. adding a metadata entry to a Field Schema called isHidden to
> > > >    denote whether we should render the data column or not.
> > > >    2. It seems like there is a well developed JavaScript SDK that we can
> > > >    use. I am still reading the source code and the Observable articles to
> > > >    truly understand how it works.
> > > >       1. I read in one of the issues that the JS library might be out of
> > > >       sync, so do people know how actively that repo is maintained?
> > > >       2. If there needs to be work done, I think we would be able to help
> > > >       if we had some help getting started with understanding that repo.
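
As a rough illustration of the isHidden idea in item 1: since metadata keys and values are strings, a renderer could filter columns on a string flag. The `FieldLike` type below is a hypothetical stand-in for a schema field (a name plus string-to-string metadata), not the Arrow JS `Field` class, and the "true" convention for the value is an assumption.

```typescript
// Hypothetical stand-in for a schema field: a name plus string metadata.
type FieldLike = { name: string; metadata: Map<string, string> };

// Keeps every column unless its metadata carries isHidden = "true".
// Fields with no isHidden entry are treated as visible.
function visibleColumns(fields: FieldLike[]): string[] {
  return fields
    .filter((field) => field.metadata.get("isHidden") !== "true")
    .map((field) => field.name);
}

// Made-up fields: personId is shipped with the data but not rendered.
const exampleFields: FieldLike[] = [
  { name: "personId", metadata: new Map<string, string>([["isHidden", "true"]]) },
  { name: "firstName", metadata: new Map<string, string>() },
  { name: "lastName", metadata: new Map<string, string>([["isHidden", "false"]]) },
];

const columnsToRender = visibleColumns(exampleFields);
```

The hidden column stays available in the data, so it can still be used for follow-up lookups even though it is never drawn.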
> > > >
> > > > If possible we would be interested to continue to chat about the above
> > > > ideas, get more information about whether Apache Arrow is right for the
> > > > job, and hear if there is already discussion of other people using Arrow
> > > > for rendering in addition to analytics.
> > > >
> > > > To clarify what I mean by existing render technologies: I know stuff like
> > > > Falcon and Perspective exist, but those seem to be for basic rendering of
> > > > simple tables. I mean to create a superset of Arrow by defining metadata
> > > > that allows for complex nested headers and nested rows. Something like
> > > > the image below. Then you can imagine even more data attached, such as
> > > > describing the data and relationships to other data on the page. You can
> > > > imagine in the dataset there is some `personId` that is set to not be
> > > > rendered. This personId can then be used to gather more information in
> > > > another API call if you wanted to render a tooltip with maybe some bio
> > > > information. In short, rendered tables require a lot more information
> > > > than just the data. Does it make sense to build this upon Arrow?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > -Thanks
> > > >
> > > > Michael
> > > >
> > > >
> > > >
> > >
>


-- 
-----------------------------------
Naveen Michaud-Agrawal
