Re: [GSoC 2015][COMDEV-119] Zeppelin GSoC Project: add more D3 visualization

madhuka udantha Sun, 22 Mar 2015 22:16:30 -0700

Hi, moon

Yes, since

> "Moving computation is cheaper than moving data"

we can do the computation in the computing framework.

Simple pivot changes or filtering can be handled in local storage with
indexed databases, depending on the current user level.
As you said, computations will be handled in the back ends.

Great to hear about building a rich GUI; I will share my chart library
ideas there.

Your ideas are always welcome; they will be helpful for my task and draft
proposal.

Thanks

On Mon, Mar 23, 2015 at 7:59 AM, moon soo Lee <m...@apache.org> wrote:

> Hi, madhuka udantha
>
> I think your idea about the chart library and data transformation engine
> sounds cool. For the data transform modules, it's a good idea to make them
> pluggable into the data transform engine. But I'm not sure that getting the
> result locally and transforming it for pivot or filtering, to avoid running
> the query again, is a good idea.
> This is because Zeppelin is trying (though it is not limited to this) to
> build an analytical environment on top of distributed computing frameworks
> like Spark, Flink, Ignite, etc. Most of the distributed computing frameworks
> Zeppelin is trying to integrate follow the same paradigm: "Moving
> computation is cheaper than moving data". In this setting, the data the
> transform engine needs to handle can easily be multiple TB, which would take
> a long time to copy to a local machine and process. So I think the
> transform module should run on the underlying distributed computing
> framework.
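To put "multiple TB" in perspective, here is a back-of-the-envelope sketch. It assumes a dedicated 1 Gbit/s link, decimal terabytes and zero protocol overhead; these figures are illustrative assumptions, not numbers from the thread, and a real copy would be slower.

```python
def transfer_hours(size_bytes: float, link_bits_per_sec: float) -> float:
    """Lower bound on hours needed to copy size_bytes over a link (no overhead)."""
    seconds = size_bytes * 8 / link_bits_per_sec  # bytes -> bits, then divide by rate
    return seconds / 3600


TB = 10**12    # decimal terabyte, in bytes (assumption)
GBIT = 10**9   # 1 Gbit/s, in bits per second (assumption)

# Assumed scenario: 2 TB over a dedicated 1 Gbit/s link.
print(round(transfer_hours(2 * TB, GBIT), 2))  # 4.44
```

Even at 10 Gbit/s the same copy takes over 25 minutes before any processing starts, which is the cost the "move computation, not data" paradigm avoids.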
>
> And about the chart library, we have started a discussion thread about
> building a rich GUI inside the notebook. It might be related.
>
> Thanks,
> moon
>
>
>
> On Mon, Mar 23, 2015 at 2:27 AM madhuka udantha <madhukaudan...@gmail.com>
> wrote:
>
> > On Sun, Mar 22, 2015 at 7:05 PM, Corneau Damien <cornead...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Being able to aggregate on the query side is a great idea and would allow
> > > us to transfer less data as well as have a full query representation of
> > > the visualization.
> > >
> > > However, creating a SQL query dynamically is a pretty difficult task, and
> > > might be too much for that scope.
> > >
> > > Also, I see some possible problems with this method:
> > >  - Changing the pivot or simple filtering would mean running the query
> > >    again
> > >
> > No, the query won't run again.
> > In the first run of the query, data is collected and stored locally in
> > local storage [1] (using indexing techniques to make retrieval faster), so
> > changing the pivot or simple filtering will use the local storage.
> > If any attribute or data is missing from local storage, only that part will
> > be retrieved, which saves network bandwidth as well.
> > Does my explanation make sense?
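A minimal sketch of the local-storage idea described above. `LocalResultCache`, the fetcher callback and the column names are hypothetical, not a Zeppelin API. It assumes the backend returns the columns of one result in a stable row order, so columns fetched at different times line up; it builds an index on a column only when that column is first used as a filter, and asks the backend only for columns it has not cached yet.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
# Hypothetical backend call: given column names, returns {column: [values in row order]}.
Fetcher = Callable[[Sequence[str]], Dict[str, List[object]]]


class LocalResultCache:
    """Columnar local copy of one query result (illustrative, not a Zeppelin API)."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._columns: Dict[str, List[object]] = {}
        self._indexes: Dict[str, Dict[object, List[int]]] = {}

    def ensure(self, names: Sequence[str]) -> None:
        # Request only the columns that are not cached yet (deduplicated, order kept).
        missing = [n for n in dict.fromkeys(names) if n not in self._columns]
        if missing:
            self._columns.update(self._fetch(missing))

    def _index(self, name: str) -> Dict[object, List[int]]:
        # Build a value -> row positions index the first time a column is filtered on.
        if name not in self._indexes:
            idx: Dict[object, List[int]] = {}
            for pos, value in enumerate(self._columns[name]):
                idx.setdefault(value, []).append(pos)
            self._indexes[name] = idx
        return self._indexes[name]

    def filter(self, column: str, value: object, select: Sequence[str]) -> List[Row]:
        """Rows where column == value, projected to `select`, served from local storage."""
        self.ensure([column, *select])
        positions = self._index(column).get(value, [])
        return [{n: self._columns[n][p] for n in select} for p in positions]


# Usage with a fake backend that records which columns it was asked for.
data = {"country": ["LK", "US", "LK"], "year": [2014, 2014, 2015], "sales": [10, 20, 30]}
calls: List[List[str]] = []


def fake_fetch(names: Sequence[str]) -> Dict[str, List[object]]:
    calls.append(list(names))
    return {n: data[n] for n in names}


cache = LocalResultCache(fake_fetch)
print(cache.filter("country", "LK", ["year", "sales"]))  # [{'year': 2014, 'sales': 10}, {'year': 2015, 'sales': 30}]
print(cache.filter("year", 2014, ["sales"]))             # [{'sales': 10}, {'sales': 20}]
print(calls)                                             # [['country', 'year', 'sales']]
```

The second filter is answered entirely from the local copy, which is the behaviour described in the reply; whether this pays off at multi-TB scale is exactly the concern raised in moon's message.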
> >
> >
> >
> > >  - Being able to make pivot-style SQL queries would be really hard;
> > >    we would need multiple sub-queries or sometimes even multiple queries
> > >    (I tried a few times and could only get the wanted result with a
> > >    visualization-side pivot).
> > >    It would end up with really bad SQL queries, especially given the Hive
> > >    SQL or Spark SQL limitations, and would take way more time to process.
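For comparison, a visualization-side pivot over rows that are already on the client can be sketched as follows. The function name, the field names and the choice of summation as the aggregate are assumptions for illustration; (row, column) combinations with no input rows come out as `None`.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


def pivot(rows: Iterable[Mapping[str, object]], index: str, columns: str, values: str):
    """Pivot rows into a matrix: one row per `index` value, one column per `columns`
    value, cells holding the sum of `values` (None where no input row exists)."""
    cells: Dict[Tuple[object, object], float] = {}
    row_keys: Dict[object, None] = {}  # dicts used as insertion-ordered sets
    col_keys: Dict[object, None] = {}
    for r in rows:
        rk, ck = r[index], r[columns]
        row_keys[rk] = None
        col_keys[ck] = None
        cells[(rk, ck)] = cells.get((rk, ck), 0) + r[values]
    header = list(col_keys)
    table: List[List[Optional[float]]] = [[cells.get((rk, ck)) for ck in header] for rk in row_keys]
    return list(row_keys), header, table


rows = [
    {"country": "LK", "year": 2014, "sales": 10},
    {"country": "US", "year": 2014, "sales": 20},
    {"country": "LK", "year": 2015, "sales": 30},
    {"country": "LK", "year": 2015, "sales": 5},
]
print(pivot(rows, "country", "year", "sales"))
# (['LK', 'US'], [2014, 2015], [[10, 35], [20, None]])
```

Swapping `index` and `columns` re-pivots the same rows without another query, which is why the pivot is easy on the visualization side and hard to express in a single SQL statement.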
> > >
> > Agreed. I'm not planning to use pivot-style queries.
> >
> > Any suggestions?
> >
> >
> > Thanks.
> >
> >
> > > On Sun, Mar 22, 2015 at 10:08 PM, IT CTO <goi....@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > The chart library features sound promising.
> > > > As for the data engine, one thing that I think is missing is the ability
> > > > to use the visualization to drive the aggregation in the SQL. Today, you
> > > > first write the SQL, you execute it (*limited by the number of results
> > > > sent to the client*), and then you use the viz to understand the results.
> > > > Alternatively, if through the visualization I can generate a better SQL
> > > > query which returns an aggregated data set, then I can analyze a bigger
> > > > amount of data.
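One way to read this suggestion, sketched under assumptions: the keys and value selected in the chart are used to wrap the user's query in a single-level `GROUP BY`, so the backend returns one row per group instead of raw rows. The helper names, the backtick identifier quoting (Hive/Spark SQL style) and the `bank` table are hypothetical; pivot-style queries, which the thread describes as much harder, are out of scope. The base query is assumed to have no trailing semicolon.

```python
from typing import Sequence

AGGREGATES = {"sum", "avg", "min", "max", "count"}


def quote_ident(name: str) -> str:
    # Backtick-quoted identifier (Hive/Spark SQL style); embedded backticks are doubled.
    return "`" + name.replace("`", "``") + "`"


def aggregated_query(base_sql: str, keys: Sequence[str], value: str, agg: str = "sum") -> str:
    """Wrap the user's query so the backend returns one row per group, not raw rows."""
    agg = agg.lower()
    if agg not in AGGREGATES:
        raise ValueError(f"unsupported aggregate: {agg}")
    if not keys:
        raise ValueError("at least one grouping key is required")
    key_list = ", ".join(quote_ident(k) for k in keys)
    alias = quote_ident(f"{agg}_{value}")
    return (
        f"SELECT {key_list}, {agg.upper()}({quote_ident(value)}) AS {alias} "
        f"FROM ({base_sql}) t GROUP BY {key_list}"
    )


# Hypothetical chart selection: x-axis key "age", y-axis value "balance".
print(aggregated_query("SELECT * FROM bank", ["age"], "balance"))
# SELECT `age`, SUM(`balance`) AS `sum_balance` FROM (SELECT * FROM bank) t GROUP BY `age`
```

Because the aggregation runs in the backend, the result-size limit applies to the number of groups rather than the number of raw rows.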
> > > >
> > > > I hope I was clear enough in my explanation :-)
> > > >
> > > > Eran
> > > >
> > > >
> > > > On Fri, Mar 20, 2015 at 8:21 AM, madhuka udantha <madhukaudan...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Here are my proposed ideas.
> > > > > According to the COMDEV-119 JIRA, charts have been hard-coded until now,
> > > > > and a data transformation issue was highlighted, since different charts
> > > > > have different pivot fields, e.g. area charts, scatter charts, surface
> > > > > charts, bubble charts, radar charts, etc.
> > > > >
> > > > > To solve this, I am introducing two major components: one called the
> > > > > 'Chart library' and one called the 'Data transformation engine'. The
> > > > > chart library is where the currently plugged-in charts are shown. There
> > > > > we can plug in chart types, and those can be reused.
> > > > >
> > > > > *Chart library features*
> > > > >
> > > > >    - Users can select a chart from the library
> > > > >    - Charts are pluggable into the library
> > > > >    - Charts can be plugged in via a config file (JSON) or a UI wizard
> > > > >    - The configuration/meta file of a chart contains its interface, libs,
> > > > >      themes and data transformation types/mappings
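A minimal sketch of a pluggable chart library driven by a JSON meta file. The config keys (`name`, `pivotFields`, `libs`, `themes`, `transform`) and the class names are hypothetical stand-ins for the interface, libs, themes and transformation mapping listed above; in Zeppelin this would more likely live in the web front end than in Python.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChartSpec:
    name: str
    pivot_fields: List[str]                        # inputs the chart expects, e.g. x, y, size
    libs: List[str] = field(default_factory=list)  # JS libraries the chart needs
    themes: List[str] = field(default_factory=list)
    transform: str = "identity"                    # name of a data transformation module


class ChartLibrary:
    """Registry of pluggable charts, each described by a JSON meta file (hypothetical)."""

    REQUIRED = ("name", "pivotFields")

    def __init__(self) -> None:
        self._charts: Dict[str, ChartSpec] = {}

    def plug(self, config_json: str) -> ChartSpec:
        raw = json.loads(config_json)
        for key in self.REQUIRED:
            if key not in raw:
                raise ValueError(f"chart config is missing '{key}'")
        spec = ChartSpec(
            name=raw["name"],
            pivot_fields=list(raw["pivotFields"]),
            libs=list(raw.get("libs", [])),
            themes=list(raw.get("themes", [])),
            transform=raw.get("transform", "identity"),
        )
        self._charts[spec.name] = spec  # re-plugging a name replaces the old spec
        return spec

    def names(self) -> List[str]:
        return sorted(self._charts)

    def get(self, name: str) -> ChartSpec:
        return self._charts[name]


library = ChartLibrary()
library.plug('{"name": "bubble", "pivotFields": ["x", "y", "size"], "libs": ["d3"]}')
library.plug('{"name": "radar", "pivotFields": ["axis", "value"]}')
print(library.names())                     # ['bubble', 'radar']
print(library.get("bubble").pivot_fields)  # ['x', 'y', 'size']
```

Validating required keys at plug time keeps a malformed meta file from reaching the chart selector.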
> > > > >
> > > > >
> > > > >
> > > > > *Data Transformation Engine*
> > > > > The 'Data transformation engine' contains data transformation modules.
> > > > > Those modules are also pluggable into the engine, and they are connected
> > > > > to charts. The data transformation engine sits between the data (SQL)
> > > > > and the chart, so it converts the data and maps it to each chart's pivot
> > > > > fields.
> > > > >
> > > > >    - This module will look at the pivot fields of the chart
> > > > >    - Selected attributes of the SQL query
> > > > >    - Attribute value operations (string split, value aggregation, number
> > > > >      rounding)
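The transformation engine can be sketched as a registry of module factories, each returning a function from rows to rows; a chart then declares a pipeline of steps ending in a mapping from SQL attributes to its pivot fields. All names and parameters here are hypothetical and only illustrate the pluggable shape (string split, value aggregation, number rounding, pivot-field mapping), not where the computation should run.

```python
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


class TransformEngine:
    """Registry of pluggable transformation modules (illustrative design)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[..., Transform]] = {}

    def register(self, name: str, factory: Callable[..., Transform]) -> None:
        self._modules[name] = factory

    def run(self, rows: List[Row], steps: Sequence[Tuple[str, dict]]) -> List[Row]:
        # Each step names a module and its parameters; modules are applied in order.
        for name, params in steps:
            rows = self._modules[name](**params)(rows)
        return rows


def split_field(field: str, sep: str, into: Sequence[str]) -> Transform:
    """String split: replace `field` with len(into) new attributes."""
    def apply(rows: List[Row]) -> List[Row]:
        out = []
        for r in rows:
            parts = str(r[field]).split(sep, len(into) - 1)
            new = {k: v for k, v in r.items() if k != field}
            new.update(zip(into, parts))
            out.append(new)
        return out
    return apply


def sum_by(key: str, value: str) -> Transform:
    """Value aggregation: total `value` per distinct `key`, in first-seen order."""
    def apply(rows: List[Row]) -> List[Row]:
        totals: Dict[object, float] = {}
        for r in rows:
            totals[r[key]] = totals.get(r[key], 0) + r[value]
        return [{key: k, value: v} for k, v in totals.items()]
    return apply


def round_field(field: str, ndigits: int = 0) -> Transform:
    """Number rounding for one attribute; input rows are not mutated."""
    def apply(rows: List[Row]) -> List[Row]:
        return [{**r, field: round(r[field], ndigits)} for r in rows]
    return apply


def to_pivot_fields(mapping: Mapping[str, str]) -> Transform:
    """Map selected SQL attributes onto chart pivot fields ({pivot_field: attribute})."""
    def apply(rows: List[Row]) -> List[Row]:
        return [{pf: r[attr] for pf, attr in mapping.items()} for r in rows]
    return apply


engine = TransformEngine()
for module_name, factory in [("split", split_field), ("sum_by", sum_by),
                             ("round", round_field), ("pivot", to_pivot_fields)]:
    engine.register(module_name, factory)

rows = [{"date": "2015-03", "amount": 10.26},
        {"date": "2015-04", "amount": 5.5},
        {"date": "2014-12", "amount": 1.0}]
steps = [
    ("split", {"field": "date", "sep": "-", "into": ["year", "month"]}),
    ("sum_by", {"key": "year", "value": "amount"}),
    ("round", {"field": "amount", "ndigits": 1}),
    ("pivot", {"mapping": {"x": "year", "y": "amount"}}),
]
print(engine.run(rows, steps))
# [{'x': '2015', 'y': 15.8}, {'x': '2014', 'y': 1.0}]
```

Keeping modules as factories lets a chart's JSON meta file name a step and pass plain parameters, so new modules can be plugged in without touching the engine.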
> > > > >
> > > > >
> > > > > Another improvement that I noticed:
> > > > >
> > > > >    - Query editor auto-completion support (with Ctrl+Space)
> > > > >
> > > > >
> > > > > Your ideas are welcome here
> > > > > Thanks
> > > > >
> > > > > On Fri, Mar 20, 2015 at 10:57 AM, madhuka udantha <madhukaudan...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I'm Udantha, an MSc student at the University of Moratuwa. This GSoC
> > > > > > 2015 project, COMDEV-119, captures my interest.
> > > > > >
> > > > > > I have extensive experience with visualization techniques, having
> > > > > > created numerous dashboards [1,2] with JavaScript, HTML5, AngularJS,
> > > > > > D3 charting, etc.
> > > > > >
> > > > > > My current research area is big data, where I have worked with
> > > > > > various types of data sets. I'm also working with cluster
> > > > > > representation and classification techniques, where visualization
> > > > > > makes up a considerable part. I have been following COMDEV-119 (JIRA)
> > > > > > with Alexander Bezzubov and CORNEAU Damien for more than a week.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] http://wso2.com/products/user-engagement-server/
> > > > > > [2] https://github.com/wso2/jaggery
> > > > > > --
> > > > > > Cheers,
> > > > > > Madhuka Udantha
> > > > > > http://madhukaudantha.blogspot.com
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Eran | CTO
> > > >
> > >
> >
> >
> >
> >
>



-- 
Cheers,
Madhuka Udantha
http://madhukaudantha.blogspot.com
