[GitHub] [incubator-superset] mistercrunch commented on issue #6041: POC: Vaex connector

GitHub Sun, 07 Oct 2018 13:52:20 -0700

For reference, someone wrote a `pandas` connector in the past that we never 
merged. The main reason it wasn't merged is that it was a fair amount of code to 
manage coming from a non-committer, while the connector interface wasn't super 
well-defined and "settled" at that point. Evolving the interface would have meant 
carrying the pandas connector along for the ride.


There was also the problem of where to persist the dataframe. Since our web 
servers are stateless, the pandas dataframe needs to be loaded into memory over 
the network before any aggregations or filters can be performed. With something 
like Arrow that becomes somewhat reasonable, but it feels like there should be a 
dedicated service (one that resembles a database quite a bit) loading, caching, 
and computing on those files.

[ Full content available at: 
https://github.com/apache/incubator-superset/pull/6041 ]