Hi Yu,

First of all, thank you for your interest in extending Impala. That said,
what you're proposing is not a trivial task, as it will affect multiple
Impala components (metadata, planner, query execution, etc.). It also adds
another dependency to Impala that will have to be continuously tested and
maintained. So, before jumping into the code, I think you need to submit a
proposal that outlines:
1. The design; see how Impala interacts with other storage engines such as
Kudu or HBase to understand which components are affected by such a change.
2. How you are going to test this, and what kind of testing infrastructure
will be in place to ensure that future commits don't break the
integration with Elasticsearch.
3. Timeline and milestones (sub-tasks) for this project.

I suggest submitting your proposal as a Google Doc so that it's easier to
comment on. At the same time, I think it's very important for you to get
more experience in modifying the Impala codebase. So, before embarking on
such a big task, it may be worth spending some time working on a few smaller
(ramp-up) tasks.

Thanks,
Dimitris



On Thu, Nov 2, 2017 at 11:32 PM, yu feng <olaptes...@gmail.com> wrote:

> Hi All,
>
> We are trying to query data from Elasticsearch using Impala. We want
> to take advantage of the speed of the Impala engine and the fast
> filtering and aggregation of Elasticsearch.
>
> I want to do it in the following way:
>
> 1. Add a new table type (metadata) called ES Table.
> 2. Add two new ExecNodes (ESScanNode and ESAggregation) to implement
> queries against ES.
> 3. When a query targets an ES Table, try to rewrite any part of the
> execution plan that contains an Aggregation (parent) and an ESScanNode
> (child) into an ESAggregation.
>
> In this way, I think both the scan and the aggregation can be done by ES.
>
> I would like to know what you think about this combination, and whether
> there is a better way to implement it.
>
> Thanks a lot.
>
