Yes, I understand your concerns. I will start with the design, and I will try to modify as little of the Impala core code as possible, adding new code to extend it instead. In short, I will write up a brief design first so that the community can review it and give me advice. Thanks again.

2017-11-04 1:56 GMT+08:00 Dimitris Tsirogiannis <dtsirogian...@cloudera.com>:

> Hi Yu,
>
> First of all, thank you for your interest in extending Impala. That said,
> what you're proposing is not a trivial task, as it will affect multiple
> Impala components (metadata, planner, query exec, etc.). Also, it adds
> another dependency to Impala that will have to be continuously tested and
> maintained. So, before jumping into the code, I think you need to submit a
> proposal that outlines:
> 1. The design; see how Impala interacts with other storage engines such as
>    Kudu or HBase to understand which components are affected by such a change.
> 2. How you are going to test this, and what kind of testing infrastructure
>    will be in place to ensure that future commits don't break the
>    integration with ElasticSearch.
> 3. Timeline and milestones (sub-tasks) for this project.
>
> I suggest submitting your proposal as a Google doc so that it's easier to
> comment on. At the same time, I think it's very important for you to get
> more experience in modifying the Impala codebase. So, before embarking on
> such a big task, it may be worth spending some time working on a few smaller
> (ramp-up) tasks.
>
> Thanks,
> Dimitris
>
> On Thu, Nov 2, 2017 at 11:32 PM, yu feng <olaptes...@gmail.com> wrote:
>
> > Hi All,
> >
> > We are trying to query data from Elasticsearch using Impala. We want
> > to take advantage of the speed of the Impala engine and the fast filtering
> > and aggregation of Elasticsearch.
> >
> > I want to do it in the following way:
> >
> > 1. Add a new table type (metadata) called ES Table.
> > 2. Add two new ExecNodes (ESScanNode and ESAggregation) to implement
> >    queries against ES.
> > 3. When a query targets an ES Table, try to rewrite any part of the
> >    execution plan that contains an Aggregation (parent) over an
> >    ESScanNode (child) into an ESAggregation.
> >
> > In this way, I think both the scan and the aggregation can be done by ES.
> >
> > I would like to know what the community thinks of this combination, and
> > whether there is a better way to implement it.
> >
> > Thanks a lot.
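
As a rough illustration of step 3 in the quoted proposal (rewriting an Aggregation parent over an ESScanNode child into a single ESAggregation node), here is a minimal Python sketch. It is not Impala code and does not reflect how the Impala planner is actually structured: the classes `PlanNode`, `AggregationNode`, and `ESAggregationNode`, the function `push_down_aggregation`, and all field names are hypothetical and only mirror the node names used in the proposal.

```python
class PlanNode:
    """Hypothetical base class for a node in a query plan tree."""
    def __init__(self, children=None):
        self.children = list(children or [])


class ESScanNode(PlanNode):
    """Hypothetical scan over an Elasticsearch index with optional filters."""
    def __init__(self, index, filters=()):
        super().__init__()
        self.index = index
        self.filters = list(filters)


class AggregationNode(PlanNode):
    """Hypothetical engine-side aggregation over a single child."""
    def __init__(self, child, group_by, aggregates):
        super().__init__([child])
        self.group_by = list(group_by)
        self.aggregates = list(aggregates)


class ESAggregationNode(PlanNode):
    """Hypothetical node that lets Elasticsearch filter and aggregate."""
    def __init__(self, index, filters, group_by, aggregates):
        super().__init__()
        self.index = index
        self.filters = list(filters)
        self.group_by = list(group_by)
        self.aggregates = list(aggregates)


def push_down_aggregation(node):
    """Rewrite Aggregation(ESScan) pairs into one ESAggregation, bottom-up."""
    # Rewrite the subtrees first so nested matches are handled as well.
    node.children = [push_down_aggregation(c) for c in node.children]
    if (isinstance(node, AggregationNode)
            and len(node.children) == 1
            and isinstance(node.children[0], ESScanNode)):
        scan = node.children[0]
        return ESAggregationNode(scan.index, scan.filters,
                                 node.group_by, node.aggregates)
    # Any other shape is left untouched.
    return node


plan = AggregationNode(ESScanNode("logs", ["status = 500"]),
                       group_by=["host"], aggregates=["count(*)"])
rewritten = push_down_aggregation(plan)
```

In this sketch only the exact parent/child pattern is rewritten. A real design would also have to decide which aggregate functions and grouping expressions Elasticsearch can evaluate, and fall back to the unmodified plan otherwise.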