[ 
https://issues.apache.org/jira/browse/LENS-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623891#comment-14623891
 ] 

Amruth S edited comment on LENS-252 at 7/12/15 5:14 PM:
--------------------------------------------------------

Elasticsearch driver for Lens
~~~~~~~~~~~~~~~~~~~~~~

Elasticsearch accepts a nested JSON document as a query and returns a JSON 
result. The result is nested for group-by queries and flat for simple selects.

HQL -> ES JSON query
~~~~~~~~~~~~~~~~~
-> I have written a traversal (ASTTraverserForES) that is specific to NoSQL 
stores. The traversal can be reused for other purposes such as query building 
or validation.
-> The traversal takes a query visitor (ASTVisitor.java), which builds the 
query, and a criteria visitor, which builds the where clause. I have checked in 
concrete visitors for Elasticsearch that build the Elasticsearch JSON query.
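
To illustrate how a traversal and the two visitors can fit together, here is a 
minimal sketch. It is not the code from the patch: QueryVisitor, 
CriteriaVisitor, SimpleTraverser and all method names are hypothetical, and the 
real traversal walks a Hive AST rather than plain lists. The JSON shape is only 
indicative of an Elasticsearch query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical criteria visitor: builds the where-clause part of the query. */
interface CriteriaVisitor {
    void visitEquals(String column, String value);
    String toJson();
}

/** Hypothetical query visitor: collects selected columns and the criteria. */
interface QueryVisitor {
    void visitSelect(String column);
    void visitCriteria(CriteriaVisitor criteria);
    String toJson();
}

/** Turns equality predicates into term clauses (no JSON escaping, for brevity). */
final class EsCriteriaVisitor implements CriteriaVisitor {
    private final List<String> terms = new ArrayList<>();

    @Override
    public void visitEquals(String column, String value) {
        terms.add("{\"term\":{\"" + column + "\":\"" + value + "\"}}");
    }

    @Override
    public String toJson() {
        return "{\"bool\":{\"must\":[" + String.join(",", terms) + "]}}";
    }
}

/** Assembles the final query JSON from the select list and the criteria. */
final class EsQueryVisitor implements QueryVisitor {
    private final List<String> fields = new ArrayList<>();
    private String criteriaJson = "{\"match_all\":{}}";

    @Override
    public void visitSelect(String column) {
        fields.add("\"" + column + "\"");
    }

    @Override
    public void visitCriteria(CriteriaVisitor criteria) {
        criteriaJson = criteria.toJson();
    }

    @Override
    public String toJson() {
        return "{\"_source\":[" + String.join(",", fields) + "],\"query\":" + criteriaJson + "}";
    }
}

/** Stand-in for the AST traversal: it only drives the visitors, it builds nothing itself. */
final class SimpleTraverser {
    private final List<String> selectColumns = new ArrayList<>();
    private final List<String[]> equalsPredicates = new ArrayList<>();

    SimpleTraverser select(String column) {
        selectColumns.add(column);
        return this;
    }

    SimpleTraverser whereEquals(String column, String value) {
        equalsPredicates.add(new String[] {column, value});
        return this;
    }

    void accept(QueryVisitor queryVisitor, CriteriaVisitor criteriaVisitor) {
        for (String column : selectColumns) {
            queryVisitor.visitSelect(column);
        }
        for (String[] predicate : equalsPredicates) {
            criteriaVisitor.visitEquals(predicate[0], predicate[1]);
        }
        queryVisitor.visitCriteria(criteriaVisitor);
    }
}

final class VisitorSketchDemo {
    public static void main(String[] args) {
        QueryVisitor queryVisitor = new EsQueryVisitor();
        new SimpleTraverser().select("city").whereEquals("country", "IN")
            .accept(queryVisitor, new EsCriteriaVisitor());
        System.out.println(queryVisitor.toJson());
    }
}
```

Because the traverser only calls visitor methods, a validating visitor could be 
passed in place of the query-building one without touching the traversal.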

Elasticsearch client
~~~~~~~~~~~~~~~
-> Several Elasticsearch clients are available, so I've made the client 
pluggable.
-> I've added one default HTTP client implementation (Jest library, Apache 2 
license). I chose an HTTP client over the transport client because the 
transport client requires its version to be consistent with the ES server.
-> Every client has to implement an execute method that takes in the query and 
returns a LensResultSet. Hence the transformation of the response into a result 
set must also be done by the client implementation.
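
A minimal sketch of what such a pluggable client contract could look like. 
Everything here is an assumption for illustration: EsClient, SimpleResultSet 
(a stand-in for LensResultSet), EsClientFactory and StubClient are hypothetical 
names, and the stub returns canned rows instead of calling Elasticsearch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified result set; the real driver returns a LensResultSet. */
final class SimpleResultSet {
    final List<String> columns;
    final List<List<Object>> rows;

    SimpleResultSet(List<String> columns, List<List<Object>> rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

/** Hypothetical client contract: one method from query JSON to a result set. */
interface EsClient {
    SimpleResultSet execute(String index, String jsonQuery);
}

/** Instantiates the client named in configuration, so implementations can be swapped. */
final class EsClientFactory {
    static EsClient load(String className) {
        try {
            return (EsClient) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load client " + className, e);
        }
    }
}

/** In-memory stub standing in for an HTTP-based implementation. */
final class StubClient implements EsClient {
    @Override
    public SimpleResultSet execute(String index, String jsonQuery) {
        // A real client would send jsonQuery to the server and transform the response here.
        List<List<Object>> rows = Arrays.asList(Arrays.<Object>asList("Bangalore"));
        return new SimpleResultSet(Arrays.asList("city"), rows);
    }
}

final class ClientSketchDemo {
    public static void main(String[] args) {
        EsClient client = EsClientFactory.load(StubClient.class.getName());
        SimpleResultSet result = client.execute("sales", "{\"query\":{\"match_all\":{}}}");
        System.out.println(result.columns + " " + result.rows);
    }
}
```

Keeping the response transformation behind execute means the driver never sees 
a client-specific response type.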

Elasticsearch Jest JSON response -> LensResultSet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The result set obtained is either a simple hits response (document response) or 
a facets (aggregation) response.
-> A facets response has a tree structure. Enumerating all root-to-leaf paths 
in the tree gives us all the rows. See JestResultSetTransformer 
(AggregateTransformer).
-> A hits response is straightforward to decode (TermTransformer).
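
The path enumeration can be sketched as a depth-first walk. This is an 
illustration under assumptions, not the AggregateTransformer code: Bucket and 
BucketFlattener are hypothetical types, and a real response would first have to 
be parsed from JSON into such a tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bucket node: a group-by key, child buckets, and a metric on leaves. */
final class Bucket {
    final String key;
    final Double metric; // only meaningful on leaf buckets
    final List<Bucket> children = new ArrayList<>();

    Bucket(String key, Double metric) {
        this.key = key;
        this.metric = metric;
    }

    Bucket add(Bucket child) {
        children.add(child);
        return this;
    }
}

final class BucketFlattener {
    /** Emits one row per root-to-leaf path: the keys along the path, then the leaf metric. */
    static List<List<Object>> toRows(List<Bucket> topLevel) {
        List<List<Object>> rows = new ArrayList<>();
        for (Bucket bucket : topLevel) {
            walk(bucket, new ArrayList<>(), rows);
        }
        return rows;
    }

    private static void walk(Bucket node, List<Object> path, List<List<Object>> rows) {
        path.add(node.key);
        if (node.children.isEmpty()) {
            List<Object> row = new ArrayList<>(path);
            row.add(node.metric);
            rows.add(row);
        } else {
            for (Bucket child : node.children) {
                walk(child, path, rows);
            }
        }
        path.remove(path.size() - 1); // backtrack before visiting the next sibling
    }
}

final class FlattenSketchDemo {
    public static void main(String[] args) {
        List<Bucket> top = new ArrayList<>();
        top.add(new Bucket("IN", null)
            .add(new Bucket("Bangalore", 10.0))
            .add(new Bucket("Chennai", 5.0)));
        top.add(new Bucket("US", null).add(new Bucket("Austin", 7.0)));
        // prints [[IN, Bangalore, 10.0], [IN, Chennai, 5.0], [US, Austin, 7.0]]
        System.out.println(BucketFlattener.toRows(top));
    }
}
```

Each nesting level corresponds to one group-by column, so a query grouping by 
country and city yields rows of the form (country, city, measure).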

Known issues/shortcomings
~~~~~~~~~~~~~~~~~~~~~~~~
-> Scrolling of responses for aggregate queries is not supported (by design, ES 
always returns the complete set of buckets in a single JSON response; there is 
no scroll facility).
-> Order by in aggregate queries is not supported. Fully functional order-by 
queries can get complex because ordering by a measure can happen only within 
the immediate parent group by. 'Limit' is also blocked, since a limit without 
an order by could be misleading.
-> '*' and count(*) are not available as of now.
-> Support for other UDFs. Right now only common UDAFs like sum, min and max 
are supported. We need a way to translate a new UDF to Elasticsearch 
seamlessly, without code changes.
-> Query estimation
-> Session-level config injection for properties like fetch size and group-by 
cardinality size? (Right now these configs are at the driver level.)

I have added an esdriver-default.xml for looking up default properties.



> Add Elastic Search Driver
> -------------------------
>
>                 Key: LENS-252
>                 URL: https://issues.apache.org/jira/browse/LENS-252
>             Project: Apache Lens
>          Issue Type: New Feature
>            Reporter: Sharad Agarwal
>            Assignee: Amruth S
>              Labels: Hackathon-July, newbie
>         Attachments: LENS-252.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
