Github user jeffsteinmetz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/520#issuecomment-162749562
  
    I love the idea of making Elasticsearch a first class citizen in Zeppelin.
    
    I was curious, however: as you build out the query language, it will
eventually get to the point where you'll want to add more features to keep
parity with Elasticsearch's extensive existing query DSL.  I've built some
Scala libraries that wrap ES for use in APIs, Spark utilities, and other
business logic, and eventually the wrapper starts trying to mimic some of the
ES query language itself.
    As a long-time user of Elasticsearch and elasticsearch-hadoop (the Spark
integration), including beta testing elasticsearch-hadoop and using ES from
Spark, my one concern is the construction of another DSL (domain specific
language) via an interpreter.
    
    Curious whether any consideration has been given to simply passing
through existing ES JSON, i.e. the existing query language, rather than
building a new one?
    Imagine if we abstracted another query language on top of Spark SQL's
%sql that wasn't SQL?
    
    The ease of use is certainly welcome, though, so I was curious about the
long-term plans for the DSL in the interpreter.
    
    As a side note, I import https://github.com/sksamuel/elastic4s when I
want to simplify my Elasticsearch query experience (this DSL misses a few
functional builder patterns and recent features, but covers about 80% of most
use cases).
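    A rough sketch of what that DSL looks like (method names recalled from
the elastic4s 1.x-era API; the index, type, and field names are made up, and
`client` is assumed to be an existing `ElasticClient`):
    
    ```scala
    // elastic4s lets you express a query as a readable Scala DSL instead of raw JSON
    import com.sksamuel.elastic4s.ElasticDsl._
    
    val response = client.execute {
      search in "someindex" / "doctype" query termQuery("status", "active")
    }
    ```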
    
    Using the native Spark integration provided by Elasticsearch Hadoop,
after an `import org.elasticsearch.spark._` you have two options for loading
the results into an RDD: either the original JSON:
    
    `val myRDD: RDD[(String, String)] = sc.esJsonRDD("someindex/doctype", query)`
    
    or a Map of the document's fields (note that `esRDD` returns the document
id paired with a Map, not a JSON string):
    
    `val myRDD = sc.esRDD("someindex/doctype", query)`
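    Putting those together, a minimal self-contained sketch (the ES host,
index, and query below are made up for illustration) that passes a raw
query-DSL JSON string straight through:
    
    ```scala
    // no extra DSL layer: the query is plain Elasticsearch query-DSL JSON
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD
    import org.elasticsearch.spark._
    
    val conf = new SparkConf().setAppName("es-demo").set("es.nodes", "localhost:9200")
    val sc = new SparkContext(conf)
    
    val query = """{"query": {"match": {"status": "active"}}}"""
    
    val asJson: RDD[(String, String)] = sc.esJsonRDD("someindex/doctype", query) // (docId, raw JSON)
    val asMaps = sc.esRDD("someindex/doctype", query)                            // (docId, Map of fields)
    ```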
    
    Elasticsearch Hadoop (Spark) can also create a DataFrame via its Spark
SQL integration.
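    For example, a sketch assuming ES-Hadoop 2.1+ with `es.nodes` already
configured (the index and field names are made up):
    
    ```scala
    // load an Elasticsearch index/type as a Spark DataFrame
    import org.elasticsearch.spark.sql._
    
    val df = sqlContext.esDF("someindex/doctype")
    df.printSchema()
    df.filter(df("status") === "active").show()
    ```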
    
    Curious whether there has been any thought about how the elasticsearch
interpreter / pipeline could be made a bit friendlier to Spark, native ES,
and ES-Hadoop workflows.


