Re: Elasticsearch and cassandra integration?
I am not actively working on the Elasticsearch Cassandra river now, but I am always open to pull requests! :) https://github.com/ebay/cassandra-river

I found another fork of the project: https://github.com/srecon/elasticsearch-cassandra-river

-Utkarsh

On Thu, Oct 16, 2014 at 7:10 AM, José Guilherme Vanz guilherme@gmail.com wrote:

Hi, Utkarsh

Are you still working on the cassandra river?

Thanks
Vanz

On Monday, March 25, 2013 10:46:50 PM UTC-3, Utkarsh Sengar wrote:

Thanks for the answer! I was able to write a simple river for Cassandra which pulls data periodically (similar to CouchDB's river). Which leads to some questions:

1. I saw that EsExecutors exists, but there is no implementation of ScheduledExecutorService. So, is there any reason why EsExecutors is implemented, other than having a custom name and priority? Can I use ScheduledExecutorService inside a river without any performance issues?
2. What I am doing for now is: I have 1 thread which wakes up every x hours and moves all the data from Cassandra to ES, every time. It's not very performant if there is a lot of data (I will add some kind of batching of records). So I wanted to know, are there some standard practices for sending data to ES?

The implementation is just 1 day old and very raw. I will put it up on GitHub soon! I loved the simple APIs and it was very easy to get started with (except for the lack of documentation, but the reference implementations helped)!

Thanks,
-Utkarsh

On Sat, Mar 23, 2013 at 2:31 AM, Jörg Prante joerg...@gmail.com wrote:

1. I use IntelliJ (previously NetBeans) and mvn on the command line, but Eclipse TestNG use is documented here: http://testng.org/doc/eclipse.html
2. Debugging running plugins works like debugging a running ES node. Besides extensive logging, I use tools like jvisualvm to analyze runtime behaviour.
3. I think it is best to start from an existing river as boilerplate code. It helps to examine the river sources documented at http://www.elasticsearch.org/guide/reference/modules/plugins.html

Jörg

On 23.03.13 04:56, Utkarsh Sengar wrote:

I agree with you. I am also inclined towards implementing a plugin due to the lack of Elasticsearch and Cassandra integration. I have been looking at the JDBC and RSS rivers, and they surely help to understand the anatomy of an ES river. I do have some questions about Elasticsearch plugin development, though:

1. These plugins have some nicely written tests whose test suites are defined in XML files under test/resources. How can I debug these tests via Eclipse?
2. Say I have a working prototype of the plugin and I manually install it in my local Elasticsearch instance by placing the plugin project in the plugins folder. What is the best way to debug the plugin in ES, other than logging the output, of course?
3. Documentation about plugin development is lacking, but the sample RSS river code helps. Can I safely assume that I can use the RSS river as a boilerplate project for a Cassandra river? Or is there a way to create a plugin project for ES?

Any pointers from you about ES plugin development will help :)

-- You received this message because you are subscribed to a topic in the Google Groups elasticsearch group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/9TJFiWr1oUQ/unsubscribe?hl=en-US. To unsubscribe from this group and all its topics, send an email to elasticsearc...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
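For readers following the thread, here is a minimal sketch of the "wake up periodically, then move the data in batches" idea from the March 25 message. It uses only java.util.concurrent and assumes nothing about the river or Elasticsearch APIs: the class BatchingPuller and the source and sink parameters are hypothetical stand-ins for "read rows from Cassandra" and "send one batch to ES". Whether a plain ScheduledExecutorService is the recommended choice inside a river is not settled in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper, not part of any river: periodic pull with fixed-size batches. */
final class BatchingPuller {

    /** Split rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /**
     * Run one pull immediately, then again after each period.
     * source stands in for the Cassandra read, sink for the bulk send (both assumptions).
     * The caller is responsible for calling shutdown() on the returned scheduler.
     */
    static <T> ScheduledExecutorService schedule(Supplier<List<T>> source,
                                                 Consumer<List<T>> sink,
                                                 int batchSize,
                                                 long period,
                                                 TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                for (List<T> batch : partition(source.get(), batchSize)) {
                    sink.accept(batch);
                }
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs,
                // so report it and let the next scheduled run try again.
                System.err.println("pull failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that a slow pull cannot cause runs to pile up back to back; the delay is counted from the end of the previous run.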
Re: Conditional query for geo location lookup
Bumping this one up. Any advice on the query?

On Tue, May 13, 2014 at 6:17 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:

I have a use case where I have 2 types of locations (i.e. with geo_point type):

1. Location 1: Has a lat/lon with, say, radius=90 miles (it will vary) and type=outgoing
2. Location 2: Has a lat/lon with no radius and type=incoming

Now, when a query comes in with lat/lon and radius=20, I expect this to happen:

1. Simple geo lookup: If the input lat/lon is within 20 miles of location 2, return location 2.
2. If the input lat/lon is within 90 miles of location 1, return location 1 too in the result.

Note that I want the input radius to be overridden by the saved radius for a specific type of location. This is what I have come up with using a script:

    {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "!doc['geopoint'].empty && doc['coverage_type'].value == 'outgoing' ? doc['geopoint'].distanceInMiles(37,-121) <= doc['radius'].value : doc['geopoint'].distanceInMiles(37,-121) <= 20"
        }
      }
    }

Where 37,-121 is the input lat/lon and 20 is the input radius. What do you think?

-- Thanks, -Utkarsh
Conditional query for geo location lookup
I have a use case where I have 2 types of locations (i.e. with geo_point type):

1. Location 1: Has a lat/lon with, say, radius=90 miles (it will vary) and type=outgoing
2. Location 2: Has a lat/lon with no radius and type=incoming

Now, when a query comes in with lat/lon and radius=20, I expect this to happen:

1. Simple geo lookup: If the input lat/lon is within 20 miles of location 2, return location 2.
2. If the input lat/lon is within 90 miles of location 1, return location 1 too in the result.

Note that I want the input radius to be overridden by the saved radius for a specific type of location. This is what I have come up with using a script:

    {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "!doc['geopoint'].empty && doc['coverage_type'].value == 'outgoing' ? doc['geopoint'].distanceInMiles(37,-121) <= doc['radius'].value : doc['geopoint'].distanceInMiles(37,-121) <= 20"
        }
      }
    }

Where 37,-121 is the input lat/lon and 20 is the input radius. What do you think?

-- Thanks, -Utkarsh
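To make the intended rule concrete outside of Elasticsearch, here is a small standalone sketch of the same decision: use the stored radius for outgoing locations and the input radius otherwise. It is an illustration only. The class ConditionalRadius and its method names are hypothetical, the distance is computed with the haversine formula on a spherical Earth (which may differ slightly from whatever distanceInMiles uses internally), and a missing stored radius is modelled as null.

```java
/** Hypothetical illustration of the conditional-radius rule; not Elasticsearch code. */
final class ConditionalRadius {

    /** Mean Earth radius in miles (spherical approximation). */
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** Great-circle distance in miles between two lat/lon points (haversine formula). */
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    /**
     * Returns true if the stored location should be part of the result.
     * Outgoing locations with a stored radius are matched against that radius;
     * every other location is matched against the radius supplied with the query.
     */
    static boolean matches(String coverageType, Double storedRadiusMiles,
                           double docLat, double docLon,
                           double inputLat, double inputLon, double inputRadiusMiles) {
        boolean useStored = "outgoing".equals(coverageType) && storedRadiusMiles != null;
        double limit = useStored ? storedRadiusMiles : inputRadiusMiles;
        return distanceMiles(docLat, docLon, inputLat, inputLon) <= limit;
    }
}
```

For example, a location one degree of latitude north of the input point (roughly 69 miles away) matches when it is outgoing with a stored radius of 90, but not when it is incoming and the input radius is 20.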
Re: Bulk indexing tips for Elastic search and Cassandra River
Can you please file a bug (https://github.com/eBay/cassandra-river/issues) or share the stacktrace?

Thanks,
-Utkarsh

On Tue, Feb 4, 2014 at 8:54 AM, AKhan ansa...@gmail.com wrote:

cassandra-river is not working in my case either, and I am getting exceptions on the server side. elasticsearch.common.UUID;

On Friday, March 29, 2013 10:01:14 PM UTC+1, utkar...@gmail.com wrote:

Hello,

I have been working on a Cassandra river which triggers periodically and indexes all data in a Cassandra column family. The implementation for now spawns 10 threads and processes 10k documents (with 13 columns) per thread.

The performance initially was very good. It indexed 1M documents in 10 minutes. But after 1 hour the indexing became very slow, and it indexed around 8M documents. I am trying to index a total of 50M documents. I have attached screenshots of the memory and CPU usage.

What I noticed was that a lot of merge threads were spawned, which reduced the speed considerably:

elasticsearch[Doppelganger][[prodinfo][1]: Lucene Merge Thread #329] daemon prio=10 tid=0x2a63 nid=0x4c28 runnable [0x246bd000]

So, I believe this has to do with some configuration which I can tweak to improve bulk indexing. I am running 1 node with 5 shards, 2GB of ES_HEAP_SIZE and no replicas for now.

Shay mentioned some tips here in 2011: https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/APWxRLrMOeU. I wanted to know if there are any newer bulk indexing performance improvements.

I am also using bulk.execute().addListener() (async) in place of bulk.execute().actionGet() (sync).

I am planning to share the cassandra-river as soon as it achieves acceptable performance.

https://lh5.googleusercontent.com/-G8kxNXFaUmc/UVYAPHN-NFI/Agw/rmcLu6P1Urg/s1600/bigdesk_ES.png
https://lh3.googleusercontent.com/-8RokQMwPSW0/UVYAJFwUvrI/Ago/GTEdX5MkTwA/s1600/visualvm_Es.png

Thanks,
-Utkarsh
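On the async versus sync point in the March 29 message: one general way to keep asynchronous submission from outrunning the receiver is to cap the number of requests in flight. The thread does not establish that this was the cause of the slowdown, so treat the following as a sketch of that general technique only. BoundedBulkSender is a hypothetical class, and the sendAsync function is a stand-in for an asynchronous bulk call such as the bulk.execute().addListener() usage mentioned above; no Elasticsearch API is used here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical helper: limits how many asynchronous batch sends are pending at once. */
final class BoundedBulkSender {

    private final int maxInFlight;
    private final Semaphore permits;
    // Stand-in for the real asynchronous bulk call (an assumption for this sketch).
    private final Function<List<String>, CompletableFuture<Void>> sendAsync;

    BoundedBulkSender(int maxInFlight, Function<List<String>, CompletableFuture<Void>> sendAsync) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
        this.sendAsync = sendAsync;
    }

    /** Blocks while maxInFlight batches are already pending, then sends without waiting for the result. */
    void submit(List<String> batch) {
        permits.acquireUninterruptibly();
        try {
            // Release the permit when the send finishes, whether it succeeded or failed.
            sendAsync.apply(batch).whenComplete((ok, err) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the send never started, so give the permit back
            throw e;
        }
    }

    /** Blocks until every batch submitted so far has completed. */
    void awaitAll() {
        permits.acquireUninterruptibly(maxInFlight);
        permits.release(maxInFlight);
    }
}
```

With this shape the producer threads slow down automatically when responses come back slowly, instead of queueing an unbounded number of pending requests.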