Re: Elasticsearch and cassandra integration?

2014-10-16 Thread Utkarsh Sengar
I am not actively working on the Elasticsearch Cassandra river now, but I am
always open to pull requests! :)
https://github.com/ebay/cassandra-river

I found another fork of the project:
https://github.com/srecon/elasticsearch-cassandra-river

-Utkarsh

On Thu, Oct 16, 2014 at 7:10 AM, José Guilherme Vanz 
guilherme@gmail.com wrote:

 Hi, Utkarsh

 Are you still working on the cassandra river?

 Thanks
 Vanz

 On Monday, March 25, 2013 10:46:50 PM UTC-3, Utkarsh Sengar wrote:

 Thanks for the answer! I was able to write a simple river for Cassandra
 which pulls data periodically (similar to CouchDB's river).

 This leads to some questions:

 1. I saw that EsExecutors exists, but it provides no
 ScheduledExecutorService implementation. Is there any reason EsExecutors
 exists other than providing a custom thread name and priority? Can I use a
 ScheduledExecutorService inside a river without any performance issues?

 2. What I am doing for now is running one thread which wakes up every x
 hours and moves all the data from Cassandra to ES, every time. It's not very
 performant if there is a lot of data (I will add some kind of batching of
 records). So I wanted to know: are there any standard practices for sending
 data to ES?
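A minimal, JDK-only sketch of the approach described in question 2: one scheduled task that periodically re-reads all rows and hands them to ES in fixed-size batches. The row source, the batch size of 1000 and the printing "sink" are hypothetical stand-ins for the Cassandra read and the ES bulk call; whether a plain ScheduledExecutorService is appropriate inside a river is question 1 and is not settled by this sketch. Written for Java 8+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class PeriodicBatchPull {

    /** Splits rows into consecutive batches of at most batchSize and passes each batch to sink. */
    public static <T> int pushInBatches(List<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so the sink may keep it after the loop moves on.
            sink.accept(new ArrayList<>(rows.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable pullTask = () -> {
            // An exception escaping the task would suppress all future runs,
            // so runtime exceptions are caught and reported here.
            try {
                List<Integer> rows = new ArrayList<>();
                for (int i = 0; i < 2500; i++) {
                    rows.add(i); // hypothetical stand-in for rows read from Cassandra
                }
                int n = pushInBatches(rows, 1000,
                        batch -> System.out.println("bulk request with " + batch.size() + " docs"));
                System.out.println(n + " bulk requests this run");
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        };
        // With a fixed *delay*, the next run starts only after the previous one finishes,
        // so a slow pull never overlaps the next one.
        scheduler.scheduleWithFixedDelay(pullTask, 0, 1, TimeUnit.HOURS);
        Thread.sleep(500); // let the first run happen in this demo
        scheduler.shutdownNow();
    }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that a full re-index that takes longer than the interval simply pushes the next run back instead of queuing runs behind it.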

 The implementation is just one day old and very raw. I will put it up on
 GitHub soon!
 I loved the simple APIs, and it was very easy to get started (except for the
 lack of documentation, but the reference implementations helped)!

 Thanks,
 -Utkarsh


 On Sat, Mar 23, 2013 at 2:31 AM, Jörg Prante joerg...@gmail.com wrote:

 1. I use IntelliJ (previously NetBeans) and mvn on the command line, but
 using TestNG in Eclipse is documented here: http://testng.org/doc/eclipse.html

 2. Debugging running plugins works like debugging a running ES node.
 Besides extensive logging, I use tools like jvisualvm to analyze runtime
 behaviour.

 3. I think it is best to start from an existing river as boilerplate
 code. It helps to examine the river sources documented at
 http://www.elasticsearch.org/guide/reference/modules/plugins.html

 Jörg

 On 23.03.13 04:56, Utkarsh Sengar wrote:

  I agree with you. I am also inclined towards implementing a plugin due
 to the lack of Elasticsearch and Cassandra integration. I have been looking at
 the JDBC and RSS rivers, and they certainly help in understanding the anatomy
 of an ES river.

 I do have some questions about Elasticsearch plugin development:
 1. These plugins have some nicely written tests whose test suites are
 defined in XML files under test/resources. How can I debug these tests in
 Eclipse?
 2. Say I have a working prototype of the plugin and I manually install
 it in my local Elasticsearch instance by placing the plugin project in the
 plugins folder. What is the best way to debug the plugin in ES, other than
 logging output, of course?
 3. Documentation about plugin development is lacking, but the sample RSS
 river code helps. Can I safely use the RSS river as a boilerplate project
 for the Cassandra river? Or is there a way to generate a plugin project for
 ES?

 Any pointers from you about ES plugin development will help :)


 --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/9TJFiWr1oUQ/unsubscribe?hl=en-US.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearc...@googlegroups.com.
 For more options, visit https://groups.google.com/groups/opt_out.





 --
 Thanks,
 -Utkarsh

-- 
Thanks,
-Utkarsh

-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CADjjot_O9543Gnz3r%2BftFP-4-xaCKZE9ZjAWezt5JgD0B3imqg%40mail.gmail.com.


Re: Conditional query for geo location lookup

2014-05-15 Thread Utkarsh Sengar
Bumping this one up. Any advice on the query?


-- 
Thanks,
-Utkarsh

-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CADjjot9Qxaq2Zd%2BdsCcX3YjJ4%3DKhAQ_fzv0BsUva5R-KWcSKOg%40mail.gmail.com.


Conditional query for geo location lookup

2014-05-13 Thread Utkarsh Sengar
I have a use case with two types of locations (both stored with the
geo_point type):

1. Location 1: has a lat/lon, a radius of, say, 90 miles (it will vary) and
type=outgoing
2. Location 2: has a lat/lon, no radius and type=incoming

Now, when a query comes in with a lat/lon and radius=20, I expect the
following:

1. Simple geo lookup: if the input lat/lon is within 20 miles of Location 2,
return Location 2.
2. If the input lat/lon is within 90 miles of Location 1, return Location 1
too in the result. In other words, I want the input radius to be overridden
by the saved radius for this type of location.

This is what I have come up with using a script filter:
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "script": {
      "script": "!doc['geopoint'].empty && (doc['coverage_type'].value == 'outgoing' ? doc['geopoint'].distanceInMiles(37, -121) <= doc['radius'].value : doc['geopoint'].distanceInMiles(37, -121) <= 20)"
    }
  }
}
Here, 37,-121 is the input lat/lon and 20 is the input radius.
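To sanity-check the intended semantics outside Elasticsearch, the same rule can be expressed as a plain in-memory filter. This is a hypothetical Java sketch, not how ES evaluates the script: it assumes the haversine formula with a mean Earth radius of 3958.8 miles, whereas the script's distanceInMiles may compute distance differently, so points very close to a boundary could be classified differently. The Location class and its fields are illustrative and only mirror the geopoint, coverage_type and radius fields used in the query.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalRadiusFilter {

    // Mean Earth radius in miles (assumption; ES may use a slightly different value).
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** One indexed location; radiusMiles is only used when coverageType is "outgoing". */
    public static final class Location {
        public final String id;
        public final double lat;
        public final double lon;
        public final String coverageType;
        public final double radiusMiles;

        public Location(String id, double lat, double lon, String coverageType, double radiusMiles) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
            this.coverageType = coverageType;
            this.radiusMiles = radiusMiles;
        }
    }

    /** Great-circle distance between two lat/lon points (degrees), via the haversine formula. */
    public static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp so rounding can never push the argument of asin above 1.
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Outgoing locations are matched against their saved radius, all others against the input radius. */
    public static List<String> matches(List<Location> locations, double lat, double lon,
                                       double inputRadiusMiles) {
        List<String> hits = new ArrayList<>();
        for (Location loc : locations) {
            double limit = "outgoing".equals(loc.coverageType) ? loc.radiusMiles : inputRadiusMiles;
            if (haversineMiles(lat, lon, loc.lat, loc.lon) <= limit) {
                hits.add(loc.id);
            }
        }
        return hits;
    }
}
```

For a query at 37,-121 with radius 20, an incoming location about 7 miles away matches, while an outgoing location about 69 miles away matches only if its saved radius is at least about 69 miles.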


What do you think?

-- 
Thanks,
-Utkarsh

-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CADjjot-aRjyyuHCfvPa2BrSFG6%3DPhqAcG1AwoszQF3dazXBHxQ%40mail.gmail.com.


Re: Bulk indexing tips for Elastic search and Cassandra River

2014-02-04 Thread Utkarsh Sengar
Can you please file a bug (https://github.com/eBay/cassandra-river/issues)
or share the stacktrace?

Thanks,
-Utkarsh


On Tue, Feb 4, 2014 at 8:54 AM, AKhan ansa...@gmail.com wrote:

 cassandra-river is not working for me either, and I am getting exceptions
 on the server side.

 elasticsearch.common.UUID;

 On Friday, March 29, 2013 10:01:14 PM UTC+1, utkar...@gmail.com wrote:

 Hello,

 I have been working on a cassandra river which triggers periodically and
 indexes all data in a cassandra column family. The implementation for now
 spawns 10 threads and processes 10k documents (with 13 columns)/thread.
 The performance initially was very good. It indexed 1M documents in
 10mins. But after a 1hour, the indexing became very slow and it indexed
 around 8M documents. I am trying to index a total of 50M documents.

 I have attached a screenshot of the memory and CPU usage. What I noticed
 was that a lot of merge threads were spawned, which reduced the speed considerably:
 elasticsearch[Doppelganger][[prodinfo][1]: Lucene Merge Thread #329]
 daemon prio=10 tid=0x2a63 nid=0x4c28 runnable [0x246bd000]

 So, I believe this has to do with some configuration I can tweak to
 improve bulk indexing. I am running one node with 5 shards, an ES_HEAP_SIZE
 of 2GB, and no replicas for now.
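Besides visualvm and thread dumps, a river could log how many merge threads are alive while it indexes, to correlate the slowdown with merging. A minimal JDK-only sketch, assuming it runs inside the ES node's JVM (for example, called from the river), since Thread.getAllStackTraces() only sees threads of the current JVM. The name fragment "Lucene Merge Thread" is taken from the thread dump line above and may differ between versions; the class name is hypothetical.

```java
import java.util.Map;

public class MergeThreadCount {

    /** Counts live threads in the current JVM whose name contains the given fragment. */
    public static long countThreadsNamed(String fragment) {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        return all.keySet().stream()
                .filter(Thread::isAlive)
                .filter(t -> t.getName().contains(fragment))
                .count();
    }

    public static void main(String[] args) {
        // Inside an ES node this reports the Lucene merge threads; in a bare JVM it prints 0.
        System.out.println("merge threads: " + countThreadsNamed("Lucene Merge Thread"));
    }
}
```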

 Shay mentioned some tips here in 2011:
 https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/APWxRLrMOeU
 I wanted to know if there have been any bulk indexing performance
 improvements since then.

 I am also using bulk.execute().addListener() (async) instead of
 bulk.execute().actionGet() (sync).
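Switching to addListener() makes each bulk call non-blocking, but on its own it does not limit how many bulk requests are in flight, so the river's threads can keep submitting work faster than the node indexes it. A minimal JDK-only sketch of bounding the number of outstanding bulks with a Semaphore; sendBulk is a hypothetical stand-in for the real client call, and the limit of 2 in the demo is arbitrary. If your ES version ships the Java API's BulkProcessor, its setConcurrentRequests option serves a similar purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

public class BoundedAsyncBulk {
    private final Semaphore permits;
    private final Executor executor;

    public BoundedAsyncBulk(int maxConcurrentBulks, Executor executor) {
        if (maxConcurrentBulks <= 0) {
            throw new IllegalArgumentException("maxConcurrentBulks must be positive");
        }
        this.permits = new Semaphore(maxConcurrentBulks);
        this.executor = executor;
    }

    /** Sends the batch asynchronously, blocking the caller while the limit is already reached. */
    public CompletableFuture<Void> submit(List<String> batch, Consumer<List<String>> sendBulk) {
        permits.acquireUninterruptibly(); // back-pressure: the producer waits here
        try {
            return CompletableFuture.runAsync(() -> {
                try {
                    sendBulk.accept(batch);
                } finally {
                    permits.release(); // free the slot whether the bulk succeeded or failed
                }
            }, executor);
        } catch (RuntimeException e) {
            permits.release(); // the task was never scheduled (e.g. the executor rejected it)
            throw e;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        BoundedAsyncBulk bulks = new BoundedAsyncBulk(2, pool); // at most 2 bulks in flight
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            List<String> batch = Arrays.asList("doc-" + i);
            pending.add(bulks.submit(batch, b -> System.out.println("sending " + b)));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }
}
```

The permit is taken before scheduling and returned only after the send finishes, so the cap holds regardless of how many worker threads the executor has.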

 I am planning to share the cassandra-river as soon as it achieves acceptable
 performance.



 https://lh5.googleusercontent.com/-G8kxNXFaUmc/UVYAPHN-NFI/Agw/rmcLu6P1Urg/s1600/bigdesk_ES.png




 https://lh3.googleusercontent.com/-8RokQMwPSW0/UVYAJFwUvrI/Ago/GTEdX5MkTwA/s1600/visualvm_Es.png


 Thanks,
 -Utkarsh


-- 
Thanks,
-Utkarsh

-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CADjjot9uWLyBw%3D7ESYL%3D-sHLJmBLigPx4iON1UjsXNgpWsm18g%40mail.gmail.com.