Re: mass delete by query

2014-03-31 Thread Kevin Wang
Why not use a TTL on each document?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
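As a rough illustration of how the three retention classes from the question could map to per-document TTL values, here is a minimal Java sketch. The enum names are hypothetical, and the conversions (1 year = 365 days, 1 month = 30 days) are assumptions; the thread only gives the three durations.

```java
import java.time.Duration;
import java.util.Optional;

public class RetentionTtl {
    // Hypothetical labels for the three retention classes in the question.
    enum Retention { FOREVER, ONE_YEAR, ONE_MONTH }

    // Returns the TTL in milliseconds, or empty for documents kept forever.
    // Assumes 1 year = 365 days and 1 month = 30 days.
    static Optional<Long> ttlMillis(Retention retention) {
        switch (retention) {
            case ONE_YEAR:
                return Optional.of(Duration.ofDays(365).toMillis());
            case ONE_MONTH:
                return Optional.of(Duration.ofDays(30).toMillis());
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Print the TTL chosen for each retention class.
        for (Retention r : Retention.values()) {
            System.out.println(r + " -> " + ttlMillis(r));
        }
    }
}
```

The value would be supplied when the document is indexed; documents in the "forever" class simply get no TTL.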

On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
>
> I have varying data retention requirements I am trying to balance (I am 
> continuously indexing new documents):
>
>- 1% of my documents need to be kept forever
>- 10% need to be kept 1 year
>- the remainder needs to be kept for 1 month
>
> I can easily set properties indicating the retention policy for each 
> document and then periodically do a "delete by query". However, since the 
> delete would remove 89% of the indexed documents, would there be any 
> potential performance problems with this straightforward approach? I guess 
> this is a YMMV type thing, but I was just wondering what the typical 
> approach is here. Would it be necessary to perhaps filter the query to not 
> affect so many documents at once? Would query performance be greatly 
> impacted?
>
> The alternate approach I was thinking would be to create separate indices 
> for each retention type. Cleanup would be easier, but unfortunately a 
> document's retention policy can be upgraded/downgraded so that could be a 
> little messy to keep consistent.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eefba11c-d147-4e02-b84b-bc8f90a08e3f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shards/routing documents imbalance problem

2014-03-27 Thread Kevin Wang
You can add that as a plugin; see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html

On Thursday, March 27, 2014 9:32:28 PM UTC+11, Han JU wrote:
>
> Thanks but can you explain some detail?
> Say I have the class in `MyHashFunction.java`, how could I put it in ES? I 
> need to modify the code of ES or ? 
>
> On Thursday, March 27, 2014 11:27:24 AM UTC+1, Kevin Wang wrote:
>>
>> You can add a class that implements HashFunction and set the 
>> "cluster.routing.operation.hash.type" setting to that class.
>>
>>
>> Regards,
>> Kevin 
>>
>> On Thursday, March 27, 2014 9:11:39 PM UTC+11, Han JU wrote:
>>>
>>> Do you guys know how to plug in a custom hash function for routing 
>>> parameter?
>>>
>>> On Wednesday, March 26, 2014 12:51:24 PM UTC+1, Han JU wrote:
>>>>
>>>> Thanks a lot Kevin.
>>>>
>>>> That DJB_HASH result makes it clear for us. I think we'll just use the 
>>>> id value as hash.
>>>> Do you guys know how to plugin a custom hash function?
>>>>
>>>>
>>>> On Wednesday, March 26, 2014 11:58:36 AM UTC+1, Kevin Wang wrote:
>>>>>
>>>>> There are two hash function implementations, 
>>>>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
>>>>> and 
>>>>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction;
>>>>> the default is DjbHashFunction. You can compute the hash yourself by 
>>>>> calling DjbHashFunction.DJB_HASH(your id).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> As far as I know, in Java, basic hash value of positive int/long 
>>>>>> value is just themselves (our ids are small values like 1125, 345 etc).
>>>>>> So I calculated some_id % 128, and I got 116 distinct values. But in 
>>>>>> reality there's a lot less shards in use. 
>>>>>>
>>>>>> Does ElasticSearch use some special hash function?
>>>>>>
>>>>>> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>>>>>>
>>>>>>> ES computes the shard id as hash(routing) % number of shards. In your 
>>>>>>> case there are only 167 distinct routing values for 128 shards, so 
>>>>>>> it is highly likely that they map to fewer than 128 distinct shards. 
>>>>>>> Some of the shards will therefore not have any data.
>>>>>>>
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We've indexed 25M documents into a single index of 128 shards with 
>>>>>>>> 1 replica. 
>>>>>>>> The `routing` parameter is set to a path in the document, which is 
>>>>>>>> an int value:
>>>>>>>>
>>>>>>>> _routing: {
>>>>>>>>   path: "some_id",
>>>>>>>>   required: true
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> In our 25M documents, there are 167 distinct values of this "some_id" 
>>>>>>>> and in our expectation, ElasticSearch will route these documents 
>>>>>>>> evenly 
>>>>>>>> across all shards.
>>>>>>>> But we've found out that, out of 128 shards, there are 53 empty 
>>>>>>>> shards (with 0 document inside), or, 40% of the shards are not used at 
>>>>>>>> all.
>>>>>>>>
>>>>>>>> My question: 
>>>>>>>>
>>>>>>>> - is this normal? Do we miss something in configuring routing? 
>>>>>>>> - does this imbalanced shard utilization affect indexing speed?
>>>>>>>>
>>>>>>>> We can confirm that all documents are correctly indexed and routing 
>>>>>>>> works (when searching with routing only 1 shard responds with the 
>>>>>>>> correct 
>>>>>>>> answer).
>>>>>>>> ElasticSearch version is v1.0.1.
>>>>>>>>
>>>>>>>>  
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>



Re: Shards/routing documents imbalance problem

2014-03-27 Thread Kevin Wang
You can add a class that implements HashFunction and set the 
"cluster.routing.operation.hash.type" setting to that class.


Regards,
Kevin 

On Thursday, March 27, 2014 9:11:39 PM UTC+11, Han JU wrote:
>
> Do you guys know how to plug in a custom hash function for routing 
> parameter?
>
> On Wednesday, March 26, 2014 12:51:24 PM UTC+1, Han JU wrote:
>>
>> Thanks a lot Kevin.
>>
>> That DJB_HASH result makes it clear for us. I think we'll just use the id 
>> value as hash.
>> Do you guys know how to plugin a custom hash function?
>>
>>
>> On Wednesday, March 26, 2014 11:58:36 AM UTC+1, Kevin Wang wrote:
>>>
>>> There are two hash function implementations, 
>>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
>>> and 
>>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction; 
>>> the default is DjbHashFunction. You can compute the hash yourself by 
>>> calling DjbHashFunction.DJB_HASH(your id).
>>>
>>>
>>>
>>>
>>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> As far as I know, in Java, basic hash value of positive int/long value 
>>>> is just themselves (our ids are small values like 1125, 345 etc).
>>>> So I calculated some_id % 128, and I got 116 distinct values. But in 
>>>> reality there's a lot less shards in use. 
>>>>
>>>> Does ElasticSearch use some special hash function?
>>>>
>>>> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>>>>
>>>>> ES computes the shard id as hash(routing) % number of shards. In your 
>>>>> case there are only 167 distinct routing values for 128 shards, so it 
>>>>> is highly likely that they map to fewer than 128 distinct shards. 
>>>>> Some of the shards will therefore not have any data.
>>>>>
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We've indexed 25M documents into a single index of 128 shards with 1 
>>>>>> replica. 
>>>>>> The `routing` parameter is set to a path in the document, which is an 
>>>>>> int value:
>>>>>>
>>>>>> _routing: {
>>>>>>   path: "some_id",
>>>>>>   required: true
>>>>>> }
>>>>>>
>>>>>>
>>>>>> In our 25M documents, there are 167 distinct values of this "some_id" 
>>>>>> and in our expectation, ElasticSearch will route these documents evenly 
>>>>>> across all shards.
>>>>>> But we've found out that, out of 128 shards, there are 53 empty 
>>>>>> shards (with 0 document inside), or, 40% of the shards are not used at 
>>>>>> all.
>>>>>>
>>>>>> My question: 
>>>>>>
>>>>>> - is this normal? Do we miss something in configuring routing? 
>>>>>> - does this imbalanced shard utilization affect indexing speed?
>>>>>>
>>>>>> We can confirm that all documents are correctly indexed and routing 
>>>>>> works (when searching with routing only 1 shard responds with the 
>>>>>> correct 
>>>>>> answer).
>>>>>> ElasticSearch version is v1.0.1.
>>>>>>
>>>>>>  
>>>>>> Thanks!
>>>>>>
>>>>>



Re: Mapping question

2014-03-26 Thread Kevin Wang
I think you can use Dynamic template for 
that 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates


Regards,
Kevin

On Thursday, March 27, 2014 9:03:55 AM UTC+11, Parag Shah wrote:
>
> Hi all,
>
> {
> "fruits" : {
> "apple" : {
> "sweet" : true,
> "color" : "red",
> "seed" : "red",
> "flesh" : "white"
> },
> "orange" : {
> "sweet" : true,
> "color" : "orange",
> "seed" : "white",
> "flesh" : "orange"
> } 
> }
> }
>
> The above is what I want to generate mappings for.
>
> fruits is the container for different kinds of fruits.
>
> I could add any other fruit (unknown) to "fruits" (known) like:
>
> "banana" : {
>  "sweet" : true,
>  "color" : "yellow",
>  "seed" : "black",
>  "flesh" : "white"
> }
>
> banana is a user-generated value, and for any fruit here and it has some 
> attributes like sweet, color, seed and flesh.
>
> I am not sure how I would do the mapping for this kind of a structure. Any 
> help will be appreciated.
>
> Regards
> Parag
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b1250ddd-cb40-440b-b0a0-fff909041486%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shards/routing documents imbalance problem

2014-03-26 Thread Kevin Wang
There are two hash function implementations, 
org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
and org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction; 
the default is DjbHashFunction. You can compute the hash yourself by 
calling DjbHashFunction.DJB_HASH(your id).
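If you don't want to pull in the ES jar just to check where your ids land, a standalone sketch along these lines may help. It assumes the routing value is hashed as a string with the classic DJB scheme (start at 5381, multiply by 33, add each character) and that the shard is the absolute value of hash % shards. Both are assumptions about what DjbHashFunction and the routing code do, so compare against the real DJB_HASH before relying on the results. The class and method names here are made up for illustration.

```java
public class RoutingSketch {
    // DJB-style string hash: start at 5381, then hash = hash * 33 + char.
    // Assumed to approximate DjbHashFunction.DJB_HASH; not copied from ES.
    static int djbHash(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // hash(routing) % number of shards, with the sign dropped.
    // The remainder lies strictly between -numShards and numShards,
    // so Math.abs cannot overflow here.
    static int shardFor(String routing, int numShards) {
        return Math.abs(djbHash(routing) % numShards);
    }

    public static void main(String[] args) {
        // Example ids taken from the thread ("small values like 1125, 345").
        for (String id : new String[] {"1125", "345"}) {
            System.out.println(id + " -> shard " + shardFor(id, 128));
        }
    }
}
```

Running this over all 167 distinct some_id values and counting the distinct shard numbers would show how many shards can ever receive data under these assumptions.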




On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>
> Thanks for your reply.
>
> As far as I know, in Java, basic hash value of positive int/long value is 
> just themselves (our ids are small values like 1125, 345 etc).
> So I calculated some_id % 128, and I got 116 distinct values. But in 
> reality there's a lot less shards in use. 
>
> Does ElasticSearch use some special hash function?
>
> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>
>> ES computes the shard id as hash(routing) % number of shards. In your case 
>> there are only 167 distinct routing values for 128 shards, so it is highly 
>> likely that they map to fewer than 128 distinct shards. Some of the shards 
>> will therefore not have any data.
>>
>>
>> Kevin
>>
>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>
>>> Hi,
>>>
>>> We've indexed 25M documents into a single index of 128 shards with 1 
>>> replica. 
>>> The `routing` parameter is set to a path in the document, which is an 
>>> int value:
>>>
>>> _routing: {
>>>   path: "some_id",
>>>   required: true
>>> }
>>>
>>>
>>> In our 25M documents, there are 167 distinct values of this "some_id" and 
>>> in our expectation, ElasticSearch will route these documents evenly across 
>>> all shards.
>>> But we've found out that, out of 128 shards, there are 53 empty shards 
>>> (with 0 document inside), or, 40% of the shards are not used at all.
>>>
>>> My question: 
>>>
>>> - is this normal? Do we miss something in configuring routing? 
>>> - does this imbalanced shard utilization affect indexing speed?
>>>
>>> We can confirm that all documents are correctly indexed and routing 
>>> works (when searching with routing only 1 shard responds with the correct 
>>> answer).
>>> ElasticSearch version is v1.0.1.
>>>
>>>  
>>> Thanks!
>>>
>>



Re: Java API or REST API for client development ?

2014-03-26 Thread Kevin Wang
I think it's better to use the official client. The REST API also calls the 
Java API internally, so if you use REST it will be Java -> REST -> Java.
What do you mean by multiplatform? Android? I'm not sure whether the ES Java 
API works on Android, but I don't think an Android app should talk to ES 
directly.


Regards,
Kevin

On Wednesday, March 26, 2014 9:42:19 PM UTC+11, Subhadip Bagui wrote:
>
>
> My app is in Java only. So what I mean is: should I use the Elasticsearch 
> Java client, or only the available REST APIs via HttpClient and the like?
>
> Which will be more flexible for multiplatform integration?
>



Re: IndexOutOfBoundsException at IndexShardRoutingTable class

2014-03-26 Thread Kevin Wang
pickIndex() will return the absolute value of the count, so it won't return 
a negative value. Can you provide more details?
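One thing that may be worth checking while gathering details: in Java, Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, and int addition wraps around silently. The sketch below is not the ES code. It only reproduces the arithmetic of the quoted loop, with a hypothetical list size of 3, and shows Math.floorMod as one possible guard.

```java
public class NegativeIndexSketch {
    // Position computed the same way as in the quoted loop:
    // (index + i) % size. The sum can wrap around to a negative int.
    static int naiveLoc(int index, int i, int size) {
        return (index + i) % size;
    }

    // Possible guard: floorMod never returns a negative value when size > 0.
    static int safeLoc(int index, int i, int size) {
        return Math.floorMod(index + i, size);
    }

    public static void main(String[] args) {
        // Math.abs cannot represent +2147483648, so the result stays negative.
        System.out.println(Math.abs(Integer.MIN_VALUE));
        // Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE before the modulo.
        System.out.println(naiveLoc(Integer.MAX_VALUE, 1, 3));
        System.out.println(safeLoc(Integer.MAX_VALUE, 1, 3));
    }
}
```

With a list of size 3 (an assumption, not something stated in the report), the wrapped value gives -2, which would match the "index (-2)" in the exception message.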


Kevin


On Wednesday, March 26, 2014 3:53:15 PM UTC+11, Shinsuke Sugaya wrote:
>
> Hi
>
> I encountered the following problem:
>
> Caused by: java.lang.IndexOutOfBoundsException: index (-2) must not be negative
>     at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:306)
>     at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:285)
>     at org.elasticsearch.common.collect.RegularImmutableList.get(RegularImmutableList.java:65)
>     at org.elasticsearch.cluster.routing.IndexShardRoutingTable.preferNodeActiveInitializingShardsIt(IndexShardRoutingTable.java:378)
>     at org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting.preferenceActiveShardIterator(PlainOperationRouting.java:210)
>     at org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting.getShards(PlainOperationRouting.java:80)
>     at org.elasticsearch.action.get.TransportGetAction.shards(TransportGetAction.java:80)
>     at org.elasticsearch.action.get.TransportGetAction.shards(TransportGetAction.java:42)
>     at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.<init>(TransportShardSingleOperationAction.java:121)
>     at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.<init>(TransportShardSingleOperationAction.java:97)
>     at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:74)
>     at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:49)
>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
>     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:49)
>     at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:85)
>     at org.elasticsearch.client.support.AbstractClient.get(AbstractClient.java:174)
>     ... 9 more
>
> My environment is:
>
>  - Elasticsearch 0.90.7
>  - 3 nodes in a cluster
>  - Send GET request with preference=_local
>
> Looking into the IndexShardRoutingTable class, it seems that "loc" is 
> an unexpected negative value in the following code. The pickIndex method 
> returns the value of "counter" (an incrementing value). If "counter" reaches 
> Integer.MAX_VALUE, I think "loc" becomes negative and then 
> activeShards.get(loc) throws the exception.
>
> int index = pickIndex();
> for (int i = 0; i < activeShards.size(); i++) {
>     int loc = (index + i) % activeShards.size();
>
> If it's a bug, I'll file an issue.
>
> Best regards,
>  shinsuke
>
>



Re: Shards/routing documents imbalance problem

2014-03-26 Thread Kevin Wang
ES computes the shard id as hash(routing) % number of shards. In your case 
there are only 167 distinct routing values for 128 shards, so it is highly 
likely that they map to fewer than 128 distinct shards. Some of the shards 
will therefore not have any data.
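To put a rough number on that: if k distinct routing values were spread uniformly at random over n shards, the expected number of empty shards would be n * (1 - 1/n)^k. The sketch below evaluates this for k = 167 and n = 128. The uniform-hash model is an idealisation, not a statement about how the actual hash function behaves on these ids.

```java
public class EmptyShardEstimate {
    // Expected number of empty shards when `keys` distinct routing values
    // are assigned uniformly at random to `shards` shards:
    // each shard stays empty with probability (1 - 1/shards)^keys.
    static double expectedEmpty(int keys, int shards) {
        return shards * Math.pow(1.0 - 1.0 / shards, keys);
    }

    public static void main(String[] args) {
        // Figures from the thread: 167 distinct some_id values, 128 shards.
        System.out.println(expectedEmpty(167, 128));
    }
}
```

Even an ideal hash would leave roughly 35 of the 128 shards empty, so a large share of empty shards is expected with so few routing values. The 53 empty shards reported are more than that, which may indicate that the hash spreads these short numeric ids less evenly than the uniform model assumes.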


Kevin

On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>
> Hi,
>
> We've indexed 25M documents into a single index of 128 shards with 1 
> replica. 
> The `routing` parameter is set to a path in the document, which is an int 
> value:
>
> _routing: {
>   path: "some_id",
>   required: true
> }
>
>
> In our 25M documents, there are 167 distinct values of this "some_id" and in 
> our expectation, ElasticSearch will route these documents evenly across all 
> shards.
> But we've found out that, out of 128 shards, there are 53 empty shards 
> (with 0 document inside), or, 40% of the shards are not used at all.
>
> My question: 
>
> - is this normal? Do we miss something in configuring routing? 
> - does this imbalanced shard utilization affect indexing speed?
>
> We can confirm that all documents are correctly indexed and routing works 
> (when searching with routing only 1 shard responds with the correct answer).
> ElasticSearch version is v1.0.1.
>
>  
> Thanks!
>



Re: Java API or REST API for client development ?

2014-03-26 Thread Kevin Wang
There is no official REST client for Java, so I think if your app is in 
Java you should use the Java API. If it is not in Java, you have to use the 
REST API.




On Wednesday, March 26, 2014 8:46:16 PM UTC+11, Subhadip Bagui wrote:
>
> Hi, 
>
> We have a cloud management framework where all the event data are to be 
> stored in elasticsearch. I have to start the client side code for this.
>  
> I need a suggestion here. Which one should I use, elasticsearch Java API 
> or REST API for the client ?
>
> Kindly suggest one and mention the pros and cons, so that it will be 
> easy for me to decide on the product design now rather than deal with 
> hassle later.
>
> Subhadip
>  
>
>



Re: Java issue when trying to send requests to ElasticSearch

2014-03-21 Thread Kevin Wang
It looks like you are using "elasticsearch-http-basic" plugin and that 
plugin doesn't support ES 1.0
https://github.com/Asquera/elasticsearch-http-basic/issues/9


On Friday, March 21, 2014 9:50:02 PM UTC+11, cha...@pocketplaylab.com wrote:
>
> Hi all,
>
> I am currently trying to set up a complete ElasticSearch + LogStash + 
> Kibana stack on Amazon Web Services OpsWorks using the following tutorial : 
> http://devblog.springest.com/complete-logstash-stack-on-aws-opsworks-in-15-minutes/
>
> Most of the things run fine except for ElasticSearch. When the process is 
> started, if I try to do a simple curl -X GET http://localhost:9200/, I get 
> the following answer: curl: (52) Empty reply from server
>
> In my cluster's log, I see the Java error below. Has anybody 
> experienced that? Any suggestions?
>
> Thanks for your help,
>
> Charles.
>
> Java error :
>
> [2014-03-21 10:46:48,657][WARN ][http.netty   ] [Cecilia Reyes] Caught exception while handling client http traffic, closing connection [id: 0xf290eec5, /127.0.0.1:60355 => /127.0.0.1:9200]
> java.lang.IncompatibleClassChangeError: Found class org.elasticsearch.http.HttpRequest, but interface was expected
>     at com.asquera.elasticsearch.plugins.http.HttpBasicServer.shouldLetPass(HttpBasicServer.java:43)
>     at com.asquera.elasticsearch.plugins.http.HttpBasicServer.internalDispatchRequest(HttpBasicServer.java:35)
>     at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
>     at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
>     at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>     at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>     at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>     at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>     at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>     at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>     at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>     at java.util.concurr

[Ann] Elasticsearch Image Plugin 1.2.0 released

2014-03-20 Thread Kevin Wang
Hi All,

I've released version 1.2.0 of Elasticsearch Image Plugin.
The Image Plugin is a Content Based Image Retrieval Plugin for 
Elasticsearch using LIRE (Lucene Image Retrieval). It allows users to index 
images and search for similar images.

Changes in 1.2.0:

   - Use multiple threads when multiple features are requested, to improve 
   indexing speed
   - Allow indexing metadata
   - Allow querying by an existing image in the index


https://github.com/kzwang/elasticsearch-image

Also I've created a demo website for this plugin 
(http://demo.elasticsearch-image.com/); it has 1,000,000 images from the 
MIRFLICKR-1M collection (http://press.liacs.nl/mirflickr).


Thanks,
Kevin



[Ann] Elasticsearch Image Plugin 1.1.0 released

2014-03-13 Thread Kevin Wang
Hi All,

I've released version 1.1.0 of Elasticsearch Image Plugin.
The Image Plugin is a Content Based Image Retrieval Plugin for 
Elasticsearch using LIRE (Lucene Image Retrieval). It allows users to index 
images and search for similar images.

Changes in 1.1.0:

   - Added limit in image query
   - Added plugin version in es-plugin.properties
   

https://github.com/kzwang/elasticsearch-image

Also I've created a demo website for this plugin 
(http://demo.elasticsearch-image.com/). It will have 1,000,000 images from 
the MIRFLICKR-1M collection (http://press.liacs.nl/mirflickr); indexing 
hasn't finished yet, but there should be enough to demo the plugin.


Thanks,
Kevin



Re: [Ann] Elasticsearch Image Plugin 1.0.0 released

2014-03-05 Thread Kevin Wang
Hi Nik,

For storage, it only stores the analyzed binary for each feature, so it 
won't use much space. I tried indexing 5000 images (~550MB) and the index 
only used ~7MB. (You have to disable source for the image field; otherwise 
Elasticsearch stores the base64 of the image, which uses a lot of space.)

For hashing: if no hash is used, it's a linear search that calculates a 
score for every image in your index. If a hash is used, it only calculates 
scores for matching images. The current implementation in this plugin is a 
little different from the original LIRE: when using hash search, LIRE only 
returns the top n (set by the user) matches, whereas this plugin returns 
all matches. The more results the hash query returns, the more accurate the 
result and the longer the search takes. I'll support limiting the number of 
matches soon (hopefully in the next release). 
There is an image in LIRE comparing linear search and hashing-based 
search: 
http://www.semanticmetadata.net/wp-content/uploads/2013/03/Results-CEDD-Hashing.jpg

Bag of Visual Words indexing 
(http://www.semanticmetadata.net/wiki/doku.php?id=lire:bovw) will also help 
with large data sets. However, that's much more complex and you need to 
have enough images in the index in order to use it. I'll include this in 
the plugin later as well.


Kevin


On Thursday, March 6, 2014 1:15:32 AM UTC+11, Nikolas Everett wrote:
>
> Oh man do I have some images. So I see something about a hash for 
> large datasets but no reference to what the storage requirements are like 
> either for the hash or for the linear search.  Do you have some links?  I'm 
> working with on the order of ten million images and, while I don't have a 
> mandate to implement image search, it'd be pretty cool.
>
> Nik
>
>
> On Wed, Mar 5, 2014 at 6:45 AM, David Pilato wrote:
>
>> Sounds great!
>>
>> Congrats.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> On March 5, 2014, at 12:04, Kevin Wang wrote:
>>
>> Hi,
>>
>> I've released the first version of Elasticsearch Image Plugin. It's a 
>> Content Based Image Retrieval Plugin for Elasticsearch using LIRE (Lucene 
>> Image Retrieval).
>>
>> https://github.com/kzwang/elasticsearch-image
>>
>>
>> Thanks,
>> Kevin
>>
>
>



Re: [Ann] Elasticsearch Image Plugin 1.0.0 released

2014-03-05 Thread Kevin Wang
It allows you to index images and search for similar images (like Google 
image search: you upload an image and it returns similar ones).

On Wednesday, March 5, 2014 10:16:54 PM UTC+11, Garry Welding wrote:
>
> I might be being a bit dense here, but what exactly does this allow you to 
> do?
>
> On Wednesday, March 5, 2014 11:04:49 AM UTC, Kevin Wang wrote:
>>
>> Hi,
>>
>> I've released the first version of the Elasticsearch Image Plugin. It's a 
>> Content-Based Image Retrieval plugin for Elasticsearch using LIRE (Lucene 
>> Image Retrieval).
>>
>> https://github.com/kzwang/elasticsearch-image
>>
>>
>> Thanks,
>> Kevin
>>
>



[Ann] Elasticsearch Image Plugin 1.0.0 released

2014-03-05 Thread Kevin Wang
Hi,

I've released the first version of the Elasticsearch Image Plugin. It's a 
Content-Based Image Retrieval plugin for Elasticsearch using LIRE (Lucene 
Image Retrieval).

https://github.com/kzwang/elasticsearch-image


Thanks,
Kevin



Re: Outdated doc - what is the 1.0 equivalent?

2014-02-12 Thread Kevin Wang
It has been changed to setPostFilter(...)
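
As far as I know the same rename applies at the REST level in 1.0: the top-level "filter" key of a search body became "post_filter". A minimal sketch of what such a body looks like, built with plain strings so it runs without the Elasticsearch jar. The class name, the index name "twitter", and the field/value "user"/"kimchy" are made up for illustration.

```java
final class PostFilterSketch {

    // Builds a 1.0-style search body in which the filter that runs after
    // the query is spelled "post_filter". No JSON escaping is done, so
    // field and value are assumed to be plain alphanumeric strings.
    static String searchBody(String field, String value) {
        return "{\"query\":{\"match_all\":{}},"
            + "\"post_filter\":{\"term\":{\"" + field + "\":\"" + value + "\"}}}";
    }

    public static void main(String[] args) {
        // Rough Java API counterpart (needs the Elasticsearch 1.x jar):
        //   client.prepareSearch("twitter")
        //         .setPostFilter(FilterBuilders.termFilter("user", "kimchy"));
        System.out.println(searchBody("user", "kimchy"));
    }
}
```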

On Thursday, February 13, 2014 1:39:11 PM UTC+11, Ben McCann wrote:
>
> Hi,
>
> The method setFilter on this page no longer exists:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/search.html
>
> Any tips on what that has changed to?
>
> Thanks,
> Ben
>
>



DynamoDB river plugin

2014-02-10 Thread Kevin Wang
Hi,
I've created a river plugin for AWS DynamoDB. It can fetch data from 
DynamoDB and index it into Elasticsearch.


https://github.com/kzwang/elasticsearch-river-dynamodb



Thanks,
Kevin



Re: Updating the indexes in every half an hour

2014-02-10 Thread Kevin Wang
You can use the JDBC river plugin to fetch from the database directly.

https://github.com/jprante/elasticsearch-river-jdbc
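
If you would rather keep your own cron job than install the river, the incremental idea can be sketched roughly as below: instead of dropping the index, turn each changed row into an "index" or "delete" action for the _bulk API. This is only a sketch under assumptions: the Change class and the way changed rows are detected in MySQL are hypothetical, ids are assumed to need no JSON escaping, and the index/type names are taken from the curl commands in the question.

```java
import java.util.Arrays;
import java.util.List;

final class BulkSyncSketch {

    // One changed row from the source database (hypothetical shape).
    static final class Change {
        final String id;       // primary key, reused as the document _id
        final String json;     // document body; ignored for deletions
        final boolean deleted; // true if the row was removed in MySQL

        Change(String id, String json, boolean deleted) {
            this.id = id;
            this.json = json;
            this.deleted = deleted;
        }
    }

    // Builds a newline-delimited _bulk body: one action line per change,
    // followed by the document source line for index actions.
    static String bulkBody(String index, String type, List<Change> changes) {
        StringBuilder sb = new StringBuilder();
        for (Change c : changes) {
            String meta = "{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + c.id + "\"}";
            if (c.deleted) {
                sb.append("{\"delete\":").append(meta).append("}\n");
            } else {
                // "index" overwrites an existing document with the same _id,
                // so it covers both added and modified rows.
                sb.append("{\"index\":").append(meta).append("}\n");
                sb.append(c.json).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Change> changes = Arrays.asList(
            new Change("1", "{\"title\":\"a\"}", false),
            new Change("2", null, true));
        System.out.print(bulkBody("adminvenue", "jos_content", changes));
    }
}
```

The resulting text would be POSTed to the _bulk endpoint; because the existing index is never deleted, searches keep working while the update runs.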



On Monday, February 10, 2014 8:40:31 PM UTC+11, Vallabh Bothre wrote:
>
> Dear All,
>
> I am using Elasticsearch in some of my APIs.
> I have created the index and documents, and have added data to the Elasticsearch 
> server from a MySQL database.
>
> I follow three steps:
> 1. Delete the index using 
> curl -X DELETE 'http://localhost:9200/adminvenue/?pretty=true'
>
> 2. Create the index using 
> curl -X PUT 'http://localhost:9200/adminvenue/?pretty=true' -d '
> {...
>
> 3. Create the mapping using 
> curl -X PUT 'http://localhost:9200/adminvenue/jos_content/_mapping?pretty=true' -d '
> {..
>
>
> I have set up a cron job that runs every half hour.
> The cron job consists of two files: a .sh file that deletes the index, recreates 
> it, and sets the mapping, and a .php file that adds data from MySQL to 
> Elasticsearch.
>
> My MySQL data is continuously changing (some rows are removed, some modified, 
> and some added), which is why I run the cron job every half hour.
>
> My concern is that I have to delete and then recreate the index every half 
> hour.
> Is there any way to update the index in place, so that only the data that was 
> added, deleted, or modified in MySQL is updated in Elasticsearch?
>
> Any suggestion is very much appreciated.
>
> Thanks,
> Vallabh
>
>



Re: GridFS repository plugin

2014-02-08 Thread Kevin Wang
Thanks David.



On Saturday, February 8, 2014 10:14:07 AM UTC+11, Kevin Wang wrote:
>
> Hi,
>
> I've released the first version of the GridFS repository plugin. It allows you 
> to store snapshot data in MongoDB GridFS.
>
> https://github.com/kzwang/elasticsearch-repository-gridfs
>
>
> Thanks,
> Kevin
>



GridFS repository plugin

2014-02-07 Thread Kevin Wang
Hi,

I've released the first version of the GridFS repository plugin. It allows you 
to store snapshot data in MongoDB GridFS.

https://github.com/kzwang/elasticsearch-repository-gridfs


Thanks,
Kevin



Re: Elasticsearch index mapping in java

2014-02-04 Thread Kevin Wang
An index request is used to index a document; to create a mapping, you should 
use a put mapping request.

e.g.
PutMappingResponse response = client.admin().indices()
    .preparePutMapping(INDEX)
    .setType(INDEX_TYPE)
    .setSource(source)
    .get();
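
One caveat: as far as I know, a put mapping request expects only the mapping for the type, not a body wrapped in "settings" and "mappings" like the builder in the question (settings belong to index creation). A minimal sketch of such a body, built with plain strings so it runs without the Elasticsearch jar. The class and method names are hypothetical, and the field definitions are a trimmed-down version of the ones in the question.

```java
final class TweetMappingSketch {

    // Builds the body for a put mapping request: just the type mapping,
    // without the "settings"/"mappings" wrapper used at index creation.
    static String mappingFor(String type) {
        return "{\"" + type + "\":{\"properties\":{"
            + "\"user\":{\"type\":\"string\",\"index\":\"analyzed\"},"
            + "\"postDate\":{\"type\":\"date\"},"
            + "\"message\":{\"type\":\"string\",\"index\":\"not_analyzed\"}"
            + "}}}";
    }

    public static void main(String[] args) {
        // Assumption: this string would be passed as the source of the
        // put mapping request, e.g. via setSource(mappingFor("tweet")).
        System.out.println(mappingFor("tweet"));
    }
}
```

With "message" mapped as not_analyzed, a facet on that field should return whole messages rather than individual terms.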


On Wednesday, February 5, 2014 1:27:41 AM UTC+11, Doru Sular wrote:
>
> Hi guys,
>
> I am trying to create an index with the following code:
> XContentBuilder source = XContentFactory.jsonBuilder().startObject()
>     .startObject("settings")
>         .field("number_of_shards", 1)
>     .endObject() // end settings
>     .startObject("mappings")
>         .startObject(INDEX_TYPE)
>             .startObject("properties")
>                 .startObject("user")
>                     .field("type", "string")
>                     .field("store", "yes")
>                     .field("index", "analyzed")
>                 .endObject() // end user
>                 .startObject("postDate")
>                     .field("type", "date")
>                     .field("store", "yes")
>                     .field("index", "analyzed")
>                 .endObject() // end postDate
>                 .startObject("message")
>                     .field("type", "string")
>                     .field("store", "yes")
>                     .field("index", "not_analyzed")
>                 .endObject() // end message
>             .endObject() // end properties
>         .endObject() // end index type
>     .endObject() // end mappings
> .endObject(); // end the container object
>
> IndexResponse response = this.client.prepareIndex(INDEX, INDEX_TYPE)
>     .setSource(source)
>     .setType(INDEX_TYPE)
>     .execute()
>     .actionGet();
>
>
> I want the "message" field to be not analyzed, because later I want to 
> use facets to obtain unique messages.
> Unfortunately, my code seems to just add a document to the index with the 
> following structure:
> {
>   "settings": {
>     "number_of_shards": 1
>   },
>   "mappings": {
>     "tweet": {
>       "properties": {
>         "user": {
>           "type": "string",
>           "store": "yes",
>           "index": "analyzed"
>         },
>         "postDate": {
>           "type": "date",
>           "store": "yes",
>           "index": "analyzed"
>         },
>         "message": {
>           "type": "string",
>           "store": "yes",
>           "index": "not_analyzed"
>         }
>       }
>     }
>   }
> }
>
> Please help me spot the error; it seems that the mappings are not created.
> Thank you very much,
> Doru
>
>



[Ann] ElasticSearch OSEM and ElasticSearch Redis Transport

2014-02-04 Thread Kevin Wang
Hi

I've released an Object/Search Engine Mapping (OSEM) for ElasticSearch and a 
Redis Transport for ElasticSearch:

https://github.com/kzwang/elasticsearch-osem

https://github.com/kzwang/elasticsearch-transport-redis


Thanks.
Kevin
