Index level Search interceptor

2015-01-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a set of indices and using an alias, I am querying all the indices.

After the results are fetched from every index, I would like to store the 
individual per-index results someplace for another task where I need to 
recognize the records per index.

Is there a plugin that I can use to achieve this?

Thanks
SRK

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/07d71b69-e7fd-4c56-9467-209ef4ad4a46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
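
One client-side way to approach this, sketched below, relies on the fact that every search hit reports the concrete index it came from (the `_index` value of the hit), even when the search is issued through an alias. The `Hit` class and the sample index names are hypothetical stand-ins for illustration, not Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerIndexGrouping {

    // Hypothetical stand-in for a search hit; only the fields needed here.
    static final class Hit {
        final String index; // concrete index name reported with the hit
        final String id;    // document id

        Hit(String index, String id) {
            this.index = index;
            this.id = id;
        }
    }

    // Groups document ids by the index that produced them, keeping hit order.
    static Map<String, List<String>> groupByIndex(List<Hit> hits) {
        Map<String, List<String>> perIndex = new LinkedHashMap<>();
        for (Hit hit : hits) {
            perIndex.computeIfAbsent(hit.index, k -> new ArrayList<>()).add(hit.id);
        }
        return perIndex;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("logs-2015-01", "a1"));
        hits.add(new Hit("logs-2015-02", "b7"));
        hits.add(new Hit("logs-2015-01", "a2"));
        System.out.println(groupByIndex(hits));
    }
}
```

The grouped map can then be handed to the follow-up task, one entry per index.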


Performance - Very large list of buckets in an aggregation field

2015-01-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have to run aggregations on a very large corpus and pull out facets for 
~10-12 fields. All fields except one have a modest number of buckets (no 
more than ~1K), but one field may have a very large number of buckets, 
probably in the millions. Will that turn out to be a performance issue?

All I am interested in is the grouping of the records based on that field.

Is there any best practice on how to achieve this, or is this not a normal 
scenario?

Thanks,
SRK



P-C Relationship using DocValues instead of Field Data Cache

2014-11-06 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

From the link below, on P-C relationships, here is the relevant excerpt:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html

> memory vs doc values
>
> At the time of going to press, the parent-child ID map is held in memory as
> part of fielddata. There are plans afoot to change the default setting to
> use doc values by default instead.


How can we track whether this is still planned and being worked on as part 
of a future release?

Thanks,
Sandeep



Custom Scoring in ElasticSearch

2014-09-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a list of terms and I want to run a TermsQuery against a 
multi-valued field in a document corpus.

When I get the set of matching results, I want to make sure that I use my 
own logic for scoring so that the Top N reflects my view of which documents 
should be top rated.

Is there a plugin, a scripting function, or a built-in method to achieve 
this?

My custom scoring function will look at a few aspects, such as which terms 
in the input are more important than the rest, what size of document should 
be ranked first, and whether my list of terms is a complete subset of a 
document's terms and whether that document has other terms as well.

Please let me know!

Thanks,
Sandeep
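
The scoring rules described above can be prototyped outside Elasticsearch before deciding where to plug them in. The following is a minimal sketch, assuming per-term weights supplied by the caller, a bonus when every query term is present, and a penalty for document terms that are not in the query. The class name, the weights, and the bonus and penalty values are hypothetical choices for illustration, not an Elasticsearch API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TermOverlapScore {

    // Score = sum of weights of matched query terms
    //         + bonus if every query term appears in the document
    //         - penalty for each document term that is not a query term.
    static double score(Map<String, Double> queryWeights, Set<String> docTerms,
                        double fullMatchBonus, double extraTermPenalty) {
        double sum = 0.0;
        int matched = 0;
        for (Map.Entry<String, Double> e : queryWeights.entrySet()) {
            if (docTerms.contains(e.getKey())) {
                sum += e.getValue();
                matched++;
            }
        }
        if (matched == queryWeights.size()) {
            sum += fullMatchBonus;
        }
        int extras = docTerms.size() - matched;
        return sum - extras * extraTermPenalty;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("a", 2.0); // "a" is treated as more important than "b"
        weights.put("b", 1.0);

        Set<String> exact = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> superset = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> partial = new HashSet<>(Arrays.asList("a", "x"));

        System.out.println(score(weights, exact, 1.0, 0.25));
        System.out.println(score(weights, superset, 1.0, 0.25));
        System.out.println(score(weights, partial, 1.0, 0.25));
    }
}
```

With these sample values, a document containing exactly the query terms ranks above one that contains them plus extra terms, which in turn ranks above a partial match.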



Knowing which one of the boolean queries matched the result document

2014-09-18 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,


If I create a Bool Query with 3 OR clauses and get back a bunch of results, 
is there a way of knowing which of the clauses matched a given hit, i.e. of 
associating each hit with the specific query that matched it?


Thanks,
Sandeep



Write a plugin to query and aggregate results from multiple shards

2014-09-14 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,


I am looking through the sources, and I am not sure whether this is 
possible. What I am looking for is a way to manipulate the SearchRequest 
object when it reaches the SearchShards level, since I need to update the 
object with a value that is shard specific.

For this, I was checking TransportBroadcastOperationAction, which allows 
hitting multiple shards and into which we can inject a SearchService. 
However, for the response aggregation, we may have to write our own logic 
to call SearchPhaseController::merge() or something similar. I am not sure 
whether this will be a problem when the same code in ElasticSearch changes 
across releases.

There are also other classes like SearchServiceTransportAction, and we can 
probably also extend TransportSearchTypeAction like the existing QAF, 
DFS_QAF, QTF, DFS_QTF, etc. However, what I want to know is whether this is 
standard practice and should be done this way. Or is there any other plugin 
that allows me to do this?


Thanks,
Sandeep



Re: [ANN] Elasticsearch Simple Action Plugin

2014-09-11 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

Sure. Thanks,

Just wondering what changed so much in 1.3? Is there some sort of quick 
fix? Otherwise, I will just wait for an update from you.

Thanks,
Sandeep


On Wednesday, 10 September 2014 15:20:57 UTC+5:30, Jörg Prante wrote:
>
> The plugin is for 1.2, I have to update the simple action plugin to 
> Elasticsearch 1.3
>
> Thanks for the reminder
>
> Jörg
>
>
> On Wed, Sep 10, 2014 at 11:08 AM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi Jorg,
>>
>> I was trying to install this plugin on ES v1.3.1. I am getting the errors 
>> similar to below. Can you please tell me what has changed and how I can 
>> rectify? Thanks,
>>

IndexQueryParserModule Custom Filter with Shard awareness

2014-09-10 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I need to create my own query or filter parser, I can use a plugin that 
adds a new processor using the IndexQueryParserModule. The parser receives a 
QueryParserContext argument, which holds the Index name. Is there any way I 
can find out, inside the FilterParser class, the shardId to which I am being 
routed at that time? Since the parser is invoked once for every shard, it 
would be good if that information were passed in the argument to the 
filter/query parser.

Thanks,
Sandeep



Re: [ANN] Elasticsearch Simple Action Plugin

2014-09-10 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

I was trying to install this plugin on ES v1.3.1. I am getting errors 
similar to those below. Can you please tell me what has changed and how I 
can rectify this? Thanks,

4) No implementation for java.util.Map was bound.
  while locating java.util.Map
    for parameter 1 at org.elasticsearch.client.node.NodeClusterAdminClient.<init>(Unknown Source)
  while locating org.elasticsearch.client.node.NodeClusterAdminClient
    for parameter 1 at org.elasticsearch.client.node.NodeAdminClient.<init>(Unknown Source)
  while locating org.elasticsearch.client.node.NodeAdminClient
    for parameter 2 at org.elasticsearch.client.node.NodeClient.<init>(Unknown Source)
  at org.elasticsearch.client.node.NodeClientModule.configure(NodeClientModule.java:38)

5) No implementation for java.util.Map was bound.
  while locating java.util.Map
    for parameter 1 at org.elasticsearch.client.node.NodeIndicesAdminClient.<init>(Unknown Source)
  at org.elasticsearch.client.node.NodeClientModule.configure(NodeClientModule.java:36)

6) No implementation for java.util.Map was bound.
  while locating java.util.Map
    for parameter 1 at org.elasticsearch.client.node.NodeIndicesAdminClient.<init>(Unknown Source)
  while locating org.elasticsearch.client.node.NodeIndicesAdminClient
    for parameter 2 at org.elasticsearch.client.node.NodeAdminClient.<init>(Unknown Source)
  at org.elasticsearch.client.node.NodeClientModule.configure(NodeClientModule.java:37)

7) No implementation for java.util.Map was bound.
  while locating java.util.Map
    for parameter 1 at org.elasticsearch.client.node.NodeIndicesAdminClient.<init>(Unknown Source)
  while locating org.elasticsearch.client.node.NodeIndicesAdminClient
    for parameter 2 at org.elasticsearch.client.node.NodeAdminClient.<init>(Unknown Source)
  while locating org.elasticsearch.client.node.NodeAdminClient
    for parameter 2 at org.elasticsearch.client.node.NodeClient.<init>(Unknown Source)
  at org.elasticsearch.client.node.NodeClientModule.configure(NodeClientModule.java:38)

8) No implementation for org.elasticsearch.action.GenericAction annotated with @org.elasticsearch.common.inject.multibindings.Element(setName=,uniqueId=275) was bound.
  at org.elasticsearch.action.ActionModule.configure(ActionModule.java:304)

9) An exception was caught and reported. Message: null
  at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)

9 errors
    at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
    at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:151)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:102)
    at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
    at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
    at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
    at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:192)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
    at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.lang.reflect.MalformedParameterizedTypeException
    at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.validateConstructorArguments(ParameterizedTypeImpl.java:58)
    at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.<init>(ParameterizedTypeImpl.java:51)
    at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.make(ParameterizedTypeImpl.java:92)
    at sun.reflect.generics.factory.CoreReflectionFactory.makeParameterizedType(CoreReflectionFactory.java:105)
    at sun.reflect.generics.visitor.Reifier.visitClassTypeSignature(Reifier.java:140)
    at sun.reflect.generics.tree.ClassTypeSignature.accept(ClassTypeSignature.java:49)
    at sun.reflect.generics.repository.ClassRepository.getSuperclass(ClassRepository.java:86)
    at java.lang.Class.getGenericSuperclass(Class.java:764)
    at org.elasticsearch.common.inject.internal.MoreTypes.getGenericSupertype(MoreTypes.java:390)
    at org.elasticsearch.common.inject.TypeLiteral.getSupertype(TypeLiteral.java:262)
    at org.elasticsearch.common.inject.spi.InjectionPoint.addInjectionPoints(InjectionPoint.java:341)
    at org.elasticsearch.common.inject.spi.InjectionPoint.forInstanceMethodsAndFields(InjectionPoint.java:287)
    at org.elasticsearch.common.inject.spi.InjectionPoint.forInstanceMethodsAndFields(InjectionPoint.java:309)
    at org.elasticsearch.common.inject.internal.BindingBuilder.toInstance(BindingBuilder.java:78)
    at org.elasticsearch.action.ActionModule.configure(ActionModule.java:304)
    at org.elasticsearch.common.inject.Abs

Terms Query OR Terms Filter with Match All

2014-09-09 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a list of terms (type:string), and I want to retrieve all documents 
that have those terms. The term field is like a unique incremental sequence 
field.

Which would be more performant: a Terms Query, or a Terms Filter with a 
Match All? Or is there some entirely different approach?

Thanks,
Sandeep



Overhead of _source field in searches

2014-09-08 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Will having _source enabled for my documents add any performance overhead, 
latency, or memory consumption to searches that do not return the _source 
field? I am not interested in returning _source during search; its only 
purpose for me is to update the document at a later point in time. If 
maintaining the extra data for the _source field adds no overhead during 
either indexing or searching, I am okay with using it. However, I am not 
sure whether having _source enabled puts a memory/IO strain on searches 
even though I do not return _source as an output field.


Thanks,
Sandeep



Re: Determine Shard Id based on routing key

2014-09-01 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Adrian,

Thanks for the reply. That was important for me to understand.

However, I am a little concerned with your comment on the equivalence of 1 
index with 20 shards and 20 indices with one shard each. You mentioned that 
you would discourage the latter.

Can you please explain why? Is it for management reasons or performance 
overhead reasons? I can deal with the former but not the latter, unless 
you have some pointers.

Thanks,
Sandeep


On Thursday, 28 August 2014 13:28:32 UTC+5:30, Adrien Grand wrote:
>
> Hi Sandeep,
>
> Routing is deterministic, otherwise we couldn't know where data is 
> located when using the get API (this API goes to a single shard, not all of 
> them). However, you should not rely on the distribution of the hash values 
> as this is an implementation detail that we could indeed change at some 
> point.
>
> I don't know what your use-case is, but if you really need to manage the 
> sharding yourself, the easiest way to do it would be to create 20 indices 
> with 1 shard instead of 1 index with 20 shards. I would discourage doing it 
> though.
>
>
>
> On Thu, Aug 28, 2014 at 8:31 AM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi
>>
>> Say I create an index with 20 shards.
>>
>> During indexing, if I specify a routing_key as 0, will it be indexed in 
>> shardId 0? Will routing_key 3 correspond to shard Id 3? Similarly for all 
>> other keys if I have 20 unique routing values since 0 % 20 will be 0 and 3 
>> % 20 will be 3, etc.
>>
>> There is no hash but a specific set of routing keys [0..19] = number of 
>> shards [0..19] that I have.
>>
>> Is this deterministic and documented by ES regarding this routing key to 
>> shard Id behavior? Or is it internal to ES and changeable anytime?
>>
>> Please let me know soon!! Thanks in advance...
>>
>> Thanks,
>> Sandeep
>>
>
>
>
> -- 
> Adrien Grand
>  
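
To illustrate Adrien's point that a routing value is hashed before the modulo is taken, here is a minimal sketch. It assumes a DJB2-style string hash as a stand-in; the actual hash function is an Elasticsearch implementation detail that may differ between versions, so the shard numbers produced here should not be relied on. The class and method names are hypothetical.

```java
public class RoutingSketch {

    // DJB2-style hash over the routing string. This is an illustrative
    // stand-in, not necessarily the function a given Elasticsearch version uses.
    static int djb2(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Maps a routing value to a shard number in [0, numberOfShards).
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(djb2(routing), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 20;
        for (String routing : new String[] {"0", "3", "19"}) {
            System.out.println("routing " + routing + " -> shard " + shardFor(routing, shards));
        }
    }
}
```

With this stand-in hash and 20 shards, routing values "0" and "3" map to shards 1 and 4 rather than 0 and 3. The mapping is deterministic, but it is not the identity mapping that `routing % number_of_shards` on the raw value would suggest.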



Determine Shard Id based on routing key

2014-08-27 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi

Say I create an index with 20 shards.

During indexing, if I specify a routing_key as 0, will it be indexed in 
shardId 0? Will routing_key 3 correspond to shard Id 3? Similarly for all 
other keys if I have 20 unique routing values since 0 % 20 will be 0 and 3 
% 20 will be 3, etc.

I use no hash, just a specific set of routing keys [0..19], one per shard 
[0..19] that I have.

Is this routing-key-to-shard-Id behavior deterministic and documented by 
ES? Or is it internal to ES and changeable anytime?

Please let me know soon!! Thanks in advance...

Thanks,
Sandeep



Search Plugin to intercept search response

2014-08-27 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Is there any action/module that I can extend/register/add so that I can 
intercept the SearchResponse on the server node before the response is sent 
back to the TransportClient on the calling box?

Thanks,
Sandeep



Cache invalidating and recreation on TermsFilter values change

2014-08-27 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I am using the IndexQueryParserModule and in the plugin, I have my 
TermsFilter implementation.

Since I am writing my own ABCFilterBuilder/Parser, I believe I can control 
the caching, and ES will, by default, not cache the Lucene TermsFilter that 
I return. Correct me if I am wrong.

Can I specify a cache_key and have this explicitly cached by ES? It seems, 
from other FilterParsers, that this may be possible.

However, when the documents in the filter condition change, I want the 
cache to be invalidated and a new filter to be created and cached again.

Is that possible? Please let me know.

Thanks,
Sandeep



Aggregation across indices

2014-08-26 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have two indices, each holding part of a record and joined by some 
common identifier, can I issue a query across both indices and have 
aggregations take both indices into consideration?

Example:
Index 1: Type 1:
ID: String
Field1: String
Field2: String

Index 2: Type 2:
ID: String (From above. I can keep this same to behave like a foreign key.)
Field3: String
Field4: String

Can I effect a join across both indices and aggregate on Field4 for example?

Please let me know. Thanks,
Sandeep 



Indexing large number of files each with a huge size

2014-08-25 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I am trying to index documents, each file approximately 10-20 MB. I start 
seeing memory issues if I try to index them all in a multi-threaded 
environment from a single TransportClient on one machine to a single-node 
cluster with a 32GB ES server. It seems like memory is an issue on the 
client as well as the server side, and I probably understand and expect 
that :).

I have tried tuning the heap sizes and the batch sizes in the Bulk APIs. 
However, am I trying to push the limits too much? One thought is to stream 
the data so that I do not hold it all in memory. Is that possible? Is this 
a general problem, or is my usage just wrong?

Thanks,
Sandeep
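
One way to keep memory bounded when documents are this large is to cap each bulk request by total payload size rather than by document count, and to send one batch at a time. The sketch below shows only the batching logic, assuming document sizes are known up front; the class name and the 50 MB budget are hypothetical, and the actual bulk call is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeBoundedBatcher {

    // Splits document sizes (in bytes) into consecutive batches whose total
    // size stays within maxBatchBytes. A single document larger than the
    // budget is placed in a batch of its own.
    static List<List<Long>> batchBySize(List<Long> docSizes, long maxBatchBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : docSizes) {
            // Close the current batch before it would exceed the budget.
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> sizes = new ArrayList<>();
        sizes.add(20 * mb);
        sizes.add(15 * mb);
        sizes.add(20 * mb);
        sizes.add(10 * mb);
        System.out.println(batchBySize(sizes, 50 * mb).size() + " batches");
    }
}
```

Combined with limiting how many batches are in flight at once, this keeps the worst-case memory held by client and server roughly proportional to the chosen budget rather than to the corpus size.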



How to recover index if replicas are set to zero?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have an index with 10 shards but no replicas on a 2 node cluster, and 
one node goes down, can I somehow recover its 5 shards on the only running 
node, assuming that I back up the data directory on that node and make it 
available to the other node? Is there an API or an automatic way to achieve 
this?

Thanks,
Sandeep



Re: Shard Aware Routing of Query

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

I have been trying to examine the QueryParserContext. However, I am only 
able to locate the Index name in this object; there is no reference to any 
shard-level information.

I understand you to be saying that the shard decision has already been made 
by then, so that information is not available to the QueryParser (possibly 
because there is no need to state it again). Is that by design?

Thanks,
Sandeep


On Tuesday, 15 July 2014 17:01:56 UTC+5:30, Jörg Prante wrote:
>
> Filters are always parsed as part of a query on shard level. If you 
> examine QueryParserContext from within executing FilterParser, the decision 
> of which shard to execute on has already been made.
>
> Jörg
>
>
> On Tue, Jul 15, 2014 at 1:09 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> Thanks, I will take a look at the SearchRequestBuilder class.
>> However, it does seem that this makes routing a decision for the user at 
>> Query API invocation time, by setting the appropriate values in the SRB.
>>
>> However, I want the custom FilterParser that I added as a processor in 
>> the IndexQueryParserModule plugin to be aware of the shard on which it will 
>> execute. This is because then I can set filter values for only the 
>> documents that exist on that shard. I checked the QueryParserContext, and 
>> there is no information in that regard.
>>
>> If I use the SRB at client side, and specify the shards and the filters 
>> for those shards, then I will have to aggregate the results myself which is 
>> not preferable.
>>
>> Can you please give me some example of how this can be achieved? 
>>
>>
>> Thanks,
>> Sandeep
>>
>>
>> On Tuesday, 15 July 2014 15:18:47 UTC+5:30, Jörg Prante wrote:
>>
>>> You can create single shard index, or you can use routing to select 
>>> shards.
>>>
>>> See SearchRequestBuilder for setRouting() 
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Jul 15, 2014 at 10:25 AM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a large-ish data set that could grow beyond 100M documents. I 
>>>> have queries to be executed against this index. I would like the filter 
>>>> data local to a shard to be sent only to that shard, so that I spend 
>>>> less time creating a filter and even less time matching it on a shard. 
>>>> If I do not do this, I will have to create a filter containing data for 
>>>> all 100M documents across all shards, and every shard will have to 
>>>> match its documents against filter entries for documents that do not 
>>>> even belong to that shard.
>>>>
>>>> I plan to write a query filter using the IndexQueryParserModule plugin.
>>>>
>>>> However, in the QueryParserContext, I can only see the Index object 
>>>> which contains some details of the index, like the name, etc. I could not 
>>>> see any other details like the specific shard where this query will be 
>>>> executed. 
>>>>
>>>> Is there a way to write shard aware query and filter parsers?
>>>>
>>>> If not, can I create as many indices as I want to create shards (since 
>>>> I already get the index name), and effectively create one shard per index 
>>>> (+1 for replica) and treat every index as if it were a shard? Is that too 
>>>> heavy or just non-compliant to the philosophy of ES? 
>>>>
>>>> Please let me know,
>>>>
>>>> Thanks,
>>>> Sandeep
>>>>

Re: Elastic search dynamic number of replicas from Java API

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

Can you please give a server-side or client-side example of using 
ClusterStateListener?
Do I have to use a plugin? If so, which module do I register or override?
If not, do I have to use a Node Client (not a TransportClient), 
retrieve the ClusterService somehow, and then register the listener?

Thanks
Sandeep

On Thursday, 10 July 2014 22:25:51 UTC+5:30, Jörg Prante wrote:
>
> On the client side, you can't use cluster state listener, it is for nodes 
> that have access to a local copy of the master cluster state. Clients must 
> execute an action to ask for cluster state, and with the current transport 
> request/response cycle, they must poll for new events ...
>
> Jörg
>
>
> On Thu, Jul 10, 2014 at 6:38 PM, Ivan Brusic wrote:
>
>> Jörg, have you actually implemented your own ClusterStateListener? I 
>> never had much success. I tried using that interface and even 
>> PublishClusterStateAction.NewClusterStateListener, but either I could 
>> not successfully configure the module (the former) or I received no events 
>> (the latter). This was implemented on the client side, not as a plugin.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Wed, Jul 9, 2014 at 4:21 PM, joerg...@gmail.com wrote:
>>
>>>
>>> 4. Yes. Use org.elasticsearch.cluster.ClusterStateListener
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/35f6b64e-3787-4891-a3a8-518dfd7638e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How do I know if I need replica shards?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Just want to add that my intention is not high availability or failover. It 
is more about how performance would improve in such a scenario.

I believe you would consider this only if you have more nodes or your current 
nodes are underutilized. But how do you determine that?

Thanks
Sandeep

On Friday, 22 August 2014 12:46:16 UTC+5:30, Sandeep Ramesh Khanzode wrote:
>
> Hi,
>
> I have set up a 3 node cluster and deployed one index with 20 shards.
>
> How can I determine that my current setup is inadequate, and that I need to 
> add one or two replica shards per primary shard?
>
> Even if I add another data node, I may get 5 primary shards on each, now 
> if that distributes load evenly, what is the need for replica shards?
>
> IMPORTANT: Please note that my query will not be specific to a group of 
> shards, I mean, there is no way, I can route or classify my query as only 
> hitting a subsection of shards. It will actually go to all shards for every 
> search query.
>
> Appreciate your response.
>
> Thanks,
> Sandeep
>


How do I know if I need replica shards?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have set up a 3 node cluster and deployed one index with 20 shards.

How can I determine that my current setup is inadequate, and that I need to 
add one or two replica shards per primary shard?

Even if I add another data node, I may get 5 primary shards on each, now if 
that distributes load evenly, what is the need for replica shards?

IMPORTANT: Please note that my query will not be specific to a group of 
shards, I mean, there is no way, I can route or classify my query as only 
hitting a subsection of shards. It will actually go to all shards for every 
search query.

Appreciate your response.

Thanks,
Sandeep
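
One rough way to reason about the question above is to count shard copies and 
shard-level searches per node. The sketch below is plain arithmetic, not an 
Elasticsearch API: the class and method names are hypothetical, and it assumes 
that shard copies and search work are spread evenly across nodes and that each 
query searches exactly one copy of every shard.

```java
/** Back-of-the-envelope shard arithmetic; hypothetical helper, not an Elasticsearch API. */
final class ReplicaMath {

    /** Total shard copies in the index: every primary plus its replicas. */
    static int totalCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    /** Most copies any single node stores, assuming copies are spread evenly. */
    static int maxCopiesPerNode(int primaries, int replicasPerPrimary, int nodes) {
        int total = totalCopies(primaries, replicasPerPrimary);
        return (total + nodes - 1) / nodes; // ceiling division
    }

    /**
     * Most shard-level searches one node runs for ONE query that hits all shards,
     * assuming the work is spread evenly. Each shard is searched once per query,
     * on a single copy, so the result does not depend on the replica count.
     */
    static int maxShardSearchesPerNodePerQuery(int primaries, int nodes) {
        return (primaries + nodes - 1) / nodes; // ceiling division
    }

    public static void main(String[] args) {
        // The setup from the message: 20 primaries on 3 nodes, then on 4 nodes.
        System.out.println("copies/node, 0 replicas, 3 nodes: " + maxCopiesPerNode(20, 0, 3));
        System.out.println("copies/node, 1 replica,  3 nodes: " + maxCopiesPerNode(20, 1, 3));
        System.out.println("searches/node/query, 3 nodes: " + maxShardSearchesPerNodePerQuery(20, 3));
        System.out.println("searches/node/query, 4 nodes: " + maxShardSearchesPerNodePerQuery(20, 4));
    }
}
```

Under these assumptions a replica does not reduce the work of a single query 
on the same nodes; it only gives concurrent queries more copies to choose 
from, which can help when the nodes have spare capacity or when nodes are added.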


Call when shard reallocation occurs

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Is it possible to have my custom callback function defined and invoked by 
ES whenever it moves a shard from one node to another? 

Thanks,
Sandeep


Multiple Indices vs Multiple Shards

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

What is the difference in performance or additional load on ElasticSearch 
if I define one index with 50 shards versus 50 indices with one shard each?

I mean, technically, there are blogs that suggest that these are 
equivalent, of course leaving aside shard rebalancing, replicas, failover, etc.

But, performance-wise, if I am to do 50 indices with one shard, and 
aggregate results from all 50, will I see a degradation in performance 
compared to 1 index with 50 shards?

Thanks, 
Sandeep


Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
I see what you mean.

Once the shards have been allocated by this formula or the policies, is it 
possible that they will ever change again?

I mean, if we have 3 indices on a two node cluster with 10 shards each, and 
I add another index with 10 shards, will the EXISTING indices' shards be 
reallocated? Or is reallocation only for new shards?


On Thursday, 21 August 2014 13:49:54 UTC+5:30, Jörg Prante wrote:
>
> There is disk-based allocation. It does not take shard volume into 
> account, only the remaining disk space. It is not always a good idea to 
> use total shard volume per node as a measurement across indices; consider 
> heavy bulk indexing with steep volume changes.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html#disk
>
> If you mean search load by "heavily loaded", I suggest to just add 
> replica, maybe auto expand replica, so each node holds every shard as a 
> copy, for best load balancing.
>
> Jörg
>
>
>
>
> On Thu, Aug 21, 2014 at 9:57 AM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi Jorg, 
>>
>> Thanks. Is there size-based allocation? It seems that allocation is based 
>> on the number of primaries, per index, per node, etc. Is there a size 
>> factor that comes into play if, say, the routing function is not even and 
>> shards on one node are more heavily loaded than on another?
>>
>> Thanks,
>> Sandeep
>>
>>
>> On Thursday, 21 August 2014 12:44:56 UTC+5:30, Jörg Prante wrote:
>>
>>> There is a formula ES uses by default to find if nodes get unbalanced 
>>> regarding the shards. See
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#_balanced_shards
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Aug 21, 2014 at 8:26 AM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch  wrote:
>>>
>>>> Hi,
>>>>
>>>> What can be the possible causes when ElasticSearch will automatically 
>>>> reallocate a shard from node in the cluster to another node?
>>>>
>>>> One can be obviously when you add a new node.
>>>>
>>>> What are the automatic triggers, like continuously indexing new data or 
>>>> something? What is the policy for this?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>>


Re: When does ElasticSearch reallocate shards between nodes?

2014-08-21 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg, 

Thanks. Is there size-based allocation? It seems that allocation is based 
on the number of primaries, per index, per node, etc. Is there a size 
factor that comes into play if, say, the routing function is not even and 
shards on one node are more heavily loaded than on another?

Thanks,
Sandeep

On Thursday, 21 August 2014 12:44:56 UTC+5:30, Jörg Prante wrote:
>
> There is a formula ES uses by default to find if nodes get unbalanced 
> regarding the shards. See
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#_balanced_shards
>
> Jörg
>
>
> On Thu, Aug 21, 2014 at 8:26 AM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> What can be the possible causes when ElasticSearch will automatically 
>> reallocate a shard from node in the cluster to another node?
>>
>> One can be obviously when you add a new node.
>>
>> What are the automatic triggers, like continuously indexing new data or 
>> something? What is the policy for this?
>>
>> Thanks,
>> Sandeep
>>


When does ElasticSearch reallocate shards between nodes?

2014-08-20 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

What are the possible causes for ElasticSearch to automatically 
reallocate a shard from one node in the cluster to another node?

One can be obviously when you add a new node.

What are the automatic triggers, like continuously indexing new data or 
something? What is the policy for this?

Thanks,
Sandeep


Route documents at index time to a particular shard

2014-08-20 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Can you please tell me if there is a plugin that I can use during indexing 
which will let me direct a document to a particular shard? So that I can 
set the shardId and send the document as the request to that shard?

Thanks,
Sandeep
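
For context on the question above: Elasticsearch does not take a shard id 
directly at index time; it derives the target shard from a hash of the routing 
value modulo the number of primary shards. The sketch below only illustrates 
that idea. The class and method names are hypothetical, and String.hashCode() 
is a stand-in assumption, not the hash function Elasticsearch actually uses, so 
the shard numbers it prints will not match a real cluster.

```java
/** Illustrative routing arithmetic; hypothetical helper, not an Elasticsearch API. */
final class RoutingSketch {

    /**
     * Maps a routing value to a shard number in [0, numPrimaryShards).
     * ASSUMPTION: String.hashCode() stands in for the real routing hash.
     */
    static int shardFor(String routing, int numPrimaryShards) {
        if (numPrimaryShards <= 0) {
            throw new IllegalArgumentException("numPrimaryShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (String routing : new String[] {"Finance", "IT", "Legal"}) {
            System.out.println(routing + " -> shard " + shardFor(routing, shards));
        }
    }
}
```

The point of the sketch is that the same routing value always lands on the 
same shard, so documents can be grouped by choosing routing values, even 
though the shard number itself is not chosen by the client.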


Hooks for knowing when topology has changed

2014-08-15 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
How can I know when a shard has moved or a new node has been added in 
Elasticsearch? Is there any plugin or hook in Java to do so?


Get Shard Info From Cluster/Nodes/Index

2014-08-13 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Can I, using the Cluster API or some other Java approach, find which shards 
are allocated to which node for a given index (cluster -> node -> index)?

I would like to check which shards are deployed to a physical node, and 
query only that shard to find what was indexed on that data.

I would be using a _routing value, and using this query, I want to check 
which routing value went to which shard.

Thanks,
Sandeep


Moving Index/Shards from One Node to Another

2014-08-13 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Let's say I have a 3 node cluster and I deploy one index SPECIFIC to every 
data node in the cluster. 
So, index1 goes to node1, index2 goes to node2, etc. using the 
routing.allocation settings based on node.zone etc. config properties. 
There may be 5-6 shards per index, but no replicas. All three indices, 
index 1/2/3 will have the same mapping schemas.

Now, if the following scenarios occur:

1.] One node, node2, goes down:
How can I get the node 2's index, index2, live on the other two data nodes?
Can I just copy the data directory to the other nodes? Since there is no 
mapping like index2 defined on those nodes, will I have to first create the 
mapping there? 
Can I move half the shards to each remaining node?

2.] Assume one more node is now added to this cluster:
Can I copy the mapping schema to the new node and selectively copy 1-2 
shards each from the existing 3 data nodes so that I can rebalance the 
cluster 3-4 shards per index per node?

I am not sure whether this level of control exists or how it is exposed. 
Please let me know.

Thanks,
Sandeep


TermsLookupFilter Caching

2014-08-12 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Does the TermsLookupFilter cache results in a bitmap/bitset? Or does it 
cache the results of the filter completely without using bits for document 
identifiers? 

Thanks,
Sandeep


BitSet Filters in ES/Lucene

2014-08-12 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,
I have looked at TermsLookupFilter and it is a good approach to cache 
frequently used filters. However, even if I write a custom filter plugin, I 
cannot use a BitSet to hold any sort of document identifier. Even the _uid 
field is converted into a TermFilter.

Assume a scenario where I need to tag millions of documents with a tag like 
"Finance", "IT", "Legal", etc.

Unless I can cache these filters in memory, the cost of constructing this 
filter at run time per query is not practical. If I could map the documents 
to a numeric long identifier and put them in a BitSet, I could then cache 
them because the size reduces drastically. However, I cannot use this 
numeric long identifier in ES/Lucene filters, either with a custom filter 
plugin or with the Terms Lookup Filter. Is there any way to do this?

I read about possible solutions in ES and found this link: 
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/. 

Please help with this scenario. Thanks,
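
To make the size argument above concrete, here is a minimal sketch using 
java.util.BitSet. It is not Lucene or Elasticsearch code: the class and method 
names are hypothetical, and it assumes every document has a stable, dense 
numeric ordinal. That assumption does not hold for Lucene's internal document 
ids, which are per segment and can change when segments merge.

```java
import java.util.BitSet;

/** Illustrative tag bit set; hypothetical helper, not a Lucene/Elasticsearch API. */
final class TagBitSetSketch {

    /** One bit per document ordinal; a set bit means "this document carries the tag". */
    static BitSet tagged(int maxDoc, int[] taggedOrdinals) {
        BitSet bits = new BitSet(maxDoc);
        for (int ordinal : taggedOrdinals) {
            bits.set(ordinal);
        }
        return bits;
    }

    /** Approximate payload of a bit set covering maxDoc documents (one bit each). */
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Approximate payload of storing each tagged document as a raw 8-byte long id. */
    static long longListBytes(long taggedCount) {
        return taggedCount * Long.BYTES;
    }

    public static void main(String[] args) {
        BitSet finance = tagged(10, new int[] {1, 3, 7});
        System.out.println("doc 3 tagged: " + finance.get(3));
        System.out.println("doc 4 tagged: " + finance.get(4));
        // Example sizes (assumed figures): 100M documents, 10M of them tagged.
        System.out.println("bit set bytes:   " + bitSetBytes(100_000_000L));
        System.out.println("long list bytes: " + longListBytes(10_000_000L));
    }
}
```

With these assumed figures the bit set needs about 12.5 MB regardless of how 
many documents carry the tag, while a list of 10 million long ids needs about 
80 MB before any object overhead.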


Aggregate results over multiple indices

2014-08-06 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have three different indices with the same schema mapping for a type, 
can I use the SearchRequestBuilder (or any other class) to simultaneously 
query all three indices and have ElasticSearch perform aggregations/sorts 
on the results from all three?

Thanks,
Sandeep
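
Conceptually, a search over several indices is reduced the same way as a 
search over several shards: each shard returns partial results, and those are 
merged into one response. The sketch below illustrates only that merge step 
for term-style bucket counts. It is not Elasticsearch code; the class name, 
method name, and sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative merge of per-index bucket counts; not an Elasticsearch API. */
final class BucketMergeSketch {

    /** Sums the document counts of equal bucket keys across all partial results. */
    static Map<String, Long> merge(List<Map<String, Long>> perIndexCounts) {
        Map<String, Long> merged = new TreeMap<>(); // sorted keys for stable output
        for (Map<String, Long> counts : perIndexCounts) {
            for (Map.Entry<String, Long> bucket : counts.entrySet()) {
                merged.merge(bucket.getKey(), bucket.getValue(), Long::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical partial results from three indices with the same mapping.
        Map<String, Long> index1 = Map.of("IT", 2L, "Legal", 1L);
        Map<String, Long> index2 = Map.of("IT", 3L);
        Map<String, Long> index3 = Map.of("Finance", 4L);
        System.out.println(merge(List.of(index1, index2, index3)));
    }
}
```

Because the merge only needs bucket keys and counts, it works the same 
whether the partial results come from three indices or from three shards of 
one index, provided the field is mapped identically everywhere.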


Move specific shard to a different index

2014-08-05 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

For the below scenario:
Assume that I am allocating identical indices (with different names) 
to different ElasticSearch nodes. Every index can have multiple shards.

At some time, I add another node to the existing cluster. Now, I use the 
index template to create the same mapping schema on the new node. 

Will ElasticSearch help to rebalance the shards to the new index on the new 
node? I guess not since it is not part of the Cluster metadata and I will 
have specified the node in the index settings. 

Or can I programmatically move a specific shard to the new index? Is there 
a provision to do that? 

Please let me know.

Thanks,
Sandeep


Re: Route query so that data for a shard is localized

2014-08-05 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jörg,

Thanks, really appreciate the response and the link. I will do a small PoC 
with the approach given therein.

Since we are pulling data from an index, I am assuming we will be limited 
by disk speed the first time.

In the cache, if the data for the field that is cached has some updates 
(like a new value being added in the multi-valued field or removed), will 
the purge and re-cache automatically happen?

I also think that I will need to enable the _source field for updates to 
work?

Is there any value in configuring these fields as doc_values in this 
case? I read that doc_values cannot be used for filtering purposes, though. 
Please confirm.

Please let me know your comments. Thanks again,

Thanks,
Sandeep


On Monday, 4 August 2014 00:47:12 UTC+5:30, Jörg Prante wrote:
>
> Have you consulted the docs
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism
>
> about the optimizations of term lookup for TermFilter?
>
> There are caches in use, and for term lookup, you can also use routing to 
> select a particular shard.
>
> Regarding the "tree-like data mapping": ES rolls the tree notation into a 
> flat format to make use of the Lucene API for fields in documents. There is 
> no performance implication with this. If you decide to use an extraordinary 
> high amount of fields (>>1000), you will notice each field consumes a bit 
> of RAM, but this is not related to a "tree-like data mapping".
>
> Jörg
>
>
>
> On Sun, Aug 3, 2014 at 8:37 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> I have fairly large data and a ES cluster. Can I use some shard knowledge 
>> to execute queries so that only data relevant to a particular shard is 
>> fetched for that shard/node? I want to make sure that if I have a filter, 
>> then the values in the TermFilter only hold records that are relevant to 
>> the shard it will act upon. Is this a known problem? If so, how is it 
>> solved?
>>
>> Is there any performance implication in using the tree-like data mapping 
>> in ES? I am evaluating it now, and I wanted to know if it is feasible to 
>> maintain a treelike structure in ES, or just split it into multiple records 
>> or multiple indices?
>>
>> Thanks, 
>> Sandeep
>>


Shard rebalancing

2014-08-03 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
What is the behavior of ES when it comes to shard sizes? Does it do 
automatic shard rebalancing at any point of time? If so, is it also 
controlled through an API? 

How can I know if the shards are changing in the background? If I do not 
add any new node or change any cluster configuration once indexing has 
started, is there any pattern to this behavior? Please let me know.

Thanks,
Sandeep


Route query so that data for a shard is localized

2014-08-03 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have fairly large data and a ES cluster. Can I use some shard knowledge 
to execute queries so that only data relevant to a particular shard is 
fetched for that shard/node? I want to make sure that if I have a filter, 
then the values in the TermFilter only hold records that are relevant to 
the shard it will act upon. Is this a known problem? If so, how is it 
solved?

Is there any performance implication in using the tree-like data mapping in 
ES? I am evaluating it now, and I wanted to know if it is feasible to 
maintain a treelike structure in ES, or just split it into multiple records 
or multiple indices?

Thanks, 
Sandeep


Update a field if _source is disabled

2014-07-29 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, 

I read here 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html) 
that the _source field needs to be enabled for the Update API to work.

Does it mean that from Java or REST API, I cannot update any field defined 
in the type mapping unless the _source is enabled? 

Can I just use the stored:true on that field and update it? If so, can you 
please show an example?

Thanks,
Sandeep


Localized data with Shard Knowledge

2014-07-18 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a query about using ElasticSearch.

I have part of my data (about 90%) to be indexed in ElasticSearch and the 
remaining 10% to be stored elsewhere (like a DB/NoSQL DB etc.) since that 
10% data will be very frequently changing data.

I have the following queries:

1.] Should I be keeping the remaining 10% of my data on non-ES/Lucene 
persistent store? Is it okay to store/index those fields in ES/Lucene even 
if they are frequently updating? With NRT readers, will it be efficient at 
reflecting changes and perform at par when I have to search for those 
changes in different threads?  

2.] Is there any concept of localization in ES? Can I segregate my data 
based on some logical partition and apply filters and search only on that 
subsection? Not sure how shard routing works, but if I route search 
requests to multiple shards using SearchRequestBuilder, how do I aggregate 
the results from multiple shards? Probably this is not the solution I am 
looking for. My use case is to not worry too much about how ES is 
organizing my data, but I still want to set specific filters for data that 
will exist on a corresponding shard. If this is confusing, please let me 
know, and I will rephrase. 

Thanks,
Sandeep


Re: Shard Aware Routing of Query

2014-07-15 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Thanks, I will take a look at the SearchRequestBuilder class.
However, that seems to be a decision the user makes when invoking the Query 
API, by setting the appropriate routing values in the SRB.

However, I want the custom FilterParser that I added as a processor in the 
IndexQueryParserModule plugin to be aware of the shard on which it will 
execute. This is because then I can set filter values for only the 
documents that exist on that shard. I checked the QueryParserContext, and 
there is no information in that regard.

If I use the SRB at client side, and specify the shards and the filters for 
those shards, then I will have to aggregate the results myself which is not 
preferable.

Can you please give me some example of how this can be achieved? 


Thanks,
Sandeep


On Tuesday, 15 July 2014 15:18:47 UTC+5:30, Jörg Prante wrote:
>
> You can create single shard index, or you can use routing to select shards.
>
> See SearchRequestBuilder for setRouting() 
>
> Jörg
>
>
> On Tue, Jul 15, 2014 at 10:25 AM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> I have a large-ish data set that could grow beyond 100M documents. I have 
>> queries to be executed against this index. I would like the query filter 
>> data that is local to a shard to be sent only to that shard, so that I 
>> spend less time creating a filter and even less time matching it on each 
>> shard. If I do not do this, I will have to create a filter that contains 
>> data for all 100M documents across all shards, and every shard will have 
>> to match against filter entries for documents that do not even belong to 
>> that shard.
>>
>> I plan to write a query filter using the IndexQueryParserModule plugin.
>>
>> However, in the QueryParseContext, I can only see the Index object, which 
>> contains some details of the index, like the name, etc. I could not see any 
>> other details like the specific shard where this query will be executed. 
>>
>> Is there a way to write shard aware query and filter parsers?
>>
>> If not, can I create as many indices as I would otherwise create shards 
>> (since I already get the index name), effectively creating one shard per 
>> index (+1 for replica), and treat every index as if it were a shard? Is 
>> that too heavy, or just contrary to the philosophy of ES? 
>>
>> Please let me know,
>>
>> Thanks,
>> Sandeep
>>
>
>
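
As a rough illustration of the routing idea discussed in this thread: if 
documents are indexed with a routing value derived from a logical partition, 
a client could split the filter values by the same partition and send each 
routed search only the values that belong there. The sketch below is plain 
Python and does not call Elasticsearch. `NUM_PARTITIONS`, `partition_of`, 
`split_filter_values` and `routed_requests` are hypothetical names, and the 
CRC32 hash is a stand-in chosen for the example, not the hash Elasticsearch 
uses to map routing values to shards. Note that this is the client-side 
variant, so the per-partition results would still have to be merged by the 
caller.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of logical partitions


def partition_of(doc_id, num_partitions=NUM_PARTITIONS):
    """Map a document id to a logical partition (stand-in hash, not ES's own)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def split_filter_values(doc_ids, num_partitions=NUM_PARTITIONS):
    """Group filter ids by partition so each request carries only its own ids."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[partition_of(doc_id, num_partitions)].append(doc_id)
    return dict(groups)


def routed_requests(doc_ids, num_partitions=NUM_PARTITIONS):
    """Build one routing value plus search body per non-empty partition."""
    requests = []
    groups = split_filter_values(doc_ids, num_partitions)
    for partition, ids in sorted(groups.items()):
        # ES 1.x style "filtered" query with an ids filter (assumed DSL shape).
        body = {
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {"ids": {"values": ids}},
                }
            }
        }
        requests.append({"routing": str(partition), "body": body})
    return requests


if __name__ == "__main__":
    for request in routed_requests(["doc-%d" % i for i in range(10)]):
        ids = request["body"]["query"]["filtered"]["filter"]["ids"]["values"]
        print(request["routing"], len(ids))
```

The same partition function would have to be used at index time for the 
routing value, otherwise the ids sent to a partition would not match the 
documents stored there.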



ES Plugin to Functionality Documentation

2014-07-15 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi All,

Is there documentation that describes which ES modules can be extended by 
writing plugins, and what functionality each of them lets you achieve or 
override? I am new to ES, and it would be helpful not to reinvent the wheel 
or give up on a particular piece of functionality because we do not know 
where to look. 

Thanks,
Sandeep



Shard Aware Routing of Query

2014-07-15 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a large-ish data set that could grow beyond 100M documents. I have 
queries to be executed against this index. I would like the query filter 
data that is local to a shard to be sent only to that shard, so that I 
spend less time creating a filter and even less time matching it on each 
shard. If I do not do this, I will have to create a filter that contains 
data for all 100M documents across all shards, and every shard will have to 
match against filter entries for documents that do not even belong to that 
shard.

I plan to write a query filter using the IndexQueryParserModule plugin.

However, in the QueryParseContext, I can only see the Index object, which 
contains some details of the index, like the name, etc. I could not see any 
other details like the specific shard where this query will be executed. 

Is there a way to write shard aware query and filter parsers?

If not, can I create as many indices as I would otherwise create shards 
(since I already get the index name), effectively creating one shard per 
index (+1 for replica), and treat every index as if it were a shard? Is that 
too heavy, or just contrary to the philosophy of ES? 

Please let me know,

Thanks,
Sandeep



Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-08 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
 
>> https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/IndexQueryParserModule$20Plugin/elasticsearch/5Gqxx3UvN2s/FL4Lb2RxQt0J
>> 2. https://groups.google.com/forum/#!topic/elasticsearch/1jiHl4kngJo
>> 3. https://github.com/elasticsearch/elasticsearch/issues/208
>> 4. 
>> http://elasticsearch-users.115913.n3.nabble.com/custom-filter-handler-plugin-td4051973.html
>>
>> Thanks,
>> Sandeep
>>
>> On Mon, Jul 7, 2014 at 2:17 AM, joerg...@gmail.com wrote:
>>
>>> Thanks for being so patient with me :)
>>>
>>> I understand now the following: there are 50m of documents in an 
>>> external DB, from which up to 1m is to be exported in form of document 
>>> identifiers to work as a filter in ES. The idea is to use internal 
>>> mechanisms like bit sets. There is no API for manipulating filters in ES on 
>>> that level, ES receives the terms and passes them into Lucene TermFilter 
>>> class according to the type of the filter.
>>>
>>> What is a bit unclear to me: how is the filter set constructed? I assume 
>>> it should be a select statement on the database?
>>>
>>> Next, if you have this large set of document identifiers selected, I do 
>>> not understand what is the base query you want to apply the filter on? Is 
>>> there a user given query for ES? What does such a query look like? Is it 
>>> assumed there are other documents in ES that are related somehow to the 50m 
>>> documents? An illustrative example of the steps in the scenario would 
>>> really help to understand the data model.
>>>
>>> Just some food for thought: it is close to impossible to filter in ES on 
>>> 1m unique terms with a single step - the default setting of maximum clauses 
>>> in a Lucene Query is for good reason limited to 1024 terms. A workaround 
>>> would be iterating over 1m terms and execute 1000 filter queries and add up 
>>> the results. This takes a long time and may not be the desired solution. 
>>>
>>> Fortunately, in most situations, it is possible to find more concise 
>>> grouping to reduce the 1m document identifiers into fewer ones for more 
>>> efficient filtering.
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Sun, Jul 6, 2014 at 9:39 PM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch wrote:
>>>
>>>> Hi,
>>>>
>>>> Appreciate your continued assistance. :) Thanks,
>>>>
>>>> Disclaimer: I am yet to sufficiently understand ES sources so as to 
>>>> depict my scenario completely. Some info' below may be conjecture.
>>>>
>>>> I would have a corpus of 50M docs (actually a lot more, but for testing 
>>>> now) out of which I would have, say, up to 1M docIds to be used as a 
>>>> filter. This set of 1M docs can be different for different use cases, 
>>>> the point being that up to 1M docIds can form one logical set of 
>>>> documents for filtering results. If I use a simple IdsFilter from the ES 
>>>> Java API, I would have to keep adding these 1M docs to the List 
>>>> implementation internally, and I have a feeling it may not scale very 
>>>> well, as they may change per use case and also per combination within a 
>>>> single use case.
>>>>
>>>> As I debug the code, the IdsFilter will be converted to a Lucene 
>>>> filter. Lucene filters, on the other hand, operate on a docId bitset type. 
>>>> That gels very well with my requirement, since I can scale with BitSets (I 
>>>> assume).
>>>>
>>>> If I can find a way to plug this BitSet directly as a Lucene Filter into 
>>>> the Lucene search() call, bypassing the ES filters, maybe using some sort 
>>>> of plugin, I believe that may support my cause. I assume I may not get 
>>>> to use the Filter cache from ES, but I can probably cache these BitSets 
>>>> for subsequent use. 
>>>>
>>>> Please let me know. And thanks!
>>>>
>>>> Thanks,
>>>> Sandeep
>>>>
>>>>
>>>> On Saturday, 5 July 2014 01:40:55 UTC+5:30, Jörg Prante wrote:
>>>>
>>>>> What I understand is a TermsFilter is required
>>>>>
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html
>>>>>
>>>>
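
The workaround Jörg describes in this message, iterating over the 1m terms 
in batches that stay within the default limit of 1024 clauses and adding up 
the results, can be sketched roughly as follows. This is plain Python with 
no Elasticsearch client. `batches`, `filter_in_batches` and the `search` 
callable are hypothetical names introduced for the illustration, and 
`search` stands in for whatever call actually runs one terms-filter query 
and returns the matching document ids.

```python
MAX_CLAUSES = 1024  # default Lucene clause limit mentioned in the thread


def batches(terms, size=MAX_CLAUSES):
    """Yield consecutive slices of `terms`, each at most `size` long."""
    for start in range(0, len(terms), size):
        yield terms[start:start + size]


def filter_in_batches(terms, search, size=MAX_CLAUSES):
    """Run one filter query per batch and union the matching ids.

    `search` is a hypothetical callable: it takes a list of terms and
    returns an iterable of matching document ids.
    """
    hits = set()
    for batch in batches(terms, size):
        hits.update(search(batch))
    return hits


if __name__ == "__main__":
    # Fake "index" so the sketch runs without a cluster: even ids only.
    indexed = {str(i) for i in range(0, 5000, 2)}

    def fake_search(batch):
        return [term for term in batch if term in indexed]

    wanted = [str(i) for i in range(3000)]
    print(len(filter_in_batches(wanted, fake_search)))
```

With 1m terms and batches of 1024 this comes to 977 round trips, which is 
why the message above calls it slow and suggests finding a coarser grouping 
instead.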

Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-06 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Appreciate your continued assistance. :) Thanks,

Disclaimer: I am yet to sufficiently understand ES sources so as to depict 
my scenario completely. Some info' below may be conjecture.

I would have a corpus of 50M docs (actually a lot more, but for testing now) 
out of which I would have, say, up to 1M docIds to be used as a filter. This 
set of 1M docs can be different for different use cases, the point being 
that up to 1M docIds can form one logical set of documents for filtering 
results. If I use a simple IdsFilter from the ES Java API, I would have to 
keep adding these 1M docs to the List implementation internally, and I have 
a feeling it may not scale very well, as they may change per use case and 
also per combination within a single use case.

As I debug the code, the IdsFilter will be converted to a Lucene filter. 
Lucene filters, on the other hand, operate on a docId bitset type. That 
gels very well with my requirement, since I can scale with BitSets (I 
assume).

If I can find a way to plug this BitSet directly as a Lucene Filter into the 
Lucene search() call, bypassing the ES filters, maybe using some sort of 
plugin, I believe that may support my cause. I assume I may not get to use 
the Filter cache from ES, but I can probably cache these BitSets for 
subsequent use. 

Please let me know. And thanks!

Thanks,
Sandeep


On Saturday, 5 July 2014 01:40:55 UTC+5:30, Jörg Prante wrote:
>
> What I understand is a TermsFilter is required
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html
>
> and the source of the terms is a DB. That is no problem. The plan is: 
> fetch the terms from the DB, build the query (either Java API or JSON) and 
> execute it.
>
> What I don't understand is the part with the "quick mapping", Lucene, and 
> the doc ids. Lucene doc IDs are not reliable and are not exposed by 
> Elasticsearch. Elasticsearch uses its own document identifiers, which are 
> stable and augmented with info about the index type they belong to, in 
> order to make them unique. But I do not understand why this is important in 
> this context.
>
> Elasticsearch API uses query builders and filter builders to build search 
> requests. A "quick mapping" is just fetching the terms from the DB as a 
> string array before this API is called.
>
> I also do not understand the role of the number "1M", is this the number 
> of fields, or the number of terms? Is it a total number or a number per 
> query?
>
> Did I misunderstand anything more? I am not really sure what is the 
> challenge...
>
> Jörg
>
>
>
> On Fri, Jul 4, 2014 at 8:55 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> Just to give some background. I will have a large-ish corpus of more than 
>> 100M documents indexed. The filters that I want to apply will be on a field 
>> that is not indexed. I mean, I prefer to not have them indexed in ES/Lucene 
>> since they will be frequently changing. So, for that, I will be maintaining 
>> them elsewhere, like a DB etc.
>>
>> Every time I have a query, I would want to filter the results by those 
>> fields that are not indexed in Lucene. And I am guessing that number may 
>> well be more than 1M. In that case, I think, since we will maintain some 
>> sort of TermsFilter, it may not scale linearly. What I would want to do, 
>> preferably, is to have a hook inside the ES query, so that I can, at query 
>> time, inject the required filter values. Since the filter values have to be 
>> recognized by Lucene, and I will not be indexing them, I will need to do 
>> some quick mapping to get those fields and map them quickly to some field 
>> in Lucene that I can save in the filter. I am not sure whether we can 
>> access and set Lucene DocIDs in the filter or whether they are even exposed 
>> in ES.
>>
>> Please assist with this query. Thanks,
>>
>> Thanks,
>> Sandeep
>>
>>
>> On Thursday, 3 July 2014 21:33:45 UTC+5:30, Jörg Prante wrote:
>>
>>> Maybe I do not fully understand, but in a client, you can fetch the 
>>> required filter terms from any external source before a JSON query is 
>>> constructed?
>>>
>>> Can you give an example what you want to achieve?
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Jul 3, 2014 at 3:34 PM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch  wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to ES and I have the following requirement:
>>>> I need to specify a list of strings as a filter that applies to a 
>>
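
To make the BitSet idea from the top of this message a little more concrete, 
here is a minimal sketch of a fixed-size bit set used as a membership filter. 
It is plain Python, not Lucene or Elasticsearch code. As Jörg notes in the 
quoted reply, Lucene doc IDs are not reliable and are not exposed by 
Elasticsearch, so the sketch assumes the application maintains its own dense 
integer ids in the range [0, size). `DocIdBitSet` is a hypothetical name.

```python
class DocIdBitSet:
    """Fixed-size bit set over application-level integer ids in [0, size)."""

    def __init__(self, size):
        self.size = size
        # One bit per id, rounded up to whole bytes.
        self.bits = bytearray((size + 7) // 8)

    def set(self, doc_id):
        """Mark `doc_id` as part of the filter."""
        if not 0 <= doc_id < self.size:
            raise ValueError("doc_id out of range: %d" % doc_id)
        self.bits[doc_id >> 3] |= 1 << (doc_id & 7)

    def __contains__(self, doc_id):
        """Return True if `doc_id` was previously set."""
        if not 0 <= doc_id < self.size:
            return False
        return bool(self.bits[doc_id >> 3] & (1 << (doc_id & 7)))


if __name__ == "__main__":
    allowed = DocIdBitSet(50_000_000)   # corpus size from the message above
    for doc_id in (3, 1024, 49_999_999):
        allowed.set(doc_id)
    print(len(allowed.bits))            # bytes used by the bit set
    print(1024 in allowed, 1025 in allowed)
```

One bit per document means a 50M-document corpus needs 6,250,000 bytes per 
filter regardless of how many ids are set, which is the property that makes 
caching such sets for subsequent use plausible.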

Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-04 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Just to give some background. I will have a large-ish corpus of more than 
100M documents indexed. The filters that I want to apply will be on a field 
that is not indexed. I mean, I prefer to not have them indexed in ES/Lucene 
since they will be frequently changing. So, for that, I will be maintaining 
them elsewhere, like a DB etc.

Every time I have a query, I would want to filter the results by those 
fields that are not indexed in Lucene. And I am guessing that number may 
well be more than 1M. In that case, I think, since we will maintain some 
sort of TermsFilter, it may not scale linearly. What I would want to do, 
preferably, is to have a hook inside the ES query, so that I can, at query 
time, inject the required filter values. Since the filter values have to be 
recognized by Lucene, and I will not be indexing them, I will need to do 
some quick mapping to get those fields and map them quickly to some field 
in Lucene that I can save in the filter. I am not sure whether we can 
access and set Lucene DocIDs in the filter or whether they are even exposed 
in ES.

Please assist with this query. Thanks,

Thanks,
Sandeep


On Thursday, 3 July 2014 21:33:45 UTC+5:30, Jörg Prante wrote:
>
> Maybe I do not fully understand, but in a client, you can fetch the 
> required filter terms from any external source before a JSON query is 
> constructed?
>
> Can you give an example what you want to achieve?
>
> Jörg
>
>
> On Thu, Jul 3, 2014 at 3:34 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi All,
>>
>> I am new to ES and I have the following requirement:
>> I need to specify a list of strings as a filter that applies to a 
>> specific field in the document. Like what a filter does, but instead of 
>> sending them on the query, I would like them to be populated from an 
>> external source, like a DB or something. Can you please guide me to the 
>> relevant examples or references to achieve this on v1.1.2? 
>>
>> Thanks,
>> Sandeep
>>
>
>
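
Jörg's suggestion in the quoted reply, fetching the filter terms from an 
external source in the client before the JSON query is constructed, might 
look roughly like the sketch below. It uses an in-memory SQLite table as a 
stand-in for the external DB and only builds the request body, without 
sending it anywhere. The table `allowed_terms`, its columns, the field name 
`color` and both function names are assumptions made for the example. The 
body uses the `filtered` query with a `terms` filter, which is the shape the 
1.x query DSL accepts.

```python
import json
import sqlite3


def fetch_terms(conn, owner):
    """Read the filter terms for one owner from the (hypothetical) DB table."""
    rows = conn.execute(
        "SELECT term FROM allowed_terms WHERE owner = ? ORDER BY term",
        (owner,),
    )
    return [row[0] for row in rows]


def build_terms_filter_query(field, terms):
    """Build a 1.x style search body that filters `field` on `terms`."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {field: terms}},
            }
        }
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE allowed_terms (owner TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO allowed_terms VALUES (?, ?)",
        [("alice", "red"), ("alice", "blue"), ("bob", "green")],
    )
    body = build_terms_filter_query("color", fetch_terms(conn, "alice"))
    print(json.dumps(body, indent=2))
```

This keeps the frequently changing values out of the index entirely, at the 
cost of shipping them with every request, which is the scaling concern 
raised in the message above.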



Custom Plugin for specifying custom filter attributes at query time

2014-07-03 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi All,

I am new to ES and I have the following requirement:
I need to specify a list of strings as a filter that applies to a specific 
field in the document. Like what a filter does, but instead of sending them 
on the query, I would like them to be populated from an external source, 
like a DB or something. Can you please guide me to the relevant examples or 
references to achieve this on v1.1.2? 

Thanks,
Sandeep
