How to use my custom Lucene query?

2014-08-22 Thread Peiyong Lin
I have written a new query type, and I can use it with `Query q = new 
ExtendedBooleanQuery()` in Lucene. I want to use it in Elasticsearch. 
How can I integrate it into Elasticsearch?
I googled but found no integration guide. Is there a plugin extension point, 
like the one for analyzers, for adding my own custom-made Lucene query?

Thank you!
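(For reference: in the 1.x series the usual extension point is a Java plugin that registers a custom query parser, which then gives the query its own name in the query DSL. The sketch below is illustrative only — `ExtendedBooleanQueryParser` is hypothetical, imports are omitted, and the exact module/class names should be checked against the Elasticsearch version in use.)

```java
// Sketch, not a drop-in implementation: assumes the 1.x plugin API
// (AbstractPlugin + onModule); package imports omitted.
public class ExtendedBooleanQueryPlugin extends AbstractPlugin {

    @Override
    public String name() {
        return "extended-boolean-query";
    }

    @Override
    public String description() {
        return "Registers a parser for the custom ExtendedBooleanQuery";
    }

    // Elasticsearch calls onModule(...) for each module it instantiates.
    // Registering a QueryParser here is what exposes the custom query,
    // e.g. as {"extended_boolean": {...}}, in search requests.
    public void onModule(IndicesQueriesModule module) {
        // ExtendedBooleanQueryParser (hypothetical) parses the DSL JSON
        // and builds the custom Lucene ExtendedBooleanQuery.
        module.addQuery(new ExtendedBooleanQueryParser());
    }
}
```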

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b9fa6076-2b85-446c-a683-083e667649ab%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-22 Thread Robert Muir
How big is it? Maybe I can have it anyway? I pulled two ancient UltraSPARCs
out of my closet to try to debug your issue, but unfortunately they are a
pain to work with (dead NVRAM battery on both, zeroed MAC address, etc.). I'd
still love to get to the bottom of this.
On Aug 22, 2014 3:59 PM,  wrote:

> Hi Adrien,
> It's a bunch of garbled binary data, basically a dump of the process image.
> Tony
>
>
> On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:
>>
>> Hi Tony,
>>
>> Do you have more information in the core dump file? (cf. the "Core dump
>> written" line that you pasted)
>>
>>
>> On Thu, Aug 21, 2014 at 7:53 PM,  wrote:
>>
>>> Hello,
>>> I installed ES 1.3.2 on a spare Solaris 11 / T4-4 SPARC server to scale
>>> out of a small x86 machine.  I get a similar exception running ES with
>>> JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message, I get the
>>> error below on the ES process:
>>>
>>>
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
>>> #
>>> # JRE version: 7.0_25-b15
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
>>> solaris-sparc compressed oops)
>>> # Problematic frame:
>>> # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
>>> #
>>> # Core dump written. Default location: 
>>> /export/home/elasticsearch/elasticsearch-1.3.2/core
>>> or core.14473
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>> #
>>>
>>> ---  T H R E A D  ---
>>>
>>> Current thread (0x000107078000):  JavaThread
>>> "elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147}"
>>> daemon [_thread_in_vm, id=209, stack(0x5b80,
>>> 0x5b84)]
>>>
>>> siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
>>> si_addr=0x000709cc09e7
>>>
>>>
>>> I can run ES using 32-bit Java but have to shrink ES_HEAP_SIZE more than
>>> I want to.  Any assistance would be appreciated.
>>>
>>> Regards,
>>> Tony
>>>
>>>
>>> On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:

 Hello,

 After upgrading from Elasticsearch 1.0.1 to 1.2.2 I'm getting JVM core
 dumps on Solaris 10 on SPARC.

 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x7e452d78, pid=15483, tid=263
 #
 # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build
 1.7.0_55-b13)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode
 solaris-sparc compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0xc52d78]  Unsafe_GetLong+0x158

 I'm pretty sure the problem here is that Elasticsearch is making
 increasing use of "unsafe" functions in Java, presumably to speed things
 up, and some CPUs are more picky than others about memory alignment.  In
 particular, x86 will tolerate misaligned memory access whereas SPARC won't.

 Somebody has tried to report this to Oracle in the past and
 (understandably) Oracle has said that if you're going to use unsafe
 functions you need to understand what you're doing:
 http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021574

 A quick grep through the code of the two versions of Elasticsearch
 shows that the new use of "unsafe" memory access functions is in the
 BytesReference, MurmurHash3 and HyperLogLogPlusPlus classes:

 bash-3.2$ git checkout v1.0.1
 Checking out files: 100% (2904/2904), done.

 bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
 ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
 ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
 ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:} else if (UnsafeUtils.equals(key, get(curId, spare))) {
 ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
 ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:return UnsafeUtils.equals(b1, b2);

 bash-3.2$ git checkout v1.2.2
 Checking out files: 100% (2220/2220), done.

 bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
 ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:import org.elasticsearch.common.util.UnsafeUtils;
 ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:return UnsafeUtils.equals(a.array(), a.arrayOffset(), b.array(), b.arrayOffset(), a.length());
 ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:import org.elasticsearch.com

Re: How to do sequence matching

2014-08-22 Thread Smitha Gowda
Sure.
Going back to my original example:

This is a session document containing a sequence of events occurring in a
specific interval:

Session:
 {
   "StartTime": "20130101T01:00",
   "EndTime": "20130101T04:00",
   "Sequence": "A B C",
   "Events": [
     { "Name": "A", "StartTime": "20130101T01:00", "EndTime": "20130101T02:00" },
     { "Name": "B", "StartTime": "20130101T02:30", "EndTime": "20130101T03:00" },
     { "Name": "C", "StartTime": "20130101T03:30", "EndTime": "20130101T04:00" }
   ]
 }

What I want:
1. Match all the documents having a specific sequence of events, say "B C".
2. On the result, bucket-aggregate documents by day on Session.StartTime
(date_histogram).
3. In each bucket, find the average time elapsed in seconds between the
searched sequence. Here it was "B C", so it will be
session.Events[indexOfC].EndTime - session.Events[indexOfB].StartTime.

I tried a filter bucket aggregation for #1, which seems to work.
I tried date_histogram for #2, which is not working; I am not sure how to
consume the result of #1 in #2.
I have not gotten to trying #3 because #2 is not working, but I think I
need an avg aggregation with a script value.

Can you help with the syntax, or a pointer, for the parts above? I am also
interested in how to feed this into a Kibana chart.

Thanks in advance.
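One note on composing #1 and #2: aggregations run over the documents matched by the top-level query, so putting the sequence match into the query makes a separate filter aggregation unnecessary. A sketch, assuming the Sequence field is indexed as analyzed text so a phrase query matches the sub-sequence (#3 would still need an avg sub-aggregation with a script, which depends on how Events is mapped):

```json
{
  "query": {
    "match_phrase": { "Sequence": "B C" }
  },
  "aggs": {
    "sessions_per_day": {
      "date_histogram": { "field": "StartTime", "interval": "day" }
    }
  }
}
```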



On Fri, Aug 22, 2014 at 2:26 AM, vineeth mohan 
wrote:

> Hello Smitha ,
>
> Please be more elaborate.
> What is the sequence AB, what is an event here, and what are the first and
> last events?
>
> Thanks
>Vineeth
>
>
> On Fri, Aug 22, 2014 at 8:16 AM, Smitha Gowda 
> wrote:
>
>> Thanks that will work.
>>
>> One more question related to Kibana to visualize this data.
>>
>> For a query that matches sequence "AB"
>> Once I have all the matching documents I want to plot a bar chart with
>> x-axis: Session StartTime (Day granularity)
>> y-axis: Mean of (LastEvent.EndTime(In this example B) -
>> FirstEvent.StartTime(In this Example A)) for the given day
>>
>> Any pointers on how do I aggregate on other properties on the matched
>> document?
>>
>> Thanks in advance!
>>


Re: Accuracy of aggregation when having queries

2014-08-22 Thread Adrien Grand
Yes, this issue also exists if you specify a query.
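When the counts need to be closer to exact, the usual mitigation is to raise shard_size on the terms aggregation so each shard returns more candidate terms than the final size, at some memory/CPU cost. A sketch (field names are illustrative):

```json
{
  "query": { "match": { "title": "elasticsearch" } },
  "aggs": {
    "top_tags": {
      "terms": { "field": "tag", "size": 5, "shard_size": 50 }
    }
  }
}
```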


On Thu, Aug 21, 2014 at 7:02 PM, Roxana Balaci  wrote:

> I am reading this post about not having maximum accuracy on aggregation
> results on terms, when adding the "size" parameter:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate
> Does this lack of accuracy also happen when you have a query? Like:
> {
>   "query": { ... },
>   "aggs": { ... }
> }
> Thanks,
> Roxana
>



-- 
Adrien Grand



Re: Error running ES DSL in hadoop mapreduce

2014-08-22 Thread Adrien Grand
Hi Sona,

Would you have the rest of the stack trace, I would like to know where the
ArrayIndexOutOfBoundsException occurred? What version of Elasticsearch are
you using?


On Fri, Aug 22, 2014 at 1:37 PM, Sona Samad  wrote:

> Hi,
>
> I was trying to run the below query from hadoop mapreduce:
>
> {
>  "aggs": {
> "group_by_body_part": {
>   "terms": {
> "field": "body_part",
> "size": 5,
> "order" : { "examcount" : "desc" }
> },
>   "aggs": {
> "examcount": {
>   "cardinality": {
> "field": "ExamRowKey"
>   }
> }
>   }
> }
>   }
> }
>
> The query is returning more than 5 records, even though the size is given
> as 5.
> Also, the result is not aggregated; instead it returns entire records
> from the index as values to the mapper.
>
> Also the following error is logged:
>
> [2014-08-22 16:06:21,459][DEBUG][action.search.type   ] [Algrim the
> Strong] All shards failed for phase: [init_scan]
> [2014-08-22 16:26:38,875][DEBUG][action.search.type   ] [Algrim the
> Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to
> execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
> org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]:
> query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed
> [Failed to execute main query]
> at
> org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
> at
> org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
> at
> org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
> at
> org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
> at
> org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>
>
> Could you please help me create the correct query?
>
> Thanks,
> Sona
>



-- 
Adrien Grand



Re: aggregate max against date field returns unformatted date

2014-08-22 Thread Adrien Grand
You are doing things right, it is just that the max aggregation doesn't
support this feature.
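In the meantime, the epoch-millisecond value can be formatted client-side; a minimal sketch in Python (the sample value is illustrative):

```python
from datetime import datetime, timezone

# max/min aggregations on date fields return epoch milliseconds;
# convert to a readable timestamp on the client side.
max_start_time = 1408342393920  # the "value" field from the aggregation response
dt = datetime.fromtimestamp(max_start_time / 1000.0, tz=timezone.utc)
print(dt.isoformat())  # 2014-08-18T06:13:13.920000+00:00
```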


On Wed, Aug 20, 2014 at 8:56 PM,  wrote:

> queries such as
> {
>   "aggs": {
> "max_startTime": {
>   "max": {
> "field": "startDate"
>   }
> },
> "min_startTime": {
>   "min": {
> "field": "startDate"
>   }
> }
>   }
> }
>
> returns 140834239392... (epoch milliseconds) instead of a nicely formatted date.
> The field is mapped as a date type.
> Am I using the query wrong? Is there a way to 'cast' the data to a date?
>
> thanks
>



-- 
Adrien Grand



Re: Terms Filter Assistance

2014-08-22 Thread Adrien Grand
You are trying to perform a join. The closest things to a join that
elasticsearch has are nested documents (index-time joins) and parent/child
relationships (search-time joins). However, I don't think any of these
solutions would work out of the box for you, since you are trying to join
on two fields at the same time.
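One workaround is to do the join client-side in two steps: first collect the distinct ip values from the intel index (for example with a terms aggregation or a scan), then issue a second search that ORs two plain terms filters over those values. A sketch of the second request (the ip values are illustrative; with ~150k values the filter will be large, so memory and latency should be tested):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "terms": { "src": ["10.0.0.1", "10.0.0.2"] } },
            { "terms": { "dst": ["10.0.0.1", "10.0.0.2"] } }
          ]
        }
      }
    }
  }
}
```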


On Wed, Aug 20, 2014 at 10:40 PM, Brian  wrote:

> We have 2 indices (logs & intel) and are trying to search 2 fields in the
> logs index (src & dst) for any match from the intel ip field. The challenge
> is the terms filter is expecting 1 document with all the values to be
> searched for within that document.  The intel index has over 150k documents.
>
> Is there a way to extract the ip field from the intel index (aggregations
> maybe) and use that to search the src & dst fields in the logs index?
>
> Here is the code I am trying to use:
>
> curl -XGET localhost:9200/logs/_search -d '{
>   "query" : {
> "filtered" : {
>   "filter" : {
> "terms" : {
>   "src" : {
> "index" : "intel",
> "type" : "ipaddress",
> "id" : "*",
> "path" : "ip"
>   },
>
>   "dst" : {
> "index" : "intel",
> "type" : "ipaddress",
> "id" : "*",
> "path" : "ip"
>   }
>
> }
>   }
> }
>   }
> }
>



-- 
Adrien Grand



Re: Route documents at index time to a particular shard

2014-08-22 Thread Adrien Grand
You cannot specify a shard but you can give a routing value that will make
sure that all documents that have the same routing value are on the same
shard.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/routing-value.html
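A sketch of what this looks like on the wire (index, type, and routing value are illustrative); the same routing value must then be supplied on get requests and, to search only that shard's data, on search requests:

```sh
# Index a document with an explicit routing value; all documents indexed
# with routing=user42 end up on the same shard.
curl -XPUT 'localhost:9200/myindex/mytype/1?routing=user42' -d '{"user": "user42", "msg": "hello"}'

# Search only the shard(s) that routing value maps to.
curl 'localhost:9200/myindex/_search?routing=user42' -d '{"query": {"term": {"user": "user42"}}}'
```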


On Thu, Aug 21, 2014 at 8:04 AM, 'Sandeep Ramesh Khanzode' via
elasticsearch  wrote:

> Hi,
>
> Can you please tell me if there is a plugin that I can use during indexing
> to direct a document to a particular shard? So that I can set the shard ID
> and send the document as the request to that shard?
>
> Thanks,
> Sandeep
>



-- 
Adrien Grand



Re: Sustainable way to regularly purge deleted docs

2014-08-22 Thread Adrien Grand
Hi Jonathan,

The default merge policy is already supposed to merge quite aggressively
segments that contain lots of deleted documents, so it is a bit surprising
that you are seeing that many deleted documents, even with merge
throttling disabled.

You mention having memory pressure because of the number of documents in
your index; do you know what causes this memory pressure? In case it is due
to field data, maybe you could consider storing field data on disk (what we
call "doc values")?
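For reference, in 1.x doc values are enabled per field in the mapping (only for not_analyzed strings, numerics, and dates). A sketch with illustrative names:

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "status": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
```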



On Fri, Aug 22, 2014 at 5:27 AM, Jonathan Foy  wrote:

> Hello
>
> I'm in the process of putting a two-node Elasticsearch cluster (1.1.2)
> into production, but I'm having a bit of trouble keeping it stable enough
> for comfort.  Specifically, I'm trying to figure out the best way to keep
> the number of deleted documents under control.
>
> Both nodes are r3.xlarge EC2 instances (4 cores, 30.5 GB RAM).  The ES
> cluster mirrors the primary data store, a MySQL database.  Relevant updates
> to the database are caught via triggers which populate a table that's
> monitored by an indexing process.  This results in what I'd consider of lot
> of reindexing, any time the primary data is updated.  Search and indexing
> performance thus far has been in line with expectations when the number of
> deleted documents is small, but as it grows (up to 30-40%), the amount of
> available RAM becomes limited, ultimately causing memory problems.  If I
> optimize/purge deletes then things return to normal, though I usually end
> up having to restart at least one server if not both due to OOM problems
> and shard failures during optimization.  When ES becomes the source of all
> searches for the application, I can't really afford this downtime.
>
> What would be the preferred course of action here?  I do have a window
> over the weekend where I could work with somewhat reduced capacity;  I was
> thinking perhaps I could pull one node out of search rotation, optimize it,
> swap it with the other, optimize it, and then go on my way.  However, I
> don't know that I CAN pull one node out of rotation (it seems like the
> search API lets me specify a node, but nothing to say "Node X doesn't need
> any searches"), nor does it appear that I can optimize an index on one node
> without doing the same to the other.
>
> I've tried tweaking the merge settings to favour segments containing large
> numbers of deletions, but it doesn't seem to make enough of a difference.
> I've also disabled merge throttling (I do have SSD-backed storage).  Is
> there any safe way to perform regular maintenance on the cluster,
> preferably one node at a time, without causing TOO many problems?  Am I
> just trying to do too much with the hardware I have?
>
> Any advice is appreciated.  Let me know what info I left out that would
> help.
>



-- 
Adrien Grand



Re: One large index vs. many smaller indexes

2014-08-22 Thread Adrien Grand
Hi Chris,

Usually the problem is not so much the number of indices as the number of
shards, which are the physical units of data storage (an index being a
logical view over several shards).

Something to beware of is that shards typically have some constant overhead
(disk space, file descriptors, memory usage) that does not depend on the
amount of data that they store. Although it is fine to have up to a few
tens of shards per node, you should avoid having, e.g., thousands of shards
per node.

If you plan on always adding a filter for a specific application in your
search requests, then splitting by application makes sense, since the
filter becomes unnecessary at search time; you just query the
application-specific index. On the other hand, if you don't filter by
application, then splitting data yourself into smaller indices would be
pretty much equivalent to storing everything in a single index with a
higher number of shards.

You might want to check out the following resources that talk about
capacity planning:
 - http://www.elasticsearch.org/videos/big-data-search-and-analytics/
 -
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
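If you do split per application, keeping the per-index shard count low bounds the total per-node shard overhead. A sketch of creating such an index (name and counts are illustrative):

```sh
curl -XPUT 'localhost:9200/logs-app1-2014.08.22' -d '{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}'
```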



On Fri, Aug 22, 2014 at 9:08 PM, Chris Neal 
wrote:

> Hi all,
>
> As the subject says, I'm wondering about index size vs. number of indexes.
>
> I'm indexing many application log files, currently with an index by day
> for all logs, which will make a very large index.  For just a few
> applications in Development, the index is 55GB a day (across 2 servers).
>  In prod with all applications, it will be "much more than that".  1TB a
> day maybe?
>
> I'm wondering if there is value in splitting the indexes by day and by
> application, which would produce more indexes per day, but they would be
> smaller, vs. value in having a single, mammoth index by day alone.
>
> Is it just a resource question?  If I have enough RAM/disk/CPU to support
> a "mammoth" index, then I'm fine?  Or are there other reasons to (or to
> not) split up indexes?
>
> Very much appreciate your time.
> Chris
>



-- 
Adrien Grand



Re: Parent/Child query performance in version 1.1.2

2014-08-22 Thread Adrien Grand
Hi Mark,

Given that you had 1 replica in your first setup, it could take several
queries to warm up the field data cache completely. Does the query still
take 16 seconds if you run it, e.g., 10 times? (3 should be enough, but
just to be sure.)

Does anything change if you query Elasticsearch with preference=_local?
This should be equivalent to your single-node setup, so it would be
interesting to see whether that changes something.

As a side note, you might want to try out a more recent version of
Elasticsearch since parent/child performance improved quite significantly
in 1.2.0 because of https://github.com/elasticsearch/elasticsearch/pull/5846
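For the preference experiment, the parameter goes on the search URL (index name illustrative):

```sh
# Prefer shards local to the node that receives the request.
curl 'localhost:9200/myindex/_search?preference=_local' -d '{"query": {"match_all": {}}}'
```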



On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene  wrote:

> I wanted to update the list with an interesting piece of information. We
> found that when we took one of our two data nodes out of the cluster,
> leaving just one data node with no replicas, the query performance
> increased dramatically. The queries are now returning in <100ms on
> subsequent executions which is what we'd expect to see as a result of the
> data being stored in the field data cache.
>
> Is it possible that there is some kind of inefficient code path when a
> query is spread across primary and replica shards?
>
>
> On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
>>
>> We are experiencing slow parent/child queries even when we run the query
>> a second time and I wanted to know if this is just the limit of this
>> feature within ElasticSearch. According to the ES Docs (
>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/
>> current/parent-child-performance.html) parent/child queries can be 5-10x
>> slower and consume a lot of memory.
>>
>> My impression has been that as long as we give ES enough memory via the
>> field data cache, subsequent queries would be quicker than the first time
>> it is executed. We are seeing the following query take ~16 seconds to
>> complete every time.
>>
>>
>> {
>> "from": 0,
>> "size": 100,
>> "query": {
>> "filtered": {
>> "query": {
>> "match_all": {}
>> },
>> "filter": {
>> "bool": {
>> "must": [
>> {
>> "term": {
>> "oid": 61
>> }
>> },
>> {
>> "has_child": {
>> "type": "social",
>> "query": {
>> "bool": {
>> "should": [
>> {
>> "term": {
>> "engagement.type":
>> "like"
>> }
>> },
>> {
>> "term": {
>> "content.remote_id":
>> "20697868961_10152270678178962"
>> }
>> }
>> ]
>> }
>> }
>> }
>> }
>> ]
>> }
>> }
>> }
>> },
>> "fields": "id",
>> "sort": [
>> {
>> "_score": {}
>> },
>> {
>> "id": {
>> "order": "asc"
>> }
>> }
>> ]
>> }
>>
>>
>> The index (which has 5 shards with 1 replica shard) we are testing this
>> on has 2.2 million parent documents and 1.1 million child documents.
>>
>> We are running our two data nodes on r3.2xlarge's which have 8 CPU's,
>> 60GB of RAM, and SSD.
>>
>> Our ES data nodes have 30G of heap and the field data cache is only
>> consuming around ~3GB right now and there are no cache evictions. The field
>> data cache is also allowed to grow to 75% of the available heap.
>>
>> I'm looking to understand if this is a limitation with parent/child or is
>> there additional configuration that has to be set beyond the defaults that
>> would help speed these queries up?
>>

Re: Parent/Child query performance in version 1.1.2

2014-08-22 Thread Mark Greene
I wanted to update the list with an interesting piece of information. We 
found that when we took one of our two data nodes out of the cluster, 
leaving just one data node with no replicas, the query performance 
increased dramatically. The queries are now returning in <100ms on 
subsequent executions which is what we'd expect to see as a result of the 
data being stored in the field data cache. 

Is it possible that there is some kind of inefficient code path when a 
query is spread across primary and replica shards?

On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
>
> We are experiencing slow parent/child queries even when we run the query a 
> second time and I wanted to know if this is just the limit of this feature 
> within ElasticSearch. According to the ES Docs (
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child-performance.html)
>  
> parent/child queries can be 5-10x slower and consume a lot of memory. 
>
> My impression has been that as long as we give ES enough memory via the 
> field data cache, subsequent queries would be quicker than the first time 
> it is executed. We are seeing the following query take ~16 seconds to 
> complete every time. 
>
>
> {
> "from": 0,
> "size": 100,
> "query": {
> "filtered": {
> "query": {
> "match_all": {}
> },
> "filter": {
> "bool": {
> "must": [
> {
> "term": {
> "oid": 61
> }
> },
> {
> "has_child": {
> "type": "social",
> "query": {
> "bool": {
> "should": [
> {
> "term": {
> "engagement.type": 
> "like"
> }
> },
> {
> "term": {
> "content.remote_id": 
> "20697868961_10152270678178962"
> }
> }
> ]
> }
> }
> }
> }
> ]
> }
> }
> }
> },
> "fields": "id",
> "sort": [
> {
> "_score": {}
> },
> {
> "id": {
> "order": "asc"
> }
> }
> ]
> }
>
>
> The index (which has 5 shards with 1 replica shard) we are testing this on 
> has 2.2 million parent documents and 1.1 million child documents.
>
> We are running our two data nodes on r3.2xlarge's which have 8 CPU's, 60GB 
> of RAM, and SSD.
>
> Our ES data nodes have 30G of heap and the field data cache is only 
> consuming around ~3GB right now and there are no cache evictions. The field 
> data cache is also allowed to grow to 75% of the available heap.
>
> I'm looking to understand if this is a limitation with parent/child or is 
> there additional configuration that has to be set beyond the defaults that 
> would help speed these queries up?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a6442545-edc0-4e21-9696-925aae517762%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-22 Thread tony . aponte
Hi Adrien,
It's a bunch of garbled binary data, basically a dump of the process image.
Tony


On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:
>
> Hi Tony,
>
> Do you have more information in the core dump file? (cf. the "Core dump 
> written" line that you pasted)
>
>
> On Thu, Aug 21, 2014 at 7:53 PM, > wrote:
>
>> Hello,
>> I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to scale 
>> out of small x86 machine.  I get a similar exception running ES with 
>> JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message I get the 
>> error below on the ES process:
>>
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
>> #
>> # JRE version: 7.0_25-b15
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode 
>> solaris-sparc compressed oops)
>> # Problematic frame:
>> # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
>> #
>> # Core dump written. Default location: 
>> /export/home/elasticsearch/elasticsearch-1.3.2/core or core.14473
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://bugreport.sun.com/bugreport/crash.jsp
>> #
>>
>> ---  T H R E A D  ---
>>
>> Current thread (0x000107078000):  JavaThread 
>> "elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147}" 
>> daemon [_thread_in_vm, id=209, stack(0x5b80,0x5b84)]
>>
>> siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN), 
>> si_addr=0x000709cc09e7
>>
>>
>> I can run ES using 32bit java but have to shrink ES_HEAPS_SIZE more than 
>> I want to.  Any assistance would be appreciated.
>>
>> Regards,
>> Tony
>>
>>
>> On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:
>>>
>>> Hello,
>>>
>>> After upgrading from Elasticsearch 1.0.1 to 1.2.2 I'm getting JVM core 
>>> dumps on Solaris 10 on SPARC.
>>>
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGBUS (0xa) at pc=0x7e452d78, pid=15483, tid=263
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 
>>> 1.7.0_55-b13)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode 
>>> solaris-sparc compressed oops)
>>> # Problematic frame:
>>> # V  [libjvm.so+0xc52d78]  Unsafe_GetLong+0x158
>>>
>>> I'm pretty sure the problem here is that Elasticsearch is making 
>>> increasing use of "unsafe" functions in Java, presumably to speed things 
>>> up, and some CPUs are more picky than others about memory alignment.  In 
>>> particular, x86 will tolerate misaligned memory access whereas SPARC won't.
>>>
>>> Somebody has tried to report this to Oracle in the past and 
>>> (understandably) Oracle has said that if you're going to use unsafe 
>>> functions you need to understand what you're doing: 
>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021574
>>>
>>> A quick grep through the code of the two versions of Elasticsearch shows 
>>> that the new use of "unsafe" memory access functions is in the 
>>> BytesReference, MurmurHash3 and HyperLogLogPlusPlus classes:
>>>
>>> bash-3.2$ git checkout v1.0.1
>>> Checking out files: 100% (2904/2904), done.
>>>
>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public 
>>> enum UnsafeUtils {
>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:
>>> 
>>> if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:
>>> 
>>> } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>> ./src/test/java/org/elasticsearch/benchmark/common/util/
>>> BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.
>>> UnsafeUtils;
>>> ./src/test/java/org/elasticsearch/benchmark/common/util/
>>> BytesRefComparisonsBenchmark.java:return 
>>> UnsafeUtils.equals(b1, b2);
>>>
>>> bash-3.2$ git checkout v1.2.2
>>> Checking out files: 100% (2220/2220), done.
>>>
>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:import 
>>> org.elasticsearch.common.util.UnsafeUtils;
>>> ./src/main/java/org/elasticsearch/common/bytes/
>>> BytesReference.java:return 
>>> UnsafeUtils.equals(a.array(), a.arrayOffset(), b.array(), b.arrayOffset(), 
>>> a.length());
>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:import 
>>> org.elasticsearch.common.util.UnsafeUtils;
>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:
>>> return UnsafeUtils.readLongLE(key, blockOffset);
>>> ./src/main/java/org/elasticsearch/common/hash/
>>> MurmurHash3.java:long k1 = UnsafeUtils.readLongLE(key, 
>>> i);
>>> ./src/main/java/org/elasticsearch/common/hash/
>>> MurmurHash3.java:long k2 = UnsafeUtils.read

One large index vs. many smaller indexes

2014-08-22 Thread Chris Neal
Hi all,

As the subject says, I'm wondering about index size vs. number of indexes.

I'm indexing many application log files, currently with an index by day for
all logs, which will make a very large index.  For just a few applications
in Development, the index is 55GB a day (across 2 servers).  In prod with
all applications, it will be "much more than that".  1TB a day maybe?

I'm wondering if there is value in splitting the indexes by day and by
application, which would produce more indexes per day, but they would be
smaller, vs. value in having a single, mammoth index by day alone.

Is it just a resource question?  If I have enough RAM/disk/CPU to support a
"mammoth" index, then I'm fine?  Or are there other reasons to (or to not)
split up indexes?
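If you do split by application, one common pattern (a sketch; the index and template names are placeholders) is per-application daily indices such as logs-myapp-2014.08.22, with an index template so settings and mappings are applied automatically as each day's indices are created:

```json
PUT /_template/logs_template
{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
```

Searches can then target a single application (GET /logs-myapp-*/_search) or everything (GET /logs-*/_search), and old indices can be closed or deleted per application without touching the rest.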

Very much appreciate your time.
Chris

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAND3DphfsYx0LW0M-yvLWGauRSzVWG0etaBkiTrN7zVafq7tMA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Boost the first word in a multi-word query

2014-08-22 Thread vineeth mohan
Hello Jeremy ,

You can try query_string then.

Query as "Brown^2 dog"

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query
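Spelled out as a full request (a sketch; the index and field names are placeholders), the boosted query_string suggestion looks roughly like this, with both terms optional by default and the first term weighted higher:

```json
POST /my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "brown^2 dog"
    }
  }
}
```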

Thanks
   Vineeth


On Sat, Aug 23, 2014 at 12:11 AM, Jérémy  wrote:

> Thanks for your answer!
>
> Unfortunately the phrase query is not enough, because I still want to keep
> words optional. In my understanding, the phrase query requires all the
> words of the query to be present.
>
> Cheers,
> Jeremy
>
>
> On Fri, Aug 22, 2014 at 8:20 PM, vineeth mohan 
> wrote:
>
>> Hello Jeremy ,
>>
>> I feel what you are looking for is a phrase query . It takes into
>> consideration the order of words -
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase
>>
>> Thanks
>>   Vineeth
>>
>>
>> On Fri, Aug 22, 2014 at 3:28 PM, Jeremy  wrote:
>>
>>> In case of a multi-word query, is there a way to boost the first terms
>>> of the query?
>>>
>>> For example, in the following query:
>>> GET /my_index/my_type/_search
>>> {
>>> "query": {
>>> "match": {
>>> "title": "BROWN DOG!"
>>> }
>>> }
>>> }
>>>
>>> "Brown" should be prioritized over "dog", therefore searching for "brown
>>> dog" will not return the same scores as searching for "dog brown".
>>> I'm ideally looking for a solution which work with N words and put
>>> weight accordingly the number of words.
>>>
>>> Regards,
>>> Jeremy
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/a53f5752-3da0-41de-b970-f84573b8f5a3%40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/ojEtydA4zAw/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3D51EiC_SmiDXD0k2Yj0YacnvXVzaqUOshdkD81HFpgsA%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGNSLEwjxRwLgfHAmNWxoGa0BX5ZSEtk6J0QFBvWCBcW8wX42Q%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5k5M8vasScWqjx%2BwHUD%2B-EGof2cLGJGH3YueMKpW0hYFQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Boost the first word in a multi-word query

2014-08-22 Thread Jérémy
Thanks for your answer!

Unfortunately the phrase query is not enough, because I still want to keep
words optional. In my understanding, the phrase query requires all the
words of the query to be present.

Cheers,
Jeremy


On Fri, Aug 22, 2014 at 8:20 PM, vineeth mohan 
wrote:

> Hello Jeremy ,
>
> I feel what you are looking for is a phrase query . It takes into
> consideration the order of words -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase
>
> Thanks
>   Vineeth
>
>
> On Fri, Aug 22, 2014 at 3:28 PM, Jeremy  wrote:
>
>> In case of a multi-word query, is there a way to boost the first terms of
>> the query?
>>
>> For example, in the following query:
>> GET /my_index/my_type/_search
>> {
>> "query": {
>> "match": {
>> "title": "BROWN DOG!"
>> }
>> }
>> }
>>
>> "Brown" should be prioritized over "dog", therefore searching for "brown
>> dog" will not return the same scores as searching for "dog brown".
>> I'm ideally looking for a solution which work with N words and put weight
>> accordingly the number of words.
>>
>> Regards,
>> Jeremy
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a53f5752-3da0-41de-b970-f84573b8f5a3%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/ojEtydA4zAw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3D51EiC_SmiDXD0k2Yj0YacnvXVzaqUOshdkD81HFpgsA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGNSLEwjxRwLgfHAmNWxoGa0BX5ZSEtk6J0QFBvWCBcW8wX42Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search terms matching order of precedence?

2014-08-22 Thread vineeth mohan
Hello Eric ,

I don't exactly understand your requirement.
Please elaborate.

Thanks
Vineeth


On Fri, Aug 22, 2014 at 11:37 PM, Eric Greene  wrote:

> Hi Vineeth, thanks so much, this looks like it will help me.
>
> I have another question, if you don't mind... (or should I post a new
> question?)
>
> I would like to specify my top results based on:
>
> 1) A description field and tags both are hits.
> 2) Description field only is a hit.
> 3) Tags only have a hit.
>
> Is there something I can learn about to understand this?  Thanks Eric
>
>
>
>
>
> On Friday, August 22, 2014 10:50:48 AM UTC-7, vineeth mohan wrote:
>
>> Hello Eric ,
>>
>> Please explore phrase query - http://www.elasticsearch.
>> org/guide/en/elasticsearch/reference/current/query-dsl-
>> match-query.html#_phrase
>>
>> Also there is query_string type which has support for AND , OR etc and
>> even phrase query - http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/query-dsl-query-string-query.html#query-
>> dsl-query-string-query
>>
>> Hope that helps.
>>
>> Thanks
>>Vineeth
>>
>>
>> On Fri, Aug 22, 2014 at 11:17 PM, Eric Greene  wrote:
>>
>>> Hi everyone, I would like to take a search query with multiple terms and
>>> possibly define an order of precedence in the following way.
>>>
>>> (bear with me as I become familiar with elasticsearch lingo!)
>>>
>>> Can I specify that the exact match is first (The search words "word A
>>> word B" matches "word A + word B"),
>>> then it matches "word B + word A",
>>> then just "word B" is found,
>>> then either "word A or word B"
>>>
>>> I'd like to understand how to shuffle the above variations as well?
>>>
>>> Thanks much.
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/405bb3f0-d496-4c3d-8aa8-ec7afaf2075f%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c3614746-8aa8-43d6-bda6-00787bc8abfa%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mTEe8pR45vfOqU0DFUCsY68o0Sx3-SCWNKdfWaft6Fvg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to recover index if replicas are set to zero?

2014-08-22 Thread vineeth mohan
Hello Sandeep ,

Instead of backing up the data directory, take a snapshot; you can
restore it later to a single-node machine or a 10-node cluster.

SNAPSHOT -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_snapshot
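For reference, the flow is roughly (a sketch; the repository name and path are placeholders): register a filesystem repository, snapshot into it, and restore on whatever cluster you like:

```json
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}
```

Then PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true creates the snapshot, and POST /_snapshot/my_backup/snapshot_1/_restore restores it on the target cluster (the repository location must be accessible from that cluster).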

Thanks
  Vineeth


On Fri, Aug 22, 2014 at 2:23 PM, 'Sandeep Ramesh Khanzode' via
elasticsearch  wrote:

> Hi,
>
> If I have an index with 10 shards but no replicas on a 2 node cluster, and
> one node goes down. Can I somehow recover the 5 shards to the only running
> node? Assuming that I backup the data directory on the node and make it
> available to the other node. Is there an API or an automatic way to achieve
> this?
>
> Thanks,
> Sandeep
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3b1caca8-66ed-4196-838b-b58fafa1601d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nOMgNrGrSqWN%2BmSt31io_qnJLJ3hfHGq9xBghZXX3W0w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Boost the first word in a multi-word query

2014-08-22 Thread vineeth mohan
Hello Jeremy ,

I feel what you are looking for is a phrase query. It takes into
consideration the order of words -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

Thanks
  Vineeth


On Fri, Aug 22, 2014 at 3:28 PM, Jeremy  wrote:

> In case of a multi-word query, is there a way to boost the first terms of
> the query?
>
> For example, in the following query:
> GET /my_index/my_type/_search
> {
> "query": {
> "match": {
> "title": "BROWN DOG!"
> }
> }
> }
>
> "Brown" should be prioritized over "dog", therefore searching for "brown
> dog" will not return the same scores as searching for "dog brown".
> I'm ideally looking for a solution which work with N words and put weight
> accordingly the number of words.
>
> Regards,
> Jeremy
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a53f5752-3da0-41de-b970-f84573b8f5a3%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3D51EiC_SmiDXD0k2Yj0YacnvXVzaqUOshdkD81HFpgsA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: name conflicts between types and fields

2014-08-22 Thread vineeth mohan
Hello Karol ,

This is definitely a bug.

I think I understand what happens.
There is no type support in Lucene, so ES stores all fields in the

TYPE.field.full.path format.

This means there would be both a country.name and a country.country.name
field in the index.
If you try the following query -

{
  "facets": {
    "term": {
      "terms": {
        "field": "country.country.name",
        "size": 10
      }
    }
  }
}
You will get the expected result.

I have filed an issue -
https://github.com/elasticsearch/elasticsearch/issues/7411

Thanks
Vineeth



On Fri, Aug 22, 2014 at 4:37 PM, Karol Gwaj  wrote:

> hi,
>
> is there any way to indicate that a field name path used in the query
> represents an absolute path?
> looks like elasticsearch is recognizing that the field path starts with the
> type name and removes it before executing the query
>
> consider example below:
>
> 1. insert two test documents:
>
> POST test/sublocality/1
> {
>   "name" : "xxx",
>   "country" :
>   {
>     "name" : "yyy"
>   }
> }
>
>
> POST test/country/1
> {
>   "name" : "zzz",
>   "country" :
>   {
>     "name" : "yyy"
>   }
> }
>
> 2. execute facet query:
>
> POST test/_search
> {
>   "facets": {
>     "term": {
>       "terms": {
>         "field": "country.name",
>         "size": 10
>       }
>     }
>   }
> }
>
> I would expect the facet response to contain only one term, 'yyy', with
> count 2;
> instead I'm getting 'xxx' with count 1 and 'zzz' with count 1
>
> So it looks like elasticsearch recognized that the field name is prefixed
> with the type (country) and trimmed it:
>
> POST test/_search
> {
>   "facets": {
>     "term": {
>       "terms": {
>         "field": "name",
>         "size": 10
>       }
>     }
>   }
> }
>
> any idea how to get around it without renaming types and fields?
> I'm using elasticsearch 2.x
>
>
> Thx,
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/53b99101-bf16-4905-98c5-49af2217e579%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mRzRrwhA6e%2BTgf%3DRck2dHA-VSSsxr1dnorG%3D6EUkiBXw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search terms matching order of precedence?

2014-08-22 Thread Eric Greene
Hi Vineeth, thanks so much, this looks like it will help me.

I have another question, if you don't mind... (or should I post a new 
question?)

I would like to specify my top results based on:

1) A description field and tags both are hits.
2) Description field only is a hit.
3) Tags only have a hit. 

Is there something I can learn about to understand this?  Thanks Eric




On Friday, August 22, 2014 10:50:48 AM UTC-7, vineeth mohan wrote:
>
> Hello Eric , 
>
> Please explore phrase query - 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase
>
> Also there is query_string type which has support for AND , OR etc and 
> even phrase query - 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query
>
> Hope that helps.
>
> Thanks
>Vineeth
>
>
> On Fri, Aug 22, 2014 at 11:17 PM, Eric Greene  > wrote:
>
>> Hi everyone, I would like to take a search query with multiple terms and 
>> possibly define an order of precedence in the following way.
>>
>> (bear with me as I become familiar with elasticsearch lingo!)
>>
>> Can I specify that the exact match is first (The search words "word A 
>> word B" matches "word A + word B"), 
>> then it matches "word B + word A",
>> then just "word B" is found,
>> then either "word A or word B"
>>
>> I'd like to understand how to shuffle the above variations as well?
>>
>> Thanks much. 
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/405bb3f0-d496-4c3d-8aa8-ec7afaf2075f%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c3614746-8aa8-43d6-bda6-00787bc8abfa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch is taking longer time while doing update and search simultaneously

2014-08-22 Thread vineeth mohan
Hello Subhadip,


What exactly are you trying to achieve with this code?

 updateResponse = client.prepareUpdate(index, type, id)
  .setDoc(jsonBuilder()
  .startObject().field("view_mode", "read")
  .endObject())
.setDocAsUpsert(true)
.setFields("_source")
.setTimeout("1")

I was wondering where the modification data is given.

Thanks
Vineeth



On Fri, Aug 22, 2014 at 7:00 PM, Subhadip Bagui  wrote:

> Hi,
>
> I'm updating an elasticsearch document and, at the same time, one REST API
> is being called for search results. Below is my code.
>
> public String updateElasticsearchDocument(String index, String type,
>         List<String> indexID) {
>     Client client = ESClientFactory.getInstance();
>     UpdateResponse updateResponse = null;
>     JSONObject jsonResponse = new JSONObject();
>     JSONObject json = new JSONObject();
>     int i = 1;
>     try {
>         for (String id : indexID) {
>             updateResponse = client.prepareUpdate(index, type, id)
>                     .setDoc(jsonBuilder()
>                             .startObject().field("view_mode", "read")
>                             .endObject())
>                     .setDocAsUpsert(true)
>                     .setFields("_source")
>                     .setTimeout("1")
>                     .execute().actionGet();
>             logger.info("updating the document for type= "
>                     + updateResponse.getType() + " for id= "
>                     + updateResponse.getId());
>
>             json.put("indexID" + i, updateResponse.getId());
>             i++;
>         }
>         jsonResponse.put("updated_index", json);
>     } catch (ActionRequestValidationException e) {
>         logger.warn(this.getClass().getName() + ":" + "updateDocument: "
>                 + e.getMessage(), e);
>     } catch (ElasticsearchException e) {
>         logger.warn(this.getClass().getName() + ":" + "updateDocument: "
>                 + e.getMessage(), e);
>         e.printStackTrace();
>     } catch (IOException e) {
>         logger.warn(this.getClass().getName() + ":" + "updateDocument: "
>                 + e.getMessage(), e);
>     } catch (JSONException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
>     return jsonResponse.toString();
> }
>
> *the search query is :*
>
> POST /monitoring/quota-management/_search
>
> {
>   "query": {
>     "match": {
>       "view_mode": "read"
>     }
>   },
>   "sort": [
>     {
>       "_timestamp": {
>         "order": "desc"
>       }
>     }
>   ],
>   "size": 10
> }
>
> Now, I have to wait for like 40-50 seconds to get the updated search
> result. This is affecting the production application.
> Please let me know what needs to be done here to minimize the time taken.
>
> Thanks,
> Subhadip
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/945425a0-69c2-46bd-b63f-a23bc6dc455c%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kp10tiuNpPN4wW1mso_-MLxfB1zBZULM%2BpSeSG6o75dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search terms matching order of precedence?

2014-08-22 Thread vineeth mohan
Hello Eric ,

Please explore phrase query -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

Also there is query_string type which has support for AND , OR etc and even
phrase query -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query
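Those two ideas can also be combined (a sketch; the index and field names are placeholders): a bool query whose should clauses layer a boosted phrase match over a plain match, so exact-order matches rank first while any-word matches still appear:

```json
POST /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "title": {
              "query": "word A word B",
              "boost": 3
            }
          }
        },
        {
          "match": {
            "title": "word A word B"
          }
        }
      ]
    }
  }
}
```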

Hope that helps.

Thanks
   Vineeth


On Fri, Aug 22, 2014 at 11:17 PM, Eric Greene  wrote:

> Hi everyone, I would like to take a search query with multiple terms and
> possibly define an order of precedence in the following way.
>
> (bear with me as I become familiar with elasticsearch lingo!)
>
> Can I specify that the exact match is first (The search words "word A word
> B" matches "word A + word B"),
> then it matches "word B + word A",
> then just "word B" is found,
> then either "word A or word B"
>
> I'd like to understand how to shuffle the above variations as well?
>
> Thanks much.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/405bb3f0-d496-4c3d-8aa8-ec7afaf2075f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mGsXg5q1u26aVRcXSn_sUom2R25koZ23HLvVOMEcQSbQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Simple howto stunnel for elastcisearch cluster.

2014-08-22 Thread John Smith
Ok, so I think I figured it out, and it seems to be working ok. Please feel free 
to publish this or improve upon it, etc. Note: client certs have not been 
tested yet.

Software versions used (though I don't think it matters really)
Ubuntu 14.04
JDK 1.8_20
elasticsearch 1.3.2
stunnel4

This config is for a 2-node setup.
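Both node configs below reference a stunnel.pem certificate bundle. Creating it isn't covered here, but a minimal self-signed bundle can be generated along these lines (an assumption on my part: openssl is installed; the CN and validity period are placeholders to adjust):

```shell
# Generate a self-signed key + certificate, then concatenate them into the
# stunnel.pem bundle that the stunnel sections reference (CN is a placeholder).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout key.pem -out cert.pem -subj "/CN=es-node"
cat key.pem cert.pem > stunnel.pem
chmod 600 stunnel.pem
```

The same stunnel.pem can be copied to both nodes, or each node can use its own.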


NODE 1


Required config changes to elasticsearch.yml

# First bind elasticsearch to localhost (this makes es invisible to the
# outside world)
network.bind_host: 127.0.0.1
transport.tcp.port: 9300

# Since we are going to hide this node from the outside, we have to tell
# the rest of the nodes how it looks from the outside
network.publish_host: 
transport.publish_port: 9700

http.port: 9200

# Disable muslticast
discovery.zen.ping.multicast.enabled: false

# Since we are hiding all the nodes behind stunnel we also need to proxy es
# client requests through SSL.
# For each additional node add 127.0.0.1:970x where x is incremented by 1,
# i.e. 9702, 9703, etc.
# Connect to NODE 2
discovery.zen.ping.unicast.hosts: 127.0.0.1:9701

stunnel.conf on NODE 1

;Proxy ssl for tcp transport.
[es-transport]
accept = :9300
connect = 127.0.0.1:9300
cert = stunnel.pem

;Proxy ssl for http
[es-http]
accept = :9200
connect = 127.0.0.1:9200
cert = stunnel.pem

;ES clustering does some local discovery.
;Since stunnel binds its own ports, we pick an arbitrary port that is not
;used by other "systems/protocols"
; See the publish settings of elasticsearch.yml above.
[es-transport-local]
client = yes
accept = :9700
connect = :9300

; The ssl client tunnel for es to connect ssl to node 2.
[es-transport-node2]
client = yes
accept = 127.0.0.1:9701
connect = :9301

;For each additional node increment x by 1, i.e. 9702, 9703, etc.
[es-transport-nodex]
client = yes
accept = 127.0.0.1:970x
connect = :930x


NODE 2


Required config changes to elasticsearch.yml

# First bind elasticsearch to localhost (this makes es invisible to the
# outside world)
network.bind_host: 127.0.0.1
transport.tcp.port: 9301

# Since we are going to hide this node from the outside, we have to tell
# the rest of the nodes how it looks from the outside
network.publish_host: 
transport.publish_port: 9701

http.port: 9200

# Disable multicast
discovery.zen.ping.multicast.enabled: false

# Since we are hiding all the nodes behind stunnel, we also need to proxy es
# client requests through SSL.
# For each additional node add 127.0.0.1:970x, where x is incremented by 1,
# i.e. 9702, 9703, etc.
# Connect to NODE 1
discovery.zen.ping.unicast.hosts: 127.0.0.1:9700

stunnel.conf on NODE 2

;Proxy ssl for tcp transport.
[es-transport]
accept = :9301
connect = 127.0.0.1:9301
cert = stunnel.pem

;Proxy ssl for http
[es-http]
accept = :9200
connect = 127.0.0.1:9200
cert = stunnel.pem

;ES clustering does some local discovery.
;Since stunnel binds its own ports, we pick an arbitrary port that is not
;used by other "systems/protocols".
;See the publish settings of elasticsearch.yml above.
[es-transport-local]
client = yes
accept = :9701
connect = :9301


; The ssl client tunnel for es to connect ssl to node 1.
[es-transport-node1]
client = yes
accept = 127.0.0.1:9700
connect = :9300

;For each additional node increment x by 1, i.e. 9702, 9703, etc.
[es-transport-nodex]
client = yes
accept = 127.0.0.1:970x
connect = :930x
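As a concrete instance of the increment-by-one pattern above, a hypothetical third node (transport port 9302, publish port 9702) would get a stanza like the following on each of the other nodes; node 3's address is a placeholder, and 127.0.0.1:9702 would also be appended to discovery.zen.ping.unicast.hosts:

```ini
; Example only: client tunnel to a hypothetical NODE 3
[es-transport-node3]
client = yes
accept = 127.0.0.1:9702
connect = <node3-ip>:9302
```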






Search terms matching order of precedence?

2014-08-22 Thread Eric Greene
Hi everyone, I would like to take a search query with multiple terms and 
possibly define an order of precedence in the following way.

(bear with me as I become familiar with elasticsearch lingo!)

Can I specify that the exact match comes first (the search words "word A word 
B" match "word A + word B"), 
then it matches "word B + word A",
then just "word B" is found,
then either "word A" or "word B"?

I'd also like to understand how to reorder the above variations.

Thanks much. 
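One possible shape for this (a sketch, not the only approach — the field name "title", the slop, and the boost values are illustrative): a bool query whose should clauses score the exact phrase highest, a reordered phrase next (slop 2 allows the transposition), and bare terms lowest, so documents matching the higher-priority clauses accumulate more score:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "title": { "query": "word A word B", "boost": 4 } } },
        { "match_phrase": { "title": { "query": "word A word B", "slop": 2, "boost": 3 } } },
        { "match": { "title": { "query": "word B", "boost": 2 } } },
        { "match": { "title": { "query": "word A word B", "operator": "or" } } }
      ]
    }
  }
}
```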



Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-22 Thread Ivan Brusic
How expensive are your queries? Are you using aggregations or sorting on
string fields that could use up your field data cache? Are you using the
defaults for the cache? Post the current usage.

If you post an example query and mapping, perhaps the community can help
optimize it.
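For checking the current field data usage, the nodes-stats fielddata endpoint can help (endpoint as I recall it from the 1.x docs; verify against your version):

```
GET /_nodes/stats/indices/fielddata?fields=*
```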

Cheers,

Ivan


On Fri, Aug 22, 2014 at 12:28 AM, Narendra Yadala  wrote:

> I have a cluster of size 240 GB including replica and it has 5 nodes in
> it. I allocated 5 GB RAM (total 5*5 GB) to each node and started the
> cluster. When I start continuously firing queries on the cluster the GC
> starts kicking in and eventually node goes down because of OutOfMemory
> exception. I add up to 200k documents every day. The indexing part works fine
> but querying part is causing trouble. I have the cluster on ec2 and I use
> ec2 discovery mode.
>
> What is the ideal RAM size, and are there any other parameters I need to tune
> to get this cluster going?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5b659d11-d757-4f8e-b347-60b3807c2dfe%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: estab: export fields to TSV

2014-08-22 Thread Martin Czygan
Thanks for the quick fix, it works now. Just a minor issue with nested 
field values (e.g. "institution.address.street" or the like), but I'll 
probably open an issue on GH for that.

Cheers,
Martin

On Friday, 22 August 2014 17:46:30 UTC+2, Jörg Prante wrote:
>
> My fault. I forgot to push out a release for the 1.3 update version.
>
> Here it is:
>
> https://github.com/jprante/elasticsearch-csv/releases/tag/1.3.0.0
>
> ./bin/plugin -install csv -url 
> http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-csv/1.3.0.0/elasticsearch-csv-1.3.0.0-plugin.zip
>
> Jörg
>
>
>
> On Fri, Aug 22, 2014 at 4:32 PM, Martin Czygan  > wrote:
>
>> Hi Jörg,
>>
>> Thanks, didn't know about that plugin! I installed it on Elasticsearch 
>> 1.3.2, it registers fine, but the URL is not available; maybe I am doing[1] 
>> it wrong? 
>>
>> Cheers,
>> Martin
>>
>> [1] https://gist.github.com/miku/da3e1641174c38a7a4ce
>>
>>
>>
>> On Friday, 22 August 2014 14:38:44 UTC+2, Jörg Prante wrote:
>>
>>> You can also use 
>>>
>>> https://github.com/jprante/elasticsearch-csv/
>>>
>>> The plugin is for CSV output and works like the _search endpoint. 
>>>
>>> For TSV, minor adjustments can be applied to the source, or you can use 
>>> LibreOffice for export/import.
>>>
>>> Jörg
>>>
>>>
>>> On Fri, Aug 22, 2014 at 1:16 PM, Martin Czygan >> > wrote:
>>>
  Hi,

 just wanted to announce a tiny project: estab, which helps to export 
 document fields as tab separated values (TSV). It's nothing curl + jq[1] 
 couldn't do, it's just a bit simpler to use. Hope you like it.

 * https://github.com/miku/estab


 Best,
 Martin

 [1] http://stedolan.github.io/jq/ | http://stackoverflow.com/
 questions/18892560/is-there-any-way-in-elasticsearch-to-
 get-results-as-csv-file-in-curl-api/25443448#25443448
  
 -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/a1e9c17d-2be9-4542-93c2-9295792518c3%
 40googlegroups.com 
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/605d83cf-9081-48f5-b2d5-bfb9badff507%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



Re: estab: export fields to TSV

2014-08-22 Thread joergpra...@gmail.com
My fault. I forgot to push out a release for the 1.3 update version.

Here it is:

https://github.com/jprante/elasticsearch-csv/releases/tag/1.3.0.0

./bin/plugin -install csv -url
http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-csv/1.3.0.0/elasticsearch-csv-1.3.0.0-plugin.zip

Jörg



On Fri, Aug 22, 2014 at 4:32 PM, Martin Czygan  wrote:

> Hi Jörg,
>
> Thanks, didn't know about that plugin! I installed it on Elasticsearch
> 1.3.2, it registers fine, but the URL is not available; maybe I am doing[1]
> it wrong?
>
> Cheers,
> Martin
>
> [1] https://gist.github.com/miku/da3e1641174c38a7a4ce
>
>
>
> On Friday, 22 August 2014 14:38:44 UTC+2, Jörg Prante wrote:
>
>> You can also use
>>
>> https://github.com/jprante/elasticsearch-csv/
>>
>> The plugin is for CSV output and works like the _search endpoint.
>>
>> For TSV, minor adjustments can be applied to the source, or you can use
>> LibreOffice for export/import.
>>
>> Jörg
>>
>>
>> On Fri, Aug 22, 2014 at 1:16 PM, Martin Czygan 
>> wrote:
>>
>>> Hi,
>>>
>>> just wanted to announce a tiny project: estab, which helps to export
>>> document fields as tab separated values (TSV). It's nothing curl + jq[1]
>>> couldn't do, it's just a bit simpler to use. Hope you like it.
>>>
>>> * https://github.com/miku/estab
>>>
>>>
>>> Best,
>>> Martin
>>>
>>> [1] http://stedolan.github.io/jq/ | http://stackoverflow.com/
>>> questions/18892560/is-there-any-way-in-elasticsearch-to-
>>> get-results-as-csv-file-in-curl-api/25443448#25443448
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/a1e9c17d-2be9-4542-93c2-9295792518c3%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/605d83cf-9081-48f5-b2d5-bfb9badff507%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: estab: export fields to TSV

2014-08-22 Thread Martin Czygan
Hi Jörg,

Thanks, didn't know about that plugin! I installed it on Elasticsearch 
1.3.2, it registers fine, but the URL is not available; maybe I am doing[1] 
it wrong? 

Cheers,
Martin

[1] https://gist.github.com/miku/da3e1641174c38a7a4ce


On Friday, 22 August 2014 14:38:44 UTC+2, Jörg Prante wrote:
>
> You can also use 
>
> https://github.com/jprante/elasticsearch-csv/
>
> The plugin is for CSV output and works like the _search endpoint. 
>
> For TSV, minor adjustments can be applied to the source, or you can use 
> LibreOffice for export/import.
>
> Jörg
>
>
> On Fri, Aug 22, 2014 at 1:16 PM, Martin Czygan  > wrote:
>
>> Hi,
>>
>> just wanted to announce a tiny project: estab, which helps to export 
>> document fields as tab separated values (TSV). It's nothing curl + jq[1] 
>> couldn't do, it's just a bit simpler to use. Hope you like it.
>>
>> * https://github.com/miku/estab
>>
>>
>> Best,
>> Martin
>>
>> [1] http://stedolan.github.io/jq/ | 
>> http://stackoverflow.com/questions/18892560/is-there-any-way-in-elasticsearch-to-get-results-as-csv-file-in-curl-api/25443448#25443448
>>  
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a1e9c17d-2be9-4542-93c2-9295792518c3%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



Can indexing slowlog level be modified in logging.yml?

2014-08-22 Thread Tim Hopper
I am trying to tune how indexing slowlogs are collected. I know that I can set 
a threshold for how long an indexing op on a shard must take before it is 
logged. I would also like to be able to choose what level I'm logging at. It 
would appear that modifying this line in logging.yml would allow me to do 
this. However, unless I am misunderstanding the purpose of that setting, it is 
being ignored by Elasticsearch.

Here is how I came to that conclusion. I have Elasticsearch 1.3.2 running 
locally on my Mac. I have these config files in ~/es-config. The 
elasticsearch.yml file is bare-bones, as you can see. The logging.yml file 
only modifies this line of the default file. My assumption is that this 
should mean *only* WARN level indexing ops are logged.

However, if I start an ES instance using 

elasticsearch --config="/Users/tdhopper/es-config/elasticsearch.yml"

and then run this Python script

import elasticsearch, time
es = elasticsearch.Elasticsearch(hosts="localhost")

while True:
print ".",
es.index(index="index1", doc_type="test_doc", body = {"hot_body": 1})
time.sleep(.5)

my ~/es-logs/elasticsearch_index_indexing_slowlog.log file is immediately 
filled up with lines like 

[2014-08-22 10:24:52,162][INFO ][index.indexing.slowlog.index] [War 
Machine] [index1][2] took[1.4ms], took_millis[1], type[test_doc], 
id[WH83A0yvRHaQtQ34_6wncg], routing[], source[{"hot_body":1}]
[2014-08-22 10:24:52,666][INFO ][index.indexing.slowlog.index] [War 
Machine] [index1][1] took[1.5ms], took_millis[1], type[test_doc], 
id[sErCAr3BR_qWVfp0pnayGw], routing[], source[{"hot_body":1}]

That is, INFO level log statements.

Am I misunderstanding the purpose of this setting or is ES ignoring my 
request?
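For reference, the slowlog's output level is driven by its per-level threshold settings rather than by the logger level alone: an operation is logged at the level whose threshold it crossed. A sketch of those settings as I recall them from the 1.x docs (names and the -1 "disable" convention should be verified for your version):

```yaml
# elasticsearch.yml (or per-index settings) -- illustrative values
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: -1    # disable INFO-level entries
index.indexing.slowlog.threshold.index.debug: -1
index.indexing.slowlog.threshold.index.trace: -1
```

With the lower-level thresholds disabled like this, only operations slower than 10s should produce entries, and they should appear at WARN.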



elasticsearch is taking a long time while doing update and search simultaneously

2014-08-22 Thread Subhadip Bagui
Hi,

I'm updating an elasticsearch document, and at the same time a REST API is 
calling for search results. Below is my code.

public String updateElasticsearchDocument(String index, String type, List<String> indexID) {
    Client client = ESClientFactory.getInstance();
    UpdateResponse updateResponse = null;
    JSONObject jsonResponse = new JSONObject();
    JSONObject json = new JSONObject();
    int i = 1;
    try {
        for (String id : indexID) {
            updateResponse = client.prepareUpdate(index, type, id)
                    .setDoc(jsonBuilder()
                            .startObject().field("view_mode", "read")
                            .endObject())
                    .setDocAsUpsert(true)
                    .setFields("_source")
                    .setTimeout("1")
                    .execute().actionGet();
            logger.info("updating the document for type= " + updateResponse.getType()
                    + " for id= " + updateResponse.getId());
            json.put("indexID" + i, updateResponse.getId());
            i++;
        }
        jsonResponse.put("updated_index", json);
    } catch (ActionRequestValidationException e) {
        logger.warn(this.getClass().getName() + ":" + "updateDocument: " + e.getMessage(), e);
    } catch (ElasticsearchException e) {
        logger.warn(this.getClass().getName() + ":" + "updateDocument: " + e.getMessage(), e);
        e.printStackTrace();
    } catch (IOException e) {
        logger.warn(this.getClass().getName() + ":" + "updateDocument: " + e.getMessage(), e);
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return jsonResponse.toString();
}

*the search query is :*

POST /monitoring/quota-management/_search

{
  "query": {
    "match": {
      "view_mode": "read"
    }
  },
  "sort": [
    {
      "_timestamp": {
        "order": "desc"
      }
    }
  ],
  "size": 10
}

Now I have to wait around 40-50 seconds to get the updated search result, 
which is affecting the production application.
Please let me know what needs to be done here to minimize the time taken.
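If the delay comes from the index refresh interval (updates only become visible to search after a refresh), a manual refresh on the index used in the query above is a quick way to test that hypothesis — a diagnostic step, not a production fix, since frequent refreshes are expensive:

```
POST /monitoring/_refresh
```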

Thanks,
Subhadip



Re: estab: export fields to TSV

2014-08-22 Thread joergpra...@gmail.com
You can also use

https://github.com/jprante/elasticsearch-csv/

The plugin is for CSV output and works like the _search endpoint.

For TSV, minor adjustments can be applied to the source, or you can use
LibreOffice for export/import.

Jörg


On Fri, Aug 22, 2014 at 1:16 PM, Martin Czygan  wrote:

> Hi,
>
> just wanted to announce a tiny project: estab, which helps to export
> document fields as tab separated values (TSV). It's nothing curl + jq[1]
> couldn't do, it's just a bit simpler to use. Hope you like it.
>
> * https://github.com/miku/estab
>
>
> Best,
> Martin
>
> [1] http://stedolan.github.io/jq/ |
> http://stackoverflow.com/questions/18892560/is-there-any-way-in-elasticsearch-to-get-results-as-csv-file-in-curl-api/25443448#25443448
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a1e9c17d-2be9-4542-93c2-9295792518c3%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: Dynamically adding new fields from a root mapper

2014-08-22 Thread Jakub Kotowski
Hi,

I'm wondering if anyone could provide some hints... :)

Pointers where to learn more about Mappers, FieldMappers, RootMappers, 
adding new fields, how mapping is updated and its relation to toXContent() 
of mappers would be appreciated too. Also paths vs. full paths vs. 
multi_field / fields, etc.

Cheers,

Jakub


On Thursday, August 21, 2014 12:47:10 PM UTC+2, Jakub Kotowski wrote:
>
> Hi,
>
> I am creating a plugin that analyzes the document being indexed and, based on 
> the analysis, adds new fields to it. It is done from a root mapper and it 
> works similarly to the attachment mapper.
>
> I need to use the root mapper because I need the whole document for 
> analysis, not individual fields. I also don't have a fixed predefined set 
> of fields that can be generated.
>
> I'd like to ask what is the correct way of updating the mapping in this 
> scenario.
>
> If I don't update the mapping then I'm getting a NPE:
>
> Caused by: java.lang.NullPointerException
> 
> org.elasticsearch.index.fieldvisitor.FieldsVisitor.postProcess(FieldsVisitor.java:70)
> 
> org.elasticsearch.index.get.ShardGetService.innerGetLoadFromStoredFields(ShardGetService.java:333)
> 
> org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:212)
> 
> org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:106)
> 
> org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:109)
> 
> org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:43)
>
> That's because there is no FieldMapper for the newly added field.
>
> I can add the field mapper this 
> way: context.docMapper().addFieldMappers(ImmutableSet.of(field.mapper)) and 
> then it seems to work. No NPE, no warnings in the log.
>
> However, to create the field mapper, I need a Mapper.BuilderContext, so 
> that I can do e.g. field.mapper = 
> MapperBuilders.stringField(field.name()).build(builderContext);
>
> This seems to be a problem because the BuilderContext is available only 
> from my root mapper's Builder. I pass the BuilderContext to my root mapper 
> which then uses it to create the field mappers. But this seems hacky/wrong.
>
> What would be the correct approach? Will the mapping be otherwise updated 
> somehow by Elasticsearch in this case? Or how exactly should I go about 
> implementing my root mapper's toXContent() method?
>
> You can see the mapper here: 
> https://github.com/sindicetech/smartindexing/blob/issue-1/cogito-elasticsearch/src/main/java/net/expertsystem/connector/elasticsearch/index/CogitoMapper.java
>  
> (work in progress/playground).
>
> Thanks,
>
> Jakub
>
>
>



Error running ES DSL in hadoop mapreduce

2014-08-22 Thread Sona Samad
Hi,

I was trying to run the below query from hadoop mapreduce:

{
  "aggs": {
    "group_by_body_part": {
      "terms": {
        "field": "body_part",
        "size": 5,
        "order": { "examcount": "desc" }
      },
      "aggs": {
        "examcount": {
          "cardinality": {
            "field": "ExamRowKey"
          }
        }
      }
    }
  }
}

The query is returning more than 5 records, even though the size is given as 
5. Also, the result is not aggregated; rather, it returns entire records from 
the index as values to the mapper.

Also the following error is logged:

[2014-08-22 16:06:21,459][DEBUG][action.search.type   ] [Algrim the 
Strong] All shards failed for phase: [init_scan]
[2014-08-22 16:26:38,875][DEBUG][action.search.type   ] [Algrim the 
Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to 
execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]: 
query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed 
[Failed to execute main query]
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
at 
org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException


Could you please help me create the correct query?

Thanks,
Sona



estab: export fields to TSV

2014-08-22 Thread Martin Czygan
Hi,

just wanted to announce a tiny project: estab, which helps to export 
document fields as tab separated values (TSV). It's nothing curl + jq[1] 
couldn't do, it's just a bit simpler to use. Hope you like it.

* https://github.com/miku/estab


Best,
Martin

[1] http://stedolan.github.io/jq/ | 
http://stackoverflow.com/questions/18892560/is-there-any-way-in-elasticsearch-to-get-results-as-csv-file-in-curl-api/25443448#25443448
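As the post notes, this is essentially what curl + jq can do; here is a minimal Python sketch of the same field-to-TSV flattening, run against a made-up search response (the field names and sample data are purely illustrative):

```python
# Flatten selected _source fields of each Elasticsearch hit into TSV lines.
# The sample response below is hypothetical, for illustration only.
import csv
import io
import json

def hits_to_tsv(response_json, fields):
    """Return one tab-separated line per hit, with the given fields in order."""
    resp = json.loads(response_json)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for hit in resp["hits"]["hits"]:
        src = hit["_source"]
        writer.writerow([src.get(f, "") for f in fields])
    return out.getvalue()

sample = json.dumps({"hits": {"hits": [
    {"_source": {"id": "1", "title": "foo"}},
    {"_source": {"id": "2", "title": "bar"}},
]}})

print(hits_to_tsv(sample, ["id", "title"]), end="")  # prints "1\tfoo" and "2\tbar"
```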



name conflicts between types and fields

2014-08-22 Thread Karol Gwaj
hi,

is there any way to indicate that the field name path used in a query 
represents an absolute path? It looks like elasticsearch recognizes that the 
field path starts with a type name and removes it before executing the query.

Consider the example below:

1. insert two test documents:

POST test/sublocality/1
{
  "name" : "xxx",
  "country" : {
    "name" : "yyy"
  }
}


POST test/country/1
{
  "name" : "zzz",
  "country" : {
    "name" : "yyy"
  }
}

2. execute facet query:

POST test/_search
{
  "facets": {
    "term": {
      "terms": {
        "field": "country.name",
        "size": 10
      }
    }
  }
}

I would expect the facet response to contain only one term, 'yyy', with 
count 2; instead I'm getting 'xxx' with count 1 and 'zzz' with count 1.

So it looks like elasticsearch recognized that the field name is prefixed 
with the type (country) and trimmed it to:

POST test/_search
{
  "facets": {
    "term": {
      "terms": {
        "field": "name",
        "size": 10
      }
    }
  }
}

Any idea how to get around it without renaming types and fields?
I'm using elasticsearch 2.x.


Thx,





Boost the first word in a multi-word query

2014-08-22 Thread Jeremy
In case of a multi-word query, is there a way to boost the first terms of 
the query?

For example, in the following query:
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": "BROWN DOG!"
}
}
}

"Brown" should be prioritized over "dog"; therefore searching for "brown 
dog" should not return the same scores as searching for "dog brown".
I'm ideally looking for a solution which works with N words and assigns 
weights according to word position.

Regards,
Jeremy



Re: Exclude specific bucket with integer key from term aggregation

2014-08-22 Thread Michele Palmia
I added a comment to an issue opened a while ago about the exclude feature
of term aggregations, on GitHub: I think this is something that should be
fixed.

https://github.com/elasticsearch/elasticsearch/issues/6782


On Fri, Aug 15, 2014 at 8:31 PM, Luke Nezda  wrote:

> I have this problem too - this was easily solved using the Terms Facet's
> exclude feature, but I haven't found a solution *within* Elasticsearch
> (aggregations) to this either. Here's a gist demonstrating this:
> https://gist.github.com/nezda/60932c73a8485e9d9a49 .
>
>
> On Thursday, August 7, 2014 10:54:43 AM UTC-5, Michele Palmia wrote:
>>
>> Hi all!
>>
>> My documents contain an *integer array field* storing the id of tags
>> describing them. Given a specific tag id, *I want to extract a list of
>> top tags that occur most frequently together with the provided one*.
>>
>> I can solve this problem associating a *term aggregation* over the tag
>> id field to a *term filter* over the same field, but the list I get back
>> obviously always starts with the album id I provide: all documents matching
>> my filter have that tag, and it is thus the first in the list. I thought of 
>> using
>> the *exclude* field
>> 
>> to avoid creating the problematic bucket, but as I'm dealing with an
>> integer field, that seems not to be possible: this query
>>
>> {
>>>   "size": 0,
>>>   "query": {
>>> "term": {
>>>   "tag_ids": "1"
>>> }
>>>   },
>>>   "aggs": {
>>> "tags": {
>>>   "terms": {
>>> "size": 3,
>>> "field": "tag_ids",
>>> "exclude": "1"
>>>   }
>>> }
>>>   }
>>> }
>>
>>
>> returns an error saying that
>>
>> Aggregation [tags] cannot support the include/exclude settings as it can
>>> only be applied to string values.
>>
>>
>> Is it possible to avoid getting back this bucket in some way?
>> Unfortunately, I can only use ES 1.2 (AWS plugin not yet ready for 1.3).
>> I'm mostly afraid dealing with this problem after query execution,
>> because the bucket corresponding to the query is not guaranteed to be the
>> first one of the list, for example in case there are only a few matching
>> documents, all having exactly the same two tags.
>>
>> Thank you in advance!
>> Michele
>>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/8g74ov0run0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/af56ce57-48a0-4c75-b3c5-d2f9363fd881%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: How to do sequence matching

2014-08-22 Thread vineeth mohan
Hello Smitha,

Could you please elaborate?
What is the sequence AB, what is an event here, and what are the first and
last events?

Thanks
   Vineeth


On Fri, Aug 22, 2014 at 8:16 AM, Smitha Gowda  wrote:

> Thanks that will work.
>
> One more question related to Kibana to visualize this data.
>
> For a query that matches sequence "AB"
> Once I have all the matching documents I want to plot a bar chart with
> x-axis: Session StartTime (Day granularity)
> y-axis: Mean of (LastEvent.EndTime(In this example B) -
> FirstEvent.StartTime(In this Example A)) for the given day
>
> Any pointers on how do I aggregate on other properties on the matched
> document?
>
> Thanks in advance!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/320375d3-a0f6-402e-92a0-08279b1f7c7c%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
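
For the charting question quoted above, the series Kibana would plot (mean session duration per day) can be sketched client-side like this. This is only a sketch; the field names `start`/`end` and ISO-8601 timestamps are assumptions about the matched documents:

```python
from collections import defaultdict
from datetime import datetime

def mean_duration_per_day(sessions):
    # sessions: list of dicts with ISO timestamps, "start" from the first
    # event (A) and "end" from the last event (B) of each matched sequence.
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum_seconds, count]
    for s in sessions:
        start = datetime.fromisoformat(s["start"])
        end = datetime.fromisoformat(s["end"])
        day = start.date().isoformat()       # day granularity for the x-axis
        totals[day][0] += (end - start).total_seconds()
        totals[day][1] += 1
    # y-axis value: mean duration in seconds for each day
    return {day: total / n for day, (total, n) in totals.items()}
```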

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3DMedq5ta5G%2BvX4Hzi19PyY6s9Kc2JFL-2mrkca_mvbww%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


How to recover index if replicas are set to zero?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have an index with 10 shards but no replicas on a 2-node cluster, and 
one node goes down, can I somehow recover its 5 shards onto the only running 
node, assuming that I back up the data directory on the dead node and make 
it available to the other node? Is there an API or an automatic way to 
achieve this?

Thanks,
Sandeep
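
One possible path (a sketch, not tested advice): copy the shard directories from the dead node's data path onto the surviving node, then ask the master to allocate the unassigned shards there via the cluster reroute API. In ES 1.x the `allocate` command accepts `allow_primary`, which forces allocation of an unassigned primary and accepts potential data loss if the copied files are stale:

```python
import json

def reroute_body(index, shard_ids, node_name):
    # Request body for POST /_cluster/reroute (ES 1.x). allow_primary=True
    # forces allocation of an unassigned primary shard; only use it after
    # the shard data has been copied into the target node's data directory.
    return {
        "commands": [
            {"allocate": {"index": index, "shard": shard_id,
                          "node": node_name, "allow_primary": True}}
            for shard_id in shard_ids
        ]
    }

body = json.dumps(reroute_body("myindex", [0, 1, 2, 3, 4], "surviving-node"))
```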

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3b1caca8-66ed-4196-838b-b58fafa1601d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shard Aware Routing of Query

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

I have been trying to examine the QueryParserContext. However, I am only 
able to locate the Index name in this object, but there is no reference of 
any shard level information.

I understand you to be saying that the shard decision has already been made 
(so there is presumably no need to restate that information here), so it is 
not available to the QueryParser, and that is probably by design?

Thanks,
Sandeep
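
As background for why the shard choice precedes parsing: in ES 1.x the target shard is computed from the routing value before the query ever reaches a shard. A rough sketch of that computation (assumption: the default DjbHashFunction, i.e. a DJB2/X33A hash taken modulo the primary shard count; Java's signed-integer arithmetic edge cases are not reproduced here):

```python
def djb_hash(value):
    # DJB2 (X33A) string hash, kept to 32 bits.
    h = 5381
    for ch in value:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

def target_shard(routing, num_primary_shards):
    # shard id = hash(routing value) mod number_of_primary_shards;
    # by default the routing value is the document id.
    return djb_hash(routing) % num_primary_shards
```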


On Tuesday, 15 July 2014 17:01:56 UTC+5:30, Jörg Prante wrote:
>
> Filters are always parsed as part of a query on shard level. If you 
> examine QueryParserContext from within executing FilterParser, the decision 
> of which shard to execute on has already been made.
>
> Jörg
>
>
> On Tue, Jul 15, 2014 at 1:09 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch > wrote:
>
>> Hi,
>>
>> Thanks, I will take a look at the SearchRequestBuilder class.
>> However, it does seem like a Query API invoke time decision for the user 
>> to decide the routing by setting the appropriate values in the SRB.
>>
>> However, I want the custom FilterParser that I added as a processor in 
>> the IndexQueryParserModule plugin to be aware of the shard on which it will 
>> execute. This is because then I can set filter values for only the 
>> documents that exist on that shard. I checked the QueryParserContext, and 
>> there is no information in that regard.
>>
>> If I use the SRB at client side, and specify the shards and the filters 
>> for those shards, then I will have to aggregate the results myself which is 
>> not preferable.
>>
>> Can you please give me some example of how this can be achieved? 
>>
>>
>> Thanks,
>> Sandeep
>>
>>
>> On Tuesday, 15 July 2014 15:18:47 UTC+5:30, Jörg Prante wrote:
>>
>>> You can create single shard index, or you can use routing to select 
>>> shards.
>>>
>>> See SearchRequestBuilder for setRouting() 
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Jul 15, 2014 at 10:25 AM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch  wrote:
>>>
 Hi,

 I have a large-ish data set that could grow beyond 100M documents, and I 
 have queries to execute against this index. I would like the query filter 
 data that is local to a shard to be sent only to that shard, so that I 
 spend less time creating a filter and even less time matching it on a 
 shard. If I do not do this, I will have to create a filter containing 
 data for all 100M documents across all shards, and every shard will have 
 to match documents against that filter even for documents that do not 
 belong to that shard.

 I plan to write a query filter using the IndexQueryParserModule plugin.

 However, in the QueryParserContent, I can only see the Index object 
 which contains some details of the index, like the name, etc. I could not 
 see any other details like the specific shard where this query will be 
 executed. 

 Is there a way to write shard aware query and filter parsers?

 If not, can I create as many indices as I want to create shards (since 
 I already get the index name), and effectively create one shard per index 
 (+1 for replica) and treat every index as if it were a shard? Is that too 
 heavy or just non-compliant to the philosophy of ES? 

 Please let me know,

 Thanks,
 Sandeep

  -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/e8c09c18-4192-41ae-86e9-5d67723e5558%
 40googlegroups.com 
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/0c736a73-1a7c-4a3d-aa6b-9c9860d78f79%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d8c6eb14-962f-472a-86b6-97

Re: Elastic search dynamic number of replicas from Java API

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi Jorg,

Can you please give a server-side or client-side example of using 
ClusterStateListener?
Do I have to use a plugin? If so, which module do I register/override?
If not, do I have to use a Node Client (not a TransportClient), and 
retrieve the ClusterService somehow and then register?

Thanks
Sandeep

On Thursday, 10 July 2014 22:25:51 UTC+5:30, Jörg Prante wrote:
>
> On the client side, you can't use cluster state listener, it is for nodes 
> that have access to a local copy of the master cluster state. Clients must 
> execute an action to ask for cluster state, and with the current transport 
> request/response cycle, they must poll for new events ...
>
> Jörg
>
>
> On Thu, Jul 10, 2014 at 6:38 PM, Ivan Brusic  > wrote:
>
>> Jörg, have you actually implemented your own ClusterStateListener? I 
>> never had much success. Tried using that interface or 
>> even PublishClusterStateAction.NewClusterStateListener, but either I could 
>> not configure successfully the module (the former) or received no events 
>> (the latter). Implemented on the client side, not as a plugin.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Wed, Jul 9, 2014 at 4:21 PM, joerg...@gmail.com  <
>> joerg...@gmail.com > wrote:
>>
>>>
>>> 4. Yes. Use org.elasticsearch.cluster.ClusterStateListener
>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBB%3DW_qG9E7i-sEc6HZeMskxKgbqzaKgqzSQ26sjgT5%2BQ%40mail.gmail.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
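
Since a TransportClient must poll as Jörg describes, the client-side loop can be as simple as watching the cluster state version. A sketch only: `fetch_state` is a stand-in for whatever wraps GET /_cluster/state (or the Java admin cluster-state action) and returns the parsed body:

```python
import time

def watch_cluster_state(fetch_state, on_change, interval=5.0, max_polls=None):
    # fetch_state() returns a dict with at least a "version" field, as the
    # cluster state response does; on_change(state) fires when the master
    # has published a new cluster state since the previous poll.
    last_version = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = fetch_state()
        if last_version is not None and state["version"] != last_version:
            on_change(state)
        last_version = state["version"]
        polls += 1
        time.sleep(interval)
```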

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/35f6b64e-3787-4891-a3a8-518dfd7638e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)

2014-08-22 Thread Narendra Yadala
I have a cluster of 240 GB of data including replicas, spread over 5 nodes. 
I allocated 5 GB of heap (5*5 GB in total) to each node and started the 
cluster. When I continuously fire queries at the cluster, GC starts kicking 
in and eventually a node goes down with an OutOfMemory exception. I add up 
to 200k documents every day. The indexing part works fine, but the querying 
part is causing trouble. The cluster is on EC2 and I use the ec2 discovery 
mode.

What is the ideal RAM size, and are there any other parameters do I need to 
tune to keep this cluster going?
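
Until the sizing is fixed, capping field data usually stops aggregation/sorting load from taking down a 1.x node: the circuit breaker fails the offending query instead of letting it exhaust the heap. A sketch of the dynamic settings body (the percentages are starting points, not recommendations, and the breaker setting was named `indices.fielddata.breaker.limit` before ES 1.4):

```python
import json

# Body for PUT /_cluster/settings on ES 1.x. The fielddata breaker trips a
# query that would load too much field data; the cache size bounds what is
# kept resident between queries.
settings_body = {
    "persistent": {
        "indices.breaker.fielddata.limit": "60%",
        "indices.breaker.request.limit": "40%",
    }
}
print(json.dumps(settings_body))
```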

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5b659d11-d757-4f8e-b347-60b3807c2dfe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How do I know if I need replica shards?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Just want to add that my intention is not high availability or failover; it 
is more about how performance would improve in such a scenario.

I believe you would only consider this if you had more nodes, or your 
current nodes were underutilized. But how do you determine that?

Thanks
Sandeep

On Friday, 22 August 2014 12:46:16 UTC+5:30, Sandeep Ramesh Khanzode wrote:
>
> Hi,
>
> If I have setup a 3 node cluster, deployed one index with 20 shards.
>
> How can I determine that my current setup is inadequate, and I need to add 
> one replica or two replica shards per primary shards?
>
> Even if I add another data node, I may get 5 primary shards on each, now 
> if that distributes load evenly, what is the need for replica shards?
>
> IMPORTANT: Please note that my query will not be specific to a group of 
> shards, I mean, there is no way, I can route or classify my query as only 
> hitting a subsection of shards. It will actually go to all shards for every 
> search query.
>
> Appreciate your response.
>
> Thanks,
> Sandeep
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8294050a-0da9-4ee0-b9ac-1ea0e7985e47%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How do I know if I need replica shards?

2014-08-22 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

Suppose I have set up a 3-node cluster and deployed one index with 20 shards.

How can I determine that my current setup is inadequate and that I need to 
add one or two replica shards per primary shard?

Even if I add another data node, I may get 5 primary shards on each; if 
that distributes the load evenly, what is the need for replica shards?

IMPORTANT: Please note that my query will not be specific to a group of 
shards; I mean, there is no way I can route or classify my query as hitting 
only a subsection of shards. It will actually go to all shards for every 
search query.

Appreciate your response.

Thanks,
Sandeep

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4a9b5272-bd56-483a-9231-551513a1c8e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Call when shard reallocation occurs

2014-08-22 Thread Sandeep Ramesh Khanzode
Thanks, Ivan. Appreciate it. I will take a look at 'ClusterStateListener'.
Meanwhile, if you have any ready reference for this, such as docs or
example code, please share. Thanks!


On Fri, Aug 22, 2014 at 6:19 AM, Ivan Brusic  wrote:

> AFAIK, there is no way to achieve such functionality.
>
> The only way I have figured out have similar functionality is to write a
> plugin with a cluster state listener and have the plugin reach out to some
> external service.
>
> Cheers,
>
> Ivan
>
>
> On Thu, Aug 21, 2014 at 10:02 AM, 'Sandeep Ramesh Khanzode' via
> elasticsearch  wrote:
>
>> Hi,
>>
>> Is it possible to have my custom callback function defined and invoked by
>> ES whenever it moves a shard from one node to another?
>>
>> Thanks,
>> Sandeep
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9b792e04-8bd1-4006-93a2-f3d736cd8474%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/4fEahX9nhL0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQDMCNNsvAv2r0bRsog6a2oow9bYSyQ-CZtcxtcgkUJtRQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
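
With the polling approach Ivan describes, detecting a shard move then reduces to diffing the routing table between two polled states. A sketch; the `{(index, shard): node}` map is assumed to have been extracted from the routing table in /_cluster/state:

```python
def shard_moves(old_routing, new_routing):
    # Each routing map: {(index_name, shard_id): node_id} for started shard
    # copies. Returns (key, old_node, new_node) for every copy that moved.
    moves = []
    for key, node in new_routing.items():
        old_node = old_routing.get(key)
        if old_node is not None and old_node != node:
            moves.append((key, old_node, node))
    return moves
```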

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKnM90a-CrxeUm4F4q889naxhcwNLrhYoZnsptQYZhi7gHL9pQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.