Re: access parent bucket's key from child aggregation in geohash grid

2014-05-02 Thread Adrien Grand
Hi,

Unfortunately, accessing the parent bucket key is not possible.
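A common client-side workaround (not an Elasticsearch feature): since every document in a geohash_grid bucket by definition falls inside the bucket's cell, you can decode the bucket key itself into the cell's center point instead of averaging coordinates server-side, if that approximation is acceptable. A minimal sketch in Python; the decoder is a plain re-implementation of geohash decoding and the response fragment is made up, neither comes from an ES client library:

```python
# Client-side sketch: decode each geohash_grid bucket key to the cell's
# center point instead of averaging coordinates server-side.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_decode(geohash):
    """Return the (lat, lon) center of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # geohash interleaves bits, starting with longitude
    for ch in geohash:
        val = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per base-32 character
            bit = (val >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2.0, (lon_lo + lon_hi) / 2.0

# Hypothetical bucket from a geohash_grid aggregation response:
bucket = {"key": "u4pruyd", "doc_count": 42}
lat, lon = geohash_decode(bucket["key"])
```

At precision 7 the cell is small enough that its center is usually close to the true mean of the points inside it.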


On Fri, May 2, 2014 at 12:04 AM, Thomas Gruner tom.gru...@gmail.com wrote:

 Hello!

 I have been progressing well with aggregations, but this one has got me
 stumped.

 I'm trying to figure out how to access the key of the parent bucket from a
 child aggregation.

 The parent bucket is geohash_grid, and the child aggregation is avg
 (trying to get avg lat and lon, but only for points that match the parent
 bucket's geohash key).

 Something like this:
 "aggregations": {
   "LocationsGrid": {
     "geohash_grid": {
       "field": "Locations",
       "precision": 7
     },
     "aggregations": {
       "avg_lat": {
         "avg": {
           "script": "if (doc['Locations'].value.geohash.startsWith(*parent_bucket.key*)) doc['Locations'].value.lat;"
         }
       }
     }
   }
 }


 Thanks for any help or ideas with this!

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/624d0bdd-c380-4c72-b642-e6afff3458a9%40googlegroups.com.
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
Adrien Grand



stats, extended stats, percentiles for doc_count in aggregations

2014-05-02 Thread Loren
Is it possible to get stats, extended stats, or percentiles across the 
doc_counts in each bucket of an aggregation? I see how to use it on an 
existing numeric field value (e.g., height, grade), but I want to see the 
average bucket size, stddev, or other stats on how one doc_count compares 
to doc_counts in the other buckets.



Re: Read/Write consistency

2014-05-02 Thread Radu Gheorghe
Hi Mohit,

I think the transaction log takes care of that, because there's a copy on
all instances of the same shard, and they need to be in sync.

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, May 1, 2014 at 9:57 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 What's not clear is how does elasticsearch identify what pieces of data is
 missing between the primary and the replica?

 On Wed, Apr 30, 2014 at 3:27 AM, Radu Gheorghe radu.gheor...@sematext.com
  wrote:

 Hi Mohit,

 I'll answer inline.

 On Mon, Apr 28, 2014 at 4:57 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 Trying to understand the following scenarios of consistency in
 elasticsearch:

 1) sync replication - How does elasticsearch deal with consistency
 issues that may arise from one node momentarily going down and missing
 writes to it?


 This depends on the write consistency setting. By default, the operation
 only succeeds if a quorum of replicas can index the document:

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency


  When the node comes back up, could reads going to the non-primary
 shards get inconsistent data?


 No, when the node comes back up it will sync the stuff it missed with the
 other nodes.


 2) async replication - What happens if replication is slow for some
 reason, could users see inconsistent data?


 Yes, if you hit a shard that didn't get the latest operation, it could
 see an old version of the data. You can use preference to try and hit
 the primary shard all the time, but then your replicas will just be sitting
 there for redundancy:

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html


 3) sync/async replication - how does elasticsearch keep data in sync for
 those writes that never happened on the non-primary shard because of
 network/node failures?


 It either uses the transaction log or it transfers the whole shard to
 that node.

 Best regards,
 Radu
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/







Re: stats, extended stats, percentiles for doc_count in aggregations

2014-05-02 Thread Adrien Grand
Hi,

There is currently no way to do that, but I think this could be done on the
client side?
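The client-side computation is indeed small once you have the aggregation response in hand. A sketch in Python; the response fragment, the aggregation name "by_tag", and the numbers are made up for illustration:

```python
# Sketch of computing stats over bucket doc_counts on the client side.
response = {
    "aggregations": {
        "by_tag": {
            "buckets": [
                {"key": "a", "doc_count": 10},
                {"key": "b", "doc_count": 30},
                {"key": "c", "doc_count": 20},
            ]
        }
    }
}

counts = [b["doc_count"] for b in response["aggregations"]["by_tag"]["buckets"]]
mean = sum(counts) / float(len(counts))
variance = sum((c - mean) ** 2 for c in counts) / float(len(counts))
stddev = variance ** 0.5  # population standard deviation of bucket sizes
```

Percentiles can be read off the sorted `counts` list the same way.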


On Fri, May 2, 2014 at 8:56 AM, Loren lo...@siebert.org wrote:

 Is it possible to get stats, extended stats, or percentiles across the
 doc_counts in each bucket of an aggregation? I see how to use it on an
 existing numeric field value (e.g., height, grade), but I want to see the
 average bucket size, stddev, or other stats on how one doc_count compares
 to doc_counts in the other buckets.





-- 
Adrien Grand



Re: Help with ES 1.x percolator query plz

2014-05-02 Thread Martijn v Groningen
Hi,

Can you share the stored percolator queries and the percolate request that
you initially tried but that didn't work?

Martijn


On 2 May 2014 11:14, JGL j.g.liu...@gmail.com wrote:

 Can anybody help plz?





-- 
Kind regards,

Martijn van Groningen



Re: strange problem: my ES server almost lost all its data. (All shards failed)

2014-05-02 Thread joergpra...@gmail.com
On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6
on the JVM.

Seems you received a severe network error from the OS.

Jörg


On Thu, May 1, 2014 at 11:46 PM, Patrick Proniewski 
elasticsea...@patpro.net wrote:

 Hello,

 I'm running a small server with logstash, ES, Kibana. Tonight, I've
 restarted my ES process. Very bad idea: it restarted with lots of errors,
 and finally lost all its data.
 Basically, before restart, I've had:

 elasticsearch/nodes/0/indices/logstash-2014.*
 elasticsearch/nodes/0/_state/

 after restart, I've had:

 elasticsearch/nodes/0/indices/logstash-2014.*
 elasticsearch/nodes/0/_state/
 elasticsearch/nodes/1/indices/logstash-2014.05.01
 elasticsearch/nodes/1/_state/

 Then, Kibana was not able to find anything (dashboards lost, etc.).

 I've stopped Logstash, stopped Elasticsearch, waited a bit and checked
 everything is down, then restarted ES. It looked OK, then I've restarted
 Logstash, and I was able to access my dashboards again. I've just lost 15
 minutes of data.
 Now I can see that elasticsearch/nodes/0 is the current working directory,
 and I can browse old data and current data.
 elasticsearch/nodes/1 is not used anymore.

 I'm running FreeBSD, and used the service command to restart ES. When
 attempting the second shutdown, the script wouldn't find the pid file, so
 I've had to kill the Java process.

 I don't understand what happened. But I don't feel comfortable putting ES
 in production. Full log for first and second restart here: 
 http://patpro.net/elastic.log

 Any idea?
 Regards,
 Patrick





Re: strange problem: my ES server almost lost all its data. (All shards failed)

2014-05-02 Thread Patrick Proniewski
Hi Jörg,

Thank you for your reply. 
The service script includes an option that might deal with IPv6, but it's not 
active: 

# Force the JVM to use IPv4 stack
# elasticsearch_props="-Djava.net.preferIPv4Stack=true"

(http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955)

In past years, I used to disable IPv6 everywhere (kernel, ports compilation, 
etc.) but now I don't bother anymore.
Do you mean I should use this option to force IPv4?

Thanks,
Patrick


On 2 May 2014, at 09:38, joergpra...@gmail.com wrote:

 On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6
 on the JVM.
 
 Seems you received a severe network error from the OS.
 
 Jörg
 
 



Re: How to write a custom river

2014-05-02 Thread Luca Cavanna
Hi Joshua,
the package is not an issue if you are using the default one for your 
classes. Looking deeper, the type of the river that you try to register 
with your rest call doesn't match the type of the river you registered in 
the plugin when you did module.registerRiver(type, riverclass). 

Cheers
Luca

On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote:

 I should have sent you the following earlier rather than a non-river 
 plugin:

 the plugin:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java

 The river implementation:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java

 The module:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java

 Looks like you are registering the river implementation rather than the
 river module, hence it not working. I had to look at an example I know
 works to figure it out.

 -Rob


 On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.bennett.c...@gmail.com
  wrote:

 So, that's what I did, but no love... I checked in the latest.

 -Josh


 On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

 Look at this plugin for help:

 https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties

 Yes it needs to be the FQN.

 On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

 Thanks Rob. Someone else also told me the plugin property should be the 
 fully qualified name. I didn't declare a package, so I guess I'm using the 
 default package, and I thought I had the namespacing right since IntelliJ 
 corrected the class name when I wrote it.

 Thoughts?


 -Josh


 On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

 Look at this file in your BB repo:

 https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master

 cheers

 On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

 This strikes me as odd:

 java.lang.ClassNotFoundException: example_river

 Assume you didn't map the string example_river to the actual class 
 name properly?

 -Rob

 On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

 I'm making my first go at writing a river. (Here's the source code: 
 https://bitbucket.org/futurechan/example-river/src)

 I followed this tutorial 
 http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/

 and compared it to this existing river
 https://github.com/jprante/elasticsearch-river-jdbc

 but I haven't had much luck.

 To deploy the river, I created a folder called example-river under 
 plugins, dropped my jar in that folder, and restarted the node. 
 Everything 
 starts up fine.

 I have also tried bin/plugin --url file:///path/to/plugin --install 
 example-river, which seems to work, but it unpacks my jar. So, I 
 tried zipping it first and then installing, which works and does not 
 unpack 
 my jar, but it didn't help.

 When I issue this PUT request:

 http://localhost:9200/_river/example_river/_meta
 {
   "type": "example_river",
   "example_river": {
     "blah": "blah"
   }
 }

 I get this exception:

 [2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river 
 [example_river][example_river] 
 [2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create river 
 [example_river][example_river] 
 org.elasticsearch.common.settings.NoClassSettingsException: Failed to 
 load class with value [example_river] at 
 org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87) 
 at 
 org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58) 
 at 
 org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
  at 
 org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
  at 
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
  at 
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
  at 
 org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source) at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at 
 java.lang.Thread.run(Unknown Source) Caused by: 
 java.lang.ClassNotFoundException: example_river at 
 java.net.URLClassLoader$1.run(Unknown Source) at 
 java.net.URLClassLoader$1.run(Unknown Source) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 java.net.URLClassLoader.findClass(Unknown Source) at 
 java.lang.ClassLoader.loadClass(Unknown Source) at 
 sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at 
 java.lang.ClassLoader.loadClass(Unknown Source) at 
 

Re: checkIndex with ES 1.1.0

2014-05-02 Thread Michael Salmon
Here I go replying to my own questions again.

ES spreads Lucene files over all data paths and that means that checkIndex 
doesn't work as it expects all files to be in one directory. What I did was 
to create a directory and then make symlinks to all the files across all 
data paths and then run checkIndex. This works as expected.
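That symlink workaround can be scripted. A rough sketch in Python; the helper name and the directory layout are my own assumptions, so adjust them to match your path.data settings:

```python
# Rough sketch: collect one shard's Lucene files from every configured data
# path into a single directory so CheckIndex can see them side by side.
import os

def link_shard_files(data_paths, shard_rel, target_dir):
    """Symlink every file of the shard directory shard_rel (for example
    'elasticsearch/nodes/0/indices/myindex/7/index') found under each
    data path into target_dir."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    linked = []
    for root in data_paths:
        src_dir = os.path.join(root, shard_rel)
        if not os.path.isdir(src_dir):
            continue
        for name in sorted(os.listdir(src_dir)):
            dst = os.path.join(target_dir, name)
            if not os.path.lexists(dst):  # first data path wins on duplicates
                os.symlink(os.path.join(src_dir, name), dst)
                linked.append(name)
    return linked
```

CheckIndex can then be pointed at the merged directory in the usual way, e.g. `java -cp lucene-core-4.7.0.jar org.apache.lucene.index.CheckIndex /tmp/merged` (jar name may differ in your install).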

On Wednesday, 9 April 2014 11:41:24 UTC+2, Michael Salmon wrote:

 I recently had a problem with an index and after searching the net I 
 decided to give checkIndex a try. I found the class in the right jar but I 
 haven't been able to get it to check an index. For example when I run

 checkIndex -verbose ...heat-analyzer/7/index

 I get:

 ERROR: could not read any segments file in directory
 java.io.FileNotFoundException: ...heat-analyzer/7/index/segments_37l (No 
 such file or directory)

 That is correct, the directory contains:

  _checksums-1397035754356  _isy5.nvm  _j821.si  _j82l.cfs  _j838.cfe
  _j83a.fdx  _j83o.nvd  _j83o.si  _j83p.nvd  _j84u.cfs  _j85h_es090_0.blm
  _j86v.cfe  _j870.cfs  _j898_Lucene45_0.dvd  _j8ap.cfe  _j8b6.cfs
  _j8c1.si  _j8cz.si  _j8et.cfs  _j8fa.cfs  _j8fu.cfe  _j8g4.cfs
  _j8hl.si  _j8ln.cfs  _j8t6.cfe  _j8tb_es090_0.doc  _j8tp.cfs
  _j8v9.fnm  _j8yv.cfe  _j919.fdt  _j91a.cfs  _j95w_es090_0.tim
  _j963.si  _j968.cfs  _j975.si  _j9b9.cfs  _j9di_es090_0.blm
  _j9e0.cfe  segments.gen  write.lock

 I am running ES 1.1.0 and using the checkIndex from Lucene 4.7.0.

 Has anyone gotten checkIndex to work with this combination?

 /Michael




Re: Aggregation bug? Or user error?

2014-05-02 Thread mooky
 
I haven't been able to figure out what is required to recreate it.
I am doing a number of identical aggregations (just different values for
intentMarketCode and intentDate).
Three aggregations give correct numbers - one doesn't, and I haven't
figured out why.
 

On Wednesday, 30 April 2014 14:13:00 UTC+1, Adrien Grand wrote:

 This looks wrong indeed. By any chance, would you have a curl recreation 
 of this issue?


 On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.mi...@gmail.com wrote:

 It looks like a bug to me - but if its user error, then obviously I can 
 fix it a lot quicker :)


 On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

 I am seeing some very odd aggregation results - where the sum of the 
 sub-aggregations is more than the parent bucket.

 Results:
 "CSSX": {
   "doc_count": 24,
   "intentDate": {
     "buckets": [ {
       "key": "Overdue",
       "to": 1.3981248E12,
       "to_as_string": "2014-04-22",
       "doc_count": 1,
       "ME": {
         "doc_count": 0
       },
       "NOT_ME": {
         "doc_count": 24
       }
     }, {
       "key": "May",
       "from": 1.3981248E12,
       "from_as_string": "2014-04-22",
       "to": 1.4006304E12,
       "to_as_string": "2014-05-21",
       "doc_count": 23,
       "ME": {
         "doc_count": 0
       },
       "NOT_ME": {
         "doc_count": 24
       }
     }, {
       "key": "June",
       "from": 1.4006304E12,
       "from_as_string": "2014-05-21",
       "to": 1.4033088E12,
       "to_as_string": "2014-06-21",
       "doc_count": 0,
       "ME": {
         "doc_count": 0
       },
       "NOT_ME": {
         "doc_count": 24
       }
     } ]
   }
 },


 I wouldn't have thought that to be possible at all.
 Here is the request that generated the dodgy results.


 "CSSX": {
   "filter": {
     "and": {
       "filters": [ {
         "type": {
           "value": "inventory"
         }
       }, {
         "term": {
           "isAllocated": false
         }
       }, {
         "term": {
           "intentMarketCode": "CSSX"
         }
       }, {
         "terms": {
           "groupCompanyId": [ "0D13EF2D0E114D43BFE362F5024D8873",
             "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA",
             "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032C7",
             "9FB2FFDC0FF0797FE04014AC6F0616B6", "9FB2FFDC0FF1797FE04014AC6F0616B6",
             "9FB2FFDC0FF2797FE04014AC6F0616B6", "9FB2FFDC0FF3797FE04014AC6F0616B6",
             "9FB2FFDC0FF5797FE04014AC6F0616B6", "9FB2FFDC0FF6797FE04014AC6F0616B6",
             "AFE0FED33F06AFB6E04015AC5E060AA3" ]
         }
       }, {
         "not": {
           "filter": {
             "terms": {
               "status": [ "Cancelled", "Completed" ]
             }
           }
         }
       } ]
     }
   },
   "aggregations": {
     "intentDate": {
       "date_range": {
         "field": "intentDate",
         "ranges": [ {
           "key": "Overdue",
           "to": "2014-04-22"
         }, {
           "key": "May",
           "from": "2014-04-22",
           "to": "2014-05-21"
         }, {
           "key": "June",
           "from": "2014-05-21",
           "to": "2014-06-21"
         } ]
       },
       "aggregations": {
         "ME": {
           "filter": {
             "term": {
               "trafficOperatorSid": "S-1-5-21-20xx
 ...





 -- 
 Adrien Grand
  



Re: Problem while searching for date range or date

2014-05-02 Thread Hemant

Found the solution to the same problem here -
https://groups.google.com/forum/#!searchin/elasticsearch/date/elasticsearch/eeTwWVf6Sfo/1jbHq0gca6QJ

Thanks. 
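For anyone hitting this later: without a field name, query_string runs against the default _all field, where the range is matched against analyzed string terms rather than the parsed date values, so naming the date field is the fix. An equivalent range filter (field name and date literals taken from the post below) might look like:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "exp_date": {
            "gte": "2013-03-1",
            "lte": "2013-03-5"
          }
        }
      }
    }
  }
}
```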


On Friday, May 2, 2014 3:13:30 PM UTC+5:30, Hemant wrote:

 Hello, 

 I have indexed some data, with default mapping -

 {
   "inventory": {
     "products": {
       "properties": {
         "exp_date": {
           "type": "date",
           "format": "dateOptionalTime"
         },
         "man_date": {
           "type": "date",
           "format": "dateOptionalTime"
         },
         "price": {
           "type": "long"
         },
         "product_description": {
           "type": "string"
         },
         "product_name": {
           "type": "string"
         },
         "quan_available": {
           "type": "long"
         }
       }
     }
   }
 }


 Now when I perform a search to match some date, I am not getting the 
 expected result. 
 Consider query like this
 {
   "query": {
     "filtered": {
       "query": {
         "query_string": {
           "query": "exp_date:[2013-03-1 TO 2013-03-5]"
         }
       }
     }
   },
   "fields": [
     "price",
     "quan_available",
     "product_name",
     "product_description",
     "exp_date",
     "man_date"
   ],
   "from": 0,
   "size": 50,
   "sort": {
     "_score": {
       "order": "asc"
     }
   },
   "explain": true
 }

 This gives me the expected result, that is, all the documents matching this
 date range; but when I remove the field name exp_date from the query
 string, I get no result at all.
 The following query returns zero results.
 { query: { filtered: { query: { query_string: { query: 
 [2013-03-1 TO 2013-03-5] } } } }, fields: [ price, quan_available, 
 product_name, product_description, exp_date, man_date ], from: 0, 
 size: 50, sort: { _score: { order: asc } }, explain: true }

 Can anybody suggest a solution to this problem? What am I doing wrong?
 Thanks in advance.




Re: Partial word match with singular and plurals: Elasticsearch

2014-05-02 Thread Kruti Shukla
Any help?
Why higher distance document scored higher?
Is there any problem with stemmer or nGram settings?


On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:

 Hi Radu,

 Thank you so much for the suggestions. I knew about multi-field but didn't
 realize how helpful it could be; now I'm able to play with the multi-field
 feature.
 I tried following suggestion and created index and mapping accordingly.

 I tried querying for the first 2. The first one was simple and the second
 one used slop. It is not returning results in the correct slop order (i.e.,
 incremental distance). Please help/suggest query improvements.

 *Please see my settings below:*

 *For index: *
 curl -XPUT "http://localhost:9200/my_improved_index" -d'
 {
   "settings": {
     "analysis": {
       "filter": {
         "trigrams_filter": {
           "type": "ngram",
           "min_gram": 1,
           "max_gram": 50
         },
         "my_stemmer": {
           "type": "stemmer",
           "name": "minimal_english"
         }
       },
       "analyzer": {
         "trigrams": {
           "type": "custom",
           "tokenizer": "standard",
           "filter": [
             "standard",
             "lowercase",
             "trigrams_filter"
           ]
         },
         "my_stemmer_analyzer": {
           "type": "custom",
           "tokenizer": "standard",
           "filter": [
             "standard",
             "lowercase",
             "my_stemmer"
           ]
         }
       }
     }
   }
 }'

 *For mappings:*
 curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
 {
   "my_improved_index_type": {
     "properties": {
       "name": {
         "type": "multi_field",
         "fields": {
           "name_gram": {
             "type": "string",
             "analyzer": "trigrams"
           },
           "untouched": {
             "type": "string",
             "index": "not_analyzed"
           },
           "name_stemmer": {
             "type": "string",
             "analyzer": "my_stemmer_analyzer"
           }
         }
       }
     }
   }
 }'

 *Available documents:*
 1. men’s shaver
 2. men’s shavers
 3. men’s foil shaver
 4. men’s foils shaver
 5. men’s foil shavers
 6. men’s foils shavers
 7. men’s foil advanced shaver
 8. norelco men’s foil advanced shaver

 *Query:*
 curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
 {
   "size": 30,
   "query": {
     "bool": {
       "should": [
         {
           "match": {
             "name.untouched": {
               "query": "men’s shaver",
               "operator": "and",
               "type": "phrase",
               "boost": 10
             }
           }
         },
         {
           "match_phrase": {
             "name.name_stemmer": {
               "query": "men’s shaver",
               "slop": 5
             }
           }
         }
       ]
     }
   }
 }'

 *Returned result:*
 1. men's shaver -- correct
 2. men's shavers -- correct
 3. men's foils shaver -- NOT correct
 4. norelco men's foil advanced shaver -- NOT correct
 5. men's foil advanced shaver -- NOT correct
 6. men's foil shaver -- NOT correct. 

 *Expected result:*
 1. men's shaver -- exact phrase match
 2. men's shavers -- ZERO word distance + 1 plural
 3. men's foil shaver -- 1 word distance
 4. men's foils shaver -- 1 word distance + 1 plural
 5. men's foil advanced shaver -- 2 word distance
 4. norelco men's foil advanced shaver -- 2 word distance

 Why higher distance document scored higher?
 Is there any problem with stemmer or nGram settings?


 On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:

 Hi Kruti,

 The short answer is yes, it is possible. Here's one way to do it:

 Have the fields you search on as multi fields
 (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html),
 where you index them with various settings, like once not-analyzed for
 exact matches, once with ngrams to account for typos and so on. You can
 query all those sub-fields, and use the multi-match query with best fields
 (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields)
 or the DisMax query
 (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html)
 to wrap all those queries and take the best score (or the best score and a
 factor of the other scores by using the tie breaker).

 Now, for the specific requirements you have:
 1. For exact matching, you can skip analysis altogether, and set 

Re: Unable to create mapping and settings using Java API

2014-05-02 Thread Michael McCandless
Hmm, I'm able to create an index and its mappings/settings with a single
JSON request to http://localhost:9200/indexName.

What settings are you trying to set?

Mike

http://blog.mikemccandless.com
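
For example (index, type, and field names hypothetical), the body of a single `PUT http://localhost:9200/indexName` request can carry both settings and mappings:

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```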


On Thu, May 1, 2014 at 5:10 PM, Amit Soni amitson...@gmail.com wrote:

 hello everyone - I have settings and mapping defined in a single JSON
 document and I have been trying to find a way to create index using that
 JSON document. I tried different code snippets but have not found one which
 allows me to create settings as well as mapping using one JSON document.

 Any help on this will be great!

 -Amit.

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
  https://groups.google.com/d/msgid/elasticsearch/CAAOGaQ%2BEUsstyy7qdNq%2BRmHzA-Rp9mYNYnOoQ8HESiGAvXwXVg%40mail.gmail.com
 .
 For more options, visit https://groups.google.com/d/optout.




Re: SearchParseExceptions in Marvel monitoring cluster

2014-05-02 Thread Mihir M
Thanks Boaz for your reply.

Following is the output of curl SERVER:9200/_cat/shards/?v for both nodes of
our marvel cluster:

index              shard prirep state   docs store   ip          node
.marvel-2014.05.01 0     p      STARTED 70   865.4kb Server-ip-1 Marvel_1
.marvel-2014.05.01 0     r      STARTED 70   865kb   Server-ip-2 Marvel_2

Some more things to highlight: in the Marvel Dashboard - Cluster Overview
page we get the following errors:
- Oops! FacetPhaseExecutionException[Facet [0]: (value) field
[total.search.query_total] not found] --- in the Search Request Rate panel

- Oops! FacetPhaseExecutionException[Facet [timestamp]: failed to find
mapping for index.raw] --- in the Indices panel

- Oops! FacetPhaseExecutionException[Facet [0]: (value) field
[primaries.indexing.index_total] not found] --- in the Indexing Request
Rate panel

- Oops! FacetPhaseExecutionException[Facet [0]: (value) field
[primaries.docs.count] not found] --- in the Document Count Panel

All these are apart from the SearchParseExceptions mentioned in my earlier post.
Also, if Marvel is not storing the right data, how is that supposed to be
handled?





-
Regards
--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/SearchParseExceptions-in-Marvel-monitoring-cluster-tp4054926p4055150.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: strange problem: my ES server almost lost all its data. (All shards failed)

2014-05-02 Thread joergpra...@gmail.com
Yes, you should use this option.

Some FreeBSD kernels seem to have difficulties to run UDP multicast on IPv6
together with IPv4 properly, so I would like to suggest disabling IPv6 use
on the JVM.

Jörg
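
A sketch of what enabling that could look like, assuming the FreeBSD port's rc script reads `elasticsearch_props` (e.g. set in `/etc/rc.conf`, or by uncommenting the line in the script):

```sh
# /etc/rc.conf -- force the JVM to use the IPv4 stack
elasticsearch_props="-Djava.net.preferIPv4Stack=true"
```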


On Fri, May 2, 2014 at 10:23 AM, Patrick Proniewski 
elasticsea...@patpro.net wrote:

 Hi Jörg,

 Thank you for your reply.
 The service script includes an option that might deal with IPv6, but it's
 not active:

 # Force the JVM to use IPv4 stack
  # elasticsearch_props="-Djava.net.preferIPv4Stack=true"

 (
 http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955
 )

 In past years, I used to disable IPv6 everywhere (kernel, ports
 compilation, etc.) but now I don't bother anymore.
 Do you mean I should use this option to force IPv4?

 Thanks,
 Patrick


 On 2 mai 2014, at 09:38, joergpra...@gmail.com wrote:

  On FreeBSD, do you have multicast on IPv6 enabled? You should disable
 IPv6
  on the JVM.
 
  Seems you received a severe network error from the OS.
 
  Jörg
 
 
  On Thu, May 1, 2014 at 11:46 PM, Patrick Proniewski 
  elasticsea...@patpro.net wrote:
 
  Hello,
 
  I'm running a small server with logstash, ES, Kibana. Tonight, I've
  restarted my ES process. Very bad idea: it restarted with lots of
 errors,
  and finally lost all its data.
  Basically, before restart, I've had:
 
  elasticsearch/nodes/0/indices/logstash-2014.*
  elasticsearch/nodes/0/_state/
 
  after restart, I've had:
 
  elasticsearch/nodes/0/indices/logstash-2014.*
  elasticsearch/nodes/0/_state/
  elasticsearch/nodes/1/indices/logstash-2014.05.01
  elasticsearch/nodes/1/_state/
 
  Then, Kibana was not able to find anything (dashboards lost, etc.).
 
  I've stopped Logstash, stopped Elasticsearch, waited a bit and checked
  everything is down, then restarted ES. It looked OK, then I've restarted
  Logstash, and I was able to access my dashboards again. I've just lost
 15
  minutes of data.
  Now I can see that elasticsearch/nodes/0 is the current working
 directory,
  and I can browse old data and current data.
  elasticsearch/nodes/1 is not used anymore.
 
  I'm running FreeBSD, and used the service command to restart ES. When
  attempting the second shutdown, the script wouldn't find the pid file,
 so
  I've had to kill the Java process.
 
  I don't understand what happened. But I don't feel comfortable putting
 ES
  in production. Full log for first and second restart here: 
  http://patpro.net/elastic.log
 
  Any idea?
  Regards,
  Patrick





Re: Partial word match with singular and plurals: Elasticsearch

2014-05-02 Thread Radu Gheorghe
Hello,

The exact match vs plural is probably because of the stemmer. As you have
your fields and queries now, Elasticsearch has no way to boost individual
exact word matches higher. To fix this, you can add another field where you
just analyze the text using the standard analyzer (no stemming). Then add
that to another query within your bool and exact word matches should be
ranked higher. Though I would do a simple match for that (no phrase), to
account for the case where one word is exact and one is plural - such a
document should be ranked higher than if both are plurals. You'll get that
with standard match because it looks for all terms, while match_phrase will
try to match the phrase with the given slop and none of those two documents
will get hit.
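
A sketch of the extra clause Radu suggests, assuming a hypothetical `name.standard` sub-field mapped with the standard analyzer (no stemming):

```json
{
  "bool": {
    "should": [
      { "match":        { "name.standard":     { "query": "men's shaver" } } },
      { "match_phrase": { "name.name_stemmer": { "query": "men's shaver", "slop": 5 } } }
    ]
  }
}
```

The plain `match` clause rewards each exactly-matching word individually, so "one exact + one plural" outranks "both plurals".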

I don't know why the higher distance document is scored higher in your case
- the 6th result should have been higher. Can you try with an index of one
shard and see if results are any different?

Either way, you should get an explanation for each document's score by
enabling Explain:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, May 2, 2014 at 1:40 PM, Kruti Shukla krutibhat...@gmail.com wrote:

 Any help?
 Why is the higher-distance document scored higher?
 Is there any problem with the stemmer or nGram settings?


 On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:

 Hi Radu,

 Thank you so much for the suggestions. I knew about multi-field but didn't
 realize how helpful it could be; now I'm able to play with the multi-field
 feature.
 I tried the following suggestion and created the index and mapping accordingly.

 I tried querying for the first two. The first one was simple and the second
 one used slop. It is not ranking by incremental slop (i.e. word distance).
 Please help/suggest query improvements.

 *Please see my settings below:*

 *For index: *
 curl -XPUT "http://localhost:9200/my_improved_index" -d'
 {
    "settings": {
       "analysis": {
          "filter": {
             "trigrams_filter": {
                "type": "ngram",
                "min_gram": 1,
                "max_gram": 50
             },
             "my_stemmer": {
                "type": "stemmer",
                "name": "minimal_english"
             }
          },
          "analyzer": {
             "trigrams": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                   "standard",
                   "lowercase",
                   "trigrams_filter"
                ]
             },
             "my_stemmer_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                   "standard",
                   "lowercase",
                   "my_stemmer"
                ]
             }
          }
       }
    }
 }'

 *For mappings:*
 curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
 {
    "my_improved_index_type": {
       "properties": {
          "name": {
             "type": "multi_field",
             "fields": {
                "name_gram": {
                   "type": "string",
                   "analyzer": "trigrams"
                },
                "untouched": {
                   "type": "string",
                   "index": "not_analyzed"
                },
                "name_stemmer": {
                   "type": "string",
                   "analyzer": "my_stemmer_analyzer"
                }
             }
          }
       }
    }
 }'

 *Available documents:*
 1. men’s shaver
 2. men’s shavers
 3. men’s foil shaver
 4. men’s foils shaver
 5. men’s foil shavers
 6. men’s foils shavers
 7. men's foil advanced shaver
 8. norelco men's foil advanced shaver

 *Query:*
 curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
 {
    "size": 30,
    "query": {
       "bool": {
          "should": [
             {
                "match": {
                   "name.untouched": {
                      "query": "men\u0027s shaver",
                      "operator": "and",
                      "type": "phrase",
                      "boost": 10
                   }
                }
             },
             {
                "match_phrase": {
                   "name.name_stemmer": {
                      "query": "men\u0027s shaver",
                      "slop": 5
                   }
                }
             }
          ]
       }
    }
 }'

 *Returned result:*
 1. men's shaver -- correct
 2. men's shavers -- correct
 3. men's foils shaver -- NOT correct
 4. norelco men's foil advanced shaver -- NOT correct
 5. men's foil advanced shaver -- NOT correct
 6. men's foil shaver -- NOT correct.

 *Expected result:*
 1. men's shaver -- exact phrase match
 2. 

Re: strange problem: my ES server almost lost all its data. (All shards failed)

2014-05-02 Thread Patrick Proniewski
Thank you for the tip, Jörg.
I've activated this option and carefully restarted. I've re-read yesterday's 
log file, and now I think may be the new ES instance started before the former 
one was completely terminated. This too can cause some network/socket trouble. 
I might try and add a short sleep into the restart command.

On 2 mai 2014, at 14:07, joergpra...@gmail.com wrote:

 Yes, you should use this option.
 
 Some FreeBSD kernels seem to have difficulties to run UDP multicast on IPv6
 together with IPv4 properly, so I would like to suggest disabling IPv6 use
 on the JVM.
 
 Jörg
 
 
 On Fri, May 2, 2014 at 10:23 AM, Patrick Proniewski 
 elasticsea...@patpro.net wrote:
 
 Hi Jörg,
 
 Thank you for your reply.
 The service script includes an option that might deal with IPv6, but it's
 not active:
 
 # Force the JVM to use IPv4 stack
  # elasticsearch_props="-Djava.net.preferIPv4Stack=true"
 
 (
 http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955
 )
 
 In past years, I used to disable IPv6 everywhere (kernel, ports
 compilation, etc.) but now I don't bother anymore.
 Do you mean I should use this option to force IPv4?
 
 Thanks,
 Patrick



Re: Need help on similarity ranking approach

2014-05-02 Thread Rgs
Thanks Binh Ly and Ivan Brusic for your replies.

I need to find the similarity in percentage of a document against other
documents and this will be considered for grouping the documents.

Is it possible to get the similarity percentage using the more_like_this
query? Or is there any other way to calculate the percentage of similarity
from the query result?

Eg:  document1 is 90% similar to document2.
     document1 is 45% similar to document3
     etc.

Thanks



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Aggregation bug? Or user error?

2014-05-02 Thread Adrien Grand
What version of Elasticsearch are you using? If it is small enough, I would
also be interested if you could share your index so that I can try to
reproduce the issue locally.


On Fri, May 2, 2014 at 12:07 PM, mooky nick.minute...@gmail.com wrote:


  I haven't been able to figure out what is required to recreate it.
  I am doing a number of identical aggregations (just different values for
  intentMarketCode and intentDate).
  Three aggregations give correct numbers - one doesn't. I haven't figured
  out why.


 On Wednesday, 30 April 2014 14:13:00 UTC+1, Adrien Grand wrote:

 This looks wrong indeed. By any chance, would you have a curl recreation
 of this issue?


 On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.mi...@gmail.com wrote:

 It looks like a bug to me - but if its user error, then obviously I can
 fix it a lot quicker :)


 On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

 I am seeing some very odd aggregation results - where the sum of the
 sub-aggregations is more than the parent bucket.

 Results:
  "CSSX" : {
    "doc_count" : 24,
    "intentDate" : {
      "buckets" : [ {
        "key" : "Overdue",
        "to" : 1.3981248E12,
        "to_as_string" : "2014-04-22",
        "doc_count" : 1,
        "ME" : {
          "doc_count" : 0
        },
        "NOT_ME" : {
          "doc_count" : 24
        }
      }, {
        "key" : "May",
        "from" : 1.3981248E12,
        "from_as_string" : "2014-04-22",
        "to" : 1.4006304E12,
        "to_as_string" : "2014-05-21",
        "doc_count" : 23,
        "ME" : {
          "doc_count" : 0
        },
        "NOT_ME" : {
          "doc_count" : 24
        }
      }, {
        "key" : "June",
        "from" : 1.4006304E12,
        "from_as_string" : "2014-05-21",
        "to" : 1.4033088E12,
        "to_as_string" : "2014-06-21",
        "doc_count" : 0,
        "ME" : {
          "doc_count" : 0
        },
        "NOT_ME" : {
          "doc_count" : 24
        }
      } ]
    }
  },


 I wouldn't have thought that to be possible at all.
 Here is the request that generated the dodgy results.


  "CSSX" : {
    "filter" : {
      "and" : {
        "filters" : [ {
          "type" : {
            "value" : "inventory"
          }
        }, {
          "term" : {
            "isAllocated" : false
          }
        }, {
          "term" : {
            "intentMarketCode" : "CSSX"
          }
        }, {
          "terms" : {
            "groupCompanyId" : [ "0D13EF2D0E114D43BFE362F5024D8873",
  "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA",
  "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032C7",
  "9FB2FFDC0FF0797FE04014AC6F0616B6", "9FB2FFDC0FF1797FE04014AC6F0616B6",
  "9FB2FFDC0FF2797FE04014AC6F0616B6", "9FB2FFDC0FF3797FE04014AC6F0616B6",
  "9FB2FFDC0FF5797FE04014AC6F0616B6", "9FB2FFDC0FF6797FE04014AC6F0616B6",
  "AFE0FED33F06AFB6E04015AC5E060AA3" ]
          }
        }, {
          "not" : {
            "filter" : {
              "terms" : {
                "status" : [ "Cancelled", "Completed" ]
              }
            }
          }
        } ]
      }
    },
    "aggregations" : {
      "intentDate" : {
        "date_range" : {
          "field" : "intentDate",
          "ranges" : [ {
            "key" : "Overdue",
            "to" : "2014-04-22"
          }, {
            "key" : "May",
            "from" : "2014-04-22",
            "to" : "2014-05-21"
          }, {
            "key" : "June",
            "from" : "2014-05-21",
            "to" : "2014-06-21"
          } ]
        },
        "aggregations" : {
          "ME" : {
            "filter" : {
              "term" : {
                "trafficOperatorSid" : "S-1-5-21-20xx
  ...





 --
 Adrien Grand


Re: Significant Term aggregation

2014-05-02 Thread Mark Harwood


 Your second concern, that the query criteria is not identifying a result set 
 with any sense of cohesion, might be true. Basically, the search I am 
 executing is a filter: the document metadata either has the value or 
 not. Hence the result set may not be cohesive. The reason for me to use 
 significant terms is so that the query can be enhanced to provide a 
 more cohesive set of documents. 


We can probably debug that from the results of the agg. For each 
significant term you should get a score and all the ingredients that went 
into it are also available:
1) The number of docs in the result set with the given term
2) The size of your result set
3) The number of docs in the index with the given term (see the bg_count 
value)
4) The size of the index 

In a cohesive set you should see a reasonable difference in the term 
probabilities, e.g. between the ratios of numbers 1/2 vs 3/4.
If all you've selected in your query is effectively random docs with no 
common theme, then the use of words in background and foreground barely 
differs, 1/2 vs 3/4 are practically the same, and you get a poor-scoring 
set of results.
 On Thursday, 1 May 2014 10:04:15 UTC-5, Mark Harwood wrote:

 Thanks for the feedback, Ramdev.


 What I noticed in my aggregation results is  a lot of Stopwords (a, an, 
 the, at, and, etc.) being included as significant terms. 


 These sorts of terms shouldn't really need any sort of special treatment. 
 If they are appearing as suggestions then I expect one of the following 
 statements to be true:

 1) You have a very small number of docs in the result set representing 
 the foreground sample. Significant terms needs a reasonable number of 
 docs in a sample to draw any real conclusions
 2) You have query criteria that is not identifying a result set with any 
 sense of cohesion e.g. a query for random docs
 3) You have changed the set of stopwords in use in your index - what 
 previously never used to appear at all is now suddenly common or 
 vice-versa. 
 4) You are querying across mixed indices or doc-types (one with 
 stop-words, one without) and we fail to tune-out the stopwords as part of 
 the results merging process because one small index reports them back as 
 commonplace while another large index has them as missing or rare. In the 
 merged stats they therefore appear to be highly correlated with your query 
 request.

 Please let me know if none of these scenarios explain your results.

  

 Another possible enhancement would be get a phrase significance (instead 
 of a single term, doing a multi term significance) would be nice. 



 I outline some of the possibilities in creating phrases from significant 
 terms, starting 51 mins into this recent video: 
 https://skillsmatter.com/skillscasts/5175-revealing-the-uncommonly-common-with-elasticsearch
  


 Cheers and Thanks for all the fish


 You're welcome and thanks again for the feedback
 Mark 





Re: Terms Aggregations

2014-05-02 Thread Jose A. Garcia
Hi Adrien,

Thanks for your answer, but I have a question. Wouldn't that give me the 
different sums of the values of those fields?

What I need is, using the example from before:

Doc1: { "field1": "A", "field2": "B", "field3": "C", "size": 1 }
Doc2: { "field1": "A", "field2": "B2", "field3": "C2", "size": 2 }
Doc3: { "field1": "Z", "field2": "B3", "field3": "C3", "size": 99 }

If I search in my index and those three documents match my query I want a 
list of the possible values that 'field1' can take and the sum of the 
'size' fields for all documents with each value in my result set. So in 
this case I would expect:

field1: { 
  { "value": "A", "sum_of_sizes": 3 } 
  { "value": "Z", "sum_of_sizes": 99 }
}

Thanks,
Jose.
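
For what it's worth, one way to express exactly this shape is a terms aggregation with a sum sub-aggregation (field and aggregation names taken from the example above):

```json
{
  "aggregations": {
    "field1_values": {
      "terms": { "field": "field1" },
      "aggregations": {
        "sum_of_sizes": { "sum": { "field": "size" } }
      }
    }
  }
}
```

Each `field1` bucket in the response then carries its own `sum_of_sizes` value.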

On Friday, 2 May 2014 14:51:36 UTC+1, Adrien Grand wrote:

 Hi Jose,

 There are two ways to do so: either with a script (slow because term 
 ordinals can't be used):

 "terms" : {
     "script": "doc['A'].values + doc['B'].values + doc['C'].values"
 }

 Or by having all values in a single field at indexing time (potentially 
 using `copy_to`[1]).

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to
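
For reference, a minimal `copy_to` mapping sketch (the `all_values` target field name is hypothetical):

```json
{
  "properties": {
    "A": { "type": "string", "copy_to": "all_values" },
    "B": { "type": "string", "copy_to": "all_values" },
    "C": { "type": "string", "copy_to": "all_values" },
    "all_values": { "type": "string" }
  }
}
```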



 On Fri, May 2, 2014 at 11:44 AM, Jose A. Garcia argan...@gmail.com wrote:

 Hi,

 I have a question about Aggregations. I have documents with several 
 fields:

 { 
   "field1" : "A",
   "field2" : "B",
   "field3" : "C",
   "size" : 1
 }

 { 
   "field1" : "A",
   "field2" : "B2",
   "field3" : "C2",
   "size" : 2
 }

 { 
   "field1" : "Z",
   "field2" : "B3",
   "field3" : "C3",
   "size" : 99
 }

 And I need to be able to calculate aggregations for each one of those 
 fields, and get the sum of the sizes for each field, so for example, 
 aggregating by field1 should get me { A, size = 3 }, {Z, size = 99}.

 Looking at the documentation for aggregations I can see how to get the 
 sum for a field and how to get the terms and their counts, but I need a 
 combination of both, what is the best way to do this?

 Thanks in advance,
 Jose. 





 -- 
 Adrien Grand
  



Re: How to write a custom river

2014-05-02 Thread Joshua Chan
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [river]
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
    at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
    at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
    at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: river
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
    ... 9 more


On Friday, May 2, 2014 8:51:42 AM UTC-5, Joshua Chan wrote:

 I'm not sure I follow. 

 In my Plugin.onModule I have
 public void onModule(RiversModule module) {
 module.registerRiver(RiverImpl.TYPE, ModuleImpl.class);
 
 //client.admin().indices().prepareDeleteMapping(_river).setType(riverName.name()).execute();
 }

 And on my Module I have
 protected void configure() {
 bind(River.class).to(RiverImpl.class).asEagerSingleton();
 }



 On Thursday, May 1, 2014 11:08:28 PM UTC-5, Rob Ottaway wrote:

 I should have sent you the following earlier rather than a non-river 
 plugin:

 the plugin:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java

 The river implementation:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java

 The module:

 https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java

 Looks like you are registering the river implementation rather than the 
 river module hence the not working. Had to look at an example I know works 
 to figure it out.

 -Rob


 On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.com wrote:

 So, that's what I did, but no love... I checked in the latest.

 -Josh


 On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

 Look at this plugin for help:

  https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties

 Yes it needs to be the FQN.

 On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

 Thanks Rob. Someone else also told me the plugin property should be 
 the fully qualified name. I didn't declare a package, so I guess I'm 
 using 
 the default package, and I thought I had the namespacing right since 
 IntelliJ corrected the class name when I wrote it.

 Thoughts?


 -Josh


 On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

 Look at this file in your BB repo:

  https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master

 cheers

 On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

 This strikes me as odd:

 java.lang.ClassNotFoundException: example_river

 Assume you didn't map the string example_river to the actual class 
 name properly?

 -Rob

 On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

 I'm making my first go at writing a river. (Here's the source code: 
 https://bitbucket.org/futurechan/example-river/src)

 I followed this tutorial 
  http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/

 and compared it to this existing river
 https://github.com/jprante/elasticsearch-river-jdbc

 but I haven't had much luck.

 To deploy the river, I created a folder called example-river under 
 plugins, dropped my jar in that folder, and restarted the node. 
 Everything 
 starts up fine.

 I have also tried bin/plugin --url file:///path/to/plugin 
 --install example-river, which seems to work, but it unpacks my 
 jar. So, I tried zipping it first and then installing, which works and 
 does 
 not unpack my jar, but it didn't help.

 When I issue this PUT request:

 http://localhost:9200/_river/example_river/_meta
 {
   

Re: How to write a custom river

2014-05-02 Thread Rob Ottaway
I think he means in your Guice module. You are registering the WRONG thing
;)


On Fri, May 2, 2014 at 6:49 AM, Joshua Chan joshua.bennett.c...@gmail.com wrote:

 I've tried this too with no luck

 http://localhost:9200/_river/example_river/_meta
  {
    "type": "river",
    "river": {
      "blah": "blah"
    }
  }


 On Friday, May 2, 2014 3:31:23 AM UTC-5, Luca Cavanna wrote:

 Hi Joshua,
 the package is not an issue if you are using the default one for your
 classes. Looking deeper, the type of the river that you try to register
 with your rest call doesn't match the type of the river you registered in
 the plugin when you did module.registerRiver(type, riverclass).

 Cheers
 Luca

 On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote:

 I should have sent you the following earlier rather than a non-river
 plugin:

  the plugin:
  https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java

  The river implementation:
  https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiver.java

  The module:
  https://github.com/elasticsearch/elasticsearch-river-rabbitmq/blob/master/src/main/java/org/elasticsearch/river/rabbitmq/RabbitmqRiverModule.java

 Looks like you are registering the river implementation rather than the
 river module hence the not working. Had to look at an example I know works
 to figure it out.

 -Rob


  On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.com wrote:

 So, that's what I did, but no love... I checked in the latest.

 -Josh


 On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

 Look at this plugin for help:

  https://github.com/elasticsearch/elasticsearch-cloud-aws/blob/master/src/main/resources/es-plugin.properties

 Yes it needs to be the FQN.

 On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

 Thanks Rob. Someone else also told me the plugin property should be
 the fully qualified name. I didn't declare a package, so I guess I'm 
 using
 the default package, and I thought I had the namespacing right since
 IntelliJ corrected the class name when I wrote it.

 Thoughts?


 -Josh


 On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

 Look at this file in your BB repo:

  https://bitbucket.org/futurechan/example-river/src/fd23648c3e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-plugin.properties?at=master

 cheers

 On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

 This strikes me as odd:

 java.lang.ClassNotFoundException: example_river

 Assume you didn't map the string example_river to the actual class
 name properly?

 -Rob

 On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

 I'm making my first go at writing a river. (Here's the source
 code: https://bitbucket.org/futurechan/example-river/src)

 I followed this tutorial
 http://blog.trifork.com/2013/01/10/how-to-write-an-elasticse
 arch-river-plugin/

 and compared it to this existing river
 https://github.com/jprante/elasticsearch-river-jdbc

 but I haven't had much luck.

 To deploy the river, I created a folder called example-river under
 plugins, dropped my jar in that folder, and restarted the node. 
 Everything
 starts up fine.

 I have also tried bin/plugin --url file:///path/to/plugin
 --install example-river, which seems to work, but it unpacks my
 jar. So, I tried zipping it first and then installing, which works 
 and does
 not unpack my jar, but it didn't help.

 When I issue this PUT request:

 http://localhost:9200/_river/example_river/_meta
 {
   "type": "example_river",
   "example_river": {
     "blah": "blah"
   }
 }

 I get this exception:

 [2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river 
 [example_river][example_river]
 [2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create 
 river [example_river][example_river] 
 org.elasticsearch.common.settings.NoClassSettingsException: Failed to 
 load class with value [example_river] at
 org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
  at
 org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58) 
 at
 org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
  at
 org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
  at
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
  at
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
  at
 org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source) at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at
 java.lang.Thread.run(Unknown Source) Caused by: 
 java.lang.ClassNotFoundException: example_river at 
 java.net.URLClassLoader$1.run(Unknown Source) at
 

Re: How to write a custom river

2014-05-02 Thread Rob Ottaway
Oh sorry, the module is fine; it's the call to module.registerRiver that is
being passed the River itself rather than the River's Guice module. Try that
change.




On Fri, May 2, 2014 at 10:33 AM, Rob Ottaway robotta...@gmail.com wrote:

 I think he means in your Guice module. You are registering the WRONG thing
 ;)


 On Fri, May 2, 2014 at 6:49 AM, Joshua Chan joshua.bennett.c...@gmail.com
  wrote:

 I've tried this too with no luck

 http://localhost:9200/_river/example_river/_meta
 {
   "type": "river",
   "river": {
     "blah": "blah"
   }
 }


 On Friday, May 2, 2014 3:31:23 AM UTC-5, Luca Cavanna wrote:

 Hi Joshua,
 the package is not an issue if you are using the default one for your
 classes. Looking deeper, the type of the river that you try to register
 with your REST call doesn't match the type of the river you registered in
 the plugin when you did module.registerRiver(type, riverclass).

 Cheers
 Luca

 On Friday, May 2, 2014 6:08:28 AM UTC+2, Rob Ottaway wrote:

 I should have sent you the following earlier rather than a non-river
 plugin:

 the plugin:
 https://github.com/elasticsearch/elasticsearch-
 river-rabbitmq/blob/master/src/main/java/org/
 elasticsearch/plugin/river/rabbitmq/RabbitmqRiverPlugin.java

 The river implementation:
 https://github.com/elasticsearch/elasticsearch-
 river-rabbitmq/blob/master/src/main/java/org/
 elasticsearch/river/rabbitmq/RabbitmqRiver.java

 The module:
 https://github.com/elasticsearch/elasticsearch-
 river-rabbitmq/blob/master/src/main/java/org/
 elasticsearch/river/rabbitmq/RabbitmqRiverModule.java

 Looks like you are registering the river implementation rather than the
 river module, hence it not working. I had to look at an example I know works
 to figure it out.

 -Rob


 On Thu, May 1, 2014 at 8:04 PM, Joshua Chan joshua.be...@gmail.comwrote:

  So, that's what I did, but no love... I checked in the latest.

 -Josh


 On Thursday, May 1, 2014 9:49:11 PM UTC-5, Rob Ottaway wrote:

 Look at this plugin for help:

 https://github.com/elasticsearch/elasticsearch-cloud-aws/
 blob/master/src/main/resources/es-plugin.properties

 Yes it needs to be the FQN.

 On Thursday, May 1, 2014 5:47:31 PM UTC-7, Joshua Chan wrote:

 Thanks Rob. Someone else also told me the plugin property should be
 the fully qualified name. I didn't declare a package, so I guess I'm 
 using
 the default package, and I thought I had the namespacing right since
 IntelliJ corrected the class name when I wrote it.

 Thoughts?


 -Josh


 On Thursday, May 1, 2014 5:23:25 PM UTC-5, Rob Ottaway wrote:

 Look at this file in your BB repo:

 https://bitbucket.org/futurechan/example-river/src/fd23648c3
 e7cc42fd2286d4134e80ecd7e98f802/src/main/resources/es-
 plugin.properties?at=master

 cheers

 On Thursday, May 1, 2014 3:21:59 PM UTC-7, Rob Ottaway wrote:

 This strikes me as odd:

 java.lang.ClassNotFoundException: example_river

 Assume you didn't map the string example_river to the actual class
 name properly?

 -Rob

 On Thursday, May 1, 2014 11:40:52 AM UTC-7, Joshua Chan wrote:

 I'm making my first go at writing a river. (Here's the source
 code: https://bitbucket.org/futurechan/example-river/src)

 I followed this tutorial
 http://blog.trifork.com/2013/01/10/how-to-write-an-elasticse
 arch-river-plugin/

 and compared it to this existing river
 https://github.com/jprante/elasticsearch-river-jdbc

 but I haven't had much luck.

 To deploy the river, I created a folder called example-river
 under plugins, dropped my jar in that folder, and restarted the node.
 Everything starts up fine.

 I have also tried bin/plugin --url file:///path/to/plugin
 --install example-river, which seems to work, but it unpacks my
 jar. So, I tried zipping it first and then installing, which works 
 and does
 not unpack my jar, but it didn't help.

 When I issue this PUT request:

 http://localhost:9200/_river/example_river/_meta
 {
   "type": "example_river",
   "example_river": {
     "blah": "blah"
   }
 }

 I get this exception:

 [2014-04-20 22:28:46,538][DEBUG][river ] [Gloom] creating river 
 [example_river][example_river]
 [2014-04-20 22:28:46,543][WARN ][river ] [Gloom] failed to create 
 river [example_river][example_river] 
 org.elasticsearch.common.settings.NoClassSettingsException: Failed 
 to load class with value [example_river] at
 org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
  at
 org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
  at
 org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
  at
 org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
  at
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
  at
 org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
  at
 org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source) at
 

Re: Significant Term aggregation

2014-05-02 Thread Ramdev Wudali
I think I should clarify something. Even though my query is essentially a
filter, the significant terms aggregation is run against the body of the
documents (which is typical prose in a news document).

Here is an example:

Query: search the index for docs with a specific string in the field
Class_Text, with a significant_terms aggregation on the Body of the
document:
POST _search
{
  "size": 0,
  "query": {
    "nested": {
      "query": {
        "match": {
          "Class_Text": {
            "query": "Fuel Cell  Battery",
            "type": "boolean"
          }
        }
      },
      "path": "SMART_TERM"
    }
  },
  "aggregations": {
    "sigTerms": {
      "significant_terms": {
        "field": "BODY.v",
        "size": 1000
      }
    }
  }
}


..
{
  "key": "resistance",
  "doc_count": 68795,
  "score": 53.42999474620047,
  "bg_count": 129149
},
{
  "key": "patented",
  "doc_count": 42848,
  "score": 50.98806065128648,
  "bg_count": 52548
},
{
  "key": "marketintelligencecenter.com's",
  "doc_count": 33701,
  "score": 48.58994469232905,
  "bg_count": 34122
},
{
  "key": "for",
  "doc_count": 427040,
  "score": 47.73227955829178,
  "bg_count": 5483708
},
{
  "key": "html",
  "doc_count": 91658,
  "score": 46.79933234224686,
  "bg_count": 261374
},
{
  "key": "an",
  "doc_count": 348706,
  "score": 43.20270422802958,
  "bg_count": 4046974
},
{
  "key": "protection",
  "doc_count": 80987,
  "score": 43.187880126230326,
  "bg_count": 221159
},
{
  "key": "of",
  "doc_count": 430217,
  "score": 42.90990816758588,
  "bg_count": 6177535
},
{
  "key": "by",
  "doc_count": 364873,
  "score": 42.68719313911975,
  "bg_count": 4480098
},
...


As you can see, words like "for", "an", "of", and "by" are showing up in the
aggregation list with scores high enough to put them in the top 50
significant terms.

The documents are tagged with Class_Text after being classified, and that
tag value is what the query matches on.

In my case it would be more helpful if I could get phrases rather than
single terms. (I have yet to finish watching your presentation.)

Let me know if you have any insight.

Thanks much

Ramdev



On Fri, May 2, 2014 at 9:07 AM, Mark Harwood mark.harw...@elasticsearch.com
 wrote:



 your second concern, that the query criteria is not identifying a result
 set with any sense of cohesion, might be true. Basically, the search I am
 executing is a filter: the document metadata either has the value or it
 doesn't, so the result set may not be cohesive. The reason for me to use
 significant terms is so that the query can be enhanced to provide a more
 cohesive set of documents.


 We can probably debug that from the results of the agg. For each
 significant term you should get a score and all the ingredients that went
 into it are also available:
 1) The number of docs in the result set with the given term
 2) The size of your result set
 3) The number of docs in the index with the given term (see the bg_count
 value)
 4) The size of the index

 In a cohesive set you should see a reasonable difference between the term
 probabilities, i.e. the ratio (1)/(2) vs the ratio (3)/(4).
 If all your query has effectively selected is random docs with no common
 theme, then word usage in the foreground and background barely differs,
 (1)/(2) and (3)/(4) are practically the same, and you get a poor-scoring
 set of results.
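The four ingredients Mark lists can be turned into a quick sanity check. A minimal Python sketch with made-up counts (the actual Elasticsearch scoring heuristic differs; this only illustrates the foreground-vs-background intuition):

```python
# Mark's ingredients:
#   (1) docs in the result set containing the term   -> fg_count
#   (2) size of the result set                       -> fg_size
#   (3) docs in the whole index containing the term  -> bg_count
#   (4) size of the whole index                      -> bg_size
# A term is interesting when its foreground probability (1)/(2)
# is much higher than its background probability (3)/(4).

def popularity_ratio(fg_count, fg_size, bg_count, bg_size):
    fg_prob = fg_count / fg_size   # (1) / (2)
    bg_prob = bg_count / bg_size   # (3) / (4)
    return fg_prob / bg_prob

# Cohesive result set: "battery" is 20x more frequent in the sample
# than in the index as a whole.
cohesive = popularity_ratio(fg_count=400, fg_size=1_000,
                            bg_count=20_000, bg_size=1_000_000)

# Random sample: "for" appears at roughly its background rate, so the
# ratio is close to 1 and the term scores poorly.
random_docs = popularity_ratio(fg_count=550, fg_size=1_000,
                               bg_count=548_000, bg_size=1_000_000)

print(cohesive)     # ~20.0
print(random_docs)  # ~1.0
```

If your stopwords show ratios near 1 yet still rank highly, that points at one of the mixed-index or small-sample causes Mark lists below rather than a scoring bug.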









 On Thursday, 1 May 2014 10:04:15 UTC-5, Mark Harwood wrote:

 Thanks for the feedback, Ramdev.


 What I noticed in my aggregation results is  a lot of Stopwords (a, an,
 the, at, and, etc.) being included as significant terms.


 These sorts of terms shouldn't really need any sort of special
 treatment. If they are appearing as suggestions then I expect one of the
 following statements to be true:

 1) You have a very small number of docs in the result set representing
 the foreground sample. Significant terms needs a reasonable number of
 docs in a sample to draw any real conclusions
 2) You have query criteria that is not identifying a result set with any
 sense of cohesion e.g. a query for random docs
 3) You have changed the set of stopwords in use in your index - what
 previously never used to appear at all is now suddenly common or
 vice-versa.
 4) You are querying across mixed indices or doc-types (one with
 stop-words, one without) and we fail to tune-out the stopwords as part of
 the results merging process because one small index reports 

Re: Handling updates from multiple sources

2014-05-02 Thread Michał Zgliczyński
My system is changing rapidly. The end goal is to have all the data inside 
the ES index. As currently set up, I have 2 different systems that write to 
the ES index:
1) A bulk job: run through all the dbs, fetch records in batch updates of 5k, 
and send them to ES.
2) A live updating job: pick up the newest changes and send them to ES, as 
either updates or inserts. Note: the updates don't contain full documents.

After steps (1) and (2) I would like an (almost) 100% guarantee that the 
index is complete and up to date.

I think this is a quite common use case if you want an index with live data, 
not data that is stale as of the beginning of the bulk job.

On Thursday, May 1, 2014 at 7:45:53 PM UTC-7, Rob Ottaway wrote:

 I missed that the later doc would only be partial. What is the reason to 
 use the partial doc? That really complicates things.

 Filling in missing fields is going to be a very large headache. You'll 
 probably kill performance trying to do it, too, and it will likely be so 
 complex that it causes more trouble than it solves.

 I think if you can better present the overall use cases you will get 
 better insight into how to work this out.


 On Thursday, May 1, 2014 4:51:03 PM UTC-7, Michał Zgliczyński wrote:

 Hi,
 Thank you for your response. I have looked through this blog post: 
 http://www.elasticsearch.org/blog/elasticsearch-versioning-support/
 It looks as if external versioning would be the way to go. Have the 
 timestamps act as version numbers and let ES only pick the document with 
 the newest version as the correct document. However, with the situation I 
 have presented above, ES will fail. A quote from the post:
 With version_type set to external, Elasticsearch will store the version 
 number as given and will not increment it. Also, instead of checking for an 
 exact match, Elasticsearch will only return a version collision error if 
 the version currently stored is greater or equal to the one in the indexing 
 command. This effectively means “only store this information if no one else 
 has supplied the same or a more recent version in the meantime”. 
 Concretely, the above request will succeed if the stored version number is 
 smaller than 526. 526 and above will cause the request to fail.
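The rule quoted above can be modelled in a few lines. This is a simplified sketch of the version-comparison semantics only, not the actual Elasticsearch implementation:

```python
# Simplified model of version_type=external: a write is accepted only if
# its version is strictly greater than the version currently stored;
# otherwise it is a version conflict.

def external_version_accepts(stored_version, new_version):
    """Return True if the write would be accepted under external versioning."""
    if stored_version is None:          # no document stored yet
        return True
    return new_version > stored_version

# The blog's example: with version 525 stored, 526 succeeds; once 526
# is stored, writing 526 again (or anything lower) fails.
print(external_version_accepts(525, 526))  # True
print(external_version_accepts(526, 526))  # False
```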

 In my example, we would have exactly that situation: a partial doc with a 
 larger version number (later timestamp) is already stored in ES, and then 
 we get the complete document with a smaller timestamp. In this situation we 
 would like to merge the 2 documents so that we keep all of the fields from 
 the partial doc and fill the remaining fields (not currently present in the 
 ES document) from the complete document.
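The desired merge can be sketched with plain Python dicts, assuming a trusted `timestamp` field on both documents (the field names and the in-memory model here are hypothetical; in ES this logic would have to live in an update script):

```python
# Merge rule:
# - if the incoming doc is newer, it wins entirely;
# - otherwise keep the stored (possibly partial) doc and only fill in
#   the fields it is missing from the older complete doc.

def merge(stored, incoming):
    if incoming["timestamp"] > stored["timestamp"]:
        return dict(incoming)                # newer doc replaces outright
    merged = dict(stored)                    # stored partial doc wins
    for field, value in incoming.items():
        merged.setdefault(field, value)      # fill only the missing fields
    return merged

# A partial live update (t=20) is already stored; the bulk job then
# delivers the complete document with an older timestamp (t=10).
stored = {"timestamp": 20, "title": "new title"}
incoming = {"timestamp": 10, "title": "old title", "body": "full text"}

print(merge(stored, incoming))
# {'timestamp': 20, 'title': 'new title', 'body': 'full text'}
```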

 Thanks!
 Michal Zgliczynski

 On Thursday, May 1, 2014 at 2:58:31 PM UTC-7, Rob Ottaway wrote:

 Have you looked at using versioning?


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning

 cheers,
 Rob

 On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:

 Hi,

 I am building a system in which I will have two sources of updates:
 1) Bulk updates from the source of truth (db) - always inserting 
 complete documents
 2) Live updates - inserts and updates (complete and incomplete 
 docs)

 Also, let's assume that each insert/update carries a timestamp that we 
 trust (not the ES timestamp).

 The idea is to have a complete, up-to-date index once the bulk updating 
 finishes. To achieve this I need to guarantee that I end up with the 
 correct data. This would mostly work if every operation were an upsert and 
 the inserts/updates coming into ES had strictly increasing timestamps.
 But one could imagine a problematic situation:

 1) We are performing bulk indexing:
   a) we read an object from the db
   b) process it
   c) send it to ES.
 2) We get an update on the same object after step (a) and before it 
 makes it to ES in phase (c) of the bulk update. That is, ES gets an update 
 with new data, and only after that gets the insert with the entire 
 document (containing older data) from the source of truth. Hence, in ES we 
 have a document with a newer timestamp than the one newly added in 
 phase (c).

 My theoretical solution: for each operation, keep the timestamp of 
 that change (the timestamp from the system that made the change, not from 
 Elastic Search). Let's say that all of the operations we perform are 
 upserts.
 Then once we get an insert or an update (let's call it doc), we have to 
 perform the following script (pseudo mvel) inside ES.
 {
   if (doc.timestamp > ctx._source.timestamp) {
 // doc is newer than what was in ES
 upsert(doc); // update the index with all of the info from the new 
 doc
   } else {
 // there is already a document in ES with a newer timestamp; note, 
 this may be an incomplete document (an update)
 __fill the missing fields in the document in ES with values from 
 doc__
   }