Re: Complete cluster failure

2014-03-18 Thread Ivan Brusic
My mind was not clear, since I had been debugging this issue for a few hours.
Once I realized it was a multicast issue, I switched to unicast and the
cluster was back up and running. So it was multicast after all. I should
have been more careful when I received an email on Friday that said
" will have to wait till early next week due to errors on the
host." Errors on the host? I should have made them explain themselves.
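For the archives, the switch amounts to a small config change. A minimal sketch of the relevant elasticsearch.yml settings (the host addresses below are placeholders, not my actual nodes):

```yaml
# Disable multicast discovery and list seed hosts explicitly.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.50.101:9300", "192.168.50.102:9300"]
```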

I do not have control over the sysadmin aspects of the system. If I did, we
would be running the latest stable releases of Java, Elasticsearch, etc.

Thanks,

Ivan


On Tue, Mar 18, 2014 at 11:11 PM, 熊贻青  wrote:

> How many NICs are there on each of your nodes? We hit some issues on boxes
> with 4 NICs: some addresses were not reachable due to a Linux kernel setting.
> I'd suggest you test the full connection matrix via a shell script, so
> as to rule out this cause.
> My 2 cents
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAP0hgQ2Ks4Sg%2B0xoiKsX-jmb3oa7L%3D94BD0E9u5sk_uT7f0ohw%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



Community kibana panel repo

2014-03-18 Thread Mark Walkom
Hi all,
Someone mentioned this in the IRC channel, and I was hoping to kick-start a
page on the ES/KB site similar to the ES plugin list (if the ES team is
willing).

If anyone has some nice kibana panels they are willing to share, please
drop a note with a link to the gist/pastebin/etc and we can start to
collate them here for now.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



Re: Local JVM node

2014-03-18 Thread David Pilato
Maybe this could help you:
https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/test/TestCluster.java#L632

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 19 March 2014 at 01:10, Mohit Anchlia  wrote:

I removed local(true) from the builder call and it seems to come up on 9300 now

> On Tue, Mar 18, 2014 at 3:55 PM, Mohit Anchlia  wrote:
> I am not seeing anything come up on port 9300.
>  
> this is the log that shows only port 9200
>  
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] version[1.0.0], 
> pid[11624], build[a46900e/2014-02-12T16:18:34Z]
> 
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initializing ...
> 
> INFO org.elasticsearch.plugins [Thread-0]: [Agent Zero] loaded [], sites []
> 
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initialized
> 
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] starting ...
> 
> INFO org.elasticsearch.transport [Thread-0]: [Agent Zero] bound_address 
> {local[1]}, publish_address {local[1]}
> 
> INFO org.elasticsearch.cluster.service [elasticsearch[Agent 
> Zero][clusterService#updateTask][T#1]]: [Agent Zero] new_master [Agent 
> Zero][CDptnrekR-yL5eH91y7Yiw][SDGL0770E674E03][local[1]]{local=true}, reason: 
> local-disco-initial_connect(master)
> 
> INFO org.elasticsearch.discovery [Thread-0]: [Agent Zero] 
> elasticsearch/CDptnrekR-yL5eH91y7Yiw
> 
> INFO org.elasticsearch.http [Thread-0]: [Agent Zero] bound_address 
> {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200]}
> 
> INFO org.elasticsearch.gateway [elasticsearch[Agent 
> Zero][clusterService#updateTask][T#1]]: [Agent Zero] recovered [1] indices 
> into cluster_state
> 
> 
> 
>> On Tue, Mar 18, 2014 at 3:45 PM, Ivan Brusic  wrote:
>> Also, you should see something along the lines of
>> 
>> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
>> {inet[/192.168.52.162:9300]}
>> 
>> Note the 9300
>> 
>> 
>> 
>>> On Tue, Mar 18, 2014 at 3:44 PM, Ivan Brusic  wrote:
>>> It appears that everything is running, but you simply might be using the 
>>> wrong ports.
>>> 
>>> In your code:
>>> client = new TransportClient()
>>> .addTransportAddress(new InetSocketTransportAddress(host, port));  
>>> 
>>> What value are you using for the port? 9200 or 9300? Make sure it is 9300. 
>>> 
>>> 
>>> 
 On Tue, Mar 18, 2014 at 3:37 PM, Mohit Anchlia  
 wrote:
 I see, is there a way to run embedded ES that allows me to connect using 
 TransportClient?
 
> On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:
> TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST 
> traffic.
> 
> -- 
> Ivan
> 
> 
>> On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia  
>> wrote:
>> I do see a process listening on port 9200, and the logs also seem to 
>> indicate that it's connected on that port
>> 
>>> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>>> Just a guess, but perhaps creating a local node client does not 
>>> instantiate the internal Jetty server.
>>> 
>>> -- 
>>> Ivan
>>> 
>>> 
 On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia 
  wrote:
 I am trying to form a cluster between my test elasticsearch that runs 
 in jvm with my main code that connects using transportclient but it 
 doesn't seem to work:
  
 My test class code:
  
 
node = nodeBuilder().local(true).node();
client = node.client();   
  
 // This connects fine
 INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent] 
 bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
 {inet[/172.28.172.154:9200]}
 
 // Now the transport client code
   client = new TransportClient()
 .addTransportAddress(new InetSocketTransportAddress(host, 
 port));  
  
  
 // This fails to connect to the one I started in local(true) mode:
 INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to 
 get node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]], 
 disconnecting...
 org.elasticsearch.transport.ReceiveTimeoutTransportException: 
 [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0] 
 timed out after [5002ms]
  at 
 org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
  
  
  

Re: Complete cluster failure

2014-03-18 Thread 熊贻青
How many NICs are there on each of your nodes? We hit some issues on boxes
with 4 NICs: some addresses were not reachable due to a Linux kernel setting.
I'd suggest you test the full connection matrix via a shell script, so
as to rule out this cause.
My 2 cents
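Something like this sketch works for the one-host-to-all direction (the node list and port are placeholders; a full matrix would run the same loop from every node, e.g. via ssh):

```shell
#!/bin/bash
# Reachability check from this host to each node's transport port.
# NODES and PORT are placeholders -- substitute your own values.
NODES="192.168.50.101 192.168.50.102 192.168.50.103"
PORT=9300

check() {
  # Prints "open" if a TCP connection to $1:$2 succeeds within 1 second,
  # "closed" otherwise (uses bash's /dev/tcp pseudo-device).
  if timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

for host in $NODES; do
  echo "$host:$PORT $(check "$host" "$PORT")"
done
```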



Re: Complete cluster failure

2014-03-18 Thread Ivan Brusic
No matter in what order I restart the servers, the same 4-node clusters get
created. I suspect the network, especially since there was some work done this
past Friday on the underlying VM host. Would Elasticsearch cache multicast
information? The servers have not been restarted in at least a week.

Ivan


On Tue, Mar 18, 2014 at 10:03 PM, Ivan Brusic  wrote:

> I have been running Elasticsearch for years and I have never encountered a
> collapse such as the one I am experiencing. Even when experiencing split
> brain clusters, I still had it running and accepting search requests.
>
> 8-node development cluster running 0.90.2 using multicast. The last time the
> cluster was fully restarted was probably when it was upgraded to 0.90.2
> (July 2013?). All nodes are master- and data-enabled.
>
> I decided to upgrade to Java 7u25 from 7u04. Clients were upgraded first
> with no issues. Restarted 2 nodes on the cluster, once again, no issues.
> Attempting to restart the next two wreaked havoc on the cluster. Only 5
> nodes were able to form a cluster. The other 3 nodes were not able to join.
> Disabled gateway options, removed some plugins. Nothing. The same 3 would
> not join.
>
> After resigning myself to the fact that I must have bad state files, I decided
> to remove the data directory and restart from scratch. The nodes still
> output the same message over and over again:
>
> [2014-03-18 21:41:18,333][DEBUG][monitor.network  ] [search5]
> net_info
> host [srch-dv105]
> eth0 display_name [eth0]
> address [/fe80:0:0:0:250:56ff:feba:9b%2] [/192.168.50.105]
>  mtu [1500] multicast [true] ptp [false] loopback [false] up [true]
> virtual [false]
> lo display_name [lo]
>  address [/0:0:0:0:0:0:0:1%1] [/127.0.0.1]
> mtu [16436] multicast [false] ptp [false] loopback [true] up [true]
> virtual [false]
> ...
> [2014-03-18 21:24:19,414][INFO ][node ] [search5]
> {0.90.2}[30297]: initialized
> [2014-03-18 21:24:19,414][INFO ][node ] [search5]
> {0.90.2}[30297]: starting ...
> [2014-03-18 21:24:19,459][DEBUG][netty.channel.socket.nio.SelectorUtil]
> Using select timeout of 500
> [2014-03-18 21:24:19,461][DEBUG][netty.channel.socket.nio.SelectorUtil]
> Epoll-bug workaround enabled = false
> [2014-03-18 21:24:19,953][DEBUG][transport.netty  ] [search5]
> Bound to address [/0:0:0:0:0:0:0:0:9300]
> [2014-03-18 21:24:19,966][INFO ][transport] [search5]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.50.105:9300]}
> [2014-03-18 21:24:25,124][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:30,172][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:35,176][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:40,276][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:45,280][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:50,017][WARN ][discovery] [search5]
> waited for 30s and no initial state was set by the discovery
> [2014-03-18 21:24:50,019][INFO ][discovery] [search5]
> development/0iuC15VyQ32GdRbZ3kzLLQ
> [2014-03-18 21:24:50,020][DEBUG][gateway  ] [search5]
> can't wait on start for (possibly) reading state from gateway, will do it
> asynchronously
> [2014-03-18 21:24:50,063][INFO ][http ] [search5]
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 192.168.50.105:9200]}
> [2014-03-18 21:24:50,064][INFO ][node ] [search5]
> {0.90.2}[30297]: started
> [2014-03-18 21:24:50,283][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:24:55,287][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:00,290][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:05,294][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:10,297][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:15,301][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:20,305][DEBUG][discovery.zen] [search5]
> filtered ping responses: (filter_client[true], filter_data[false]) {none}
> [2014-03-18 21:25:25,309][DEB

Complete cluster failure

2014-03-18 Thread Ivan Brusic
I have been running Elasticsearch for years and I have never encountered a
collapse such as the one I am experiencing. Even when experiencing split
brain clusters, I still had it running and accepting search requests.

8-node development cluster running 0.90.2 using multicast. The last time the
cluster was fully restarted was probably when it was upgraded to 0.90.2
(July 2013?). All nodes are master- and data-enabled.

I decided to upgrade to Java 7u25 from 7u04. Clients were upgraded first
with no issues. Restarted 2 nodes on the cluster, once again, no issues.
Attempting to restart the next two wreaked havoc on the cluster. Only 5
nodes were able to form a cluster. The other 3 nodes were not able to join.
Disabled gateway options, removed some plugins. Nothing. The same 3 would
not join.

After resigning myself to the fact that I must have bad state files, I decided to
remove the data directory and restart from scratch. The nodes still output
the same message over and over again:

[2014-03-18 21:41:18,333][DEBUG][monitor.network  ] [search5]
net_info
host [srch-dv105]
eth0 display_name [eth0]
address [/fe80:0:0:0:250:56ff:feba:9b%2] [/192.168.50.105]
mtu [1500] multicast [true] ptp [false] loopback [false] up [true] virtual
[false]
lo display_name [lo]
address [/0:0:0:0:0:0:0:1%1] [/127.0.0.1]
mtu [16436] multicast [false] ptp [false] loopback [true] up [true] virtual
[false]
...
[2014-03-18 21:24:19,414][INFO ][node ] [search5]
{0.90.2}[30297]: initialized
[2014-03-18 21:24:19,414][INFO ][node ] [search5]
{0.90.2}[30297]: starting ...
[2014-03-18 21:24:19,459][DEBUG][netty.channel.socket.nio.SelectorUtil]
Using select timeout of 500
[2014-03-18 21:24:19,461][DEBUG][netty.channel.socket.nio.SelectorUtil]
Epoll-bug workaround enabled = false
[2014-03-18 21:24:19,953][DEBUG][transport.netty  ] [search5] Bound
to address [/0:0:0:0:0:0:0:0:9300]
[2014-03-18 21:24:19,966][INFO ][transport] [search5]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
192.168.50.105:9300]}
[2014-03-18 21:24:25,124][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:30,172][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:35,176][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:40,276][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:45,280][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:50,017][WARN ][discovery] [search5]
waited for 30s and no initial state was set by the discovery
[2014-03-18 21:24:50,019][INFO ][discovery] [search5]
development/0iuC15VyQ32GdRbZ3kzLLQ
[2014-03-18 21:24:50,020][DEBUG][gateway  ] [search5] can't
wait on start for (possibly) reading state from gateway, will do it
asynchronously
[2014-03-18 21:24:50,063][INFO ][http ] [search5]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
192.168.50.105:9200]}
[2014-03-18 21:24:50,064][INFO ][node ] [search5]
{0.90.2}[30297]: started
[2014-03-18 21:24:50,283][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:55,287][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:00,290][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:05,294][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:10,297][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:15,301][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:20,305][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:25,309][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:30,312][DEBUG][discovery.zen] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
...

Attempting to bring up other nodes results in the same type of error
repeatedly, but slightly different:
[2014-03-18 21:46:31,200][DEBUG][discovery.zen] [search8]
filtered ping responses: (filter_client[true], filter_data[false])
--> target [

Re: Exist filter not giving consistent results in version 0.90.6

2014-03-18 Thread Narinder Kaur

Thanks for letting me know about that aspect. Currently I am not using 
preference; I will check it and let you know.


On Wednesday, 19 March 2014 05:04:48 UTC+5:30, Binh Ly wrote:
>
> Is it consistent if you specify preference=_primary in your search request?
>
> GET localhost:9200/_search?preference=_primary
>
> If yes, I'd check the logs to see if there were any failures in there 
> related to indexing data.
>



Re: MoreLikeThis ignores queries?

2014-03-18 Thread Alexey Bagryancev
Can anyone help me? It really does not work...

On Wednesday, March 19, 2014 at 2:05:49 UTC+7, Alexey Bagryancev wrote:
>
> Hi,
>
> I am trying to filter moreLikeThis results by adding an additional query, 
> but it seems to be ignored entirely.
>
> I tried running my ignoreQuery separately and it works fine, but how do I 
> make it work with moreLikeThis? Please help me.
>
> $ignoreQuery = $this->IgnoreCategoryQuery('movies')
>
>
>
> $this->resultsSet = $this->index->moreLikeThis(
>new \Elastica\Document($id), 
>array_merge($this->mlt_fields, array('search_size' => $this->
> size, 'search_from' => $this->from)), 
>$ignoreQuery);
>
>
>
> My IgnoreCategory function:
>
> public function IgnoreCategoryQuery($category = 'main') 
> { 
>  $categoriesTermQuery = new \Elastica\Query\Term();
>  $categoriesTermQuery->setTerm('categories', $category);
>  
>  $categoriesBoolQuery = new \Elastica\Query\Bool(); 
>  $categoriesBoolQuery->addMustNot($categoriesTermQuery);
>  
>  return $categoriesBoolQuery;
> }
>
>
>



Re: questions about aggregation min_doc_count = 0

2014-03-18 Thread John Stanford
Thanks Matt, I suspected as much on #1.  I think it might save a little 
post-processing if it provided buckets for the specified range.  The issue 
appears to be logged as 
https://github.com/elasticsearch/elasticsearch/issues/5224 and a pull request 
has been made.   I tried the filter on #2, and it still picked up hosts that 
weren’t in that doc type, so I filed 
https://github.com/elasticsearch/elasticsearch/issues/5458.
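For reference, the proposal in issue 5224 is to let the histogram generate buckets across a fixed range regardless of the indexed data. If it lands in the form discussed there (the parameter name and syntax below are my reading of the issue, not a shipped API, and the timestamps are just an example range), the request would look roughly like:

```json
{
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "300s",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2014-03-15T00:00:00",
          "max": "2014-03-16T00:00:00"
        }
      }
    }
  }
}
```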

Cheers,

John

jxstanf...@gmail.com
@jxstanford



On Mar 18, 2014, at 13:17:10, Matt Weber  wrote:

> 1.  The histogram aggregation (and facet) works on indexed values, not on 
> the current time or "now".  So, if the last indexed document timestamp is 
> 3/15/14T16:15, you will not get empty buckets between 3/15/14T16:15 and the 
> current time.  It would be interesting to be able to set the "to" and "from" 
> on histogram-based aggregations to allow generating buckets at intervals 
> across the defined range.
> 
> 2.  I believe this is because the keys are pulled from the fielddata, which is 
> index-level data.  So if you are using the "all" index, you are going to get 
> data from all indices.  Not sure if this is a bug or not.  You can try 
> applying a filter aggregation:
> 
> POST _all/summary_phys/_search
> {
>   "aggs": {
> "summary_phys_events": {
>   "filter": {
> "type": {"value": "summary_phys_events"}
>   },
>   "aggs": {
> "events_by_date": {
>   "date_histogram": {
> "field": "@timestamp",
> "interval": "300s",
> "min_doc_count": 0
>   },
>   "aggs": {
> "events_by_host": {
>   "terms": {
> "field": "host.raw",
> "min_doc_count": 0
>   },
>   "aggs": {
> "avg_used": {
>   "avg": {
> "field": "used"
>   }
> },
> "max_used": {
>   "max": {
> "field": "used"
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>   }
> }
> 
> 
> 
> 
> 
> On Tue, Mar 18, 2014 at 12:39 PM, John Stanford  wrote:
> Hi,
> 
> I'm trying to get a better understanding of aggregations, so here are a 
> couple of questions that came up recently.
> 
> Question 1:
> 
> I have some time based data that I am using aggregations to chart.  The data 
> may be sparsely populated, so I've been setting min_doc_count to 0 so I get 
> empty buckets back anyway.  I've noticed that it will fill in empty buckets 
> unless they are before or after the first record of the range.  
> 
> For example, if I use a query similar to the one below, and there are no 
> records after 3/15/14T16:15, the last aggregation record will be for 
> 3/15/14T16:15.  On the other hand, if there is a gap in between the start 
> time and 3/15/14T16:15, I will get a bucket with a 0 doc count (as expected). 
>  
> 
> POST _all/summary_phys/_search
> 
> {
>"aggs": {
>   "events_by_date": {
>  "date_histogram": {
> "field": "@timestamp",
> "interval": "300s",
> "min_doc_count": 0
>  },
>  "aggs": {
> "events_by_host": {
>"terms": {
>   "field": "host.raw"
>},
>"aggs": {
>   "avg_used": {
>  "avg": {
> "field": "used"
>  }
>   },
>   "max_used": {
>  "max": {
> "field": "used"
>  }
>   }
>}
> }
>  }
>   }
>}
> }
> 
> Not getting the 0 doc count buckets back at the front and back of the range 
> seems contrary to the documented purpose of min_doc_count.  Am I doing 
> something wrong?
> 
> Question 2:
> 
> 
> If I add a min_doc_count = 0 to the inner aggregation, but limit the search 
> to a specific doc type like:
> 
>   doc type
>v
> POST _all/summary_phys/_search
> {
>"aggs": {
>   "events_by_date": {
>  "date_histogram": {
> "field": "@timestamp",
> "interval": "300s",
> "min_doc_count": 0
>  },
>  "aggs": {
> "events_by_host": {
>"terms": {
>   "field": "host.raw",
>   "min_doc_count": 0
>},
>"aggs": {
>   "avg_used": {
>  "avg": {
> "field": "used"
>  }
>   },
>   "max_used": {
>  "max": {
> "field": "used"
>  }
>   }
>}
> }
>  }
>   }
>}
> }
> 
> I get bu

JVM Heap usage didn't decline after clearing field data cache

2014-03-18 Thread Gary Gao
Hi,
 I have a total of 8GB of JVM heap; its usage was near 50%.
 When I cleared the field data cache (almost 1GB) using the Clear Cache API:
$ curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'
  the field data cache size decreased, indeed, but the JVM heap usage didn't
decline.
  What's the reason?
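(A likely explanation, for the record: clearing the cache only releases references to the cached objects; the JVM does not reclaim that heap until the next garbage collection runs, so reported usage stays flat until then. The GC counts alongside heap usage can be watched via the node stats API — the endpoint form below is the 1.x API, so adjust for your version:)

```shell
# Clear the field data cache, then inspect JVM heap usage and GC counts.
curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
```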

   



JVM Heap usage didn't decline after clearing field data cache by clear cache API

2014-03-18 Thread Gary Gao
Hi,
 When I cleared the field data cache using the Clear Cache API:
$ curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'
  the field data cache size decreased, indeed, but the JVM heap usage didn't
decline.
  What's the reason?

 
 



Complex Query

2014-03-18 Thread Yogesh Shetty
I recently downloaded and started exploring Elasticsearch and went through 
the search API. I am still trying to figure out the best way to phrase 
this complex query, e.g. what the Elasticsearch query would look like 
for the conditional operators below.

(Source = "Crystal" and ReportName = "BOLI.rpt") Or 
( (ParameterName = "EffDate" And FieldName = "SourceId") Or 
  (ParameterName = "Type" and FieldName = "Cusip" ) )

Do I need to take the filter route, or can this be done using a query?
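For example, I imagine something like the following nested bool filter, if that is the right direction (field names and values copied from the expression above; this assumes the fields are not_analyzed so term filters match the literal values):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "bool": { "must": [
              { "term": { "Source": "Crystal" } },
              { "term": { "ReportName": "BOLI.rpt" } }
            ] } },
            { "bool": { "must": [
              { "term": { "ParameterName": "EffDate" } },
              { "term": { "FieldName": "SourceId" } }
            ] } },
            { "bool": { "must": [
              { "term": { "ParameterName": "Type" } },
              { "term": { "FieldName": "Cusip" } }
            ] } }
          ]
        }
      }
    }
  }
}
```

The two inner OR branches collapse into a single "should" list alongside the first AND pair, since OR is associative.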

thanks in advance

Cheers
Yogesh



Re: Some questions on wikipedia river and cluster config

2014-03-18 Thread joergpra...@gmail.com
You can raise the bulk_size parameter from its default of 100, raise
max_concurrent_bulk from 1, and disable flush_interval by changing it from 5s
to -1 to increase wikipedia river bulk performance. Also make sure the bzip2
file is downloading quickly enough over the wire, which is often not the case
(some KB/sec here). Otherwise, download it to the local file system and change
the url parameter to point to this file.
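For example, when creating the river (the settings names are the ones above; the values and the file path are illustrative, and the exact placement of the bulk settings may differ by plugin version):

```json
PUT /_river/wikipedia/_meta
{
  "type": "wikipedia",
  "wikipedia": {
    "url": "file:///path/to/enwiki-pages-articles.xml.bz2"
  },
  "index": {
    "bulk_size": 1000,
    "max_concurrent_bulk": 4,
    "flush_interval": "-1"
  }
}
```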

The cluster stats show the Lucene indexing time. If you want overall
indexing time, you should take measurements at client side.

Jörg


On Tue, Mar 18, 2014 at 10:59 PM, jorge canellas <
jorge.canellas@gmail.com> wrote:

> Hi!
>
> That was something I had suspected, since the indexing time shown in
> the indexing statistics is much lower than the real time the process is
> running. For every 5 seconds that the process is running, only a bit more
> than a second is spent indexing.
> I was thinking of parsing the wikipedia dumps manually and then using the
> bulk API to index them.
> Only one question: does the indexing time shown in the stats of the cluster
> take into account the time used to choose the shard, send the document
> through the network, analyze the document and write the terms, or only the
> time spent writing the terms in the index? I am asking this because I am
> interested in the indexing performance of Elasticsearch, and I could use
> this time and ignore the time spent by the river parsing the pages of the
> wikipedia.
> El 18/03/2014 20:32, "Ivan Brusic"  escribió:
>
>> It all depends on where the bottleneck is. A river, by design, runs on
>> only one node. Perhaps Elasticsearch is quickly indexing the content, but
>> the Wikipedia ingestion is the slowdown. Is the Wikipedia indexer threaded
>> or are all the bulks being executed on one thread? The BulkProcessor
>> supports multithreading.
>>
>> Have you modified any of the default settings? In particular the merge
>> settings and throttling. Elasticsearch has low defaults for throttling out
>> of the box. If you have fast disks, you can easily raise the values. Are
>> you searching on the index while it is being built? If you are just
>> building the index, you can increase the number of segments and/or reduce
>> the amount of merging done at once. But it all depends on where the
>> bottleneck is. If your disk performance is fine, you might need to look
>> elsewhere.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Tue, Mar 18, 2014 at 10:52 AM, jorge canellas <
>> jorge.canellas@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I am trying to index the wikipedia dumps (downloaded and uncompressed)
>>> using the wikipedia river, but it takes about 6 hours in the cluster.
>>> I have increased the number of nodes and primary shards from 5 to 8, but
>>> the performance does not increase.
>>>
>>>- I have set the refresh time to -1, the number of replicas to 0,
>>>increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the
>>>mlockall, and set the buffer to 30% from 10%.
>>>- I have changed the river settings increasing the bulk_size from
>>>100 to 10k, refresh interval is set to 1s
>>>- I have changed the mapping to only index title and text. And set
>>>_source to false.
>>>
>>>
>>> I do not know what else I can modify to increase the indexing rate; at
>>> this moment it is indexing about 120 docs/sec.
>>>
>>> Any ideas?
>>>
>>> Kind regards,
>>>
>>> Jorge
>>>

Re: Local JVM node

2014-03-18 Thread Mohit Anchlia
I removed local(true) from the builder call and it seems to come up on 9300
now
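In case it helps others, the working shape was roughly the following (0.90/1.0-era API; the cluster name and port are placeholders for whatever your config uses):

```java
// Embedded node bound to the real transport layer (default port 9300), so an
// external TransportClient can reach it. local(true) instead binds only the
// in-JVM "local" transport, which a TransportClient cannot connect to.
Node node = NodeBuilder.nodeBuilder()
        .clusterName("test-cluster")   // must match the client's cluster.name
        .local(false)
        .node();
Client nodeClient = node.client();

// The external client side, pointing at the embedded node:
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "test-cluster")
        .build();
TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
```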

On Tue, Mar 18, 2014 at 3:55 PM, Mohit Anchlia wrote:

> I am not seeing anything come up on port 9300.
>
> this is the log that shows only port 9200
>
>
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] version[1.0.0],
> pid[11624], build[a46900e/2014-02-12T16:18:34Z]
>
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initializing ...
>
> INFO org.elasticsearch.plugins [Thread-0]: [Agent Zero] loaded [], sites []
>
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initialized
>
> INFO org.elasticsearch.node [Thread-0]: [Agent Zero] starting ...
>
> INFO org.elasticsearch.transport [Thread-0]: [Agent Zero] bound_address
> {local[1]}, publish_address {local[1]}
>
> INFO org.elasticsearch.cluster.service [elasticsearch[Agent
> Zero][clusterService#updateTask][T#1]]: [Agent Zero] new_master [Agent
> Zero][CDptnrekR-yL5eH91y7Yiw][SDGL0770E674E03][local[1]]{local=true},
> reason: local-disco-initial_connect(master)
>
> INFO org.elasticsearch.discovery [Thread-0]: [Agent Zero]
> elasticsearch/CDptnrekR-yL5eH91y7Yiw
>
> INFO org.elasticsearch.http [Thread-0]: [Agent Zero] bound_address
> {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200
> ]}
>
> INFO org.elasticsearch.gateway [elasticsearch[Agent
> Zero][clusterService#updateTask][T#1]]: [Agent Zero] recovered [1] indices
> into cluster_state
>
>
> On Tue, Mar 18, 2014 at 3:45 PM, Ivan Brusic  wrote:
>
>> Also, you should see something along the lines of
>>
>> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
>> 192.168.52.162:9300]}
>>
>> Note the 9300
>>
>>
>>
>> On Tue, Mar 18, 2014 at 3:44 PM, Ivan Brusic  wrote:
>>
>>> It appears that everything is running, but you simply might be using the
>>> wrong ports.
>>>
>>> In your code:
>>> client = new TransportClient()
>>> .addTransportAddress(new InetSocketTransportAddress(host,
>>> port));
>>>
>>>  What value are you using for the port? 9200 or 9300? Make sure it is
>>> 9300.
>>>
>>>
>>>
>>> On Tue, Mar 18, 2014 at 3:37 PM, Mohit Anchlia 
>>> wrote:
>>>
 I see, is there a way to run embedded ES that allows me to connect
 using TransportClient?

 On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:

> TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
> traffic.
>
> --
> Ivan
>
>
> On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia  > wrote:
>
>> I do see process listening on port 9200 and also the logs seems to
>> indicate that it's connected on that port
>>
>> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>>
>>> Just a guess, but perhaps creating a local node client does not
>>> instantiate the internal Jetty server.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia <
>>> mohitanch...@gmail.com> wrote:
>>>
  I am trying to form a cluster between my test elasticsearch that
 runs in jvm with my main code that connects using transportclient but 
 it
 doesn't seem to work:

 My test class code:


node = nodeBuilder().local(true).node();
client = node.client();

 // This connects fine
 INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent]
 bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
 172.28.172.154:9200]}

 // Now the transport client code
   client = new TransportClient()
 .addTransportAddress(new InetSocketTransportAddress(host,
 port));


 // This fails to connect to the one I started in local(true) mode:
 INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to
 get node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
 disconnecting...
 org.elasticsearch.transport.ReceiveTimeoutTransportException:
 [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0]
 timed out after [5002ms]
  at
 org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
  at java.lang.Thread.run(Unknown Source)




 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it,
 send an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com

Can aggregation return the documents in each bucket?

2014-03-18 Thread Erich Lin
It seems aggregations return the count of documents for each bucket.

Can I also retrieve the documents within each bucket that matched that 
bucket criteria?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0e5c1001-e3bb-4345-849e-3b77816411d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Terms facet not working with not_analyzed fields and dynamic template

2014-03-18 Thread Binh Ly
Just FYI, the dynamic mapping applies only to fields for which you have no 
explicit mappings defined. In your case, you predefined field1 and field2 
as type string, so the dynamic mapping will ignore those 2 fields and will 
not be applied to them. That is probably the cause of your query behavior.

If you remove field1 and field2 from your initial mapping, I think you 
should get what you are expecting.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/afdbdac5-7315-42ef-a2b9-c30a488408ec%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: slow indexing geo_shape

2014-03-18 Thread Binh Ly
I'd experiment with the precision setting and relax it a bit to see how 
much indexing speed improves.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4c031a14-95ad-411d-8511-b489a6706c57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: create panel to display latency time using two datetime fields

2014-03-18 Thread Binh Ly
If you are referring to a Kibana panel, it sounds like you want something 
like the histogram panel, except you want a script to compute the value 
instead of reading a field. The date histogram facet actually supports a 
value_script computation, so you should be able to make a copy of the 
Kibana histogram panel and slightly modify its call to the date histogram 
facet to pass your script in via the value_script field.
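
As a rough illustration, a date histogram facet with a value_script might
look like the fragment below. The field names start_time and end_time are
assumptions, not from the original thread; the script subtracts the two
timestamps to compute a per-document latency, and the facet's mean per
bucket would then be the average latency for that interval:

```json
{
  "facets": {
    "latency": {
      "date_histogram": {
        "key_field": "start_time",
        "value_script": "doc['end_time'].value - doc['start_time'].value",
        "interval": "hour"
      }
    }
  }
}
```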

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1c8f6a13-d603-403b-a8de-f5557b75b97b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Exist filter not giving consistent results in version 0.90.6

2014-03-18 Thread Binh Ly
Is it consistent if you specify preference=_primary in your search request?

GET localhost:9200/_search?preference=_primary

If yes, I'd check the logs to see if there were any failures in there 
related to indexing data.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/39da6568-4972-4e05-89a4-6861569a3157%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Question about java api with multiple filters combination

2014-03-18 Thread anng


Hi,

 

I have 3 different filters: OrFilterBuilder, RangeFilterBuilder, and 
PrefixFilterBuilder. I'd like to combine those filters with an "and" 
relation among them. FilterBuilders.andFilter(filter1, filter2, filter3) 
would work; however, filter1, filter2, or filter3 could be null. Is there a 
better way of combining them than nested if/else?
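
One hedged sketch, not from the thread: collect the non-null filters first,
then hand the resulting array to FilterBuilders.andFilter. The helper below
is plain Java (strings stand in for the filter builders so the sketch stays
self-contained); the commented line shows how it might plug into the
Elasticsearch Java API.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterCombiner {

    // Keep only the non-null entries so andFilter never receives a null.
    @SafeVarargs
    public static <T> List<T> nonNull(T... items) {
        List<T> out = new ArrayList<T>();
        for (T item : items) {
            if (item != null) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for OrFilterBuilder / RangeFilterBuilder / PrefixFilterBuilder:
        String orFilter = "or", rangeFilter = null, prefixFilter = "prefix";

        List<String> active = nonNull(orFilter, rangeFilter, prefixFilter);
        System.out.println(active.size()); // 2

        // With the real builders it would be something like (hypothetical usage):
        // FilterBuilders.andFilter(active.toArray(new FilterBuilder[active.size()]));
    }
}
```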

 

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d6a88417-0b44-4212-be5a-bf09d5390e03%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Local JVM node

2014-03-18 Thread Mohit Anchlia
I am not seeing anything come up on port 9300.

This is the log, which shows only port 9200:


INFO org.elasticsearch.node [Thread-0]: [Agent Zero] version[1.0.0],
pid[11624], build[a46900e/2014-02-12T16:18:34Z]

INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initializing ...

INFO org.elasticsearch.plugins [Thread-0]: [Agent Zero] loaded [], sites []

INFO org.elasticsearch.node [Thread-0]: [Agent Zero] initialized

INFO org.elasticsearch.node [Thread-0]: [Agent Zero] starting ...

INFO org.elasticsearch.transport [Thread-0]: [Agent Zero] bound_address
{local[1]}, publish_address {local[1]}

INFO org.elasticsearch.cluster.service [elasticsearch[Agent
Zero][clusterService#updateTask][T#1]]: [Agent Zero] new_master [Agent
Zero][CDptnrekR-yL5eH91y7Yiw][SDGL0770E674E03][local[1]]{local=true},
reason: local-disco-initial_connect(master)

INFO org.elasticsearch.discovery [Thread-0]: [Agent Zero]
elasticsearch/CDptnrekR-yL5eH91y7Yiw

INFO org.elasticsearch.http [Thread-0]: [Agent Zero] bound_address
{inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200]}

INFO org.elasticsearch.gateway [elasticsearch[Agent
Zero][clusterService#updateTask][T#1]]: [Agent Zero] recovered [1] indices
into cluster_state


On Tue, Mar 18, 2014 at 3:45 PM, Ivan Brusic  wrote:

> Also, you should see something along the lines of
>
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.52.162:9300]}
>
> Note the 9300
>
>
>
> On Tue, Mar 18, 2014 at 3:44 PM, Ivan Brusic  wrote:
>
>> It appears that everything is running, but you simply might be using the
>> wrong ports.
>>
>> In your code:
>> client = new TransportClient()
>> .addTransportAddress(new InetSocketTransportAddress(host, port));
>>
>>
>>  What value are you using for the port? 9200 or 9300? Make sure it is
>> 9300.
>>
>>
>>
>> On Tue, Mar 18, 2014 at 3:37 PM, Mohit Anchlia wrote:
>>
>>> I see, is there a way to run embedded ES that allows me to connect using
>>> TransportClient?
>>>
>>> On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:
>>>
 TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
 traffic.

 --
 Ivan


 On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia 
 wrote:

> I do see process listening on port 9200 and also the logs seems to
> indicate that it's connected on that port
>
> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>
>> Just a guess, but perhaps creating a local node client does not
>> instantiate the internal Jetty server.
>>
>> --
>> Ivan
>>
>>
>> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia <
>> mohitanch...@gmail.com> wrote:
>>
>>>  I am trying to form a cluster between my test elasticsearch that
>>> runs in jvm with my main code that connects using transportclient but it
>>> doesn't seem to work:
>>>
>>> My test class code:
>>>
>>>
>>>node = nodeBuilder().local(true).node();
>>>client = node.client();
>>>
>>> // This connects fine
>>> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent]
>>> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
>>> 172.28.172.154:9200]}
>>>
>>> // Now the transport client code
>>>   client = new TransportClient()
>>> .addTransportAddress(new InetSocketTransportAddress(host,
>>> port));
>>>
>>>
>>> // This fails to connect to the one I started in local(true) mode:
>>> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to
>>> get node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
>>> disconnecting...
>>> org.elasticsearch.transport.ReceiveTimeoutTransportException:
>>> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0]
>>> timed out after [5002ms]
>>>  at
>>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>>> Source)
>>>  at java.lang.Thread.run(Unknown Source)
>>>
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the G

Re: Local JVM node

2014-03-18 Thread Ivan Brusic
Also, you should see something along the lines of

bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
192.168.52.162:9300]}

Note the 9300



On Tue, Mar 18, 2014 at 3:44 PM, Ivan Brusic  wrote:

> It appears that everything is running, but you simply might be using the
> wrong ports.
>
> In your code:
> client = new TransportClient()
> .addTransportAddress(new InetSocketTransportAddress(host, port));
>
> What value are you using for the port? 9200 or 9300? Make sure it is 9300.
>
>
>
> On Tue, Mar 18, 2014 at 3:37 PM, Mohit Anchlia wrote:
>
>> I see, is there a way to run embedded ES that allows me to connect using
>> TransportClient?
>>
>> On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:
>>
>>> TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
>>> traffic.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia 
>>> wrote:
>>>
 I do see process listening on port 9200 and also the logs seems to
 indicate that it's connected on that port

 On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:

> Just a guess, but perhaps creating a local node client does not
> instantiate the internal Jetty server.
>
> --
> Ivan
>
>
> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia  > wrote:
>
>>  I am trying to form a cluster between my test elasticsearch that
>> runs in jvm with my main code that connects using transportclient but it
>> doesn't seem to work:
>>
>> My test class code:
>>
>>
>>node = nodeBuilder().local(true).node();
>>client = node.client();
>>
>> // This connects fine
>> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent]
>> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
>> 172.28.172.154:9200]}
>>
>> // Now the transport client code
>>   client = new TransportClient()
>> .addTransportAddress(new InetSocketTransportAddress(host,
>> port));
>>
>>
>> // This fails to connect to the one I started in local(true) mode:
>> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to
>> get node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
>> disconnecting...
>> org.elasticsearch.transport.ReceiveTimeoutTransportException:
>> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0]
>> timed out after [5002ms]
>>  at
>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>  at java.lang.Thread.run(Unknown Source)
>>
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBfGFKJ5tQfwO%2Br2%2B58kKea-LDLvtoRN6Uts9Auh4SeQg%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

  --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
  To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAOT3TWpGfA5JLYghAjOhWPOLdZDeM59x5WW2wS6YKCvDUYwScg%40mail.gmail.com
 .

 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving email

Re: Local JVM node

2014-03-18 Thread Ivan Brusic
It appears that everything is running, but you simply might be using the
wrong ports.

In your code:
client = new TransportClient()
.addTransportAddress(new InetSocketTransportAddress(host, port));

What value are you using for the port? 9200 or 9300? Make sure it is 9300.



On Tue, Mar 18, 2014 at 3:37 PM, Mohit Anchlia wrote:

> I see, is there a way to run embedded ES that allows me to connect using
> TransportClient?
>
> On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:
>
>> TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
>> traffic.
>>
>> --
>> Ivan
>>
>>
>> On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia wrote:
>>
>>> I do see process listening on port 9200 and also the logs seems to
>>> indicate that it's connected on that port
>>>
>>> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>>>
 Just a guess, but perhaps creating a local node client does not
 instantiate the internal Jetty server.

 --
 Ivan


 On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia 
 wrote:

>  I am trying to form a cluster between my test elasticsearch that
> runs in jvm with my main code that connects using transportclient but it
> doesn't seem to work:
>
> My test class code:
>
>
>node = nodeBuilder().local(true).node();
>client = node.client();
>
> // This connects fine
> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent]
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 172.28.172.154:9200]}
>
> // Now the transport client code
>   client = new TransportClient()
> .addTransportAddress(new InetSocketTransportAddress(host,
> port));
>
>
> // This fails to connect to the one I started in local(true) mode:
> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to
> get node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
> disconnecting...
> org.elasticsearch.transport.ReceiveTimeoutTransportException:
> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0]
> timed out after [5002ms]
>  at
> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
>
>
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

  --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBfGFKJ5tQfwO%2Br2%2B58kKea-LDLvtoRN6Uts9Auh4SeQg%40mail.gmail.com
 .

 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>  To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWpGfA5JLYghAjOhWPOLdZDeM59x5WW2wS6YKCvDUYwScg%40mail.gmail.com
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCJx2_mVh2fch6jAgGKb%3DrORiamfbHTtNUuAGzi2qZh8w%40mail.gmail.com

Re: Using Java API's prepareUpdate

2014-03-18 Thread InquiringMind
1. By default, it creates an index if the index does not exist. But no, it 
won't create an index if the index does not exist and the following is in 
your elasticsearch.yml configuration:

action.auto_create_index: false

2. But did you mean: Does it create a *document* if the document does not 
exist? If so, then yes: The index action updates the document if it exists 
but creates the document if it does not exist.

The create action fails if the document already exists.
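
The index-versus-create semantics described above can be sketched with a toy 
map standing in for the index (this is an analogy, not the Elasticsearch 
API): index behaves like an unconditional put, while create behaves like 
put-if-absent and fails when the id already exists.

```java
import java.util.HashMap;
import java.util.Map;

public class OpTypeDemo {
    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<String, String>();

        // "index": succeeds whether or not the document exists
        docs.put("1", "first version");
        docs.put("1", "second version");          // overwrite, no error

        // "create": fails if the document already exists
        String previous = docs.putIfAbsent("1", "third version");
        boolean createFailed = (previous != null);

        System.out.println(docs.get("1"));        // second version
        System.out.println(createFailed);         // true
    }
}
```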

3. I deleted my first post because I realized that it didn't answer your 
question: I wasn't using prepareUpdate and actually have no experience with 
that particular API action. Sorry for any confusion I may have caused.

I hope this helps.

Brian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0819f28a-0417-4727-b7d1-90b5927439bb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Local JVM node

2014-03-18 Thread Mohit Anchlia
I see, is there a way to run embedded ES that allows me to connect using
TransportClient?

On Tue, Mar 18, 2014 at 3:32 PM, Ivan Brusic  wrote:

> TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
> traffic.
>
> --
> Ivan
>
>
> On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia wrote:
>
>> I do see process listening on port 9200 and also the logs seems to
>> indicate that it's connected on that port
>>
>> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>>
>>> Just a guess, but perhaps creating a local node client does not
>>> instantiate the internal Jetty server.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia 
>>> wrote:
>>>
  I am trying to form a cluster between my test elasticsearch that runs
 in jvm with my main code that connects using transportclient but it doesn't
 seem to work:

 My test class code:


node = nodeBuilder().local(true).node();
client = node.client();

 // This connects fine
 INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent]
 bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
 172.28.172.154:9200]}

 // Now the transport client code
   client = new TransportClient()
 .addTransportAddress(new InetSocketTransportAddress(host,
 port));


 // This fails to connect to the one I started in local(true) mode:
 INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to get
 node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
 disconnecting...
 org.elasticsearch.transport.ReceiveTimeoutTransportException:
 [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0]
 timed out after [5002ms]
  at
 org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)




 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBfGFKJ5tQfwO%2Br2%2B58kKea-LDLvtoRN6Uts9Auh4SeQg%40mail.gmail.com
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWpGfA5JLYghAjOhWPOLdZDeM59x5WW2wS6YKCvDUYwScg%40mail.gmail.com
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCJx2_mVh2fch6jAgGKb%3DrORiamfbHTtNUuAGzi2qZh8w%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3

Re: Local JVM node

2014-03-18 Thread Ivan Brusic
TransportClient uses port 9300 by default. Port 9200 is for HTTP/REST
traffic.

-- 
Ivan


On Tue, Mar 18, 2014 at 3:30 PM, Mohit Anchlia wrote:

> I do see process listening on port 9200 and also the logs seems to
> indicate that it's connected on that port
>
> On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:
>
>> Just a guess, but perhaps creating a local node client does not
>> instantiate the internal Jetty server.
>>
>> --
>> Ivan
>>
>>
>> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia wrote:
>>
>>>  I am trying to form a cluster between my test elasticsearch that runs
>>> in jvm with my main code that connects using transportclient but it doesn't
>>> seem to work:
>>>
>>> My test class code:
>>>
>>>
>>>node = nodeBuilder().local(true).node();
>>>client = node.client();
>>>
>>> // This connects fine
>>> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent] bound_address
>>> {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
>>> 172.28.172.154:9200]}
>>>
>>> // Now the transport client code
>>>   client = new TransportClient()
>>> .addTransportAddress(new InetSocketTransportAddress(host,
>>> port));
>>>
>>>
>>> // This fails to connect to the one I started in local(true) mode:
>>> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to get
>>> node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
>>> disconnecting...
>>> org.elasticsearch.transport.ReceiveTimeoutTransportException:
>>> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0] timed
>>> out after [5002ms]
>>>  at
>>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>>  at java.lang.Thread.run(Unknown Source)
>>>
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBfGFKJ5tQfwO%2Br2%2B58kKea-LDLvtoRN6Uts9Auh4SeQg%40mail.gmail.com
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAOT3TWpGfA5JLYghAjOhWPOLdZDeM59x5WW2wS6YKCvDUYwScg%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCJx2_mVh2fch6jAgGKb%3DrORiamfbHTtNUuAGzi2qZh8w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Local JVM node

2014-03-18 Thread Mohit Anchlia
I do see a process listening on port 9200, and the logs seem to indicate
that it's connected on that port.

On Tue, Mar 18, 2014 at 3:21 PM, Ivan Brusic  wrote:

> Just a guess, but perhaps creating a local node client does not
> instantiate the internal Jetty server.
>
> --
> Ivan
>
>
> On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia wrote:
>
>> I am trying to form a cluster between my test elasticsearch that runs in
>> jvm with my main code that connects using transportclient but it doesn't
>> seem to work:
>>
>> My test class code:
>>
>>
>>node = nodeBuilder().local(true).node();
>>client = node.client();
>>
>> // This connects fine
>> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent] bound_address
>> {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200
>> ]}
>>
>> // Now the transport client code
>>   client = new TransportClient()
>> .addTransportAddress(new InetSocketTransportAddress(host,
>> port));
>>
>>
>> // This fails to connect to the one I started in local(true) mode:
>> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to get
>> node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
>> disconnecting...
>> org.elasticsearch.transport.ReceiveTimeoutTransportException:
>> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0] timed
>> out after [5002ms]
>>  at
>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>  at java.lang.Thread.run(Unknown Source)
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWpGfA5JLYghAjOhWPOLdZDeM59x5WW2wS6YKCvDUYwScg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Local JVM node

2014-03-18 Thread Ivan Brusic
Just a guess, but perhaps creating a local node client does not instantiate
the internal Jetty server.

-- 
Ivan


On Tue, Mar 18, 2014 at 2:55 PM, Mohit Anchlia wrote:

> I am trying to form a cluster between my test elasticsearch that runs in
> jvm with my main code that connects using transportclient but it doesn't
> seem to work:
>
> My test class code:
>
>
>node = nodeBuilder().local(true).node();
>client = node.client();
>
> // This connects fine
> INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent] bound_address
> {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200
> ]}
>
> // Now the transport client code
>   client = new TransportClient()
> .addTransportAddress(new InetSocketTransportAddress(host, port));
>
>
> // This fails to connect to the one I started in local(true) mode:
> INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to get
> node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
> disconnecting...
> org.elasticsearch.transport.ReceiveTimeoutTransportException:
> [][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0] timed
> out after [5002ms]
>  at
> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBfGFKJ5tQfwO%2Br2%2B58kKea-LDLvtoRN6Uts9Auh4SeQg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some questions on wikipedia river and cluster config

2014-03-18 Thread jorge canellas
Hi!

That was something I suspected, since the indexing time shown in the
indexing statistics is much lower than the real time the process has been
running: for every 5 seconds the process runs, only a bit more than a second
is spent indexing.
I was thinking of parsing the wikipedia dumps manually and then using the
bulk API to index them.
Only one question: does the indexing time shown in the cluster stats
take into account the time used to choose the shard, send the document
through the network, analyze the document and write the terms, or only the
time spent writing the terms to the index? I am asking this because I am
interested in the indexing performance of Elasticsearch, and I could use
this time and ignore the time spent by the river parsing the wikipedia
pages.
On 18/03/2014 20:32, "Ivan Brusic"  wrote:

> It all depends on where the bottleneck is. A river, by design, runs on
> only one node. Perhaps Elasticsearch is quickly indexing the content, but
> the Wikipedia ingestion is the slowdown. Is the Wikipedia indexer threaded
> or are all the bulks being executed on one thread? The BulkProcessor
> supports multithreading.
>
> Have you modified any of the default settings? In particular the merge
> settings and throttling. Elasticsearch has low defaults for throttling out
> of the box. If you have fast disks, you can easily raise the values. Are
> you searching on the index while it is being built? If you are just
> building the index, you can increase the number of segments and/or reduce
> the amount of merging done at once. But it all depends on where the
> bottleneck is. If you disk performance is fine, you might need to look
> elsewhere.
>
> Cheers,
>
> Ivan
>
>
> On Tue, Mar 18, 2014 at 10:52 AM, jorge canellas <
> jorge.canellas@gmail.com> wrote:
>
>> Hi!
>>
>> I am trying to index the wikipedia dumps (downloaded and uncompressed)
>> using the wikipedia river, but it takes about 6 hours in the cluster.
>> I have increased the number of nodes and primary shards from 5 to 8, but
>> the performance does not increase.
>>
>>- I have set the refresh time to -1, the number of replicas to 0,
>>increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the
>>mlockall, and set the buffer to 30% from 10%.
>>- I have changed the river settings increasing the bulk_size from 100
>>to 10k, refresh interval is set to 1s
>>- I have changed the mapping to only index title and text. And set
>>_source to false.
>>
>>
>> I do not know what else I can modify to increase the indexing rate; at
>> this moment it is indexing about 120 docs/sec.
>>
>> Any ideas?
>>
>> Kind regards,
>>
>> Jorge
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEXx9%2BWdibgD8NLt1NUCZWSLo2eLxDxbSeJgOsJgi%3DuRmYOf_w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Local JVM node

2014-03-18 Thread Mohit Anchlia
I am trying to form a cluster between my test elasticsearch that runs in
jvm with my main code that connects using transportclient but it doesn't
seem to work:

My test class code:


   node = nodeBuilder().local(true).node();
   client = node.client();

// This connects fine
INFO  org.elasticsearch.http [Thread-0]: [Midgard Serpent] bound_address
{inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.28.172.154:9200]}

// Now the transport client code
  client = new TransportClient()
.addTransportAddress(new InetSocketTransportAddress(host, port));


// This fails to connect to the one I started in local(true) mode:
INFO  org.elasticsearch.client.transport [main]: [Oracle] failed to get
node info for [#transport#-1][SDGL0][inet[/172.28.172.154:9200]],
disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[][inet[/172.28.172.154:9200]][cluster/nodes/info] request_id [0] timed out
after [5002ms]
 at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWoht0_gzj-GRkAO2xcvAqDfZ%3Djz-xCUPWvozLnKbfKbtQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Field Queries in ElasticSearch 1.0.1

2014-03-18 Thread Ivan Brusic
Field queries were removed in 1.0:

https://github.com/elasticsearch/elasticsearch/issues/4033

-- 
Ivan
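For reference, a field query scoped to a single field can usually be rewritten as a query_string query with default_field. A sketch, where the field name and value are examples only:

```json
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "some value"
    }
  }
}
```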


On Tue, Mar 18, 2014 at 7:01 AM,  wrote:

> Hello all,
>
> today I tried to change the version of our ElasticServer frim Version
> 0.90.7 to 1.0.1.
>
> Some queries don't work anymore.
>
> Important for us are field queries, but it seems they are not included in
> 1.0.1 or were renamed.
>
> So my questions are:
>
> Do field queries still exist in 1.0.1?
> If not, what other possibilities are there to make a field query?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQACBVizoY%2BLJr%3D%2BgkjDn4rB9bW4nDKZVyVUNVEqRwZmDg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using Java API's prepareUpdate

2014-03-18 Thread Mohit Anchlia
Thanks. Does it create a new index if it doesn't exist?

On Tue, Mar 18, 2014 at 12:35 PM, InquiringMind wrote:

> Here is my code for the INDEX action. Similar code (not shown) exists for
> the CREATE action. This is for ES 1.0.0 GA, though it hasn't changed since
> its 0.90.X versions:
> IndexResponse response = null;
> try
> {
>   IndexRequestBuilder irb = client.prepareIndex(index, type, id);
>   irb.setSource(json).setOpType(IndexRequest.OpType.INDEX);
>   irb.setRefresh(refresh);
>
>   /* Set the version number and version type (internal or external) */
>   if (rec.hasVersion())
>   {
> irb.setVersion(rec.getVersion());
> if (rec.isVersionExternal())
>   irb.setVersionType(VersionType.EXTERNAL);
>   }
>
>   /* Set the TTL (time to live) if there is one */
>   TimeValue ttl = rec.getTTL();
>   if (ttl != null)
> irb.setTTL(ttl.getMillis());
>
>   response = irb.execute().actionGet();
> }
> catch (Exception e)
> {
>   ...
> }
>
> Don't worry about rec. It's just a class that generically describes the
> documents and contains various attributes and field values, and can
> serialize itself to JSON.
>
> Hope this helps.
>
> Brian
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOT3TWq6fcUJNs26X%2BovxsJOvB5kjjTcSOU_ehttj%2BEUWnf6Ug%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Delete by query fails often with HTTP 503

2014-03-18 Thread Thomas S.
Thanks Clint,

We have two nodes with 60 shards per node. I will increase the queue size.
Hopefully this will reduce the number of rejections.

Thomas
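For anyone finding this later, raising that queue looks roughly like this in elasticsearch.yml. The size here is an arbitrary example; check the thread pool docs linked above for the defaults:

```yaml
# elasticsearch.yml — example value only; larger queues use more memory
threadpool:
  index:
    queue_size: 1000
```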


On Tuesday, March 18, 2014 6:11:27 PM UTC+1, Clinton Gormley wrote:
>
> Do you have lots of shards on just a few nodes? Delete by query is handled 
> by the `index` thread pool, but those threads are shared across all shards 
> on a node.  Delete by query can produce a large number of changes, which 
> can fill up the thread pool queue and result in rejections.
>
> You can either just (a) retry or (b) increase the queue size for the 
> `index` thread pool (which will use more memory as more delete requests 
> will need to be queued)
>
> See 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#types
>
> clint
>
>
> On 18 March 2014 08:13, Thomas S. > wrote:
>
>> Hi,
>>
>> We often get failures when using the delete by query API. The response is 
>> an HTTP 503 with a body like this:
>>
>> {"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58, 
>> "total": 60
>>
>> Is there a way to figure out what is causing this error? It seems to 
>> mostly happen when the search cluster is busy.
>>
>> Thomas
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b815184a-8382-4b25-8a54-b98753f6cbb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: questions about aggregation min_doc_count = 0

2014-03-18 Thread Matt Weber
1.  The histogram aggregation (and facet) works on indexed values, not on
the current time or "now".  So, if the last indexed document's timestamp
is 3/15/14T16:15, you will not get empty buckets between 3/15/14T16:15 and the
current time.  It would be interesting to be able to set the "to" and
"from" on histogram-based aggregations to allow generating buckets on
intervals across the defined range.

2.  I believe this is due to the way the keys are pulled from the field data,
which is index-level data.  So if you are querying the "_all" index you are
going to get data from all indices.  Not sure if this is a bug or not.  You
can try applying a filter aggregation:

POST _all/summary_phys/_search
{
  "aggs": {
"summary_phys_events": {
  "filter": {
"type": {"value": "summary_phys_events"}
  },
  "aggs": {
"events_by_date": {
  "date_histogram": {
"field": "@timestamp",
"interval": "300s",
"min_doc_count": 0
  },
  "aggs": {
"events_by_host": {
  "terms": {
"field": "host.raw",
"min_doc_count": 0
  },
  "aggs": {
"avg_used": {
  "avg": {
"field": "used"
  }
},
"max_used": {
  "max": {
"field": "used"
  }
}
  }
}
  }
}
  }
}
  }
}





On Tue, Mar 18, 2014 at 12:39 PM, John Stanford wrote:

> Hi,
>
> I'm trying to get a better understanding of aggregations, so here are a
> couple of questions that came up recently.
>
> Question 1:
>
> I have some time based data that I am using aggregations to chart.  The
> data may be sparsely populated, so I've been setting min_doc_count to 0 so
> I get empty buckets back anyway.  I've noticed that it will fill in empty
> buckets unless they are before or after the first record of the range.
>
> For example, if I use a query similar to the one below, and there are no
> records after 3/15/14T16:15, the last aggregation record will be for
> 3/15/14T16:15.  On the other hand, if there is a gap in between the start
> time and 3/15/14T16:15, I will get a bucket with a 0 doc count (as
> expected).
>
> POST _all/summary_phys/_search
>
> {
>"aggs": {
>   "events_by_date": {
>  "date_histogram": {
> "field": "@timestamp",
> "interval": "300s",
> "min_doc_count": 0
>  },
>  "aggs": {
> "events_by_host": {
>"terms": {
>   "field": "host.raw"
>},
>"aggs": {
>   "avg_used": {
>  "avg": {
> "field": "used"
>  }
>   },
>   "max_used": {
>  "max": {
> "field": "used"
>  }
>   }
>}
> }
>  }
>   }
>}
> }
>
> Not getting the 0 doc count buckets back at the front and back of the
> range seems contrary to the documented purpose of min_doc_count.  Am I
> doing something wrong?
>
> Question 2:
>
>
> If I add a min_doc_count = 0 to the inner aggregation, but limit the
> search to a specific doc type like:
>
>   doc type
>v
> POST _all/summary_phys/_search
> {
>"aggs": {
>   "events_by_date": {
>  "date_histogram": {
> "field": "@timestamp",
> "interval": "300s",
> "min_doc_count": 0
>  },
>  "aggs": {
> "events_by_host": {
>"terms": {
>   "field": "host.raw",
>   "min_doc_count": 0
>},
>"aggs": {
>   "avg_used": {
>  "avg": {
> "field": "used"
>  }
>   },
>   "max_used": {
>  "max": {
> "field": "used"
>  }
>   }
>}
> }
>  }
>   }
>}
> }
>
> I get buckets with entries matching hosts that do not show up in this doc
> type.  For example, I have only 3 values for host in this doc type
> [compute-4, compute-2, compute-3], but I will get buckets back with hosts
> from other doc types like:
>
> "events_by_host": {
>   "buckets": [
>  {
> "key": "compute-4",
> "doc_count": 11,
> "max_used": {
>"value": 4608
> },
> "avg_used": {
>"value": 3677.090909090909
> }
>  },
>  

Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
Opening a new thread. 
https://groups.google.com/forum/#!topic/elasticsearch/tJi7iJhU9ZU

On Tuesday, March 18, 2014 3:29:16 PM UTC-4, Bastien Chong wrote:
>
> The common denominator is that when I start a [*node.master: false]  
> [node.data: false] *node first, the cluster is not created properly, no 
> matter how the discovery is configured, EC2 or unicast localhost.
>
>
> On Tuesday, March 18, 2014 10:41:39 AM UTC-4, Bastien Chong wrote:
>>
>> Hi,
>>
>> I have a server with 2 ES instances. The first one is the master one, 
>> used to store documents. The second one is just there to receive requests 
>> from Kibana (I call it ES read-only), it has :
>>
>> node.master: false
>>> node.data: false
>>>
>>
>> Both are configured with cloud-aws plugin, and the http/java port are 
>> left by default for automatic assignment.
>>
>> When the master starts first, it's bound to 9200/9300, then the second one to 
>> 9201/9300. When I do : *curl -XGET 
>> 'http://localhost:9200/_cluster/health?pretty=true 
>> '*  everything is 
>> working as expected and both nodes are in the same cluster.
>>
>> *But*, if I start the read-only instance first, discovery stops working. 
>> I have enabled DEBUG and TRACE but I couldn't find what the issue is.
>>
>> I also tried to hardcode the port allocation :
>>
>> transport.tcp.port: 930(0/1)
>>> http.port: 920(0-1)
>>>
>>
>> And after that, It's actually worse, whatever the order I start the 
>> instances, EC2 discovery is broken. So it's sort of a race-condition that's 
>> happening.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce057522-32e5-429f-a069-ccee522f6d1c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Starting a non-master-eligible node first breaks the cluster creation

2014-03-18 Thread Bastien Chong


I have ES 0.90.10.

On the same server, I have a master/data instance of ES and a second 
non-master one. ( *node.master: false* )

When the master instance is started first, and the non-master afterwards, 
both are in the same configured cluster.
The master is automatically bound to 9200/9300 and the non-master to 9201/9301.

If I start the non-master first, it stops working. (ports are inverted) 

I tested with both the AWS EC2 plugin discovery and the unicast discovery.

That tells me that if the first node to start can't be master, the cluster 
can't be created, even by then starting a second instance with the same 
cluster.name that can.
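One configuration sketch that removes the port ambiguity when two instances share a host: pin each instance's transport port and list both in the unicast host list. The values are examples, and whether this works around the start-order bug described above is untested here:

```yaml
# Instance A (master-eligible) elasticsearch.yml:
#   transport.tcp.port: 9300
# Instance B (node.master: false) elasticsearch.yml:
#   transport.tcp.port: 9301
# Both instances:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301"]
```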

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8bb6e4f1-2680-4baf-bc09-c6b53e4bc1e6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Multiple parents

2014-03-18 Thread vineeth mohan
Hi ,

Is it possible for a feed to have multiple parents ?

Or is there any way to emulate this ?


Thanks
   Vineeth

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5n9DXjdmCWAkqV0zzpaK%2BEZZKH%2BhR0hJFMyMf57%2B1uF7w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


questions about aggregation min_doc_count = 0

2014-03-18 Thread John Stanford
Hi,

I'm trying to get a better understanding of aggregations, so here are a 
couple of questions that came up recently.

Question 1:

I have some time based data that I am using aggregations to chart.  The 
data may be sparsely populated, so I've been setting min_doc_count to 0 so 
I get empty buckets back anyway.  I've noticed that it will fill in empty 
buckets unless they are before or after the first record of the range.  

For example, if I use a query similar to the one below, and there are no 
records after 3/15/14T16:15, the last aggregation record will be for 
3/15/14T16:15.  On the other hand, if there is a gap in between the start 
time and 3/15/14T16:15, I will get a bucket with a 0 doc count (as 
expected).  

POST _all/summary_phys/_search

{
   "aggs": {
  "events_by_date": {
 "date_histogram": {
"field": "@timestamp",
"interval": "300s",
"min_doc_count": 0
 },
 "aggs": {
"events_by_host": {
   "terms": {
  "field": "host.raw"
   },
   "aggs": {
  "avg_used": {
 "avg": {
"field": "used"
 }
  },
  "max_used": {
 "max": {
"field": "used"
 }
  }
   }
}
 }
  }
   }
}

Not getting the 0 doc count buckets back at the front and back of the range 
seems contrary to the documented purpose of min_doc_count.  Am I doing 
something wrong?

Question 2:


If I add a min_doc_count = 0 to the inner aggregation, but limit the search 
to a specific doc type like:

  doc type
   v
POST _all/summary_phys/_search
{
   "aggs": {
  "events_by_date": {
 "date_histogram": {
"field": "@timestamp",
"interval": "300s",
"min_doc_count": 0
 },
 "aggs": {
"events_by_host": {
   "terms": {
  "field": "host.raw",
  "min_doc_count": 0
   },
   "aggs": {
  "avg_used": {
 "avg": {
"field": "used"
 }
  },
  "max_used": {
 "max": {
"field": "used"
 }
  }
   }
}
 }
  }
   }
}

I get buckets with entries matching hosts that do not show up in this doc 
type.  For example, I have only 3 values for host in this doc type 
[compute-4, compute-2, compute-3], but I will get buckets back with hosts 
from other doc types like:

"events_by_host": {
  "buckets": [
 {
"key": "compute-4",
"doc_count": 11,
"max_used": {
   "value": 4608
},
"avg_used": {
   "value": 3677.090909090909
}
 },
 {
"key": "compute-2",
"doc_count": 8,
"max_used": {
   "value": 4608
},
"avg_used": {
   "value": 2304
}
 },
 {
"key": "compute-3",
"doc_count": 2,
"max_used": {
   "value": 4608
},
"avg_used": {
   "value": 4608
}
 },
 {
"key": "10.10.11.22:49509",
"doc_count": 0,
"max_used": {
   "value": null
},
"avg_used": {
   "value": null
}
 },
 {
"key": "controller",
"doc_count": 0,
"max_used": {
   "value": null
},
"avg_used": {
   "value": null
}
 },
 {
"key": "object-1",
"doc_count": 0,
"max_used": {
   "value": null
},
"avg_used": {
   "value": null
}
 }
  ]
}

Is there a way to ensure that the inn

Re: Using Java API's prepareUpdate

2014-03-18 Thread InquiringMind
Here is my code for the INDEX action. Similar code (not shown) exists for 
the CREATE action. This is for ES 1.0.0 GA, though it hasn't changed since 
its 0.90.X versions:
IndexResponse response = null;
try
{
  IndexRequestBuilder irb = client.prepareIndex(index, type, id);
  irb.setSource(json).setOpType(IndexRequest.OpType.INDEX);
  irb.setRefresh(refresh);

  /* Set the version number and version type (internal or external) */
  if (rec.hasVersion())
  {
irb.setVersion(rec.getVersion());
if (rec.isVersionExternal())
  irb.setVersionType(VersionType.EXTERNAL);
  }

  /* Set the TTL (time to live) if there is one */
  TimeValue ttl = rec.getTTL();
  if (ttl != null)
irb.setTTL(ttl.getMillis());

  response = irb.execute().actionGet();
}
catch (Exception e)
{
  ...
}

Don't worry about rec. It's just a class that generically describes the 
documents and contains various attributes and field values, and can 
serialize itself to JSON.

Hope this helps.

Brian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c26e586f-3df5-4c22-b239-42dc614bf9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
The common denominator is that when I start a [*node.master: false]  
[node.data: false] *node first, the cluster is not created properly, no 
matter how the discovery is configured, EC2 or unicast localhost.


On Tuesday, March 18, 2014 10:41:39 AM UTC-4, Bastien Chong wrote:
>
> Hi,
>
> I have a server with 2 ES instances. The first one is the master one, used 
> to store documents. The second one is just there to receive requests from 
> Kibana (I call it ES read-only), it has :
>
> node.master: false
>> node.data: false
>>
>
> Both are configured with cloud-aws plugin, and the http/java port are left 
> by default for automatic assignment.
>
> When the master starts first, it's bound to 9200/9300, then the second one to 
> 9201/9300. When I do : *curl -XGET 
> 'http://localhost:9200/_cluster/health?pretty=true 
> '*  everything is 
> working as expected and both nodes are in the same cluster.
>
> *But*, if I start the read-only instance first, discovery stops working. I 
> have enabled DEBUG and TRACE but I couldn't find what the issue is.
>
> I also tried to hardcode the port allocation :
>
> transport.tcp.port: 930(0/1)
>> http.port: 920(0-1)
>>
>
> And after that, It's actually worse, whatever the order I start the 
> instances, EC2 discovery is broken. So it's sort of a race-condition that's 
> happening.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d18b9838-188a-4efe-bb0f-dccba07fbd79%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
I also tried to comment the EC2 config on the master and enable unicast 
discovery.

Same issue !

On Tuesday, March 18, 2014 10:41:39 AM UTC-4, Bastien Chong wrote:
>
> Hi,
>
> I have a server with 2 ES instances. The first one is the master one, used 
> to store documents. The second one is just there to receive requests from 
> Kibana (I call it ES read-only), it has :
>
> node.master: false
>> node.data: false
>>
>
> Both are configured with cloud-aws plugin, and the http/java port are left 
> by default for automatic assignment.
>
> When the master starts first, it's bound to 9200/9300, then the second one to 
> 9201/9300. When I do : *curl -XGET 
> 'http://localhost:9200/_cluster/health?pretty=true 
> '*  everything is 
> working as expected and both nodes are in the same cluster.
>
> *But*, if I start the read-only instance first, discovery stops working. I 
> have enabled DEBUG and TRACE but I couldn't find what the issue is.
>
> I also tried to hardcode the port allocation :
>
> transport.tcp.port: 930(0/1)
>> http.port: 920(0-1)
>>
>
> And after that, It's actually worse, whatever the order I start the 
> instances, EC2 discovery is broken. So it's sort of a race-condition that's 
> happening.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7778b165-f596-4638-82c2-523ee13c9857%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: need help to update document fields in python

2014-03-18 Thread Honza Král
The document you will put in will be merged with the document in
elasticsearch.

Honza
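As an illustration of that merge behavior (plain Python, not pyes code): the dict you pass as `document` overwrites the fields it contains and leaves the rest of the stored document untouched, roughly like:

```python
def merge(stored, partial):
    """Shallow sketch of a partial-document update merge."""
    merged = dict(stored)   # start from the stored document
    merged.update(partial)  # fields in the partial doc win
    return merged

stored = {"title": "old title", "views": 10}
merged = merge(stored, {"views": 11})
# merged == {"title": "old title", "views": 11}
```

(Elasticsearch merges nested objects recursively; this shallow sketch only shows the top-level behavior.)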
On Mar 18, 2014 6:06 PM, "PAULINE BONNEAU" 
wrote:

> I'm sorry but I have just one more problem. I don't know what I must put
> in the *document* parameter in the update function :
>
> update(*index*, *doc_type*, *id*, *script=None*, *lang='mvel'*,
> *params=None*, *document=None*, *upsert=None*, *model=None*, *bulk=False*,
> *querystring_args=None*, *retry_on_conflict=None*, *routing=None*,
> *doc_as_upsert=None*)
>
>
> I think this parameter permits me to update some fields of this document
> but I don't know how to do that.
> Thanks
>
> Paulyne
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/dbe1e0f0-284e-4f95-912b-9ee30dd42105%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
I tried without luck.
I changed :

discovery.zen.ping.multicast.enabled: false
> discovery.zen.ping.unicast.hosts: ["localhost"]
>
and commented the plugin part.

So by starting the 'read-only' instance first, it gets 9200/9300. 
Then I start the master instance (9201/9301)

In the Read-only log : 

> sending to [#zen_unicast_1#][inet[localhost/127.0.0.1:9300]]
>

It's not finding that I have an instance on port 9301.
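One thing that may be worth trying (a sketch, not a confirmed fix): list both transport ports explicitly in the unicast host list, since a bare hostname may only be probed on a limited port range:

```yaml
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301"]
```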

On Tuesday, March 18, 2014 2:07:38 PM UTC-4, David Pilato wrote:
>
> May be I was unclear.
>
> I know you need ec2 discovery.
>
> but you have two nodes per machine.
>
> Set the first one to use ec2 discovery and the second one to use regular 
> discovery.
>
> And could you open an issue in elasticsearch-cloud-aws plugin?
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 18 mars 2014 à 19:03:04, Bastien Chong (basti...@gmail.com) 
> a écrit:
>
> What I didn't mention, is that this node won't be alone, there will be 
> about 6 other identical node, so I need the EC2 discovery feature.
>
> On Tuesday, March 18, 2014 1:33:16 PM UTC-4, David Pilato wrote: 
>>
>>  For the second node, remove 
>>  
>>  aws:
>>> access_key: xx
>>> secret_key: x
>>> region: us-west-2
>>> discovery:
>>> type: ec2
>>> ec2:
>>> tag:
>>> elasticsearch: true
>>>
>>
>>>   It should work I think.
>>  Or set unicast to localhost and disable multicast
>>  
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet  | 
>> @elasticsearchfr
>>  
>>
>> Le 18 mars 2014 à 16:25:52, Bastien Chong (basti...@gmail.com) a écrit:
>>
>>  I'm not sure what I can change since I have the bare minimum config :
>>
>> The only difference between the 2 elasticsearch.yml config file are:
>>
>> node.master: false
>>> node.data: false
>>> path.conf: /etc/elasticsearchro
>>> path.data: /var/lib/elasticsearchro
>>>
>>
>> Both config share this part:
>>
>>  cluster.name: ESCluster
>>> cloud:
>>> aws:
>>> access_key: xx
>>> secret_key: x
>>> region: us-west-2
>>> discovery:
>>> type: ec2
>>> ec2:
>>> tag:
>>> elasticsearch: true
>>>
>>
>>
>>
>> On Tuesday, March 18, 2014 11:00:15 AM UTC-4, David Pilato wrote: 
>>>
>>>  I think I understand what is happening here.
>>>
>>> Wondering if giving another elasticsearch.yml file as à configuration 
>>> for the second node with all defaults (except cluster name) could help.
>>>
>>> --
>>> David ;-) 
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>  
>>> Le 18 mars 2014 à 15:41, Bastien Chong  a écrit :
>>>
>>>  Hi,
>>>
>>> I have a server with 2 ES instances. The first one is the master one, 
>>> used to store documents. The second one is just there to receive requests 
>>> from Kibana (I call it ES read-only), it has :
>>>
>>> node.master: false
 node.data: false

>>>
>>> Both are configured with cloud-aws plugin, and the http/java port are 
>>> left by default for automatic assignment.
>>>
>>> When master start first, it's binded to 9200/9300, then the second one 
>>> to 9201/9300. When I do : *curl -XGET 
>>> 'http://localhost:9200/_cluster/health?pretty=true' 
>>> *  everything is 
>>> working as expected and both nodes are in the same cluster.
>>>
>>> *But*, if I start the read-only instance first, discovery stop working. 
>>> I have enabled DEBUG and TRACE but I didn't found what's the issue.
>>>
>>> I also tried to hardcode the port allocation :
>>>
>>> transport.tcp.port: 930(0/1)
 http.port: 920(0-1)

>>>
>>> And after that, It's actually worse, whatever the order I start the 
>>> instances, EC2 discovery is broken. So it's sort of a race-condition that's 
>>> happening.
>>> --
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/8851834f-cefd-4732-8c34-f3410cec0c99%40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>  

MoreLikeThis ignores queries?

2014-03-18 Thread Alexey Bagryancev
Hi,

I am trying to filter moreLikeThis results by adding an additional query, but 
it seems to be ignored entirely.

I tried running my ignoreQuery separately and it works fine, but how do I 
make it work with moreLikeThis? Please help me.

$ignoreQuery = $this->IgnoreCategoryQuery('movies');

$this->resultsSet = $this->index->moreLikeThis(new \Elastica\Document($id), 
array_merge($this->mlt_fields, array('search_size' => $this->size, 
'search_from' => $this->from, 'min_doc_freq' => 2, 'min_term_freq' => 1, 
'min_word_length' => 3)), $ignoreQuery);


My IgnoreCategory function:

public function IgnoreCategoryQuery($category = 'main')
{
    $categoriesTermQuery = new \Elastica\Query\Term();
    $categoriesTermQuery->setTerm('categories', $category);

    $categoriesBoolQuery = new \Elastica\Query\Bool();
    $categoriesBoolQuery->addMustNot($categoriesTermQuery);

    return $categoriesBoolQuery;
}



Re: Some questions on wikipedia river and cluster config

2014-03-18 Thread Ivan Brusic
It all depends on where the bottleneck is. A river, by design, runs on only
one node. Perhaps Elasticsearch is quickly indexing the content, but the
Wikipedia ingestion is the slowdown. Is the Wikipedia indexer threaded or
are all the bulks being executed on one thread? The BulkProcessor supports
multithreading.

Have you modified any of the default settings? In particular the merge
settings and throttling. Elasticsearch has low defaults for throttling out
of the box. If you have fast disks, you can easily raise the values. Are
you searching on the index while it is being built? If you are just
building the index, you can increase the number of segments and/or reduce
the amount of merging done at once. But it all depends on where the
bottleneck is. If your disk performance is fine, you might need to look
elsewhere.

Cheers,

Ivan
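For reference, the throttling defaults Ivan mentions live in elasticsearch.yml; a sketch with illustrative values for a fast-disk, bulk-load scenario (the 100mb figure is an assumption, not a recommendation):

```yaml
# Raise store-level merge throttling from the conservative default (~20mb/s in 1.x)
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 100mb
```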


On Tue, Mar 18, 2014 at 10:52 AM, jorge canellas <
jorge.canellas@gmail.com> wrote:

> Hi!
>
> I am trying to index the wikipedia dumps (downloaded and uncompressed)
> using the wikipedia river, but it takes about 6 hours in the cluster.
> I have increased the number of nodes and primary shards from 5 to 8, but
> the performance does not increase.
>
>- I have set the refresh time to -1, the number of replicas to 0,
>increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the
>mlockall, and set the buffer to 30% from 10%.
>- I have changed the river settings increasing the bulk_size from 100
>to 10k, refresh interval is set to 1s
>- I have changed the mapping to only index title and text. And set
>_source to false.
>
>
> I do not know what else can I modify to increase the indexing ratio, at
> this moment it is indexing about 120 docs/sec.
>
> Any ideas?
>
> Kind regards,
>
> Jorge
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c9bd6437-3241-46a4-838e-5327067bc3cc%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: OpenSearch for elasticsearch.org docs?

2014-03-18 Thread Ivan Brusic
That is not how OpenSearch works. With OpenSearch, you don't need to type
?s= at all. As Lukas explained, if you have visited the site before, you should
be presented with a search interface after pressing TAB once you have entered
the first few (unique) characters of the domain. Try it out with google.com or
cnn.com.

Great feature. I wish more sites supported it. My own company doesn't, but
that is because I have nothing to do with the front end! It would be done
immediately if I had access.

-- 
Ivan
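For anyone wiring this up, OpenSearch discovery works through a small XML description document referenced from the page's HTML head; a minimal sketch per the OpenSearch 1.1 spec (names and the URL are placeholders, not real elasticsearch.org endpoints):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>ES Docs</ShortName>
  <Description>Search the Elasticsearch documentation</Description>
  <Url type="text/html" template="https://www.example.org/?s={searchTerms}"/>
</OpenSearchDescription>
```

The page then advertises it with a link tag of type application/opensearchdescription+xml pointing at that document.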


On Tue, Mar 18, 2014 at 10:52 AM, David Pilato  wrote:

> Hi Lukas,
>
> Actually it's working as you described in google chrome. I don't know for
> other browsers.
> It redirects to http://www.elasticsearch.org/?s=bulk
>
> (I searched for "bulk" here)
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 18 mars 2014 à 14:41:01, Lukáš Vlček (lukas.vl...@gmail.com) a écrit:
>
> Hi,
>
> is there any plan to provide OpenSearch API for Elasticsearch docs soon?
>
> It would be really nice if I could search the docs directly from browser
> URL. Many browsers support that and it is quite fast shortcut to docs: once
> you start typing into URL field "elast" -> it will give you a hint that you
> can hit TAB to search in elasticsearch.org -> if you hit the TAB it will
> allow you to directly input search term and send it into search API of
> docs, also search suggestions are fully supported in many browsers.
>
> Just wanted to ping the ML before opening ticket for this.
>
> Regards,
> Lukas
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAO9cvUYCZE4Rkyks6hpentrDZ1c7xyawFCTgZniN0hOOcKZPrA%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/etPan.532887f0.7724c67e.97ca%40MacBook-Air-de-David.local
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread David Pilato
Maybe I was unclear.

I know you need ec2 discovery.

But you have two nodes per machine.

Set the first one to use ec2 discovery and the second one to use regular 
discovery.

And could you open an issue in elasticsearch-cloud-aws plugin?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 mars 2014 à 19:03:04, Bastien Chong (bastien...@gmail.com) a écrit:

What I didn't mention, is that this node won't be alone, there will be about 6 
other identical node, so I need the EC2 discovery feature.

On Tuesday, March 18, 2014 1:33:16 PM UTC-4, David Pilato wrote:
For the second node, remove 
        aws:
    access_key: xx
    secret_key: x
    region: us-west-2
discovery:
    type: ec2
    ec2:
    tag:
    elasticsearch: true

It should work I think.
Or set unicast to localhost and disable multicast


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 mars 2014 à 16:25:52, Bastien Chong (basti...@gmail.com) a écrit:

I'm not sure what I can change since I have the bare minimum config :

The only difference between the 2 elasticsearch.yml config file are:

node.master: false
node.data: false
path.conf: /etc/elasticsearchro
path.data: /var/lib/elasticsearchro

Both config share this part:

cluster.name: ESCluster
cloud:
    aws:
    access_key: xx
    secret_key: x
    region: us-west-2
discovery:
    type: ec2
    ec2:
    tag:
    elasticsearch: true



On Tuesday, March 18, 2014 11:00:15 AM UTC-4, David Pilato wrote:
I think I understand what is happening here.

Wondering if giving another elasticsearch.yml file as à configuration for the 
second node with all defaults (except cluster name) could help.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 18 mars 2014 à 15:41, Bastien Chong  a écrit :

Hi,

I have a server with 2 ES instances. The first one is the master one, used to 
store documents. The second one is just there to receive requests from Kibana 
(I call it ES read-only), it has :

node.master: false
node.data: false

Both are configured with cloud-aws plugin, and the http/java port are left by 
default for automatic assignment.

When master start first, it's binded to 9200/9300, then the second one to 
9201/9300. When I do : curl -XGET 
'http://localhost:9200/_cluster/health?pretty=true'  everything is working as 
expected and both nodes are in the same cluster.

But, if I start the read-only instance first, discovery stop working. I have 
enabled DEBUG and TRACE but I didn't found what's the issue.

I also tried to hardcode the port allocation :

transport.tcp.port: 930(0/1)
http.port: 920(0-1)

And after that, It's actually worse, whatever the order I start the instances, 
EC2 discovery is broken. So it's sort of a race-condition that's happening.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8851834f-cefd-4732-8c34-f3410cec0c99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f31f79ec-ad39-440f-b2cd-36b642e18d48%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bd4fb871-4113-418d-974c-6a7dd1f26975%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
What I didn't mention, is that this node won't be alone, there will be 
about 6 other identical node, so I need the EC2 discovery feature.

On Tuesday, March 18, 2014 1:33:16 PM UTC-4, David Pilato wrote:
>
> For the second node, remove 
>
> aws:
>> access_key: xx
>> secret_key: x
>> region: us-west-2
>> discovery:
>> type: ec2
>> ec2:
>> tag:
>> elasticsearch: true
>>
>
>> It should work I think.
> Or set unicast to localhost and disable multicast
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 18 mars 2014 à 16:25:52, Bastien Chong (basti...@gmail.com) 
> a écrit:
>
> I'm not sure what I can change since I have the bare minimum config :
>
> The only difference between the 2 elasticsearch.yml config file are:
>
> node.master: false
>> node.data: false
>> path.conf: /etc/elasticsearchro
>> path.data: /var/lib/elasticsearchro
>>
>
> Both config share this part:
>
> cluster.name: ESCluster
>> cloud:
>> aws:
>> access_key: xx
>> secret_key: x
>> region: us-west-2
>> discovery:
>> type: ec2
>> ec2:
>> tag:
>> elasticsearch: true
>>
>
>
>
> On Tuesday, March 18, 2014 11:00:15 AM UTC-4, David Pilato wrote: 
>>
>>  I think I understand what is happening here.
>>
>> Wondering if giving another elasticsearch.yml file as à configuration for 
>> the second node with all defaults (except cluster name) could help.
>>
>> --
>> David ;-) 
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>  
>> Le 18 mars 2014 à 15:41, Bastien Chong  a écrit :
>>
>>  Hi,
>>
>> I have a server with 2 ES instances. The first one is the master one, 
>> used to store documents. The second one is just there to receive requests 
>> from Kibana (I call it ES read-only), it has :
>>
>> node.master: false
>>> node.data: false
>>>
>>
>> Both are configured with cloud-aws plugin, and the http/java port are 
>> left by default for automatic assignment.
>>
>> When master start first, it's binded to 9200/9300, then the second one to 
>> 9201/9300. When I do : *curl -XGET 
>> 'http://localhost:9200/_cluster/health?pretty=true' 
>> *  everything is 
>> working as expected and both nodes are in the same cluster.
>>
>> *But*, if I start the read-only instance first, discovery stop working. 
>> I have enabled DEBUG and TRACE but I didn't found what's the issue.
>>
>> I also tried to hardcode the port allocation :
>>
>> transport.tcp.port: 930(0/1)
>>> http.port: 920(0-1)
>>>
>>
>> And after that, It's actually worse, whatever the order I start the 
>> instances, EC2 discovery is broken. So it's sort of a race-condition that's 
>> happening.
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/8851834f-cefd-4732-8c34-f3410cec0c99%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>  
>>   --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/f31f79ec-ad39-440f-b2cd-36b642e18d48%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>
>



Re: OpenSearch for elasticsearch.org docs?

2014-03-18 Thread David Pilato
Hi Lukas,

Actually it's working as you described in google chrome. I don't know for other 
browsers.
It redirects to http://www.elasticsearch.org/?s=bulk

(I searched for "bulk" here)

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 mars 2014 à 14:41:01, Lukáš Vlček (lukas.vl...@gmail.com) a écrit:

Hi,

is there any plan to provide OpenSearch API for Elasticsearch docs soon?

It would be really nice if I could search the docs directly from browser URL. 
Many browsers support that and it is quite fast shortcut to docs: once you 
start typing into URL field "elast" -> it will give you a hint that you can hit 
TAB to search in elasticsearch.org -> if you hit the TAB it will allow you to 
directly input search term and send it into search API of docs, also search 
suggestions are fully supported in many browsers.

Just wanted to ping the ML before opening ticket for this.

Regards,
Lukas
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO9cvUYCZE4Rkyks6hpentrDZ1c7xyawFCTgZniN0hOOcKZPrA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



Some questions on wikipedia river and cluster config

2014-03-18 Thread jorge canellas
Hi!

I am trying to index the wikipedia dumps (downloaded and uncompressed) 
using the wikipedia river, but it takes about 6 hours in the cluster.
I have increased the number of nodes and primary shards from 5 to 8, but 
the performance does not increase.

   - I have set the refresh time to -1, the number of replicas to 0, 
   increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the 
   mlockall, and set the buffer to 30% from 10%.
   - I have changed the river settings increasing the bulk_size from 100 to 
   10k, refresh interval is set to 1s
   - I have changed the mapping to only index title and text. And set 
   _source to false.


I do not know what else I can modify to increase the indexing rate; at 
the moment it is indexing about 120 docs/sec.

Any ideas?

Kind regards,

Jorge



Using Java API's prepareUpdate

2014-03-18 Thread Mohit Anchlia
I currently use prepareIndex to index a document like this:


idxResponse = client.prepareIndex(indexName, indexType, id)
        .setSource(json)
        .execute()
        .actionGet();



I am trying to use setSource(json) with prepareUpdate but that doesn't seem
to work. Is there some other alternative that I can use?
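
For what it's worth, on the Java API a partial update is usually expressed with setDoc() on the UpdateRequestBuilder rather than setSource() — a sketch, untested, reusing the client, indexName, indexType, id, and json variables from the snippet above:

```java
UpdateResponse updResponse = client.prepareUpdate(indexName, indexType, id)
        .setDoc(json)       // partial document, merged into the existing source
        .execute()
        .actionGet();
```

If the document may not exist yet, I believe setUpsert(json) can supply the initial version.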



Terms facet not working with not_analyzed fields and dynamic template

2014-03-18 Thread Mahesh Kommareddi
Hi all.

I've been trying to get a terms facet working alongside not_analyzed 
fields and I'm at a bit of a loss about how to get some of this worked out.

First, I'm using a dynamic template that allows for multi_field, where 
the fields have an analyzed section and a not_analyzed section.

You can see and try the mapping here:
https://gist.github.com/mkommar/9624564

I put in some test data:
https://gist.github.com/mkommar/9624626

Now I try searching against the analyzed field (field1) and it works as I 
would think:
curl -XGET localhost:9200/mtest/dynamics/_search?q=field1:Words

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":1,"max_score":0.19178301,"hits":[{"_index":"mtest","_type":
"dynamics","_id":"1","_score":0.19178301, "_source" : {"field1" : "Words 
are nice", "field2": "We are here"}}]}}


I try searching field1 again with a different term that appears twice and 
it works as I would expect
curl -XGET localhost:9200/mtest/dynamics/_search?q=field1:Word

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":2,"max_score":0.30685282,"hits":[{"_index":"mtest","_type":
"dynamics","_id":"3","_score":0.30685282, "_source" : {"field1" : "Word", 
"field2": "not a match"}},{"_index":"mtest","_type":"dynamics","_id":"4",
"_score":0.30685282, "_source" : {"field1" : "Word", "field2": "match"}}]}}


If I search for the single word against the not_analyzed field, I get no 
matches, which I don't expect. No hits:
curl -XGET localhost:9200/mtest/dynamics/_search?q=field1.org:Word

{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":0,"max_score":null,"hits":[]}}

However, if I search on whole phrases, I get a hit:
curl -XGET localhost:9200/mtest/dynamics/_search?q=field1.org:Words%20are%
20nice

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":1,"max_score":0.02250402,"hits":[{"_index":"mtest","_type":
"dynamics","_id":"1","_score":0.02250402, "_source" : {"field1" : "Words 
are nice", "field2": "We are here"}}]}}


So Question 1 is: How do I get hits on single words for not_analyzed fields?

Question 2 is about the terms facet relating to this mapping / not_analyzed 
setting.

If I do a match_all against all the items and run a terms facet, I get the 
tokenized results I expect:

curl -XGET localhost:9200/mtest/dynamics/_search -d '{
  "query" : {
"match_all": {}
},
  "facets" : {
"topFieldElements" : {
  "terms" : {
"field" : "field1",
"size" : 10
  }
}
  }
}'


... "facets":{"topFieldElements":{"_type":"terms","missing":0,"total":5,
"other":0,"terms":[{"term":"word","count":2},{"term":"words","count":1},{
"term":"sentence","count":1},{"term":"nice","count":1}]}}}


However, running the terms facet on not_analyzed fields results in no 
information (missing: 4):
curl -XGET localhost:9200/mtest/dynamics/_search -d '{
  "query" : {
"match_all": {}
},
  "facets" : {
"topFieldElements" : {
  "terms" : {
"field" : "field1.org",
"size" : 10
  }
}
  }
}'

{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":4,"max_score":1.0,"hits":[{"_index":"mtest","_type":
"dynamics","_id":"3","_score":1.0, "_source" : {"field1" : "Word", "field2": 
"not 
a match"}},{"_index":"mtest","_type":"dynamics","_id":"2","_score":1.0, 
"_source" : {"field1" : "this is a sentence", "field2": "We are here"}},{
"_index":"mtest","_type":"dynamics","_id":"4","_score":1.0, "_source" : {
"field1" : "Word", "field2": "match"}},{"_index":"mtest","_type":"dynamics",
"_id":"1","_score":1.0, "_source" : {"field1" : "Words are nice", "field2": "We 
are here"}}]},"facets":{"topFieldElements":{"_type":"terms","missing":4,
"total":0,"other":0,"terms":[]}}}

This appears to be the case with any amount of data. How can I get terms 
data on not_analyzed fields?

Thanks!
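
For comparison, the conventional multi_field shape for this pattern (an analyzed main field plus a not_analyzed sub-field for faceting) looks roughly like the sketch below; the sub-field name "org" mirrors the gists, the rest is illustrative:

```json
{
  "field1": {
    "type": "multi_field",
    "fields": {
      "field1": { "type": "string", "index": "analyzed" },
      "org":    { "type": "string", "index": "not_analyzed" }
    }
  }
}
```

If the dynamic template is applying correctly, a terms facet on field1.org should return whole, case-preserved values ("Words are nice") rather than tokens.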




Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread David Pilato
For the second node, remove:

cloud:
    aws:
        access_key: xx
        secret_key: x
        region: us-west-2
discovery:
    type: ec2
    ec2:
        tag:
            elasticsearch: true

It should work I think.
Or set unicast to localhost and disable multicast


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 mars 2014 à 16:25:52, Bastien Chong (bastien...@gmail.com) a écrit:

I'm not sure what I can change since I have the bare minimum config :

The only difference between the 2 elasticsearch.yml config file are:

node.master: false
node.data: false
path.conf: /etc/elasticsearchro
path.data: /var/lib/elasticsearchro

Both config share this part:

cluster.name: ESCluster
cloud:
    aws:
    access_key: xx
    secret_key: x
    region: us-west-2
discovery:
    type: ec2
    ec2:
    tag:
    elasticsearch: true



On Tuesday, March 18, 2014 11:00:15 AM UTC-4, David Pilato wrote:
I think I understand what is happening here.

Wondering if giving another elasticsearch.yml file as à configuration for the 
second node with all defaults (except cluster name) could help.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 18 March 2014 at 15:41, Bastien Chong  wrote:

Hi,

I have a server with 2 ES instances. The first one is the master one, used to 
store documents. The second one is just there to receive requests from Kibana 
(I call it ES read-only), it has :

node.master: false
node.data: false

Both are configured with cloud-aws plugin, and the http/java port are left by 
default for automatic assignment.

When the master starts first, it's bound to 9200/9300, and the second one to 
9201/9300. When I do: curl -XGET 
'http://localhost:9200/_cluster/health?pretty=true'  everything works as 
expected and both nodes are in the same cluster.

But, if I start the read-only instance first, discovery stops working. I have 
enabled DEBUG and TRACE but I didn't find what the issue is.

I also tried to hardcode the port allocation :

transport.tcp.port: 930(0/1)
http.port: 920(0-1)

And after that, it's actually worse: whatever order I start the instances in, 
EC2 discovery is broken. So it seems some sort of race condition is happening.



Re: searching pdf files by content with Mongodb-river

2014-03-18 Thread David Pilato
I'm not sure, but I think I have already answered this question. Was that on 
Stack Overflow?
Could you describe what is wrong with the result?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 18 March 2014 at 18:05:47, sAs59 (mr.akmu...@gmail.com) wrote:

Hi,
I am new to elasticsearch and I want to search PDF files by content, but
the result does not show the content of the PDF file in readable form. It
looks as follows:

http://localhost:9200/mongoindex/_search?pretty=true  

{  
"took" : 10,  
"timed_out" : false,  
"_shards" : {  
"total" : 5,  
"successful" : 5,  
"failed" : 0  
},  
"hits" : {  
"total" : 1,  
"max_score" : 1.0,  
"hits" : [ {  
"_index" : "mongoindex",  
"_type" : "files",  
"_id" : "532595b8f37d5cc2d64a517d",  
"_score" : 1.0,  
"_source" : {"content":{"content_type":"application/pdf",  
"title":"D:/sample.pdf",  
"content":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0
  


"filename":"D:/sample.pdf","contentType":"application/pdf","md5":"afe70f97bce7876e39aa43f71dc7266f","length":82441,"chunkSize":262144,"uploadDate":"2014-03-16T12:14:48.542Z","metadata":{}}
  
} ]  
}  
}  

Could someone please help me on it? Thank you!  

Here is the link I used: http://v.bartko.info/?p=463  



--  
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/searching-pdf-files-by-content-with-Mongodb-river-tp4051989.html
  
Sent from the ElasticSearch Users mailing list archive at Nabble.com.  




Re: Mechanism of internal search with multiple indices

2014-03-18 Thread Clinton Gormley
If the field doesn't exist in the mapping, then the index is not searched.

clint


On 18 March 2014 09:56, golchhamohit  wrote:

> Thanks for explaining clearly how the query modifies itself when a query is
> given with multiple indices and types.
>
> Another doubt, which I describe here: I have multiple indices
> (assume 5 - i1,i2,i3,i4,i5), multiple types (5 in each index, so 25
> types in total - t1,t2,...,t25), some field (called "field_foo", present in
> all documents of all indices), and 100 documents in each
> index (d1,d2,...,d100). So in total there are 5*100 = 500 documents.
>
> Now, my search query targets 2 indices (i2,i4) and
> types (t2,t5,t7,t13,t17,t23), and searches for a particular list of values
> ["value_field_foo1","value_field_foo2","value_field_foo3"] in field
> "field_foo" (where actually value_field_foo1 is present in index2 and
> value_field_foo2 is present in index2 and index4 - assume this, though in
> real life we would not know in advance which value is present in which
> index). My question is: will the presence of all the values
> (value_field_foo1, value_field_foo2, etc.) be checked in the documents of
> all indices, or is there some mechanism by which Elasticsearch determines
> that a value is not present in an index and hence avoids the search in that
> index, thereby saving time?
>
> Sorry for being verbose.
>
> Thanks in advance.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Mechanism-of-internal-search-with-multiple-indices-tp4051988p4052080.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>



Re: Delete by query fails often with HTTP 503

2014-03-18 Thread Clinton Gormley
Do you have lots of shards on just a few nodes? Delete by query is handled
by the `index` thread pool, but those threads are shared across all shards
on a node.  Delete by query can produce a large number of changes, which
can fill up the thread pool queue and result in rejections.

You can either just (a) retry or (b) increase the queue size for the
`index` thread pool (which will use more memory as more delete requests
will need to be queued)

See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#types
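
Option (b) is a node-level setting in elasticsearch.yml; a hedged sketch (the queue size value here is an assumption to be tuned against available memory, not a recommendation):

```yaml
# Larger queue for the index thread pool: bursts of delete-by-query
# operations get queued instead of rejected, at the cost of more memory.
threadpool.index.type: fixed
threadpool.index.queue_size: 1000
```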

clint


On 18 March 2014 08:13, Thomas S.  wrote:

> Hi,
>
> We often get failures when using the delete by query API. The response is
> an HTTP 503 with a body like this:
>
> {"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58,
> "total": 60
>
> Is there a way to figure out what is causing this error? It seems to
> mostly happen when the search cluster is busy.
>
> Thomas
>



Re: need help to update document fields in python

2014-03-18 Thread PAULINE BONNEAU
I'm sorry, but I have just one more problem: I don't know what to put in 
the *document* parameter of the update function: 

update(*index*, *doc_type*, *id*, *script=None*, *lang='mvel'*, 
*params=None*, *document=None*, *upsert=None*, *model=None*, *bulk=False*, 
*querystring_args=None*, *retry_on_conflict=None*, *routing=None*, 
*doc_as_upsert=None*)


I think this parameter lets me update some fields of the document, 
but I don't know how to use it.
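
The *document* parameter appears to take a partial document: just the fields you want changed, which Elasticsearch merges into the stored document (the REST update API's "doc" mechanism). A small local sketch of those merge semantics, just to illustrate what the server does (the helper name is made up; the real merge happens server-side):

```python
def partial_update(stored, partial):
    """Merge a partial document into the stored one, like the update API's 'doc'."""
    merged = dict(stored)   # copy the stored _source
    merged.update(partial)  # overwrite only the supplied fields
    return merged

stored = {"title": "old title", "views": 3, "author": "paulyne"}
partial = {"title": "new title"}

print(partial_update(stored, partial))
# {'title': 'new title', 'views': 3, 'author': 'paulyne'}
```

So passing only the changed fields leaves the other fields of the document untouched.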
Thanks

Paulyne



Re: Using Elasticsearch with Mongodb-River for searching pdf

2014-03-18 Thread sAs59
http://localhost:9200/mongoindex/_search?pretty=true



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Using-Elasticsearch-with-Mongodb-River-for-searching-pdf-tp4051979p4051980.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



searching pdf files by content with Mongodb-river

2014-03-18 Thread sAs59
Hi, 
I am new to elasticsearch and I want to search PDF files by content, but
the result does not show the content of the PDF file in readable form. It
looks as follows: 

http://localhost:9200/mongoindex/_search?pretty=true

 { 
   "took" : 10, 
   "timed_out" : false, 
   "_shards" : { 
   "total" : 5, 
   "successful" : 5, 
   "failed" : 0 
   }, 
   "hits" : { 
   "total" : 1, 
   "max_score" : 1.0, 
   "hits" : [ { 
   "_index" : "mongoindex", 
   "_type" : "files", 
   "_id" : "532595b8f37d5cc2d64a517d", 
   "_score" : 1.0, 
   "_source" : {"content":{"content_type":"application/pdf",
"title":"D:/sample.pdf", 
"content":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0
 

   
"filename":"D:/sample.pdf","contentType":"application/pdf","md5":"afe70f97bce7876e39aa43f71dc7266f","length":82441,"chunkSize":262144,"uploadDate":"2014-03-16T12:14:48.542Z","metadata":{}}
 
   } ] 
} 
} 

Could someone please help me on it? Thank you! 

Here is the link I used: http://v.bartko.info/?p=463
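
What is in "content" is not mangled text: it is the raw bytes of the PDF, base64-encoded (this is how GridFS binary data ends up in the index when no text extraction is done). Decoding the start of the string shows the standard PDF file header, which is a quick check you can run locally:

```python
import base64

# First characters of the "content" value above, padded to a valid
# base64 length; they decode to the PDF magic header.
prefix = "JVBERi0xLjU="
print(base64.b64decode(prefix))  # b'%PDF-1.5'
```

To actually search the text, the file has to go through text extraction first, e.g. by indexing it with the attachment type from the mapper-attachments plugin, which runs the content through Tika.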



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/searching-pdf-files-by-content-with-Mongodb-river-tp4051989.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Elasticsearch memory usage

2014-03-18 Thread codebird
Hello Mark, thanks for the reply.

The host isn't dedicated to ES; there are also MySQL and Apache on it, but
there is not much load from them, so I can set it to 50% of the RAM. I
just lowered it to see if this would fix my issue.


Here's the output of ps -ef | grep elasticsearch 

/home/elasticsearch/bin/service/exec/elasticsearch-linux-x86-64
/home/elasticsearch/bin/service/elasticsearch.conf
wrapper.syslog.ident=elasticsearch
wrapper.pidfile=/home/elasticsearch/bin/service/./elasticsearch.pid
wrapper.name=elasticsearch wrapper.displayname=Elasticsearch
wrapper.daemonize=TRUE
wrapper.statusfile=/home/elasticsearch/bin/service/./elasticsearch.status
wrapper.java.statusfile=/home/elasticsearch/bin/service/./elasticsearch.java.status
wrapper.lockfile=/var/lock/subsys/elasticsearch
wrapper.script.version=3.5.14
root 23802 23798 99 03:52 ?07:55:57
/usr/java/jdk1.7.0_25/bin/java -Delasticsearch-service
-Des.path.home=/home/elasticsearch -Xss256k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Djava.awt.headless=true -XX:MinHeapFreeRatio=40 -XX:MaxHeapFreeRatio=70
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxDirectMemorySize=4g -Xms8192m -Xmx8192m
-Djava.library.path=/home/elasticsearch/bin/service/lib -classpath
/home/elasticsearch/bin/service/lib/wrapper.jar:/home/elasticsearch/lib/elasticsearch-1.0.0.jar:/home/elasticsearch/lib/elasticsearch-1.0.0.jar:/home/elasticsearch/lib/jna-3.3.0.jar:/home/elasticsearch/lib/jts-1.12.jar:/home/elasticsearch/lib/log4j-1.2.17.jar:/home/elasticsearch/lib/lucene-analyzers-common-4.6.1.jar:/home/elasticsearch/lib/lucene-codecs-4.6.1.jar:/home/elasticsearch/lib/lucene-core-4.6.1.jar:/home/elasticsearch/lib/lucene-grouping-4.6.1.jar:/home/elasticsearch/lib/lucene-highlighter-4.6.1.jar:/home/elasticsearch/lib/lucene-join-4.6.1.jar:/home/elasticsearch/lib/lucene-memory-4.6.1.jar:/home/elasticsearch/lib/lucene-misc-4.6.1.jar:/home/elasticsearch/lib/lucene-queries-4.6.1.jar:/home/elasticsearch/lib/lucene-queryparser-4.6.1.jar:/home/elasticsearch/lib/lucene-sandbox-4.6.1.jar:/home/elasticsearch/lib/lucene-spatial-4.6.1.jar:/home/elasticsearch/lib/lucene-suggest-4.6.1.jar:/home/elasticsearch/lib/mysql-connector-java-5.1.26-bin.jar:/home/elasticsearch/lib/spatial4j-0.3.jar:/home/elasticsearch/lib/sigar/sigar-1.6.4.jar
-Dwrapper.key=DFYAQvFT98OPKWIu -Dwrapper.port=32000
-Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999
-Dwrapper.disable_console_input=TRUE -Dwrapper.pid=23798
-Dwrapper.version=3.5.14 -Dwrapper.native_library=wrapper
-Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1
org.tanukisoftware.wrapper.WrapperSimpleApp
org.elasticsearch.bootstrap.ElasticsearchF



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-memory-usage-tp4051793p4051910.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Mechanism of internal search with multiple indices

2014-03-18 Thread golchhamohit
Thanks for explaining clearly how the query modifies itself when a query is
given with multiple indices and types.

Another doubt, which I describe here: I have multiple indices
(assume 5 - i1,i2,i3,i4,i5), multiple types (5 in each index, so 25
types in total - t1,t2,...,t25), some field (called "field_foo", present in
all documents of all indices), and 100 documents in each
index (d1,d2,...,d100). So in total there are 5*100 = 500 documents.

Now, my search query targets 2 indices (i2,i4) and
types (t2,t5,t7,t13,t17,t23), and searches for a particular list of values
["value_field_foo1","value_field_foo2","value_field_foo3"] in field
"field_foo" (where actually value_field_foo1 is present in index2 and
value_field_foo2 is present in index2 and index4 - assume this, though in
real life we would not know in advance which value is present in which
index). My question is: will the presence of all the values
(value_field_foo1, value_field_foo2, etc.) be checked in the documents of
all indices, or is there some mechanism by which Elasticsearch determines
that a value is not present in an index and hence avoids the search in that
index, thereby saving time?

Sorry for being verbose.

Thanks in advance.



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Mechanism-of-internal-search-with-multiple-indices-tp4051988p4052080.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



custom_filters_score query performance

2014-03-18 Thread venku123


An Elasticsearch filtered query with a custom_filters_score query is 
expected to score only the documents returned by the filter. 
Instead, the custom_filters_score query was fetching all documents and 
evaluating each of them against the filter set.

I did some analysis and have found the reasons for this behavior. I have 
documented my findings in the under mentioned link. Hope this will benefit 
those using custom_filters_score query.

http://readmytechstuff.blogspot.com/



Re: integration test issues with elasticsearch

2014-03-18 Thread Clinton Gormley
Could you please open an issue and provide the steps needed to reproduce
this behaviour?

thanks

clint



slow indexing geo_shape

2014-03-18 Thread Georgi Ivanov
Hi,
I am playing with geo_shape type .
I am experiencing very slow indexing times. For example, one simple 
linestring with a couple of hundred points can take up to 60 seconds to 
index.
I tried the geohash and quadtree implementations.
With quadtree it is faster (about 50% faster), but still not fast enough.

Using Java API (bulk indexing)

Mapping:

{
  "entity": {
 "properties": {
  "id" : {"type": "integer"},
  "track" : {"type": "geo_shape","precision":"20m", "tree": 
"quadtree"},
  "date" : {"type": "date"}
   }
 }
}

My ES cluster is tuned for indexing like follows:

index.refresh_interval: 30s
index.translog.flush_threshold_ops: 10
indices.memory.index_buffer_size: 15%

threadpool.bulk.queue_size: 500
threadpool.bulk.size: 100
threadpool.bulk.type: fixed


Any tips on how to make indexing faster? 
My estimate is that one day of data would take about 10 hours to index (and I 
need to index about 3 years of data). 
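
One knob worth checking, as a hedged sketch rather than a definitive fix: geo_shape indexing cost grows steeply with precision, so a 20m quadtree over long linestrings generates a very large number of terms per shape. Coarsening the precision and allowing a small distance error usually speeds indexing up considerably (the values below are assumptions to experiment with, and the index name is illustrative):

```shell
# Sketch: coarser precision + non-zero distance_error_pct means far fewer
# terms per indexed shape, trading accuracy for indexing speed.
curl -XPUT 'localhost:9200/myindex/entity/_mapping' -d '{
  "entity": {
    "properties": {
      "track": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "100m",
        "distance_error_pct": 0.025
      }
    }
  }
}'
```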



Re: need help to update document fields in python

2014-03-18 Thread PAULINE BONNEAU
Hi Honza, 

Thanks for your help. I will try this now. 

Paulyne



Re: need help to update document fields in python

2014-03-18 Thread Honza Král
Hi Paulyne,

unfortunately, with elasticsearch you need to update your documents
one by one using the Update API. I am not too familiar with pyes, but
it seems to expose the API as a method:
http://pyes.readthedocs.org/en/latest/references/pyes.es.html?highlight=update#pyes.es.ES.update

alternatively you can just use the official python client that exposes
the api without any change from the curl:
http://elasticsearch-py.readthedocs.org/en/master/api.html#elasticsearch.Elasticsearch.update

Another alternative is just to index a new version of the same document
(same index, type and id) using whatever API or tool you used in the first
place; that will overwrite the document in elasticsearch.

Hope this helps,
Honza

On Tue, Mar 18, 2014 at 4:40 PM, PAULINE BONNEAU
 wrote:
> Hi all,
>
> I'm new here and have a problem with a query in python. I hope someone can
> help me ,please.
>
>
> My problem is :
>
> I have some documents in an index. I would like to update some document
> fields in the index but I don't know how. If someone can help me, please
> explain me how I can do that. I work in python with pyes.
>
> If my problem is not clear, I can explain more.
>
>
> Sorry if my writing in english is not very clear :)
>
>
> Paulyne
>



need help to update document fields in python

2014-03-18 Thread PAULINE BONNEAU
 

Hi all, 

I'm new here and have a problem with a query in python. I hope someone can 
help me ,please.


 My problem is :

I have some documents in an index. I would like to update some document 
fields in the index but I don't know how. If someone can help me, please 
explain me how I can do that. I work in python with pyes.

If my problem is not clear, I can explain more. 


 Sorry if my writing in english is not very clear :)


 Paulyne



Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
I'm not sure what I can change since I have the bare minimum config :

The only difference between the 2 elasticsearch.yml config file are:

node.master: false
> node.data: false
> path.conf: /etc/elasticsearchro
> path.data: /var/lib/elasticsearchro
>

Both config share this part:

cluster.name: ESCluster
> cloud:
> aws:
> access_key: xx
> secret_key: x
> region: us-west-2
> discovery:
> type: ec2
> ec2:
> tag:
> elasticsearch: true
>



On Tuesday, March 18, 2014 11:00:15 AM UTC-4, David Pilato wrote:
>
> I think I understand what is happening here.
>
> Wondering if giving another elasticsearch.yml file as à configuration for 
> the second node with all defaults (except cluster name) could help.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 18 mars 2014 à 15:41, Bastien Chong > 
> a écrit :
>
> Hi,
>
> I have a server with 2 ES instances. The first one is the master one, used 
> to store documents. The second one is just there to receive requests from 
> Kibana (I call it ES read-only), it has :
>
> node.master: false
>> node.data: false
>>
>
> Both are configured with cloud-aws plugin, and the http/java port are left 
> by default for automatic assignment.
>
> When the master starts first, it's bound to 9200/9300, and the second one 
> to 9201/9300. When I do: *curl -XGET 
> 'http://localhost:9200/_cluster/health?pretty=true' 
> *  everything works 
> as expected and both nodes are in the same cluster.
>
> *But*, if I start the read-only instance first, discovery stops working. I 
> have enabled DEBUG and TRACE but I didn't find what the issue is.
>
> I also tried to hardcode the port allocation :
>
> transport.tcp.port: 930(0/1)
>> http.port: 920(0-1)
>>
>
> And after that, it's actually worse: whatever order I start the 
> instances in, EC2 discovery is broken. So it seems some sort of race 
> condition is happening.
>



create panel to display latency time using two datetime fields

2014-03-18 Thread computer engineer
I would like to display in a panel the lagtime between two times that come 
in a message. I have no idea how to get started and would appreciate some 
help:

inDate "2014-03-18 05:00:00" outDate: "2014-03-18 05:05:00"

lagtime = outdate - indate

So what I would expect is a panel showing the average lagtime across all incoming messages.
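
Kibana 3 has no built-in computed field, so the lagtime would typically either be computed at index time (e.g. in logstash, storing a lagtime field the panel can then average) or via a scripted field at query time. The computation itself is just a timestamp difference; here is a sketch in Python of the value that would need to be stored or scripted (the format string is assumed from the example above):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def lag_seconds(in_date, out_date):
    """Difference between outDate and inDate, in seconds."""
    return (datetime.strptime(out_date, FMT)
            - datetime.strptime(in_date, FMT)).total_seconds()

# The example message above plus one more, then the average a panel would show.
events = [("2014-03-18 05:00:00", "2014-03-18 05:05:00"),
          ("2014-03-18 06:00:00", "2014-03-18 06:01:00")]
lags = [lag_seconds(i, o) for i, o in events]
print(sum(lags) / len(lags))  # 180.0
```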



Re: EC2 discovery issues with 2 local instances

2014-03-18 Thread David Pilato
I think I understand what is happening here.

Wondering if giving another elasticsearch.yml file as à configuration for the 
second node with all defaults (except cluster name) could help.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 18 March 2014 at 15:41, Bastien Chong  wrote:
> 
> Hi,
> 
> I have a server with 2 ES instances. The first one is the master one, used to 
> store documents. The second one is just there to receive requests from Kibana 
> (I call it ES read-only), it has :
> 
>> node.master: false
>> node.data: false
> 
> Both are configured with cloud-aws plugin, and the http/java port are left by 
> default for automatic assignment.
> 
> When the master starts first, it's bound to 9200/9300, and the second one to 
> 9201/9300. When I do: curl -XGET 
> 'http://localhost:9200/_cluster/health?pretty=true'  everything works as 
> expected and both nodes are in the same cluster.
> 
> But, if I start the read-only instance first, discovery stops working. I have 
> enabled DEBUG and TRACE but I didn't find what the issue is.
> 
> I also tried to hardcode the port allocation :
> 
>> transport.tcp.port: 930(0/1)
>> http.port: 920(0-1)
> 
> And after that, it's actually worse: whatever order I start the 
> instances in, EC2 discovery is broken. So it seems some sort of race 
> condition is happening.



Re: need help with aggregation and unique counted values

2014-03-18 Thread Adrien Grand
Hi,

The next version of Elasticsearch will have a new cardinality[1]
aggregation that allows for computing unique counts. You could use it in
lieu of the "unique" terms aggregation in order to compute the unique count
of session IDs.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
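As a minimal sketch, the aggregation from the question with the "unique" terms aggregation swapped for cardinality would look like this (field names are taken from the question; the date format string is an assumption, since it appears truncated above):

```python
import json

# Sketch of the original query with the terms aggregation replaced by
# cardinality, which returns an approximate unique count instead of one
# bucket per session id, keeping the response small.
body = {
    "query": {"match_all": {}},
    "aggs": {
        "log_over_time": {
            "date_histogram": {
                "field": "dateline",
                "interval": "month",
                "format": "yyyy-MM",  # assumed; truncated in the question
            },
            "aggs": {
                "amount": {"sum": {"field": "order_amount"}},
                "unique_sessions": {
                    "cardinality": {"field": "user_session_id"}
                },
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

Each date_histogram bucket then carries a single `unique_sessions.value` instead of up to `size` terms buckets.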


On Tue, Mar 18, 2014 at 2:42 PM, Andreas Hembach  wrote:

> Hi all,
>
> i'm new here and have a problem with an query. I hope someone can help me.
>
> My Problem:
> - I have a log with user clicks, the user revenue and their session IDs.
> Now I want to build a date histogram with the total click count, the unique
> session IDs and the user revenue.
>
> My Query:
> {
>"query":{
>   "match_all":{}
>},
>"aggs":{
>   "log_over_time":{
>  "date_histogram":{
> "field":"dateline",
> "interval":"month",
> "format":"-MM"
>  },
>  "aggs":{
> "amount":{
>"sum":{
>   "field":"order_amount"
>}
> },
> "unique":{
>"terms":{
>   "field":"user_session_id",
>   "size":10
>}
> }
>  }
>   }
>}
> }
>
> My first approach is to count the "unique" entries. But the response is
> very very large and limited to 10 entries.
>
> Is there a better way to do this? Can i do something like group by value?
>
> A big thank you for the help!
>
> Greetings,
> Andreas
>
>



-- 
Adrien Grand



EC2 discovery issues with 2 local instances

2014-03-18 Thread Bastien Chong
Hi,

I have a server with 2 ES instances. The first one is the master one, used 
to store documents. The second one is just there to receive requests from 
Kibana (I call it ES read-only), it has :

node.master: false
> node.data: false
>

Both are configured with cloud-aws plugin, and the http/java port are left 
by default for automatic assignment.

When the master starts first, it's bound to 9200/9300, then the second one to 
9201/9300. When I do *curl -XGET 
'http://localhost:9200/_cluster/health?pretty=true'*, everything works 
as expected and both nodes are in the same cluster.

*But*, if I start the read-only instance first, discovery stops working. I 
have enabled DEBUG and TRACE but I didn't find what the issue is.

I also tried to hardcode the port allocation :

transport.tcp.port: 930(0/1)
> http.port: 920(0-1)
>

And after that it's actually worse: whatever order I start the 
instances in, EC2 discovery is broken. So it seems some sort of race condition is 
happening.



Re: Detecting whether elasticsearch has finished indexing after adding documents

2014-03-18 Thread David Pilato
IIRC as soon as docs have been sent to BulkProcessor. Which means that it's not 
synchronized.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> Le 18 mars 2014 à 15:05, Andrew Coulton  a écrit :
> 
> David,
> 
> Thanks - that makes sense. Do you happen to know if the last_seq gets updated 
> once the document has been indexed, or just once it's been posted into the 
> elasticsearch queue?
> 
> Andrew
> 
> 
>> On 18 March 2014 12:47, David Pilato  wrote:
>> The only thing I can think about is by get _seq document from couchdb river.
>> 
>> curl -XGET 'localhost:9200/_river/my_db/_seq'
>> And see if `last_seq` field has the same value as the couchdb _changes API 
>> has.
>> 
>> 
>> -- 
>> David Pilato | Technical Advocate | Elasticsearch.com
>> @dadoonet | @elasticsearchfr
>> 
>> 
>>> Le 18 mars 2014 à 13:39:04, Andrew Coulton (and...@ingenerator.com) a écrit:
>>> 
>>> Hi,
>>> 
>>> Is there a way to check whether the index (and couchdb river) are up to 
>>> date with changes after we add a batch of documents? We have a workflow 
>>> that involves adding new content to the index and then running some reports 
>>> against it, but obviously I need to wait to run the reports until all the 
>>> changes are in the index.
>>> 
>>> I've had a look at the various status request/responses, and a fair bit of 
>>> googling around, but I can't see anything obvious - quite possible I've 
>>> missed it though.
>>> 
>>> Thanks,
>>> 
>>> Andrew
>> 
> 
> 
> 
> -- 
> Andrew Coulton
> Founder
> inGenerator Ltd
> 
> Follow us on Twitter @inGenerator
> Phone us on 0131 510 0271
> 
> inGenerator Ltd is a company registered in Scotland (no SC435161) with its 
> registered office at 8 Craigleith Hill Row, Edinburgh EH4 2JX.



Re: Elastic Search Scoring - Query Terms Boost

2014-03-18 Thread Honza Král
Hi Pratik,

the Lucene query syntax, which is what query_string uses, supports
boosting terms with the ^N notation [0], so

"Arvind^6 Kejriwal^6 India Today economic times"

will mark the first two terms as more important.

[0] http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term
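As a sketch, the request body from the question would then become the following (the field list is copied from the question; whether ^6 is the right boost value is something to tune, and the `es.search` call itself is omitted):

```python
import json

# Boost the first two terms with ^6; the remaining terms keep the
# default weight of 1.
keywordstr = "Arvind^6 Kejriwal^6 India Today economic times"

body = {
    "query": {
        "query_string": {
            "query": keywordstr,
            "fields": ["text", "title", "tags", "domain"],
        }
    }
}

print(json.dumps(body))
```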

Hope this helps,
Honza

On Tue, Mar 18, 2014 at 11:25 AM, Pratik Poddar  wrote:
>
> Hey,
>
> Sorry for the noob question. Really appreciate it if someone can help me through
> this. I am using the Elasticsearch Python client. I am searching for a string like "Arvind
> Kejriwal India Today Economic Times" and it gives me reasonable results. I
> was hoping I could give the first words more weight in the search
> query. How can I do that?
>
> res = es.search(index="article-index", fields="url", body={"query":
> {"query_string": {"query": keywordstr, "fields": ["text", "title", "tags",
> "domain"]}}})
>
> I am using the above command to search right now.
>
> http://stackoverflow.com/questions/22476315/elastic-search-boost-query-corresponding-to-first-search-term
>



Re: fielddata breaker question

2014-03-18 Thread Dunaeth
Meant "custom" splitter analyzer

Le mardi 18 mars 2014 15:07:24 UTC+1, Dunaeth a écrit :
>
> That said, our stats indices (monthly indices) have almost the same 
> mapping but the documents are stored. I do not believe the tester index is 
> concerned by the issue since the only logs linked with it remains in the 
> first seconds after the cluster restart (no trace log after). To go further 
> with our data description, sure each document remains quiet small (atm 
> we're talking about an average of 250B per document in the index size, 2M 
> logs for 500MB in size). To be complete, here is the detail of our customer 
> splitter analyzer (from tester/_settings) :
>
> {
>>   "tester": {
>> "settings": {
>>   "index": {
>> "uuid": "HGEPQdWoRLWe0ATNGUElvw",
>> "number_of_replicas": "1",
>> "analysis": {
>>   "analyzer": {
>> "splitter": {
>>   "type": "custom",
>>   "tokenizer": "pattern"
>> }
>>   }
>> },
>> "number_of_shards": "1",
>> "version": {
>>   "created": "199"
>> }
>>   }
>> }
>>   }
>> }
>
>
> Le mardi 18 mars 2014 12:58:56 UTC+1, Dunaeth a écrit :
>>
>> Actually, tester is a dedicated percolator index with 5 percolation 
>> queries stored and no other data. Percolated documents are web logs and the 
>> tester mapping is :
>>
>> {
>>>   "tester": {
>>> "mappings": {
>>>   ".percolator": {
>>> "_id": {
>>>   "index": "not_analyzed"
>>> },
>>> "properties": {
>>>   "query": {
>>> "type": "object",
>>> "enabled": false
>>>   }
>>> }
>>>   },
>>>   "test_hit": {
>>> "dynamic_templates": [
>>>   {
>>> "template1": {
>>>   "mapping": {
>>> "type": "integer"
>>>   },
>>>   "match": "*_id"
>>> }
>>>   }
>>> ],
>>> "_timestamp": {
>>>   "enabled": true,
>>>   "path": "date",
>>>   "format": "date_time"
>>> },
>>> "_source": {
>>>   "excludes": [
>>> "@timestamp"
>>>   ]
>>> },
>>> "properties": {
>>>   "@timestamp": {
>>> "type": "date",
>>> "index": "no",
>>> "format": "dateOptionalTime"
>>>   },
>>>   "date": {
>>> "type": "date",
>>> "format": "date_time"
>>>   },
>>>   "geoip": {
>>> "type": "geo_point"
>>>   },
>>>   "host": {
>>> "type": "string",
>>> "index": "not_analyzed"
>>>   },
>>>   "ip": {
>>> "type": "ip"
>>>   },
>>>   "prefered-language": {
>>> "type": "string"
>>>   },
>>>   "referer": {
>>> "type": "string",
>>> "analyzer": "splitter"
>>>   },
>>>   "reverse_ip": {
>>> "type": "string"
>>>   },
>>>   "session_id": {
>>> "type": "string",
>>> "index": "not_analyzed"
>>>   },
>>>   "ua_build": {
>>> "type": "short"
>>>   },
>>>   "ua_device": {
>>> "type": "string"
>>>   },
>>>   "ua_major": {
>>> "type": "short"
>>>   },
>>>   "ua_minor": {
>>> "type": "short"
>>>   },
>>>   "ua_name": {
>>> "type": "string"
>>>   },
>>>   "ua_os": {
>>> "type": "string"
>>>   },
>>>   "ua_os_major": {
>>> "type": "short"
>>>   },
>>>   "ua_os_minor": {
>>> "type": "short"
>>>   },
>>>   "ua_os_name": {
>>> "type": "string"
>>>   },
>>>   "ua_patch": {
>>> "type": "short"
>>>   },
>>>   "unique": {
>>> "type": "boolean"
>>>   },
>>>   "uri": {
>>> "type": "string"
>>>   },
>>>   "user-agent": {
>>> "type": "string",
>>> "analyzer": "splitter"
>>>   },
>>>   "valid": {
>>> "type": "boolean"
>>>   }
>>> }
>>>   }
>>> }
>>>   }
>>> }
>>
>>
>> Le mardi 18 mars 2014 12:41:19 UTC+1, Lee Hinman a écrit :
>>>
>>> On 3/17/14, 2:36 AM, Dunaeth wrote: 
>>> > Hi, 
>>> > 
>>> > Due to the insert and search query frequency, it's nearly impossible 
>>> to 
>>> > get logs from specific queries. That said, logs attached are extracts 
>>> of 
>>> > the logs since the cluster restart and are most probably generated 
>>> > during document inserts. 
>>>
>>> It looks like you have incredibly small segments for this index 
>>> (tester), what does the data look like? Can you share your mappings for 
>>> the index as well as example documents? 
>>>
>>> ;; Lee 
>>>

Re: fielddata breaker question

2014-03-18 Thread Dunaeth
That said, our stats indices (monthly indices) have almost the same mapping 
but the documents are stored. I do not believe the tester index is 
concerned by the issue, since the only logs linked with it remain in the 
first seconds after the cluster restart (no trace log after). To go further 
with our data description, each document remains quite small (atm 
we're talking about an average of 250B per document in the index size, 2M 
logs for 500MB in size). To be complete, here is the detail of our customer 
splitter analyzer (from tester/_settings):

{
>   "tester": {
> "settings": {
>   "index": {
> "uuid": "HGEPQdWoRLWe0ATNGUElvw",
> "number_of_replicas": "1",
> "analysis": {
>   "analyzer": {
> "splitter": {
>   "type": "custom",
>   "tokenizer": "pattern"
> }
>   }
> },
> "number_of_shards": "1",
> "version": {
>   "created": "199"
> }
>   }
> }
>   }
> }


Le mardi 18 mars 2014 12:58:56 UTC+1, Dunaeth a écrit :
>
> Actually, tester is a dedicated percolator index with 5 percolation 
> queries stored and no other data. Percolated documents are web logs and the 
> tester mapping is :
>
> {
>>   "tester": {
>> "mappings": {
>>   ".percolator": {
>> "_id": {
>>   "index": "not_analyzed"
>> },
>> "properties": {
>>   "query": {
>> "type": "object",
>> "enabled": false
>>   }
>> }
>>   },
>>   "test_hit": {
>> "dynamic_templates": [
>>   {
>> "template1": {
>>   "mapping": {
>> "type": "integer"
>>   },
>>   "match": "*_id"
>> }
>>   }
>> ],
>> "_timestamp": {
>>   "enabled": true,
>>   "path": "date",
>>   "format": "date_time"
>> },
>> "_source": {
>>   "excludes": [
>> "@timestamp"
>>   ]
>> },
>> "properties": {
>>   "@timestamp": {
>> "type": "date",
>> "index": "no",
>> "format": "dateOptionalTime"
>>   },
>>   "date": {
>> "type": "date",
>> "format": "date_time"
>>   },
>>   "geoip": {
>> "type": "geo_point"
>>   },
>>   "host": {
>> "type": "string",
>> "index": "not_analyzed"
>>   },
>>   "ip": {
>> "type": "ip"
>>   },
>>   "prefered-language": {
>> "type": "string"
>>   },
>>   "referer": {
>> "type": "string",
>> "analyzer": "splitter"
>>   },
>>   "reverse_ip": {
>> "type": "string"
>>   },
>>   "session_id": {
>> "type": "string",
>> "index": "not_analyzed"
>>   },
>>   "ua_build": {
>> "type": "short"
>>   },
>>   "ua_device": {
>> "type": "string"
>>   },
>>   "ua_major": {
>> "type": "short"
>>   },
>>   "ua_minor": {
>> "type": "short"
>>   },
>>   "ua_name": {
>> "type": "string"
>>   },
>>   "ua_os": {
>> "type": "string"
>>   },
>>   "ua_os_major": {
>> "type": "short"
>>   },
>>   "ua_os_minor": {
>> "type": "short"
>>   },
>>   "ua_os_name": {
>> "type": "string"
>>   },
>>   "ua_patch": {
>> "type": "short"
>>   },
>>   "unique": {
>> "type": "boolean"
>>   },
>>   "uri": {
>> "type": "string"
>>   },
>>   "user-agent": {
>> "type": "string",
>> "analyzer": "splitter"
>>   },
>>   "valid": {
>> "type": "boolean"
>>   }
>> }
>>   }
>> }
>>   }
>> }
>
>
> Le mardi 18 mars 2014 12:41:19 UTC+1, Lee Hinman a écrit :
>>
>> On 3/17/14, 2:36 AM, Dunaeth wrote: 
>> > Hi, 
>> > 
>> > Due to the insert and search query frequency, it's nearly impossible to 
>> > get logs from specific queries. That said, logs attached are extracts 
>> of 
>> > the logs since the cluster restart and are most probably generated 
>> > during document inserts. 
>>
>> It looks like you have incredibly small segments for this index 
>> (tester), what does the data look like? Can you share your mappings for 
>> the index as well as example documents? 
>>
>> ;; Lee 
>>
>>
>>


Re: Detecting whether elasticsearch has finished indexing after adding documents

2014-03-18 Thread Andrew Coulton
David,

Thanks - that makes sense. Do you happen to know if the last_seq gets
updated once the document has been indexed, or just once it's been posted
into the elasticsearch queue?

Andrew


On 18 March 2014 12:47, David Pilato  wrote:

> The only thing I can think about is by get _seq document from couchdb
> river.
>
> curl -XGET 'localhost:9200/_river/my_db/_seq'
>
> And see if `last_seq` field has the same value as the couchdb _changes API
> has.
>
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 18 mars 2014 à 13:39:04, Andrew Coulton (and...@ingenerator.com) a
> écrit:
>
> Hi,
>
> Is there a way to check whether the index (and couchdb river) are up to
> date with changes after we add a batch of documents? We have a workflow
> that involves adding new content to the index and then running some reports
> against it, but obviously I need to wait to run the reports until all the
> changes are in the index.
>
> I've had a look at the various status request/responses, and a fair bit of
> googling around, but I can't see anything obvious - quite possible I've
> missed it though.
>
> Thanks,
>
> Andrew
>
>



-- 
Andrew Coulton
Founder
inGenerator Ltd

Follow us on Twitter @inGenerator
Phone us on 0131 510 0271

inGenerator Ltd is a company registered in Scotland (no SC435161) with its
registered office at 8 Craigleith Hill Row, Edinburgh EH4 2JX.



Field Queries in ElasticSearch 1.0.1

2014-03-18 Thread maximilian . brodhun
Hello all,

today I tried to upgrade our Elasticsearch server from version 
0.90.7 to 1.0.1.

Some queries don't work anymore.

Important for us are field queries, but it seems they are not included in 
1.0.1 or were renamed.

So my questions are:

Do field queries still exist in 1.0.1?
If not, what other possibilities are there to build a field query?
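If I understand the 1.0 breaking changes correctly (an assumption, not verified against the release notes here), the old `field` query was just a shorthand for a query_string on a single field, so it can be rewritten; a sketch with made-up field name and text:

```python
import json

# Old 0.90-style field query (no longer accepted):
old = {"field": {"title": "some phrase"}}

# Equivalent query_string form, pinning the field via default_field:
via_query_string = {
    "query_string": {"default_field": "title", "query": "some phrase"}
}

# Or, when no query-string syntax (AND/OR/wildcards) is needed,
# a plain match query is simpler and safer:
via_match = {"match": {"title": "some phrase"}}

print(json.dumps(via_query_string))
```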



Re: Append to existing field

2014-03-18 Thread James Massey
Thanks for the reply, this works fine for me. I think I had the script set 
up improperly, so it wouldn't point to an existing field and would make a 
new one instead. 



need help with aggregation and unique counted values

2014-03-18 Thread Andreas Hembach
Hi all,

i'm new here and have a problem with an query. I hope someone can help me.

My Problem:
- I have a log with user clicks, the user revenue and their session IDs. 
Now I want to build a date histogram with the total click count, the unique 
session IDs and the user revenue.

My Query:
{
   "query":{
  "match_all":{}
   },
   "aggs":{
  "log_over_time":{
 "date_histogram":{
"field":"dateline",
"interval":"month",
"format":"-MM"
 },
 "aggs":{
"amount":{
   "sum":{
  "field":"order_amount"
   }
},
"unique":{
   "terms":{
  "field":"user_session_id",
  "size":10
   }
}
 }
  }
   }
}

My first approach was to count the "unique" entries. But the response is 
very large and limited to 10 entries.

Is there a better way to do this? Can I do something like group by value?

A big thank you for the help!

Greetings,
Andreas



OpenSearch for elasticsearch.org docs?

2014-03-18 Thread Lukáš Vlček
Hi,

is there any plan to provide OpenSearch API for Elasticsearch docs soon?

It would be really nice if I could search the docs directly from the browser
URL bar. Many browsers support that and it is a quick shortcut to the docs: once
you start typing "elast" into the URL field it will give you a hint that you
can hit TAB to search elasticsearch.org; if you hit TAB it will
allow you to directly input a search term and send it to the search API of the
docs. Search suggestions are also fully supported in many browsers.

Just wanted to ping the ML before opening ticket for this.

Regards,
Lukas



Exist filter not giving consistent results in version 0.90.6

2014-03-18 Thread Narinder Kaur
Hi All,

   I need to make a filter query to find how many documents there are 
with a particular field present, and it's giving me different results every 
time I make the request to the Elasticsearch server. The filter used is:

{
"filter":
  {
  "exists":
{
  "field":"approved"
} 
  }
}

The query is run on a single _type, with approx. 200K documents in it. 
The issue occurs on the production servers only; locally it is working 
fine. In production we have two servers in the Elasticsearch cluster. 

Does anyone have an idea why this is happening? Is it a bug or something 
else that I need to understand? Please help.


Thanks 



Suppressing the content in Cluster response

2014-03-18 Thread prashy
Hi ES Users,

I have indexed a large sample of data into ES. Now I am running a clustering
query to categorize the data on the basis of "Content". But I don't want the
"Content" field to be returned in the ES response, i.e. I want only the cluster
labels and IDs. Is there any way around this scenario?

For ex: 
If my query is :
{
  "search_request": {
"fields": [
  "Content"
],
"query": {
  "match": {
"Content": "*mobile*"
  }
}
  },
  "query_hint": "mobile",
  "field_mapping": {
"content": [
  "fields.Content"
]
  },
  "algorithm": "lingo3g"
}


In this case I will get a response that contains
"fields->Content" as well as the cluster labels and IDs.

So, is there a way I can suppress "Content" in the response and still perform the
categorization?






Re: Append to existing field

2014-03-18 Thread Zachary Tong
You can use the Update API with a script to append to the array:

curl -XPOST "http://localhost:9200/t/t/1" -d'
{
"hobbies" : ["a", "b"]
}'


curl -XPOST "http://localhost:9200/t/t/1/_update" -d'
{
"script" : "ctx._source.hobbies += hobby",
"params" : {
"hobby" : "c"
}
}'


curl -XGET "http://localhost:9200/t/t/1"


You can even append an array to the array:

curl -XPOST "http://localhost:9200/t/t/1/_update" -d'
{
"script" : "ctx._source.hobbies += hobby",
"params" : {
"hobby" : ["d", "e"]
}
}'
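To illustrate the semantics of the update, here is the same request body alongside a plain-Python rendering of what the script effectively does to the stored source (the real script runs server-side in the scripting engine; this is only a sketch of the behavior):

```python
# The body sent to the _update endpoint:
update_body = {
    "script": "ctx._source.hobbies += hobby",
    "params": {"hobby": "c"},
}

# What the script effectively does to the stored _source:
source = {"hobbies": ["a", "b"]}
hobby = update_body["params"]["hobby"]
# Appending a list extends the array; appending a scalar adds one element.
source["hobbies"] = source["hobbies"] + (
    hobby if isinstance(hobby, list) else [hobby]
)

print(source["hobbies"])  # → ['a', 'b', 'c']
```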


-Zach






On Tuesday, March 18, 2014 8:46:27 AM UTC-4, James Massey wrote:
>
> In looking at the Elasticsearch documentation, specifically the Update 
> API, it doesn't appear that Elasticsearch has the functionality to append 
> to an array, it can "update" it with new values, but not preserve the 
> existing values. Am I correct in saying this? 
>
> For an example of what I'm trying to figure out, I borrowed an example 
> from Exploring Elasticsearch. If I have data that looks like the following:
>
>  "_id": 1,
>   "handle": "ron",
>   "age": 28,
>   "hobbies": ["hacking", "the great outdoors"],
>   "computer": {"cpu": "pentium pro", "mhz": 200}
>
>
> Is there any way to add "cooking" to the list of hobbies without deleting 
> "hacking" and "the great outdoors" within Elasticsearch itself? 
>
>



Re: OutOfMemoryError OOM while indexing Documents

2014-03-18 Thread Zachary Tong
Thanks for the rest of the info, that helps rule out a couple of 
possibilities.  Unfortunately, I was hoping you had fiddled with the merge 
settings and it was causing problems...but it looks like everything is 
default (which is good!).  Back to the drawing board.

Would it be possible to get a heapdump and store it somewhere I can access 
it?  At this point, I think that is our best chance of debugging this 
problem.


Which size do you prefer as bulk size and how many threads can/should 
> process bulks at the same time? As you can see in _nodes.txt we have 12 
> available processors...
> Should we may slow down bulk loader with adding a wait of a few seconds?


Bulks tend to be most efficient around 5-15mb in size.  For your machine, I 
would start with 12 concurrent threads and slowly increase from there.  If 
you start running into rejections from ES, that's the point where you stop 
increasing threads because you've filled the bulk queue with pending tasks 
and ES cannot keep up anymore.  With rejections, the general pattern is to 
wait a random time (1-5s) and then retry all the rejected actions in a new, 
smaller bulk.
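The retry pattern described above can be sketched as follows; the `send` callable is a stand-in for whatever bulk helper you use (it should submit a list of actions and return the subset that was rejected), so nothing here is specific to a particular client library:

```python
import random
import time

def bulk_with_retry(send, actions, max_retries=3, sleep=time.sleep):
    """Submit bulk actions, retrying rejected ones in new, smaller bulks.

    `send` takes a list of actions and returns the rejected subset.
    Returns the actions still rejected after all retries.
    """
    pending = list(actions)
    for attempt in range(max_retries + 1):
        if not pending:
            break
        pending = list(send(pending))
        if pending and attempt < max_retries:
            # Back off 1-5s before retrying, as suggested above.
            sleep(random.uniform(1, 5))
    return pending
```

The `sleep` parameter exists only so the pause can be stubbed out in tests; in production the default `time.sleep` applies.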

-Zach



On Tuesday, March 18, 2014 8:44:00 AM UTC-4, Alexander Ott wrote:
>
> Attached the elasticsearch.yml and the curl -XGET 'localhost:9200/_nodes/'
> We have 5 shard per index and we have not enabled any codecs.
>
> Which size do you prefer as bulk size and how many threads can/should 
> process bulks at the same time? As you can see in _nodes.txt we have 12 
> available processors...
> Should we may slow down bulk loader with adding a wait of a few seconds?
>
> Am Dienstag, 18. März 2014 13:22:57 UTC+1 schrieb Zachary Tong:
>>
>> My observations from your Node Stats
>>
>>- Your node tends to have around 20-25 merges happening at any given 
>>time.  The default max is 10...have you changed any of the merge policy 
>>settings?  Can you attach your elasticsearch.yml?
>>- At one point, your segments were using 24gb of the heap (due to 
>>associated memory structures like bloom filter, etc).  How many primary 
>>shards are in your index?
>>- Your bulks look mostly ok, but you are getting rejections.  I'd 
>>slow the bulk loader down a little bit (rejections mean ES is overloaded)
>>
>> If you can take a heap dump, I would be willing to load it up and look 
>> through the allocated objects.  That would be the fastest way to identify 
>> what is eating your heap and start to work on why.  To take a heap dump run 
>> this, zip it up and save somewhere:  jmap -dump:format=b,file=dump.bin <pid>
>> 
>>
>> As an aside, it's hard to help debug when you don't answer all of the 
>> questions I've asked :P
>>
>> Unanswered questions from upthread:
>>
>>- Have you enabled any codecs and/or changed the `posting_format` of 
>>any fields in your document?
>>- curl -XGET 'localhost:9200/_nodes/'
>>
>>
>> Hope we can get this sorted for you soon!
>> -Zach
>>
>>
>> On Tuesday, March 18, 2014 5:29:40 AM UTC-4, Alexander Ott wrote:
>>>
>>> Attached the captured node stats and again the newest es_log.
>>> I changed the garbage collector from UseParNewGC to UseG1GC with the 
>>> result that the OutOfMemoryError doesn't occur. But as you can see in the 
>>> attached es_log file the warnings of monitor.jvm are still present.
>>>
>>>
>>> Am Montag, 17. März 2014 14:32:29 UTC+1 schrieb Zachary Tong:

 Ah, sorry, I misread your JVM stats dump (thought it was one long list, 
 instead of multiple calls to the same API).  With a single node cluster, 
 20 
 concurrent bulks may be too many.  Bulk requests have to sit in memory 
 while they are waiting to be processed, so it is possible to eat up your 
 heap with many pending bulk requests just hanging out, especially if they 
 are very large.  I'll know more once I can see the Node Stats output.

 More questions! :)

- How big are your documents on average?
- Have you enabled any codecs and/or changed the `posting_format` 
of any fields in your document?
- Are you using warmers?




 On Monday, March 17, 2014 8:36:04 AM UTC-4, Alexander Ott wrote:
>
> At the moment I can provide only the jvm stats ... I will capture the 
> other stats as soon as possible.
>
> We use 5-20 threads which will process bulks with a max size of 100 
> entries.
> We only use one node/machine for development so we have no cluster 
> for development...
> The machine has 64gb RAM and we increased the heap from 16gb to 32gb...
>  
>
> Am Montag, 17. März 2014 12:21:09 UTC+1 schrieb Zachary Tong:
>>
>> Can you attach the full Node Stats and Node Info output?  There were 
>> other stats/metrics that I wanted to check (such as field data, bulk 
>> queue/size, etc).
>>
>>- How large (physically, in kb/mb) are your bulk indexing 
>>requests?  Bulks should be 5-15mb in size

Re: Analyzer is closed - ERROR

2014-03-18 Thread Itamar Syn-Hershko
This could be a bug in the percolator then - I'd open an issue on github
with a minimal repro

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Mar 18, 2014 at 2:41 PM, Tomasz Romanczuk wrote:

> I don't have any custom code. My analyzer uses only a tokenizer (*whitespaces
> + 3 special characters: ( ) -*) and hunspell for the Danish language. Everything
> I do is in my previous post.
>
> W dniu wtorek, 18 marca 2014 13:29:04 UTC+1 użytkownik Itamar Syn-Hershko
> napisał:
>>
>> Did you write the analyzer that gets run on the server, or are you simply
>> assembling an analysis chain from client without any custom coding on the
>> server side?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>>
>> On Tue, Mar 18, 2014 at 2:18 PM, Tomasz Romanczuk wrote:
>>
>>> It's quite a simple class:
>>> List<String> filterNames = Lists.newArrayList();
>>> builder.startObject(FILTER);
>>> filterNames.add(FILTER_NAME_1);
>>> builder.startObject(FILTER_NAME_1);
>>> builder.field("type", "word_delimiter");
>>> builder.array("type_table", );
>>> builder.endObject();
>>>
>>> filterNames.add(FILTER_NAME_2);
>>> builder.startObject(FILTER_NAME_2);
>>> builder.field("type", "hunspell");
>>> builder.field("ignoreCase", "false");
>>> builder.field("locale", "da_DK");
>>> builder.endObject();
>>>
>>> builder.endObject();
>>>
>>> builder.startObject("analyzer");
>>> builder.startObject(NAME);
>>> builder.field("type", "custom");
>>> builder.field("tokenizer", "whitespace");
>>> builder.array(FILTER, filterNames.toArray(new
>>> String[filterNames.size()]));
>>>
>>> builder.endObject();
>>> builder.endObject();
>>>
>>> What can be faulty? It properly analyses text. The problem occurs only when
>>> I restart the module and try to refresh the index settings (i.e. change the
>>> dictionary language).
>>>
>>> W dniu wtorek, 18 marca 2014 12:51:28 UTC+1 użytkownik Itamar
>>> Syn-Hershko napisał:

 Your analyzer implementation is probably faulty. Lucene 4.6 started
 being more strict about analyzers lifecycle - I suggest you try it locally
 with plain Lucene code to first verify its implementation follows the life
 cycle rules.

 Reference: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/TokenStream.html

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko 
 Freelance Developer & Consultant
 Author of RavenDB in Action 


 On Tue, Mar 18, 2014 at 1:30 PM, Tomasz Romanczuk 
 wrote:

>  After starting the node I try to refresh the index settings (i.e. change the
> analyzer), but something goes wrong and I get an error:
> 2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing]
> [alerts_node] [_percolator][0] post listener [org.elasticsearch.index.
> percolator.PercolatorService$RealTimePercolat
> orOperationListener@702f2591] failed
> org.elasticsearch.ElasticSearchException: failed to parse query [316]
> at org.elasticsearch.index.percolator.PercolatorExecutor.
> parseQuery(PercolatorExecutor.java:361)
> at org.elasticsearch.index.percolator.PercolatorExecutor.
> addQuery(PercolatorExecutor.java:332)
> at org.elasticsearch.index.percolator.PercolatorService$
> RealTimePercolatorOperationListener.postIndexUnderLock(PercolatorS
> ervice.java:295)
> at org.elasticsearch.index.indexing.ShardIndexingService.
> postIndexUnderLock(ShardIndexingService.java:140)
> at org.elasticsearch.index.engine.robin.RobinEngine.
> innerIndex(RobinEngine.java:594)
> at org.elasticsearch.index.engine.robin.RobinEngine.index(
> RobinEngine.java:492)
> at org.elasticsearch.index.shard.service.InternalIndexShard.
> performRecoveryOperation(InternalIndexShard.java:703)
> at org.elasticsearch.index.gateway.local.
> LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
> at org.elasticsearch.index.gateway.IndexShardGatewayService$1.
> run(IndexShardGatewayService.java:174)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
> ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)

Re: Detecting whether elasticsearch has finished indexing after adding documents

2014-03-18 Thread David Pilato
The only thing I can think of is to get the _seq document from the couchdb river.

curl -XGET 'localhost:9200/_river/my_db/_seq'
And see if the `last_seq` field has the same value as the couchdb _changes API reports.
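That comparison can be wrapped in a small client-side poll loop. A sketch in plain Java, where the sequence supplier is a hypothetical wrapper around the `_river/.../_seq` call and `targetSeq` is the value read from couchdb's `_changes` API:

```java
import java.util.function.LongSupplier;

public class RiverCatchUp {

    /**
     * Polls until the river's last_seq reaches the couchdb target sequence,
     * or the timeout elapses. Returns true if the river caught up in time.
     */
    public static boolean waitForCatchUp(LongSupplier riverLastSeq, long targetSeq,
                                         long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (riverLastSeq.getAsLong() < targetSeq) {
            if (System.currentTimeMillis() >= deadline) {
                return false;   // river still behind; reports would see stale data
            }
            Thread.sleep(pollMillis);
        }
        return true;            // safe to run the reports now
    }
}
```

The report job would call this before running, so it blocks until the river has consumed the batch (or gives up after the timeout).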


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 mars 2014 à 13:39:04, Andrew Coulton (and...@ingenerator.com) a écrit:

Hi,

Is there a way to check whether the index (and couchdb river) are up to date 
with changes after we add a batch of documents? We have a workflow that 
involves adding new content to the index and then running some reports against 
it, but obviously I need to wait to run the reports until all the changes are 
in the index.

I've had a look at the various status request/responses, and a fair bit of 
googling around, but I can't see anything obvious - quite possible I've missed 
it though.

Thanks,

Andrew
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8074a203-24b5-4c45-82a6-53b012997920%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Append to existing field

2014-03-18 Thread James Massey
Looking at the Elasticsearch documentation, specifically the Update API, it 
doesn't appear that Elasticsearch has the functionality to append to an 
array: it can "update" it with new values, but not preserve the existing 
values. Am I correct in saying this? 

For an example of what I'm trying to figure out, I borrowed an example from 
Exploring Elasticsearch. If I have data that looks like the following:

 "_id": 1,
  "handle": "ron",
  "age": 28,
  "hobbies": ["hacking", "the great outdoors"],
  "computer": {"cpu": "pentium pro", "mhz": 200}


Is there any way to add "cooking" to the list of hobbies without deleting 
"hacking" and "the great outdoors" within Elasticsearch itself? 



Re: Analyzer is closed - ERROR

2014-03-18 Thread Tomasz Romanczuk
I don't have any custom code. My analyzer uses only a tokenizer (*whitespaces 
+ 3 special characters: ( ) -*) and hunspell for the Danish language. Everything 
I do is in my previous post.

W dniu wtorek, 18 marca 2014 13:29:04 UTC+1 użytkownik Itamar Syn-Hershko 
napisał:
>
> Did you write the analyzer that gets run on the server, or are you simply 
> assembling an analysis chain from client without any custom coding on the 
> server side?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Tue, Mar 18, 2014 at 2:18 PM, Tomasz Romanczuk 
> 
> > wrote:
>
>> It's quite a simple class:
>> List<String> filterNames = Lists.newArrayList();
>> builder.startObject(FILTER);
>> filterNames.add(FILTER_NAME_1);
>> builder.startObject(FILTER_NAME_1);
>> builder.field("type", "word_delimiter");
>> builder.array("type_table", );
>> builder.endObject();
>>
>> filterNames.add(FILTER_NAME_2);
>> builder.startObject(FILTER_NAME_2);
>> builder.field("type", "hunspell");
>> builder.field("ignoreCase", "false");
>> builder.field("locale", "da_DK");
>> builder.endObject();
>>
>> builder.endObject();
>>
>> builder.startObject("analyzer");
>> builder.startObject(NAME);
>> builder.field("type", "custom");
>> builder.field("tokenizer", "whitespace");
>> builder.array(FILTER, filterNames.toArray(new 
>> String[filterNames.size()]));
>>
>> builder.endObject();
>> builder.endObject();
>>
>> What can be faulty? It properly analyses text. The problem occurs only when 
>> I restart the module and try to refresh the index settings (i.e. change the 
>> dictionary language).
>>
>> W dniu wtorek, 18 marca 2014 12:51:28 UTC+1 użytkownik Itamar Syn-Hershko 
>> napisał:
>>>
>>> Your analyzer implementation is probably faulty. Lucene 4.6 started 
>>> being more strict about analyzers lifecycle - I suggest you try it locally 
>>> with plain Lucene code to first verify its implementation follows the life 
>>> cycle rules.
>>>
>>> Reference: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/TokenStream.html
>>>  
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko 
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action 
>>>
>>>
>>> On Tue, Mar 18, 2014 at 1:30 PM, Tomasz Romanczuk wrote:
>>>
  After starting the node I try to refresh the index settings (i.e. change the 
 analyzer), but something goes wrong and I get an error:
 2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing] 
 [alerts_node] [_percolator][0] post listener [org.elasticsearch.index.
 percolator.PercolatorService$RealTimePercolat
 orOperationListener@702f2591] failed
 org.elasticsearch.ElasticSearchException: failed to parse query [316]
 at org.elasticsearch.index.percolator.PercolatorExecutor.
 parseQuery(PercolatorExecutor.java:361)
 at org.elasticsearch.index.percolator.PercolatorExecutor.
 addQuery(PercolatorExecutor.java:332)
 at org.elasticsearch.index.percolator.PercolatorService$
 RealTimePercolatorOperationListener.postIndexUnderLock(
 PercolatorService.java:295)
 at org.elasticsearch.index.indexing.ShardIndexingService.
 postIndexUnderLock(ShardIndexingService.java:140)
 at org.elasticsearch.index.engine.robin.RobinEngine.
 innerIndex(RobinEngine.java:594)
 at org.elasticsearch.index.engine.robin.RobinEngine.
 index(RobinEngine.java:492)
 at org.elasticsearch.index.shard.service.InternalIndexShard.
 performRecoveryOperation(InternalIndexShard.java:703)
 at org.elasticsearch.index.gateway.local.
 LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
 at org.elasticsearch.index.gateway.IndexShardGatewayService$1.
 run(IndexShardGatewayService.java:174)
 at java.util.concurrent.ThreadPoolExecutor$Worker.
 runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(
 ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: org.apache.lucene.store.AlreadyClosedException: this 
 Analyzer is closed
 at org.apache.lucene.analysis.Analyzer$ReuseStrategy.
 getStoredValue(Analyzer.java:368)
 at org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.
 getReusableComponents(Analyzer.java:410)
 at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.
 java:173)

Detecting whether elasticsearch has finished indexing after adding documents

2014-03-18 Thread Andrew Coulton
Hi,

Is there a way to check whether the index (and couchdb river) are up to 
date with changes after we add a batch of documents? We have a workflow 
that involves adding new content to the index and then running some reports 
against it, but obviously I need to wait to run the reports until all the 
changes are in the index.

I've had a look at the various status request/responses, and a fair bit of 
googling around, but I can't see anything obvious - quite possible I've 
missed it though.

Thanks,

Andrew



Re: Analyzer is closed - ERROR

2014-03-18 Thread Itamar Syn-Hershko
Did you write the analyzer that gets run on the server, or are you simply
assembling an analysis chain from client without any custom coding on the
server side?

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Mar 18, 2014 at 2:18 PM, Tomasz Romanczuk wrote:

> It's quite a simple class:
> List<String> filterNames = Lists.newArrayList();
> builder.startObject(FILTER);
> filterNames.add(FILTER_NAME_1);
> builder.startObject(FILTER_NAME_1);
> builder.field("type", "word_delimiter");
> builder.array("type_table", );
> builder.endObject();
>
> filterNames.add(FILTER_NAME_2);
> builder.startObject(FILTER_NAME_2);
> builder.field("type", "hunspell");
> builder.field("ignoreCase", "false");
> builder.field("locale", "da_DK");
> builder.endObject();
>
> builder.endObject();
>
> builder.startObject("analyzer");
> builder.startObject(NAME);
> builder.field("type", "custom");
> builder.field("tokenizer", "whitespace");
> builder.array(FILTER, filterNames.toArray(new
> String[filterNames.size()]));
>
> builder.endObject();
> builder.endObject();
>
> What can be faulty? It properly analyses text. The problem occurs only when I
> restart the module and try to refresh the index settings (i.e. change the
> dictionary language).
>
> W dniu wtorek, 18 marca 2014 12:51:28 UTC+1 użytkownik Itamar Syn-Hershko
> napisał:
>>
>> Your analyzer implementation is probably faulty. Lucene 4.6 started being
>> more strict about analyzers lifecycle - I suggest you try it locally with
>> plain Lucene code to first verify its implementation follows the life cycle
>> rules.
>>
>> Reference: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/TokenStream.html
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>>
>> On Tue, Mar 18, 2014 at 1:30 PM, Tomasz Romanczuk wrote:
>>
>>> After starting the node I try to refresh the index settings (i.e. change the
>>> analyzer), but something goes wrong and I get an error:
>>> 2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing]
>>> [alerts_node] [_percolator][0] post listener [org.elasticsearch.index.
>>> percolator.PercolatorService$RealTimePercolat
>>> orOperationListener@702f2591] failed
>>> org.elasticsearch.ElasticSearchException: failed to parse query [316]
>>> at org.elasticsearch.index.percolator.PercolatorExecutor.
>>> parseQuery(PercolatorExecutor.java:361)
>>> at org.elasticsearch.index.percolator.PercolatorExecutor.
>>> addQuery(PercolatorExecutor.java:332)
>>> at org.elasticsearch.index.percolator.PercolatorService$
>>> RealTimePercolatorOperationListener.postIndexUnderLock(
>>> PercolatorService.java:295)
>>> at org.elasticsearch.index.indexing.ShardIndexingService.
>>> postIndexUnderLock(ShardIndexingService.java:140)
>>> at org.elasticsearch.index.engine.robin.RobinEngine.
>>> innerIndex(RobinEngine.java:594)
>>> at org.elasticsearch.index.engine.robin.RobinEngine.
>>> index(RobinEngine.java:492)
>>> at org.elasticsearch.index.shard.service.InternalIndexShard.
>>> performRecoveryOperation(InternalIndexShard.java:703)
>>> at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.
>>> recover(LocalIndexShardGateway.java:224)
>>> at org.elasticsearch.index.gateway.IndexShardGatewayService$1.
>>> run(IndexShardGatewayService.java:174)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.
>>> runTask(ThreadPoolExecutor.java:886)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:908)
>>> at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.lucene.store.AlreadyClosedException: this
>>> Analyzer is closed
>>> at org.apache.lucene.analysis.Analyzer$ReuseStrategy.
>>> getStoredValue(Analyzer.java:368)
>>> at org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.
>>> getReusableComponents(Analyzer.java:410)
>>> at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.
>>> java:173)
>>> at org.elasticsearch.index.search.MatchQuery.parse(
>>> MatchQuery.java:203)
>>> at org.elasticsearch.index.query.MatchQueryParser.parse(
>>> MatchQueryParser.java:163)
>>> at org.elasticsearch.index.query.QueryParseContext.
>>> parseInnerQuery(QueryParseContext.java:207)
>>> at org.elasticsearch.index.query.BoolQueryParser.parse(
>>> BoolQueryParser.java:107)
>>>  

Re: Hiding common fields in ES response

2014-03-18 Thread prashy
Is there any workaround for this case?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Hiding-common-fields-in-ES-response-tp4052073p4052099.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: OutOfMemoryError OOM while indexing Documents

2014-03-18 Thread Zachary Tong
My observations from your Node Stats

   - Your node tends to have around 20-25 merges happening at any given 
   time.  The default max is 10...have you changed any of the merge policy 
   settings?  Can you attach your elasticsearch.yml?
   - At one point, your segments were using 24gb of the heap (due to 
   associated memory structures like bloom filter, etc).  How many primary 
   shards are in your index?
   - Your bulks look mostly ok, but you are getting rejections.  I'd slow 
   the bulk loader down a little bit (rejections mean ES is overloaded)
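The 5-15mb guidance in the last bullet implies batching by payload size rather than by a fixed document count. A plain-Java sketch of such a buffer (the `flush` callback is a hypothetical stand-in for submitting the bulk to ES):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SizedBulkBuffer {
    private final long maxBytes;
    private final Consumer<List<String>> flush;   // hypothetical bulk submitter
    private final List<String> buffer = new ArrayList<>();
    private long bytes = 0;

    public SizedBulkBuffer(long maxBytes, Consumer<List<String>> flush) {
        this.maxBytes = maxBytes;
        this.flush = flush;
    }

    /** Adds one action line; flushes once the accumulated payload reaches maxBytes. */
    public void add(String actionJson) {
        buffer.add(actionJson);
        bytes += actionJson.getBytes(StandardCharsets.UTF_8).length;
        if (bytes >= maxBytes) {
            flushNow();
        }
    }

    /** Flushes whatever is buffered, e.g. at the end of the load. */
    public void flushNow() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
            bytes = 0;
        }
    }
}
```

With `maxBytes` around 5-15mb, each flushed batch lands in the efficient range regardless of how large individual documents are.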

If you can take a heap dump, I would be willing to load it up and look 
through the allocated objects.  That would be the fastest way to identify 
what is eating your heap and start to work on why.  To take a heap dump run 
this, zip it up and save somewhere:  jmap -dump:format=b,file=dump.bin <pid>


As an aside, it's hard to help debug when you don't answer all of the 
questions I've asked :P

Unanswered questions from upthread:

   - Have you enabled any codecs and/or changed the `posting_format` of any 
   fields in your document?
   - curl -XGET 'localhost:9200/_nodes/'


Hope we can get this sorted for you soon!
-Zach


On Tuesday, March 18, 2014 5:29:40 AM UTC-4, Alexander Ott wrote:
>
> Attached the captured node stats and again the newest es_log.
> I changed the garbage collector from UseParNewGC to UseG1GC with the 
> result that the OutOfMemoryError doesn't occur. But as you can see in the 
> attached es_log file the warnings of monitor.jvm are still present.
>
>
> Am Montag, 17. März 2014 14:32:29 UTC+1 schrieb Zachary Tong:
>>
>> Ah, sorry, I misread your JVM stats dump (thought it was one long list, 
>> instead of multiple calls to the same API).  With a single node cluster, 20 
>> concurrent bulks may be too many.  Bulk requests have to sit in memory 
>> while they are waiting to be processed, so it is possible to eat up your 
>> heap with many pending bulk requests just hanging out, especially if they 
>> are very large.  I'll know more once I can see the Node Stats output.
>>
>> More questions! :)
>>
>>- How big are your documents on average?
>>- Have you enabled any codecs and/or changed the `posting_format` of 
>>any fields in your document?
>>- Are you using warmers?
>>
>>
>>
>>
>> On Monday, March 17, 2014 8:36:04 AM UTC-4, Alexander Ott wrote:
>>>
>>> At the moment I can provide only the jvm stats ... I will capture the 
>>> other stats as soon as possible.
>>>
>>> We use 5-20 threads which will process bulks with a max size of 100 
>>> entries.
>>> We only use one node/machine for development so we have no cluster for 
>>> development...
>>> The machine has 64gb RAM and we increased the heap from 16gb to 32gb...
>>>  
>>>
>>> Am Montag, 17. März 2014 12:21:09 UTC+1 schrieb Zachary Tong:

 Can you attach the full Node Stats and Node Info output?  There were 
 other stats/metrics that I wanted to check (such as field data, bulk 
 queue/size, etc).

- How large (physically, in kb/mb) are your bulk indexing requests? 
 Bulks should be 5-15mb in size
- How many concurrent bulks are you performing?  Given you cluster 
size, a good number should probably be around 20-30
- Are you distributing bulks evenly across the cluster?
- I see that your heap is 32gb.  How big are these machines?


 -Zach



 On Monday, March 17, 2014 5:33:30 AM UTC-4, Alexander Ott wrote:
>
> Hi,
>
> attatched you can find the es_log and the captured node jvm stats.
> We are only indexing at this time and we use bulk requests.
>
> As you can see at log entry "2014-03-14 21:18:59,873" in es_log... at 
> this time our indexing process finished and afterwards the OOM occurs...
>
>
> Am Freitag, 14. März 2014 14:47:14 UTC+1 schrieb Zachary Tong:
>>
>> Are you running searches at the same time, or only indexing?  Are you 
>> bulk indexing?  How big (in physical kb/mb) are your bulk requests?
>>
>> Can you attach the output of these APIs (preferably during memory 
>> buildup but before the OOM):
>>
>>- curl -XGET 'localhost:9200/_nodes/'
>>- curl -XGET 'localhost:9200/_nodes/stats'
>>
>> I would recommend downgrading your JVM to Java 1.7.0_u25.  There are 
>> known sigsegv bugs in the most recent versions of the JVM which have not 
>> been fixed yet.  It should be unrelated to your problem, but best to 
>> rule 
>> the JVM out.
>>
>> I would not touch any of those configs.  In general, when debugging 
>> problems it is best to restore as many of the configs to their default 
>> settings as possible.
>>
>>
>>
>>
>>
>>
>> On Friday, March 14, 2014 5:46:12 AM UTC-4, Alexander Ott wrote:
>>>
>>> Hi,
>>>
>>> we always run into an OutOfMemoryError while indexing documents or 
>>> shortly afterwards

Re: Analyzer is closed - ERROR

2014-03-18 Thread Tomasz Romanczuk
It's quite a simple class:
List<String> filterNames = Lists.newArrayList();
builder.startObject(FILTER);
filterNames.add(FILTER_NAME_1);
builder.startObject(FILTER_NAME_1);
builder.field("type", "word_delimiter");
builder.array("type_table", );
builder.endObject();

filterNames.add(FILTER_NAME_2);
builder.startObject(FILTER_NAME_2);
builder.field("type", "hunspell");
builder.field("ignoreCase", "false");
builder.field("locale", "da_DK");
builder.endObject();

builder.endObject();

builder.startObject("analyzer");
builder.startObject(NAME);
builder.field("type", "custom");
builder.field("tokenizer", "whitespace");
builder.array(FILTER, filterNames.toArray(new 
String[filterNames.size()]));

builder.endObject();
builder.endObject();

What can be faulty? It properly analyses text. The problem occurs only when I 
restart the module and try to refresh the index settings (i.e. change the 
dictionary language).

W dniu wtorek, 18 marca 2014 12:51:28 UTC+1 użytkownik Itamar Syn-Hershko 
napisał:
>
> Your analyzer implementation is probably faulty. Lucene 4.6 started being 
> more strict about analyzers lifecycle - I suggest you try it locally with 
> plain Lucene code to first verify its implementation follows the life cycle 
> rules.
>
> Reference: 
> http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/TokenStream.html
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Tue, Mar 18, 2014 at 1:30 PM, Tomasz Romanczuk 
> 
> > wrote:
>
>> After starting the node I try to refresh the index settings (i.e. change the 
>> analyzer), but something goes wrong and I get an error:
>> 2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing] 
>> [alerts_node] [_percolator][0] post listener 
>> [org.elasticsearch.index.percolator.PercolatorService$RealTimePercolat
>> orOperationListener@702f2591] failed
>> org.elasticsearch.ElasticSearchException: failed to parse query [316]
>> at 
>> org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:361)
>> at 
>> org.elasticsearch.index.percolator.PercolatorExecutor.addQuery(PercolatorExecutor.java:332)
>> at 
>> org.elasticsearch.index.percolator.PercolatorService$RealTimePercolatorOperationListener.postIndexUnderLock(PercolatorService.java:295)
>> at 
>> org.elasticsearch.index.indexing.ShardIndexingService.postIndexUnderLock(ShardIndexingService.java:140)
>> at 
>> org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:594)
>> at 
>> org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:492)
>> at 
>> org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:703)
>> at 
>> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
>> at 
>> org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:619)
>> Caused by: org.apache.lucene.store.AlreadyClosedException: this Analyzer 
>> is closed
>> at 
>> org.apache.lucene.analysis.Analyzer$ReuseStrategy.getStoredValue(Analyzer.java:368)
>> at 
>> org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.getReusableComponents(Analyzer.java:410)
>> at 
>> org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:173)
>> at 
>> org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:203)
>> at 
>> org.elasticsearch.index.query.MatchQueryParser.parse(MatchQueryParser.java:163)
>> at 
>> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
>> at 
>> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
>> at 
>> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
>> at 
>> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
>> at 
>> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
>> at 
>> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
>> at 
>> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
>>

Re: fielddata breaker question

2014-03-18 Thread Dunaeth
Actually, tester is a dedicated percolator index with 5 percolation queries 
stored and no other data. Percolated documents are web logs and the tester 
mapping is:

{
>   "tester": {
> "mappings": {
>   ".percolator": {
> "_id": {
>   "index": "not_analyzed"
> },
> "properties": {
>   "query": {
> "type": "object",
> "enabled": false
>   }
> }
>   },
>   "test_hit": {
> "dynamic_templates": [
>   {
> "template1": {
>   "mapping": {
> "type": "integer"
>   },
>   "match": "*_id"
> }
>   }
> ],
> "_timestamp": {
>   "enabled": true,
>   "path": "date",
>   "format": "date_time"
> },
> "_source": {
>   "excludes": [
> "@timestamp"
>   ]
> },
> "properties": {
>   "@timestamp": {
> "type": "date",
> "index": "no",
> "format": "dateOptionalTime"
>   },
>   "date": {
> "type": "date",
> "format": "date_time"
>   },
>   "geoip": {
> "type": "geo_point"
>   },
>   "host": {
> "type": "string",
> "index": "not_analyzed"
>   },
>   "ip": {
> "type": "ip"
>   },
>   "prefered-language": {
> "type": "string"
>   },
>   "referer": {
> "type": "string",
> "analyzer": "splitter"
>   },
>   "reverse_ip": {
> "type": "string"
>   },
>   "session_id": {
> "type": "string",
> "index": "not_analyzed"
>   },
>   "ua_build": {
> "type": "short"
>   },
>   "ua_device": {
> "type": "string"
>   },
>   "ua_major": {
> "type": "short"
>   },
>   "ua_minor": {
> "type": "short"
>   },
>   "ua_name": {
> "type": "string"
>   },
>   "ua_os": {
> "type": "string"
>   },
>   "ua_os_major": {
> "type": "short"
>   },
>   "ua_os_minor": {
> "type": "short"
>   },
>   "ua_os_name": {
> "type": "string"
>   },
>   "ua_patch": {
> "type": "short"
>   },
>   "unique": {
> "type": "boolean"
>   },
>   "uri": {
> "type": "string"
>   },
>   "user-agent": {
> "type": "string",
> "analyzer": "splitter"
>   },
>   "valid": {
> "type": "boolean"
>   }
> }
>   }
> }
>   }
> }


Le mardi 18 mars 2014 12:41:19 UTC+1, Lee Hinman a écrit :
>
> On 3/17/14, 2:36 AM, Dunaeth wrote: 
> > Hi, 
> > 
> > Due to the insert and search query frequency, it's nearly impossible to 
> > get logs from specific queries. That said, logs attached are extracts of 
> > the logs since the cluster restart and are most probably generated 
> > during document inserts. 
>
> It looks like you have incredibly small segments for this index 
> (tester), what does the data look like? Can you share your mappings for 
> the index as well as example documents? 
>
> ;; Lee 
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/57ed2777-ee88-47ac-8033-fa87234d1a64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Analyzer is closed - ERROR

2014-03-18 Thread Itamar Syn-Hershko
Your analyzer implementation is probably faulty. Lucene 4.6 started being
stricter about the analyzer lifecycle - I suggest you first try the analyzer
locally with plain Lucene code, to verify that its implementation follows the
lifecycle rules.

Reference:
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/TokenStream.html
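
For reference, the consumer workflow that page describes looks like this in
plain Lucene 4.6 code. This is a minimal sketch; "MyAnalyzer" is a placeholder
for your custom analyzer and is an assumption, as is the field name:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerLifecycleCheck {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new MyAnalyzer(); // placeholder for your analyzer
        TokenStream ts = analyzer.tokenStream("field",
                new StringReader("some test text"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                       // mandatory before incrementToken()
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();                         // consume end-of-stream state
        } finally {
            ts.close();                       // release resources
        }
        // Requesting a second stream must also work: stream reuse is where
        // faulty implementations typically break under Lucene 4.6.
        analyzer.tokenStream("field", new StringReader("second run")).close();
    }
}
```

If the second call to tokenStream() throws, the analyzer's reuse strategy is
the likely culprit.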

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Mar 18, 2014 at 1:30 PM, Tomasz Romanczuk wrote:

> After starting node I try to refresh index setting (i.e. change analyzer),
> but something goes wrong, I have an error:
> 2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing]
> [alerts_node] [_percolator][0] post listener
> [org.elasticsearch.index.percolator.PercolatorService$RealTimePercolat
> orOperationListener@702f2591] failed
> org.elasticsearch.ElasticSearchException: failed to parse query [316]
> at
> org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:361)
> at
> org.elasticsearch.index.percolator.PercolatorExecutor.addQuery(PercolatorExecutor.java:332)
> at
> org.elasticsearch.index.percolator.PercolatorService$RealTimePercolatorOperationListener.postIndexUnderLock(PercolatorService.java:295)
> at
> org.elasticsearch.index.indexing.ShardIndexingService.postIndexUnderLock(ShardIndexingService.java:140)
> at
> org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:594)
> at
> org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:492)
> at
> org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:703)
> at
> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
> at
> org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this Analyzer
> is closed
> at
> org.apache.lucene.analysis.Analyzer$ReuseStrategy.getStoredValue(Analyzer.java:368)
> at
> org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.getReusableComponents(Analyzer.java:410)
> at
> org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:173)
> at
> org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:203)
> at
> org.elasticsearch.index.query.MatchQueryParser.parse(MatchQueryParser.java:163)
> at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
> at
> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
> at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
> at
> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
> at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
> at
> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
> at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
> at
> org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:93)
> at
> org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
> at
> org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:284)
> at
> org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:255)
> at
> org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:350)
>
> My code:
> node = NodeBuilder.nodeBuilder().settings(builder).build();
> node.start();
> client = node.getClient();
> client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
> UpdateSettingsRequestBuilder builder =
> client.admin().indices().prepareUpdateSettings();
> builder.setIndices(INDEX_NAME);
> builder.setSettings(createSettings());
> builder.execute().actionGet();
> client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();
>
> private Builder createSettings() throws IOException {
> XContentBuilder builder =
> XContentFactory.jsonBuilder().startObject();
> builder.startObject("analysis");
> analyzer.appendSettings(builder);
> builder.endObject();
> builder.endObject();
> return
> ImmutableSettings.settingsBuilder().loadFromSource(builder.string());
> }
>
> where *analyzer* is a simple class which only adds a hunspell dictionary
> and my custom tokenizer.

Re: Java client usage in the presence of split-brain

2014-03-18 Thread joergpra...@gmail.com
In any case, you have to do something to prevent split brains: set minimum
master nodes, at least to 3.

Regarding the Java client, you can use the TransportClient for better
control over which nodes you connect to, but that will not solve your
problems. With minimum master nodes set to 3, you could attach the
TransportClient to these 3 master-eligible nodes. Moreover, you could use
"sniff mode", where every 5 secs the client can add/remove data nodes
automatically. But a TransportClient can not detect split brains, so in most
circumstances you will mess up your data. Also, the retry logic is not
intended to cover split-brain issues.
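
For the 4-node cluster described above, the corresponding elasticsearch.yml
entries would look roughly like this (a sketch, assuming all 4 nodes are
master-eligible; hostnames are placeholders, and unicast discovery is optional
but sidesteps the multicast problems discussed elsewhere in this digest):

```yaml
# Quorum for 4 master-eligible nodes: (4 / 2) + 1 = 3
discovery.zen.minimum_master_nodes: 3

# Prefer explicit unicast discovery over multicast
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1:9300", "node2:9300", "node3:9300", "node4:9300"]
```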

Jörg
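
A TransportClient in sniff mode, as described above, might be set up like this
(a sketch against the 0.90-era Java API; cluster name and hostnames are
placeholders):

```java
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class SniffingClient {
    public static void main(String[] args) {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster")      // placeholder
                .put("client.transport.sniff", true)    // poll cluster state to
                .build();                               // add/remove nodes
        TransportClient client = new TransportClient(settings);
        // Seed with the master-eligible nodes; sniffing discovers the rest.
        client.addTransportAddress(new InetSocketTransportAddress("node1", 9300))
              .addTransportAddress(new InetSocketTransportAddress("node2", 9300))
              .addTransportAddress(new InetSocketTransportAddress("node3", 9300));
        try {
            // ... use client ...
        } finally {
            client.close();
        }
    }
}
```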



On Tue, Mar 18, 2014 at 10:04 AM,  wrote:

> Hi,
>
> Given that I have a cluster with 4 nodes, currently experiencing a
> split-brain scenario so I have a fully functional cluster with 3 green
> nodes, a sick cluster with 1 red node.
> A java-client that talks to these 4 nodes (unicast zen discovery) will
> ~25% of the time query the sick node, and get an error back. I'd expect it
> to retry one of the healthy nodes, not just give up after an erroneous
> reply.
>
> Do I need to handle this myself, or is this something the java client
> should handle for me? I know there's some sort of retry logic in the
> transport layer..
>
> ES 0.90.7.
>
> --
> Magnus Haug
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5dca4946-c26f-42f3-98ac-53bc2b478fc3%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: fielddata breaker question

2014-03-18 Thread Lee Hinman
On 3/17/14, 2:36 AM, Dunaeth wrote:
> Hi,
> 
> Due to the insert and search query frequency, it's nearly impossible to
> get logs from specific queries. That said, logs attached are extracts of
> the logs since the cluster restart and are most probably generated
> during document inserts.

It looks like you have incredibly small segments for this index
(tester), what does the data look like? Can you share your mappings for
the index as well as example documents?
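
To inspect segment sizes, the Java API exposes the segments endpoint. A
sketch (class and method names are from the 0.90-era Java API and worth
double-checking against your client version; the same data is available over
REST at /tester/_segments?pretty):

```java
import org.elasticsearch.action.admin.indices.segments.IndexShardSegments;
import org.elasticsearch.action.admin.indices.segments.IndicesSegmentResponse;
import org.elasticsearch.action.admin.indices.segments.ShardSegments;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.engine.Segment;

public class SegmentReport {
    // Print name, doc count and size of every segment in the "tester" index.
    public static void printSegments(Client client) {
        IndicesSegmentResponse resp = client.admin().indices()
                .prepareSegments("tester").execute().actionGet();
        for (IndexShardSegments shard : resp.getIndices().get("tester")) {
            for (ShardSegments shardCopy : shard) {
                for (Segment seg : shardCopy.getSegments()) {
                    System.out.println(seg.getName()
                            + " docs=" + seg.getNumDocs()
                            + " sizeBytes=" + seg.getSizeInBytes());
                }
            }
        }
    }
}
```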

;; Lee




Analyzer is closed - ERROR

2014-03-18 Thread Tomasz Romanczuk
After starting the node I try to refresh the index settings (i.e. change the
analyzer), but something goes wrong and I get an error:
2014-03-18 12:02:40,810 WARN  [org.elasticsearch.index.indexing] 
[alerts_node] [_percolator][0] post listener 
[org.elasticsearch.index.percolator.PercolatorService$RealTimePercolat
orOperationListener@702f2591] failed
org.elasticsearch.ElasticSearchException: failed to parse query [316]
at 
org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:361)
at 
org.elasticsearch.index.percolator.PercolatorExecutor.addQuery(PercolatorExecutor.java:332)
at 
org.elasticsearch.index.percolator.PercolatorService$RealTimePercolatorOperationListener.postIndexUnderLock(PercolatorService.java:295)
at 
org.elasticsearch.index.indexing.ShardIndexingService.postIndexUnderLock(ShardIndexingService.java:140)
at 
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:594)
at 
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:492)
at 
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:703)
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:224)
at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.AlreadyClosedException: this Analyzer is 
closed
at 
org.apache.lucene.analysis.Analyzer$ReuseStrategy.getStoredValue(Analyzer.java:368)
at 
org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy.getReusableComponents(Analyzer.java:410)
at 
org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:173)
at 
org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:203)
at 
org.elasticsearch.index.query.MatchQueryParser.parse(MatchQueryParser.java:163)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:107)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:93)
at 
org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:207)
at 
org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:284)
at 
org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:255)
at 
org.elasticsearch.index.percolator.PercolatorExecutor.parseQuery(PercolatorExecutor.java:350)

My code:
node = NodeBuilder.nodeBuilder().settings(builder).build();
node.start();
client = node.getClient();
client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
UpdateSettingsRequestBuilder builder = 
client.admin().indices().prepareUpdateSettings();
builder.setIndices(INDEX_NAME);
builder.setSettings(createSettings());
builder.execute().actionGet();
client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();

private Builder createSettings() throws IOException {
XContentBuilder builder = 
XContentFactory.jsonBuilder().startObject();
builder.startObject("analysis");
analyzer.appendSettings(builder);
builder.endObject();
builder.endObject();
return 
ImmutableSettings.settingsBuilder().loadFromSource(builder.string());
}

where *analyzer* is a simple class which only adds a hunspell dictionary and
my custom tokenizer.

The problem is that there is a thread performing index recovery, and I'm
closing the index while this is in progress. How can I avoid this situation? Is
there any way to check whether recovery is in progress?
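
One way to check is to wait for the index to report yellow (or green) cluster
health before closing it, since shards still recovering keep the status red. A
sketch with the 0.90-era Java API (the 60s timeout is an arbitrary choice):

```java
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class WaitForRecovery {
    // Block until the index's primaries have recovered (yellow status) or the
    // timeout elapses; only then is it safe to close the index.
    public static boolean waitUntilReady(Client client, String index) {
        ClusterHealthResponse health = client.admin().cluster()
                .prepareHealth(index)
                .setWaitForYellowStatus()   // use setWaitForGreenStatus() to
                                            // also wait for replicas
                .setTimeout(TimeValue.timeValueSeconds(60))
                .execute().actionGet();
        return !health.isTimedOut();
    }
}
```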



