Re: duplicate documents in query,

2015-04-29 Thread Georgi Ivanov
1.5.2

On Wednesday, April 29, 2015 at 4:44:09 PM UTC+2, Georgi Ivanov wrote:


 Hi,
 I have a strange issue: I get duplicate documents when querying:

 GET track_2011*/_search
 {
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "ts": {
               "gte": "2011-08-30T00:00:00Z",
               "lte": "2011-08-31T23:59:00Z"
             }
           }
         },
         {
           "term": {
             "entity_id": {
               "value": 298082
             }
           }
         }
       ]
     }
   },
   "sort": [
     { "ts": { "order": "asc" } }
   ],
   "size": 90
 }



 Result (there are more, just showing the duplicates):
 {
   "_index": "track_201108",
   "_type": "position",
   "_id": "298082_1314758608000_1302",
   "_score": null,
   "_source": {
     "ts": 1314758608000,
     "entity_id": 298082,
     "loc": {
       "type": "point",
       "coordinates": [103.69478, 1.2346333]
     }
   },
   "sort": [1314758608000]
 },
 {
   "_index": "track_201108",
   "_type": "position",
   "_id": "298082_1314758608000_1302",
   "_score": null,
   "_source": {
     "ts": 1314758608000,
     "entity_id": 298082,
     "loc": {
       "type": "point",
       "coordinates": [103.69478, 1.2346333]
     }
   },
   "sort": [1314758608000]
 }



 But if I fetch the same document by ID:

 curl -s es01.host.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
 {
   "found": true,
   "_version": 1,
   "_type": "position",
   "_index": "track_201108",
   "_source": {
     "hourly": false,
     "loc": {
       "type": "point",
       "coordinates": [103.69478, 1.2346333]
     },
     "ts": 1314758608000,
     "entity_id": 298082
   },
   "_id": "298082_1314758608000_1302"
 }




 So I have only one document (and it was never updated, as the version is 1).

 I don't understand what is going on here.

 No special routing, no parent/child relations.

 Any ideas?
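In case it helps to confirm whether these really are the same document returned twice, here is a minimal client-side check (only a sketch; the `hits` literal just mirrors the shape of the response above) that collapses hits sharing the same `(_index, _type, _id)` identity:

```python
def dedup_hits(hits):
    """Keep the first hit for each (_index, _type, _id) identity, drop repeats."""
    seen = set()
    unique = []
    for hit in hits:
        key = (hit["_index"], hit["_type"], hit["_id"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique

# The two hits from the response above share one identity:
hits = [
    {"_index": "track_201108", "_type": "position", "_id": "298082_1314758608000_1302"},
    {"_index": "track_201108", "_type": "position", "_id": "298082_1314758608000_1302"},
]
print(len(dedup_hits(hits)))  # 1
```

This only masks the symptom, of course; two hits with an identical identity from one index normally should not happen.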


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7e7a3514-bc91-414b-a88f-fe93c17df7ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


duplicate results in query

2015-04-29 Thread Georgi Ivanov
Hi,
I have a strange issue: I get duplicate documents when querying:

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2011-08-30T00:00:00Z",
              "lte": "2011-08-31T23:59:00Z"
            }
          }
        },
        {
          "term": {
            "entity_id": {
              "value": 298082
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "ts": { "order": "asc" } }
  ],
  "size": 90
}

Result (there are more, just showing the duplicates):
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    }
  },
  "sort": [1314758608000]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    }
  },
  "sort": [1314758608000]
}

But if I fetch the same document by ID:

curl -s es01.vesseltracker.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
  "found": true,
  "_version": 1,
  "_type": "position",
  "_index": "track_201108",
  "_source": {
    "hourly": false,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    },
    "ts": 1314758608000,
    "entity_id": 298082
  },
  "_id": "298082_1314758608000_1302"
}


So I have only one document (and it was never updated, as the version is 1).

I don't understand what is going on here.

No special routing, no parent/child relations.

Any ideas?



duplicate documents in query,

2015-04-29 Thread Georgi Ivanov

Hi,
I have a strange issue: I get duplicate documents when querying:

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2011-08-30T00:00:00Z",
              "lte": "2011-08-31T23:59:00Z"
            }
          }
        },
        {
          "term": {
            "entity_id": {
              "value": 298082
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "ts": { "order": "asc" } }
  ],
  "size": 90
}



Result (there are more, just showing the duplicates):
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    }
  },
  "sort": [1314758608000]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    }
  },
  "sort": [1314758608000]
}



But if I fetch the same document by ID:

curl -s es01.host.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
  "found": true,
  "_version": 1,
  "_type": "position",
  "_index": "track_201108",
  "_source": {
    "hourly": false,
    "loc": {
      "type": "point",
      "coordinates": [103.69478, 1.2346333]
    },
    "ts": 1314758608000,
    "entity_id": 298082
  },
  "_id": "298082_1314758608000_1302"
}




So I have only one document (and it was never updated, as the version is 1).

I don't understand what is going on here.

No special routing, no parent/child relations.

Any ideas?



Big free space dis-balance

2015-04-08 Thread Georgi Ivanov
Hi,
I have a 9 node cluster, and I notice that free disk space is greatly imbalanced.

On node1 I have only 90 GB left, while the other nodes still have around 180 GB free.

I am pretty sure no new shards will be allocated there, as the node is above the watermark.

I think this started when I upgraded to 1.5, or maybe with the last version of the 1.4 series.

Is there anything I can do about it?

Thanks in advance,
Georgi
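Two things are worth checking here. `GET _cat/allocation?v` shows per-node disk use and shard counts, which makes the imbalance easy to see. And allocation relative to free space is driven by the disk watermark settings; as a hedged sketch (the setting names are the real disk-allocation-decider ones, but the values here are only illustrative), a body like this sent to `_cluster/settings` controls when a node stops receiving new shards:

```json
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
```

The low watermark stops new shards from being allocated to a node; once the high watermark is crossed, Elasticsearch also tries to relocate shards away from that node.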



shard allocation per node

2015-03-30 Thread Georgi Ivanov
Hi,
What is the rule for primary shard allocation for a single index?

I created one index with 9 primary shards and 0 replicas.
Elasticsearch allocated 5 primary shards on the ES01 server (the node with the least storage available) and the remaining 4 shards on different nodes.

I have 9 servers; for this index only 5 of them are in use.

I don't think this is correct behavior.

Free space is pretty much even on every host except ES01 (I still don't understand why), and I didn't hit any high/low watermarks.

Any idea what is happening here?

Georgi
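If the goal is one shard of this index per server, there is a per-index setting that caps how many shards of the index a single node may hold. A sketch (the setting name is real; the value of 1 assumes 9 shards across 9 nodes, and note that a cap this tight can leave shards unassigned if a node drops out), sent to `/{index}/_settings`:

```json
{
  "index.routing.allocation.total_shards_per_node": 1
}
```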



Re: Number of shards in 4 node Cluster

2015-03-18 Thread Georgi Ivanov
My rule is: 1 primary shard per server.

Also make an estimate of how big the single index/shard will be.

I think it is not good if a single shard exceeds 10 GB, although there is no
exact limit.
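The two rules above (one primary per server, roughly 10 GB per shard at most) combine into a quick back-of-the-envelope calculation. This is only a sketch of the heuristic in this message, not an official formula:

```python
import math

def suggested_primaries(est_index_size_gb, node_count, max_shard_gb=10):
    """One primary shard per server, but more shards if a single shard
    would otherwise exceed roughly max_shard_gb."""
    return max(node_count, math.ceil(est_index_size_gb / max_shard_gb))

print(suggested_primaries(50, 9))   # 9  (one shard per node already keeps shards small)
print(suggested_primaries(200, 9))  # 20 (the size rule dominates)
```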


Georgi 

On Tuesday, March 17, 2015 at 7:00:23 PM UTC+1, John S wrote:

 Hi All,

 Is there any best practice for the number of shards in a cluster? I have a 4 node cluster and used 20 shards.

 During any node failure or other event, I suspect that because the shard count is high, replication to a new node takes more time...

 Is there any metric or formula for choosing the number of shards?

 Regards
 John




Re: Elasticsearch Index Polygons?

2015-02-24 Thread Georgi Ivanov
Like Jun said, you need the geo_shape type.

The problem is that indexing shapes (anything except POINT) is very slow.

I tried with linestrings, and it is extremely slow once a linestring is 10 points long or longer.

It just kills the CPU.
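If the shapes themselves cannot be simplified, the usual knob is the geo_shape precision: a coarser `precision` (or fewer `tree_levels`) produces far fewer terms per shape and much cheaper indexing, at the cost of spatial accuracy. A hedged sketch of such a mapping (the field name `track` is made up; `tree` and `precision` are the documented geo_shape mapping parameters):

```json
{
  "track": {
    "type": "geo_shape",
    "tree": "quadtree",
    "precision": "100m"
  }
}
```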



On Tuesday, February 24, 2015 at 6:38:37 AM UTC+1, Sai Asuka wrote:

 So I see that Elasticsearch claims to use GeoJSON as the format for indexing... but when I look at the docs, the sample it gives is:


 {
   "location": {
     "type": "polygon",
     "coordinates": [
       [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
     ]
   }
 }


 Doesn't GeoJSON look like this?

 {
   "type": "Feature",
   "properties": {
     "name": "Sparkle",
     "age": 11
   },
   "geometry": {
     "type": "polygon",
     "coordinates": [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0],
       [100.0, 0.0]]]
   }
 }

 My question is how do I index polygons in elasticsearch if I want to 
 attach properties to it? If I wanted to perform a bulk load for example, 
 what does one document look like that has polygon information that I can 
 perform geospatial queries on?




Re: Elasticsearch Index Polygons?

2015-02-24 Thread Georgi Ivanov
You can add whatever you want to the index; just define your mapping like this:
{
  "my_type": {
    "_all": {
      "enabled": false
    },
    "properties": {
      "field1": {
        "type": "double",
        "index": "no"
      },
      "my_polygon": {
        "tree": "quadtree",   // or "geohash" here
        "type": "geo_shape"
      },
      "field2": {
        "type": "double",
        "index": "no"
      }
    }
  }
}
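With a mapping like that, a bulk-loadable document simply carries the ordinary properties next to the shape. A small sketch that builds the newline-delimited `_bulk` request body locally (index, type, and field names follow the examples in this thread; this is plain string assembly, not a client API):

```python
import json

def bulk_payload(index, doc_type, docs):
    """docs is a list of (id, source) pairs; returns the NDJSON body for _bulk."""
    lines = []
    for _id, source in docs:
        # Each document becomes two lines: an action line, then the source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": _id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

doc = {
    "field1": 1.0,  # ordinary properties sit right next to the shape
    "my_polygon": {
        "type": "polygon",
        "coordinates": [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
                         [100.0, 1.0], [100.0, 0.0]]],
    },
}
payload = bulk_payload("shapes", "my_type", [("1", doc)])
print(payload.count("\n"))  # 2 lines: one action, one source
```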



On Tuesday, February 24, 2015 at 6:38:37 AM UTC+1, Sai Asuka wrote:

 So I see that Elasticsearch claims to use GeoJSON as the format for indexing... but when I look at the docs, the sample it gives is:


 {
   "location": {
     "type": "polygon",
     "coordinates": [
       [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
     ]
   }
 }


 Doesn't GeoJSON look like this?

 {
   "type": "Feature",
   "properties": {
     "name": "Sparkle",
     "age": 11
   },
   "geometry": {
     "type": "polygon",
     "coordinates": [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0],
       [100.0, 0.0]]]
   }
 }

 My question is how do I index polygons in elasticsearch if I want to 
 attach properties to it? If I wanted to perform a bulk load for example, 
 what does one document look like that has polygon information that I can 
 perform geospatial queries on?




Re: doc_values for non analyzed fields

2014-11-29 Thread Georgi Ivanov
So there is no point in reindexing everything just to set doc_values on
non-analyzed fields.
Non-indexed fields are not in field data anyway.

Right?
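For the fields that do benefit from doc_values — the ones used for sorting, scripts, and aggregations — the change is in the field's fielddata format. A sketch with a hypothetical `ts` sort field, matching the "field data format" wording used in this thread:

```json
{
  "ts": {
    "type": "long",
    "fielddata": { "format": "doc_values" }
  }
}
```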



2014-11-29 11:50 GMT+01:00 Adrien Grand adrien.gr...@elasticsearch.com:

 Doc values cannot be used to fetch values, they are only used for sorting,
 scripts and aggregations. It is like fielddata, but computed at indexing
 time and stored on disk.

 On Fri, Nov 28, 2014 at 4:07 PM, Georgi Ivanov georgi.r.iva...@gmail.com
 wrote:

 Hi,
 Will it make any difference in terms of field data memory, if I set the
 field data format to doc_values for all fields that have mapping index :
 no ?

 Are these (non-analyzed) fields ever loaded in memory on first place ?


 Example field mapping :
  "rot": {
    "index": "no",
    "type": "integer"
  }


 I don't need to query on these fields, but i need to fetch them .
 Any negative impact on my queries ?


 Thank you
 Georgi





 --
 Adrien Grand





doc_values for non analyzed fields

2014-11-28 Thread Georgi Ivanov
Hi,
Will it make any difference in terms of field data memory, if I set the 
field data format to doc_values for all fields that have mapping index : 
no ?

Are these (non-analyzed) fields ever loaded in memory on first place ? 


Example field mapping :
"rot": {
  "index": "no",
  "type": "integer"
}


I don't need to query on these fields, but i need to fetch them .
Any negative impact on my queries ?


Thank you
Georgi



Re: ES java api: how to handle connectivity problems?

2014-11-28 Thread Georgi Ivanov
That's strange.

Could it be a problem in the code?
Something like looping forever?

You can set a timeout on the bulk request, but there is a default timeout of 1 minute.

Maybe some code would help.


On Friday, November 28, 2014 3:09:37 PM UTC+1, msbr...@gmail.com wrote:

 While testing how to handle es-cluster connectivity issues I ran into a 
 serious problem. The java api node client is connected and then the ES 
 server is killed. The application hangs in some bulkRequest, but this call 
 never returns. It also does not return, even if the cluster was started. On 
 console this exception is shown:

 Exception in thread 
 elasticsearch[event-collector/12240@amnesia][generic][T#2] 
 org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
 [SERVICE_UNAVAILABLE/1/state not recovered / 
 initialized];[SERVICE_UNAVAILABLE/2/no master];
 at 
 org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
 at 
 org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
 at 
 org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
 at 
 org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
 at 
 org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
 at 
 org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:119)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 I am surprised that this scenario does not work. Any other scenario, e.g. shutting down 1 of 2 nodes, is handled transparently. But now the client application seems to hang forever.

 Any ideas?

 regards,
 markus




Re: Shards UNASSIGNED even tho they exist on disk

2014-11-13 Thread Georgi Ivanov
Sounds like you don't have enough space on the disk(s).

This happened to me when upgrading to 1.4.

On Monday, November 10, 2014 2:45:09 PM UTC+1, Johan Öhr wrote:

 Hi,

 I have a problem with a few index, some of the shards (both replica and 
 primary) are UNASSIGNED, my cluster stays yellow.

 This is what the master says about that:
 [2014-11-10 06:53:01,223][WARN ][cluster.action.shard ] [node-master] 
 [index][9] received shard failed for [index][9], 
 node[9g2_kOrDSt-57UVI1bLfFg], [P], s[STARTED], indexUUID 
 [20P5SMNFTZyrUEVyUPCsbQ], reason [master 
 [node-master][07ZcjsurR3iIVsH6iSX0jw][data-node][inet[/xx.xx.xx.xx:9300]]{data=false,
  
 master=true} marked shard as started, but shard has not been created, mark 
 shard as failed]

 http://host:9200/index/_stats shows: _shards: { "failed": 0, "successful": 13, "total": 20 }

 This happened when I dropped a node and let the cluster replicate itself back together; the replication factor is 1 (two identical copies of each shard).
 I did it on two nodes and it worked perfectly; then on the third node, I have 92 shards UNASSIGNED.

 The only difference between the first two nodes and the third is that it ran with these settings:


  cluster.routing.allocation.disk.threshold_enabled: true
  cluster.routing.allocation.disk.watermark.low: 0.85
  cluster.routing.allocation.disk.watermark.high: 0.90
  cluster.info.update.interval: 60s
  indices.recovery.concurrent_streams: 10
  cluster.routing.allocation.node_concurrent_recoveries: 40


 Any idea if this can be fixed?

 I've tried to clean up the masters and restart them - nothing.
 I've tried to delete _state for these indices on the data node - nothing.

 Thanks for the help :)





Re: how to search non indexed field in elasticsearch

2014-11-07 Thread Georgi Ivanov
wildcard query is also working on non-indexed fields.


On Friday, November 7, 2014 8:11:30 AM UTC+1, ramky wrote:

 Thanks Nikolas Everett for your quick reply.

 Can you please provide an example of how to do this? I tried multiple times but was unable to make it work.

 Thanks in advance

 On Thursday, November 6, 2014 9:44:55 PM UTC+5:30, Nikolas Everett wrote:

 You can totally use a script filter checking the field against _source.  
 Its super duper duper slow but you can do it if you need it rarely.

 On Thu, Nov 6, 2014 at 11:13 AM, Ivan Brusic iv...@brusic.com wrote:

 You cannot search/filter on a non-indexed field.

 -- 
 Ivan

 On Wed, Nov 5, 2014 at 11:45 PM, ramakrishna panguluri 
 panguluri@gmail.com wrote:

 I have 10 fields inserted into elasticsearch out of which 5 fields are 
 indexed.
 Is it possible to search on non indexed field?

 Thanks in advance.


 Regards
 Rama Krishna P









Re: I have a few million users, and I want to index for per user, but.....

2014-11-07 Thread Georgi Ivanov
Aliases, maybe?
So: one index, many aliases (user_1, user_2, ...).
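As a sketch of the alias idea (the `_aliases` actions API and alias filters are real features; the index and field names here are made up), each user gets a filtered alias over one shared index:

```json
{
  "actions": [
    {
      "add": {
        "index": "users",
        "alias": "user_1",
        "filter": { "term": { "user_id": 1 } }
      }
    }
  ]
}
```

An alias can also carry a `routing` value, so each user's documents land on, and are searched from, a single shard.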



On Friday, November 7, 2014 9:09:03 AM UTC+1, David shi wrote:


 I have a few million users, and the number will continue to grow - maybe to 10 million (1000W) a year from now.

 Each user has a lot of files; the file size is not fixed, maybe around 1 MB ~ 10 MB.

 What I need to do is index each user's documents, and allow the current user to quickly search for the right content in their documents.

 What I thought of was building an index per user, but there are restrictions on the number of files in a single directory on Linux.

 Do you have any good suggestions for me?

 Thank you very much!





Re: how to search non indexed field in elasticsearch

2014-11-07 Thread Georgi Ivanov
Taken from here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html#query-dsl-wildcard-query
"Matches documents that have fields matching a wildcard expression (*not analyzed*)"

I also use wildcard queries on non-analyzed fields, and they work.
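For reference, the shape of such a query (the field name here is made up). One caveat worth stating plainly: the docs quoted above say *not analyzed*, which is not the same as *not indexed* — a field mapped with `index: no` has no terms at all, so no query can match on it:

```json
{
  "query": {
    "wildcard": {
      "user_name": "geor*"
    }
  }
}
```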

2014-11-07 9:44 GMT+01:00 David Pilato da...@pilato.fr:

 wildcard query is also working on non-indexed fields.


 Are you sure?
 I don’t think so.

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com
 http://Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr | @scrutmydocs
 https://twitter.com/scrutmydocs






Upgrade to Es 1.4.0. Backup shards not initializing

2014-11-06 Thread Georgi Ivanov
Hi,
I just upgraded to 1.4.
I don't see any errors, but only the primary shards are initialized.

Any idea what is happening?

Georgi



Re: Upgrade to Es 1.4.0. Backup shards not initializing

2014-11-06 Thread Georgi Ivanov
Found it.
I had less than 15% free space on the disks, and allocation was disabled.

The annoying part is that I had to enable DEBUG in logging.yml just to see this!

I will file a bug report; this should be logged at WARNING level at least.

Hope this helps someone else.

Georgi

On Thursday, November 6, 2014 2:07:04 PM UTC+1, Georgi Ivanov wrote:

 Hi,
 Just upgraded to 1.4
 I don't see any errors, but only the primary shards are initialized.

 Any idea what is happening ?

 Georgi




Re: Performance problems with large data volumes

2014-11-05 Thread Georgi Ivanov
OK, so it is Java.

1. You are not doing this right.
2. You should use BulkRequest, or better, the BulkProcessor class.
3. Do NOT call setRefresh! That forces ES to refresh (do the real indexing work) after every request, which loads the cluster a LOT.
4. Set the refresh interval of your index to something like 30s or 60s.


Here is a snippet of code using BulkProcessor (it will not run as-is, because I removed some parts, but it will give you an idea):



public class IndexFoo {
    private Connection connection = null;

    public Client client;
    Integer bulkSize = 1000;
    private CommandLine cmd;
    // BulkRequestBuilder bulkRequest;
    BulkProcessor bulkRequest;
    private String index;
    Set<String> hosts = new HashSet<String>();

    private int threads = 5;

    public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
        this.cmd = cmd;
        this.index = cmd.getOptionValue("index");
        if (cmd.hasOption("b")) {
            this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
        }
        if (cmd.hasOption("t")) {
            this.threads = Integer.valueOf(cmd.getOptionValue("t"));
        }
        if (cmd.hasOption("h")) {
            String[] hosts = cmd.getOptionValue("h").split(",");
            for (String host : hosts) {
                this.hosts.add(host);
            }
        }

        this.connectES();

        this.bulkRequest = this.getBulkProcessor();
    }

    private void processData(ResultSet rs) throws SQLException {
        while (rs.next()) {
            // index
            bulkRequest.add(client.prepareIndex("myIndex", "mytype",
                    id.toString()).setSource(mySource).request());
        } // while
        this.bulkRequest.close();
        System.out.println("Indexing done");
    }

    private BulkProcessor getBulkProcessor() {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // System.out.println("Executing bulk #" + executionId + " " + request.numberOfActions());
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                System.out.println("Bulk #" + executionId + "/" + request.numberOfActions()
                        + " executed in " + response.getTook().secondsFrac() + " sec.");
                if (response.hasFailures()) {
                    for (BulkItemResponse bulkItemResponse : response.getItems()) {
                        if (bulkItemResponse.isFailed()) {
                            System.err.println("Failure message : "
                                    + bulkItemResponse.getFailureMessage());
                        }
                    }
                    System.exit(-1);
                }
            }
        }).setConcurrentRequests(this.threads)
          .setBulkActions(this.bulkSize).build();
    }
}


2014-11-04 17:53 GMT+01:00 John D. Ament john.d.am...@gmail.com:

 And actually now that I'm looking at it again - I wanted to ask why I need
 to use setRefresh(true)?

 In my case, we were not seeing index data updated quick enough upon
 indexing a record.  setting refresh = true was doing it for us.  If there's
 a way to avoid it, that might help me here?


 On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote:

 Georgi,

 I'm indexing the data through regular index request via java

  final IndexResponse response = esClient.client().prepareIndex(indexName, type)
          .setSource(json).setRefresh(true).execute().actionGet();

 json in this case is a byte[] with the json data in it.

 The requests come in via multiple HTTP requests, but I'm not leveraging
 any specific multithreading within the ES client.  I hope this helps, I'm
 not 100% sure what information would help identify.

 John

 On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote:

 So you run OOM when you index data ?
 If so :
 How do you index the data ?
 Are you using BulkRequest ?
 Which programming language are you using ?
 Are you using multiple threads to index ?

 If you are using Bulk request , you should limit the size of the bulk.
 You can also tune the bulk request pool in ES.

 In general, you are very brief in describing your problem :)

 Georgi


 2014-11-04 17:05 GMT+01:00 John D. Ament john.d...@gmail.com:

 Georgi,

 Thanks for the quick reply!

 I have 4k indices.  We're creating an index per tenant.  In this
 environment we've created 4k tenants.

 We're running out of memory just letting the loading of records run.

 John


 On Tuesday, November 4, 2014 10:15:15 AM UTC-5, Georgi Ivanov wrote:

 Hi,
 I don't think 24k documents are large data.
 What is strange for me is 4000 indices.
 This is strange .. how many indices do you need ?

 On my cluster i have : Nodes: 8 Indices: 89 Shards: 2070 Data: 4.87 TB

 When are you running OOM ? Example query(ies) ? How many nodes ? Some
 more info please :)

 Also, 6GB Heap is not too much, but that depends on your use case

 Georgi

 On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote:

 Hi,

 So I have what you might want to consider a large set of data.

 We have about 25k records in our index, and the disk space is taking
 up around 2.5 gb, spread across a little more than 4000 indices.  
 Currently
 our master node is set for 6gb of ram.  We're seeing that after loading
 this data the JVM will eventually crash, sometimes in as little as 5
 minutes.

 Is this not enough horse power for this data set

Re: Performance problems with large data volumes

2014-11-05 Thread Georgi Ivanov
Here is how to set refresh interval:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
When you force a refresh after every document, you are putting unnecessary
load on ES.
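For reference, the update-settings call from the link above takes a JSON body along these lines (a sketch; the index name "myindex" and the "30s" value are placeholders, not recommendations):

```
PUT /myindex/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```

Setting refresh_interval to -1 disables periodic refresh entirely, which can be useful for the duration of a bulk load.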

Indexing a single document per call works, but it is also
very slow and inefficient :)
With bulk requests you are also utilizing the available indexing threads in ES. You
can read the documentation about this here :
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
If you use bulk requests, you can index (tens of) thousands of docs per second,
depending on your hardware.

With the BulkProcessor class you can set how many threads will run, how many
documents will be sent in one bulk, etc.
It is much more efficient than indexing single documents.


2014-11-05 12:53 GMT+01:00 John D. Ament john.d.am...@gmail.com:

 Hi,

 I doubt the issue is that I'm not using bulk requests.  My requests come
 in one at a time, not in bulk.  If you can explain why bulk is required
 that would help.

 I can believe that the refresh is causing the issue.  I would prefer to
 test that one by itself.  How do I configure the refresh interval on the
 index?

 John


 On Wednesday, November 5, 2014 3:43:37 AM UTC-5, Georgi Ivanov wrote:

 Ok .. so it is Java

 1. You are not doing this right.
 2. You should use BulkRequest, or better, the BulkProcessor class
 3. Do NOT setRefresh ! This way you are forcing ES to do the real
 indexing, which will load the cluster a LOT
 4. Set the refresh interval of your index to something like 30s or 60s


 Here is a snippet of code using BulkProcessor (it will not run as-is because
 I removed some parts, but it will give you an idea)



 public class IndexFoo {
     private Connection connection = null;

     public Client client;
     Integer bulkSize = 1000;
     private CommandLine cmd;
     // BulkRequestBuilder bulkRequest;
     BulkProcessor bulkRequest;
     private String index;
     Set<String> hosts = new HashSet<String>();

     private int threads = 5;

     public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
         this.cmd = cmd;
         this.index = cmd.getOptionValue("index");
         if (cmd.hasOption("b")) {
             this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
         }
         if (cmd.hasOption("t")) {
             this.threads = Integer.valueOf(cmd.getOptionValue("t"));
         }
         if (cmd.hasOption("h")) {
             String[] hosts = cmd.getOptionValue("h").split(",");
             for (String host : hosts) {
                 this.hosts.add(host);
             }
         }

         this.connectES();

         this.bulkRequest = this.getBulkProcessor();
     }

     private void processData(ResultSet rs) throws SQLException {
         while (rs.next()) {
             // index
             bulkRequest.add(client.prepareIndex("myIndex", "mytype",
                 id.toString()).setSource(mySource).request());
         } // while
         this.bulkRequest.close();
         System.out.println("Indexing done");
     }

     private BulkProcessor getBulkProcessor() {
         return BulkProcessor.builder(client, new BulkProcessor.Listener() {
             @Override
             public void beforeBulk(long executionId, BulkRequest request) {
                 // System.out.println("Executing bulk #" + executionId + " "
                 //     + request.numberOfActions());
             }

             @Override
             public void afterBulk(long executionId, BulkRequest request,
                     Throwable failure) {
             }

             @Override
             public void afterBulk(long executionId, BulkRequest request,
                     BulkResponse response) {
                 System.out.println("Bulk #" + executionId + "/"
                     + request.numberOfActions() + " executed in "
                     + response.getTook().secondsFrac() + " sec.");
                 if (response.hasFailures()) {
                     for (BulkItemResponse bulkItemResponse : response.getItems()) {
                         if (bulkItemResponse.isFailed()) {
                             System.err.println("Failure message : "
                                 + bulkItemResponse.getFailureMessage());
                         }
                     }
                     System.exit(-1);
                 }
             }
         }).setConcurrentRequests(this.threads).setBulkActions(this.bulkSize).build();
     }

 }


 2014-11-04 17:53 GMT+01:00 John D. Ament john.d...@gmail.com:

 And actually now that I'm looking at it again - I wanted to ask why I
 need to use setRefresh(true)?

 In my case, we were not seeing index data updated quick enough upon
 indexing a record.  setting refresh = true was doing it for us.  If there's
 a way to avoid it, that might help me here?


 On Tuesday, November 4, 2014 11:37:46 AM UTC-5, John D. Ament wrote:

 Georgi,

 I'm indexing the data through regular index request via java

  final IndexResponse response = esClient.client().prepareIndex(indexName,
  type).setSource(json).setRefresh(true).execute().actionGet();

 json in this case is a byte[] with the json data in it.

 The requests come in via multiple HTTP requests, but I'm not leveraging
 any specific multithreading within the ES client.  I hope this helps, I'm
 not 100% sure what information would help identify.

 John

 On Tuesday, November 4, 2014 11:35:06 AM UTC-5, Georgi Ivanov wrote:

 So you run OOM when you index data ?
 If so :
 How do you index the data ?
 Are you using BulkRequest ?
 Which programming language are you using ?
 Are you using multiple threads to index ?

  If you are using bulk requests, you should limit the size

Re: Performance problems with large data volumes

2014-11-04 Thread Georgi Ivanov
Hi,
I don't think 24k documents are large data.
What is strange for me is 4000 indices. 
This is strange .. how many indices do you need ?

On my cluster i have : Nodes: 8 Indices: 89 Shards: 2070 Data: 4.87 TB

When are you running OOM ? Example query(ies) ? How many nodes ? Some more 
info please :)

Also, 6GB Heap is not too much, but that depends on your use case

Georgi

On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote:

 Hi,

 So I have what you might want to consider a large set of data.

 We have about 25k records in our index, and the disk space is taking up 
 around 2.5 gb, spread across a little more than 4000 indices.  Currently 
 our master node is set for 6gb of ram.  We're seeing that after loading 
 this data the JVM will eventually crash, sometimes in as little as 5 
 minutes.

 Is this not enough horse power for this data set?

 What could be tuned to resolve this?

 John


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ee8b784c-2fd5-403d-853e-5a1e893831dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performance problems with large data volumes

2014-11-04 Thread Georgi Ivanov
So you run OOM when you index data ?
If so :
How do you index the data ?
Are you using BulkRequest ?
Which programming language are you using ?
Are you using multiple threads to index ?

If you are using bulk requests, you should limit the size of the bulk.
You can also tune the bulk request pool in ES.
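For the thread pool tuning mentioned above, a sketch of the relevant elasticsearch.yml fragment (the numbers are illustrative placeholders, not recommendations):

```yaml
threadpool:
  bulk:
    size: 8           # number of concurrent bulk indexing threads
    queue_size: 500   # pending bulk requests allowed before ES starts rejecting
```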

In general, you are very brief in describing your problem :)

Georgi


2014-11-04 17:05 GMT+01:00 John D. Ament john.d.am...@gmail.com:

 Georgi,

 Thanks for the quick reply!

 I have 4k indices.  We're creating an index per tenant.  In this
 environment we've created 4k tenants.

 We're running out of memory just letting the loading of records run.

 John


 On Tuesday, November 4, 2014 10:15:15 AM UTC-5, Georgi Ivanov wrote:

 Hi,
 I don't think 24k documents are large data.
 What is strange for me is 4000 indices.
 This is strange .. how many indices do you need ?

 On my cluster i have : Nodes: 8 Indices: 89 Shards: 2070 Data: 4.87 TB

  When are you running OOM ? Example query(ies) ? How many nodes ? Some more
  info please :)

 Also, 6GB Heap is not too much, but that depends on your use case

 Georgi

 On Tuesday, November 4, 2014 3:42:19 PM UTC+1, John D. Ament wrote:

 Hi,

 So I have what you might want to consider a large set of data.

 We have about 25k records in our index, and the disk space is taking up
 around 2.5 gb, spread across a little more than 4000 indices.  Currently
 our master node is set for 6gb of ram.  We're seeing that after loading
 this data the JVM will eventually crash, sometimes in as little as 5
 minutes.

 Is this not enough horse power for this data set?

 What could be tuned to resolve this?

 John

  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/cJ2Y6-KQZus/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/c3125935-f4d8-4671-a9df-222433369f2b%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/c3125935-f4d8-4671-a9df-222433369f2b%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGKxwgkMCyBkXwg3MNhQp0hqGT6Czz3R2RPBC73B56Bo-yg-dA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ES and Java 8. Is it worth the effort ?

2014-10-30 Thread Georgi Ivanov
Hi ,
I wonder if I should start using Java 8 with my ES cluster.

Are there any benefits to using Java 8 ?
For example :
faster GC, faster Java itself .. anything ES would benefit from in Java 8 .. 
etc


Please share your experience.

Georgi

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9cf82905-63cf-43f2-b14a-de8f21cb4b50%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread Georgi Ivanov
As far as I know, ES will work just fine with Java 1.8,
except for script support.

I read some articles on the Internet saying that scripting support is broken with 
Java 1.8

But I would love to hear someone who actually tried :)


On Tuesday, June 17, 2014 3:19:37 PM UTC+2, Chris Neal wrote:

 Hi,

 I saw this blog post from April stating java 1.7u55 as being safe for 
 Elasticsearch, but I didn't see anything about Java 1.8 support.  Just 
 wondering if it was :)


 http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/

 Thanks!
 Chris
  

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0e7fa099-f52d-4e70-a533-e013eb0cd75c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Problem setting up cluster with NAT address

2014-06-17 Thread Georgi Ivanov
Doesn't sound like an elasticsearch issue ...

I would look at my FW rules



On Tuesday, June 17, 2014 2:17:20 PM UTC+2, pmartins wrote:

 Hi, 

 I'm having some problems setting up a 1.2.1 ES cluster. I have two nodes, 
 each one in a different data center/network. 

 One of the nodes is behind a NAT address, so I set network.publish_host to 
 the NAT address. 

 Both nodes connect to each other without problems. The issue is when the 
 node behind the NAT address tries to connect to itself. In my network, it 
 doesn't know its NAT address and can't resolve it. So I get the exception: 

 [2014-06-17 12:58:19,681][WARN ][cluster.service  ] 
 [vm-motisqaapp02] failed to reconnect to node 
 [vm-motisqaapp02][4oSfsIaBTSyQWdnxiTt7Cw][vm-motisqaapp02.***][inet[/10.10.1.135:9300]]{master=true}
  

 org.elasticsearch.transport.ConnectTransportException: 
 [vm-motisqaapp02][inet[/10.10.1.135:9300]] connect_timeout[30s] 
 at 
 org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:727)
  

 at 
 org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:656)
  

 at 
 org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:624)
  

 at 
 org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146)
  

 at 
 org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:518)
  

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
 Source) 
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
 Source) 
 at java.lang.Thread.run(Unknown Source) 
 Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: 
 connection timed out: /10.10.1.135:9300 
 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
  

 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
  

 at 
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
  

 at 
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
  

 at 
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
  

 at 
 org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
  

 ... 3 more 

 vm-motisqaapp02's NAT address is 10.10.1.135, but locally it can't resolve 
 this 
 address. Is there any way that I can set up another IP to communicate locally? 



 -- 
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Problem-setting-up-cluster-with-NAT-address-tp4057849.html
  
 Sent from the ElasticSearch Users mailing list archive at Nabble.com. 


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/24e11c67-8133-4893-b665-09f31735f269%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Share a document across multiple indices

2014-06-17 Thread Georgi Ivanov
Will aliases help you in this case ?

For example :
index1 : [doc1]
index2 : [doc2]


Create an alias "Docs" for index1 and index2


Then run queries against the alias?
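As a sketch of the alias approach (index names taken from the example above; the alias name is made up), the aliases API groups the indices like this:

```
POST /_aliases
{
  "actions": [
    { "add": { "index": "index1", "alias": "Docs" } },
    { "add": { "index": "index2", "alias": "Docs" } }
  ]
}
```

Each "add" action can also carry a "filter" clause, which restricts which documents the alias sees - that is closer to the "subset of a master index" behaviour asked about below.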




On Monday, June 16, 2014 3:51:45 AM UTC+2, Martin Angers wrote:

 Hi,

 I'm wondering if this is a supported scenario in ElasticSearch, reading 
 the guide and API reference I couldn't find a way to achieve this.

 I'd like to index documents only once, say in a master index, and then 
 create secondary or meta indices that would only contain a subset of the 
 master index.

 For example, document A, B and C would be indexed once in the master 
 index. Then a secondary index would be able to see only documents A and B, 
 while another secondary index could see only documents B and C, etc. (and 
  by "see" I mean the search queries should only consider those documents)

 The idea being that documents could be relatively big, and they should not 
 be indexed multiple times.

 Does that make sense? Am I missing the right way to design such a 
 pattern? I am new to ES.

 Thanks,
 Martin



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ab9180d-6af9-45d4-8b2c-22f32869ee2a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-17 Thread Georgi Ivanov
I don't know how you are doing the indexing.

Are you using bulk requests or .. ? Bulk inserts can greatly increase 
indexing speed.

You can also check the node client. It should have better indexing speed 
because it is a 1-hop operation, compared to two hops with the transport 
client. (assuming the Java API here)

You can hit the limits of the bulk thread pool (it can be increased) if you 
are sending all indexing ops to one server only. One could try to hit all 
master nodes on a round-robin basis.

You can monitor IOPS in Marvel (or iostat locally on the server) to see if 
you are not hitting the IO limit.

On my ES cluster i reach 50k indexing ops per second.


On Monday, June 9, 2014 5:40:53 PM UTC+2, pranav amin wrote:

 Hi all,

 While doing some prototyping in ES using SSD's we got some good Write TPS. 
 But the Write TPS saturated after adding some more nodes! 


 Here are the details i used for prototyping -

 Requirement: To read data as soon as possible since the read is followed 
 by write. 
 Version of ES:1.0.0
 Document Size:144 KB
 Use of SSD for Storage: Yes
 Benchmarking Tool: Soap UI or Jmeter
 VM: Ubuntu, 64 Bit OS
 Total Nodes: 12
 Total Shards: 60
 Threads: 200
 Replica: 2
 Index Shards: 20
 Total Index:1 
 Hardware configuration: 4 CPU, 6 GB RAM, 3 GB Heap

 Using the above setup we got Write TPS ~= 500. 

 We wanted to know by adding more node if we can increase our Write TPS. 
 But we couldn't. 
  * By adding 3 more nodes (i.e. Total Nodes = 15) the TPS just increased by 
  10, i.e. ~= 510. 
 * Adding more Hardware like CPU, RAM and increasing Heap didn't help as 
 well [8 CPU, 12 GB RAM, 5 GB Heap].

  Can someone help out or point out ideas on what might be wrong? Conceptually ES 
  should scale in terms of Write & Read TPS by adding more nodes. However we 
  aren't able to get that.

 Much appreciated if someone can point us in the right direction. Let me 
 know if more information is needed.

 Thanks
 Pranav.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3470cead-d70a-4dbc-af3c-4b47abce4d40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to get rid of org.elasticsearch.plugins information logging

2014-06-16 Thread Georgi Ivanov
Hello,
How can i get rid of 

Jun 16, 2014 10:38:13 AM org.elasticsearch.plugins
Information: [Thinker] loaded [], sites []


every time my client connects to ES ?

It is not a big problem, but this output is messing up my shell 
scripts.

I am using transport client if this matters.
Is this some log4j configuration ? I am not using log4j atm.
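If log4j happens to be on the client's classpath, a log4j.properties entry along these lines should quiet that logger (an assumption - the exact logger name and mechanism depend on how the client's logging is set up):

```properties
log4j.logger.org.elasticsearch=WARN
```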

Regards,
Georgi

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/caea4a0b-bff7-4dfe-af92-654ab1a802ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch and Hadoop Questions

2014-06-09 Thread Georgi Ivanov
Please find my comments below :

"From what you said above that means that you can not run ES queries on
data in Hadoop over something like a 6 month time range without it having
to pull in all that data and index it first." - CORRECT. ES queries
can run only on ES.

"And I am assuming that the opposite is all correct that Hadoop can not
run jobs on data in ES without it first pulling in that data to its storage
first." - NOT CORRECT.
The thing is that you can run MR jobs against data stored in ES (via
EsInputFormat).
So you can do some really cool stuff reading (and writing) data from ES and
then use the power of MR to process/analyze/do-whatever-you-want with the data.

In the most common case, in a Hadoop MR job you do the following:
1. Job config : input, output, input format, output format, etc.
2. Mapper - process each line of the input (stored on HDFS) and eventually
emit key/val pairs to the Reducer
3. In the Reducer, process all values for one key and eventually emit again to
the output (on HDFS)

With es-hadoop you can set the job input data to be read from ES (so step
1) and then all the other steps can be the same.

I am giving you some typical scenarios  :
1. Read(via es query) from ES
1.1 Process the data in a MR job
1.2 Store the output to HDFS [OR store the output to ES again (an ES indexing
operation)]


2. Run MR job against data stored on HDFS
2.1 Process the data
2.2 Store the output to ES (ES indexing)
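For both scenarios, the es-hadoop side is configured through Hadoop job properties; a minimal sketch (the index/type and host values here are placeholders):

```properties
es.nodes    = localhost:9200    # ES node(s) the job connects to
es.resource = myindex/mytype    # index/type to read from or write to
es.query    = ?q=*              # optional query when reading from ES (scenario 1)
```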

Cheers
Georgi




2014-06-09 13:47 GMT+02:00 ES USER es.user.2...@gmail.com:

 Thanks.  So just one final question.  From what you said above that means
 that you can not run ES queries on data in Hadoop over something like a 6
 month time range without it having to pull in all that data and index it
 first.  And I am assuming that the opposite is all correct that Hadoop can
 not run jobs on data in ES without it first pulling in that data to its
 storage first.





 On Friday, June 6, 2014 5:03:03 PM UTC-4, Costin Leau wrote:

 ES stores data in its own internal format, which typically resides
 locally. What you are stating is partially correct - with the connector you
 would move/copy data between Hadoop and ES since, in order for ES to work
 with data, it needs to actually index it (that is, to see it).
 So you would use es-hadoop to index data from Hadoop in ES or/and query
 ES directly from Hadoop.


 On Fri, Jun 6, 2014 at 9:29 PM, ES USER es.use...@gmail.com wrote:

 I guess the problem I having wrapping my head around is exactly where
 the data is residing and in what format.

  If I understand Georgi's email above, it is that you can run map
  reduce jobs against data stored in local ES by utilizing es-hadoop,
  and you can also run ES queries against data in Hadoop utilizing es-hadoop.

  Is that correct?




 On Friday, June 6, 2014 12:39:44 PM UTC-4, Costin Leau wrote:

 Adding to what Georgi wrote, es-hadoop does not create the shards for
 you - that's up to you or index templates (which I highly recommend).
 However es-hadoop is aware of the target shards and will use them to
 parallelize the reads/writes (such as one task per shard).


 On Fri, Jun 6, 2014 at 2:45 PM, Georgi Ivanov georgi@gmail.com
 wrote:

  and i don't think this is in any way related to the number of shards and nodes


 On Thursday, June 5, 2014 7:41:34 PM UTC+2, ES USER wrote:

 Try as I might and I have read all the stuff I can find on ES'
 website about this I understand somewhat how the integration works but 
 not
 the actual nuts and bolts of it.

 For example:

 Is Hadoop just storing the files that would normally be stored in the
 local filesystem for the ES indexes or is it storing the data that would
 normally be in those indexes and just accessed through es-hadoop?

  If it is the latter, how do you go about determining what to set for
  the number of nodes and shards.


 If anyone has any information on this or even better yet a place to
 point me to that has better references so that I can research this on my
 own it would be much appreciated.

 Thanks.

  --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/90662a91-1557-4f61-86a2-bd2e620aec6f%40goo
 glegroups.com
 https://groups.google.com/d/msgid/elasticsearch/90662a91-1557-4f61-86a2-bd2e620aec6f%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/ed729795-a7d6-4320-9da2-16b214e653b0%
 40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch

Re: Elasticsearch and Hadoop Questions

2014-06-06 Thread Georgi Ivanov
Hmm, i am not sure i understand your questions.
Hadoop is a distributed storage system (HDFS) and a map-reduce framework (MR) 
(among other things).
ES is a distributed storage/search system (among other things).

So here is what es-hadoop gives you:

You can read data from ES and do some complex analysis, taking advantage of 
MR.
You can write data to ES - one can process some data stored on HDFS and 
write some pre-aggregated data to ES, for example.

es-hadoop is basically a connector between ES and Hadoop.

I hope this helps

On Thursday, June 5, 2014 7:41:34 PM UTC+2, ES USER wrote:

 Try as I might and I have read all the stuff I can find on ES' website 
 about this I understand somewhat how the integration works but not the 
 actual nuts and bolts of it.

 For example:

 Is Hadoop just storing the files that would normally be stored in the 
 local filesystem for the ES indexes or is it storing the data that would 
 normally be in those indexes and just accessed through es-hadoop?

 If it is the latter, how do you go about determining what to set for the 
 number of nodes and shards.


 If anyone has any information on this or even better yet a place to point 
 me to that has better references so that I can research this on my own it 
 would be much appreciated.

 Thanks.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f4019b07-a660-4a49-b9ec-b04bb1ad71e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Unable to get document by id

2014-05-28 Thread Georgi Ivanov
Hi,
Something strange here ..
I can find a document when searching for it , but i can not get it by ID

For example :
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "position.ship_id": 50132
          }
        },
        {
          "term": {
            "ts": 138524314
          }
        }
      ],
      "should": []
    }
  }
}

   Result is OK:
   
   {
     "took": 5,
     "timed_out": false,
     "_shards": {
       "total": 8,
       "successful": 8,
       "failed": 0
     },
     "hits": {
       "total": 1,
       "max_score": 1.4142135,
       "hits": [
         {
           "_index": "track_201311",
           "_type": "position",
           "_id": "50132_138524314_-1_5.4194833_57.402333",
           "_score": 1.4142135,
           "_source": {
             "hourly": true,
             "ts": 138524314,
             "ship_id": 50132
           }
         }
       ]
     }
   }
   

But when i try this

curl  
'http://localhost:9200/track_201311/position/50132_138524314_-1_5.4194833_57.402333'


I get this 

{"_index":"track_201311","_type":"position","_id":
"50132_138524314_-1_5.4194833_57.402333","found":false}



I think this started when i upgraded to ES 1.2

Any idea what is going on ?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1a33662e-9d90-4aa7-a53f-f0c57194578e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


cluster.routing.allocation.cluster_concurrent_rebalance not respected?

2014-05-13 Thread Georgi Ivanov
Hi,
In elasticsearch.yml i have :
cluster.routing.allocation.cluster_concurrent_rebalance : 6

still i see


curl http://localhost:9200/_cat/health?v
epoch  timestamp cluster status node.total node.data shards pri 
relo init unassign 
131043 16:24:03  mycluster   green   8 8676 338   
200



The number of relocating shards sometimes goes up to 3, but i never see it 
go to 6

Am i missing something here ?
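For what it's worth, cluster_concurrent_rebalance is a dynamic setting, so it can also be applied at runtime through the cluster settings API, which rules out a stale yml value (a sketch):

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 6
  }
}
```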


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0589a8ad-5352-4ecf-8b56-faeb10ef78a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch primary shards (re)location

2014-05-13 Thread Georgi Ivanov
Hi,
I have the following situation :
I have 8 node cluster.

Periodically some nodes are restarted, and their primary shards get allocated 
on other nodes.

After a node is back, it contains far fewer primary shards than the rest 
of the nodes.
Now i have a situation where one node holds many primary shards, while 
another node holds only a few.

For example i have a node with 20+ primary shards, and a node with 3 
primary shards.

Is this a problem ? 
What will happen when the node with many primary shards fails ?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/69bbaf9e-4713-477a-8e1c-fde89f941c69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Difference between geo_point and geo_shape (point)

2014-03-31 Thread Georgi Ivanov
Thanks Alex,
That makes perfect sense.
For now I am sticking with the geo_shape type here.
Except for the index size, everything is much smoother.

I could recommend geo_shape if one needs geo queries all the time (like me)

George


2014-03-31 9:09 GMT+02:00 Alexander Reelsen a...@spinscale.de:

 Hey,

  this is all about storing and computing. First, let's take a look at
  geo_point:

  * Index: Is stored as two floats lat/lon in the index
  * Query: All geo points are loaded into memory (thus your big fielddata)
  and then in-memory calculations are executed

 Now the geo_shape

 * Index: The shape is converted into terms and then stored in the index
 (thus your big index size)
 * Query: A full-text search is basically used to check if a shape is
 inside of another (do they include the same terms?)


 Possible speed improvements:

 * geo_point: Use warmer APIs
 * geo_point: Maybe caching helps, your query location is always the same.
 * geo_point: Maybe the geo_hash_cell filter helps you in terms of speed
 (needs a special mapping)
 * geo_shape: Less precision, less index size, you can change that in the
 mapping

  At the end of the day you are meeting a classic tradeoff here. Are you willing
  to use more disk, or are you willing to compute more things at query time?

  Hope it makes sense as a quick intro...


 --Alex
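Alex's last suggestion (trading precision for index size) would look roughly like this in a geo_shape mapping; the 1km value is only an illustrative choice, not a recommendation from the thread:

```json
{
  "entity": {
    "properties": {
      "loc": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "1km"
      }
    }
  }
}
```

Coarser precision produces fewer terms per shape, so both the index size and indexing time drop, at the cost of less accurate shape boundaries.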




 On Wed, Mar 19, 2014 at 9:42 PM, Georgi Ivanov 
 georgi.r.iva...@gmail.comwrote:

 Hi,

 I am indexing a pretty big amount of positions in ES (like 150M) in
 monthly indexes (201312, 201311, etc.).

 One document has a timestamp and a location.

 My queries are like:
 give me all positions inside this bounding box... etc.

 I have 2 types of indexes with exactly the same mapping except for the
 location fields.
 Ex:
 "loc": {
   "type": "geo_point"
 }



 "loc": {
   "tree": "quadtree",
   "type": "geo_shape"
 }


 It seems to me that there is a big difference in the speed of the queries
 against the two types of indexes.

 The index with a location of type geo_shape is MUCH faster than the index
 with geo_point.
 With cold caches, the query with geo_point runs for about 26 seconds,
 whereas the query with geo_shape runs for about 2 seconds.
 Also, the query with the geo_point type loads a huge amount of data into the
 field cache (8 GB for just one month of data). With geo_shape, field data is much less.

 The geo_shape mapping is with default precision and the quadtree type.
 Both queries have the same logic.

 I would like to understand why it is much faster with geo_shape than
 geo_point.
 Can someone shed some light on this matter?

 Of course, the index with geo_shape is like 30% bigger in size.

 Example query for the geo_shape index:
 {
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "ts": {
               "from": "2013-11-01",
               "to": "2013-12-30"
             }
           }
         },
         {
           "geo_shape": {
             "loc": {
               "shape": {
                 "type": "envelope",
                 "coordinates": [
                   [1.6754645, 53.786],
                   [14.345234, 51.3453]
                 ]
               }
             }
           }
         }
       ]
     }
   },
   "aggregations": {
     "agg1": {
       "terms": {
         "field": "e_id"
       }
     }
   },
   "size": 0
 }


 Example query for the geo_point index:
 {
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "ts": {
               "from": "2013-11-01",
               "to": "2013-12-30"
             }
           }
         },
         {
           "geo_bounding_box": {
             "loc": {
               "top_left": {
                 "lat": 40.73,
                 "lon": -74.1
               },
               "bottom_right": {
                 "lat": 40.01,
                 "lon": -71.12
               }
             }
           }
         }
       ]
     }
   },
   "aggregations": {
     "agg1": {
       "terms": {
         "field": "e_id"
       }
     }
   },
   "size": 0
 }

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/4e721191-8164-40cf-aa3f-d882dec10cad%40googlegroups.com
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/GYPrniLiJis/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https

Re: elasticsearch java interaction

2014-03-22 Thread Georgi Ivanov
I still see port 9200.
We have said several times that this must be 9300.

As Master Yoda would say: concentrate you must!

:))


On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote:

 Hi,
    I am Y. Venu. I am totally new to Elasticsearch; I am trying to
 communicate with Elasticsearch from Java, and I have gone through the
 Elasticsearch Java APIs.

 First I came across the Maven repository.
 I created a pom.xml in Eclipse, and in the dependencies section I
 placed the snippet that I found in the Maven repository,

  i.e.:

 <dependency>
     <groupId>org.elasticsearch</groupId>
     <artifactId>elasticsearch</artifactId>
     <version>${es.version}</version>
 </dependency>

 After that I created one class with a main method, and I copied in the
 code that I found in the client API of Elasticsearch, i.e.
 TransportClient:

 main()
 {
 Client client = new TransportClient()
     .addTransportAddress(new InetSocketTransportAddress("host1", 9200))
     .addTransportAddress(new InetSocketTransportAddress("host2", 9200));

 // on shutdown
 client.close();

 Settings settings = ImmutableSettings.settingsBuilder()
     .put("client.transport.sniff", true).build();
 TransportClient client1 = new TransportClient(settings);

 }

 After running this Java application, I am getting errors like this:



 In Main Method
 Mar 14, 2014 6:05:24 PM org.elasticsearch.node
 INFO: [Mister Machine] {elasticsearch/0.16.1}[11016]: initializing ...
 Mar 14, 2014 6:05:24 PM org.elasticsearch.plugins
 INFO: [Mister Machine] loaded []
 org.elasticsearch.common.inject.internal.ComputationException: 
 org.elasticsearch.common.inject.internal.ComputationException: 
 java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
   at 
 org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
   at 
 org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
   at 
 org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
   at 
 org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
   at 
 org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:52)
   at 
 org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:57)
   at 
 org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:377)
   at 
 org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:169)
   at 
 org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:224)
   at 
 org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:120)
   at 
 org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:105)
   at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:92)
   at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:69)
   at 
 org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:58)
   at 
 org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:146)
   at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
   at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
   at ES_Client.main(ES_Client.java:64)
 Caused by: org.elasticsearch.common.inject.internal.ComputationException: 
 java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
   at 
 org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:553)
   at 
 org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:419)
   at 
 org.elasticsearch.common.inject.internal.CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
   at 
 org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:46)
   at 
 org.elasticsearch.common.inject.MembersInjectorStore.get(MembersInjectorStore.java:66)
   at 
 org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:69)
   at 
 org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:31)
   at 
 org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:39)
   at 
 org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:35)
   at 
 org.elasticsearch.common.inject.internal.FailableCache$1.apply(FailableCache.java:35)
   at 
 org.elasticsearch.common.inject.internal.MapMaker$StrategyImpl.compute(MapMaker.java:549)
   ... 17 more
 Caused by: java.lang.NoClassDefFoundError: Lorg/apache/lucene/store/Lock;
  

Re: elasticsearch java interaction

2014-03-20 Thread Georgi Ivanov
There is something wrong with your setup.

How many ES nodes do you have?
On which IP addresses are the ES hosts listening?

I understood you have 2 hosts, but it seems you have only one, on your
local machine.

This is the code (a bit modified) I am using at the moment:


public void connectES() {
    Set<String> hosts = new HashSet<String>();
    hosts.add("host1.mydomain.com");
    hosts.add("host2.host1.mydomain.com"); // Make sure this resolves to the proper IP address
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "vesseltrackerES").build();

    TransportClient transportClient = new TransportClient(settings);
    for (String host : hosts) {
        transportClient = transportClient.addTransportAddress(
            new InetSocketTransportAddress(host, 9300));
    }

    System.out.print("Connected to nodes: ");
    for (DiscoveryNode node : transportClient.connectedNodes()) {
        System.out.print(node.getHostName() + ", ");
    }
    System.out.println();

    this.client = (Client) transportClient;
}


On Thursday, March 20, 2014 2:51:50 PM UTC+1, Venu Krishna wrote:

 Actually this is my Elasticsearch index: http://localhost:9200/. As you
 told me, I replaced 9200 with 9300 in the above code; then when I executed
 the application I got the following exceptions:

 Mar 20, 2014 7:17:45 PM org.elasticsearch.client.transport
 WARNING: [Bailey, Gailyn] failed to get node info for 
 [#transport#-1][inet[localhost/127.0.0.1:9300]]
 org.elasticsearch.transport.NodeDisconnectedException: 
 [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected

 Connected
 Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport
 WARNING: [Bailey, Gailyn] failed to get node info for 
 [#transport#-1][inet[localhost/127.0.0.1:9300]]
 org.elasticsearch.transport.NodeDisconnectedException: 
 [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected

 Mar 20, 2014 7:17:50 PM org.elasticsearch.client.transport
 WARNING: [Bailey, Gailyn] failed to get node info for 
 [#transport#-1][inet[localhost/127.0.0.1:9300]]
 org.elasticsearch.transport.NodeDisconnectedException: 
 [][inet[localhost/127.0.0.1:9300]][/cluster/nodes/info] disconnected

 Thankyou

 On Thursday, March 20, 2014 7:12:14 PM UTC+5:30, David Pilato wrote:

 Use port 9300

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 Le 20 mars 2014 à 14:34, Venu Krishna yvgk...@gmail.com a écrit :

 Thank you for the reply. I am not getting any errors, but I am not able to
 connect to my Elasticsearch using Java. Here is my code:

 import java.net.InetSocketAddress;

 import org.elasticsearch.client.Client;
 import org.elasticsearch.client.transport.TransportClient;
 import org.elasticsearch.common.transport.InetSocketTransportAddress;


 public class JavaES_Client {

     void function()
     {
         // on startup
         System.out.println("In Function");

         Client client = new TransportClient()
             .addTransportAddress(new InetSocketTransportAddress("localhost",
                 9200));   // This is where my control gets stuck, without
                           // any exceptions or errors.

         System.out.println("Connected");
         // on shutdown
         client.close();
     }


     public static void main(String[] args) {

         System.out.println("In Main Method");
         JavaES_Client jc = new JavaES_Client();
         System.out.println("Object Created");
         jc.function();

     }

 }


 On Thursday, March 20, 2014 2:20:25 PM UTC+5:30, Georgi Ivanov wrote:

 On Linux the file is  /etc/hosts
 On Windows c:\windows\system32\drivers\etc\hosts

 Open the file in text editor

 Add following lines:
 192.168.1.100 host1
 192.168.1.101 host2

 Make sure that 192.168.1.100/.101 is the right IP address of
 host1/host2, respectively.



 2014-03-20 8:35 GMT+01:00 Venu Krishna yvgk...@gmail.com:

 Hi Georgi Ivanov,
   Yes, I understand the exception, i.e. UnresolvedAddressException, but
 you said to make sure host1 and host2 are resolvable by adding entries to
 /etc/hosts or wherever the file is on Windows. Can you give me the steps
 for this? Sorry, I am new to this and am learning; I could not find a
 proper example. Thanks
 in advance for the help.


 On Thursday, March 20, 2014 2:36:10 AM UTC+5:30, Georgi Ivanov wrote:

 Well,
 I think UnresolvedAddressException obviously means that your Java
 client cannot resolve host1 and host2.

 Make sure host1 and host2 are resolvable by adding entries to
 /etc/hosts, or wherever the file is on Windows.



 On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote:


Re: Sort before filter?

2014-03-19 Thread Georgi Ivanov
I think sorting first will be bad if you have more data.
Sorting is not exactly the fastest thing.
It may sound good for a small amount of data, but what if we have 10 B
documents? Should ES go through all the documents just to sort them?

I don't think this would be good.
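For completeness, the standard shape of such a query in ES 1.x restricts the candidate set with a filter first, so the sort only runs over the matches; the field names below are placeholders, not David's actual mapping:

```json
{
  "query": {
    "filtered": {
      "query": { "match": { "text": "some name" } },
      "filter": { "range": { "published": { "gte": "2014-01-01" } } }
    }
  },
  "sort": [ { "published": { "order": "desc" } } ],
  "size": 100
}
```

Each shard returns its own top 100 after filtering, and the coordinating node merges those per-shard results into the final top 100, which is close to the two-phase behavior David describes.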

On Wednesday, March 19, 2014 12:45:43 PM UTC+1, David Pfeffer wrote:

  I have an index that contains 30 GB worth of news stories. I want to 
 return the stories that contain a particular name in their text, sorted 
 chronologically. I only want the first 100 stories.

 ElasticSearch seems to approach this problem by filtering every story to 
 just those that match, then sorting those results and returning the top 
 100. This uses a reasonably large amount of resources to filter every 
 single one.

 Can I get ElasticSearch to instead sort first, and then filter in order 
 until it reaches the maximum (100). Granted that this would be 100 per 
 shard, but then the final step would be to take each shard's 100, sort them 
 all together, and take the top 100 of that result set. This should, at 
 least in my mind, use significantly less resources, as it would only need 
 to go through maybe 5000 or 1 items to find a match, as opposed to the 
 entirety of the index.

 *(Cross-posted 
 from 
 http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch
  
 http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch,
  
 because I didn't get an answer there for 2 days.)*


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/710176fc-2b8a-4046-b27a-7e25457f026c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch java interaction

2014-03-19 Thread Georgi Ivanov
Well,
I think UnresolvedAddressException obviously means that your Java client
cannot resolve host1 and host2.

Make sure host1 and host2 are resolvable by adding entries to
/etc/hosts, or wherever the file is on Windows.



On Friday, March 14, 2014 1:47:02 PM UTC+1, Venu Krishna wrote:


Re: Is this stacktrace a reason for cluster instability?

2014-03-19 Thread Georgi Ivanov
To me this exception just says that ES couldn't convert
"xxx-hdp13" to a Date object.

It sounds like just an incorrect query.

So I don't think this is the reason for your troubles.

Note that the message has severity DEBUG, not ERROR or CRIT.

On Wednesday, March 19, 2014 11:17:23 AM UTC+1, Jelle Smet wrote:

 Hi List,

 I'm running ES 1.0.1 in a 6 node configuration.
 Some days ago we have experienced instability issues with our cluster.
 At a certain moment no documents could be indexed anymore.  After 
 restarting the indexing processes (Logstash), indexing worked again for a 
 short period of time only to stall again after a brief period.
 The ES cluster had to be restarted to restore normal behavior.  After 
 that, some (recent) index shards stayed unassigned.
 I dropped the replication value for those indexes to 0 in order to clear 
 the unassigned shards.  Enabling replication resulted  again into 
 unassigned shards for the impacted indexes.
 I have left the replication level to 0 for the troublesome indexes in 
 order to clear the cluster status.
 Meanwhile, indexing and replication works again for any newly created 
 indexes.

 What we noticed:

- The logs didn't reveal any immediate cause.  The only reported issue 
we saw in the logs prior to the incident is the below mentioned stack 
 trace.
Afterwards, we have countered this issue by creating a template which 
enforces the offending field to be treated as a string, which should 
prevent the below mentioned error.


- Our ES collected metrics revealed a sudden drop of total number of 
Java threads at the exact same moment.


 My question: could the below-mentioned stack trace be the cause of any
 cluster instability?

 Thanks,

 Jelle


 [2014-03-15 03:15:38,414][DEBUG][action.bulk  ] 
 [-xxx-logs-001] [logstash-2014.03.15][0] failed to execute bulk 
 item (index) index {[logstash-2014.03.15][logs][MBzzcesjQTSjUBhryTAgzQ], 
 source[{@source:tcp://:0:0:0:0:0:0:1:60186/,@tags:[],@fields:{timestamp:[Mar
  
 15 
 04:15:37],logsource:[xxx-sss01],program:[snmptrapd],snmptrapsource:[xxx-hdp04],snmptrapseverity:[INFORMATIONAL],message:[CLI/Telnet
  
 user logout: iRMC S2 CLI/Telnet user '' logout from 
 xxx.xxx.xxx.xxx]},@timestamp:2014-03-15T03:15:37.931+00:00,@message:13Mar
  
 15 04:15:37 xxx-sss01 snmptrapd: xxx-hdp04 INFORMATIONAL CLI/Telnet 
 user logout: iRMC S2 CLI/Telnet user
  '' logout from 
 xxx.xxx.xxx.xxx\n,@type:snmptrapd,@collector:[xxx-sss01.xxx.xx],@version:1}]}
 org.elasticsearch.index.mapper.MapperParsingException: failed to parse 
 [@fields.snmptrapsource]
 at 
 org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
 at 
 org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:616)
 at 
 org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
 at 
 org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
 at 
 org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
 at 
 org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
 at 
 org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
 at 
 org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
 at 
 org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:371)
 at 
 org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:400)
 at 
 org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
 at 
 org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
 at 
 org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed 
 to parse date field [xxx-hdp04], tried both date format 
 [dateOptionalTime], and timestamp number with locale []
 at 
 org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:582)
 at 
 org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:510)
 at 
 org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
 at 
 
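The template fix Jelle mentions (forcing @fields.snmptrapsource to be mapped as a string so it never hits the date parser) could be sketched like this with the ES 1.x index template API; the template name, index pattern, host, and port are assumptions:

```
curl -XPUT 'localhost:9200/_template/snmptrap_strings' -d '{
  "template": "logstash-*",
  "mappings": {
    "logs": {
      "properties": {
        "@fields": {
          "properties": {
            "snmptrapsource": { "type": "string" }
          }
        }
      }
    }
  }
}'
```

A template like this only affects indexes created after it is installed, so the daily logstash-YYYY.MM.DD rollover picks it up on the next index creation.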

Re: Sort before filter?

2014-03-19 Thread Georgi Ivanov
I don't know what kind of problems you have.

You may try posting your mappings, document count, index count, server
count, server configuration (memory?), etc. here, and we can try to think
of something.

30 GB doesn't sound like much for ES.

On Wednesday, March 19, 2014 12:45:43 PM UTC+1, David Pfeffer wrote:

  I have an index that contains 30 GB worth of news stories. I want to 
 return the stories that contain a particular name in their text, sorted 
 chronologically. I only want the first 100 stories.

 ElasticSearch seems to approach this problem by filtering every story to 
 just those that match, then sorting those results and returning the top 
 100. This uses a reasonably large amount of resources to filter every 
 single one.

 Can I get ElasticSearch to instead sort first, and then filter in order 
 until it reaches the maximum (100). Granted that this would be 100 per 
 shard, but then the final step would be to take each shard's 100, sort them 
 all together, and take the top 100 of that result set. This should, at 
 least in my mind, use significantly less resources, as it would only need 
 to go through maybe 5000 or 1 items to find a match, as opposed to the 
 entirety of the index.

 *(Cross-posted 
 from 
 http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch
  
 http://stackoverflow.com/questions/22467585/sort-before-filters-in-elasticsearch,
  
 because I didn't get an answer there for 2 days.)*


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/144aee16-9949-44a2-8a56-6b1f1b2f81fa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


slow indexing geo_shape

2014-03-18 Thread Georgi Ivanov
Hi,
I am playing with the geo_shape type.
I am experiencing very slow indexing times. For example, one simple
linestring with a couple of hundred points can take up to 60 seconds to
index.
I tried the geohash and quadtree implementations.
With quadtree it is faster (like 50% faster), but still not fast enough.

Using Java API (bulk indexing)

Mapping:

{
  "entity": {
    "properties": {
      "id": { "type": "integer" },
      "track": { "type": "geo_shape", "precision": "20m", "tree": "quadtree" },
      "date": { "type": "date" }
    }
  }
}

My ES cluster is tuned for indexing like follows:

index.refresh_interval: 30s
index.translog.flush_threshold_ops: 10
indices.memory.index_buffer_size: 15%

threadpool.bulk.queue_size: 500
threadpool.bulk.size: 100
threadpool.bulk.type: fixed


Any tips on how to make indexing faster?
My estimate is that one day of data would take about 10 hours to index (and I
need to index about 3 years of data).
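One further knob, not mentioned in the thread but standard practice for bulk loads, is to disable refresh entirely while indexing and restore it afterwards; the index name, host, and port below are placeholders:

```
curl -XPUT 'localhost:9200/track_2011/_settings' -d '{"index": {"refresh_interval": "-1"}}'
# ... run the bulk indexing ...
curl -XPUT 'localhost:9200/track_2011/_settings' -d '{"index": {"refresh_interval": "30s"}}'
```

With refresh_interval set to -1, segments are not made searchable during the load, which removes a significant amount of per-batch overhead; documents become visible again once the interval is restored.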

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cb6b6f20-639c-4bb1-93d9-52f81658761c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.