Re: increase query performance by adding more machines, shouldn't it be linear to # of machines?
yes, all same machines on which only ES with the same configuration is running

2014-07-02 14:55 GMT+09:00 David Pilato <da...@pilato.fr>:

> Are you using the same physical machine for all your VMs?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 2 Jul 2014, at 07:09, Seungjin Lee <sweetest0...@gmail.com> wrote:

> Hi all,
>
> I'm testing percolator performance. 50k/s is the required rate, with 3-4k rules. For now I have only 1 simple rule, and 5 ES VMs with 1 shard and 4 replicas, and I am using the Java transport client like below:
>
>     new TransportClient(settings)
>         .addTransportAddresses(transportAddressList.toArray(
>             new InetSocketTransportAddress[transportAddressList.size()]));
>
> When I added just 1 address to the transport client, percolator performance was about 10k/s, and when I added all 5 VMs it was about 15k/s. So it increases only about 1.5x even though I added 4 more VM addresses. Is it supposed to be like this? What I was thinking is that if requests are distributed in, for example, round-robin fashion, there should be about a 5x performance gain. Could you comment on this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
Re: increase query performance by adding more machines, shouldn't it be linear to # of machines?
Sorry, I meant: on how many physical bare-metal machines are your 5 VMs running?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 2 Jul 2014, at 07:59, Seungjin Lee <sweetest0...@gmail.com> wrote:

> yes, all same machines on which only ES with the same configuration is running
> [...]
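For what it's worth, two things may be worth checking here (both assumptions on my part, not confirmed in the thread): the transport client round-robins only over the addresses it is explicitly given unless sniffing is enabled (`client.transport.sniff`), and a percolate request runs against one copy of each shard, so with 1 primary and 4 replicas a single request cannot fan out across primaries. A sketch of index settings for a re-test with more primaries (the numbers are illustrative):

```python
import json

# Hypothetical experiment: 5 primaries / 1 replica instead of
# 1 primary / 4 replicas, so each percolate request can be
# parallelized over 5 shards instead of hitting a single one.
body = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
        }
    }
}

# PUT this body to http://<node>:9200/<index-name> when creating the index.
print(json.dumps(body, indent=2))
```

Replicas still help total throughput across concurrent requests; primaries are what let one request be split up.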
Re: Dealing with spam in this forum
Hi, Clinton --

May I suggest:

- Some users (e.g., me) who read this list via an email subscription regard ANY spam on the list as an unacceptable state of affairs. This is not a problem with Apache lists, for example, so I would point the finger of blame at Google Groups.
- Having N longstanding members who are willing to help ban spammers is equivalent to having N longstanding members who are willing to quickly admit new users. (And you're welcome to add me as N+1.)
- Banning is ineffective. Spammers will continuously sign up with new accounts.

-- Paul
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Tue, Jul 1, 2014 at 11:36 AM, Clinton Gormley <clinton.gorm...@elasticsearch.com> wrote:

> Hi all
>
> Recently we've had a few spam emails that have made it through Google's filters, and there have been calls for us to change to a moderate-first-post policy. I am reluctant to adopt this policy for the following reasons:
>
> We get about 30 new users every day from all over the world, many of whom are early in their learning phase and quite stuck - they need help as soon as possible. Fortunately this list is very active and helpful. In contrast, we've only ever banned 34 users from the list for spamming. So making new users wait for timezones to swing their way feels like a heavy-handed solution to a small problem. Yes, spammers are annoying, but they are a small minority on this list.
>
> Instead, we have asked 10 of our long-standing members to help us with banning spammers. This way we have Spam Guardians active around the globe, who only need to do something if a spammer raises their ugly head above the parapet. One or two spam emails may get through, but hopefully somebody will leap into action and stop their activity before it becomes too tiresome.
>
> This isn't an exclusive list. If you would like to be on it, feel free to email me. Note: I expect you to be a long-standing and currently active member of this list to be included.
>
> If this solution doesn't solve the problem, then we can reconsider moderate-first-post, but we've managed to go 5 years without requiring it, and I'd prefer to keep things as easy as possible for new users.
>
> Clint
How to search the records with locations all in a polygon or multiPolygon?
I added the mappings below and inserted a record with 2 locations ([13, 13] and [52, 52]), and I want to search for records whose locations are ALL inside the polygon, not just one of them. Could you please tell me how to write that search?

    curl -XPOST localhost:9200/test5 -d '{
      "mappings": {
        "gistype": {
          "properties": {
            "address": {
              "properties": {
                "location": { "type": "geo_point" }
              }
            }
          }
        }
      }
    }'

    curl -XPUT 'http://localhost:9200/test5/gistype/1' -d '{
      "name": "Wind & Wetter, Berlin, Germany",
      "address": [
        { "name": "1", "location": [13, 13] },
        { "name": "2", "location": [52, 52] }
      ]
    }'

I searched like this, but it matches when any one of the locations is in the polygon, so for my purpose it's wrong:

    curl -XGET 'http://localhost:9200/test5/gistype/_search?pretty=true' -d '{
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "geo_polygon": {
              "location": {
                "points": [
                  { "lat": 0,  "lon": 0 },
                  { "lat": 14, "lon": 0 },
                  { "lat": 14, "lon": 14 },
                  { "lat": 0,  "lon": 14 }
                ]
              }
            }
          }
        }
      }
    }'

@kimchy
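As far as I know, geo_polygon matches a document if any of its points falls inside the polygon, and there is no built-in "all locations inside" filter. One workaround (my suggestion, not an official recipe) is to use the geo_polygon filter only to fetch candidates, then verify every location client-side. A minimal ray-casting sketch using the document and polygon from this thread:

```python
def point_in_polygon(point, polygon):
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def all_locations_inside(doc, polygon):
    # geo_point arrays are [lon, lat]; the square polygon used here is
    # symmetric, so the ordering does not matter for this example.
    return all(point_in_polygon(tuple(a["location"]), polygon)
               for a in doc["address"])

polygon = [(0, 0), (14, 0), (14, 14), (0, 14)]
doc = {"address": [{"name": "1", "location": [13, 13]},
                   {"name": "2", "location": [52, 52]}]}
print(all_locations_inside(doc, polygon))  # → False: [52, 52] is outside
```

This keeps the server-side filter as a cheap pre-selection and enforces the stricter "all points inside" condition in the application.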
Re: Dealing with spam in this forum
Hi,

I agree with Paul, 200%. I received at least 49 spams in my mailbox just for 06/30 — I wouldn't call that "a few spam emails". I've been subscribed to many mailing lists for years, and I'm pretty sure it would take years to get as much spam on those lists as I get in one day on the ES mailing list.

On 2 Jul 2014, at 08:18, Paul Brown <p...@mult.ifario.us> wrote:

> Hi, Clinton --
>
> May I suggest:
> - Some users (e.g., me) who read this list via an email subscription regard ANY spam on the list as an unacceptable state of affairs. This is not a problem with Apache lists, for example, so I would point the finger of blame at Google Groups.
> - Having N longstanding members who are willing to help ban spammers is equivalent to having N longstanding members who are willing to quickly admit new users. (And you're welcome to add me as N+1.)
> - Banning is ineffective. Spammers will continuously sign up with new accounts.
>
> [...]
Re: help in query
oops, there is an "it" that doesn't belong.

On 07/02/2014 09:24 AM, surfer wrote:

> That definitely helped. Thank you Vineeth.
>
> Regards
> giovanni
>
> [...]
Re: help in query
That definitely helped. Thank you Vineeth.

Regards
giovanni

On 07/01/2014 07:19 PM, vineeth mohan wrote:

> Hello Giovanni,
>
> I feel this will help -
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_literal_multi_match_literal_query_2.html#_wildcards_in_field_names
>
> Thanks
> Vineeth

On Tue, Jul 1, 2014 at 10:19 PM, surfer <sur...@crs4.it> wrote:

> Hi,
> I'm indexing something like:
>
>     first doc  = { "v4": "myvalue" }
>     second doc = { "v1": [ { "v4": "myvalue" }, { "v5": "anothervalue" } ] }
>     third doc  = { "v1": [ { "v2": [ { "v4": "myvalue" } ] } ] }
>     fourth doc = { "v1": [ { "v2": [ { "v3": [ { "v4": "myvalue" } ] } ] } ] }
>
> so nested dictionaries and arrays of dictionaries. I was wondering if there is a query to obtain all the docs that have "v4": "myvalue", with the additional condition that this must happen inside a v1 dictionary, through any number of intermediate dictionaries (none, v2, or v2 and v3). That is, with the four docs written above, my query should return: second doc, third doc and fourth doc.
>
> Any hint is appreciated
> giovanni
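The linked guide section covers wildcards in field names. If I read the question correctly, something along these lines should match v4 at any depth under v1 (a sketch under the assumption of default dynamic object mapping, not the nested type; the exact wildcard pattern is my guess at the intent):

```python
import json

# "v1.*v4" expands to every indexed field whose full path starts with
# "v1." and ends with "v4": v1.v4, v1.v2.v4, v1.v2.v3.v4, ... but not
# the top-level v4, which excludes the first doc as desired.
query = {
    "query": {
        "multi_match": {
            "query": "myvalue",
            "fields": ["v1.*v4"],
        }
    }
}
print(json.dumps(query, indent=2))
```

POST this body to the index's `_search` endpoint; field-name wildcards are expanded against the mapping at query time.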
Re: Queries with fields {...} don't return field with dot in their name
Hello Vineeth,

the items indexed in Elasticsearch really do contain a field named "response.user":

    "_source": {
      "clientip": "aaa.bbb..ddd",
      "request": "http://.aa/b/c",
      "request.accept-encoding": "gzip, deflate",
      "request.accept-language": "de-ch",
      "response.content-type": "text/html; charset=UTF-8",
      "response": "200",
      "response.age": "0",
      "response.user": "userAAA",
      "@timestamp": "2014-07-01T12:18:51.501+02:00"
    }

I realize there is an ambiguity between a field with a dot in its name and a field of a child document. Should fields with a dot in their name be avoided?

Benoît

On Tuesday, 1 July 2014 19:17:41 UTC+2, vineeth mohan wrote:

> Hello Ben,
> Can you paste a sample feed?
> Thanks
> Vineeth

On Tue, Jul 1, 2014 at 8:26 PM, benq <benoit@gmail.com> wrote:

> Hi all,
>
> I have a query that specifies the fields to be returned, as described here:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
>
> However, it does not return the fields with a dot in their name, like "response.user". For example:
>
>     {
>       "size": 1000,
>       "fields": ["@timestamp", "request", "response", "response.user", "clientip"],
>       "query": { "match_all": {} },
>       "filter": {
>         "and": [
>           { "range": { "@timestamp": { "from": ... } } }
>         ]
>       }
>     }
>
> The @timestamp, request, response and clientip fields are returned. response.user is not. Any idea why?
>
> Regards,
> Benoît
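To Benoît's closing question: in my experience, yes — dotted field names are best avoided, because path-based lookups cannot distinguish the field "response.user" from a "user" key inside a "response" object (later Elasticsearch versions went as far as rejecting dots in field names outright). A small pre-index normalization sketch (the underscore separator is my choice, not a convention from the thread):

```python
def de_dot(doc, sep="_"):
    """Rename dotted keys (e.g. 'response.user' -> 'response_user') so
    field names can no longer collide with object-path notation."""
    return {key.replace(".", sep): value for key, value in doc.items()}

doc = {"response": "200", "response.user": "userAAA", "response.age": "0"}
print(de_dot(doc))
# → {'response': '200', 'response_user': 'userAAA', 'response_age': '0'}
```

Applying this in the indexing pipeline (Logstash later shipped a de_dot filter for the same purpose) makes the `fields` parameter behave predictably.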
Re: limitation of 2,147,483,647 terms per segment index in Lucene
Peter, thanks so much for raising this. This looks awful! I think we should move this into an issue on [1] (please feel free to create one). IMO we should aim to frame the issue in a way that prevents this from happening altogether. Along the way we should help you recover, but I don't know how tricky that will be. Let's start with the issue!!

simon

[1] https://github.com/elasticsearch/elasticsearch/issues

On Monday, June 30, 2014 5:49:32 PM UTC+2, Peter Portante wrote:

> Is there a way to recover a segment index of a shard that has exceeded Lucene's 2^31 limit?
>
> Thanks, -peter
>
>     [2014-06-30 10:53:02,187][WARN ][indices.cluster] [Patriots] [vos][0] failed to start shard
>     org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [vos][0] failed recovery
>         at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
>     Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647
>         at org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
>         at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:369)
>         at org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:43)
>         at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:115)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:385)
>         at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)
>         at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
>         at org.elasticsearch.index.engine.internal.InternalEngine.buildSearchManager(InternalEngine.java:1364)
>         at org.elasticsearch.index.engine.internal.InternalEngine.start(InternalEngine.java:291)
>         at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:709)
>         at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:204)
>         at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
>         ... 3 more
>     [2014-06-30 10:53:02,213][WARN ][cluster.action.shard] [Patriots] [vos][0] sending failed shard for [vos][0], node[bS9Lp_a9QZOjiab23Ztk4A], [P], s[INITIALIZING], indexUUID [a0_HJrlgQq-UNCwL2QiVbg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[vos][0] failed recovery]; nested: IllegalArgumentException[Too many documents, composite IndexReaders cannot exceed 2147483647]; ]]
>     [2014-06-30 10:53:02,213][WARN ][cluster.action.shard] [Patriots] [vos][0] received shard failed for [vos][0], node[bS9Lp_a9QZOjiab23Ztk4A], [P], s[INITIALIZING], indexUUID [a0_HJrlgQq-UNCwL2QiVbg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[vos][0] failed recovery]; nested: IllegalArgumentException[Too many documents, composite IndexReaders cannot exceed 2147483647]; ]]
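The 2147483647 in the trace is Lucene's hard per-index document limit (2^31 - 1), and since each Elasticsearch shard is a single Lucene index, the limit applies per shard. A back-of-the-envelope sizing check (the document counts are illustrative; note that deleted-but-unmerged documents also count toward the limit):

```python
import math

# 2,147,483,647: maximum documents per Lucene index, i.e. per ES shard.
LUCENE_MAX_DOCS = 2**31 - 1

def min_shards(total_docs: int) -> int:
    """Smallest shard count that keeps every shard under the Lucene
    limit, assuming documents distribute evenly across shards."""
    return math.ceil(total_docs / LUCENE_MAX_DOCS)

print(min_shards(5_000_000_000))  # → 3
```

In practice you would want far more headroom than this bare minimum, both for skewed routing and for deleted documents awaiting merges.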
Min Hard Drive Requirements
Hi all,

I'm testing the indexing of 100 million documents; the index took about 400GB of hard drive. Is there a minimum amount of free hard-drive space needed for the index to work OK?

I'm asking because after we indexed the 100 million documents we tested the index and it worked OK, but then when we tried to optimize, the optimize took days and afterwards the index did not respond. The hard drive had only 10 GB of free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning.

Thank you,
Ophir
problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Hello,

I am trying to index a MySQL datetime like this: 2013-05-01 00:00:00. In ES it is represented like this: 2013-05-01T00:00:00.000Z.

The real problem seems to appear when I index the date 0000-00-00 00:00:00. I have used this mapping:

    "type": "date", "format": "yyyy-MM-dd HH:mm:ss||MM/dd/yyyy||yyyy/MM/dd", "index": "not_analyzed"

and I get this error:

    [2014-07-02 10:11:56,503][INFO ][cluster.metadata] [ik-test2] [_river] update_mapping [source] (dynamic)
    java.io.IOException: java.sql.SQLException: Value '7918-00-00 00:00:00 ... can not be represented as java.sql.Timestamp
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
        at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1102)
        at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:576)
        at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6592)
        at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6192)
        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5058)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.processRow(SimpleRiverSource.java:590)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.nextRow(SimpleRiverSource.java:565)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.merge(SimpleRiverSource.java:356)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:257)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:228)
        ... 3 more
    [2014-07-02 10:11:56,633][WARN ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] aborting river
    [2014-07-02 10:12:01,392][INFO ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new bulk [1] of [69 items], 1 outstanding bulk requests
    [2014-07-02 10:12:01,437][INFO ][cluster.metadata] [ik-test2] [my_index] update_mapping [source] (dynamic)

Can you help me with my problem? Thank you in advance.
Re: Min Hard Drive Requirements
It will work until the disk is full, but then ES will fall over.

Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free.

How many shards do you have for the index, and how many are you trying to optimise (merge) down to?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 2 July 2014 18:13, Ophir Michaeli <ophirmicha...@gmail.com> wrote:

> Hi all,
>
> I'm testing the indexing of 100 million documents; the index took about 400GB of hard drive. Is there a minimum amount of free hard-drive space needed for the index to work OK?
>
> [...]
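Mark's rule of thumb above can be turned into a quick pre-flight check before running an optimize. A sketch (the 1x factor is the thread's rule of thumb; optimizing a whole index down to one segment can need free space on the order of the entire shard):

```python
def merge_headroom_ok(free_bytes: int, merge_input_bytes: int,
                      factor: float = 1.0) -> bool:
    """A merge rewrites its input segments into a new segment before
    deleting the old files, so roughly their combined size must be
    free on disk while the merge runs."""
    return free_bytes >= factor * merge_input_bytes

GB = 2**30
# 10 GB free while optimizing a ~400 GB index down to one segment:
print(merge_headroom_ok(10 * GB, 400 * GB))  # → False
```

This matches the failure described in the thread: with only 10 GB free, a large merge has nowhere to write its output.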
One date field mapping for two different locale
Here's the problem. I have data with a date field that can be in either the English or the German date format (or rather, week- and month-naming convention), e.g. "Mittwoch, 18. Juni 2012" or "Wednesday, 18. June 2012".

I can set up separate mappings with separate fields for each nation's dates:

    { "website": { "properties": { "date_en": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "US" } } } }

    { "website": { "properties": { "date_de": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "DE" } } } }

And this works properly, until I try to put an English date into the German date field, or vice versa. I do not have the option of receiving additional data carrying the date's locale. What I need is a way to recognize each type of date, save it internally as a timestamp (for sorting), and do that in one and the same field (because of sorting and field-naming conventions). Something like this:

    { "website": { "properties": { "date": { "type": "date", "format": "EEE, dd. MMM yyyy", "locale": "US||DE" } } } }

What I want to achieve is to be able to sort all documents, with their different countries' dates, by date. Maybe there is a different approach; I'll gladly try other solutions.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/One-date-field-mapping-for-two-different-locale-tp4059118.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
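As far as I know, the date mapping's `locale` takes a single locale, so "US||DE" will not work. One alternative is to normalize at index time: translate German day/month names to English, parse once, and index the ISO result into a plain date field. A sketch (the translation-table approach is my own workaround, not an Elasticsearch feature):

```python
import re
from datetime import datetime

# Map German day/month names to English so one parser handles both
# locales. Names identical in both languages (April, August, September,
# November) are omitted on purpose.
DE_TO_EN = {
    "Montag": "Monday", "Dienstag": "Tuesday", "Mittwoch": "Wednesday",
    "Donnerstag": "Thursday", "Freitag": "Friday", "Samstag": "Saturday",
    "Sonntag": "Sunday",
    "Januar": "January", "Februar": "February", "März": "March",
    "Mai": "May", "Juni": "June", "Juli": "July",
    "Oktober": "October", "Dezember": "December",
}

def parse_either_locale(text: str) -> str:
    # Whole-word replacement so English input passes through untouched.
    text = re.sub(r"\w+", lambda m: DE_TO_EN.get(m.group(0), m.group(0)), text)
    return datetime.strptime(text, "%A, %d. %B %Y").date().isoformat()

print(parse_either_locale("Mittwoch, 18. Juni 2012"))   # → 2012-06-18
print(parse_either_locale("Wednesday, 18. June 2012"))  # → 2012-06-18
```

The normalized ISO string can then go into a single "yyyy-MM-dd"-formatted date field, which sorts correctly regardless of the source locale.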
Shards considered in write consistency
Hey, I have a question related to write consistency. I have an Elasticsearch cluster with 2 nodes. The index is configured with number_of_shards = 5 and number_of_replicas = 1. If I set the action.write_consistency value to `quorum`, what is the number of active shards required to satisfy the quorum? Please shed some light on the matter. Thanks.
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
When I index the date '0000-00-00 00:00:00', the indexing stops completely with an error (it begins working, then stops instantly). I have tried mapping the field as type "string", but that doesn't work. Do you have an idea how to solve my problem?
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Hi Tanguy, How is this a valid date string: java.io.IOException: java.sql.SQLException: Value '7918-00-00 00:00:00'? This value can't be mapped to any date format, nor is it valid in any way. Thanks, Vineeth
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
What you can do is set the mapping for the date field to:

{ "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss", "ignore_malformed" : true }

Then it will just ignore those invalid dates rather than throwing an error.
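Applied through the mapping API, that could look like the following sketch (the index, type, and field names "test", "source", and "date_source" are taken from later messages in this thread):

```
PUT /test/source/_mapping
{
  "source": {
    "properties": {
      "date_source": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss",
        "ignore_malformed": true
      }
    }
  }
}
```

Note that `ignore_malformed` drops the bad value for that document rather than rejecting the whole document, so those rows will simply have no `date_source` when sorting.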
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
In my MySQL table (type: datetime):

| date_source         |
+---------------------+
| 2008-09-15 18:29:07 |
| 2013-08-29 00:00:00 |
| 2013-07-04 00:00:00 |
| 2013-07-17 00:00:00 |
| 2013-07-17 00:00:00 |
| 0000-00-00 00:00:00 |
...

If I use a mapping (type: string) and I index:

PUT /_river/test/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://ip:port/database",
    "user" : "user",
    "password" : "password",
    "sql" : "select id_source as _id, title_source, date_source from source",
    "index" : "test",
    "type" : "source",
    "max_bulk_requests" : 5
  }
}

(If I add "where date_source not like '0000-00-00%'" to the sql, it works, but values are then missing for that date.)
Index missing error, Elasticsearch Java
Hi, I am new to Elasticsearch. I am using the Java API to establish a connection with ES.

public void createIndex(final String index) {
    getClient().admin().indices().prepareCreate(index).execute().actionGet();
}

public void createLocalCluster(final String clusterName) {
    NodeBuilder builder = NodeBuilder.nodeBuilder();
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("gateway.type", "none")
        .put("cluster.name", clusterName)
        .build();
    builder.settings(settings).local(false).data(true);
    this.node = builder.node();
    this.client = node.client();
}

public boolean existsIndex(final String index) {
    IndicesExistsResponse response = getClient().admin().indices().prepareExists(index).execute().actionGet();
    return response.isExists();
}

public void openIndex(String name) {
    getClient().admin().indices().prepareOpen(name).execute().actionGet();
}

createLocalCluster("cerES");
createIndex("news");
System.out.println(existsIndex("news"));

When I execute the above Java code I get a "true" response. But when I close the Java program and start it again with the following code:

openIndex("news");

it throws an IndexMissingException, even though I can see the "news" index in the data folder in Eclipse. So how do I retrieve the data from the node created previously? Is it lost, or am I wrong somewhere?
Re: Dealing with spam in this forum
I've received at least 49 spam messages in my mailbox just for 06/30. I wouldn't call that "a few spam emails". I've been subscribed to many mailing lists for years, and I'm pretty sure it would take years to get as much spam on those lists as I get in one day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so I checked my spam folder and, sure enough, there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
What is this date supposed to represent? month = 0 or day = 0 does not exist, right? -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
Re: Shards considered in write consistency
In your case, quorum means that you need all primaries to be allocated, which is the case here. The docs explain this very well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency Have a look in detail at: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_how_primary_and_replica_shards_interact.html#_how_primary_and_replica_shards_interact and http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html HTH -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
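For reference, the quorum size given in the linked docs is int((primary + number_of_replicas) / 2) + 1, and the quorum requirement is only enforced when number_of_replicas is greater than 1, which is why one active copy per shard (the primary) suffices for the cluster in this thread. As a small arithmetic sketch:

```python
def write_quorum(number_of_replicas):
    """Active shard copies required per shard for write_consistency=quorum.

    Formula from the Elasticsearch reference: int((primary + replicas) / 2) + 1,
    enforced only when number_of_replicas > 1 (otherwise a single-node
    cluster could never index anything).
    """
    if number_of_replicas <= 1:
        return 1  # only the primary is required
    return (1 + number_of_replicas) // 2 + 1
```

So with number_of_replicas = 1 (as here), each write needs only the primary; with 3 replicas it would need 3 of the 4 copies.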
Is there a way to use highlight and fuzzy together?
Hello. Everything is in the subject. I have to use fuzzy matching on my fields (title, content), and when I'm searching I want to see the part of the sentence where my keyword is. This combination doesn't work:

$params['body']['highlight']['fields'][$value]['fragment_size'] = 30;
$params['body']['query']['fuzzy'] = 0.2;

Is there a way to use highlight and fuzzy together, or another equivalent way? Thank you in advance.
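One shape that may work, not from the thread: a match query with the `fuzziness` option combined with a `highlight` section in the same request body (a sketch; the field name "content" and the search text are assumptions):

```
{
  "query": {
    "match": {
      "content": { "query": "keyword", "fuzziness": 0.2 }
    }
  },
  "highlight": {
    "fields": {
      "content": { "fragment_size": 30 }
    }
  }
}
```

In the PHP client, that whole structure would go under `$params['body']`, rather than setting `fuzzy` directly as a query value.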
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
This date is set when a document is created, but an error occurred and I got this 0000-00-00. ^^ The company I work for has existed for 10 years; the database is old and contains this kind of error. For the moment, I will either use:

"sql" : "select id_source as _id, title_source, date_source from source where date_source not like '0000-00-00%'"

(which works, but values are then missing for that date), or not index date_source at all. My goal was to sort my results by date_source.
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
I would recommend updating the SQL database! :) So maybe update all dates where the date is 0000-00-00 to 1970-01-01, if that fits your use case. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr
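If fixing the data is not possible, another option (assuming the JDBC river connects through MySQL Connector/J, which the jdbc:mysql URL in this thread suggests) is the driver's `zeroDateTimeBehavior` connection property, which makes the driver return NULL for zero dates instead of raising an exception:

```
jdbc:mysql://ip:port/database?zeroDateTimeBehavior=convertToNull
```

That keeps all rows flowing into the index; the affected documents would simply have no date_source value.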
[ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Hi all, I just open sourced a set of AngularJS directives for Elasticsearch. It enables developers to rapidly build a frontend (e.g. a faceted search engine) on top of Elasticsearch. http://www.elasticui.com (or GitHub: https://github.com/YousefED/ElasticUI) It makes creating an aggregation and listing the buckets as simple as:

<ul eui-aggregation="ejs.TermsAggregation('text_agg').field('text').size(10)">
  <li ng-repeat="bucket in aggResult.buckets">{{bucket}}</li>
</ul>

I think this was missing in the ecosystem, which is why I decided to build and open source it. I'd love any kind of feedback. - Yousef

Another example: add a checkbox facet based on a field using one of the built-in widgets (https://github.com/YousefED/ElasticUI/blob/master/docs/widgets.md):

<eui-checklist field="facet_field" size="10"></eui-checklist>

[image: checklist screenshot]
Re: _all analyzer advice
Ah. Cheers. I had looked at that page a few times but missed that.

On Tuesday, 1 July 2014 19:04:56 UTC+1, Glen Smith wrote: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

On Tuesday, July 1, 2014 6:23:54 AM UTC-4, mooky wrote: Thanks. So "default_index" and "default_search" have special meaning. Is this in the docs anywhere? -N

On Monday, 30 June 2014 17:21:40 UTC+1, Glen Smith wrote: Totally. For example:

"analyzer": {
  "default_index": { "tokenizer": "standard", "filter": ["standard", "lowercase"] },
  "default_search": { "tokenizer": "standard", "filter": ["standard", "lowercase", "stop"] }
}

On Monday, June 30, 2014 12:19:55 PM UTC-4, mooky wrote: Excellent. Thanks for the info. Is it possible to set my custom analyser as the default analyser for an index (i.e. instead of the standard analyzer)? -N

On Monday, 30 June 2014 14:41:10 UTC+1, Glen Smith wrote: You can set up analysers for your index...

"my-index": {
  "analysis": {
    "analyzer": {
      "default_index": { "tokenizer": "standard", "filter": ["standard", "icu_fold_filter", "stop"] },
      "default_search": { "tokenizer": "standard", "filter": ["standard", "icu_fold_filter", "stop"] },
      "custom_index": { "tokenizer": "whitespace", "filter": ["lower"] },
      "custom_search": { "tokenizer": "whitespace", "filter": ["lower"] }
    }
  }
}

... and then map your relevant field accordingly:

{
  "_timestamp": { "enabled": true, "store": "yes" },
  "properties": {
    "my_field": { "type": "string", "index_analyzer": "custom_index", "search_analyzer": "custom_search" }
  }
}

Note that you can (and often should) set up index analysis and search analysis differently (e.g. if you use synonyms, only expand search terms). Hope I haven't missed the point...

On Monday, June 30, 2014 8:47:36 AM UTC-4, mooky wrote: Hi all, I have a google-style search capability in my app that uses the _all field with the default (standard) analyzer (I don't configure anything, so it's Elastic's default). There are a few cases where we don't quite get the behaviour we want, and I am trying to work out how to tweak the analyzer configuration. 1) If the user searches using 99.97, they get the results they expect, but if they search using 99.97%, they get nothing. They should get the results that match 99.97%. The default analyzer config loses the %, I guess. 2) I have no idea what the text is ( : ) ) but the user wants to search using 托克金通贸易 - which is in the data - but currently we get zero results. It looks like the standard analyzer/tokenizer breaks on each character. I *think* I just want a whitespace analyzer with lower-casing. However, a) I am not exactly sure how to configure that, and b) I am not 100% sure what I am losing/gaining vs. the standard analyzer. (Don't need stop-words - in any case, the default config for the standard analyser doesn't have any, IIRC.) (FWIW, on all our other text fields, we tend to use no analyzer.) (Elastic 1.1.1 and 1.2 ...) Cheers. -M
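Putting the pieces of this thread together, the whitespace-plus-lowercase setup mooky describes could be set as the index default in the index-creation body, along these lines (a sketch; the index name is assumed, and in 1.x an analyzer named "default" is used for both indexing and search unless default_index/default_search override it):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Compared to the standard analyzer, this keeps tokens like "99.97%" intact (no punctuation stripping), at the cost of not splitting CJK text at all, so the 托克金通贸易 case would only match as a whole whitespace-delimited token.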
Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS
Yes, it's just a few dates. I think they can be updated quickly. That's the better way. :) Thank you all.
Re: [ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Very cool, I'll pass this on to some of our devs :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com
spellcheck and completion suggester or what?
Hi group, I have a special problem which I'm trying to solve. I need search suggestions while the user is typing text into a search box. I tried different settings and options with ES, including the term suggester, the completion suggester, and so on, but with no success. What I'm looking for: if I have already typed "dan" into the search box, I should get suggestions like:

{
  "responseHeader": { "status": 0, "QTime": 11 },
  "spellcheck": {
    "suggestions": [
      [
        "dan",
        {
          "numFound": 9,
          "startOffset": 0,
          "endOffset": 3,
          "suggestion": ["dana", "danach", "danckert", "dando", "danger", "dangos", "danguolė", "daniel", "danish"]
        }
      ]
    ]
  }
}

So, just the suggestions in alphabetical order from the index. The above example is from Solr, but I need this feature for ES. Any idea how to achieve this? Regards, Bernd
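For what it's worth, a completion-suggester setup along these lines might get close (a sketch, not from the thread; the index, type, and field names "myindex", "word", and "suggest" are made up):

```
PUT /myindex/word/_mapping
{
  "word": {
    "properties": {
      "suggest": { "type": "completion" }
    }
  }
}

POST /myindex/_suggest
{
  "word-suggest": {
    "text": "dan",
    "completion": { "field": "suggest" }
  }
}
```

The completion suggester returns prefix matches from an in-memory FST, which is the usual fit for search-as-you-type; note it ranks by weight rather than alphabetically, so alphabetical order would need to be imposed client-side or via per-entry weights.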
Something I am finding difficult, using Aggregations
Having used Elastic aggregations for a little while (and having used Mongo aggregations previously), I have been finding a couple of things a bit difficult/awkward. I am not sure if it's because I don't know how to do it properly, or whether we are missing a feature/enhancement in Elastic. A common thing I want to do is aggregate on field x, but in the result I also want fields y and z (which are unique for a given x) - there doesn't seem to be an easy way to do that. Let's say I have some data:

{
  "id": "94538ef6-2998-4ddd-be00-1f5dc2654955",
  "quantity": 1234567.2342,
  "commodityId": "0e918fb8-6572-4663-a692-cbebe8aca7f2",
  "commodityName": "Lead",
  "ownerId": "53e0f816-8a0a-4659-b868-c48035676b25",
  "ownerName": "Simon Chan",
  "locationId": "1cdd4bc7-76d9-43fb-ac56-8f555164211a",
  "locationName": "Shenyang - Shenyang Dongbei",
  "locationCode": "W33",
  "locationCity": "Shenyang",
  "locationCountry": "China"
}

Let's say I want to do a terms aggregation on ownerId (because it's unique, while ownerName obviously is not). I will get results where the bucket key is the id. However, what I want to display to the user is the ownerName, not the id. Looking up the name from the id could be very expensive - but it's also unnecessary, because the name will be unique for a given bucket - we have the information at hand in the index. The same issue arises if I want to aggregate by locationId or commodityId. We dereference the data associated with an id so that we can search on it - but we also want to use this information to create a label for a bucket when we aggregate. Is there a simple way to retrieve ownerName while aggregating on ownerId? The only way I know of is to: a) make sure ownerName is not_analyzed, and b) do a terms sub-aggregation - which will give only one result. Is there an easier way that I have missed? (FWIW, doing the same thing in, say, a Mongo aggregation is simply a matter of adding ownerName as a key field - since it's unique for a given id, it won't change the aggregation results - the ownerName info is simply extracted from the key data in the result.) Cheers, M
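The workaround described above (ownerName kept not_analyzed, plus a single-bucket terms sub-aggregation) would look roughly like this as a request body (a sketch of the poster's own approach, not a confirmed recommendation):

```
{
  "size": 0,
  "aggs": {
    "by_owner": {
      "terms": { "field": "ownerId" },
      "aggs": {
        "owner_name": { "terms": { "field": "ownerName", "size": 1 } }
      }
    }
  }
}
```

Each "by_owner" bucket then carries a one-bucket "owner_name" sub-aggregation whose key is the display label.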
Re: Realtime search + fast indexing
One thing you can consider is calling refresh() after indexing - which has the effect I think you are looking for. There are probably some performance considerations others here can comment on better than I. In any case, calling refresh() is what we do. On Thursday, 26 June 2014 10:25:12 UTC+1, Nico Krijnen wrote: Hi, We have recently migrated our application from 'bare Lucene + Zoie for realtime search' to Elastic Search. Elastic search is awesome and next to scalability, it gives us lots of additional features. The one thing we really miss though is realtime search. Search is the core of our application. All our data is stored in the index (primary data store). When a user adds a file or makes a change, their subsequent search must reflect that change. With Zoie, the data was indexed very quickly into a temporary Lucene memory index. Not having to write+read it on disk makes the documents available for search much faster than NRT Lucene. The memory index is flushed to disk asynchrounously from time to time, not impacting indexing or search performance. Zoie also allows you to wait for a specific 'version of the index' to be available for searching. That way we could make the user's thread wait until their data was indexed in memory, only pausing the thread of that user without having any performance impact for all the other users. Result: realtime search and insanely fast indexing. With Elastic Search we have to do a refresh to make data available for search. Lots of refreshes or the 1 second refresh interval will cause significant slower indexing speed. We don't know beforehand when our users will import documents or make lots of changes, so we cannot really increase the refresh interval when needed to make indexing faster. We know that 'get' is realtime and we make use of that as much as possible, but in lots of cases we really require a search to find the data. 
Our plan is to implement some mechanism in Elasticsearch to get the same realtime search + fast indexing behavior that we had with Zoie. We need some pointers, though, on what would be the best place in Elasticsearch to do something like this. After all, it hooks into low-level Elasticsearch and Lucene internals. I can imagine that 'realtime search while indexing' is important for many other Elasticsearch users too. What are the chances of something like this getting merged back into the main branch? I'm planning to be at the Friday drinks tomorrow in Amsterdam. Is there anyone attending with whom I could do some sparring on this matter? Thanks, Nico
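The refresh semantics under discussion - documents are indexed into an in-memory buffer and only become searchable after a refresh - can be illustrated with a toy model. This is a simplified sketch of the behavior only, not the Elasticsearch API:

```python
class ToyIndex:
    """Toy model of near-realtime search: docs are buffered until refresh()."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet visible to searches
        self.searchable = []  # visible to searches

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Make everything indexed so far visible to searches.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return [d for d in self.searchable if term in d]


idx = ToyIndex()
idx.index("realtime search")
assert idx.search("realtime") == []                    # not yet refreshed
idx.refresh()
assert idx.search("realtime") == ["realtime search"]   # visible after refresh
```

This is why calling refresh() right after indexing, as suggested at the top of the thread, makes a user's own writes visible to their next search, at the cost of extra refresh overhead.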
cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)
Hi, I am trying to use cluster.routing.allocation.enable to speed up node restarts. As I understand it, if I set cluster.routing.allocation.enable to none, restart a node, then set cluster.routing.allocation.enable to all, the shards that go UNASSIGNED when the node goes down should start back up on the same node they were assigned to previously. But in practice, when I do this, the shards get assigned across the entire cluster when I set cluster.routing.allocation.enable back to all, and then after that, some amount of rebalancing happens. How can I avoid this, and make shards on a restarted node come back on the same node? To be clear, here's exactly the sequence of events:

1) curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable":"none"}}'
2) service elasticsearch stop on one node of a 3-node cluster (discovery.zen.minimum_master_nodes: 2)
3) shards that were assigned to the now-stopped node show as UNASSIGNED
4) service elasticsearch start on the same node as in (2)
5) wait a few minutes - shards mentioned in (3) still show as UNASSIGNED, each node sees the full cluster (/_cat/nodes)
6) curl -XPUT -s $host:$port/_cluster/settings?pretty=1 -d '{"persistent":{"cluster.routing.allocation.enable":"all"}}'
7) UNASSIGNED shards mentioned in (3) begin being assigned across all nodes in the cluster
8) After all UNASSIGNED shards are assigned, some start rebalancing (migrating to other nodes)
9) Cluster is happy

The amount of data in this cluster is very large, and this process can take close to 24 hours, so I'd like very much to avoid that for routine restarts. Thanks. Andy
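The settings bodies in steps (1) and (6) need quoted JSON keys and values; a small sketch that builds the request payloads (the endpoint and the persistent scope are taken from the post above, everything else is illustrative):

```python
import json

def allocation_settings(value, scope="persistent"):
    """Build the _cluster/settings body to toggle shard allocation.

    value is one of the documented allocation modes; scope chooses
    between "persistent" and "transient" settings.
    """
    assert value in ("all", "primaries", "new_primaries", "none")
    return json.dumps({scope: {"cluster.routing.allocation.enable": value}})

# Bodies that would be PUT to $host:$port/_cluster/settings
disable = allocation_settings("none")
enable = allocation_settings("all")
print(disable)
print(enable)
```

Building the body with json.dumps avoids the unquoted-key form, which some JSON parsers reject.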
Re: Dealing with spam in this forum
I fall on the side of caring less about spam emails (since I have a decent spam filter on my email) and would rate easy access to the group much higher. I tend to add/remove myself from groups all the time - so adding a delay to adding myself to a group would be a big PITA for me. -M

On Wednesday, 2 July 2014 11:34:05 UTC+1, Clinton Gormley wrote:

I've received in my mailbox at least 49 spams just for the 06/30. I won't call this a few spam emails. I've been subscribed for years to many mailing lists, and I'm pretty sure that it would take years to get as much spam on those lists as I get in 1 day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so then I checked my spam folder and sure enough there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Looking to build a logging solution with threshold alerting.
I am looking to build a logging solution and wanted to make sure that I am not missing any key components. The logs that I have are currently stored in a database to which there is limited access due to locking risks from bad queries. My plan is to have the DBAs write the logs from the database tables to a file on a set interval, then have Logstash pick up the logs and write them to Elasticsearch. Then, for viewing/searching the logs, I will be using Kibana.

Everything up to this point I have been able to make a proof of concept for, but the other request was to have alerting. I have spent some time looking at this, and the general response seems to be to use percolation, but that only makes sense if you want to send an alert when you receive a single error that matches a query; from what I have seen, there is no way to build a threshold alerting system using percolation.

My thought to solve the threshold alerting is to create a simple web UI that allows the user to enter a query to search for, a threshold, a time frame, and email addresses to send the alert to, all stored in Elasticsearch. Then an app (running as a Windows service or cron job) would pull the alerts, run the queries, and check the time frame and threshold on some interval. If the count surpasses the threshold, it would send an email to the stored addresses. I know that SPM seems to cover this and more, but we are currently looking to see if we can do this without buying another product. Is this the correct approach to take, or should I be looking at doing something else?
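The polling approach described above - run the stored query on an interval, count hits inside the time frame, compare to the threshold, and email on breach - reduces to a small decision function. The names and the in-memory timestamp list here are illustrative, not an existing API; in practice the count would come from an Elasticsearch count query:

```python
from datetime import datetime, timedelta

def should_alert(hit_timestamps, threshold, window, now=None):
    """True when the number of hits inside the trailing window exceeds threshold."""
    now = now or datetime.utcnow()
    recent = [t for t in hit_timestamps if now - t <= window]
    return len(recent) > threshold

# Example: three hits in the last hour, one outside the window.
now = datetime(2014, 7, 2, 12, 0, 0)
hits = [now - timedelta(minutes=m) for m in (1, 2, 3, 90)]
assert should_alert(hits, threshold=2, window=timedelta(hours=1), now=now)
assert not should_alert(hits, threshold=3, window=timedelta(hours=1), now=now)
```

The scheduled app would run this check per stored alert definition and send email only when the function returns True.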
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
Patrick, * Well, I did answer your question. But probably not from the direction you expected. hmm no, you didn't. My question was: it looks like I can't retrieve/display [_all field] content. Any idea? and you replied with your logstash template where _all is disabled. I'm interested in disabling _all, but that was not my question at this point.*

Fair enough. I don't know the inner details; I am just an enthusiastic end user. To the best of my knowledge, there is no content for the _all field; I view it as an Elasticsearch pseudo field whose name is _all and whose index terms are taken from all fields (by default), but there is still no actual content for it. And after I got into the habit of disabling the _all field, my hands-on exploration of its nuances ended. It's time for the experts to explain!

*Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (i.e. the "_all": {"enabled": false} part).*

Again, I have never done this, so I can only tell you what I do. I just cannot tell you all the nuances of what Elasticsearch is capable of. My recommendation is to try it. Elasticsearch is great at letting you experiment and then telling you clearly whether your attempt succeeds or fails. So, try your scenario. If it fails, then it didn't work or you did something wrong. If it succeeds, then you can see exactly what Elasticsearch actually accepted as your mapping. For example:

curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true'
echo

This particular query looks at one of my logstash-generated indices, and it lets me verify that Elasticsearch and Logstash conspired to create the mappings I expected. I used this command quite a bit until I finally got everything configured correctly.
(I actually verify the mapping via Elasticsearch Head, but under the covers it's the same command.) Brian
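On the second question - pushing only the modified part - the body one might try looks like the snippet below. Whether a partial mapping update is accepted is exactly the open question in the thread (the advice above is to try it and inspect the result); the type name "mytype" is hypothetical:

```python
import json

# Partial mapping body disabling _all for one type; hypothetical type name.
partial_mapping = {"mytype": {"_all": {"enabled": False}}}
body = json.dumps(partial_mapping)
print(body)
```

If Elasticsearch accepts it, the follow-up `_mapping?pretty=true` query shown above reveals what was actually merged into the full mapping.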
Problem accessing fields from a native script during percolation
Hi all, We're trying to figure out how to access fields from within a native AbstractSearchScript when it's called from a percolate request that contains the document to percolate. We tried to use the source mechanism and stored fields, to no avail (no errors, but no matches). The same scripts are working fine for classic searches. This was tried with Elasticsearch release 1.1.1 and a snapshot of 2.0.0. We're running out of ideas; any help would be really appreciated. Thanks!

curl -XPOST http://localhost:9200/index1

curl -XPOST http://localhost:9200/index1/mytype/_mapping -d '{
  "properties": {
    "source_field": { "type": "string" },
    "stored_field": { "type": "string", "store": true }
  }
}'

curl -XPUT http://localhost:9200/index1/.percolator/1 -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "cooccurenceScript",
          "params": { "map": { "list": [ "a" ] } },
          "lang": "native"
        }
      }
    }
  }
}'

curl -XPUT http://localhost:9200/index1/.percolator/2 -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "script": {
          "script": "cooccurenceStoredScript",
          "params": { "map": { "list": [ "a" ] } },
          "lang": "native"
        }
      }
    }
  }
}'

Native scripts:

package test;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class CooccurenceScriptFactory extends AbstractComponent implements NativeScriptFactory {
    private final Node node;

    @SuppressWarnings("unchecked")
    @Inject
    public CooccurenceScriptFactory(Node node, Settings settings) {
        super(settings);
        this.node = node;
    }

    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new CooccurenceScript(node.client(), logger, params);
    }
}

package test;

import java.util.List;
import java.util.Map;
import org.elasticsearch.ElasticsearchIllegalArgumentException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.search.lookup.SourceLookup;

public class CooccurenceScript extends AbstractSearchScript {
    private List<String> list = null;

    @SuppressWarnings("unchecked")
    public CooccurenceScript(Client client, ESLogger logger, @Nullable Map<String, Object> params) {
        Map<String, Object> map = params == null ? null : XContentMapValues.nodeMapValue(params.get("map"), null);
        if (map == null) {
            throw new ElasticsearchIllegalArgumentException("Missing the map parameter");
        }
        list = (List<String>) map.get("list");
        if (list == null || list.isEmpty()) {
            throw new ElasticsearchIllegalArgumentException("Missing the list parameter or list is empty");
        }
    }

    @Override
    public Object run() {
        SourceLookup source = source();
        @SuppressWarnings("unchecked")
        List<Object> values = (List<Object>) source.get("source_field");
        if (values == null || values.isEmpty()) {
            return false;
        }
        for (Object localValue : values) {
            boolean result = true;
            for (String s : list) {
                result = ((String) localValue).contains(s);
            }
            if (result) {
                return true;
            }
        }
        return false;
    }
}

package test;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class CooccurenceStoredScriptFactory extends AbstractComponent implements NativeScriptFactory {
    private final Node node;

    @SuppressWarnings("unchecked")
    @Inject
    public CooccurenceStoredScriptFactory(Node node, Settings settings) {
        super(settings);
        this.node = node;
    }

    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new CooccurenceStoredScript(node.client(), logger, params);
    }
}

package test;

import org.elasticsearch.ElasticsearchIllegalArgumentException;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import
Re: have we a way to use highlight and fuzzy together ?
On Wed, Jul 2, 2014 at 6:47 AM, Tanguy Bernard bernardtanguy1...@gmail.com wrote: Hello. Everything is in the subject. I have to use fuzzy for my fields (title, content), and when I'm searching I want to see the part of the sentence where my keyword is. This, together, doesn't work:

$params['body']['highlight']['fields'][$value]['fragment_size']=30;
$params['body']['query']['fuzzy']=0.2;

Is there a way to use highlight and fuzzy together, or another equivalent way?

Usually it's better to show a recreation with curl; PHP isn't always understood. Vocabulary point: fuzzy, prefix, and regex queries are called multi-term queries. Anyway, there are three highlighters built in to Elasticsearch, all of which have different feature sets. I'm not sure if the plain highlighter supports multi-term queries, but you can try the fast vector highlighter or the postings highlighter, which do support multi-term queries. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html For completeness' sake I should mention that I maintain a fourth highlighter that also supports multi-term queries, but it is a plugin: https://github.com/wikimedia/search-highlighter Nik
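As a curl-style recreation of what the PHP above is building, a request body combining a fuzzy query with highlighting might look like the sketch below. The field name "content" and the search value are illustrative; "type": "fvh" selects the fast vector highlighter suggested above, which requires the field to be indexed with term vectors:

```python
import json

# Illustrative search body: fuzzy query plus a 30-character highlight
# fragment, using the fast vector highlighter for multi-term support.
body = {
    "query": {
        "fuzzy": {
            "content": {"value": "keyword", "fuzziness": 0.2}
        }
    },
    "highlight": {
        "fields": {
            "content": {"fragment_size": 30, "type": "fvh"}
        }
    },
}
print(json.dumps(body, indent=2))
```

The printed JSON is what would be POSTed to the index's _search endpoint with curl.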
Re: Dealing with spam in this forum
The behavior of my gmail-operated spam filter has been to toss out lots of emails from this list as false positives. So, I keep sending them back to my inbox; pretty soon, gmail asks me to forward the good ones to them to study, so I do. The result of that is that they catch NONE of those spams. They also don't put enough information in the header to allow me to see if all those spams come from the same IP address. Otherwise, it might be possible for the group list to block certain IP addresses.

On Wed, Jul 2, 2014 at 3:34 AM, Clinton Gormley cl...@traveljury.com wrote:

I've received in my mailbox at least 49 spams just for the 06/30. I won't call this a few spam emails. I've been subscribed for years to many mailing lists, and I'm pretty sure that it would take years to get as much spam on those lists as I get in 1 day on the ES mailing list.

That's interesting... I'd only seen three spam emails, so I wondered where you got 49 from. I read the posts from my gmail account, so then I checked my spam folder and sure enough there were a lot more emails in there that I was unaware of. I'm going to disable my spam filter for this group so that I get more visibility, and I'd ask the other moderators to do the same. Let's see how it goes for a while longer. We can always revisit this decision later on. clint
Re: Index missing error Elasticsearch Java
Use gateway type local instead of none; then your index persists across cluster restarts. Jörg

On Wed, Jul 2, 2014 at 12:35 AM, venuchitta venu.chitta1...@gmail.com wrote: Hi, I am new to Elasticsearch. I am using the Java API to establish a connection with ES.

public void createIndex(final String index) {
    getClient().admin().indices().prepareCreate(index).execute().actionGet();
}

public void createLocalCluster(final String clusterName) {
    NodeBuilder builder = NodeBuilder.nodeBuilder();
    Settings settings = ImmutableSettings.settingsBuilder()
        .put("gateway.type", "none")
        .put("cluster.name", clusterName)
        .build();
    builder.settings(settings).local(false).data(true);
    this.node = builder.node();
    this.client = node.client();
}

public boolean existsIndex(final String index) {
    IndicesExistsResponse response = getClient().admin().indices().prepareExists(index).execute().actionGet();
    return response.isExists();
}

public void openIndex(String name) {
    getClient().admin().indices().prepareOpen(name).execute().actionGet();
}

createLocalCluster("cerES");
createIndex("news");
System.out.println(existsIndex("news"));

When I execute the above Java code I get a true response. But when I close the Java program and start it again with the following code:

openIndex("news");

it throws IndexMissingException. But I can see the news index in the data folder in Eclipse. So how do I retrieve data from the node previously? Is it lost? Or am I wrong somewhere?
Re: Inter-document Queries
Together with Zennet we brainstormed a solution building on top of Itamar's proposal. In one string field we append the current path to all the previous ones, and since we are talking about funnels we need to store them only on the last event/document generated, e.g. SessionEndedEvent. Then we can use regex pattern matching to identify whether the sequence of steps can be found anywhere in the stored paths string. This solution appears to be extremely fast.

On Wednesday, June 11, 2014 1:14:59 AM UTC+3, Zennet Wheatcroft wrote: I simplified the actual problem in order to avoid explaining the domain-specific details. Allow me to add back more detail. We want to be able to search for multiple points of user action, towards a conversion funnel, and condition on multiple fields. Let's add another field (response) to the above model:

{.., "path": "/promo/A", "response": 200, ..}
{.., "path": "/page/1", "response": 401, ..}
{.., "path": "/promo/D", "response": 200, ..}
{.., "path": "/page/23", "response": 301, ..}
{.., "path": "/page/2", "response": 418, ..}

Let's say we define three points through the conversion funnel:

A: Visited path=/page/1
B: Got response=401 from some path
C: Exited at path=/sale/C

And we want to know how many users did steps A-B-C in that order. If we add an array prev_response like we did for prev_path, then we can use a term filter to find documents with term path=/sale/C and prev_path=/page/1 and prev_response=401. But this will not distinguish between A-B-C and B-A-C. Perhaps I could use the script filter for the last mile and, from the term-filtered results, throw out B-A-C; it would run more quickly because of the reduced document set. Is there another way to implement this query? Zennet

On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote: You need to be able to form buckets that can be reduced again, either using the aggregations framework or a query.
One model that will allow you to do that is something like this:

{ "userid": "xyz", "path": "/sale/B", "previous_paths": [...], "tstamp": ..., ... }

So whenever you add a new path, you denormalize and add previous paths that could be relevant. This might bloat your storage a bit and be slower on writes, but it is very optimized for reads, since now you can do an aggregation that queries for the desired path and buckets on the user. To check the condition of the previous path you should be able to bucket again using a script, or maybe even with a query on a nested type. This is just off the top of my head, but it should definitely work if you can get to that model. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Yes. I can re-index the data or transform it in any way to make this query efficient. What would you suggest?

On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote: This model is not efficient for this type of querying. You cannot do this in one query using this model, and the pre-processing work you do now + traversing all documents is very costly. Is it possible for you to index the data (even as a projection) into Elasticsearch using a different model, so you can use ES properly using queries or the aggregations framework? -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft zwhea...@atypon.com wrote: Hi, I am looking for an efficient way to do inter-document queries in Elasticsearch. Specifically, I want to count the number of users that went through an exit point B after visiting point A.
In general terms, say we have some event log data about users' actions on a website:

{"userid": "xyz", "machineid": "110530745", "path": "/promo/A", "country": "US", "tstamp": "2013-04-01 00:01:01"}
{"userid": "pdq", "machineid": "110519774", "path": "/page/1", "country": "CN", "tstamp": "2013-04-01 00:02:11"}
{"userid": "xyz", "machineid": "110530745", "path": "/promo/D", "country": "US", "tstamp": "2013-04-01 00:06:31"}
{"userid": "abc", "machineid": "110527022", "path": "/page/23", "country": "DE", "tstamp": "2013-04-01 00:08:00"}
{"userid": "pdq", "machineid": "110519774", "path": "/page/2", "country": "CN", "tstamp": "2013-04-01 00:08:55"}
{"userid": "xyz", "machineid": "110530745", "path": "/sale/B", "country": "US", "tstamp": "2013-04-01 00:09:46"}
{"userid": "abc", "machineid": "110527022", "path": "/promo/A", "country": "DE", "tstamp": "2013-04-01 00:10:46"}

And we have 500+M such entries. We want a count of the number of userids that visited path=/sale/B after visiting path=/promo/A. What I did was to preprocess the data, sorting by userid, tstamp, then compacting all events by the same userid into the same document. Then I wrote a script filter which traverses the path array
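The concatenated-paths + regex idea from the top of this thread can be sketched as follows: append each funnel step to one string per user/session, then test whether A, B, C occur in that order anywhere in it. The separator character and step encoding here are assumptions, not the original poster's exact format:

```python
import re

def funnel_regex(steps):
    """Regex matching the given steps in order, with anything in between."""
    return re.compile(".*".join(re.escape(s) for s in steps))

# One concatenated string per user/session, steps separated by ">".
abc = "/promo/A>/page/1>401>/sale/C"   # did A (/page/1), then B (401), then C
bac = "401>/page/1>/sale/C"            # B before A: wrong order

pattern = funnel_regex(["/page/1", "401", "/sale/C"])
assert pattern.search(abc)       # A-B-C in order: matches
assert not pattern.search(bac)   # B-A-C: no match
```

This is why the regex approach distinguishes A-B-C from B-A-C, which the term filter on prev_path/prev_response arrays above cannot do: the concatenated string preserves ordering.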
Re: Memory issues on ES client node
I'm not sure, but it looks like a node tries to move some GB of document hits around. This might have triggered timeouts at other places (probably with node disconnects), and maybe the GB chunk is not yet GC-collected, so you see this in your heap analyzer tool. Whether the heaviness of the search result is expected depends on the search results and search hits you generated, so it would be useful to know more about your queries. Jörg

On Wed, Jul 2, 2014 at 3:29 AM, Venkat Morampudi venkatmoramp...@gmail.com wrote: Thanks for the reply, Jörg. I don't have any logs. I will try to enable them, but it would take some time. If there is anything in particular that we need to enable, please let me know. -VM

On Tuesday, July 1, 2014 12:58:21 PM UTC-7, Jörg Prante wrote: Do you have anything in your logs, i.e. many disconnects/reconnects? Jörg

On Tue, Jul 1, 2014 at 7:59 PM, Venkat Morampudi venkatm...@gmail.com wrote: In our Elasticsearch deployment we are seeing random client node crashes due to out-of-memory exceptions. I got the memory dump from one of the crashes and analysed it using the Eclipse memory analyzer. I have attached the leak suspect report. Apparently 242 objects of type org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction are holding almost 8 GB of memory. I have spent some time on the source code but couldn't find anything obvious. I would really appreciate any help with this issue. -VM
ElasticSearch 1.2.1 doesn't run on JDK 1.6?
We have been using an older Elasticsearch version here; upgrading to 1.2.1 shows us 'unknown class version' errors on JDK 1.6. The docs say that JDK 1.6 is supported (and it was). Is there some update here? What is the latest Elasticsearch version available for JDK 1.6?
Re: Elastic Search
Thanks Mark. Yeah, sorry, I realized after the post that I should have used pastebin, but I couldn't edit my post. Yes, I am using the logstash dashboard. I changed the number of pages to a max record size of 10,000 results. I also realized that my query in Kibana was only selecting the last day's worth of records. So in the end I'm a dumbass. Works now after I changed the date for the query. :) Jamie
Re: ElasticSearch 1.2.1 doesn't run on JDK 1.6?
The docs say at least Java 7 is required from ES 1.2.0 on: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html For Java 6, you have to use ES versions < 1.2.0. Jörg

On Wed, Jul 2, 2014 at 4:21 PM, David Marko dmarko...@gmail.com wrote: We have been using an older Elasticsearch version here; upgrading to 1.2.1 shows us 'unknown class version' errors on JDK 1.6. The docs say that JDK 1.6 is supported (and it was). Is there some update here? What is the latest Elasticsearch version available for JDK 1.6?
Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)
On Wed, Jul 02, 2014 at 05:43:26AM -0700, Andrew Davidoff wrote: How can I avoid this, and make shards on a restarted node come back on the same node?

Hello, I have exactly the same issue. My objective is to make a rolling-restart script which waits for green cluster state before restarting each node. I use:

curl -XPUT -s $host:$port/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"new_primaries"}}'

to allow the cluster to work (and be able to create indices) during the restart. But same issue: the node is back up, but nothing happens until I enable all allocation again. I have gone through the Elasticsearch documentation related to recovery, gateway, and cluster settings without finding any parameter to activate or configure this initial recovery of local indices. -- Grégoire
Re: does snapshot restore lead to a memory leak?
So, your search-only machines are running out of memory, while your index-only machines are doing fine. Did I understand you correctly? Could you send me node stats (curl localhost:9200/_nodes/stats?pretty) from the machine that runs out of memory? Please run stats a few times at 1-hour intervals; I would like to see how memory consumption is increasing over time. Please also run nodes info once (curl localhost:9200/_nodes) and post the results here (or send them to me by email). Thanks! On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote: Hey, Igor, thanks for answering! and sorry for the delay. Didn't catch the update. I explain: - we have one cluster of one machine which is only meant for serving search requests. The goal is not to index anything to it. It contains 1.7k indices, give or take. - every day, those 1.7k indices are reindexed and snapshotted in pairs to an S3 repository (producing 850 snapshots). - every day, the read-only cluster of the first point restores those 850 snapshots to update its 1.7k indices from that same S3 repository. It works like a real charm. Load has dropped dramatically, and we can set up a farm of temporary machines to do the indexing duties. But memory consumption never stops growing. We don't get any out-of-memory error or anything. In fact, there is nothing in the logs that shows any error, but after a few days or a week the host has its memory almost exhausted and elasticsearch is not responding. The memory consumption is of course way ahead of the HEAP_SIZE. We have to restart it and, when we do, we get the following error:
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
    at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:77)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:425)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:243)
    ...
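Igor's request above (stats a few times at 1-hour intervals, plus nodes info once) can be scripted. A minimal sketch, assuming the affected node answers on localhost:9200 and that three hourly samples are enough; it needs a live cluster, so it is not runnable standalone.

```shell
# Collect node stats at 1-hour intervals, plus node info once, so the
# memory growth between snapshot restores can be compared over time.
for i in 1 2 3; do
  curl -s 'localhost:9200/_nodes/stats?pretty' > "nodes-stats-$(date +%Y%m%dT%H%M).json"
  sleep 3600
done
curl -s 'localhost:9200/_nodes?pretty' > nodes-info.json
```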
geo_polygon filter with non-zero rule filling
Is it possible to apply a geo_polygon filter with a non-zero rule https://en.wikipedia.org/wiki/Nonzero-rule ?
Re: Queries with fields {...} don't return field with dot in their name
Hello Ben, This is definitely an ambiguity. By request.user, in the usual case ES expects data like request: { user: vm }. Try request\.user or something similar, some mechanism to escape the dot. Thanks Vineeth On Wed, Jul 2, 2014 at 1:13 PM, benq benoit.quart...@gmail.com wrote: Hello Vineeth, the items that are indexed in elasticsearch really contain a field named response.user. _source: { clientip: aaa.bbb..ddd, request: http://.aa/b/c, request.accept-encoding: gzip, deflate, request.accept-language: de-ch, response.content-type: text/html; charset=UTF-8, response: 200, response.age: 0, response.user: userAAA, @timestamp: 2014-07-01T12:18:51.501+02:00, } I realize there is an ambiguity between a field with a dot in its name and a field of a child document. Should fields with dots in their names be avoided? Benoît On Tuesday, July 1, 2014 19:17:41 UTC+2, vineeth mohan wrote: Hello Ben, Can you paste a sample feed? Thanks Vineeth On Tue, Jul 1, 2014 at 8:26 PM, benq benoit@gmail.com wrote: Hi all, I have a query that specifies the fields to be returned as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html However, it does not return the fields with a dot in their name, like response.user. For example: { size: 1000, fields: [@timestamp, request, response, response.user, clientip], query: {match_all: {} }, filter: { and: [ { range: { @timestamp: { from: ... ] } } The timestamp, request, response and clientip fields are returned. The response.user is not. Any idea why? Regards, Benoît
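The ambiguity Vineeth describes can be reproduced in isolation. A sketch, assuming a throwaway index on localhost:9200 (the index, type, and document names are made up for illustration); it needs a live node, so it is not runnable standalone.

```shell
# Sketch: a literal dotted key next to a nested object with the same shape.
# Index/type names ("test"/"doc") are hypothetical.
curl -XPUT 'localhost:9200/test/doc/1' -d '{
  "response.user": "userAAA",
  "request": { "user": "vm" }
}'

# Asking for "response.user" in the fields list may be interpreted as the
# object path response -> user rather than the literal dotted key, which
# would explain why the field comes back empty in the thread above.
curl 'localhost:9200/test/_search?pretty' -d '{
  "fields": ["response.user"],
  "query": { "match_all": {} }
}'
```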
Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)
All, This seems apropos to the current discussion and could help clear up some confusion on recommendations, etc. We, Elasticsearch, are hosting a webinar on ELK, given by the Logstash creator, Jordan Sissel. It's today, in 40 minutes. http://www.elasticsearch.org/webinars/introduction-elk-stack/ On Wednesday, July 2, 2014 6:08:34 AM UTC-7, Brian wrote: Patrick, *Well, I did answer your question. But probably not from the direction you expected. hmm no, you didn't. My question was: it looks like I can't retrieve/display [_all field] content. Any idea? and you replied with your logstash template where _all is disabled. I'm interested in disabling _all, but that was not my question at this point.* Fair enough. I don't know the inner details; I am just an enthusiastic end user. To the best of my knowledge, there is no content for the _all field; I view it as an Elasticsearch pseudo field whose name is _all and whose index terms are taken from all fields (by default), but still there is no actual content for it. And after I got into the habit of disabling the _all field, my hands-on exploration of its nuances ended. It's time for the experts to explain! *Your answer to my second message, below, is informative and interesting but fails to answer my second question too. I simply asked whether I need to feed the complete modified mapping of my template or if I can just push the modified part (i.e. the _all: {enabled: false} part).* Again, I have never done this, so I can only tell you what I do. I just cannot tell you all the nuances of what Elasticsearch is capable of. My recommendation is to try it. Elasticsearch is great at letting you experiment and then telling you clearly if your attempt succeeds or fails. So, try your scenario. If it fails, then it didn't work or you did something wrong. If it succeeds, then you can see exactly what Elasticsearch actually accepted as your mapping.
For example: curl 'http://localhost:9200/logstash-2014.06.30/_mapping?pretty=true'; echo This particular query looks at one of my logstash-generated indices, and it lets me verify that Elasticsearch and Logstash conspired to create the mappings I expected. I used this command quite a bit until I finally got everything configured correctly. (I actually verify the mapping via Elasticsearch Head, but under the covers it's the same command.) Brian
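For completeness, disabling _all for future indices is usually done through an index template. A sketch, assuming ES 1.x on localhost:9200; the template name is arbitrary, and it needs a live node, so it is not runnable standalone.

```shell
# Sketch: disable the _all field for all future logstash-* indices
# via an index template (template name "disable_all" is made up).
curl -XPUT 'localhost:9200/_template/disable_all' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": { "_all": { "enabled": false } }
  }
}'

# Then verify what was actually accepted, as Brian suggests:
curl 'localhost:9200/logstash-2014.06.30/_mapping?pretty=true'; echo
```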
Re: does snapshot restore lead to a memory leak?
Igor. Yes, that's right. My index-only machines are booted just for the indexing-snapshotting task. Once there are no more tasks in the queue, those machines are terminated. They only handle a few indices each time (their only purpose is to snapshot). I will do as you tell me. I guess I'd better wait for the timeframe in which most of the restores occur, because that's when the memory consumption grows most, so expect those postings in 5 or 6 hours. On Wednesday, July 2, 2014 10:29:53 AM UTC-4, Igor Motov wrote: So, your search-only machines are running out of memory, while your index-only machines are doing fine. Did I understand you correctly? ...
Re: does snapshot restore lead to a memory leak?
This memory issue report might be related: https://groups.google.com/forum/#!topic/elasticsearch/EH76o1CIeQQ Jörg On Wed, Jul 2, 2014 at 5:34 PM, JoeZ99 jzar...@gmail.com wrote: Igor. Yes, that's right. ...
Re: Min Hard Drive Requirements
When I tried to optimize, the index had 51 shards. Regards, Ophir On Wednesday, July 2, 2014 11:27:50 AM UTC+3, Mark Walkom wrote: It will work until it's full, but then ES will fall over. Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free. How many shards do you have for the index, or how many are you trying to optimise (merge) down to? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 2 July 2014 18:13, Ophir Michaeli ophirm...@gmail.com wrote: Hi all, I'm testing the indexing of 100 million documents; it took about 400GB of the hard drive. Is there a minimum free hard drive space needed for the index to work OK? I'm asking because after we indexed 100 million documents we tested the index and it worked OK, but then the optimize took days and the index did not respond. The hard drive had only 10 GB of free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning. Thank you, Ophir
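Mark's point above (a merge copies the segment being merged, so you need at least that much free space) suggests a pre-flight check before running _optimize. A minimal sketch; the data path and the 10 GB threshold are placeholders taken from the thread.

```shell
# Check that the data volume has enough headroom before an optimize/merge.
# A merge needs roughly as much free space as the segment it is copying.
has_headroom() {
  # $1 = a path on the data volume, $2 = required free space in GB
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage: refuse to optimize when fewer than 10 GB are free
# (/var/lib/elasticsearch is a hypothetical data path).
has_headroom /var/lib/elasticsearch 10 || echo "not enough free space to merge safely"
```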
Re: Custom Query variables ?
If you enable explanations, you can see the rationale behind Lucene's scoring: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html You are probably correct that the array length is influencing the scoring. By default, Lucene scores fields with fewer terms higher, via length normalization. You can disable norms on the field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#norms You can fine-tune better by learning how to read Lucene's explanations. It is difficult at first, but it is a useful skill. Cheers, Ivan On Tue, Jul 1, 2014 at 1:02 AM, Pierrick Boutruche pboutru...@octo.com wrote: Up? Any ideas? On Monday, June 30, 2014 17:48:54 UTC+2, Pierrick Boutruche wrote: Hi everyone, I'm creating a little geocoder on my own. My goal is to be able to retrieve a big city or a country from a string on input. This string can be mistyped, so I indexed the geonames cities5000 data (cities with 5000+ inhabitants) and crossed these data with country admin data. So I got a 46000-city index with country, admin and population. I created a search_field in which I put the country, admin and city names plus the alternate names provided in the cities5000 file. I want to search for a string within this array. Currently, I'm just searching with a MatchQuery, like Paris in search_field. Unfortunately, the first result is Paris... in Canada... Still, the search_field data is this, for Paris (CA) and Paris (FR): [u'Paris', u'Paris', u'Canada', u'Ontario', u'Ontario'] [u'Paris', u'Paris', u'France', u'Île-de-France', u'Ile-de-France', u'Paris', u'Paris'] I don't understand why Paris, CA is first, 'cause there are so many more Paris entries in the second one... Anyway, is there any way to make the number of appearances of the query terms make the difference? Because with alternate names there will be many more Paris entries, and that should count. Actually I think the array length matters in the scoring and I don't want it to... I thought of a custom query score, but I don't think I'm able to get the query term in the script query. Any ideas? Thanks!
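Disabling norms as Ivan suggests is a mapping setting on the field. A sketch, assuming ES 1.x; the index and type names ("geocoder"/"city") are made up for illustration, and in 1.x norms are typically set when the index is created. It needs a live node, so it is not runnable standalone.

```shell
# Sketch: create an index whose search_field has length normalization
# disabled, so shorter alternate-name arrays no longer score higher.
curl -XPUT 'localhost:9200/geocoder' -d '{
  "mappings": {
    "city": {
      "properties": {
        "search_field": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}'
```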
Re: Custom Query variables ?
For geo search, it would be a good approach to respect the searcher's preference by using a locale, so I suggest adding a locale filter (e.g. fr) to the search. Or an origin could be added to the search query, with all cities ordered by geo distance in relation to that origin. For country search, the origin could be the capital city... Jörg On Wed, Jul 2, 2014 at 6:38 PM, Ivan Brusic i...@brusic.com wrote: If you enable explanations, you can see the rationale behind Lucene's scoring. ...
Defing default mapping to enable _timestamp for all indices
Hi, I have the following ES setting defined in my YAML file:
http.enabled: false
discovery.zen.ping.multicast.enabled: false
index:
  mappings:
    _default_:
      _timestamp:
        enabled: true
        store: true
  analysis:
    analyzer:
      mica_index_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, mica_nGram_filter]
      mica_search_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase]
    filter:
      mica_nGram_filter:
        type: nGram
        min_gram: 2
        max_gram: 20
My intention is to enable the _timestamp field for all created indices. The above does not seem to work; is the error in the syntax of the YAML, or am I missing a step? Thanks.
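One alternative to putting the _default_ mapping in elasticsearch.yml is an index template, which applies to every new index matching its pattern. A sketch, assuming ES 1.x on localhost:9200; the template name is arbitrary, and it needs a live node, so it is not runnable standalone.

```shell
# Sketch: enable _timestamp for all future indices via an index template
# (template name "default_timestamp" is made up; "*" matches every index).
curl -XPUT 'localhost:9200/_template/default_timestamp' -d '{
  "template": "*",
  "mappings": {
    "_default_": {
      "_timestamp": { "enabled": true, "store": true }
    }
  }
}'
```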
Re: Wrong Scoring using match query on Sense
If you enable explanations, you would see that length normalization is scoring the document with the shorter field higher than the document with a term frequency of 2. The fieldNorm is incredibly lossy since it uses only 1 byte, so there must be some inconsistencies between the example and your test case. The example has a fieldNorm of 0.375, while it is 0.3125 in your case (and mine as well). The example might not have deleted all the documents in the index before the test. Cheers, Ivan On Tue, Jul 1, 2014 at 1:43 AM, rayman idan.f...@gmail.com wrote: I am trying to exercise the following example using Sense : http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/match-query.html . However when I ran GET /my_index/my_type/_search { query: { match: { title: QUICK! } } } I got wrong scoring. I expect to see doc with is 3. But doc with id got higher score. any idea?: { took: 2, timed_out: false, _shards: { total: 1, successful: 1, failed: 0 }, hits: { total: 3, max_score: 0.5, hits: [ { _index: my_index, _type: my_type, _id: 1, _score: 0.5, _source: { title: The quick brown fox } }, { _index: my_index, _type: my_type, _id: 3, _score: 0.44194174, _source: { title: The quick brown fox jumps over the quick dog } }, { _index: my_index, _type: my_type, _id: 2, _score: 0.3125, _source: { title: The quick brown fox jumps over the lazy dog } } ] } } Thanks. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/38002b2d-6d70-4a20-9820-7814c37e8aea%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/38002b2d-6d70-4a20-9820-7814c37e8aea%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. 
[ANN] Elasticsearch Servlet Transport plugin 2.2.0 released
Heya, We are pleased to announce the release of the Elasticsearch Servlet Transport plugin, version 2.2.0. The wares transport plugin allows using the REST interface over servlets: https://github.com/elasticsearch/elasticsearch-transport-wares/

Release Notes - elasticsearch-transport-wares - Version 2.2.0

Update:
* [21] - Update to elasticsearch 1.2.0 (https://github.com/elasticsearch/elasticsearch-transport-wares/pull/21)

New:
* [22] - Add plugin release semi-automatic script (https://github.com/elasticsearch/elasticsearch-transport-wares/issues/22)
* [17] - NodeServlet should use an elasticsearch node created elsewhere in the webapp (https://github.com/elasticsearch/elasticsearch-transport-wares/issues/17)

Issues, pull requests and feature requests are warmly welcome on the elasticsearch-transport-wares project repository: https://github.com/elasticsearch/elasticsearch-transport-wares/ For questions or comments around this plugin, feel free to use the elasticsearch mailing list: https://groups.google.com/forum/#!forum/elasticsearch Enjoy, -The Elasticsearch team
Re: Problem Configuring AWS S3 for Backups
Unfortunately, I tried with and without the region setting, no difference.

On Tuesday, July 1, 2014 7:43:21 PM UTC-4, Glen Smith wrote: I'm not sure it matters, but I noticed you aren't setting a region in either your config or when registering your repo.

On Tuesday, July 1, 2014 7:08:28 PM UTC-4, sabdalla80 wrote: I am not sure the version is the problem; I guess I can upgrade from V1.1 to latest. "Not able to load credentials from the provider chain" - any idea why this error is generated? Is there any other place that my credentials need to be besides the .yml file? Note, I am able to write/read to S3 remotely, so I don't have any privilege problems that I can think of.

On Tuesday, July 1, 2014 4:44:17 PM UTC-4, David Pilato wrote: I think 2.1.1 should work fine as well. That said, you should upgrade to latest 1.1 (or 1.2)... -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 1 Jul 2014, at 22:13, Glen Smith gl...@smithsrock.com wrote: According to https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/es-1.1 you should use v2.1.0 of the plugin with ES 1.1.0.

On Tuesday, July 1, 2014 9:03:04 AM UTC-4, sabdalla80 wrote: I am having a problem setting up the backup and restore part of AWS on S3. I have the 2.1.1 AWS plugin and ElasticSearch V1.1.0. My yml:

```
cloud:
  aws:
    access_key: #
    secret_key: #
discovery:
  type: ec2
```

When I try to register a repository:

```
PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": { "bucket": "esbucket" }
}
```

I get this error; it complains about loading my credentials! Is this an ElasticSearch problem or an AWS one? Note I am running as root user ubuntu on EC2, and also running AWS with root privileges as opposed to an IAM role; not sure if that's a problem or not.
error:

```
RepositoryException[[es_repository] failed to create repository]; nested: CreationException[Guice creation errors:

1) Error injecting constructor, com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
   at org.elasticsearch.repositories.s3.S3Repository.<init>(Unknown Source)
   while locating org.elasticsearch.repositories.s3.S3Repository
   while locating org.elasticsearch.repositories.Repository

1 error]; nested: AmazonClientException[Unable to load AWS credentials from any provider in the chain]; status: 500
```
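For reference, a minimal elasticsearch.yml sketch for the cloud-aws plugin that spells out credentials and a region, followed by a repository registration naming the region explicitly. All key values and the region here are placeholders to adapt, and the thread above reports that adding the region alone did not help, so this is for completeness:

```
cloud:
  aws:
    access_key: YOUR_KEY        # placeholder
    secret_key: YOUR_SECRET     # placeholder
    region: us-east-1           # placeholder; should match the bucket's region
discovery:
  type: ec2
```

```
PUT /_snapshot/es_repository
{
  "type": "s3",
  "settings": {
    "bucket": "esbucket",
    "region": "us-east-1"
  }
}
```

Note that the "provider chain" in the error refers to the AWS SDK's default credential sources (environment variables, system properties, EC2 instance-profile credentials), so attaching an IAM role to the instance is another route worth testing.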
ES doesn't work with rexster gremlin extension
The problem is as the subject says. I'm not sure whether I misunderstood something or missed some configuration. ES works fine in usual situations, but doesn't work with the rexster gremlin extension. In Java, I configured the graph as follows: https://lh3.googleusercontent.com/-kd6vKDQdH6g/U7RbYPE-ciI/AHk/sSU5e5R3DMM/s1600/1.PNG https://lh3.googleusercontent.com/-9HiEPJmS1FQ/U7Rbjmg4oCI/AHs/2FgdAtjiBHc/s1600/2.PNG In the extension, I wrote:

```
String query = "v." + propKey + ":(" + propValue + ")";
if (((TitanGraph) graph).indexQuery("search", query).vertices().iterator().hasNext()) { ... }
```

When I invoked the extension, rexster reported: null pointer exception, unknown index, etc. As follows: https://lh4.googleusercontent.com/-I0w-hVgu0F0/U7RdQCcjOZI/AH4/4l13192eR2k/s1600/3.PNG https://lh3.googleusercontent.com/-phHCRqzQN6I/U7RdUkyE8oI/AIA/rhkTlffd8lI/s1600/4.PNG After this, I searched for advice on Google and made some configuration changes in rexster.xml: https://lh4.googleusercontent.com/-_6zRCaQtvRw/U7ReJ41k4jI/AIQ/Nwgon5WqPtU/s1600/6.PNG Then the problems changed. (As you can see, first_example is the name of the graph.) https://lh5.googleusercontent.com/-r8TvH5CxjqA/U7Rfb9NFNEI/AIs/epemSCk5-8c/s1600/8.PNG Besides, when I invoked the extension, I was told: the graph is non-configured. https://lh4.googleusercontent.com/-uPWbHmgKL28/U7RfO3PoVlI/AIk/6ndP888fGYA/s1600/7.PNG I've also tried the embedded mode. The problems seem to be the same. Plus, I'm using: Titan 0.4.2, Tinkerpop 0.2.4, Cassandra 2.0.7, ElasticSearch 1.2.1. This problem's driving me crazy. Any pointer would be appreciated! Thanks in advance!
Re: Recommended Hardware Specs Sharding\Index Strategy
When you say "do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly", are we counting all the primary and replica shards of all indexes on that node? So for example, say we had two indexes on a 10 node cluster. Each index has 10 shards and 1 replica (40 shards total in the cluster). So per node, should the heap size be larger than: 1 shard for the first index, 1 shard for the replica of the first index, 1 shard for the second index, 1 shard for the replica of the second index - the four shards combined? Thanks again for your advice

On Saturday, August 10, 2013 6:50:27 AM UTC-7, Jörg Prante wrote: Your concern is a single shard getting too big. If you use a 64bit JVM and mmapfs (quite common), you can open even the largest files. So from this point of view, a node can handle the biggest files. There is no real limit. Another question is throughput performance with large shard files. For example, the more mixed read/write operations are in the workload, the smaller the Lucene indexes should be, to allow the JVM/OS a better load distribution. For selecting a total number of shards and shard size, here are some general rules of thumb:

- do not select a smaller number of shards than the total number of nodes you will add to the cluster. Each node should hold at least one shard.
- do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly
- if you want fast recovery, or if you want to move shards around (not a common case), the smaller a shard is, the faster the operation will get done

In case you are worried about shards getting out of bounds, you can reindex with a higher number of shards (having the _source enabled is always an advantage for reindexing) with your favorite custom tool. Reindexing can take significant time, and may not be an option if you can't stop indexing.
Jörg

On Fri, Aug 9, 2013 at 4:32 PM, David Arata david...@gmail.com wrote: My concern is what would be the best strategy so that an index or a single shard in an index does not get too big for a node to handle, and if it's approaching that size, what can be done?
Documents not being stored
Hello, I am attempting to set up a large scale ELK setup at work. Here is a basic setup of what we have so far:

```
Nodes (approx 150) [logstash] | | +---+ | | Indexer1 Indexer2 [Redis] [Redis] [Logstash] [Logstash] | | | | ++--+ | | ES Master -- Kibana3 [Master: yes] [Data: no] | | ES Data (4 data nodes) [Master: no] [Data: yes]
```

In case the formatting does not hold with the above, I've created a paste here: https://baneofswitches.privatepaste.com/c8dfc2c30b

The Setup
=========

* We have approximately 150 nodes configured to send to a shuffled Redis instance on either Indexer1 or Indexer2. A sanitized version of the node Logstash config is here: https://baneofswitches.privatepaste.com/345b94064d
* Each indexer is identical. They both run their own independent Redis service. They then each have a Logstash service that pulls events from Redis and pushes them to the ES Master. They are using the http protocol. A sanitized version of their config is here: https://baneofswitches.privatepaste.com/e19eae690f
* The ES Master is configured to only be a master, and is not set to be a data node. It has 32 GB of RAM.
* There are 4 ES data nodes, configured to be data nodes only; they have been configured to be ineligible to be elected as masters. They have 62 GB RAM and the storage for ES is on SSDs.
* We have Kibana3 configured to search from the ES Master.
* The average # of logs generated by all nodes total seems to be approximately 7k/sec, with peaks up to about 16k/s.
* Indexer throughput seems to be good enough that one indexer can work just fine during normal usage.
* We are using the default 5 shards with 1 replica.

The Problem
===========

When this setup is loaded as mentioned above, we are noticing that some logs are being dropped. We were able to test this by running something like:

```
seq 1 5000 | xargs -I{} -n 1 -P 40 logger "Testing unqString {} of 5000"
```

Sometimes we would see all 5000 show up in Kibana, other times a subset of them (for example 4800 events).
Troubleshooting
===============

We have taken a number of steps to eliminate possibilities. We have confirmed that logs are being reliably transferred from nodes to Redis and from Redis through Logstash. We confirmed this by monitoring counts over many trials. The Redis -> Logstash leg was tested by outputting to a file and comparing counts. That left the Logstash -> ES leg. We tested this by writing a script that pushed fake events via the bulk API. We were unable to reproduce the problem with one request. However, when the cluster is under load (we let 'real' logs flow) and we push via the bulk API with our script, we occasionally see partial loss of data. It's important to note that partial loss here means that the request succeeds (200 return code) and much of the data in the bulk request is then searchable; however, not all of it will be. For example, if we put the cluster under load and push a request with a bulk of 5000 events in it, we will see 4968 of the 5000 in our subsequent search. We have tried increasing the bulk API threadpool as well as giving a greater percentage (50%) to the indexing buffer. Neither has fixed the issue.

Conclusion
==========

I am looking for feedback on how to troubleshoot this further and find the cause. I am also looking for information on whether anyone else out there is handling these sorts of incoming volumes, and what sorts of things they had to do to get their setup working. I appreciate all feedback.
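One check worth automating on the Logstash -> ES leg: a bulk request can return HTTP 200 overall while individual items inside it fail (for example with EsRejectedExecutionException when the bulk threadpool queue is full), which would look exactly like this silent partial loss. A minimal sketch in plain Python that inspects a bulk-response body for per-item errors; the response JSON below is a fabricated example in the 1.x bulk format, not data from the thread:

```python
import json

def failed_bulk_items(bulk_response_body):
    """Given the JSON body of a _bulk response, return the items that
    reported a per-item error, even if the HTTP status was 200."""
    resp = json.loads(bulk_response_body)
    failures = []
    for item in resp.get("items", []):
        # Each item is keyed by its action type: index, create, update, delete.
        for action, result in item.items():
            # ES 1.x reports per-item failures via an "error" field.
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures

# Fabricated response: one of two index actions was rejected.
body = json.dumps({
    "took": 3, "errors": True,
    "items": [
        {"index": {"_index": "logs", "_id": "1", "status": 201}},
        {"index": {"_index": "logs", "_id": "2", "status": 429,
                   "error": "EsRejectedExecutionException[rejected]"}},
    ],
})
print(failed_bulk_items(body))  # [('index', '2', 'EsRejectedExecutionException[rejected]')]
```

If rejections show up here under load, raising the bulk queue size or throttling the indexers is the usual next step.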
Re: [ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Great idea. I'll give it a try ASAP.

On Wednesday, July 2, 2014 10:56:48 PM UTC+12, Yousef El-Dardiry wrote: Hi all, I just open sourced a set of AngularJS directives for Elasticsearch. It enables developers to rapidly build a frontend (e.g. a faceted search engine) on top of Elasticsearch. http://www.elasticui.com (or github https://github.com/YousefED/ElasticUI) It makes creating an aggregation and listing the buckets as simple as:

```
<ul eui-aggregation="ejs.TermsAggregation('text_agg').field('text').size(10)">
  <li ng-repeat="bucket in aggResult.buckets">{{bucket}}</li>
</ul>
```

I think this was currently missing in the ecosystem, which is why I decided to build and open source it. I'd love any kind of feedback. - Yousef

Another example: add a checkbox facet based on a field, using one of the built-in widgets (https://github.com/YousefED/ElasticUI/blob/master/docs/widgets.md):

```
<eui-checklist field="facet_field" size="10"></eui-checklist>
```

Resulting in [image: checklist screenshot] -- See why you should attend BroadSoft Connections 2014 http://broadsoftconnections.com/ This email is intended solely for the person or entity to which it is addressed and may contain confidential and/or privileged information. If you are not the intended recipient and have received this email in error, please notify BroadSoft, Inc. immediately by replying to this message, and destroy all copies of this message, along with any attachment, prior to reading, distributing or copying it.
Re: Min Hard Drive Requirements
Ok, how many were you reducing to? How big is the index? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 3 July 2014 02:03, Ophir Michaeli ophirmicha...@gmail.com wrote: When I tried to optimize, the index had 51 shards. Regards, Ophir

On Wednesday, July 2, 2014 11:27:50 AM UTC+3, Mark Walkom wrote: It will work until it's full, but then ES will fall over. Merging does require a certain amount of disk space, usually the same amount as the segment that is being merged, as it has to take a copy of the shard to work on. So for a 10GB segment, you'd need at least 10GB free. How many shards do you have for the index, or how many are you trying to optimise (merge) down to? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 2 July 2014 18:13, Ophir Michaeli ophirm...@gmail.com wrote: Hi all, I'm testing the indexing of 100 million documents; it took about 400GB of the hard drive. Is there a minimum free hard drive space needed for the index to work OK? I'm asking because after we indexed 100 million documents we tested the index and it worked OK, but then when trying to optimize, the optimize took days and then the index did not respond. The hard drive had only 10 GB free space, so we tried to copy the index to a new hard drive with more free space, but the index is still not functioning. Thank you, Ophir
Re: Are there any facets that can be used to co-relate log events ?
Hi Aditya, I'm looking to do something similar; did you have any success with this problem? Thanks, Matt

On Wednesday, January 22, 2014 11:53:36 PM UTC+13, Aditya Pavan Kumar Vegesna wrote: Hi, I am looking for a way to correlate multiple log events and then calculate the time duration between those events, e.g. a request log event and a response log event - calculating the difference in their timestamps to assess the performance of the application. Can anyone help me with how this can be achieved? Thanks, Pavan Kumar
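Facets/aggregations in ES 1.x can't subtract timestamps across two documents, so the pairing usually happens outside ES (for example in Logstash's elapsed filter, or a small script) keyed on a shared request ID. A minimal sketch of that pairing in plain Python; the field names request_id, event and @timestamp are assumptions for illustration, not from the thread:

```python
from datetime import datetime

def durations(events):
    """Pair request/response events by request_id and return the
    elapsed seconds between each pair."""
    starts = {}
    out = {}
    for ev in events:
        ts = datetime.strptime(ev["@timestamp"], "%Y-%m-%dT%H:%M:%S")
        if ev["event"] == "request":
            starts[ev["request_id"]] = ts
        elif ev["event"] == "response" and ev["request_id"] in starts:
            # Found the matching request: record the elapsed time.
            out[ev["request_id"]] = (ts - starts.pop(ev["request_id"])).total_seconds()
    return out

sample = [
    {"request_id": "r1", "event": "request",  "@timestamp": "2014-07-02T10:00:00"},
    {"request_id": "r1", "event": "response", "@timestamp": "2014-07-02T10:00:03"},
]
print(durations(sample))  # {'r1': 3.0}
```

The computed durations can then be indexed back into ES as their own documents, at which point ordinary histograms and stats facets work on them.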
Re: Kibana browser compatibility issues
We are using Logstash-ElasticSearch-Kibana and just want to be able to open the index file in Kibana. What is the necessary plugin that will allow us to do this in something other than Firefox?

On Monday, June 2, 2014 11:56:35 AM UTC-7, Binh Ly wrote: If you simply point the browser at the file system index.html, in my experience that only works in Firefox (and only if you explicitly use "http://server:9200"). The Kibana default assumes that you actually run Kibana from a web server (or as an ES site plugin if you prefer) and that ES is accessible from the same host as where Kibana is being served from.
Re: Kibana browser compatibility issues
Laura, The simplest way is to install Kibana as a site plugin on the same node on which you run Elasticsearch. Not the best way from a performance and security perspective, but certainly the easiest way to start with an absolute minimum of extra levers to pull and knobs to turn, so to speak. So what does that really mean, a site plugin? Assume you configure Elasticsearch to look for plugins within the /opt/elk/plugins directory. Then you unpack the Kibana3 distribution within /opt/kibana3. That means you'll see the following files within /opt/kibana3/kibana-3.1.0:

```
app  build.txt  config.js  css  favicon.ico  font  img  index.html  LICENSE.md  README.md  vendor
```

So then create the /opt/elk/plugins/kibana3 directory. Then:

```
$ ln -s /opt/kibana3/kibana-3.1.0 /opt/elk/plugins/kibana3/_site
```

Now when you start ES and point it to the correct configuration file, which in turn points it to the plugins directory as described above, Kibana will be available at the following URL (assuming you're on the same host; change localhost as needed, of course): http://localhost:9200/_plugin/kibana3/ Hope this helps! Brian
Visibility
Hi, I'm trying to get a lot more visibility and metrics into what's going on under the hood. Occasionally, we see spikes in memory. I'd like to get heap memory used on a per-shard basis. If I'm not mistaken, somewhere somehow, the Lucene index that is a shard is using memory in the heap, and I'd like to collect that metric. It may also be an operation somewhere higher up at the elasticsearch level, where we are merging results from shards or results from indexes (maybe elasticsearch doesn't bother to merge twice but merges once); that's also a memory space I'd like to collect data on. I think per-query memory use would also be something interesting, though perhaps obviously too much to keep up with for every query (maybe a future opt-in feature, unless it's already there and I'm missing it). Other cluster events, like nodes entering and exiting the cluster or the changing of the master, would also be nice to collect. I'm guessing some of this isn't available and some of it is, but my Google-Fu seems to be lacking. I'm pretty sure I can poll to figure out that the events happened, but was wondering if there was something in the Java client node where I could get a Future or some other hook to turn it into a push instead of a pull. Any help will be appreciated. I'm aware it's a wide net though. --Shannon Monasco
downside to using Bulk API for small/single-doc sets?
Hi, I am using the ES Java API to talk to an ES server. Sometimes I need to index a single doc, sometimes dozens or hundreds at a time. I'd prefer to keep my code simple (am a contrarian thinker) and wonder if I can get away with always using the bulk API (ie BulkRequestBuilder), so that my interface to ES would look like so:

```
void indexDoc(Doc doc);
void indexDocs(Collection<Doc> docs);
```

...but the impl would always delegate to BulkRequestBuilder - with the number of actions sometimes being ~1. Is there a performance (or other) downside to this approach? Specifically, would bulk index updates (with set size == 1) take significantly longer than non-bulk updates? thanks, -nikita
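For what it's worth, the delegation itself is trivial; sketched here in Python for brevity, with a stand-in bulk_index callable in place of a real BulkRequestBuilder execution (all names are illustrative, not an ES API):

```python
def make_indexer(bulk_index):
    """Return (index_doc, index_docs) where the single-doc path simply
    delegates to the bulk path with a one-element batch."""
    def index_docs(docs):
        return bulk_index(list(docs))
    def index_doc(doc):
        return index_docs([doc])
    return index_doc, index_docs

# Stand-in bulk implementation that just records batch sizes.
batches = []
index_doc, index_docs = make_indexer(lambda docs: batches.append(len(docs)))
index_doc({"id": 1})
index_docs([{"id": 2}, {"id": 3}])
print(batches)  # [1, 2]
```

The open question in the thread is only whether the one-element bulk request carries measurable overhead on the server side, which is easy to benchmark with exactly this kind of wrapper.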
Re: Recommended Hardware Specs Sharding\Index Strategy
The heap should be as big as your largest shard, irrespective of what index it belongs to or if it's a replica. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 3 July 2014 05:50, mrno42 doug...@gmail.com wrote: When you say - do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly are we counting all the primary and replicas shards of all indexes on that node? So for example, if we had two indexes with on 10 node cluster. Each index has 10 shards and 1 replica(40 total in cluster). So per node, the heap size should be larger than: 1 shard for first index 1 shard for replica of first index 1 shard for second index 1 shard for replica second index the four shards combined? Thanks again for your advice On Saturday, August 10, 2013 6:50:27 AM UTC-7, Jörg Prante wrote: Your concern is a single shard getting too big. If you use 64bit JVM and mmapfs (quite common), you can open even the largest files. So from this point of view, a node can handle the biggest files. There is no real limit. Another question is throughput performance with large shard files. For example, the more mixed read/write operations are in the workload, the smaller the Lucene indexes should be, to allow the JVM/OS a better load distribution. For selecting a total number of shards and shard size, here are some general rules of thumb: - do not select a smaller number of shards than your total number of nodes you will add to the cluster. Each node should hold at least one shard. 
- do not let a shard grow bigger than your JVM heap (this is really a rough estimation) so segment merging will work flawlessly - if you want fast recovery, or if you want to move shards around (not a common case), the smaller a shard is the faster the operation will get done In case you are worried about shards getting out of bounds, you can reindex with a higher number of shards (having the _source enabled is always an advantage for reindexing) with your favorite custom tool. Reindexing can take significant time, and may not be an option if you can't stop indexing. Jörg On Fri, Aug 9, 2013 at 4:32 PM, David Arata david...@gmail.com wrote: My concern is what would be the best strategy so that an index or a single shard in an index does not get too big for a node to handle, and if it's approaching that size, what can be done?
Re: Looking to build a logging solution with threshold alerting.
There was another thread on this very recently, and some people are using riemann for this. Take a look in the archives and you can probably find some useful info. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 2 July 2014 22:53, Joshua Hall joshuadeanh...@gmail.com wrote: I am looking to build a logging solution and wanted to make sure that I am not missing any key components. The logs that I have are currently stored in a database to which there is limited access, due to locking risks from bad queries. My plan is to have the DBAs write the logs from the database tables to a file on a set interval, then have Logstash pick up the logs and write them to Elasticsearch. Then for viewing/searching the logs I will be using Kibana. Everything up to this point I have been able to make a proof of concept for, but the other request was to have alerting. I have spent some time looking at this, and the general response seems to be to use percolation, but that only makes sense if you want to send an alert when a single error matches a query; from what I have seen, there is no way to build a threshold alerting system using percolation. My thought to solve the threshold alerting is to create a simple web UI that allows the user to enter a query to search for, a threshold, a time frame, and emails to send the alert to, all of which would get stored in Elasticsearch. Then an app (running as a Windows service or cron job) would pull the alerts, run the queries, and check the time frame and threshold (it would run on some interval). If the count surpasses the threshold, it would send an email to the stored addresses. I know that SPM seems to cover this and more, but we are currently looking to see if we can do this without buying another product. Is this the correct approach to take, or should I be looking at doing something else?
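The polling app described in the question is simple to sketch. A minimal version in plain Python, where count_fn stands in for a count query against ES over the alert's time frame; all names and numbers are illustrative, not from the thread:

```python
def check_alerts(alerts, count_fn):
    """Run each saved alert's query over its time frame and return the
    alerts whose hit count exceeds the threshold."""
    triggered = []
    for alert in alerts:
        hits = count_fn(alert["query"], alert["timeframe_minutes"])
        if hits > alert["threshold"]:
            triggered.append((alert["name"], hits))
    return triggered

alerts = [
    {"name": "login errors", "query": "error AND login",
     "timeframe_minutes": 5, "threshold": 100},
    {"name": "timeouts", "query": "timeout",
     "timeframe_minutes": 5, "threshold": 10},
]
# Stand-in counts: pretend ES returned 42 and 57 hits respectively.
fake_counts = {"error AND login": 42, "timeout": 57}
print(check_alerts(alerts, lambda q, t: fake_counts[q]))  # [('timeouts', 57)]
```

In a real deployment, count_fn would issue a _count or filtered search restricted to a @timestamp range, and the triggered list would feed the email step.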
Re: Visibility
Depends what you want to do, really. There are plugins like ElasticHQ, Marvel, kopf and bigdesk that will give you some info. You can also hook collectd into the stack and take metrics, or use plugins from Nagios etc. What monitoring platforms do you have in place now?

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 3 July 2014 07:49, smonasco smona...@gmail.com wrote:

Hi, I'm trying to get a lot more visibility and metrics into what's going on under the hood. Occasionally we see spikes in memory. I'd like to get heap memory used on a per-shard basis. If I'm not mistaken, somewhere, somehow, the Lucene index that backs a shard is using memory in the heap, and I'd like to collect that metric. It may also be an operation somewhere higher up at the Elasticsearch level, where results are merged from shards or from indexes (maybe Elasticsearch doesn't bother to merge twice but merges once); that's also a memory space I'd like to collect data on.

Per-query memory use would also be interesting, though perhaps obviously too much to keep up with for every query (maybe a future opt-in feature, unless it's already there and I'm missing it). Other cluster events, like nodes entering and exiting the cluster or the master changing, would also be nice to collect. I'm guessing some of this isn't available and some of it is, but my Google-fu seems to be lacking. I'm pretty sure I can poll to figure out that the events happened, but I was wondering if there is something in the Java client node where I could get a Future or some other hook to turn it into a push instead of a pull. Any help will be appreciated; I'm aware it's a wide net.

--Shannon Monasco
Bulk API: different values for the same parameter at batch and operation level?
Hi there, I noticed that in the Java bulk API, some parameters can be set on both the per-batch-request level and the per-operation level, e.g. the consistency level parameter: BulkRequestBuilder#setConsistencyLevel vs. IndexRequestBuilder#setConsistencyLevel. What if the parameter has different values at these two levels? Will the per-operation one override the per-batch-request one? Thanks
Re: Splunk vs. Elastic search performance?
In the latest version of Logstash, you can use the elasticsearch output and just set the protocol to http. The elasticsearch_http output will be removed eventually.

On Monday, June 23, 2014 9:22:28 AM UTC-7, Ivan Brusic wrote:

I agree. I thought elasticsearch_http was actually the recommended route. Also, I have seen no reported issues with different client/server versions since 1.0. My current Logstash setup (which is not production level, simply a dev logging tool) uses Elasticsearch 1.2.1 with Logstash 1.4.1 using the non-HTTP interface. -- Ivan

On Fri, Jun 20, 2014 at 3:29 PM, Mark Walkom ma...@campaignmonitor.com wrote:

I wasn't aware that the elasticsearch_http output wasn't recommended? When I spoke to a few of the ELK devs a few months ago, they indicated that there was minimal performance difference, with the greater benefit of not being locked to specific LS+ES versioning.

Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 21 June 2014 02:43, Brian brian@gmail.com wrote:

Thomas, thanks for your insights and experiences. As someone who has explored and used ES for over a year but is relatively new to the ELK stack, your data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend ELK users disable the _all field. The entire text of the log events generated by Logstash ends up in the message field (not @message, as many people incorrectly post), so the _all field is just redundant overhead with no added value. The result is a dramatic drop in database file sizes and a dramatic increase in load performance. Of course, you then need to configure ES to use the message field as the default for a Lucene Kibana query.
During the year that I've used ES and watched this group, I have been on the front line of a brand-new product with a smart and dedicated development team working steadily to improve it. Six months ago the ELK stack eluded me and reports weren't encouraging (with the sole exception of the Kibana web site's marketing pitch). But ES has come a long way since then, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and prevent external (to the Splunk db itself, not to our company) users from causing harm to data. But Kibana seems to be meant for a small cadre of trusted users. What if I write a dashboard with the same name as someone else's? Kibana doesn't even begin to discuss user isolation, but I am confident that it will. How can I tell Kibana to set the default Lucene query operator to AND instead of OR? Google is not my friend: I keep getting references to the Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and promising, but it has a long way to go before deployment to all of the folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and Logstash itself has been an excellent tool for prototyping. The book has been invaluable in helping me extract dates from log events and handle all of our different multiline events. But it still doesn't explain why the date filter needs a different array of matching strings to get the date that the grok filter has already matched and isolated. And recommendations to avoid the elasticsearch_http output and use elasticsearch (via the node client) directly contradict the fact that Logstash's 1.1.1 version of the ES client library is not compatible with the most recent 1.2.1 version of ES.
And Logstash is also a resource hog, so we eventually plan to replace it with Perl and Apache Flume (already in use) and pipe into my Java bulk load tool (which is always kept up to date with the versions of ES we deploy!). Because we send the data via Flume to our data warehouse, any losses in ES will be annoying but not catastrophic. The front-end following of rotated log files will be done with the GNU *tail -F* command: with its uppercase -F option, GNU tail follows rotated log files perfectly. I doubt that Logstash can do the same, and we currently see that neither can Splunk (so we sporadically lose log events in Splunk too). So GNU tail -F piped into Logstash with the stdin input works perfectly in my evaluation setup and will likely form the first stage of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The system is slow but OK to use. We tried Elasticsearch and were able to get the same performance with the same number of machines.
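Brian's recommendation to disable the _all field can be applied via an index template along these lines (ES 1.x mapping syntax; the template name and pattern are illustrative):

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}
```

With _all disabled, set the message field as the default search field (index.query.default_field) so unqualified Kibana/Lucene queries still match.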
elasticsearch high cpu usage
Hi, I have a 5-node cluster with 1 replica. Total document size is 216 MB across 853,000 docs. I am suffering from very high CPU usage, every hour and every early morning from about 05:00 to 09:00, as you can see in my Cacti graph. Only Elasticsearch runs on this server, so I thought something was wrong with the ES process, but there are only a few server requests at CPU peak time, and there is no cron job either.

$ ./elasticsearch -v
Version: 1.1.1, Build: f1585f0/2014-04-16T14:27:12Z, JVM: 1.7.0_55

$ java -version
java version 1.7.0_55
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

Installed Elasticsearch plugins: HQ, bigdesk, head, kopf, sense.

ES log at CPU peak time:

[2014-07-03 08:01:00,045][DEBUG][action.search.type ] [node1] [search][4], node[GJjzCrLvQQ-ZRRoqL13MrQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@451f9e7c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@68ab486b
    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:293)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:300)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:190)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
    at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
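For what it's worth, the EsRejectedExecutionException above means the search thread pool's queue (capacity 300) overflowed under a burst of search requests. On ES 1.x the queue can be enlarged in elasticsearch.yml, though this only masks the symptom if something is flooding the cluster with searches at those peak times:

```yaml
# elasticsearch.yml (ES 1.x): raise the search queue from its default.
# This relieves rejections but does not address the source of the burst.
threadpool.search.queue_size: 1000
```

Finding what issues the queries at 05:00-09:00 (scheduled jobs, monitoring polls, client retries) is still the real fix.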
Re: Visibility
I currently record basically everything in bigdesk: all the numerics from cluster health, cluster state, nodes info, node stats, index status and segments. I want memory allocated on a per-shard level for Lucene-level actions and for query-level actions (outside the field and filter caches), plus hooks into events like nodes entering and exiting the cluster, new indexes, alias and other administrative changes, and master elections. Basically, when it comes to memory I'd like to have all parts of the heap accounted for. Field + filter cache does not account for whatever process is spiking, nor does it explain most of the heap: with 29 GB in use and garbage collection taking minutes but not reclaiming anything, Elasticsearch only reports 7 GB in cache. We can discuss my particular memory problems and solutions, but mostly I'm after the visibility.

--Shannon Monasco

On Jul 2, 2014 5:50 PM, Mark Walkom ma...@campaignmonitor.com wrote:

Depends what you want to do, really. There are plugins like ElasticHQ, Marvel, kopf and bigdesk that will give you some info. You can also hook collectd into the stack and take metrics, or use plugins from Nagios etc. What monitoring platforms do you have in place now?
Re: Are there any facets that can be used to co-relate log events ?
Hey Matthew, sorry, no luck with that. Cheers, Aditya

On Jul 3, 2014 2:22 AM, Matthew Morrison mmorri...@broadsoft.com wrote:

Hi Aditya, I'm looking to do something similar; did you have any success with this problem? Thanks, Matt

On Wednesday, January 22, 2014 11:53:36 PM UTC+13, Aditya Pavan Kumar Vegesna wrote:

Hi, I am looking for a way to correlate multiple log events and then calculate the time duration between those events, e.g. a request log event and a response log event, computing the difference in timestamps to assess the performance of the application. Can anyone help me with how this can be achieved? Thanks, Pavan Kumar
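Since facets alone can't join two separate documents, the usual answer is to correlate outside Elasticsearch. A minimal sketch of that post-processing, pairing request and response events by a shared correlation id and computing the elapsed time (event shape and field names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: pair request/response log events by a shared
// correlation id and compute the elapsed time between their timestamps.
public class EventCorrelationSketch {
    // correlation id -> request timestamp (epoch milliseconds)
    static Map<String, Long> requestTimes = new HashMap<>();

    // Record a request event as it streams past.
    static void onRequest(String correlationId, long timestampMillis) {
        requestTimes.put(correlationId, timestampMillis);
    }

    // On a response, return the duration in ms, or -1 if no matching request was seen.
    static long onResponse(String correlationId, long timestampMillis) {
        Long start = requestTimes.remove(correlationId);
        return start == null ? -1 : timestampMillis - start;
    }

    public static void main(String[] args) {
        onRequest("req-42", 1_000L);
        System.out.println(onResponse("req-42", 1_250L)); // 250
        System.out.println(onResponse("req-99", 2_000L)); // -1 (unmatched)
    }
}
```

The computed duration could then be indexed as its own field, at which point ordinary stats facets/aggregations over it become straightforward.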