from:"Eric"

i'm able to get the mappings working as you suggested. However, the custom 
_id mapping is not working.

it's still generating a dynamic _id.

any ideas?


On Wednesday, April 30, 2014 5:07:57 PM UTC-4, Eric Sims wrote:
>
> should i keep the mapping in the original PUT /_river/mytest_river/_meta 
> statement or removing in lieu of the other separate mapping statement?
>
> because i tried what you just suggested and it didn't seem to make a 
> difference with having removed the mapping statement within the river.
>
> On Wednesday, April 30, 2014 3:56:22 PM UTC-4, Eric Sims wrote:
>>
>> i can't seem to understand how to fully set up my type mappings while 
>> using jdbc rivers and sql server.
>>
>> here's an example.
>>
>> PUT /_river/mytest_river/_meta
>> {
>> "type": "jdbc",
>> "jdbc": {
>>   "url":"jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
>>   "user":"myuser",
>>   "password":"xxx",
>>   "sql":"select * from dbo.musicalbum (nolock)",
>>   "strategy" : "oneshot",
>>   "index" : "myindex",
>>   "type" : "album",
>>   "bulk_size" : 100,
>>   "max_retries": 5,
>>   "max_retries_wait":"30s",
>>   "max_bulk_requests" : 5,
>>   "bulk_flush_interval" : "5s",
>>   "type_mapping": {
>>   "album": {"properties": {
>>"AlbumDescription": {"type": "string"},
>>"AlbumID": {"type": "string"},
>>"Artist": {"type": "string"},
>>"Genre": {"type": "string","index" : "not_analyzed"},
>>"Label": {"type": "string"},
>>"Title": {"type": "string"},
>>"_id" : {"path" : "AlbumID"}
>> }
>>   }
>>}
>> }
>> }
>>
>> so you can see i've specified both a select statement (which normally 
>> would dynamically produce the mapping for me) and also a type mapping. in 
>> the type mapping i've tried to specify that i want the _id to be the same 
>> as AlbumID, and also that i want the Genre to be not_analyzed. it ends up 
>> throwing multiple errors, only indexing one document, and not creating my 
>> full mapping.
>>
>> here's what the mapping ends up looking like: (skipping some of the 
>> columns altogether!)
>>
>> {
>>"myindex": {
>>   "mappings": {
>>  "album": {
>> "properties": {
>>"AlbumDescription": {
>>   "type": "string"
>>},
>>"AlbumID": {
>>   "type": "string"
>>},
>>"Artist": {
>>   "type": "string"
>>},
>>"Genre": {
>>   "type": "string"
>>},
>>"Title": {
>>   "type": "string"
>>}
>> }
>>  }
>>   }
>>}
>> }
>>
>> any assistance would be helpful. it's driving me nuts.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/03aec83a-b365-44d7-bbe0-89dd9e486ac1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: help with jdbc rivers and type mapping

forget that part - i didn't need the 
"myindex" : { "mappings" { 
part.

other issue still stands though.

On Wednesday, April 30, 2014 5:17:00 PM UTC-4, Eric Sims wrote:
>
> here's another weird bit. it doesn't seem to show the mappings right after 
> i set them:
>
> PUT /myindex/album/_mapping
> {
>   "myindex": {
> "mappings": {
>"album": {
>   "properties": {
>"albumdescription": {"type": "string"},
>"albumid": {"type": "string"},
>"artist": {"type": "string"},
>"genre": {"type": "string", "index" : "not_analyzed"},
>"label": {"type": "string", "analyzer": "whitespace"},
>"title": {"type": "string"},
>"time": {"type" : "string"},
>"_id" : {
> "index_name" : "album.AlbumID", 
> "path" : "full", 
> "type" : "string"
>}
> }
>}
> }
>   }
> }
>
>
> GET /myindex/album/_mapping
>
> returns this:
>
> {
>"myindex": {
>   "mappings": {
>  "album": {
> "properties": {}
>  }
>   }
>}
> }
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cf3b18c2-3393-413c-98c8-e6e84ac7c5ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: help with jdbc rivers and type mapping

here's another weird bit. it doesn't seem to show the mappings right after 
i set them:

PUT /myindex/album/_mapping
{
  "myindex": {
"mappings": {
   "album": {
  "properties": {
   "albumdescription": {"type": "string"},
   "albumid": {"type": "string"},
   "artist": {"type": "string"},
   "genre": {"type": "string", "index" : "not_analyzed"},
   "label": {"type": "string", "analyzer": "whitespace"},
   "title": {"type": "string"},
   "time": {"type" : "string"},
   "_id" : {
"index_name" : "album.AlbumID", 
"path" : "full", 
"type" : "string"
   }
}
   }
}
  }
}


GET /myindex/album/_mapping

returns this:

{
   "myindex": {
  "mappings": {
 "album": {
"properties": {}
 }
  }
   }
}

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/55b7887e-43e3-4836-bef7-55e4c9c6c8e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: help with jdbc rivers and type mapping

should i keep the mapping in the original PUT /_river/mytest_river/_meta 
statement or removing in lieu of the other separate mapping statement?

because i tried what you just suggested and it didn't seem to make a 
difference with having removed the mapping statement within the river.

On Wednesday, April 30, 2014 3:56:22 PM UTC-4, Eric Sims wrote:
>
> i can't seem to understand how to fully set up my type mappings while 
> using jdbc rivers and sql server.
>
> here's an example.
>
> PUT /_river/mytest_river/_meta
> {
> "type": "jdbc",
> "jdbc": {
>   "url":"jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
>   "user":"myuser",
>   "password":"xxx",
>   "sql":"select * from dbo.musicalbum (nolock)",
>   "strategy" : "oneshot",
>   "index" : "myindex",
>   "type" : "album",
>   "bulk_size" : 100,
>   "max_retries": 5,
>   "max_retries_wait":"30s",
>   "max_bulk_requests" : 5,
>   "bulk_flush_interval" : "5s",
>   "type_mapping": {
>   "album": {"properties": {
>"AlbumDescription": {"type": "string"},
>"AlbumID": {"type": "string"},
>"Artist": {"type": "string"},
>"Genre": {"type": "string","index" : "not_analyzed"},
>"Label": {"type": "string"},
>"Title": {"type": "string"},
>"_id" : {"path" : "AlbumID"}
> }
>   }
>}
> }
> }
>
> so you can see i've specified both a select statement (which normally 
> would dynamically produce the mapping for me) and also a type mapping. in 
> the type mapping i've tried to specify that i want the _id to be the same 
> as AlbumID, and also that i want the Genre to be not_analyzed. it ends up 
> throwing multiple errors, only indexing one document, and not creating my 
> full mapping.
>
> here's what the mapping ends up looking like: (skipping some of the 
> columns altogether!)
>
> {
>"myindex": {
>   "mappings": {
>  "album": {
> "properties": {
>"AlbumDescription": {
>   "type": "string"
>},
>"AlbumID": {
>   "type": "string"
>},
>"Artist": {
>   "type": "string"
>},
>"Genre": {
>   "type": "string"
>},
>"Title": {
>   "type": "string"
>}
> }
>  }
>   }
>}
> }
>
> any assistance would be helpful. it's driving me nuts.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6198cd66-c2b9-42e9-a8a8-f8ca2fba9ee5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: help with jdbc rivers and type mapping

no. i just tried deleting all indexes, then i did:

PUT /myindex

then 

PUT /myindex/album/_mapping
{
  "myindex": {
"mappings": {
   "album": {
  "properties": {
   "AlbumDescription": {"type": "string"},
   "AlbumID": {"type": "string"},
   "Artist": {"type": "string"},
   "Genre": {"type": "string","index" : "not_analyzed"},
   "Label": {"type": "string"},
   "Title": {"type": "string"},
   "_id" : {"path" : "AlbumID"}
}
   }
}
  }
}

then i ran the PUT statement in my previous post.

it still treats it as dynamic mappings

On Wednesday, April 30, 2014 3:56:22 PM UTC-4, Eric Sims wrote:
>
> i can't seem to understand how to fully set up my type mappings while 
> using jdbc rivers and sql server.
>
> here's an example.
>
> PUT /_river/mytest_river/_meta
> {
> "type": "jdbc",
> "jdbc": {
>   "url":"jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
>   "user":"myuser",
>   "password":"xxx",
>   "sql":"select * from dbo.musicalbum (nolock)",
>   "strategy" : "oneshot",
>   "index" : "myindex",
>   "type" : "album",
>   "bulk_size" : 100,
>   "max_retries": 5,
>   "max_retries_wait":"30s",
>   "max_bulk_requests" : 5,
>   "bulk_flush_interval" : "5s",
>   "type_mapping": {
>   "album": {"properties": {
>"AlbumDescription": {"type": "string"},
>"AlbumID": {"type": "string"},
>"Artist": {"type": "string"},
>"Genre": {"type": "string","index" : "not_analyzed"},
>"Label": {"type": "string"},
>"Title": {"type": "string"},
>"_id" : {"path" : "AlbumID"}
> }
>   }
>}
> }
> }
>
> so you can see i've specified both a select statement (which normally 
> would dynamically produce the mapping for me) and also a type mapping. in 
> the type mapping i've tried to specify that i want the _id to be the same 
> as AlbumID, and also that i want the Genre to be not_analyzed. it ends up 
> throwing multiple errors, only indexing one document, and not creating my 
> full mapping.
>
> here's what the mapping ends up looking like: (skipping some of the 
> columns altogether!)
>
> {
>"myindex": {
>   "mappings": {
>  "album": {
> "properties": {
>"AlbumDescription": {
>   "type": "string"
>},
>"AlbumID": {
>   "type": "string"
>},
>"Artist": {
>   "type": "string"
>},
>"Genre": {
>   "type": "string"
>},
>"Title": {
>   "type": "string"
>}
> }
>  }
>   }
>}
> }
>
> any assistance would be helpful. it's driving me nuts.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1bda2b24-8fc4-4706-a43f-cadf820ebc6c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

help with jdbc rivers and type mapping

i can't seem to understand how to fully set up my type mappings while using 
jdbc rivers and sql server.

here's an example.

PUT /_river/mytest_river/_meta
{
"type": "jdbc",
"jdbc": {
  "url":"jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
  "user":"myuser",
  "password":"xxx",
  "sql":"select * from dbo.musicalbum (nolock)",
  "strategy" : "oneshot",
  "index" : "myindex",
  "type" : "album",
  "bulk_size" : 100,
  "max_retries": 5,
  "max_retries_wait":"30s",
  "max_bulk_requests" : 5,
  "bulk_flush_interval" : "5s",
  "type_mapping": {
  "album": {"properties": {
   "AlbumDescription": {"type": "string"},
   "AlbumID": {"type": "string"},
   "Artist": {"type": "string"},
   "Genre": {"type": "string","index" : "not_analyzed"},
   "Label": {"type": "string"},
   "Title": {"type": "string"},
   "_id" : {"path" : "AlbumID"}
}
  }
   }
}
}

so you can see i've specified both a select statement (which normally would 
dynamically produce the mapping for me) and also a type mapping. in the 
type mapping i've tried to specify that i want the _id to be the same as 
AlbumID, and also that i want the Genre to be not_analyzed. it ends up 
throwing multiple errors, only indexing one document, and not creating my 
full mapping.

here's what the mapping ends up looking like: (skipping some of the columns 
altogether!)

{
   "myindex": {
  "mappings": {
 "album": {
"properties": {
   "AlbumDescription": {
  "type": "string"
   },
   "AlbumID": {
  "type": "string"
   },
   "Artist": {
  "type": "string"
   },
   "Genre": {
  "type": "string"
   },
   "Title": {
  "type": "string"
   }
}
 }
  }
   }
}

any assistance would be helpful. it's driving me nuts.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4c9af783-cf6c-4e41-a287-83ff5589350e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

trouble with plugins - phonetic, etc

2014-04-28 Thread Eric Sims

sorry for a noob question. i'm trying to understand phonetic searches - how 
to install and use them.

perhaps phonetics isn't the right way for my instance.

i'm trying to return music artist results for 'lil wayne', but account for 
the user to type 'little wayne'.

i've created and populated an index called /music/artist

so i've installed the phonetic plugin and config is like so:
 - (i created another index (since it won't allow me to put it into 
/music/artist))

PUT /music_admin
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "my_metaphone"]
}
},
"filter" : {
"my_metaphone" : {
"type" : "phonetic",
"encoder" : "metaphone",
"replace" : false
}
}
}
}
}

this feels wrong. i know. i'm confused at this point as to how to use the 
search. i have a field called 'artist' that i would be searching in.

please help!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c4b78cef-e10c-4f0b-9b5f-07c7cc8d03f8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: elasticsearch 1.1.1 initialization failed

2014-04-18 Thread Eric Jain

This issue has been resolved with cloud-aws 2.1.1:

  https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/74


On Thursday, April 17, 2014 6:32:05 PM UTC-7, Eric Jain wrote:
>
> Just tried to upgrade elasticsearch 1.1.0 to 1.1.1 (with the cloud-aws 
> plugin 2.1.0), and am no longer able to start any nodes:
>
> 2014-04-18 01:19:42,754 [INFO] node - [Skywalker] version[1.1.1], 
> pid[22901], build[f1585f0/2014-04-16T14:27:12Z]
> 2014-04-18 01:19:42,767 [INFO] node - [Skywalker] initializing ...
> 2014-04-18 01:19:42,802 [INFO] plugins - [Skywalker] loaded [cloud-aws], 
> sites []
> 2014-04-18 01:19:50,019 [ERROR] bootstrap - {1.1.1}: Initialization Failed 
> ...
> 1) 
> NoSuchMethodError[org.elasticsearch.gateway.blobstore.BlobStoreGateway.(Lorg/elasticsearch/common/settings/Settings;Lorg/elasticsearch/threadpool/ThreadPool;Lorg/elasticsearch/cluster/ClusterService;)V]
>
> Anyone else see this issue?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cb86660c-82d1-4580-8b72-d1e78866a6c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: S3 gateway issues

2014-04-17 Thread Eric Jain

The S3 gateway from the cloud-aws 2.1.0 plugin works fine up to 
elasticsearch 1.1.0, but appears to be broken with 1.1.1, see my other post.


On Friday, April 11, 2014 1:01:34 AM UTC-7, David Pilato wrote:
>
> What is the cloud-aws plugin version please?
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | 
> @elasticsearchfr
>
>
> Le 11 avril 2014 à 07:43:49, Ankur Goel (ankr...@gmail.com ) 
> a écrit:
>
> Hi David ,
>
> thanks for replying ,
>
> I am using version
>
> "number" : "1.0.0",
> we have AWS plugin, we have removed S3 gateway for now ,
> will switch to local but just wanted to make sure why we are getting this 
> error, 
> It will be really helpful to avoid any surprises in future. 
>
>
> On Thursday, 10 April 2014 18:11:48 UTC+5:30, Ankur Goel wrote: 
>>
>> hi,
>>
>> I am using s3 gateway in a application , elastic search version 1.x  , I 
>> had a strange exception while starting my nodes , please take a look
>>
>>
>>
>> Error injecting constructor, java.lang.UnsupportedOperationException
>>   at org.elasticsearch.gateway.s3.S3Gateway.(Unknown Source)
>>   while locating org.elasticsearch.gateway.s3.S3Gateway
>>   while locating org.elasticsearch.gateway.Gateway
>> Caused by: java.lang.UnsupportedOperationException
>> at 
>> org.elasticsearch.cluster.metadata.RestoreMetaData$Factory.fromXContent(RestoreMetaData.java:462)
>> at 
>> org.elasticsearch.cluster.metadata.RestoreMetaData$Factory.fromXContent(RestoreMetaData.java:400)
>> at 
>> org.elasticsearch.cluster.metadata.MetaData$Builder.fromXContent(MetaData.java:1323)
>> at 
>> org.elasticsearch.gateway.blobstore.BlobStoreGateway.readMetaData(BlobStoreGateway.java:213)
>> at 
>> org.elasticsearch.gateway.blobstore.BlobStoreGateway.findLatestIndex(BlobStoreGateway.java:198)
>> at 
>> org.elasticsearch.gateway.blobstore.BlobStoreGateway.initialize(BlobStoreGateway.java:73)
>> at 
>> org.elasticsearch.gateway.s3.S3Gateway.(S3Gateway.java:97)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
>> Method)
>> at 
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> at 
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>> at 
>> org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
>> at 
>> org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
>> at 
>> org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
>> at 
>> org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
>> at 
>> org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
>> at 
>> org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
>> at 
>> org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
>> at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
>> at 
>> org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
>> at 
>> org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
>> at 
>> org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
>> at 
>> org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
>> at 
>> org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
>> at 
>> org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
>> at 
>> org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
>> at 
>> org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
>> at 
>> org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
>> at 
>> org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
>> at 
>> org.elasticsearch.node.internal.InternalNode.(InternalNode.java:187)
>> at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
>>
>>
>> I am trying to understand what is happening here ,  the exception looks 
>> like it has happened while trying to recover index data but beyond that but 
>> I cannot get a clue , please help
>>
>  --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com

elasticsearch 1.1.1 initialization failed

2014-04-17 Thread Eric Jain

Just tried to upgrade elasticsearch 1.1.0 to 1.1.1 (with the cloud-aws 
plugin 2.1.0), and am no longer able to start any nodes:

2014-04-18 01:19:42,754 [INFO] node - [Skywalker] version[1.1.1], 
pid[22901], build[f1585f0/2014-04-16T14:27:12Z]
2014-04-18 01:19:42,767 [INFO] node - [Skywalker] initializing ...
2014-04-18 01:19:42,802 [INFO] plugins - [Skywalker] loaded [cloud-aws], 
sites []
2014-04-18 01:19:50,019 [ERROR] bootstrap - {1.1.1}: Initialization Failed 
...
1) 
NoSuchMethodError[org.elasticsearch.gateway.blobstore.BlobStoreGateway.(Lorg/elasticsearch/common/settings/Settings;Lorg/elasticsearch/threadpool/ThreadPool;Lorg/elasticsearch/cluster/ClusterService;)V]

Anyone else see this issue?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d5e85a3a-eac1-4a5e-b16d-69fa825b0ebb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Function Score Query and Native scripts

2014-04-12 Thread Eric T

Hi,

The function score documentation doesn't mention any support for native 
scripts, does it still work for the Function Score Query, if so is it the 
same syntax? 
I'm using the custom_filters_score query with a native script but the query 
is deprecated in the latest ES version. I'm still using 0.90.3 but I plan 
to upgrade to the latest version. 

It says that the script_score function for function_score is cached. Does 
this provide the same performance as the Native script? I'm wondering if 
it's necessary to still use a native script or convert it to the 
script_score function

thanks
Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ac58df6-742f-4bcb-8ac2-856adb15%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Question about scoring behaviour

2014-03-30 Thread Eric T

I created a new index that includes both the old "autocomplete" multi-field 
and a new multi-field called "autocompletenew" that contains omit_norms : 
true

I did the same query on the two fields and the results are here
https://gist.github.com/ewltang/33ab829c404130c935ac

The scoring is consistent for both but I find the query on the original 
field seems to return results that make more sense to me. For example 
"PaulJones" is the first result and then followed by PaulJones with one 
numerical digit. The second result is more random with "PaulJones" being 
second. The rest of the results contain longer variations of PaulJones.  

I was expecting that the query on autocompletenew to return the results 
that the query on the original field returns. I also didn't expect the 
first query to return the results that I want since the multi-field doesn't 
have omit_norms: true.  Is this the expected behaviour? 



On Friday, March 28, 2014 12:18:02 AM UTC-4, Eric T wrote:
>
> Hi Ivan,
>
> No I don't apply any boost at index time. 
>
> I did not disable norms on the uname.autocomplete field, I will have to 
> get back to you on the result. I'm using 0.90.2.
>
> thanks
> Eric
>
>
> On Thu, Mar 27, 2014 at 8:55 PM, Ivan Brusic  wrote:
>
>> The difference is the fieldNorm. This field holds any boosts (both 
>> document and field level) and any length normalization. It is only 1 byte, 
>> so it is incredibly lossy. Did you apply an index time boost to either the 
>> field or document?
>>
>> Have you tried disabling norms on ngram fields? Which version of 
>> elasticsearch are you using? I noticed you used the old format 
>> "omit_norms":true
>> instead of  
>> "norms": { "enabled": false }
>>
>> -- 
>> Ivan
>>
>>
>> On Thu, Mar 27, 2014 at 1:28 PM, Eric T  wrote:
>>
>>> Hello,
>>>
>>> I'm running a test of my query and mapping shown here:
>>> https://gist.github.com/ewltang/9c00155525784b620ca9
>>>
>>> I'm searching for "pauljones" in the uname field. In the results the 
>>> fifth document containing "pauljones10297" has a score of 16.027834, while 
>>> the 6th document containing "PaulJones" has a score of 5.008698.
>>> Why is the score for the 5th document so much higher than the 6th? 
>>>
>>> Regards,
>>> Eric
>>>
>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/6dc50a09-1090-463d-b8d0-ac6186789509%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/6dc50a09-1090-463d-b8d0-ac6186789509%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/4LoViXRFa7A/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCo5y0VxaXfE72Fk0SOefnjH1_VXyrqfJoGjbhtywm7SA%40mail.gmail.com<https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCo5y0VxaXfE72Fk0SOefnjH1_VXyrqfJoGjbhtywm7SA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e124bce1-ff32-484d-9c30-3231cb508e96%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Is it possible to get the forward index table?

2014-03-28 Thread Eric Lu

hi,

Our system needs to calculate the TFs(term frequency) of the result of 
search. i.e, we get a set of documents by a "query_string" search, and we 
want to get the TF of every document.
As we know, elasticseach provides the inverted index(term>documents), 
but it do produce forward index table(document--->terms) when index a 
document. is it possible to get it?

If it is not possible, we have to manually process the search result to get 
the documents' TF. It actually is what elasticsearch has done before.

Or any suggestions? thank you.

eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/96fac536-22de-4448-a00b-5c0b7660a743%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Question about scoring behaviour

2014-03-27 Thread Eric T

Hi Ivan,

No I don't apply any boost at index time.

I did not disable norms on the uname.autocomplete field, I will have to get
back to you on the result. I'm using 0.90.2.

thanks
Eric


On Thu, Mar 27, 2014 at 8:55 PM, Ivan Brusic  wrote:

> The difference is the fieldNorm. This field holds any boosts (both
> document and field level) and any length normalization. It is only 1 byte,
> so it is incredibly lossy. Did you apply an index time boost to either the
> field or document?
>
> Have you tried disabling norms on ngram fields? Which version of
> elasticsearch are you using? I noticed you used the old format
> "omit_norms":true
> instead of
> "norms": { "enabled": false }
>
> --
> Ivan
>
>
> On Thu, Mar 27, 2014 at 1:28 PM, Eric T  wrote:
>
>> Hello,
>>
>> I'm running a test of my query and mapping shown here:
>> https://gist.github.com/ewltang/9c00155525784b620ca9
>>
>> I'm searching for "pauljones" in the uname field. In the results the
>> fifth document containing "pauljones10297" has a score of 16.027834, while
>> the 6th document containing "PaulJones" has a score of 5.008698.
>> Why is the score for the 5th document so much higher than the 6th?
>>
>> Regards,
>> Eric
>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/6dc50a09-1090-463d-b8d0-ac6186789509%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/6dc50a09-1090-463d-b8d0-ac6186789509%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/4LoViXRFa7A/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCo5y0VxaXfE72Fk0SOefnjH1_VXyrqfJoGjbhtywm7SA%40mail.gmail.com<https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCo5y0VxaXfE72Fk0SOefnjH1_VXyrqfJoGjbhtywm7SA%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAP43LAsyfAAhy173AYg4QW0%3DZ%2BQAVpu%3DNPHzD4pDwwkX0Ftk3A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Question about scoring behaviour

2014-03-27 Thread Eric T

Hello,

I'm running a test of my query and mapping shown here:
https://gist.github.com/ewltang/9c00155525784b620ca9

I'm searching for "pauljones" in the uname field. In the results the fifth 
document containing "pauljones10297" has a score of 16.027834, while the 
6th document containing "PaulJones" has a score of 5.008698.
Why is the score for the 5th document so much higher than the 6th? 

Regards,
Eric


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6dc50a09-1090-463d-b8d0-ac6186789509%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Windows Elasticsearch cluster performance tuning

2014-03-23 Thread Eric Brandes

Interesting - so in general would you recommend consolidating all 400 
indexes in to a single index and using aliases/filters to address them?  
(they're currently broken out by user, and all operations are scoped to a 
specific user)

If I were to consolidate to a single index, how many shards would be 
recommended?

On Sunday, March 23, 2014 2:00:18 PM UTC-5, David Pilato wrote:
>
> Forget to say that you should extra large instances and not large.
> With larges, you could suffer from noisy neighbors.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 23 mars 2014 à 19:54, David Pilato > a 
> écrit :
>
> IMHO 800 shards per node is far too much. And with only 4gb of memory...
>
> I guess you have lot of GC or you forget to disable SWAP.
>
> My 2 cents.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 23 mars 2014 à 18:08, Eric Brandes > 
> a écrit :
>
> Hey all, I have a 3 node Elasticsearch 1.0.1 cluster running on Windows 
> Server 2012 (in Azure).  There's about 20 million documents that take up a 
> total of 40GB (including replicas).  There's about 400 indexes in total, 
> with some having millions of documents and some having just a few.  Each 
> index is set to have 3 shards and 1 replica.   The main cluster is running 
> on three  4 core machines with 7GB of ram.  The min/max JVM heap size is 
> set to 4GB.  
>
> The primary use case for this cluster is faceting/aggregations over the 
> documents.  There's almost no full text searching, so everything is pretty 
> much based on exact values (which are stored but not analyzed at index time)
>
> When doing some term facets on a few of these indexes (the biggest one 
> contains about 8 million documents) I'm seeing really long response times 
> (> 5 sec).  There are potentially thousands of distinct values for the term 
> I'm faceting on, but I would have still expected faster performance.
>
> So my goal is to speed up these queries to get the responses sub second if 
> possible.  To that end I had some questions:
> 1) Would switching to Linux give me better performance in general?
> 2) I could collapse almost all of these 400 indexes in to a single big 
> index and use aliases + filters instead.  Would this be advisable?
> 3) Would mucking with the field data cache yield any better results?
>
>
> If I can add any more data to this discussion please let me know!
> Thanks!
> Eric
>
>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/eb5fb6bf-be2c-4d5f-b73a-edc1ef5813f1%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/eb5fb6bf-be2c-4d5f-b73a-edc1ef5813f1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/C6157A06-390B-45C0-8425-3723F37D3766%40pilato.fr<https://groups.google.com/d/msgid/elasticsearch/C6157A06-390B-45C0-8425-3723F37D3766%40pilato.fr?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/04ea8acb-a62f-4232-a483-bdde916c48c2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Windows Elasticsearch cluster performance tuning

2014-03-23 Thread Eric Brandes

Hey all, I have a 3 node Elasticsearch 1.0.1 cluster running on Windows 
Server 2012 (in Azure).  There's about 20 million documents that take up a 
total of 40GB (including replicas).  There's about 400 indexes in total, 
with some having millions of documents and some having just a few.  Each 
index is set to have 3 shards and 1 replica.   The main cluster is running 
on three  4 core machines with 7GB of ram.  The min/max JVM heap size is 
set to 4GB.  

The primary use case for this cluster is faceting/aggregations over the 
documents.  There's almost no full text searching, so everything is pretty 
much based on exact values (which are stored but not analyzed at index time)

When doing some term facets on a few of these indexes (the biggest one 
contains about 8 million documents) I'm seeing really long response times 
(> 5 sec).  There are potentially thousands of distinct values for the term 
I'm faceting on, but I would have still expected faster performance.

So my goal is to speed up these queries to get the responses sub second if 
possible.  To that end I had some questions:
1) Would switching to Linux give me better performance in general?
2) I could collapse almost all of these 400 indexes in to a single big 
index and use aliases + filters instead.  Would this be advisable?
3) Would mucking with the field data cache yield any better results?


If I can add any more data to this discussion please let me know!
Thanks!
Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb5fb6bf-be2c-4d5f-b73a-edc1ef5813f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Facets & multi-valued, numeric fields

2014-03-13 Thread Eric Jain

For sorting, elasticsearch lets me specify how I want to deal with fields 
that contain multiple numeric values, so I can have elasticsearch use e.g. 
the max value in each document.

Is there a similar option I can use when aggregating documents? For 
example, I might want to get the average of the max value in each document.

  http://stackoverflow.com/questions/22368807/facets-multi-valued-fields

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fa3ab4f6-1c95-4742-85b0-a13d414163de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: 3,000 events/sec Architecture

2014-03-12 Thread Eric

Yes, currently logstash is reading files that syslog-ng created. We already 
had the syslog-ng architecture in place so just kept rolling with that.


On Tuesday, March 11, 2014 11:16:42 PM UTC-4, Otis Gospodnetic wrote:
>
> Hi,
>
> Is that Logstash instance reading files that are produces by syslog-ng 
> servers?  Maybe not but if yes, have you considered using Rsyslog with 
> omelasticsearch instead to simplify the architecture?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tuesday, March 4, 2014 10:11:59 AM UTC-5, Eric wrote:
>>
>> Hello,
>>
>> I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 
>> months now and everything has worked out pretty good and we are ready to 
>> move it to production. Before building out the infrastructure, I want to 
>> make sure my shard/node/index setup is correct as that is the main part 
>> that I'm still a bit fuzzy on. Overall my setup is this:
>>
>> Servers
>> Networking Gear   
>>   syslog-ng server
>> End Points   ->   Load Balancer 
>>  >   syslog-ng server  --> Logs 
>> stored in 5 flat files on SAN storage
>> Security Devices 
>> syslog-ng server
>> Etc.
>>
>> I have logstash running on one of the syslog-ng servers and is basically 
>> reading the input of 5 different files and sending them to ElasticSearch. 
>> So within ElasticSearch, I am creating 5 different indexes a day so I can 
>> do granular user access control within Kibana.
>>
>> unix-$date
>> windows-$date
>> networking-$date
>> security-$date
>> endpoint-$date
>>
>> My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each on 
>> them. For my POC I have 2 and it's working fine for 2,000 events/second. My 
>> main concern is how I setup the ElasticSearch servers so they are as 
>> efficient as possible. With my 5 different indexes a day, and I plan on 
>> keeping ~1 month of logs within ES, is 3 servers enough? Should I have 1 
>> master node and the other 2 be just basic setups that are data and 
>> searching? Also, will 1 replica be sufficient for this setup or should I do 
>> 2 to be safe? In my POC, I've had a few issues where I ran out of memory or 
>> something weird happened and I lost data for a while so wanted to try to 
>> limit that as much as possible. We'll also have quite a few users 
>> potentially querying the system so I didn't know if I should setup a 
>> dedicated search node for one of these.
>>
>> Besides the ES cluster, I think everything else should be fine. I have 
>> had a few concerns about logstash keeping up with the amount of entries 
>> coming into syslog-ng but haven't seen much in the way of load balancing 
>> logstash or verifying if it's able to keep up or not. I've spot checked the 
>> files quite a bit and everything seems to be correct but if there is a 
>> better way to do this, I'm all ears.
>>
>> I'm going to have my KIbana instance installed on the master ES node, 
>> which shouldn't be a big deal. I've played with the idea of putting the ES 
>> servers on the syslog-ng servers and just have a separate NIC for the ES 
>> traffic but didn't want to bog down the servers a whole lot. 
>>
>> Any thoughts or recommendations would be greatly appreciated.
>>
>> Thanks,
>> Eric
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3f1637e2-c712-4e56-91be-32116d92a3ee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ES Stops Works Randomly

2014-03-07 Thread Eric

Hello,

I have 2 ES servers that are being fed by 1 logstash server and viewing the 
logs in Kibana. This is a POC to work out any issues before going into 
production. The system has ran for ~1 month and every few days, Kibana will 
stop showing logs at some random time in the middle of the night. Last 
night, the last log entry I received in Kibana was around 18:30. When I 
checked on the ES servers, it showed the master running and the secondary 
not running (from /sbin/service elasticsearch status), but I was able to do 
a curl on the localhost and it returned information. So not sure what's up 
with that. Anyway, when I do a status on the master node, I get this:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "gis-elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 186,
  "active_shards" : 194,
  "relocating_shards" : 0,
  "initializing_shards" : 7,
  "unassigned_shards" : 249
}

When I view the indexes, via "ls ...nodes/0/indeces/" it shows all indexes 
being modified today for some reason and there are new file for today's 
date.So I think I'm starting to catch back up after I restarted both 
servers but not sure why it failed in the first place. When I look at the 
logs on the master, I only see 4 warning errors at 18:57 and then the 
2ndary leaving the cluster. I don't see any logs on the secondary (Pistol) 
on why it stopped working or what truly happened.

[2014-03-06 18:57:04,121][WARN ][transport] [ElasticSearch 
Server1] Transport response handler not found of id [64147630]
[2014-03-06 18:57:04,124][WARN ][transport] [ElasticSearch 
Server1] Transport response handler not found of id [64147717]
[2014-03-06 18:57:04,124][WARN ][transport] [ElasticSearch 
Server1] Transport response handler not found of id [64147718]
[2014-03-06 18:57:04,124][WARN ][transport] [ElasticSearch 
Server1] Transport response handler not found of id [64147721]

[2014-03-06 19:56:08,467][INFO ][cluster.service  ] [ElasticSearch 
Server1] removed 
{[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.1.1.10:9301]]{client=true, 
data=false},}, reason: 
zen-disco-node_failed([Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.13.3.46:9301]]{client=true,
 
data=false}), reason failed to ping, tried [3] times, each with maximum 
[30s] timeout
[2014-03-06 19:56:12,304][INFO ][cluster.service  ] [ElasticSearch 
Server1] added 
{[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.1.1.10:9301]]{client=true, 
data=false},}, reason: zen-disco-receive(join from 
node[[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.13.3.46:9301]]{client=true, 
data=false}])

Any idea on additional logging or troubleshooting I can turn on to keep 
this from happening in the future? Since the shards are not caught up, 
right now I"m just seeing a lot o debug messages about failed to parse. I'm 
assuming that will be corrected once we catch up.

[2014-03-07 10:06:52,235][DEBUG][action.search.type   ] [ElasticSearch 
Server1] All shards failed for phase: [query]
[2014-03-07 10:06:52,223][DEBUG][action.search.type   ] [ElasticSearch 
Server1] [windows-2014.03.07][3], node[W6aEFbimR5G712ddG_G5yQ], [P], 
s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@74ecbbc6] lastShard [true]
org.elasticsearch.search.SearchParseException: [windows-2014.03.07][3]: 
from[-1],size[-1]: Parse Failure [Failed to parse source 
[{"facets":{"0":{"date_histogram":{"field":"@timestamp","interval":"10m"},"global":true,"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"(ASA
 
AND 
Deny)"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1394118412373,"to":"now"}}}],"size":0}]]

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0a693898-16e3-449f-9bf5-6adc97251e09%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: 3,000 events/sec Architecture

2014-03-04 Thread Eric Luellen

Zach,

Thanks for the information. With my POC, I have 2 10 gig VMs and I'm 
keeping 7 days of logs with no issues but that is a fairly large jump and I 
could see where it may pose an issue. 

As far as the 150 indexes, I'm not sure on the shards per index/replicas. 
That is the part that I'm the weakest on in ES setup. I'm not exactly sure 
how I should set up the ES cluster as far as the shards, replicas, master 
node, data node, search node etc.

I fully agree with the logstash directly to ES. I have 1 logstash instance 
right now failing 5 files and directly feeding in to ES and I've enjoyed 
not having another application to have to worry about.

Eric


On Tuesday, March 4, 2014 10:32:26 AM UTC-5, Zachary Lammers wrote:
>
> Based on my experience, I think you may have an issue with OOM trying to 
> keep a month of logs with ~10gb ram / server.
>
> Say, for instance, 5 indexes a day for 30 days = 150 indexes.  How many 
> shards per index/replicas?
>
> I ran some tests with 8GB assigned to my 20x ES data nodes, and after a ~7 
> days of single index per day of all log data, my cluster would crash due to 
> data nodes going OOM.  I know I can't perfectly compare, and I'm someone 
> new to ES myself, but as soon as I removed the 'older' servers from the 
> cluster that had smaller ram, and gave ES 16GB for each data node, I've not 
> gone OOM since.  I was working with higher data rates, but I'm not sure the 
> volume mattered as much as my shard count per index per node.
>
> For reference, my current lab config is 36 data nodes, running single 
> index per day (18 shards/1 replica), and I can index near 40,000 per second 
> at beginning of day, closer to 30,000 per second near end of day when index 
> is much larger.  I used to run 36 shards/1 replica, but I wanted the 
> shards/index/per node to be minimal, as I'd really like to keep 60 days 
> (except I'm running out of disk space on my old servers first!)  To pipe 
> the data in, I'm running 45 separate logstash instances, each monitoring a 
> single FIFO that I have scripts simply catting data into.  Eash LS instance 
> is joining the ES cluster (no redis/etc, I've had too many issues not going 
> direct to ES).  I recently started over after keeping steady with 25B log 
> events over ~12 days (but ran out of disk so had to delete old indexes).  I 
> tried updating to LS1.4b2/ES1.0.1, but it failed miserably, LS1.4b2 was 
> extremely, extremely slow in indexing, so I'm still LS 1.3.3 and ES0.90.9.
>
> As for master question, I can't answer.  I'm only running one right now 
> for this lab cluster, which I know is not recommended, but I have zero idea 
> how many I should truly have.  Like I said, I'm new to this :)
>
> -Zachary
>
> On Tuesday, March 4, 2014 9:11:59 AM UTC-6, Eric Luellen wrote:
>>
>> Hello,
>>
>> I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 
>> months now and everything has worked out pretty good and we are ready to 
>> move it to production. Before building out the infrastructure, I want to 
>> make sure my shard/node/index setup is correct as that is the main part 
>> that I'm still a bit fuzzy on. Overall my setup is this:
>>
>> Servers
>> Networking Gear   
>>   syslog-ng server
>> End Points   ->   Load Balancer 
>>  >   syslog-ng server  --> Logs 
>> stored in 5 flat files on SAN storage
>> Security Devices 
>> syslog-ng server
>> Etc.
>>
>> I have logstash running on one of the syslog-ng servers and is basically 
>> reading the input of 5 different files and sending them to ElasticSearch. 
>> So within ElasticSearch, I am creating 5 different indexes a day so I can 
>> do granular user access control within Kibana.
>>
>> unix-$date
>> windows-$date
>> networking-$date
>> security-$date
>> endpoint-$date
>>
>> My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each on 
>> them. For my POC I have 2 and it's working fine for 2,000 events/second. My 
>> main concern is how I setup the ElasticSearch servers so they are as 
>> efficient as possible. With my 5 different indexes a day, and I plan on 
>> keeping ~1 month of logs within ES, is 3 servers enough? Should I have 1 
>> master node and the other 2 be just basic setups that are data and 
>> searching? Also, will 1 replica be sufficient for this setup or should

3,000 events/sec Architecture

2014-03-04 Thread Eric Luellen

Hello,

I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2
months now and everything has worked out pretty good and we are ready to
move it to production. Before building out the infrastructure, I want to
make sure my shard/node/index setup is correct as that is the main part
that I'm still a bit fuzzy on. Overall my setup is this:

Servers
Networking Gear
syslog-ng server
End Points -> Load Balancer
> syslog-ng server --> Logs
stored in 5 flat files on SAN storage
Security Devices
syslog-ng server
Etc.

I have logstash running on one of the syslog-ng servers and is basically
reading the input of 5 different files and sending them to ElasticSearch.
So within ElasticSearch, I am creating 5 different indexes a day so I can
do granular user access control within Kibana.

unix-$date
windows-$date
networking-$date
security-$date
endpoint-$date

My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each on
them. For my POC I have 2 and it's working fine for 2,000 events/second. My
main concern is how I setup the ElasticSearch servers so they are as
efficient as possible. With my 5 different indexes a day, and I plan on
keeping ~1 month of logs within ES, is 3 servers enough? Should I have 1
master node and the other 2 be just basic setups that are data and
searching? Also, will 1 replica be sufficient for this setup or should I do
2 to be safe? In my POC, I've had a few issues where I ran out of memory or
something weird happened and I lost data for a while so wanted to try to
limit that as much as possible. We'll also have quite a few users
potentially querying the system so I didn't know if I should setup a
dedicated search node for one of these.

Besides the ES cluster, I think everything else should be fine. I have had
a few concerns about logstash keeping up with the amount of entries coming
into syslog-ng but haven't seen much in the way of load balancing logstash
or verifying if it's able to keep up or not. I've spot checked the files
quite a bit and everything seems to be correct but if there is a better way
to do this, I'm all ears.

I'm going to have my KIbana instance installed on the master ES node, which
shouldn't be a big deal. I've played with the idea of putting the ES
servers on the syslog-ng servers and just have a separate NIC for the ES
traffic but didn't want to bog down the servers a whole lot.

Any thoughts or recommendations would be greatly appreciated.

Thanks,
Eric

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/13a76e46-91b5-41fe-9667-f674706fe127%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: "No module named elasticsearch"

Hello Honza, it does work in a virtualenv, which is suitable for my 
purposes.Thanks for your help - Eric

On Monday, March 3, 2014 10:37:44 AM UTC-8, Eric Greene wrote:
>
>
> <https://lh3.googleusercontent.com/-EzeBiRu2EhE/UxTLoSk13CI/B6I/EKQVNd3zifM/s1600/no-module-elasticsearch.jpg>
> Hi I am getting started using elasticsearch.
>
> I want to try to use the python elasticsearch 
> client<http://elasticsearch-py.readthedocs.org/en/latest/#>
> .
>
> I ran pip install elasticsearch.  Installation seems successful.  But when 
> I try to import elasticsearch, I get an error "No module named 
> elasticsearch".
>
> What could be the trouble here?
>
> Thanks for any help.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6433e343-b5ab-42ca-b560-ad81a0ea2b21%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: No indexing with JDBC River plugin

I have figured it out.  CHMOD of the mySQL-connector-river jar file to 755 
fixed my issue.  - Eric



On Monday, March 3, 2014 10:17:28 AM UTC-8, Eric Greene wrote:
>
> Hi I am getting started with elasticsearch and the jdbc river plugin.  I 
> want to sync to a mysql database.  I appear to have everything set up 
> correctly... ES starts with no trouble, I have installed the plugin and 
> copied the mysql connector.  However when I begin with a simple test 
> command:
>
> curl -XPUT 'localhost:9200/my_index/videos/_meta' -d '{
> "type" : "jdbc",
> "jdbc" : {
> "url" : "jdbc:mysql://localhost:3306/my_db",
> "user" : "eric",
> "password" : "my_password",
> "sql" : "select title, description, created, active from video",
> "index" : "my_index",
> "type" : "videos"
> }
> }'
>
>
> Nothing from the mysql db is indexed.  Instead it just indexes the 
> information above instead of the mysql data.
>
> So if I follow the above with 
> curl -XGET 'localhost:9200/my_index/_search?pretty&q=*
>
>
> I get the following for hits:
>
> "hits" : {
> "total" : 1,
> "max_score" : 1.0,
> "hits" : [ {
>   "_index" : "my_index",
>   "_type" : "videos",
>   "_id" : "_meta",
>   "_score" : 1.0, "_source" : {
> "type" : "jdbc",
> "jdbc" : {
> "url" : "jdbc:mysql://localhost:3306/my_db",
> "user" : "eric",
> "password" : "my_password",
> "sql" : "select title, description, created, active from video",
> "index" : "my_index",
> "type" : "videos"
> }
>
>
> Logs show that the index was created.  But it isn't an index from mysql.  
>
> I have also double and tripled checked the mysql credentials are working 
> and that the sql statement works.  I have also verified there is in fact 
> data in the video table.
>
> It just seems like the river is being ignored entirely.
>
> The following is from the log file:
>
> [2014-03-03 10:09:28,565][INFO ][cluster.metadata ] [es_node] 
> [my_index] creating index, cause [auto(index api)], shards [5]/[1], 
> mappings []
> [2014-03-03 10:09:28,800][INFO ][cluster.metadata ] [es_node] 
> [my_index] update_mapping [videos] (dynamic)
>
> What could I be missing here? 
>
> I am using the following versions:
>
> ES version 1.0.1
> JDBC River version 1.0.0.1
> mySQL-connector-java version 5.1.28
>
> on Ubuntu 12.04
>
> Thanks for any help.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce4e-1ef2-46a8-9bea-c9a7371abbaa%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

"No module named elasticsearch"




Hi I am getting started using elasticsearch.

I want to try to use the python elasticsearch 
client
.

I ran pip install elasticsearch.  Installation seems successful.  But when 
I try to import elasticsearch, I get an error "No module named 
elasticsearch".

What could be the trouble here?

Thanks for any help.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9a20f2c4-d234-4bfb-8bcd-5e37845e90cb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

No indexing with JDBC River plugin

Hi I am getting started with elasticsearch and the jdbc river plugin.  I 
want to sync to a mysql database.  I appear to have everything set up 
correctly... ES starts with no trouble, I have installed the plugin and 
copied the mysql connector.  However when I begin with a simple test 
command:

curl -XPUT 'localhost:9200/my_index/videos/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/my_db",
"user" : "eric",
"password" : "my_password",
"sql" : "select title, description, created, active from video",
"index" : "my_index",
"type" : "videos"
}
}'


Nothing from the mysql db is indexed.  Instead it just indexes the 
information above instead of the mysql data.

So if I follow the above with 
curl -XGET 'localhost:9200/my_index/_search?pretty&q=*


I get the following for hits:

"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
  "_index" : "my_index",
  "_type" : "videos",
  "_id" : "_meta",
  "_score" : 1.0, "_source" : {
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/my_db",
"user" : "eric",
"password" : "my_password",
"sql" : "select title, description, created, active from video",
"index" : "my_index",
"type" : "videos"
}


Logs show that the index was created.  But it isn't an index from mysql.  

I have also double and tripled checked the mysql credentials are working 
and that the sql statement works.  I have also verified there is in fact 
data in the video table.

It just seems like the river is being ignored entirely.

The following is from the log file:

[2014-03-03 10:09:28,565][INFO ][cluster.metadata ] [es_node] 
[my_index] creating index, cause [auto(index api)], shards [5]/[1], 
mappings []
[2014-03-03 10:09:28,800][INFO ][cluster.metadata ] [es_node] 
[my_index] update_mapping [videos] (dynamic)

What could I be missing here? 

I am using the following versions:

ES version 1.0.1
JDBC River version 1.0.0.1
mySQL-connector-java version 5.1.28

on Ubuntu 12.04

Thanks for any help.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/71a431cb-58a6-4fd6-87fb-6ec08ee37bc4%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

"Illegal character in path"

2014-03-01 Thread Eric Jain

Just had this error (elasticsearch 1.0.1), don't recall seeing it before:

  java.net.URISyntaxException: Illegal character in path at index 29: 
/cache/11_/seg0/index12450141[12450141].ts

Should I be concerned?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/34bb0d6a-598a-4ae7-9fa1-200e51da92a2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

debug logs when indexing by bulks

2014-02-19 Thread Eric Lu

Hi,
 
I'm using bulks to index billions of docs. I check the logs and I find that 
it keeps logging like this:
[2014-02-20 00:00:01,325][DEBUG][action.bulk  ] [Will o' the 
Wisp] [...][21] failed to execute bulk item (index) index {...}
java.lang.ArrayIndexOutOfBoundsException
[2014-02-20 00:00:04,345][DEBUG][action.bulk  ] [Will o' the 
Wisp] [...][29] failed to execute bulk item (index) index {...}
java.lang.ArrayIndexOutOfBoundsException
...

but my indices are keep increasing. how to solve it out?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/382ee60a-4e63-41cc-a5da-da2e0df8689b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Index Mapping/Routing Help

2014-02-19 Thread Eric Luellen

Thanks! What was throwing me off is that I'm still having the UNIX logs 
also write to logstash-date as well so I was seeing that information in my 
main dashboard. I wasn't thinking about it writing 2 different times. 
Thanks again.

On Tuesday, February 18, 2014 4:52:38 PM UTC-5, Binh Ly wrote:
>
> Yup, you will need to go into your Kibana dashboard - top right corner - 
> Configure Dashboard | Index and change the settings there to point to your 
> new index(es) instead of the default logstash-* indexes.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aa721942-fe0b-4c7c-a2dc-8e95a1abcc02%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Index Mapping/Routing Help

2014-02-18 Thread Eric Luellen

Thanks for that information. When I'm looking in Kibana now, it's showing 
the correct type but it still shows the index as the original 
logstash-2014-02-18. Not sure why it isn't showing the unix-date index. If 
I look at ElasticSearch, I can see that it did create the new index I told 
it to though.


On Tuesday, February 18, 2014 12:53:22 PM UTC-5, Binh Ly wrote:
>
> You should be able to use the input type to direct log events to specific 
> indexes. For example:
>
> input {
>   file { 
> type => "unixlogs"
> path => "/var/log/UNIX/*.log"
>   } 
> }
>
> output {
>   if [type] == "unixlogs" {
> elasticsearch { 
>   host => "localhost"
>   index => "unix-%{+.MM.dd}"
> }
>   }
> }
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5f034ce7-68c9-4f56-918b-bc4c887f74fb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Index Mapping/Routing Help

2014-02-17 Thread Eric Luellen

Hello,

Currently I have the following setup.

Syslog --> Logstash --> ElasticSearch --> Kibana

Logstash is creating a daily index 
"/etc/elasticsearch/data/test-elasticsearch/nodes/0/indices/logstash-2014.02.04"
 
and I'm viewing all of the logs through Kibana. We want to set up some user 
based access control using the kibana-authentication-proxy setup due to it 
supporting 

   - Per-user kibana index supported. now you can use index 
   kibana-int-userA for user A and kibana-int-userB for user B

I'd like to make it where all logs coming in from logstash with a location 
of "/var/log/UNIX/*.log" get sent to a new index of unix-2014.02.04 instead 
of the logstash one. That way I can use the Kibana auth proxy to give my 
UNIX users access only to their logs. I've read a little about creating the 
mappings but wasn't sure how to tie it all together. I saw you could do 
various things with API calls but was curious if I could set all of this up 
in the elasticsearch.yml file from the start.

Thanks,
Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/56e2fc09-c179-4839-a23f-67a805f563ce%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: "term_stats" equivalent in aggregations? (ES 1.0)

2014-02-13 Thread Eric Nelson

Perfect! Thanks Binh.

---Eric

On Thursday, February 13, 2014 12:18:37 PM UTC-7, Binh Ly wrote:
>
> You should be able to do a stats sub under terms like this:
>
> {
>   "aggs": {
> "terms": {
>   "terms": {
> "field": "term_field"
>   },
>   "aggs": {
> "stats": {
>   "stats": {
> "field": "stats_field"
>   }
> }
>   }
> }
>   }
> }
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c3cfac2f-4002-4762-94d2-aa805fe01674%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

"term_stats" equivalent in aggregations? (ES 1.0)

2014-02-13 Thread Eric Nelson

Hi all. LOVE the new 1.0 release, and learning about the new aggregation 
framework. Can you tell me if there is an aggregation equivalent to the 
term_stats facet? The term aggregation isn't quite the same. I want to 
bucket on one field, and calculate stats on another. Any help on this would 
be greatly appreciated.

--Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a2a4286e-18ac-4e2b-906f-2279dd915512%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: facets.total and hits.total dont match

2014-01-20 Thread Eric Rodriguez

Hi,

I don't have the link right now but IIRC when you have more than 1 shard there 
is no certainty about facet count accuracy...
The best "workaround" is either 1 shard to get exact count or extend the number 
of results asked for facet to achieve better (still not exact) count.

Eric

Sent from my iPhone

> On 21 Jan 2014, at 07:58, Chetana  wrote:
> 
> I have indexed some records by making test_field to be 'analyzed'. If the 
> analyzed field causing this issue, is there any other facet type/work around 
> which can solve the problem?
>  
> 
>> On Tuesday, January 21, 2014 12:15:45 PM UTC+5:30, Chetana wrote:
>> I have an application where I need both search results and facet 
>> information. Everytime a query is framed based on some filter condition and 
>> query words and it is passed to both facet and search request as given 
>> below. The field (test_field) on which the facet to be applied is present in 
>> all documents.
>>  
>> BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
>> SearchRequestBuilder srb = client.prepareSearch("Test");
>> srb.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setQuery( 
>> boolQueryBuilder);
>> and
>> TermsFacetBuilder facBuilder = FacetBuilders.termsFacet("test_field");
>>   facBuilder.facetFilter(FilterBuilders
>>   .queryFilter(boolQueryBuilder));
>>   facBuilder.fields("test_field");
>>   facBuilder.global(true);   // I tried commenting this too, but 
>> I get the same result
>>   srb.addFacet(facBuilder);
>>  
>> "hits" : {
>> "total" : 117,
>> "max_score" : null,
>> "hits" : [ {
>>  }]
>> 
>>   "facets" : {
>> "assettype" : {
>>   "_type" : "terms",
>>   "missing" : 5,
>>   "total" : 119,
>>   "other" : 0,
>>   "terms" : [ {
>> }]
>>  
>> But the hit count is different from the facet count. Can anyone please 
>> explain me why this discrepancy?
>>  
>>  
>> Thanks
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a8ed55c0-6599-4612-995d-28d3340e69f7%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/E2A7D14B-CCA0-48D8-BB70-56FC0ACA3112%40gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Bulk indexing slow down when data amount increase

2014-01-14 Thread Eric Lu

I have set the replica to 0 and queue to 50. and it can index about 7 - 8 
millions documents per hour now. It's acceptable . Though i don't know 
which change makes it.

Thank you all.

在 2014年1月13日星期一UTC+8下午9时04分35秒，Eric Lu写道：
>
> I observed the GC occured once every 15 seconds when  heap mem was 75% of 
> the heap size. Is it too frequent? there is no OOMs.
>
> I set refresh interval to 30s. 
>
> I'll try to use a smaller queue and set replica to 0
>
> Thank you.
>
> 在 2014年1月13日星期一UTC+8下午8时42分56秒，Jörg Prante写道：
>>
>> 12 hours is an absurdly long time for indexing 10 million docs.
>>
>> queue:1000 is much too high for production. For test it may be ok (it 
>> effectively disables queue rejections) but on production, you play with the 
>> risk of starving your cluster resources.
>>
>> Do you rmonitor the resource usage of ES, especially the heap? Is GC 
>> starving your cluster? Do you see OOMs?
>>
>> Do you evaluate the bulk responses for errors? Do you throttle bulk 
>> request concurrency? 
>>
>> Do you set refresh interval to -1? 
>>
>> Hint: if 5 nodes is your maximum, you can also bulk index with 5 shards 
>> and replica level 0, after bulk, you can increase replica level to 1.
>>
>> Jörg
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8b9fab05-fa3e-455c-b8ba-1253b72c9e46%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Bulk indexing slow down when data amount increase

2014-01-13 Thread Eric Lu

I observed the GC occured once every 15 seconds when  heap mem was 75% of 
the heap size. Is it too frequent? there is no OOMs.

I set refresh interval to 30s. 

I'll try to use a smaller queue and set replica to 0

Thank you.

在 2014年1月13日星期一UTC+8下午8时42分56秒，Jörg Prante写道：
>
> 12 hours is an absurdly long time for indexing 10 million docs.
>
> queue:1000 is much too high for production. For test it may be ok (it 
> effectively disables queue rejections) but on production, you play with the 
> risk of starving your cluster resources.
>
> Do you rmonitor the resource usage of ES, especially the heap? Is GC 
> starving your cluster? Do you see OOMs?
>
> Do you evaluate the bulk responses for errors? Do you throttle bulk 
> request concurrency? 
>
> Do you set refresh interval to -1? 
>
> Hint: if 5 nodes is your maximum, you can also bulk index with 5 shards 
> and replica level 0, after bulk, you can increase replica level to 1.
>
> Jörg
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8ba26c0a-00cd-46ed-9610-eeb5b5f6243b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Bulk indexing slow down when data amount increase

2014-01-13 Thread Eric Lu

Hi, guys
I'm using elasticsearch to index a large number  of documents. A document 
is about 0.5KB. 
My elasticsearch cluster has 5 nodes(all data nodes). Each nodes are 
running oracle Java version: 1.7.0_13 and both have 16GB RAM with 8GB 
allocated to the JVM. And the index has 50 shards and 1 replicas.
I set the bulk thread pool to size:30 and queue:1000.
I use one thread to indexing documents by bulk,  bulk size is 1000.
In the beginning, the performance is very good. It can index about 10 
million documents per hour. But with the increasing of indexing document, 
it slows down. When the cluster has 500 million document indexed, i noticed 
that it spent about 12 hours to index 10 million documents.

Is it normal? Or what is the bottleneck that throttling it？

Any help？

Regards
Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a381d703-3657-4669-8104-918d82c6c0be%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Is there a help document about bigdesk plugin？

2014-01-09 Thread Eric Lu

Or some detail introduction about the various charts？

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b9a893a8-9c16-4d2d-9efe-7218a4895896%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Elasticsearch Missing Data

2014-01-09 Thread Eric Luellen

Alexander,

1. The only odd log entry was at 19:00 on 1/7/14, which was about 1 hr. 
before logs stopped. These logs are on the master and She-Hulk is the only 
other node.

[2014-01-07 19:00:02,947][DEBUG][indices.recovery ] [ElasticSearch 
Server1] [logstash-2014.01.08][0] recovery completed from 
[She-Hulk][_MtrVsSmQIaM-BErhEtg9w][inet[/10.1.11.111:9300]], took[333ms]
   phase1: recovered_files [1] with total_size of [71b], took [68ms], 
throttling_wait [0s]
 : reusing_files   [0] with total_size of [0b]
   phase2: start took [13ms]
 : recovered [17] transaction log operations, took [12ms]
   phase3: recovered [0] transaction log operations, took [164ms]
[2014-01-07 19:00:03,375][DEBUG][indices.recovery ] [ElasticSearch 
Server1] [logstash-2014.01.08][2] recovery completed from 
[She-Hulk][_MtrVsSmQIaM-BErhEtg9w][inet[/10.1.11.111:9300]], took[502ms]
   phase1: recovered_files [1] with total_size of [71b], took [30ms], 
throttling_wait [0s]
 : reusing_files   [0] with total_size of [0b]
   phase2: start took [6ms]
 : recovered [6] transaction log operations, took [38ms]
   phase3: recovered [13] transaction log operations, took [20ms]
[2014-01-07 19:00:06,898][INFO ][cluster.metadata ] [ElasticSearch 
Server1] [logstash-2014.01.08] update_mapping [logs] (dynamic)

Also, on She-Hulk I got an error stating that the master_left at 20:52 
because it wasn't pingable, but not sure why.

2.I am not sure. I was thinking that the shard should still be there but 
just unassigned and once it came back up, it'd start processing it.
3. On both my master and my 2ndary, the config is in 
/etc/elasticsearch/elasticsearch.yml and it is ran by 
/etc/init.d/elasticsearch. On the master, it works fine and make the 
correct node name, cluster name, data directory, etc. It is an identical 
setup on the 2ndary but it only grabs the cluster name. Everything else 
defaults to some other location.On the secondary, the only data location is 
in /var/lib/elasticsearch/node-name. In the config I tell it to go to 
/etc/elasticsearch/data. On the master it is in the correct location of 
/etc/elasticsearch/data. 

So overall, I guess the first issue was something weird happened to my 
server and not much I can do about that. I'm more interested in the 3rd 
question now since I still don't know why it's not reading that full config 
file but obviously part of it since it's part of my cluster.




On Thursday, January 9, 2014 3:30:40 AM UTC-5, Alexander Reelsen wrote:
>
> Hey,
>
> a couple of things:
>
> 1. Did you check the log files? Most likely in /var/log/elasticsearch if 
> you use the packages. Is there anything suspicious at the time of your 
> outage? Please check your master node as well, if you have one (not sure if 
> it is a master or client node from the cluster health).
> 2. Why should elasticsearch pull your data? Any special configuration you 
> didnt mention? Or what exactly do you mean here?
> 3. Happy to debug your issue with the init script. The elasticsearch.yml 
> file should be in /etc/elasticsearch/ and not in /etc - anything manually 
> moved around? Can you still reproduce it?
>
>
> --Alex
>
>
>
>
> On Wed, Jan 8, 2014 at 8:10 PM, Eric Luellen 
> > wrote:
>
>> Hello,
>>
>> I've had my elasticsearch instance running for about a week with no 
>> issues, but last night it stopped working. When I went to look in Kibana, 
>> it stops logging around 20:45 on 1/7/14. I then restarted the service on 
>> both both elasticsearch servers and it started logging again and back 
>> pulled some logs from 07:10 that morning, even though I restarted the 
>> service around 10:00. So my questions are:
>>
>> 1. Why did it stop working? I don't see any obvious errors.
>> 2. When I restarted it, why didn't it go back and pull all of the data 
>> and not just some of it? I see that there are no unassigned shards.
>>
>> curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
>> {
>>   "cluster_name" : "my-elasticsearch",
>>   "status" : "green",
>>   "timed_out" : false,
>>   "number_of_nodes" : 3,
>>   "number_of_data_nodes" : 2,
>>   "active_primary_shards" : 40,
>>   "active_shards" : 80,
>>   "relocating_shards" : 0,
>>   "initializing_shards" : 0,
>>   "unassigned_shards" : 0
>>
>> Are there any additional queries or logs I can look at to see what is 
>> going on? 
>>
>> On a slight side note, when I restarted my 2nd elasticsearch server it 
>> isn't reading from the /etc/elasticsearch.yml file like it should. It isn't 
>>

Elasticsearch Missing Data

2014-01-08 Thread Eric Luellen

Hello,

I've had my elasticsearch instance running for about a week with no issues,
but last night it stopped working. When I went to look in Kibana, it stops
logging around 20:45 on 1/7/14. I then restarted the service on both both
elasticsearch servers and it started logging again and back pulled some
logs from 07:10 that morning, even though I restarted the service around
10:00. So my questions are:

1. Why did it stop working? I don't see any obvious errors.
2. When I restarted it, why didn't it go back and pull all of the data and
not just some of it? I see that there are no unassigned shards.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "my-elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 2,
"active_primary_shards" : 40,
"active_shards" : 80,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0

Are there any additional queries or logs I can look at to see what is going
on?

On a slight side note, when I restarted my 2nd elasticsearch server it
isn't reading from the /etc/elasticsearch.yml file like it should. It isn't
creating the node name correctly or putting the data files in the spot I
have configured. I'm using CentOS and doing everything via
/etc/init.d/elasticsearch on both servers and the elasticsearch1 server
reads everything correctly but elasticsearch2 does not.

Thanks for your help.
Eric

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/fc191ee4-b312-4c52-89d9-de04c4309b65%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: ElasticSearch Index Wrong Date

2014-01-02 Thread Eric Luellen

Not sure what happened, but after restarting Logstash everything is working 
fine. I guess it just wasn't a fan of the change in  year.

On Thursday, January 2, 2014 10:03:05 AM UTC-5, Eric Luellen wrote:
>
> Hello,
>
> I recently setup my elasticsearch instance and everything has been working 
> fine. However, when I looked at Kibana today I saw that the logs stopped 
> showing up as soon as 2014 hit. When looking at my data on the cluster, I 
> see this:
>
> ls -altr data/my-cluster/nodes/0/indices/
> total 44
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 20 09:39 kibana-int
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 25 14:00 
> logstash-2013.12.26
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 26 14:00 
> logstash-2013.12.27
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 27 14:00 
> logstash-2013.12.28
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 28 14:00 
> logstash-2013.12.29
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 29 14:00 
> logstash-2013.12.30
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 30 14:00 
> logstash-2013.12.31
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 31 14:00 
> logstash-2013.01.01
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 31 14:00 
> logstash-2014.01.01
> drwxr-xr-x  8 elasticsearch elasticsearch 4096 Jan  1 14:00 
> logstash-2013.01.02
>
> As you can see, there is one 2014 file and 2 2013 files for the new year 
> that shouldn't be there. For some reason, elasticsearch thinks it's 2013 
> still and creating folders with the wrong date. I confirmed that all of my 
> servers have the correct time on them. How can I fix this on 
> elasticsearch's end?
>
> Thanks,
> Eric
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/57c938de-7b7c-471f-afd7-df9c409fda5c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ElasticSearch Index Wrong Date

2014-01-02 Thread Eric Luellen

Hello,

I recently setup my elasticsearch instance and everything has been working 
fine. However, when I looked at Kibana today I saw that the logs stopped 
showing up as soon as 2014 hit. When looking at my data on the cluster, I 
see this:

ls -altr data/my-cluster/nodes/0/indices/
total 44
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 20 09:39 kibana-int
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 25 14:00 
logstash-2013.12.26
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 26 14:00 
logstash-2013.12.27
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 27 14:00 
logstash-2013.12.28
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 28 14:00 
logstash-2013.12.29
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 29 14:00 
logstash-2013.12.30
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 30 14:00 
logstash-2013.12.31
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 31 14:00 
logstash-2013.01.01
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Dec 31 14:00 
logstash-2014.01.01
drwxr-xr-x  8 elasticsearch elasticsearch 4096 Jan  1 14:00 
logstash-2013.01.02

As you can see, there is one 2014 file and 2 2013 files for the new year 
that shouldn't be there. For some reason, elasticsearch thinks it's 2013 
still and creating folders with the wrong date. I confirmed that all of my 
servers have the correct time on them. How can I fix this on 
elasticsearch's end?

Thanks,
Eric

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d2ed1eb-e7b4-4c51-8b14-15e065d05592%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Unassigned Shards

2013-12-20 Thread Eric Luellen

I got the initial issue fixed of me getting data back again. However I 
still don't understand how to fix the unassigned shards issue and how to 
properly restart elasticsearch without it complaining.

On Friday, December 20, 2013 9:28:53 AM UTC-5, Eric Luellen wrote:
>
> Mark,
>
> I used the rpm install. I'll take a look at the plugins. Thanks.
>
> On Thursday, December 19, 2013 5:07:53 PM UTC-5, Mark Walkom wrote:
>>
>> Did you install ES via a rpm/deb or using the zip? I ask because your 
>> data store directory is custom.subl
>>
>> Check out these plugins for monitoring - elastichq, kopf, bigdesk. They 
>> will give you an overview of your cluster and might give you insight into 
>> where your problem lies. The other best place to check is the ES logs.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 20 December 2013 08:52, Eric Luellen  wrote:
>>
>>> I think I made my situation even worse. I tried deleting the shards and 
>>> starting over and now elasticsearch isn't even creating the 
>>> /etc/elasticsearch/data/my-cluster/node folder.
>>>
>>>
>>> On Thursday, December 19, 2013 4:04:41 PM UTC-5, Eric Luellen wrote:
>>>>
>>>> Hello,
>>>>
>>>> Currently I have my syslog-ng --> logstash --> elasticsearch1 & 
>>>> elastisearch2 setup working pretty good. It's accepting over 300 events 
>>>> per 
>>>> second and hasn't bogged the systems down at all. However I'm running into 
>>>> 2 issues that I don't quite understand. 
>>>>
>>>> 1. When viewing the information in Kibana, it appears to be anywhere 
>>>> from 15 min to an hr behind on the "all events" view. Sometimes when I 
>>>> search for new logs it shows up correctly but overall it seems like it's 
>>>> lagging behind trying to keep up with what logstash is sending it. That 
>>>> being said, I'm concerned that logs are being dropped and I don't know 
>>>> about it. Are there any commands I can use to validate this type of 
>>>> information or what I can do to make sure elasticsearch/KIbana is keeping 
>>>> up?
>>>>
>>>> 2. I've had to restart elasticsearch a few times and every time I do, 
>>>> it completely breaks things. Once it starts back up it doesn't continue to 
>>>> show the logs in Kibana correctly and when I run a health check, it says 
>>>> there are unassigned shards. I've not been able to fix this and in the 
>>>> past 
>>>> I've always just had to delete them and start from scratch again.
>>>>
>>>> Any idea what is going on with this or how I can more cleanly restart 
>>>> or reboot the servers and recover from it?
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/1f9a1c4a-94cf-49d7-a4d1-22ffb0b64727%40googlegroups.com
>>> .
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9c2df0af-0f11-485d-b892-ca8d7b8a108f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Unassigned Shards

2013-12-20 Thread Eric Luellen

Mark,

I used the rpm install. I'll take a look at the plugins. Thanks.

On Thursday, December 19, 2013 5:07:53 PM UTC-5, Mark Walkom wrote:
>
> Did you install ES via a rpm/deb or using the zip? I ask because your data 
> store directory is custom.subl
>
> Check out these plugins for monitoring - elastichq, kopf, bigdesk. They 
> will give you an overview of your cluster and might give you insight into 
> where your problem lies. The other best place to check is the ES logs.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 20 December 2013 08:52, Eric Luellen 
> > wrote:
>
>> I think I made my situation even worse. I tried deleting the shards and 
>> starting over and now elasticsearch isn't even creating the 
>> /etc/elasticsearch/data/my-cluster/node folder.
>>
>>
>> On Thursday, December 19, 2013 4:04:41 PM UTC-5, Eric Luellen wrote:
>>>
>>> Hello,
>>>
>>> Currently I have my syslog-ng --> logstash --> elasticsearch1 & 
>>> elastisearch2 setup working pretty good. It's accepting over 300 events per 
>>> second and hasn't bogged the systems down at all. However I'm running into 
>>> 2 issues that I don't quite understand. 
>>>
>>> 1. When viewing the information in Kibana, it appears to be anywhere 
>>> from 15 min to an hr behind on the "all events" view. Sometimes when I 
>>> search for new logs it shows up correctly but overall it seems like it's 
>>> lagging behind trying to keep up with what logstash is sending it. That 
>>> being said, I'm concerned that logs are being dropped and I don't know 
>>> about it. Are there any commands I can use to validate this type of 
>>> information or what I can do to make sure elasticsearch/KIbana is keeping 
>>> up?
>>>
>>> 2. I've had to restart elasticsearch a few times and every time I do, it 
>>> completely breaks things. Once it starts back up it doesn't continue to 
>>> show the logs in Kibana correctly and when I run a health check, it says 
>>> there are unassigned shards. I've not been able to fix this and in the past 
>>> I've always just had to delete them and start from scratch again.
>>>
>>> Any idea what is going on with this or how I can more cleanly restart or 
>>> reboot the servers and recover from it?
>>>
>>> Thanks,
>>> Eric
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/1f9a1c4a-94cf-49d7-a4d1-22ffb0b64727%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/384e753f-6e9a-49d4-9db9-c03af81d9ba3%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Unassigned Shards

2013-12-19 Thread Eric Luellen

I think I made my situation even worse. I tried deleting the shards and 
starting over and now elasticsearch isn't even creating the 
/etc/elasticsearch/data/my-cluster/node folder.

On Thursday, December 19, 2013 4:04:41 PM UTC-5, Eric Luellen wrote:
>
> Hello,
>
> Currently I have my syslog-ng --> logstash --> elasticsearch1 & 
> elastisearch2 setup working pretty good. It's accepting over 300 events per 
> second and hasn't bogged the systems down at all. However I'm running into 
> 2 issues that I don't quite understand. 
>
> 1. When viewing the information in Kibana, it appears to be anywhere from 
> 15 min to an hr behind on the "all events" view. Sometimes when I search 
> for new logs it shows up correctly but overall it seems like it's lagging 
> behind trying to keep up with what logstash is sending it. That being said, 
> I'm concerned that logs are being dropped and I don't know about it. Are 
> there any commands I can use to validate this type of information or what I 
> can do to make sure elasticsearch/KIbana is keeping up?
>
> 2. I've had to restart elasticsearch a few times and every time I do, it 
> completely breaks things. Once it starts back up it doesn't continue to 
> show the logs in Kibana correctly and when I run a health check, it says 
> there are unassigned shards. I've not been able to fix this and in the past 
> I've always just had to delete them and start from scratch again.
>
> Any idea what is going on with this or how I can more cleanly restart or 
> reboot the servers and recover from it?
>
> Thanks,
> Eric
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1f9a1c4a-94cf-49d7-a4d1-22ffb0b64727%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Unassigned Shards

2013-12-19 Thread Eric Luellen

Hello,

Currently I have my syslog-ng --> logstash --> elasticsearch1 &
elastisearch2 setup working pretty good. It's accepting over 300 events per
second and hasn't bogged the systems down at all. However I'm running into
2 issues that I don't quite understand.

1. When viewing the information in Kibana, it appears to be anywhere from
15 min to an hr behind on the "all events" view. Sometimes when I search
for new logs it shows up correctly but overall it seems like it's lagging
behind trying to keep up with what logstash is sending it. That being said,
I'm concerned that logs are being dropped and I don't know about it. Are
there any commands I can use to validate this type of information or what I
can do to make sure elasticsearch/KIbana is keeping up?

2. I've had to restart elasticsearch a few times and every time I do, it
completely breaks things. Once it starts back up it doesn't continue to
show the logs in Kibana correctly and when I run a health check, it says
there are unassigned shards. I've not been able to fix this and in the past
I've always just had to delete them and start from scratch again.

Any idea what is going on with this or how I can more cleanly restart or
reboot the servers and recover from it?

Thanks,
Eric

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/cab9c4e5-4e1a-4acd-b3ac-77fdc7ef6bef%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Help with Cluster

I ran that command and saw some fairly old files that were no longer there 
that it was trying to read. I believe Elasticsearch got behind on indexing 
the files and they were removed before it could finish. I'm not sure but 
that's just a guess. I have removed all of the files and started fresh. 
Currently everything is green across the board. I guess my issue now is how 
to ensure that doesn't happen again and how to make sure syslog-ng --> 
logstash --> elasticsearch doesn't drop any packets or get backed up. 
Thanks.

On Tuesday, December 17, 2013 3:38:49 PM UTC-5, David Pilato wrote:
>
> What gives the following?
>
> curl -XGET 'http://localhost:9200/_cluster/state?pretty'
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | 
> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>
>
> Le 17 décembre 2013 at 20:34:43, Eric Luellen 
> (eric.l...@gmail.com) 
> a écrit:
>
> Hmmm. I'm not sure why my status is red then. The only thing I can see 
> from the cluster-health documentation page is that a specific shard is not 
> allocated in the cluster. When I look at my cluster health, I do  see this: 
>
>"unassigned_shards" : 60
>
> Guess I need to figure out why I have so many unassigned shards. I think I 
> am feeding too much data in elasticsearch at the moment. I've turned on the 
> logstash server shipping to elasticsearch and I'm still getting logs coming 
> in and it's been about 10 minutes. 
>
> As far as the logstash node goes, I have this config on the elasticsearch 
> portion.
>
>  output {
>   elasticsearch {
> embedded => "false"
> host => "192.168.0.20" cluster => "my-cluster"
>   }
> }
>  
> So there is no reason it should be there. However, as you said, I'm not 
> terribly worried about that now, but I am concerned about my red status.
>
>
> On Tuesday, December 17, 2013 2:07:29 PM UTC-5, David Pilato wrote: 
>>
>>  Yes you can rename it using 
>> http://logstash.net/docs/1.3.1/outputs/elasticsearch#node_name
>>  
>>  You have a real problem here as your cluster should not be red.
>>  But it should not be caused by the logstash node.
>>  
>> Did you set embedded to false (it's default on 1.3.1 but not sure about 
>> previous version)?
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet <https://twitter.com/dadoonet> | 
>> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>>  
>>
>> Le 17 décembre 2013 at 19:45:18, Eric Luellen (eric.l...@gmail.com) a 
>> écrit:
>>
>>  Thanks for the information. I don't mind it being there, I would just 
>> confused of why it was there. If it stays there, will my cluster status 
>> continue to show red on the health? That was my main concern. Also, if it 
>> stays there, I wish I could rename it from the default Lupo it is to the 
>> name of the server so I can distinguish it better. 
>>
>>
>> On Tuesday, December 17, 2013 10:46:56 AM UTC-5, David Pilato wrote: 
>>>
>>>  I'd not worry of the non data node.
>>>  It's only a node which connect to the cluster to give a client to 
>>> logstash.
>>>  
>>>  If you really don't want it, then you can use 
>>> http://logstash.net/docs/1.3.1/outputs/elasticsearch_http 
>>>  
>>>  HTH
>>>
>>>  -- 
>>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>>  @dadoonet <https://twitter.com/dadoonet> | 
>>> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>>>  
>>>
>>> Le 17 décembre 2013 at 16:32:33, Eric Luellen (eric.l...@gmail.com) a 
>>> écrit:
>>>
>>>  I am working on building out a small POC for Logstash and 
>>> Elasticsearch. To start, I have a 2 server setup. 
>>>
>>>  
>>>- Server 1 - logstash1 - running "java -jar 
>>>logstash-1.2.2-flatjar.jar agent -f indexer.conf" 
>>>- 
>>>   - This server is tailing logs from a syslog config file and then 
>>>   sending them to an ElasticSearch server. 
>>> - Server 2 - elasticsearch1 - running elasticsearch as a daemon 
>>>(CentOS box that i used an rpm instal - version - 0.90.3.) 
>>>- 
>>>   - This server is also running Kibana."java -jar 
>>>   /etc/logstash/logstash-1.2.2-flatjar.jar web" 
>>> 
>

Re: Help with Cluster

Thanks for the information. I don't mind it being there, I would just 
confused of why it was there. If it stays there, will my cluster status 
continue to show red on the health? That was my main concern. Also, if it 
stays there, I wish I could rename it from the default Lupo it is to the 
name of the server so I can distinguish it better.


On Tuesday, December 17, 2013 10:46:56 AM UTC-5, David Pilato wrote:
>
> I'd not worry of the non data node.
> It's only a node which connect to the cluster to give a client to logstash.
>
> If you really don't want it, then you can use 
> http://logstash.net/docs/1.3.1/outputs/elasticsearch_http 
>
> HTH
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | 
> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>
>
> Le 17 décembre 2013 at 16:32:33, Eric Luellen 
> (eric.l...@gmail.com) 
> a écrit:
>
> I am working on building out a small POC for Logstash and Elasticsearch. 
> To start, I have a 2 server setup. 
>
>  
>- Server 1 - logstash1 - running "java -jar logstash-1.2.2-flatjar.jar 
>agent -f indexer.conf" 
>- 
>   - This server is tailing logs from a syslog config file and then 
>   sending them to an ElasticSearch server. 
> - Server 2 - elasticsearch1 - running elasticsearch as a daemon 
>(CentOS box that i used an rpm instal - version - 0.90.3.) 
>- 
>   - This server is also running Kibana."java -jar 
>   /etc/logstash/logstash-1.2.2-flatjar.jar web" 
> 
> Overall things seem to be working pretty well. I started to do some 
> general diagnostics on the elasticsearch server to see how the cluster was 
> doing, and I saw that it was red.
>  
>  [root@elasticsearch1 elasticsearch]# curl -XGET '
>> http://localhost:9200/_cluster/health?pretty=true'
>> {
>>   "cluster_name" : "my-cluster",
>>   "status" : "red",
>>   "timed_out" : false,
>>   "number_of_nodes" : 2,
>>   "number_of_data_nodes" : 1,
>>   "active_primary_shards" : 35,
>>   "active_shards" : 35,
>>   "relocating_shards" : 0,
>>   "initializing_shards" : 0,
>>   "unassigned_shards" : 55
>
>
> When I saw that it was red and that there were 2 nodes, I was confused as 
> there should only be 1 elasticsearch node. Upon digging further, I see this:
>
>  [root@elasticsearch1 elasticsearch]# curl 
>> localhost:9200/_nodes/process?pretty
>> {
>>   "ok" : true,
>>   "cluster_name" : "my-cluster",
>>   "nodes" : {
>> "ab8COl6pTj-kJSzrXZTE2w" : {
>>   "name" : "Lupo",
>>   "transport_address" : "inet[/192.168.0.10:9300]",
>>   "hostname" : "logstash1",
>>   "version" : "0.90.3",
>>   "attributes" : {
>> "client" : "true",
>> "data" : "false"
>>   },
>>   "process" : {
>> "refresh_interval" : 1000,
>> "id" : 4380,
>> "max_file_descriptors" : 3200
>>   }
>> },
>> "FMgeliZPRdQZwy-IZ9MUIp" : {
>>   "name" : "ElasticSearch Server1",
>>   "transport_address" : "inet[/192.168.0.20:9300]",
>>   "hostname" : "elasticsearch1",
>>   "version" : "0.90.3",
>>   "http_address" : "inet[/192.168.0.20:9200]",
>>   "attributes" : {
>> "master" : "true"
>>   },
>>   "process" : {
>> "refresh_interval" : 1000,
>> "id" : 15653,
>> "max_file_descriptors" : 65535
>>   }
>> }
>>   }
>
>  
> I am confused why server1, logstash1, is showing up in the elasticsearch 
> cluster. I'm only running logstash as an indexer and not the built in 
> elasticsearch feature. How do I get this server to stop showing up in my 
> cluster? When I look on the logstash1 server, I don't see any elasticsearch 
> data or indexes like I do on my elasticsearch1 servers. So I don't think 
> data is truly going to it, but I don't want it to show up. 
>
> Thanks,
> Eric
>
>  --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/79821bd7-3679-4fb9-b78f-8c4b292357c7%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0b9275fb-8f59-4b59-b532-a153167e8ed1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Help with Cluster

Hmmm. I'm not sure why my status is red then. The only thing I can see from 
the cluster-health documentation page is that a specific shard is not 
allocated in the cluster. When I look at my cluster health, I do  see this:

  "unassigned_shards" : 60

Guess I need to figure out why I have so many unassigned shards. I think I 
am feeding too much data in elasticsearch at the moment. I've turned on the 
logstash server shipping to elasticsearch and I'm still getting logs coming 
in and it's been about 10 minutes. 

As far as the logstash node goes, I have this config on the elasticsearch 
portion.

output {
  elasticsearch {
embedded => "false"
host => "192.168.0.20" cluster => "my-cluster"
  }
}

So there is no reason it should be there. However, as you said, I'm not 
terribly worried about that now, but I am concerned about my red status.


On Tuesday, December 17, 2013 2:07:29 PM UTC-5, David Pilato wrote:
>
> Yes you can rename it using 
> http://logstash.net/docs/1.3.1/outputs/elasticsearch#node_name
>
> You have a real problem here as your cluster should not be red.
> But it should not be caused by the logstash node.
>
> Did you set embedded to false (it's default on 1.3.1 but not sure about 
> previous version)?
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | 
> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>
>
> Le 17 décembre 2013 at 19:45:18, Eric Luellen 
> (eric.l...@gmail.com) 
> a écrit:
>
> Thanks for the information. I don't mind it being there, I would just 
> confused of why it was there. If it stays there, will my cluster status 
> continue to show red on the health? That was my main concern. Also, if it 
> stays there, I wish I could rename it from the default Lupo it is to the 
> name of the server so I can distinguish it better. 
>
>
> On Tuesday, December 17, 2013 10:46:56 AM UTC-5, David Pilato wrote: 
>>
>>  I'd not worry of the non data node.
>>  It's only a node which connect to the cluster to give a client to 
>> logstash.
>>  
>>  If you really don't want it, then you can use 
>> http://logstash.net/docs/1.3.1/outputs/elasticsearch_http 
>>  
>>  HTH
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet <https://twitter.com/dadoonet> | 
>> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>>  
>>
>> Le 17 décembre 2013 at 16:32:33, Eric Luellen (eric.l...@gmail.com) a 
>> écrit:
>>
>>  I am working on building out a small POC for Logstash and 
>> Elasticsearch. To start, I have a 2 server setup. 
>>
>>  
>>- Server 1 - logstash1 - running "java -jar 
>>logstash-1.2.2-flatjar.jar agent -f indexer.conf" 
>>- 
>>   - This server is tailing logs from a syslog config file and then 
>>   sending them to an ElasticSearch server. 
>> - Server 2 - elasticsearch1 - running elasticsearch as a daemon 
>>(CentOS box that i used an rpm instal - version - 0.90.3.) 
>>- 
>>   - This server is also running Kibana."java -jar 
>>   /etc/logstash/logstash-1.2.2-flatjar.jar web" 
>> 
>> Overall things seem to be working pretty well. I started to do some 
>> general diagnostics on the elasticsearch server to see how the cluster was 
>> doing, and I saw that it was red.
>>  
>>  [root@elasticsearch1 elasticsearch]# curl -XGET '
>>> http://localhost:9200/_cluster/health?pretty=true'
>>> {
>>>   "cluster_name" : "my-cluster",
>>>   "status" : "red",
>>>   "timed_out" : false,
>>>   "number_of_nodes" : 2,
>>>   "number_of_data_nodes" : 1,
>>>   "active_primary_shards" : 35,
>>>   "active_shards" : 35,
>>>   "relocating_shards" : 0,
>>>   "initializing_shards" : 0,
>>>   "unassigned_shards" : 55
>>
>>
>> When I saw that it was red and that there were 2 nodes, I was confused as 
>> there should only be 1 elasticsearch node. Upon digging further, I see this:
>>
>>  [root@elasticsearch1 elasticsearch]# curl 
>>> localhost:9200/_nodes/process?pretty
>>> {
>>>   "ok" : true,
>>>   "cluster_name" : "my-cluster",
>>>   "nodes" : {
>>> "ab8COl6pTj-kJSzrXZTE2w" : {
>>>   "name" : &

Help with Cluster