Relationship search

2015-04-24 Thread ming ming
Hi! I am a newbie to this forum and I've recently run into a problem with a 
relational model search (Elasticsearch + Ruby on Rails). I have a Document 
model that has_many geolocations, through georeferences. It looks like this:

*Document*
 has_many :geolocations, through: :georeferences
 has_many :georeferences, as: :georeferenceable, dependent: :destroy

*Georeference*
 belongs_to :georeferenceable, polymorphic: true
 belongs_to :geolocation

*Geolocation*
 has_many :georeferences
 has_many :documents, through: :georeferences

I also have a field inside Geolocation that contains polygon data 
(coordinates). Now I want to search by coordinates and get the matching 
documents back. Is that possible?

This is my mapping:

mapping _source: { excludes: ['attachment'] } do
  indexes :title, analyzer: 'english', index_options: 'offsets'
  indexes :attachment, type: 'attachment'
  indexes :geolocations, type: 'geo_shape'
end


This is my as_indexed_json:

def as_indexed_json(options = {})
  as_json(
    only: 'title',
    include: { geolocations: { only: [:type, :coordinates] } },
    methods: [:attachment]
  )
end

This is my search function:

def search_lat_and_lon(radius, lat, lon)
  __elasticsearch__.search(
    {
      from: 0,
      size: 100,
      query: {
        filtered: {
          query: { match_all: {} },
          filter: {
            geo_shape: {
              _cache: true,
              geolocations: {
                shape: {
                  type: "circle",
                  coordinates: [lon, lat],
                  radius: "#{radius}m"
                }
              }
            }
          }
        }
      }
    }
  )
end
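For anyone reproducing this, the request body that method builds can be sketched in Python (a sketch only; the field name `geolocations` is taken from the mapping above, and note that geo_shape expects GeoJSON `[lon, lat]` ordering):

```python
def geo_shape_circle_query(radius_m, lat, lon, field="geolocations"):
    """Filtered query with a geo_shape circle filter (ES 1.x DSL)."""
    return {
        "from": 0,
        "size": 100,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {
                                "type": "circle",
                                # GeoJSON ordering: longitude first
                                "coordinates": [lon, lat],
                                "radius": "%dm" % radius_m,
                            }
                        }
                    }
                },
            }
        },
    }

q = geo_shape_circle_query(500, 5.37, 100.41)
```

One thing to check: as_indexed_json must emit each geolocation as a GeoJSON object with `type` and `coordinates` keys, or the geo_shape field will not match.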




Thanks in advance! I hope someone can help me.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fdeed2d4-6b8a-40d9-9589-1682210ab52e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Issue when MatchPhasePrefix and Sort

2015-04-24 Thread Mark Walkom
See
http://www.elastic.co/guide/en/elasticsearch/guide/master/_limiting_memory_usage.html#circuit-breaker

Basically ES is protecting you against potential OOM killers.
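For reference, the breaker limits can be adjusted in elasticsearch.yml (a sketch for ES 1.x; the setting names and defaults shown are assumptions to verify against the docs for your version):

```yaml
# elasticsearch.yml (ES 1.x setting names; verify for your version)
indices.breaker.fielddata.limit: 60%   # fielddata circuit breaker
indices.breaker.request.limit: 40%     # per-request circuit breaker
```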

On 25 April 2015 at 01:27, TB  wrote:

> The field is not 594 MB; could this be related to the JVM not having
> enough memory allocated?
> I have set "mlock_all: true" in the config.
> I have not changed the ulimit or the ES_MIN and ES_MAX variables; could
> this be related to that?
>
>
> On Thursday, April 23, 2015 at 8:16:33 PM UTC-5, Jason Wee wrote:
>>
>> Hmmm, why would a field be that large, at 594mb?
>>
>> Jason
>>
>> On Fri, Apr 24, 2015 at 2:11 AM, TB  wrote:
>> > When executing a search with MatchPhasePrefix on a Property which is a
>> > large string, the search fails with an error:
>> > "Data too large, data for [Field] would be larger than limit of
>> > [623326003/594.4mb]]
>> > ElasticsearchException[java.lang.OutOfMemoryError: Java heap space]
>> >
>> > need help
>> >
>>



Marvel showing unresponsive nodes but active data

2015-04-24 Thread Tristan Hammond
Hi all.

So we have an Elasticsearch cluster up and running. 5 nodes (1 master, 1 
client and 3 data). At some point the Marvel dash began stating on all our 
nodes and indices that "no report has been received for more than 2m." I'm 
not sure when it started, but the cluster status is green, 
the shard and index count up top is correct, and the dash is showing 
correct document count, index rate and request rate on the indices. 

I've Googled around a bunch and haven't been able to find someone else 
having this issue. A rolling restart is something I'd considered, but since 
it's powering search on our site I'd like it to be a last resort.

Any insight would be much appreciated.

Cheers,
- Tristan -



Re: Issue when MatchPhasePrefix and Sort

2015-04-24 Thread TB
The field is not 594 MB; could this be related to the JVM not having 
enough memory allocated?
I have set "mlock_all: true" in the config.
I have not changed the ulimit or the ES_MIN and ES_MAX variables; could 
this be related to that?
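For reference (an assumption about the 1.x packaging, not confirmed in the thread): heap size is normally set through ES_HEAP_SIZE, which sets both ES_MIN_MEM and ES_MAX_MEM, and mlockall additionally needs the memlock limit raised:

```shell
# /etc/sysconfig/elasticsearch (RPM) or /etc/default/elasticsearch (DEB);
# variable names assumed for the 1.x packages
ES_HEAP_SIZE=4g               # sets both -Xms and -Xmx
MAX_LOCKED_MEMORY=unlimited   # needed for bootstrap.mlockall to take effect
```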


On Thursday, April 23, 2015 at 8:16:33 PM UTC-5, Jason Wee wrote:
>
> Hmmm, why would a field be that large, at 594mb? 
>
> Jason 
>
> On Fri, Apr 24, 2015 at 2:11 AM, TB > 
> wrote: 
> > When executing a search with MatchPhasePrefix on a Property which is a 
> > large string, the search fails with an error: 
> > "Data too large, data for [Field] would be larger than limit of 
> > [623326003/594.4mb]] 
> > ElasticsearchException[java.lang.OutOfMemoryError: Java heap space] 
> > 
> > need help 
> > 
>



issue with multi_match queries for nested documents

2015-04-24 Thread Anatoly Petkevich
I need multilanguage text search for documents and decided to use types and 
multi_match queries for that as described in 
http://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html 

It works smoothly for flat documents, but I cannot figure out whether it is 
applicable to nested documents too.
Here is a test index "persons"
 

{
"settings" : {
"number_of_shards" : 2
},
"mappings" : {
"person_eng":{
"properties": {
"name" : {
"type" : "string", "analyzer" : "english"
},
"car":{
"properties": {
"make": {
"type": "string", "analyzer" : "english"
},
"model": {
"type": "string", "analyzer" : "english"
}
},
"type" : "nested"
}
}
},
"person_de":{
"properties": {
"name" : {
"type" : "string", "analyzer" : "german"
},
"car":{
"properties": {
"make": {
"type": "string", "analyzer" : "german"
},
"model": {
"type": "string", "analyzer" : "german"
}
},
"type" : "nested"
}
}
}

}
}


After putting documents

curl -XPUT 'http://localhost:9200/persons/person_eng/1' -d '{
  "name" : "Bob Mahoney",
  "car" :
{
  "make" : "Vickers Armstrongs",
  "model" : "Matilda"
}
}'

curl -XPUT 'http://localhost:9200/persons/person_de/1' -d '{
  "name" : "Heinz Guderian",
  "car" : 
{
  "make" : "Porsche",
  "model" : "Tiger"
}
}'


the following query works as expected

{
  "query": {
"multi_match": {
  "query": "Heinz",
  "fields": [
"person_de.name",
"person_eng.name"
  ]
}
  }
}

but a nested query doesn't return any result

{
  "query": {
"nested": {
  "path": "car",
  "query": {
"multi_match": {
  "query": "Heinz",
  "fields": [
"person_de.car.make",
"person_eng.car.make"
  ]
}
  }
}
  }
}


What is wrong here?

If fully qualified field names (including the type name) are not supported 
in multi_match queries for nested documents, I guess they can be replaced 
by a corresponding dis_max or bool query.
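That dis_max fallback could be sketched like this (an untested sketch; it assumes one nested query per type-prefixed field, since a single nested query may not resolve fields across types):

```python
def nested_multilang_query(text, types=("person_eng", "person_de"),
                           field="car.make"):
    """One nested query per language type, combined under dis_max so the
    best-scoring language wins."""
    return {
        "query": {
            "dis_max": {
                "queries": [
                    {
                        "nested": {
                            "path": "car",
                            "query": {"match": {"%s.%s" % (t, field): text}},
                        }
                    }
                    for t in types
                ]
            }
        }
    }

q = nested_multilang_query("Porsche")
```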

Thanks in advance!



Get a fixed random sample from all documents

2015-04-24 Thread Sebastian Rickelt
Hi,

I want to fetch a fixed large number of documents randomly from 
Elasticsearch to compute some statistics (100,000 out of 10 M documents). 
The randomness has to be predictable so that I get the same documents with 
every request.

My problem is that scan and scroll is fast but as I understand the order is 
not predictable. On the other side I could use the 'random_score' function 
with a fixed seed in my query. That would fix the order problem but deep 
pagination is very slow. Has anyone done this before? Any ideas or pointers 
how to do this with Elasticsearch?
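The random_score approach mentioned above can be sketched as follows (a sketch; whether the seed stays stable across segment merges should be verified for your version):

```python
def seeded_random_sample_query(seed, size, page_from=0):
    """Deterministic pseudo-random ordering: the same seed returns the
    same ordering on every request."""
    return {
        "from": page_from,
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": {"seed": seed},
            }
        },
    }

q = seeded_random_sample_query(42, 1000)
```

An alternative that avoids deep pagination entirely (an assumption, not from the thread): index a random float per document at write time, then scan-and-scroll over a range filter on that field.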

Any help appreciated.

Cheers,

Sebastian



Re: Elasticsearch crashed after start

2015-04-24 Thread Ann Yablunovskaya
[2015-04-24 03:38:04,501][INFO ][node ] [server] 
version[1.5.1], pid[29957], build[5e38401/2015-04-09T13:41:35Z]
[2015-04-24 03:38:04,502][INFO ][node ] [server] 
initializing ...
[2015-04-24 03:38:04,536][INFO ][plugins  ] [server] loaded 
[marvel, shield, license], sites [marvel]
[2015-04-24 03:38:04,562][ERROR][bootstrap] Exception
org.elasticsearch.ElasticsearchIllegalStateException: Failed to created 
node environment
at 
org.elasticsearch.node.internal.InternalNode.(InternalNode.java:162)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:213)
at 
org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.nio.file.AccessDeniedException: 
/opt/elasticsearch/data/server/cluster/nodes/1
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:383)
at java.nio.file.Files.createDirectory(Files.java:630)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:734)
at java.nio.file.Files.createDirectories(Files.java:720)
at 
org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:105)
at 
org.elasticsearch.node.internal.InternalNode.(InternalNode.java:160)
... 4 more

I fixed the problem, sorry for the noise. The cause was that the 
elasticsearch user had been deleted, leaving the data files without the 
right ownership; after restoring the file permissions it works.

On Friday, April 24, 2015 at 5:03:39 AM UTC+3, Ann Yablunovskaya wrote:
>
> Hi!
>
> I don't understand what happened.
> OS: CentOS 7.1
> I have an ES cluster with two servers.
> They have the same configuration.
>
> I tried to configure Shield and Marvel, but my second ES instance
> suddenly crashed.
>
>
> [root@server bin]# ./elasticsearch -v
> Version: 1.5.1, Build: 5e38401/2015-04-09T13:41:35Z, JVM: 1.7.0_79
>
> [root@server bin]# service elasticsearch-server start
> Redirecting to /bin/systemctl start  elasticsearch-server.service
>
> [root@server bin]# service elasticsearch-server status
> Redirecting to /bin/systemctl status  elasticsearch-server.service
> elasticsearch-server.service - Starts and stops a single elasticsearch 
> instance on this system
>Loaded: loaded (/usr/lib/systemd/system/elasticsearch-server.service; 
> enabled)
>Active: active (running) since Thu 2015-04-23 21:50:09 EDT; 5s ago
>  Docs: http://www.elasticsearch.org
>   Process: 15958 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -d 
> -p /var/run/elasticsearch/elasticsearch-server.pid 
> -Des.default.config=$CONF_FILE -Des.default.path.home=$ES_HOME 
> -Des.default.path.logs=$LOG_DIR -Des.default.path.data=$DATA_DIR 
> -Des.default.path.work=$WORK_DIR -Des.default.path.conf=$CONF_DIR 
> (code=exited, status=0/SUCCESS)
>  Main PID: 15964 (java)
>CGroup: /system.slice/elasticsearch-server.service
>└─15964 java -Xms30g -Xmx30g -Djava.awt.headless=true 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiating...
>
> Apr 23 21:50:08 server.local systemd[1]: PID file 
> /var/run/elasticsearch/elasticsearch-server.pid not readable (yet?) after 
> start.
> Apr 23 21:50:09 server.local systemd[1]: Started Starts and stops a single 
> elasticsearch instance on this system.
>
> [root@server bin]# telnet localhost 9200
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
>
> [root@server bin]# service elasticsearch-server status
> Redirecting to /bin/systemctl status  elasticsearch-server.service
> elasticsearch-server.service - Starts and stops a single elasticsearch 
> instance on this system
>Loaded: loaded (/usr/lib/systemd/system/elasticsearch-server.service; 
> enabled)
>Active: failed (Result: exit-code) since Thu 2015-04-23 21:50:15 EDT; 
> 16s ago
>  Docs: http://www.elasticsearch.org
>   Process: 15958 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -d 
> -p /var/run/elasticsearch/elasticsearch-server.pid 
> -Des.default.config=$CONF_FILE -Des.default.path.home=$ES_HOME 
> -Des.default.path.logs=$LOG_DIR -Des.default.path.data=$DATA_DIR 
> -Des.default.path.work=$WORK_DIR -Des.default.path.conf=$CONF_DIR 
> (code=exited, status=0/SUCCESS)
>  Main PID: 15964 (code=exited, status=3)
>
> Apr 23 21:50:08 server.local systemd[1]: PID file 
> /var/run/elasticsearch/elasticsearch-server.pid not readable (yet?) after 
> start.
> Apr 23 21:50:09 server.local.local systemd[1]: Started Starts and stops a 
> single elastics

Re: WordCloud in Elasticsearch

2015-04-24 Thread mark
>A visualization of the 'significant' words is all I'm after.

The main question then is "significant compared to what?".

Straight popularity counts (e.g. terms agg) will just tell you the term 
"the" is very popular.
To use significant_terms you need to provide a foreground set and a 
background set to compare for differences.
Examples therefore might be:
  *What idioms do presidents use? *:  All presidential speeches Vs normal 
English (e.g. a sample of English Wikipedia content)
  *What is different about Obama as a president?*  Obama speeches Vs all 
other presidential speeches
  *What is Obama talking about now?:*  Obama speeches 2015 Vs all prior 
Obama speeches.
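Concretely, the foreground set is whatever the query matches and the background set defaults to the rest of the index, so a request along these lines does the comparison (a sketch; the field and query values are hypothetical):

```python
def significant_terms_agg(field, foreground_query, size=50):
    """Significant-terms aggregation: terms over-represented in the
    query's result set (foreground) relative to the whole index
    (background)."""
    return {
        "size": 0,  # only the aggregation is wanted, not hits
        "query": foreground_query,
        "aggs": {
            "keywords": {
                "significant_terms": {"field": field, "size": size}
            }
        },
    }

agg = significant_terms_agg("speech_text", {"match": {"speaker": "Obama"}})
```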






On Friday, April 24, 2015 at 2:06:19 PM UTC+1, Jeff Fogarty wrote:
>
> Hi Alfredo, my goal is to use the features in ES to create a wordcloud as 
> easily as possible. The termvector and significant terms queries seem to 
> be the most useful.
>
> A visualization of the 'significant' words is all I'm after.
>
> On Thursday, April 23, 2015 at 10:26:14 AM UTC-5, Alfredo Serafini wrote:
>>
>> Hi Jeff
>>
>> IMHO a wordcloud visualization is simple to construct over facets, so if 
>> you have aggregations which count how many documents you have for every 
>> term, that is probably the simplest way to construct it.
>> If you want to use the term vectors it's important to understand what you 
>> want to describe, in particular.
>>
>> What do you want to visualize? What do you expect emerging from data?
>>
>> On Thursday, April 23, 2015 at 15:08:36 UTC+2, Jeff Fogarty wrote:
>>>
>>>
>>>
>>> I'm looking to create a wordcloud in Jupyter (IPython Notebook) using 
>>> either Python or JavaScript. I have a collection of presidential 
>>> speeches from millercenter.org loaded into ES. I'm able to execute a 
>>> termvector query, which returns the following:
>>>
>>>term
>>>term_freq
>>>ttf
>>>doc_freq
>>>
>>> Is termvector the appropriate query for a wordcloud?  If so, which 
>>> numerical value should I use?
>>>
>>> Thanks for your help.
>>>
>>> Jeff
>>>
>>



Web page feature request

2015-04-24 Thread Attila Nagy
Hi,

I use Google to find Elasticsearch-related answers, which tends to turn up 
the relevant pages from elastic.co's docs, for example this one:
http://www.elastic.co/guide/en/elasticsearch/reference/0.90/query-dsl-nested-filter.html

Note that this one is for 0.90, which is quite annoying; other versions 
also pop up for different questions.

Could you please put at least a "most recent version of this page" link on 
these pages, pointing to the latest version, or set up a version switcher 
that makes jumping to a given version easier?

Thanks,



Re: WordCloud in Elasticsearch

2015-04-24 Thread Jeff Fogarty
Hi Alfredo, my goal is to use the features in ES to create a wordcloud as 
easily as possible. The termvector and significant terms queries seem to 
be the most useful.

A visualization of the 'significant' words is all I'm after.

On Thursday, April 23, 2015 at 10:26:14 AM UTC-5, Alfredo Serafini wrote:
>
> Hi Jeff
>
> IMHO a wordcloud visualization is simple to construct over facets, so if 
> you have aggregations which count how many documents you have for every 
> term, that is probably the simplest way to construct it.
> If you want to use the term vectors it's important to understand what you 
> want to describe, in particular.
>
> What do you want to visualize? What do you expect emerging from data?
>
> On Thursday, April 23, 2015 at 15:08:36 UTC+2, Jeff Fogarty wrote:
>>
>>
>>
>> I'm looking to create a wordcloud in Jupyter (IPython Notebook) using 
>> either Python or JavaScript. I have a collection of presidential 
>> speeches from millercenter.org loaded into ES. I'm able to execute a 
>> termvector query, which returns the following:
>>
>>term
>>term_freq
>>ttf
>>doc_freq
>>
>> Is termvector the appropriate query for a wordcloud?  If so, which 
>> numerical value should I use?
>>
>> Thanks for your help.
>>
>> Jeff
>>
>



Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
Thank you! Success!

elasticsearch-plugin  --install
elasticsearch/elasticsearch-mapper-attachments/2.4.3
 -> Installing
elasticsearch/elasticsearch-mapper-attachments/2.4.3...
Trying
http://download.elasticsearch.org/elasticsearch/elasticsearch-mapper-attachments/elasticsearch-mapper-attachments-2.4.3.zip.
..
Downloading
...

Re: Apply word_delimiter token filter on words having 5 chars or more.

2015-04-24 Thread Ivan Brusic
Your best option would be to write your own filter. It should be easy since
you have access to the source of the delimiter and length filters. Look at
the existing filter plugins for examples on how to deploy.

Ivan
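To make the intent concrete, here is the filter's logic sketched in plain Python (a conceptual sketch only, not a Lucene TokenFilter; the splitting regex is a rough stand-in for word_delimiter's rules):

```python
import re

# Rough stand-in for word_delimiter: split on case changes and
# letter/digit boundaries.
SPLIT = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")

def length_gated_delimiter(tokens, min_len=5):
    """Apply the delimiter only to tokens of min_len chars or more;
    shorter tokens pass through unchanged."""
    out = []
    for tok in tokens:
        if len(tok) >= min_len:
            out.extend(SPLIT.findall(tok) or [tok])
        else:
            out.append(tok)
    return out

tokens = length_gated_delimiter(["PowerShot", "abc", "WiFi5"])
```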
On Apr 24, 2015 10:39 AM, "Nassim"  wrote:

> Hi,
>
> Is it possibile to apply the word_delimiter token filter only on words
> having 5 chars or more ?
>
> Thank you.
>
>



Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
 /usr/local/bin/elasticsearch-plugin --list
Installed plugins:
- mapper-attachments
- head


On Fri, Apr 24, 2015 at 8:47 AM, Pccom Frank  wrote:

> Thank you! Success!
>
> elasticsearch-plugin  --install
> elasticsearch/elasticsearch-mapper-attachments/2.4.3
>  -> Installing
> elasticsearch/elasticsearch-mapper-attachments/2.4.3...
> Trying
> http://download.elasticsearch.org/elasticsearch/elasticsearch-mapper-attachments/elasticsearch-mapper-attachments-2.4.3.zip.
> ..
> Downloading
> ...

Re: DynamoDB river plugin

2015-04-24 Thread Nilay Khandhar
Hi,

Somehow I got the plugin running in Elasticsearch, but now the issue is 
that it is not fetching the values from my DynamoDB table, i.e. Recipe.

I tried this:


curl -XPUT 'localhost:9200/_river/dynamodb/_meta' -d '{
"type" : "dynamodb",
"dynamodb" : {
"access_key" : "XX",
"secret_key" : "XX",
"table_name" : "Recipe",
"id_field" : "recipeID"
}
}'


What it does is add the river metadata to the index, but then my node 
stops working. It is no longer reachable.

Not sure what else I could do. Any suggestions? Did you face the same issue?

Waiting for your reply!

Thanks in Advance!!

Regards,
Nilay

On Monday, February 10, 2014 at 5:26:50 PM UTC+5:30, Kevin Wang wrote:
>
> Hi,
> I've created a river plugin for AWS DynamoDB. It can fetch data from 
> DynamoDB and index into Elasticsearch.
>
>
> https://github.com/kzwang/elasticsearch-river-dynamodb
>
>
>
> Thanks,
> Kevin
>



Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread David Pilato
Yes. I told you. The plugin version is 2.4.3 not 2.43

/usr/local/bin/elasticsearch-plugin install 
elasticsearch/elasticsearch-mapper-attachments/2.4.3

-- 
David Pilato - Developer | Evangelist 
elastic.co
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 






> On 24 Apr 2015, at 13:38, Pccom Frank wrote:
> 
> This is the FreeBSD elasticsearch-plugin command:
>  
> /usr/local/bin/elasticsearch-plugin
> Usage:
> -u, --url [plugin location]   : Set exact URL to download the plugin 
> from
> -i, --install [plugin name]   : Downloads and installs listed plugins 
> [*]
> -t, --timeout [duration]  : Timeout setting: 30s, 1m, 1h... 
> (infinite by default)
> -r, --remove  [plugin name]   : Removes listed plugins
> -l, --list: List installed plugins
> -v, --verbose : Prints verbose messages
> -s, --silent  : Run in silent mode
> -h, --help: Prints this help message
> 
>  [*] Plugin name could be:
>  elasticsearch/plugin/version for official elasticsearch plugins 
> (download from download.elasticsearch.org 
> )
>  groupId/artifactId/version   for community plugins (download from maven 
> central or oss sonatype)
>  username/repository  for site plugins (download from github 
> master)
> 
> 
> On Fri, Apr 24, 2015 at 7:24 AM, Pccom Frank  > wrote:
> I successfully install the header plugin in FreeBSD by its port 
> /usr/ports/textproc/elasticsearch-plugin-head
> In FreeBSD, it is installed by the command
> make install clean
> 
> Not
> bin/plugin
> 
> The problem is that there is no such a port for mapper attachments.
> 
> This is my original problem.
> 
> FreeBSD alternatively provide a command:
>  /usr/local/bin/elasticsearch-plugin
> But I do not know how to use it.
> This is my original question.
> 
> On Fri, Apr 24, 2015 at 7:18 AM, Pccom Frank  > wrote:
> You can find all versions here:
> https://github.com/elastic/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch
>  
> 
> 
> On Fri, Apr 24, 2015 at 7:16 AM, Pccom Frank  > wrote:
> https://github.com/elastic/elasticsearch-mapper-attachments/tree/v2.4.3#version-243-for-elasticsearch-14
>  
> 
> 
> 
> 
> 
>



Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Eran
Wow, awesome. I'll try that, thanks!

On Friday, April 24, 2015 at 2:17:45 PM UTC+3, 
christian...@elasticsearch.com wrote:
>
> Hi Eran,
>
> If you are assigning your own ID, Elasticsearch needs to search and check 
> whether the document already exists before writing it. This could explain 
> why bulk insert performance goes down as the size of the index grows. If 
> you are not going to update the documents, I would therefore recommend 
> allowing Elasticsearch to assign the document ID automatically.
>
> Best regards,
>
> Christian
>
>
>
> On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
>>
>> Hello,
>>
>> I've created an index I use for logging.
>>
>> This means there are mostly writes, and some searches once in a while.
>> In the phase of the first loading, I'm using several clients to 
>> concurrently index documents using the bulk API.
>>
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>
>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
>> an IO provisioned volume set to 7000 IOPS.
>>
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>>
>> How come I'm only indexing, but most of the IOPS are read?
>>
>> I am attaching some screen captures from the BigDesk plugin that show 
>> the two states of the index; after about 20% of the graphs is the point 
>> in time where I stopped the clients, so you can see the load drop off.
>>
>> My settings are:
>>
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>>
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>>
>> indices.cache.filter.expire: 6h
>>
>> bootstrap.mlockall: true
>>
>>
>> and I've change the index settings to:
>>
>>
>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>> I also tried "refresh_interval":"-1"
>>
>>
>> Please let me know what else I need to provide if needed (settings, logs, 
>> metrics)
>>
>>
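Christian's suggestion above, letting Elasticsearch assign IDs during bulk indexing, amounts to omitting "_id" from each action line (a sketch; the index and type names are placeholders):

```python
import json

def bulk_body_auto_id(index, doc_type, docs):
    """Build a bulk request body WITHOUT "_id" in the action lines, so
    Elasticsearch generates IDs and skips the exists-check that slows
    inserts as the index grows."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body_auto_id("logs", "event", [{"msg": "a"}, {"msg": "b"}])
```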



Re: Geo Mapping from Twitter

2015-04-24 Thread Sree
It worked.

On Friday, 24 April 2015 09:04:10 UTC+5:30, Sree wrote:
>
> Hi all,
>
> "coordinates" : {
> "type" : "Point",
> "coordinates" : [
> 100.41404641,
> 5.37384675
> ]
> },
>
> This is the Geo coordinates from Twitter. I tried it with 
>
> "coordinates": {"properties": {
> "coordinates": 
> {"type": "geo_point",
> "lat_lon": true,
> "geohash": true},
> }}
>
> But it's not working. Can you please tell me the right mapping for the 
> above?
>
> Thanks,
> Sreejith
>
>
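For reference, a mapping along these lines is presumably what ended up working (an assumption, since the fix isn't shown in the thread; note Twitter's GeoJSON "coordinates" array is [lon, lat], which geo_point accepts as an array):

```python
# Sketch of the tweet mapping (the type name "tweet" is a placeholder)
mapping = {
    "tweet": {
        "properties": {
            "coordinates": {
                "properties": {
                    "coordinates": {
                        "type": "geo_point",
                        "lat_lon": True,   # also index .lat/.lon subfields
                        "geohash": True,   # also index a geohash
                    }
                }
            }
        }
    }
}
```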



Shuffle results by a property

2015-04-24 Thread Cassiano Tartari
Hello!

I have a marketplace and I would like to sort the search results so that
products from different advertisers are mixed together.

Can someone help me?

I'm making a filtered query like this:

"query": {
    "filtered": {
        "query": {
            "bool": {
                "must": [
                    { "match": { ... } },
                    { "match": { ... } }
                ],
                "should": [
                    { "match": { ... } },
                    {
                        "multi_match": {
                            "query": "",
                            "type": "best_fields",
                            "fields": [ ... ],
                            "tie_breaker": 0.3,
                            "fuzziness": 1
                        }
                    }
                ]
            }
        },
        "filter": {
            "bool": {
                "must": [
                    { "nested": { ... } },
                    { "not": { "filter": { "term": { ... } } } },
                    { "not": { "filter": { "term": { ... } } } }
                ]
            }
        }
    }
}

The advertisers are mapped into the product documents as objects.

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAUnKK5XuWz9tAyUXV6WSPzgrzGRM8BGEYvNLCCZe3500Px%2BEg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
This is the FreeBSD elasticsearch-plugin command:

/usr/local/bin/elasticsearch-plugin
Usage:
-u, --url [plugin location]   : Set exact URL to download the
plugin from
-i, --install [plugin name]   : Downloads and installs listed
plugins [*]
-t, --timeout [duration]  : Timeout setting: 30s, 1m, 1h...
(infinite by default)
-r, --remove  [plugin name]   : Removes listed plugins
-l, --list: List installed plugins
-v, --verbose : Prints verbose messages
-s, --silent  : Run in silent mode
-h, --help: Prints this help message

 [*] Plugin name could be:
 elasticsearch/plugin/version for official elasticsearch plugins
(download from download.elasticsearch.org)
 groupId/artifactId/version   for community plugins (download from
maven central or oss sonatype)
 username/repository  for site plugins (download from github
master)
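[Editor's note] For example, an install invocation would look like this (untested here; this assumes the FreeBSD wrapper accepts the same arguments as the stock bin/plugin script, and 2.4.3 is the mapper-attachments release that pairs with Elasticsearch 1.4.x):

```sh
/usr/local/bin/elasticsearch-plugin --install elasticsearch/elasticsearch-mapper-attachments/2.4.3
```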


On Fri, Apr 24, 2015 at 7:24 AM, Pccom Frank  wrote:

> I successfully installed the head plugin in FreeBSD from its port
> /usr/ports/textproc/elasticsearch-plugin-head
> In FreeBSD, it is installed by the command
> make install clean
>
> Not
> bin/plugin
>
> The problem is that there is no such a port for mapper attachments.
>
> This is my original problem.
>
> FreeBSD alternatively provides a command:
>  /usr/local/bin/elasticsearch-plugin
> But I do not know how to use it.
> This is my original question.
>
> On Fri, Apr 24, 2015 at 7:18 AM, Pccom Frank 
> wrote:
>
>> You can find all versions here:
>>
>> https://github.com/elastic/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch
>>
>> On Fri, Apr 24, 2015 at 7:16 AM, Pccom Frank 
>> wrote:
>>
>>>
> https://github.com/elastic/elasticsearch-mapper-attachments/tree/v2.4.3#version-243-for-elasticsearch-14
>

>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJV_ms4ccaTLfjeZr5mhm5%3DOdXwCBoZLgzwoUMC1s9_-AGTUkw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
I successfully installed the head plugin in FreeBSD from its port
/usr/ports/textproc/elasticsearch-plugin-head
In FreeBSD, it is installed by the command
make install clean

Not
bin/plugin

The problem is that there is no such a port for mapper attachments.

This is my original problem.

FreeBSD alternatively provides a command:
 /usr/local/bin/elasticsearch-plugin
But I do not know how to use it.
This is my original question.

On Fri, Apr 24, 2015 at 7:18 AM, Pccom Frank  wrote:

> You can find all versions here:
>
> https://github.com/elastic/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch
>
> On Fri, Apr 24, 2015 at 7:16 AM, Pccom Frank 
> wrote:
>
>>
 https://github.com/elastic/elasticsearch-mapper-attachments/tree/v2.4.3#version-243-for-elasticsearch-14

>>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJV_ms6i4tL%2BRfnfXSF-W6ML7CCt37jDknuX58VgsJWFjK4qjA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
You can find all versions here:
https://github.com/elastic/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch

On Fri, Apr 24, 2015 at 7:16 AM, Pccom Frank  wrote:

>
>>> https://github.com/elastic/elasticsearch-mapper-attachments/tree/v2.4.3#version-243-for-elasticsearch-14
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJV_ms7EQqmYrtu%3DHw8RLmjGKvFxDshr%2BtAFi0wHbi7edExE4A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread christian . dahlqvist
Hi Eran,

If you are assigning your own ID, Elasticsearch needs to check whether a 
document with that ID already exists before writing it. This could explain why 
bulk insert performance degrades as the index grows. If you are not going to 
update the documents, I would therefore recommend letting Elasticsearch assign 
the document ID automatically.
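[Editor's note] To illustrate the difference, here is a small Python sketch (the index, type, and field names are made up for illustration) that builds a _bulk request body by hand; omitting _id from the action metadata lets Elasticsearch auto-generate the ID and skip that existence check:

```python
import json

def bulk_body(docs, index="logs", doc_type="event", id_field=None):
    """Build an Elasticsearch _bulk body (newline-delimited JSON)."""
    lines = []
    for doc in docs:
        meta = {"index": {"_index": index, "_type": doc_type}}
        if id_field is not None:
            # Supplying _id forces Elasticsearch to look the ID up first.
            meta["index"]["_id"] = doc[id_field]
        lines.append(json.dumps(meta))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

With id_field=None each action line is just `{"index": {...}}` and the server generates the ID.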

Best regards,

Christian



On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
>
> Hello,
>
> I've created an index I use for logging.
>
> This means there are mostly writes, and some searches once in a while.
> In the phase of the first loading, I'm using several clients to 
> concurrently index documents using the bulk API.
>
> At first, indexing takes 200 ms for a bulk of 5000 documents.
> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>
> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
> an IO provisioned volume set to 7000 IOPS.
>
> Looking at the metrics, I see that the CPU and memory are fine, the write 
> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>
> How come I'm only indexing, but most of the IOPS are read?
>
> I am attaching some screen captures from the BigDesk plugin that show the 
> two states of the index; at about 20% along the graphs is the point in time 
> where I stopped the clients, so you can see the load drop off.
>
> My settings are:
>
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 32 # availableProcessors
> threadpool.bulk.queue_size: 1000
>
> # Indices settings
> indices.memory.index_buffer_size: 50%
> indices.cache.filter.expire: 6h
>
> bootstrap.mlockall: true
>
>
> and I've changed the index settings to:
>
>
> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
> I also tried "refresh_interval":"-1"
>
>
> Please let me know what else I need to provide if needed (settings, logs, 
> metrics)
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f3ad37d7-a070-4065-aa85-6f38d4329502%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
>
>
>> https://github.com/elastic/elasticsearch-mapper-attachments/tree/v2.4.3#version-243-for-elasticsearch-14
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJV_ms6AFDHSp0OLqmsuDnF_b-o17GS0L28z8wOPfccSCrD7wQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread David Pilato
No. 2.43 does not exist.


-- 
David Pilato - Developer | Evangelist 
elastic.co
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 






> On 24 Apr 2015, at 12:53, Pccom Frank wrote:
> 
> You can find the mapper attachments plugin 2.43, which is for elasticseach 
> version 1.43, at https://github.com/elastic/elasticseach-mapper-attachments; 
> clicking the 2.43 link will bring you to the download for version 2.43.
> 
> On Apr 24, 2015 12:44 AM, "David Pilato"  > wrote:
> Ok but this mapper attachment version does not exist.
> 
> Just per curiosity could you give a link which shows that?
>> The elasticseach version in FreeBSD is 1.43, matching mapper attachment 2.43.
>> 
> 
> 
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
> On 24 Apr 2015, at 05:49, Pccom Frank wrote:
> 
>> I am using FreeBSD; there is no such thing as bin/plugin, only 
>> elasticsearch-plugin functioning as bin/plugin, I guess. It won't follow the 
>> official doc. The elasticseach version in FreeBSD is 1.43, matching mapper 
>> attachment 2.43.
>> 
>> On Apr 23, 2015 2:39 AM, "David Pilato" > > wrote:
>> The command you write is totally wrong. You set a url, you define the wrong 
>> version...
>> And it sounds you renamed plugin script but that's not an issue.
>> 
>> Whatever. Doc says:
>> 
>> bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.5.0
>> Try
>> 
>> bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
>> 
>> 
>> 
>> 
>> David
>> 
>> On 23 Apr 2015, at 04:48, Pccom Frank wrote:
>> 
>>> Hi,
>>> Please help to install plugins for elasticsearch on FreeBSD.
>>> 
>>> I tried different ways, I always fail. The following is one of them:
>>> 
>>> root@mail:/usr/local/lib/elasticsearch/plugins # elasticsearch-plugin --url 
>>> https://github.com/elastic/elasticsearch-mapper-attachments.git 
>>>  --install 
>>> elasticsearch/elasticsearch-mapper-attachments/2.43
>>> -> Installing elasticsearch/elasticsearch-mapper-attachments/2.43...
>>> Trying https://github.com/elastic/elasticsearch-mapper-attachments.git 
>>> ...
>>> Downloading 
>>> ..DONE
>>> failed to extract plugin 
>>> [/usr/local/lib/elasticsearch/plugins/mapper-attachments.zip]: 
>>> ZipException[error in opening zip file]
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to elasticsearch+unsubscr...@googlegroups.com 
>>> .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/efee40ff-7cdb-4bd0-9abc-b76a4fcd4b4d%40googlegroups.com
>>>  
>>> .
>>> For more options, visit https://groups.google.com/d/optout 
>>> .
>> 
>> 
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/PtYl6Y_i2rU/unsubscribe 
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearch+unsubscr...@googlegroups.com 
>> .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/8523380A-A246-49CE-A559-C1A532502FF9%40pilato.fr
>>  
>> .
>> For more options, visit https://groups.google.com/d/optout 
>> .
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearch+unsubscr...@googlegroups.com 
>> .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAJV_ms5uba-0AcE8xQ4Uw4q-g1h%2BDONjVOV235wjO-324WOAEQ%40mail.gmail.com
>>  
>> .
>> For more options, visit https://groups.google.com/d/optout.

Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-24 Thread Pccom Frank
You can find the mapper attachments plugin 2.43, which is for elasticseach
version 1.43, at https://github.com/elastic/elasticseach-mapper-attachments;
clicking the 2.43 link will bring you to the download for version 2.43.
On Apr 24, 2015 12:44 AM, "David Pilato"  wrote:

> Ok but this mapper attachment version does not exist.
>
> Just per curiosity could you give a link which shows that?
>
> The elasticseach version in FreeBSD is 1.43, matching mapper attachment 2.43.
>
>
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 24 Apr 2015, at 05:49, Pccom Frank wrote:
>
> I am using FreeBSD; there is no such thing as bin/plugin, only
> elasticsearch-plugin functioning as bin/plugin, I guess. It won't follow the
> official doc. The elasticseach version in FreeBSD is 1.43, matching mapper
> attachment 2.43.
>  On Apr 23, 2015 2:39 AM, "David Pilato"  wrote:
>
>> The command you write is totally wrong. You set a url, you define the
>> wrong version...
>> And it sounds you renamed plugin script but that's not an issue.
>>
>> Whatever. Doc says:
>>
>> bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.5.0
>>
>> Try
>>
>> bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
>>
>>
>>
>>
>>
>> David
>>
>> On 23 Apr 2015, at 04:48, Pccom Frank wrote:
>>
>> Hi,
>> Please help to install plugins for elasticsearch on FreeBSD.
>>
>> I tried different ways, I always fail. The following is one of them:
>>
>> root@mail:/usr/local/lib/elasticsearch/plugins # elasticsearch-plugin
>> --url https://github.com/elastic/elasticsearch-mapper-attachments.git
>> --install elasticsearch/elasticsearch-mapper-attachments/2.43
>> -> Installing elasticsearch/elasticsearch-mapper-attachments/2.43...
>> Trying https://github.com/elastic/elasticsearch-mapper-attachments.git...
>> Downloading
>> ..DONE
>> failed to extract plugin
>> [/usr/local/lib/elasticsearch/plugins/mapper-attachments.zip]:
>> ZipException[error in opening zip file]
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/efee40ff-7cdb-4bd0-9abc-b76a4fcd4b4d%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>>  --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/PtYl6Y_i2rU/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/8523380A-A246-49CE-A559-C1A532502FF9%40pilato.fr
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAJV_ms5uba-0AcE8xQ4Uw4q-g1h%2BDONjVOV235wjO-324WOAEQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/PtYl6Y_i2rU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/427F95F8-A65D-41A8-8B7C-3E66ABFC03B2%40pilato.fr
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/

Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Eran
I'm using the newest version, 1.5.1
I'm assigning my own ID using path:

"_id": {
"path": "msg_id"
},


msg_id is a self-generated, hashed identifier (it's actually somewhat like 
a cookie ID).

On Friday, April 24, 2015 at 1:47:39 PM UTC+3, 
christian...@elasticsearch.com wrote:
>
> Hi Eran,
>
> Which version of Elasticsearch are you using?
>
> Are you assigning your own document IDs or letting Elasticsearch assign 
> them automatically?
>
> Best regards,
>
> Christian
>
>
>
> On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
>>
>> Hello,
>>
>> I've created an index I use for logging.
>>
>> This means there are mostly writes, and some searches once in a while.
>> In the phase of the first loading, I'm using several clients to 
>> concurrently index documents using the bulk API.
>>
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>
>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
>> an IO provisioned volume set to 7000 IOPS.
>>
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>>
>> How come I'm only indexing, but most of the IOPS are read?
>>
>> I am attaching some screen captures from the BigDesk plugin that show 
>> the two states of the index; at about 20% along the graphs is the point in 
>> time where I stopped the clients, so you can see the load drop off.
>>
>> My settings are:
>>
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>>
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>> indices.cache.filter.expire: 6h
>>
>> bootstrap.mlockall: true
>>
>>
>> and I've changed the index settings to:
>>
>>
>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>> I also tried "refresh_interval":"-1"
>>
>>
>> Please let me know what else I need to provide if needed (settings, logs, 
>> metrics)
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/69c5c4f1-61b9-4dba-915a-93fba2b818e9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread christian . dahlqvist
Hi Eran,

Which version of Elasticsearch are you using?

Are you assigning your own document IDs or letting Elasticsearch assign 
them automatically?

Best regards,

Christian



On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
>
> Hello,
>
> I've created an index I use for logging.
>
> This means there are mostly writes, and some searches once in a while.
> In the phase of the first loading, I'm using several clients to 
> concurrently index documents using the bulk API.
>
> At first, indexing takes 200 ms for a bulk of 5000 documents.
> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>
> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
> an IO provisioned volume set to 7000 IOPS.
>
> Looking at the metrics, I see that the CPU and memory are fine, the write 
> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>
> How come I'm only indexing, but most of the IOPS are read?
>
> I am attaching some screen captures from the BigDesk plugin that show the 
> two states of the index; at about 20% along the graphs is the point in time 
> where I stopped the clients, so you can see the load drop off.
>
> My settings are:
>
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 32 # availableProcessors
> threadpool.bulk.queue_size: 1000
>
> # Indices settings
> indices.memory.index_buffer_size: 50%
> indices.cache.filter.expire: 6h
>
> bootstrap.mlockall: true
>
>
> and I've changed the index settings to:
>
>
> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
> I also tried "refresh_interval":"-1"
>
>
> Please let me know what else I need to provide if needed (settings, logs, 
> metrics)
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1ee7a991-d6d5-4240-be92-e73db63cccf5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Snapshot is stuck in IN_PROGRESS

2015-04-24 Thread Pradeep Reddy
Joining new nodes with 1.5.x version then terminating old nodes solved the 
issue.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/468a4caf-b7ed-4b72-8d3e-11943b4a863a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Eran
It is an issue, as I am hitting 7000 read operations per second (the limit 
of my volume's IOPS).

As the index grows larger the problem worsens; where I was once able to 
index with 10 clients concurrently, now I can barely use one.

Also, I used the _optimize endpoint to get all segments merged, and even 
then the read operations spike immediately on the first indexing operation 
(I'm using BigDesk to follow this). So I do not think it is a merge effect, 
since my intuition is that merges only happen every once in a while.
Maybe this is actually a result of me not using "doc values"? Could that be 
it?

On Friday, April 24, 2015 at 12:28:50 PM UTC+3, David Pilato wrote:

> That’s normal. I was just answering that even if you think you are only 
> writing data while indexing, you are also reading data behind the scenes to 
> merge Lucene segments.
> You can potentially try to play with index.translog.flush_threshold_size 
>
>
> http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
>
> And increase the transaction log size?
>
> It might help reduce the number of segments generated, but that said you 
> will always have READ operations.
>
> Actually, is it an issue for you? If not, keeping all defaults values 
> might be good.
>
> Best
>
>
> -- 
> *David Pilato* - Developer | Evangelist 
> *elastic.co *
> @dadoonet  | @elasticsearchfr 
>  | @scrutmydocs 
> 
>
>
>
>
>  
> On 24 Apr 2015, at 10:45, Eran wrote:
>
> Hey David,
>
> I suspect it indeed might be the cause, but I'm kind of a newbie here. 
> What metric do I need to monitor, what would be a problematic value, and 
> basically, how can I play with merge settings to test if I can improve this?
> Some rules of thumbs for a newbie would be appreciated.
>
> I installed the plugin SegmentSpy, and here is a screenshot, if that helps.
>
> Eran
>
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>>
>> Merging segments could be the cause here?
>>
>> David
>>
>> On 24 Apr 2015, at 09:54, Eran wrote:
>>
>> Forgot some stats:
>>
>> I have 10 shards, no replicas, all on the same machine.
>> ATM, there are some 1.5 billion records in the index.
>>
>>
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>>
>>> attachments hereby
>>>
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:

 Hello,

 I've created an index I use for logging.

 This means there are mostly writes, and some searches once in a while.
 In the phase of the first loading, I'm using several clients to 
 concurrently index documents using the bulk API.

 At first, indexing takes 200 ms for a bulk of 5000 documents.
 As time goes by, the indexing time increases, and gets to 1000-4500 ms.

 I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, 
 with an IO provisioned volume set to 7000 IOPS.

 Looking at the metrics, I see that the CPU and memory are fine, the 
 write IOPS are at 300, but the read IOPS have slowly gone up and got to 
 7000.

 How come I'm only indexing, but most of the IOPS are read?

 I am attaching some screen captures from the BigDesk plugin that show 
 the two states of the index; at about 20% along the graphs is the point in 
 time where I stopped the clients, so you can see the load drop off.

 My settings are:

 threadpool.bulk.type: fixed
 threadpool.bulk.size: 32 # availableProcessors
 threadpool.bulk.queue_size: 1000

 # Indices settings
 indices.memory.index_buffer_size: 50%
 indices.cache.filter.expire: 6h

 bootstrap.mlockall: true


 and I've changed the index settings to:


 {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
 I also tried "refresh_interval":"-1"


 Please let me know what else I need to provide if needed (settings, 
 logs, metrics)


>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
> -- 
> You received this message be

Re: BulkProcessor best practices

2015-04-24 Thread David Pilato
If you make your bulk final, I think this could work:

private final BulkProcessor bulk;

CrmApp() {
    Client esClient = new TransportClient(
            ImmutableSettings.builder().put("cluster.name", "devoxx")
    ).addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

    bulk = BulkProcessor.builder(esClient, new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) {
            logger.debug("[{}] going to execute {} requests", executionId,
                    request.numberOfActions());
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request,
                              BulkResponse response) {
            logger.debug("[{}] ok", executionId);
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request,
                              Throwable failure) {
            logger.warn("We have a problem", failure);
            // Re-queue the individual actions; BulkProcessor#add takes
            // single requests, not a whole BulkRequest.
            for (ActionRequest r : request.requests()) {
                bulk.add(r);
            }
        }
    })
            .setBulkActions(pageSize)
            .setFlushInterval(TimeValue.timeValueSeconds(5))
            .build();
}



-- 
David Pilato - Developer | Evangelist 
elastic.co
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 






> On 24 Apr 2015, at 10:59, mzrth_7810 wrote:
> 
> Hey guys,
> 
> I'm using the BulkProcessor to index documents in Elasticsearch. It's 
> definitely made my indexing throughput greater than it was before. 
> 
> Anyway, I was wondering if there were some best practices around exception 
> handling with the bulk processor. For example it would be good to schedule 
> retries in certain scenarios. 
> 
> At the moment all I'm doing is logging. I was wondering if someone could 
> point me to a resource with an example of handling a 
> NodeNotConnectedException and doing a retry. I don’t know how to access the 
> contents of the bulkProcessor from within the afterBulk method in the 
> Listener. 
> 
> 
> 
> public void beforeBulk(long executionId, BulkRequest bulkRequest) {
> }
> 
> @Override
> public void afterBulk(long executionId, BulkRequest bulkRequest,
>                       BulkResponse bulkResponse) {
>     if (bulkResponse.hasFailures()) {
>         Log.error("We have failures");
>         for (BulkItemResponse bulkItemResponse : bulkResponse.getItems()) {
>             if (bulkItemResponse.isFailed()) {
>                 Log.error(bulkItemResponse.getId() + " failed with message: "
>                         + bulkItemResponse.getFailureMessage());
>             }
>         }
>     }
> }
> 
> @Override
> public void afterBulk(long executionId, BulkRequest bulkRequest, Throwable t) {
>     Log.error("An exception occurred while indexing", t);
> 
>     // How do I add this back to the list of requests?
> }
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/7e7d476d-6a8d-4a82-bbd0-e331a08d1bb4%40googlegroups.com
>  
> .
> For more options, visit https://groups.google.com/d/optout 
> .

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4ABE0924-A01E-421F-A5A6-D1A34A0EC236%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread David Pilato
That’s normal. I was just answering that even if you think you are only writing 
data while indexing, you are also reading data behind the scenes to merge Lucene 
segments.
You can potentially try to play with index.translog.flush_threshold_size 

http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
 


And increase the transaction log size?
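[Editor's note] Concretely, that would be an index-settings update along these lines (a sketch; the index name and the 2gb value are made up for illustration):

```json
PUT /logs/_settings
{
  "index": {
    "translog": {
      "flush_threshold_size": "2gb"
    }
  }
}
```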

It might help reduce the number of segments generated, but that said you will 
always have READ operations.

Actually, is it an issue for you? If not, keeping all defaults values might be 
good.

Best


-- 
David Pilato - Developer | Evangelist 
elastic.co
@dadoonet  | @elasticsearchfr 
 | @scrutmydocs 






> On 24 Apr 2015, at 10:45, Eran wrote:
> 
> Hey David,
> 
> I suspect it indeed might be the cause, but I'm kind of a newbie here. 
> What metric do I need to monitor, what would be a problematic value, and 
> basically, how can I play with merge settings to test if I can improve this?
> Some rules of thumbs for a newbie would be appreciated.
> 
> I installed the plugin SegmentSpy, and here is a screenshot, if that helps.
> 
> Eran
> 
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
> Merging segments could be the cause here?
> 
> David
> 
> On 24 Apr 2015, at 09:54, Eran wrote:
> 
>> Forgot some stats:
>> 
>> I have 10 shards, no replicas, all on the same machine.
>> ATM, there are some 1.5 billion records in the index.
>> 
>> 
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>> attachments hereby
>> 
>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>> Hello,
>> 
>> I've created an index I use for logging.
>> 
>> This means there are mostly writes, and some searches once in a while.
>> In the phase of the first loading, I'm using several clients to concurrently 
>> index documents using the bulk API.
>> 
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>> 
>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with an 
>> IO provisioned volume set to 7000 IOPS.
>> 
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>> 
>> How come I'm only indexing, but most of the IOPS are read?
>> 
>> I am attaching some screen captures from the BigDesk plugin that show the 
>> two states of the index; at about 20% into the graphs is the point in time 
>> where I stopped the clients, so you can see the load drop off.
>> 
>> My settings are:
>> 
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>> 
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>> indices.cache.filter.expire: 6h
>> 
>> bootstrap.mlockall: true
>> 
>> 
>> and I've changed the index settings to:
>> 
>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>> I also tried "refresh_interval":"-1"
>> 
>> 
>> Please let me know what else I need to provide if needed (settings, logs, 
>> metrics)
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com
>>  
>> .
>> For more options, visit https://groups.google.com/d/optout 
>> .
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/dd232398-080a-488c-a952-b98c2a6da903%40googlegroups.com
>  
> .
> For more options, visit https://groups.google.com/d/optout 
> .
> 


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Jason Wee
The merging graph you shared looks normal to me.

We had ES with 10 shards too, and I monitor the segments using
SegmentSpy; the segment graph in your attachment looks pretty much the same
as ours.

jason
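
For the "what metric do I need to monitor" part, merge activity is also exposed 
through the indices stats API, in addition to plugins like SegmentSpy. A quick 
check (index name illustrative) would be:

```
curl 'localhost:9200/my_index/_stats/merge?pretty'
```

The merges section of the response shows the number of merges currently running 
and the cumulative merge count and time, which is what you would watch while 
experimenting with merge settings.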

On Fri, Apr 24, 2015 at 4:45 PM, Eran  wrote:
> Hey David,
>
> I suspect it indeed might be the cause, but I'm kind of a newbie here.
> What metric do I need to monitor, what would be a problematic value, and
> basically, how can I play with merge settings to test if I can improve this?
> Some rules of thumb for a newbie would be appreciated.
>
> I installed the plugin SegmentSpy, and here is a screenshot, if that helps.
>
> Eran
>
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>>
>> Merging segments could be the cause here?
>>
>> David
>>
>> On 24 Apr 2015, at 09:54, Eran  wrote:
>>
>> Forgot some stats:
>>
>> I have 10 shards, no replicas, all on the same machine.
>> ATM, there are some 1.5 billion records in the index.
>>
>>
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>>
>>> attachments hereby
>>>
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:

 Hello,

 I've created an index I use for logging.

 This means there are mostly writes, and some searches once in a while.
 In the phase of the first loading, I'm using several clients to
 concurrently index documents using the bulk API.

 At first, indexing takes 200 ms for a bulk of 5000 documents.
 As time goes by, the indexing time increases, and gets to 1000-4500 ms.

 I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory,
 with an IO provisioned volume set to 7000 IOPS.

 Looking at the metrics, I see that the CPU and memory are fine, the
 write IOPS are at 300, but the read IOPS have slowly gone up and got to
 7000.

 How come I'm only indexing, but most of the IOPS are read?

 I am attaching some screen captures from the BigDesk plugin that show
 the two states of the index; at about 20% into the graphs is the point in
 time where I stopped the clients, so you can see the load drop off.

 My settings are:

 threadpool.bulk.type: fixed
 threadpool.bulk.size: 32 # availableProcessors
 threadpool.bulk.queue_size: 1000

 # Indices settings
 indices.memory.index_buffer_size: 50%

 indices.cache.filter.expire: 6h

 bootstrap.mlockall: true


 and I've changed the index settings to:


 {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
 I also tried "refresh_interval":"-1"


 Please let me know what else I need to provide if needed (settings,
 logs, metrics)

>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/dd232398-080a-488c-a952-b98c2a6da903%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHO4itxe6Kv7wLfNxzvC-xa73sN4UUUj0Zgqcuk4Bogix-nkjA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


BulkProcessor best practices

2015-04-24 Thread mzrth_7810
Hey guys,

I'm using the BulkProcessor to index documents in Elasticsearch. It's 
definitely made my indexing throughput greater than it was before. 

Anyway, I was wondering if there were some best practices around exception 
handling with the bulk processor. For example it would be good to schedule 
retries in certain scenarios. 

At the moment all I'm doing is logging. I was wondering if someone could 
point me to a resource with an example of handling a 
NodeNotConnectedException and doing a retry. I don’t know how to access the 
contents of the bulkProcessor from within the afterBulk method in the 
Listener. 



@Override
public void beforeBulk(long executionId, BulkRequest bulkRequest) {
}

@Override
public void afterBulk(long executionId, BulkRequest bulkRequest, BulkResponse bulkResponse) {
    if (bulkResponse.hasFailures()) {
        Log.error("We have failures");
        for (BulkItemResponse bulkItemResponse : bulkResponse.getItems()) {
            if (bulkItemResponse.isFailed()) {
                Log.error(bulkItemResponse.getId() + " failed with message: "
                        + bulkItemResponse.getFailureMessage());
            }
        }
    }
}

@Override
public void afterBulk(long executionId, BulkRequest bulkRequest, Throwable t) {
    Log.error("An exception occurred while indexing", t);

    // How do I add this back to the list of requests?
}
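
One common pattern (a sketch, not an official recipe — the class name `BulkRetry` and 
the delay values here are made up) is to hold the `BulkProcessor` reference outside 
the listener, and from `afterBulk(long, BulkRequest, Throwable)` re-add each item of 
`bulkRequest.requests()` after a capped exponential backoff, so retries don't hammer 
a node that has just disconnected. The Elasticsearch-specific wiring (which `add` 
overloads exist) varies by client version; the scheduling and backoff logic itself 
is plain Java:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class BulkRetry {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Capped exponential backoff: min(base * 2^attempt, cap); attempt is 0-based. */
    static long backoffMillis(long baseMillis, long capMillis, int attempt) {
        long delay = baseMillis << Math.min(attempt, 30); // clamp the shift to avoid overflow
        return Math.min(delay, capMillis);
    }

    /** Schedule a retry task (e.g. one that re-adds the failed requests) after a backoff. */
    void scheduleRetry(Runnable readdRequests, int attempt) {
        scheduler.schedule(readdRequests,
                backoffMillis(100, 30_000, attempt), TimeUnit.MILLISECONDS);
    }
}
```

From the failure callback you would then schedule a task that loops over 
`bulkRequest.requests()` and hands each request back to the processor, keeping your 
own counter so the number of attempts is bounded.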

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7e7d476d-6a8d-4a82-bbd0-e331a08d1bb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Eran
Hey David,

I suspect it indeed might be the cause, but I'm kind of a newbie here. 
What metric do I need to monitor, what would be a problematic value, and 
basically, how can I play with merge settings to test if I can improve this?
Some rules of thumb for a newbie would be appreciated.

I installed the plugin SegmentSpy, and here is a screenshot, if that helps.

Eran

On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>
> Merging segments could be the cause here?
>
> David
>
> On 24 Apr 2015, at 09:54, Eran  wrote:
>
> Forgot some stats:
>
> I have 10 shards, no replicas, all on the same machine.
> ATM, there are some 1.5 billion records in the index.
>
>
> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>
>> attachments hereby
>>
>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>>
>>> Hello,
>>>
>>> I've created an index I use for logging.
>>>
>>> This means there are mostly writes, and some searches once in a while.
>>> In the phase of the first loading, I'm using several clients to 
>>> concurrently index documents using the bulk API.
>>>
>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>>
>>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, 
>>> with an IO provisioned volume set to 7000 IOPS.
>>>
>>> Looking at the metrics, I see that the CPU and memory are fine, the 
>>> write IOPS are at 300, but the read IOPS have slowly gone up and got to 
>>> 7000.
>>>
>>> How come I'm only indexing, but most of the IOPS are read?
>>>
>>> I am attaching some screen captures from the BigDesk plugin that show 
>>> the two states of the index; at about 20% into the graphs is the point in 
>>> time where I stopped the clients, so you can see the load drop off.
>>>
>>> My settings are:
>>>
>>> threadpool.bulk.type: fixed
>>> threadpool.bulk.size: 32 # availableProcessors
>>> threadpool.bulk.queue_size: 1000
>>>
>>> # Indices settings
>>> indices.memory.index_buffer_size: 50%
>>> indices.cache.filter.expire: 6h
>>>
>>> bootstrap.mlockall: true
>>>
>>>
>>> and I've changed the index settings to:
>>>
>>>
>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>>> I also tried "refresh_interval":"-1"
>>>
>>>
>>> Please let me know what else I need to provide if needed (settings, 
>>> logs, metrics)
>>>
>>>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dd232398-080a-488c-a952-b98c2a6da903%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Apply word_delimiter token filter on words having 5 chars or more.

2015-04-24 Thread Nassim
Hi,

Is it possible to apply the word_delimiter token filter only to words 
having 5 chars or more? 

Thank you.
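
Not out of the box in the Elasticsearch versions current at the time of this 
thread, as far as I know: word_delimiter itself has no length condition. Later 
versions (6.4+) added a `condition` token filter that can wrap word_delimiter 
with a predicate on the token. A sketch of the analysis settings (analyzer and 
filter names made up):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "delimit_long_words": {
          "type": "condition",
          "filter": ["word_delimiter"],
          "script": {
            "source": "token.getTerm().length() >= 5"
          }
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "delimit_long_words"]
        }
      }
    }
  }
}
```

With this, word_delimiter only runs on tokens whose term is at least 5 
characters long; shorter tokens pass through unchanged.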

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e69d8c2b-1981-407f-b71d-9cc4fb69213d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread David Pilato
Merging segments could be the cause here?

David

> On 24 Apr 2015, at 09:54, Eran  wrote:
> 
> Forgot some stats:
> 
> I have 10 shards, no replicas, all on the same machine.
> ATM, there are some 1.5 billion records in the index.
> 
> 
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>> attachments hereby
>> 
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>> Hello,
>>> 
>>> I've created an index I use for logging.
>>> 
>>> This means there are mostly writes, and some searches once in a while.
>>> In the phase of the first loading, I'm using several clients to 
>>> concurrently index documents using the bulk API.
>>> 
>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>> 
>>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
>>> an IO provisioned volume set to 7000 IOPS.
>>> 
>>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>>> 
>>> How come I'm only indexing, but most of the IOPS are read?
>>> 
>>> I am attaching some screen captures from the BigDesk plugin that show the 
>>> two states of the index; at about 20% into the graphs is the point in time 
>>> where I stopped the clients, so you can see the load drop off.
>>> 
>>> My settings are:
>>> 
>>> threadpool.bulk.type: fixed
>>> threadpool.bulk.size: 32 # availableProcessors
>>> threadpool.bulk.queue_size: 1000
>>> 
>>> # Indices settings
>>> indices.memory.index_buffer_size: 50%
>>> indices.cache.filter.expire: 6h
>>> 
>>> bootstrap.mlockall: true
>>> 
>>> 
>>> and I've changed the index settings to:
>>> 
>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>>> I also tried "refresh_interval":"-1"
>>> 
>>> 
>>> Please let me know what else I need to provide if needed (settings, logs, 
>>> metrics)
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/590AAAE0-75D2-45D2-B105-86DF6521%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk indexing creates a lot of disk read OPS

2015-04-24 Thread Eran
Forgot some stats:

I have 10 shards, no replicas, all on the same machine.
ATM, there are some 1.5 billion records in the index.


On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>
> attachments hereby
>
> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>
>> Hello,
>>
>> I've created an index I use for logging.
>>
>> This means there are mostly writes, and some searches once in a while.
>> In the phase of the first loading, I'm using several clients to 
>> concurrently index documents using the bulk API.
>>
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>
>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with 
>> an IO provisioned volume set to 7000 IOPS.
>>
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>>
>> How come I'm only indexing, but most of the IOPS are read?
>>
>> I am attaching some screen captures from the BigDesk plugin that show 
>> the two states of the index; at about 20% into the graphs is the point in 
>> time where I stopped the clients, so you can see the load drop off.
>>
>> My settings are:
>>
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>>
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>> indices.cache.filter.expire: 6h
>>
>> bootstrap.mlockall: true
>>
>>
>> and I've changed the index settings to:
>>
>>
>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"5"}}}
>> I also tried "refresh_interval":"-1"
>>
>>
>> Please let me know what else I need to provide if needed (settings, logs, 
>> metrics)
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.