Need Clarification on Shards Replication

2014-01-01 Thread Anantha Govindarajan
I have one ES node acting as both master and data node, and I am indexing
documents into it (1 shard + 1 replica). After indexing some documents (say
1 million, with indexing still in progress), I added one more data node to
the cluster, and the shards started replicating to the new node. How does
this replication happen? In the meantime I am still indexing new documents
into that index.

   1. Does datanode1 send index segments to datanode2?
   2. Does datanode1 send documents one by one (as IndexRequests)
   to datanode2 instead of copying segments?
   3. Does datanode1 send the whole index to datanode2?


How will the settings *indices.store.throttle.type: merge* and
*indices.store.throttle.max_bytes_per_sec: 50mb* behave in the above test
scenario?



Anantha Govindarajan.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/326cfecc-b59c-4e4c-b5e9-e369e841a02e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


elastic and language stem (polish)

2014-01-01 Thread Rafath Khan
 Hello everybody

I'm struggling with the Polish stemmer. I've indexed my documents with the
Polish stemmer like this:

@@elastic.index index: index, type: type, id: data[:id],
  body: {
    settings: {
      index: {
        analysis: {analyzer: {default: {type: 'snowball', language: 'Polish'}}},
        filter: {my_stemmer: {type: 'stemmer', name: 'polish'}}
      }
    },
    type_id: data[:type_id],
    descr: data[:descr].strip,
    search: "#{data[:type_id]} #{data[:descr]} #{data[:descr].to_ascii}"
  }


Now I don't know how to use the Polish analyzer in a query. Can anybody
provide an example?

I've tried this example: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
but I don't understand which index I should use.

I'm using the Ruby elasticsearch_api gem like this:

@@elastic.search index: index, type: type, body: {query: {match: {search: query}}}

So where should I put this?

index :
    analysis :
        analyzer :
            standard :
                alias: [alias1, alias2]
                type : standard
                stopwords : [test1, test2, test3]

I'm using this stemmer: 
https://github.com/elasticsearch/elasticsearch-analysis-stempel

Thanks for reply, 
best regards and happy new year! :)
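
On the "where should I put this" question: analysis settings are index-level settings, so one place they can go is the body of the create-index request rather than the body of each indexed document. A minimal Python sketch of such a body, assuming the Stempel plugin linked above is installed and registers an analyzer type named `polish` (making it the `default` analyzer is an illustrative choice, not something confirmed in this thread):

```python
import json

def polish_index_settings():
    """Build a create-index body that makes a Polish analyzer the default.

    Assumption: the Stempel plugin is installed and provides the
    "polish" analyzer type; nothing here talks to a cluster.
    """
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "default": {"type": "polish"}
                    }
                }
            }
        }
    }

if __name__ == "__main__":
    # The JSON printed here would be sent once, when the index is created.
    print(json.dumps(polish_index_settings(), indent=2))
```

With the analyzer set as the index default, a plain match query on the `search` field should pick it up at search time without naming it in the query.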



Re: OR query

2014-01-01 Thread avinash paul
Thank you Ivan will definitely try it out.

-paul


On Tue, Dec 31, 2013 at 10:51 PM, Ivan Brusic  wrote:

> You are better off using a proper boolean filter for better performance.
> Queries cannot be cached, and the query string query analyzes the terms. Here is
> an example of your filter with a nested bool (should) filter:
>
> "filter": {
>   "and": {
>     "filters": [
>       {
>         "bool": {
>           "must": [
>             {
>               "bool": {
>                 "should": [
>                   {
>                     "term": {
>                       "state": "MA"
>                     }
>                   },
>                   {
>                     "term": {
>                       "state": "NY"
>                     }
>                   }
>                 ]
>               }
>             },
>             {
>               "range": {
>                 "costOutofstateTution": {
>                   "gte": 0,
>                   "lte": 3
>                 }
>               }
>             }
>           ]
>         }
>       }
>     ]
>   }
> }
>
> Cheers,
>
> Ivan
>
>
> On Mon, Dec 30, 2013 at 10:03 PM, paul  wrote:
>
>> I got the query working by using
>>
>> {
>>   "query_string": {
>>     "default_field": "state",
>>     "query": "MA NY"
>>   }
>> }
>>
>> - Paul
>>
>> On Tuesday, 31 December 2013 11:07:06 UTC+5:30, paul wrote:
>>>
>>> My query is below; it gives me all the colleges with state code "MA".
>>> I want all the colleges that are in "MA" or "NY". How do I add an OR filter?
>>>
>>> {
>>>   "query": {
>>>     "filtered": {
>>>       "query": {
>>>         "nested": {
>>>           "path": "programs",
>>>           "query": {
>>>             "bool": {
>>>               "must": [
>>>                 {
>>>                   "match": {
>>>                     "programs.progName": "Computer and Information Sciences"
>>>                   }
>>>                 },
>>>                 {
>>>                   "range": {
>>>                     "programs.Bachelor": {
>>>                       "gt": 0
>>>                     }
>>>                   }
>>>                 }
>>>               ]
>>>             }
>>>           }
>>>         }
>>>       },
>>>       "filter": {
>>>         "and": {
>>>           "filters": [
>>>             {
>>>               "bool": {
>>>                 "must": [
>>>                   {
>>>                     "term": {
>>>                       "state": "MA"
>>>                     }
>>>                   },
>>>                   {
>>>                     "range": {
>>>                       "costOutofstateTution": {
>>>                         "gte": 0,
>>>                         "lte": 3
>>>                       }
>>>                     }
>>>                   }
>>>                 ]
>>>               }
>>>             }
>>>           ]
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
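
Ivan's suggestion above nests a bool filter with two should clauses inside the must list. A small Python sketch that builds the same filter for any list of states, so the OR part does not have to be written by hand (the function name `state_or_filter` is made up for illustration; the field names come from the thread):

```python
import json

def state_or_filter(states, max_cost=3):
    """Build the "and" filter from the thread with an OR over states.

    Each state becomes one "term" clause under bool/should, which
    matches documents whose state is any of the given values.
    """
    should = [{"term": {"state": s}} for s in states]
    return {
        "and": {
            "filters": [
                {
                    "bool": {
                        "must": [
                            {"bool": {"should": should}},
                            {"range": {"costOutofstateTution": {"gte": 0, "lte": max_cost}}},
                        ]
                    }
                }
            ]
        }
    }

if __name__ == "__main__":
    print(json.dumps({"filter": state_or_filter(["MA", "NY"])}, indent=2))
```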


Re: Deb repos are offline?

2014-01-01 Thread David Pilato
See this: 
https://groups.google.com/forum/?nomobile=true#!original/elasticsearch/5CFRD-DLaT0/mShAN8rJFOAJ

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 2 Jan 2014, at 01:21, Stas Oskin wrote:

Hi,

It seems the logstash deb repos are currently offline, and return access denied?

root@ip-10-0-0-59:/# apt-get install logstash
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following extra packages will be installed:
  default-jre-headless
Suggested packages:
  default-jre
The following NEW packages will be installed:
  default-jre-headless logstash
0 upgraded, 2 newly installed, 0 to remove and 48 not upgraded.
Need to get 76.8 MB/76.8 MB of archives.
After this operation, 82.8 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
WARNING: The following packages cannot be authenticated!
  logstash
Install these packages without verification [y/N]? y
Err http://packages.elasticsearch.org/logstash/1.3/debian/ stable/main logstash 
all 1.3.2-1+debian
  403  Forbidden
Failed to fetch 
http://packages.elasticsearch.org/logstash/1.3/debian/pool/main/l/logstash/logstash_1.3.2-1debian_all.deb
  403  Forbidden
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?

Any ideas?

Thanks!


Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
Thanks so much Jörg. I appreciate the tip and will try it out.


On Wednesday, January 1, 2014 4:16:50 PM UTC-5, Jörg Prante wrote:
>
> You requested the fields "id" and "ratings", but you did not declare them 
> in the mapping.
>
> Because of this, ES is extracting them from source, it means ES loads all 
> the 10MB sized docs in an extra step to extract the fields from the 
> _source. This process surely takes several seconds for the result set you 
> showed above.
>
> To improve this, declare "id" and "ratings" as fields in the mapping with 
> attribute "store" set to yes. And do not forget to disable _source and 
> _all, if you only want to search on fields "id" and "ratings". This will 
> save a lot of resources in the index.
>
> Hint: in the results, the field _id is already delivered. No need to 
> double this information in another field "id".
>
> Jörg
>
>



Deb repos are offline?

2014-01-01 Thread Stas Oskin
Hi,

It seems the logstash deb repos are currently offline, and return access 
denied?

root@ip-10-0-0-59:/# apt-get install logstash
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following extra packages will be installed:
  default-jre-headless
Suggested packages:
  default-jre
The following NEW packages will be installed:
  default-jre-headless logstash
0 upgraded, 2 newly installed, 0 to remove and 48 not upgraded.
Need to get 76.8 MB/76.8 MB of archives.
After this operation, 82.8 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
WARNING: The following packages cannot be authenticated!
  logstash
Install these packages without verification [y/N]? y
Err http://packages.elasticsearch.org/logstash/1.3/debian/ stable/main logstash 
all 1.3.2-1+debian
  403  Forbidden
Failed to fetch 
http://packages.elasticsearch.org/logstash/1.3/debian/pool/main/l/logstash/logstash_1.3.2-1debian_all.deb
  403  Forbidden
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?


Any ideas?

Thanks!



Re: Bulk throughput issues

2014-01-01 Thread tdjb
Hmm, ok, thank you for that info Jörg. I had previously been using one 
client with 64 concurrent requests as the hardware we are running on has 32 
cores. It sounds like I might need to try bumping that number up to see 
what happens.

On Wednesday, January 1, 2014 5:27:40 AM UTC-7, Jörg Prante wrote:
>
> There is no need for more than one client instance per JVM. You can 
> increase the bulk request concurrency in the BulkProcessor with 
> "setConcurrentRequests" to avoid blocking threads, until you reach the 
> sweet spot where client submitting resources matches the indexing capacity 
> of the cluster. 
>
> This is a matter of dynamic balance, which is different from setup to 
> setup. The default request concurrency is 1. For a higher value, you have 
> to prepare enough heap resources and maybe run your doc construction in 
> multiple threads to exploit the advantages.
>
> As a rule of thumb, use 4 * available cores for the concurrency, and 
> ~1-10MB for the bulk size.
>
> For example, I often operate with a bulk size of 1000 docs and a 
> concurrency level of 32.
>
> Jörg
>
>
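
Jörg's rule of thumb above can be written down as a tiny calculation. This Python sketch only does the arithmetic (4 x cores for concurrency, a byte budget in the 1-10MB range per bulk); the function name and the 5MB default are assumptions for illustration, and the real knob is "setConcurrentRequests" on the Java BulkProcessor:

```python
def suggest_bulk_settings(cores, bulk_mb=5):
    """Return starting values from the rule of thumb in the thread.

    concurrency = 4 * available cores; bulk size kept within 1-10 MB.
    These are starting points to tune, not guaranteed optima.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    bulk_mb = min(max(bulk_mb, 1), 10)  # clamp to the suggested 1-10 MB range
    return {
        "concurrent_requests": 4 * cores,
        "bulk_size_bytes": bulk_mb * 1024 * 1024,
    }

if __name__ == "__main__":
    # 32 cores, as on the hardware mentioned in the reply.
    print(suggest_bulk_settings(32))
```

For the 32-core machine mentioned above this gives 128 concurrent requests, twice the 64 tried so far.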



Retrieve associated fields in a highlight query

2014-01-01 Thread Chris Lees
I'm trying to get a highlight query working on an indexed JSON doc (not an 
attachment!).

Here's an example doc:

{
  ...
  "file": {
    "blocks": [
      {
        "name": "1st Block",
        "fields": [
          {
            "id": "A",
            "name": "FirstField",
            "type": "String",
            "desc": "This is my first field"
          },
          ...
        ]
      },
      {
        "name": "2nd Block",
        "fields": [
          {
            "id": "C",
            "name": "SecondField",
            "type": "String",
            "desc": "This is my second field"
          },
          ...
        ]
      }
    ]
  }
}

I want to run a query which will search on the id, name or desc fields 
within a blocks.fields entry, but which returns enough information for the 
user to see the block.name too. Here's my current query which just 
highlights the matched field correctly:

{
  "query": {
    "query_string": {
      "query": "SecondField"
    }
  },
  "highlight": {
    "fields": {
      "file.blocks.fields.id": {},
      "file.blocks.fields.name": {},
      "file.blocks.fields.desc": {}
    }
  }
}

This returns the document and a highlight section as below:

highlight: {
  "file.blocks.fields.name" : [
    "SecondField"
  ]
}

What I'd really like is for the response to also return something like:

"file.blocks.name": "2nd Block",
"file.blocks.fields.id": "C",
"file.blocks.fields.desc": "This is my second field"

Is this possible (without doing extra processing in my application)? Or is 
a highlight query the wrong thing to use here?



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread joergpra...@gmail.com
You requested the fields "id" and "ratings", but you did not declare them
in the mapping.

Because of this, ES is extracting them from source, it means ES loads all
the 10MB sized docs in an extra step to extract the fields from the
_source. This process surely takes several seconds for the result set you
showed above.

To improve this, declare "id" and "ratings" as fields in the mapping with
attribute "store" set to yes. And do not forget to disable _source and
_all, if you only want to search on fields "id" and "ratings". This will
save a lot of resources in the index.

Hint: in the results, the field _id is already delivered. No need to double
this information in another field "id".

Jörg
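
As a rough illustration of the advice above, here is a Python sketch of a mapping body with "id" and "ratings" stored and _source and _all disabled. The type name `doc` and the field types are assumptions (the thread does not show the mapping); adjust them to the real data:

```python
import json

def stored_fields_mapping(type_name="doc"):
    """Build a mapping that stores "id" and "ratings" as separate fields.

    Assumptions: the type name and the "string"/"float" field types are
    placeholders, since the original mapping is not shown in the thread.
    """
    return {
        type_name: {
            "_source": {"enabled": False},  # do not keep the large source
            "_all": {"enabled": False},     # no catch-all field
            "properties": {
                "id": {"type": "string", "store": "yes"},
                "ratings": {"type": "float", "store": "yes"},
            },
        }
    }

if __name__ == "__main__":
    print(json.dumps(stored_fields_mapping(), indent=2))
```

Disabling _source means the original document can no longer be fetched back from the index, so this only fits the case described above, where just the two stored fields are needed.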



Re: How to save Query JSON as it is in an Index?

2014-01-01 Thread Alexander Reelsen
Hey,

you could simply use the toString(), or even better the toXContent()
representation of a query and store it in another index, when executing
queries using your java API.

Alternatively you could set the thresholds for the slow index log very low
and thus log every index operation (this has a performance impact), see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-slow-log


--Alex
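
The idea of storing the query's string form in another index can be sketched without any cluster. The Python below (standing in for the Java client used in the thread) serializes a query to a JSON string, wraps it in a document with a plain string field, and parses it back for a later re-run; the field names `query_json` and `user` are hypothetical:

```python
import json

def to_saved_query_doc(query, user):
    """Wrap a query as a document whose "query_json" field is a string.

    Keeping the query as a string avoids the index trying to map the
    query's own keys as document fields.
    """
    return {"user": user, "query_json": json.dumps(query, sort_keys=True)}

def from_saved_query_doc(doc):
    """Recover the original query body from a saved document."""
    return json.loads(doc["query_json"])

if __name__ == "__main__":
    query = {"query": {"match": {"title": "elasticsearch"}}}
    saved = to_saved_query_doc(query, user="search-user")
    assert from_saved_query_doc(saved) == query
    print(saved)
```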



On Tue, Dec 31, 2013 at 7:53 PM, Search User  wrote:

> I want to save the queries my users are executing and let them re-run at a
> later time.
>
> Thanks,
>
>
> On Tuesday, December 31, 2013 1:21:16 AM UTC-5, Daniel Guo wrote:
>>
>> I don't get your point, can you describe it in more detail?
>>
>> On Tuesday, December 31, 2013 8:23:22 AM UTC+8, Search User wrote:
>>>
>>> I want to save the query JSON as it is in a field in an ES index. I
>>> should be able to retrieve queries and run them at a later time. I don't need
>>> features like percolator. I am using the Java client to index and retrieve.
>>>
>>> What should I do to achieve this?
>>>
>>> Thanks.
>>>


Re: Finding duplicate documents or its count based on some field names

2014-01-01 Thread Alexander Reelsen
Hey,

another very simple solution could be a terms facet, using a script field,
which simply concatenates the two fields you want to check for. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_term_scripts


--Alex
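
To make the concatenation idea concrete, this Python sketch applies it locally to the four sample documents quoted below: it joins age and country into one key and counts the keys, which is what a terms facet over the concatenated value would report. The `script_field` request body at the end is an assumption about how the facet could be phrased and has not been run against a cluster:

```python
from collections import Counter

DOCS = [
    {"name": "abc", "age": 22, "country": "usa", "gender": "male"},
    {"name": "xyz", "age": 27, "country": "usa", "gender": "male"},
    {"name": "xyz", "age": 22, "country": "india", "gender": "female"},
    {"name": "abc", "age": 22, "country": "usa", "gender": "female"},
]

def composite_counts(docs, fields):
    """Count documents per concatenated value of the given fields."""
    return Counter("_".join(str(d[f]) for f in fields) for d in docs)

def duplicates(docs, fields):
    """Return only the composite keys shared by more than one document."""
    return {k: n for k, n in composite_counts(docs, fields).items() if n > 1}

# Hypothetical facet body using the same concatenation (unverified sketch).
FACET_BODY = {
    "size": 0,
    "facets": {
        "age_country": {
            "terms": {
                "script_field": "doc['age'].value + '_' + doc['country'].value",
                "size": 100,
            }
        }
    },
}

if __name__ == "__main__":
    print(duplicates(DOCS, ["age", "country"]))
```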


On Tue, Dec 31, 2013 at 1:57 PM, Yann Barraud wrote:

> Hi,
>
> You can check this :
>
> http://github.com/yannbrrd/elasticsearch-entity-resolution
>
> On Saturday, 28 December 2013 06:16:16 UTC+1, Narinder Kaur wrote:
>
>> Hi All,
>>
>> I need to know if Elasticsearch has some feature to find duplicate
>> documents, or document counts, when I want to see how many documents
>> have the same values in two or more fields. I can do that for one
>> field using facets, but what if I need to do it against more than one
>> field? For example, suppose I have the following docs in ES:
>>
>> doc 1 :
>>
>> {
>> name : abc
>> age:22
>> country:usa
>> gender:male
>> }
>>
>> doc 2 :
>>
>> {
>> name:xyz
>> age:27
>> country:usa
>> gender:male
>> }
>>
>> doc 3:
>>
>> {
>> name:xyz
>> age:22
>> country:india
>> gender:female
>> }
>>
>> doc 4
>> {
>> name:abc
>> age:22
>> country:usa
>> gender:female
>> }
>>
>> So now my requirement is to find all docs having the same age and the same
>> country, so that doc1 and doc4 are duplicates for me. Or, in simple words,
>> I want a unique clause on a single field or on a composite key of fields. Is
>> this possible?
>>
>> Please let me know if it's possible using Elasticsearch, as I think it is
>> a very important feature for me.
>>


Re: how does facets work

2014-01-01 Thread Alexander Reelsen
Hey

Small correction: When using the global flag, facets ignore the query. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html#_scope

That might help in your case (though from my experience it is likely that
you will end up with a couple of facets, each of which has a different
facet filter. But your mileage may vary).


--Alex
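
For the cars example quoted below, one way to combine the pieces discussed in this thread is a top-level filter for the selected model (which narrows the hits but not the facets), a facet_filter on the color facet, and an unfiltered model facet. A Python sketch of such a request body, assuming fields named `model` and `color`; the top-level key is passed in because it is "filter" before the rename mentioned below and "post_filter" after it:

```python
import json

def car_facets_request(selected_model, post_filter_key="filter"):
    """Build a search body for the model/color facet example.

    - hits: restricted to the selected model by the top-level filter
    - color facet: restricted to the selected model via facet_filter
    - model facet: unrestricted, so every model keeps its full count
    """
    model_filter = {"term": {"model": selected_model}}
    return {
        "query": {"match_all": {}},
        post_filter_key: model_filter,
        "facets": {
            "color": {
                "terms": {"field": "color"},
                "facet_filter": model_filter,
            },
            "model": {"terms": {"field": "model"}},
        },
    }

if __name__ == "__main__":
    print(json.dumps(car_facets_request("vw"), indent=2))
```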


On Fri, Dec 27, 2013 at 6:58 PM, Ivan Brusic  wrote:

> Facets work on the documents returned by the query.  This behavior will
> not work in your case since you would like to gather facets on a greater
> set of documents, not just the ones returned by the query. To solve this
> issue, elasticsearch provides a post filter, which affects the result
> document set, but not the set of documents that the facets work on.
>
> The term "filter" is a bit overloaded in elasticsearch, so the team
> renamed the post filter to a more explicit "post_filter":
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html
> https://github.com/elasticsearch/elasticsearch/issues/4119
>
> The post filter documentation has some insight on how the filters affect
> the facets. In your case, you want a filtered query (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
>  )
> with the color as the filter, but the model filter will be applied as a
> post filter. Hopefully this makes sense. :)
>
> Cheers,
>
> Ivan
>
>
> On Fri, Dec 27, 2013 at 9:41 AM, Volker  wrote:
>
>> Dear Readers
>>
>> I have a question about facets and doing some filtering based on facets.
>>
>> At the moment I am using Hibernate Search in combination with bobobrowse
>> for faceting and I am thinking about switching to ES. But before that I
>> would like to check whether I can still get the same functionality.
>>
>> Let's assume that I have an index about cars and some facets, e.g. model
>> and color.
>>
>> color
>> [ ] red (10)
>> [ ] blue (5)
>> [ ] green (2)
>>
>> model
>> [ ] bmw (4)
>> [ ] vw (5)
>> [ ] ford (8)
>>
>> if I select a model I would like to get only color facets for that model,
>> but I still would like to get facets for all models. eg:
>>
>> color
>> [ ] red (2)
>> [ ] blue (2)
>> [ ] green (1)
>>
>> model
>> [ ] bmw (4)
>> [x] vw (5)
>> [ ] ford (8)
>>
>> I have searched but did not find an example of this use case. Is this
>> possible and, if yes, how do I filter a query to get these results?
>>
>> Kind regards
>>
>>
>>


Re: Non Alphanumeric character searching

2014-01-01 Thread Alexander Reelsen
Hey,

most likely those special chars have been removed before your data has been
stored in the inverted index - and thus cannot be searched for. This highly
depends on the mapping for a field. You can play around with the analyze
API to find out, how a string is tokenized and stored. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

Or use the awesome inquisitor plugin, which offers a nice GUI around that
functionality, see https://github.com/polyfractal/elasticsearch-inquisitor


--Alex
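
A rough local illustration of why the string cannot be found: an analyzer that keeps only word characters produces no tokens at all for it, so nothing reaches the inverted index. The regex below is a crude stand-in for a standard-style tokenizer, not the real Lucene implementation; the analyze API remains the authoritative check:

```python
import re

def rough_tokens(text):
    """Approximate a word-based analyzer: keep runs of letters/digits, lowercased.

    This is a simplification for illustration only.
    """
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

if __name__ == "__main__":
    print(rough_tokens("#%##%#%#$%#%#$%#$"))     # no word characters survive
    print(rough_tokens("#%##%#%#abc$%#%#$%#$"))  # only "abc" survives
```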


On Mon, Dec 30, 2013 at 3:05 PM, deep saxena  wrote:

> My data contains the string #%##%#%#$%#%#$%#$.
>
> I am firing this query, but it does not find the data. Any clues why it
> is not matching? If I put abc in the middle (#%##%#%#abc$%#%#$%#$) and fire the
> same query with that query string, it finds the result for me.
>
> {
>   "from" : 0,
>   "size" : 3,
>   "query" : {
>     "filtered" : {
>       "query" : {
>         "bool" : {
>           "should" : {
>             "query_string" : {
>               "query" : "\"#%##%#%#$%#%#$%#$\"",
>               "default_field" : "DATA"
>             }
>           }
>         }
>       },
>       "filter" : {
>         "range" : {
>           "timestamp" : {
>             "from" : 0,
>             "to" : 1388412035468,
>             "include_lower" : true,
>             "include_upper" : true
>           }
>         }
>       }
>     }
>   }
> }
>


Re: Several filter in query : [filtered] query does not support [filter]

2014-01-01 Thread Alexander Reelsen
Hey,

the query looks OK from a bird's-eye view. Can you say which Elasticsearch
version you are using? Did you try with 0.90.9?
However, the mapping and the indexed data are also important. Can you create
a gist, so one can recreate the issue locally? I guess it is reproducible,
right? That would be great!


--Alex



On Mon, Dec 30, 2013 at 1:30 PM, Moh  wrote:

> Hi,
>
> I want to use several filters in my query.
>
> I followed the official documentation
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-and-filter.html
>
> But it does not work.
>
> {
> "error": "SearchPhaseExecutionException[Failed to execute phase
> [query], all shards failed; shardFailures
> {[FD4W1oSYQPG1MHRySK_RPQ][shops][4]: SearchParseException[[shop][4]:
> from[-1],size[-1]: Parse Failure [Failed to parse source [ . ; nested:
> QueryParsingException[[shops] [filtered] query does not support [filter]];
> }]",
> "status": 400
> }
>
>
> And the query :
>
> {
>   "query": {
>     "match_all": {}
>   },
>   "filter": {
>     "and": [
>       {
>         "geo_distance": {
>           "distance": "500m",
>           "location": {
>             "lat": 45.9402896,
>             "lon": 4.7779216
>           }
>         }
>       },
>       {
>         "numeric_range": {
>           "rate": {
>             "gte": 3
>           }
>         }
>       }
>     ],
>     "_cache": true
>   },
>   "sort": [
>     {
>       "_geo_distance": {
>         "order": "asc",
>         "unit": "m",
>         "location": {
>           "lat": 45.9402896,
>           "lon": 4.7779216
>         }
>       }
>     }
>   ]
> }
>
>
>
>
>
> What is wrong with this query ?
>
> Regards
>


Re: facets on nested objects

2014-01-01 Thread Alexander Reelsen
Hey,

I am wondering about the facet part a bit (maybe I didn't look closely enough):
shouldn't the facet field be "products.product" instead of "products"
only, in order to filter correctly?

Can you create a small gist to reproduce your behaviour easily?


--Alex


On Sun, Dec 29, 2013 at 10:14 AM, oreno  wrote:

> Hi,
> I'm trying to run a facet on nested objects, but at the moment I'm not getting
> the results I'm after.
> In the query below you can see the main query, which is supposed to count the
> number of users who have a nested object (buying event) between 2013-10-01
> and 2013-10-10 AND also have a TV as one of the products in that event.
> On the facet side I'm looking to get the same result, only done by the
> facet. That way I can get the counts for all products between the dates
> (not just TV) without running the main query for each of them (the way my
> system works now).
>
> What I'm expecting to get is the same count for the main query ("total":
> 284070) and for the facet calculation ("term": "TV", "count": 535445), which
> is not the case at the moment.
>
> Does anyone know what I'm doing wrong here?
>
> Thanks in advance,
>
> curl -XPOST 'http://X:9200/sample/_search?pretty=true' -d '{
>   "size": 0,
>   "query": {
>     "nested": {
>       "query": {
>         "bool": {
>           "must": [
>             {
>               "term": {
>                 "events.products.product": "TV"
>               }
>             },
>             {
>               "range": {
>                 "events.event_time": {
>                   "from": "2013-10-01",
>                   "to": "2013-10-10",
>                   "include_lower": true,
>                   "include_upper": true
>                 }
>               }
>             }
>           ]
>         }
>       },
>       "path": "events"
>     }
>   },
>   "facets": {
>     "tags": {
>       "terms": {
>         "field": "product",
>         "size": 200
>       },
>       "nested": "events",
>       "facet_filter": {
>         "range": {
>           "events.event_time": {
>             "from": "2013-10-01",
>             "to": "2013-10-10",
>             "include_lower": true,
>             "include_upper": true
>           }
>         }
>       }
>     }
>   }
> }'
>
> {
>   "took": 96,
>   "timed_out": false,
>   "_shards": {
>     "total": 20,
>     "successful": 20,
>     "failed": 0
>   },
>   "hits": {
>     "total": 284070,
>     "max_score": 4.171436,
>     "hits": []
>   },
>   "facets": {
>     "tags": {
>       "_type": "terms",
>       "missing": 0,
>       "total": 13036875,
>       "other": 1901080,
>       "terms": [
>         {
>           "term": "TV",
>           "count": 535445
>         },
>         {
>           "term": "DISHWASHER",
>           "count": 375003
>         },
>         {
>           "term": "RADIO",
>           "count": 316831
>         },
>         .
>
>
>
>
> mapping:
>
> {
> "user": {
> "_ttl": {
> "enabled": true
> },
> "properties": {
> "name": {
> "type": "string"
> },
> "events": {
> "type": "nested",
> "properties": {
> "event_time": {
> "type": "Date"
> },
> "products": {
> "properties": {
> "product": {
>

Re: Question about index optimize

2014-01-01 Thread Alexander Reelsen
Hey,

can you make sure that your open files setting is actually applied by
checking the nodes info API? See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-info.html

curl -XGET 'http://localhost:9200/_nodes?process'

check the max_file_descriptors parameter.

You can also check http://localhost:9200/_cluster/stats?process for
currently open file descriptors...
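
The check above can be sketched in Python. This is only an illustration: it assumes the nodes info response nests the value as nodes -> node id -> process -> max_file_descriptors, the sample response and the EXPECTED_LIMIT value are made up, and the HTTP call itself is left out so the snippet runs offline.

```python
import json

# Hypothetical, trimmed nodes info response. A real one would come from
# GET http://localhost:9200/_nodes?process and contain many more fields.
SAMPLE_RESPONSE = json.loads("""
{
  "nodes": {
    "node-a": {"name": "node-a", "process": {"max_file_descriptors": 1024}},
    "node-b": {"name": "node-b", "process": {"max_file_descriptors": 32000}}
  }
}
""")


def nodes_below_limit(nodes_info, expected_limit):
    """Return {node_id: max_file_descriptors} for nodes under expected_limit."""
    low = {}
    for node_id, info in nodes_info.get("nodes", {}).items():
        limit = info.get("process", {}).get("max_file_descriptors")
        if limit is not None and limit < expected_limit:
            low[node_id] = limit
    return low


# Assumed value: the limit the thread says was configured in limits.conf.
EXPECTED_LIMIT = 32000
print(nodes_below_limit(SAMPLE_RESPONSE, EXPECTED_LIMIT))
```

Any node listed in the result is still running with a lower limit than the one configured, which would suggest the setting was not picked up by the process.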


--Alex




On Sat, Dec 28, 2013 at 4:25 AM, Jack Park  wrote:

> After a rather amazing run, I got the "too many open files" report again.
>
> The platform is a commodity pc with 8gb ram, 1tb hard disk, running a
> slightly out of date Ubuntu.
>
> I added lines to /etc/security/limits.conf; something about max and
> min 32000 and booted with ./elasticsearch -f -Xmx4g -Xms2g
> -Des.index.store.type=niofs -Des.max-open-files=true
>
> and still got the crash.
> I was running the same program on a different platform and, in about
> the same level of importing, that one blew out with No Node Available;
> looking at the nix console there says No route to host (never mind
> that it had been running fine on a local gigabit network with no
> outside interference).
>
> I am still interested whether there is some background "too many open
> files" going on.
>
> Thanks in advance for ideas.
>
> Cheers
> Jack
>
>
> Log trace below:
> Exception in thread "Thread-16" Exception in thread "Thread-2098"
> org.elasticsearch.index.engine.IndexFailedEngineException: [vertices][4] Index failed for [core#42641.359576]
>     at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:497)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:386)
>     at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:212)
>     at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
>     at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: /usr/local/lib/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/vertices/4/index/_k7z.fdt (Too many open files)
>
> On Fri, Dec 27, 2013 at 5:19 PM, Jack Park 
> wrote:
> > This page
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html
> > talks about merging segments.
> >
> > My curiosity/interest in this process grew when, after an overnight
> > data import, the index went down with Too Many Open Files.
> >
> > Sure, I was able to find some instructions about telling nix to set
> > max files to 32000. Question is this: really, how many open files does
> > ES keep when importing monster loads?
> >
> > Would it make sense to do an optimize following a monster load and
> > before further work? If so, from a Java client, how do you do that?
> >
> > Thanks in advance.
> > Jack
>


Re: Range query problems

2014-01-01 Thread Alexander Reelsen
Hey,

can you provide an example as mentioned in http://elasticsearch.org/help -
so one can reproduce your problem on elasticsearch 0.90.8?

In addition you should try using a filtered query, with a match_all query
and a bool filter, which contains a term filter and a range filter. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
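
As a rough sketch of that suggestion, the request body could be built like this in Python. The field names and date values are taken from the original poster's query; the function name last_days_for_seller is made up for illustration, and the body has not been run against a cluster.

```python
import json


def last_days_for_seller(seller_id, date_from="now/d-1d", date_to="now"):
    """Build a filtered query: match_all narrowed by a bool filter
    that combines a term filter and a range filter."""
    return {
        "from": 0,
        "size": 50,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"seller.id": seller_id}},
                            {"range": {"date_created": {
                                "from": date_from,
                                "to": date_to,
                                "include_lower": False,
                                "include_upper": True,
                            }}},
                        ]
                    }
                },
            }
        },
    }


# Print the JSON body that would be sent to the _search endpoint.
print(json.dumps(last_days_for_seller(4), indent=2))
```

Moving both conditions into the filter means neither one contributes to scoring, which fits a query that only selects documents by seller and date.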


--Alex


On Fri, Dec 27, 2013 at 11:26 PM, anivlis  wrote:

> Hi,
> I want to retrieve the last two days for a seller_id. The query is:
> {
>   "from": 0,
>   "size": 50,
>   "query": {
>     "term": {
>       "seller.id": 4
>     }
>   },
>   "filter": {
>     "range": {
>       "date_created": {
>         "from": "now/d-1d",
>         "to": "now",
>         "include_lower": false,
>         "include_upper": true
>       }
>     }
>   }
> }
>
> The query returns two days: yesterday and two days ago. We had Elasticsearch
> 0.90.3 and it was working, but after we changed to Elasticsearch 0.90.8
> it stopped working. I don't know whether the problem is related to the
> version. Any idea about that?
>
> Thanks.
>


Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
Here is my mapping.

mapping = {'doc': {'properties': {
    'ngrams': {'index': 'not_analyzed', 'type': 'string'},
    "dates": {"type": "date", "format": "-MM-dd"},
    'locations': {'index': 'not_analyzed', 'type': 'string'},
    'concept': {'index': 'not_analyzed', 'type': 'string'},
    'entities.currencies': {'index': 'not_analyzed', 'type': 'string'},
    'entities.actions': {'index': 'not_analyzed', 'type': 'string'},
    'entities.things': {'index': 'not_analyzed', 'type': 'string'},
    'entities.places': {'index': 'not_analyzed', 'type': 'string'},
    'entities.people': {'search_analyzer': 'simple', 'index_analyzer': 'simple', 'type': 'string'},
    'entities.dates': {'index': 'not_analyzed', 'type': 'string'},
    'text': {"analyzer": "standard", "term_vector": "yes", 'type': 'string', 'term_vector': 'with_positions_offsets'},
    'location': {'type': 'geo_point', 'store': 'yes'},
    'concepts': {'type': 'string', 'store': 'no'

The rest of the document fields are dynamically mapped strings.

On Wednesday, January 1, 2014 1:43:29 PM UTC-5, project2501 wrote:
>
> I haven't disabled those fields, but since I query only select fields, 
> would it matter to disable those?
> I do a lot of query_string queries with highlights and facets, but at the 
> moment, even the simplest query is dog slow.
>
> On Wednesday, January 1, 2014 12:57:56 PM UTC-5, Jörg Prante wrote:
>>
>> 10MB are very large for a single document. Have you disabled _source and 
>> _all field in the mapping?
>>
>> Jörg
>>
>>



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
I haven't disabled those fields, but since I query only select fields, 
would it matter to disable those?
I do a lot of query_string queries with highlights and facets, but at the 
moment, even the simplest query is dog slow.

On Wednesday, January 1, 2014 12:57:56 PM UTC-5, Jörg Prante wrote:
>
> 10MB are very large for a single document. Have you disabled _source and 
> _all field in the mapping?
>
> Jörg
>
>



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
I moved to ES 0.90.9 and JDK1.7. Still slower than tar.

curl -X POST "http://localhost:9200/documents/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "(text:\"understanding\" ) "
    }
  },
  "fields": [
    "id",
    "ratings"
  ]
} '

Here's a result. See how long it takes? That's obscene for 50 documents.


{
  "took" : 4294,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 11,
"max_score" : 0.048666354,
"hits" : [ {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "eb7b1c4a-b8c9-4bdc-8baf-aa16002831f3",
  "_score" : 0.048666354,
  "fields" : {
"id" : "eb7b1c4a-b8c9-4bdc-8baf-aa16002831f3",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "c5ce2faf-db3d-4623-9ecc-398f4a78f123",
  "_score" : 0.030950457,
  "fields" : {
"id" : "c5ce2faf-db3d-4623-9ecc-398f4a78f123",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "87666287-6b01-4c99-8c8e-430f3417014b",
  "_score" : 0.030665938,
  "fields" : {
"id" : "87666287-6b01-4c99-8c8e-430f3417014b",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "f10c9a47-7efa-424d-bf65-7072bbbf64be",
  "_score" : 0.028295785,
  "fields" : {
"id" : "f10c9a47-7efa-424d-bf65-7072bbbf64be",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "b1d2f103-d2c6-4b4e-8fd2-8615b9ed6bb0",
  "_score" : 0.02708165,
  "fields" : {
"id" : "b1d2f103-d2c6-4b4e-8fd2-8615b9ed6bb0",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "6aa4b05b-ee38-49b4-a3e2-1f2a01e62f40",
  "_score" : 0.025010176,
  "fields" : {
"id" : "6aa4b05b-ee38-49b4-a3e2-1f2a01e62f40",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "78d5a3d7-4c45-4f1b-a596-993916aed2de",
  "_score" : 0.02360665,
  "fields" : {
"id" : "78d5a3d7-4c45-4f1b-a596-993916aed2de",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "21be8c9e-3e73-44e8-9b31-db44fa0a4faa",
  "_score" : 0.020865528,
  "fields" : {
"id" : "21be8c9e-3e73-44e8-9b31-db44fa0a4faa",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "1be7eb7e-dd8b-4582-aa0d-2edb20fa4339",
  "_score" : 0.020655818,
  "fields" : {
"id" : "1be7eb7e-dd8b-4582-aa0d-2edb20fa4339",
"ratings" : [ ]
  }
}, {
  "_index" : "documents",
  "_type" : "doc",
  "_id" : "a695fdd5-d025-4f77-8b25-c7fdea8bbbe3",
  "_score" : 0.017684866,
  "fields" : {
"id" : "a695fdd5-d025-4f77-8b25-c7fdea8bbbe3",
"ratings" : [ ]
  }
} ]
  }
}


On Wednesday, January 1, 2014 12:44:46 PM UTC-5, project2501 wrote:
>
> Hi,
>   Thanks for the response.
>
> The box has 15GB RAM. 4GB allocated to ES.
>
> The mapping is simple and has only about 6 not_analyzed fields, 1 date 
> field and 1 text field. 
>
> The documents are large, however: 10MB each, with hundreds of fields. Only a
> couple are being returned, and the response documents are less than 10k
> each (only two small fields returned).
>
> Sun JDK 1.6
>
> I will try Oracle JDK 1.7 and latest ES.
>
> I was trying different things with node/replica to see if there is a 
> change in performance.
> Cluster health is green.
>
>
> On Wednesday, January 1, 2014 12:30:49 PM UTC-5, Jörg Prante wrote:
>>
>> There can be lots of reasons - EC2-related, OS related, Java related, 
>> cluster setup related, index related, query related...
>>
>> Can you give an example of your mapping and a document you have indexed?
>>
>> How much RAM is your EC2 instance? Do you use hardware virtualization? 
>> Did you disable swap and enable mlock?
>>
>> What is the cluster health, is it green? If you have just one node, why 
>> is there 1 replica? It makes not much sense.
>>
>> Side notes: Please use ES version 0.90.9 as it is the latest of the 0.90 
>> branch with bugs fixed. And what vendor is jdk1.6.0_45? Please use Java 
>> 7, especially if you use OpenJDK 6.
>>
>> Jörg
>>
>>



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
Correction, health status is 'yellow'. Only one node.

On Wednesday, January 1, 2014 12:44:46 PM UTC-5, project2501 wrote:
>
> Hi,
>   Thanks for the response.
>
> The box has 15GB RAM. 4GB allocated to ES.
>
> The mapping is simple and has only about 6 not_analyzed fields, 1 date 
> field and 1 text field. 
>
> The documents are large, however: 10MB each, with hundreds of fields. Only a
> couple are being returned, and the response documents are less than 10k
> each (only two small fields returned).
>
> Sun JDK 1.6
>
> I will try Oracle JDK 1.7 and latest ES.
>
> I was trying different things with node/replica to see if there is a 
> change in performance.
> Cluster health is green.
>
>
> On Wednesday, January 1, 2014 12:30:49 PM UTC-5, Jörg Prante wrote:
>>
>> There can be lots of reasons - EC2-related, OS related, Java related, 
>> cluster setup related, index related, query related...
>>
>> Can you give an example of your mapping and a document you have indexed?
>>
>> How much RAM is your EC2 instance? Do you use hardware virtualization? 
>> Did you disable swap and enable mlock?
>>
>> What is the cluster health, is it green? If you have just one node, why 
>> is there 1 replica? It makes not much sense.
>>
>> Side notes: Please use ES version 0.90.9 as it is the latest of the 0.90 
>> branch with bugs fixed. And what vendor is jdk1.6.0_45? Please use Java 
>> 7, especially if you use OpenJDK 6.
>>
>> Jörg
>>
>>



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread joergpra...@gmail.com
10MB are very large for a single document. Have you disabled _source and
_all field in the mapping?
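
For reference, a minimal sketch of what disabling both could look like, written as a Python dict in the style of the mapping posted in this thread. The helper name without_source_and_all and the one-field sample mapping are made up for illustration. Note that with _source disabled, a field generally has to be stored to be returned in search results, which is why the sample field sets 'store': 'yes'.

```python
import copy
import json


def without_source_and_all(mapping, type_name):
    """Return a copy of the mapping with _source and _all disabled
    for the given type; the input mapping is left untouched."""
    updated = copy.deepcopy(mapping)
    updated[type_name]["_source"] = {"enabled": False}
    updated[type_name]["_all"] = {"enabled": False}
    return updated


# Hypothetical one-field mapping; only the _source and _all entries matter here.
base_mapping = {
    "doc": {
        "properties": {
            "text": {"type": "string", "store": "yes"},
        }
    }
}

print(json.dumps(without_source_and_all(base_mapping, "doc"), indent=2))
```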

Jörg



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
Hi,
  Thanks for the response.

The box has 15GB RAM. 4GB allocated to ES.

The mapping is simple and has only about 6 not_analyzed fields, 1 date 
field and 1 text field. 

The documents are large, however: 10MB each, with hundreds of fields. Only a
couple are being returned, and the response documents are less than 10k
each (only two small fields returned).

Sun JDK 1.6

I will try Oracle JDK 1.7 and latest ES.

I was trying different things with node/replica to see if there is a change 
in performance.
Cluster health is green.


On Wednesday, January 1, 2014 12:30:49 PM UTC-5, Jörg Prante wrote:
>
> There can be lots of reasons - EC2-related, OS related, Java related, 
> cluster setup related, index related, query related...
>
> Can you give an example of your mapping and a document you have indexed?
>
> How much RAM is your EC2 instance? Do you use hardware virtualization? Did 
> you disable swap and enable mlock?
>
> What is the cluster health, is it green? If you have just one node, why is 
> there 1 replica? It makes not much sense.
>
> Side notes: Please use ES version 0.90.9 as it is the latest of the 0.90 
> branch with bugs fixed. And what vendor is jdk1.6.0_45? Please use Java 
> 7, especially if you use OpenJDK 6.
>
> Jörg
>
>



Re: 4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread joergpra...@gmail.com
There can be lots of reasons - EC2-related, OS related, Java related,
cluster setup related, index related, query related...

Can you give an example of your mapping and a document you have indexed?

How much RAM does your EC2 instance have? Do you use hardware virtualization?
Did you disable swap and enable mlock?

What is the cluster health, is it green? If you have just one node, why is
there 1 replica? That does not make much sense.

Side notes: Please use ES version 0.90.9, as it is the latest of the 0.90
branch with bugs fixed. And what vendor is jdk1.6.0_45? Please use Java 7,
especially if you use OpenJDK 6.

Jörg



4-5 second query time. Only 50 documents. Need help

2014-01-01 Thread project2501
Hi,
  I'm running ES 0.90 on a very big EC2 instance. Ubuntu 13.04 64bit.

/home/ubuntu/installs/jdk1.6.0_45/bin/java -Xms4000m -Xmx4000m -Xss256k 
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
-Des.path.home=/home/ubuntu/installs/elasticsearch-0.90.6 -cp 
:/home/ubuntu/installs/elasticsearch-0.90.6/lib/elasticsearch-0.90.6.jar:/home/ubuntu/installs/elasticsearch-0.90.6/lib/*:/home/ubuntu/installs/elasticsearch-0.90.6/lib/sigar/*
 
org.elasticsearch.bootstrap.ElasticSearch


I have only 50 documents in an index yet every query takes 3-6 seconds to 
run. Here is a sample query.

curl -X POST "http://localhost:9200/documents/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "(text:\"understanding\" ) "
    }
  },
  "fields": [
    "id",
    "ratings"
  ]
} '

Why does it take so long to query? I have 1 shard and 1 replica. I thought 
ES was supposed to be fast?

Any ideas here? This is pretty disappointing.







Re: highlight encoder

2014-01-01 Thread Jun Ohtani
Hi,

Try using the "encoder" parameter instead of "escape_html".

See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_encoder
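
A minimal sketch of the highlight section with that change, built as a Python dict. The field name and fragment settings come from the question; the function name and the <em> tags are assumptions, since the tags in the original post were lost.

```python
import json


def highlight_with_html_encoder(field, pre_tag="<em>", post_tag="</em>"):
    """Build a highlight section that uses "encoder": "html"
    in place of the "escape_html" key from the question."""
    return {
        "encoder": "html",
        "pre_tags": [pre_tag],    # assumed tags; the originals were stripped
        "post_tags": [post_tag],
        "fields": {
            field: {
                "fragment_size": 150,
                "number_of_fragments": 1,
                "no_match_size": 150,
            }
        },
    }


print(json.dumps({"highlight": highlight_with_html_encoder("abs")}, indent=2))
```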

Regards,


2014/1/1 Guozhu Wei 

> Dear all
>
> Happy new year!
>
> I have a problem when I use the highlight, I want to highlight a field
> 'abs' while it contains html, like:
>
> the content of field 'abs':
> ... The molecule:(L2L3M)n ...
>
> I want to escape the html when highlight and found that the "encoder"
> option.
>
> I use this option:
>
> "highlight": {
> "escape_html": "html",
> "pre_tags": [""],
> "post_tags": [""],
> "fields": {
> "abs": {
> "fragment_size": 150,
> "number_of_fragments": 1,
> "no_match_size": 150
> }
> }
> },
>
> but nothing changed, the html tags like '' still returned untouched.
>
> Any hints of what could be wrong?
>
> Thanks!
>



-- 
---
Jun Ohtani
blog : http://blog.johtani.info



highlight encoder

2014-01-01 Thread Guozhu Wei
Dear all

Happy new year!

I have a problem when I use the highlight, I want to highlight a field
'abs' while it contains html, like:

the content of field 'abs':
... The molecule:(L2L3M)n ...

I want to escape the HTML when highlighting and found the "encoder" option.

I use this option:

"highlight": {
    "escape_html": "html",
    "pre_tags": [""],
    "post_tags": [""],
    "fields": {
        "abs": {
            "fragment_size": 150,
            "number_of_fragments": 1,
            "no_match_size": 150
        }
    }
},

but nothing changed; the HTML tags like '' are still returned untouched.

Any hints of what could be wrong?

Thanks!



Highlight analyzed field with stop words

2014-01-01 Thread mayap
Hi,

In the following query we perform an exact match that includes a stop word.
The stop word is not highlighted, since highlighting is performed on the
analyzed field:
{
  "query": {
    "match": {
      "_all": {
        "query": "\"My name is\""
      }
    }
  },
  "fields": [
    "_source"
  ],
  "highlight": {
    "fields": {
      "text.*": {}
    }
  }
}
result:  My name is .
If the match is not exact we do want to exclude the stop words from the 
highlighting. (That's why I don't think that defining our own analyzer will 
solve the problem)

Q 1. In the case of an exact match, is it possible to perform highlighting on
the _source field with some parameters? We tried that and got no highlighting
at all. Is there some other possibility?
Q 2. Why are the words "My" and "name" highlighted separately? We use fvh,
since in the mapping we defined:
template_textEnglish: {
mapping: {
index: analyzed
store: no
analyzer: english
type: string
*term_vector: with_positions_offsets*
}
match: text.English.*
}
}

Any help would be appreciated.

Thanks.



Re: Bulk throughput issues

2014-01-01 Thread joergpra...@gmail.com
There is no need for more than one client instance per JVM. You can
increase the bulk request concurrency in the BulkProcessor with
"setConcurrentRequests" to avoid blocking threads, until you reach the
sweet spot where client submitting resources matches the indexing capacity
of the cluster.

This is a matter of dynamic balance, which is different from setup to
setup. The default request concurrency is 1. For a higher value, you have
to prepare enough heap resources and maybe run your doc construction in
multiple threads to exploit the advantages.

As a rule of thumb, use 4 * available cores for the concurrency, and
~1-10MB for the bulk size.

For example, I often operate with a bulk size of 1000 docs and a
concurrency level of 32.
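
The rule of thumb can be written down as a small Python helper. It only computes the numbers; suggest_bulk_settings and its dictionary keys are made-up names, not part of the BulkProcessor API, and the concurrency value would still have to be passed to setConcurrentRequests on the Java side.

```python
import os


def suggest_bulk_settings(cores=None, bulk_docs=1000):
    """Starting values from the rule of thumb above:
    concurrency = 4 * available cores, bulk size given in documents."""
    if cores is None:
        # Fall back to 1 if the core count cannot be determined.
        cores = os.cpu_count() or 1
    return {"concurrent_requests": 4 * cores, "bulk_docs": bulk_docs}


# Values for the machine this runs on; tune from here while watching
# how the cluster keeps up with indexing.
print(suggest_bulk_settings())
```

With 8 cores this gives a concurrency of 32, the level mentioned above. Treat the result as a starting point only, since the balance differs from setup to setup.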

Jörg



Re: shard stucked in initializing state (elasticsearch crash test)

2014-01-01 Thread joergpra...@gmail.com
Sorry, I just saw that you already restarted the node...

Is there something in the logs? At debug level? The cluster should report
whether it receives the shard at all, and maybe the reason why it rejects
the shard.

Jörg



Re: shard stucked in initializing state (elasticsearch crash test)

2014-01-01 Thread joergpra...@gmail.com
Can you find out if the initializing shards were stuck because of a
previous OOM? If so, there is not much that can be done except a node cold
restart (JVM shutdown and start).

Jörg



Re: Problem of ElasticSearch on ZFS

2014-01-01 Thread joergpra...@gmail.com
Which Ubuntu version? What ZFS product/version?

What hardware controller do you use for ZFS disks?

You have to carefully prepare ZFS for apps like ES to avoid RAM
overallocation, block misalignment, and double caching.

For example:

- assign only a fraction of RAM to the ZFS adaptive replacement cache
(zfs_arc_max) - ES is doing all the caching

- use the noop scheduler (ZFS reorders IO requests)

- use a read-ahead size equal to the ZFS block size (drive ops must match
the ZFS block size)

- turn on write-through on the disks (turn off the write cache)

Jörg



Re: 16Gb RAM / 8 cores : best config ?

2014-01-01 Thread joergpra...@gmail.com
Do not use a VM if you want maximum single-node performance.

And yes, fuzzylikethis is slow. Have you tried the parameters
minSimilarity, prefixLength, ignoreTF, maxnumterms?

Or could you use another query type that is faster?

Jörg



Re: 16Gb RAM / 8 cores : best config ?

2014-01-01 Thread Yann Barraud
Hi,

Thanks for the answer. I have an index with 2.5M docs and 8.5 GB. I'm
doing fuzzylikethisfield queries and custom score queries.

The OS is Ubuntu, virtualized.

And the results are really slow, as if it were running on a single core...


Regards,
Yann Barraud


2013/12/31 joergpra...@gmail.com 

> No easy answer, it depends on your OS and your requirements - indexing
> load, searching load, analytics ... you should start with ES default
> settings and heap size increased to 50% of RAM, that is  8G
>
> Jörg
>



Re: Feedback on docs

2014-01-01 Thread joergpra...@gmail.com
The differences are

- The prepare...() methods, such as prepareIndex(), let you build immutable
request objects. Mutable and immutable objects have different performance
characteristics in a multithreaded system design, since immutable objects are
free of side effects. In Java 8 you can look at the Lambda feature; it comes
from functional programming, which makes writing correct code much easier.
This will be the future of Java, and I'm sure the ES API will change in that
direction.

- What you see is the asynchronous API. All ES methods are executed
asynchronously. There are two styles for each ES method call: one with future
objects and one with listeners, so you can receive answers on other threads.
Both styles coexist as a convenience for writing better code.

- I have prepared javadocs at
http://xbib.org/elasticsearch/1.0.0.Beta2-SNAPSHOT/apidocs/index.html (as
of November 2013, I can update it if required)
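
The two call styles described above (future objects versus listeners) can be
sketched with nothing but the Java 8 standard library. This is only an
analogy under stated assumptions, not the Elasticsearch client API: the class
AsyncStyles and the methods indexAsync, futureStyle and listenerStyle are
hypothetical names invented for illustration, and CompletableFuture stands in
for whatever future type the client returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Stdlib-only analogy of the two asynchronous call styles.
// None of these names belong to Elasticsearch; they are hypothetical.
class AsyncStyles {

    // Hypothetical stand-in for an asynchronous client call.
    // The work runs on a background thread and the result arrives later.
    static CompletableFuture<String> indexAsync(String id) {
        return CompletableFuture.supplyAsync(() -> "indexed:" + id);
    }

    // Future style: start the call, then block until the answer is there.
    static String futureStyle(String id) {
        return indexAsync(id).join();
    }

    // Listener style: register a callback and return immediately.
    // The callback may run on a different thread than the caller.
    // The returned future only lets the caller wait for the callback to finish.
    static CompletableFuture<Void> listenerStyle(String id, Consumer<String> onResponse) {
        return indexAsync(id).thenAccept(onResponse);
    }

    public static void main(String[] args) {
        // Blocks the calling thread until the result is available.
        System.out.println(futureStyle("1"));

        // Does not block by itself; join() is used here only so that the
        // program does not exit before the callback has run.
        listenerStyle("2", response -> System.out.println(response)).join();
    }
}
```

The future style is the simpler one to read when the caller needs the answer
before it can continue; the listener style keeps the calling thread free,
at the cost of handling the response somewhere else.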

Jörg



On Tue, Dec 31, 2013 at 8:54 PM, Paul Houle  wrote:

>
>
> The big one is that there are some cross-cutting patterns in the API I
> don't totally understand.  For instance,
>
> * what is the difference between index() and prepareIndex()?
> * what is up with the execute(),  actionGet() and get() methods of various
> sorts?
> * are javadocs available for IndexRequest() and similar objects?
>
>
>
