Re: Elasticsearch-Hadoop: EsOutputFormat and the 'date' type

2014-07-11 Thread Andrew Nixon
Hi Costin, thank you for your reply.

My issue actually came down to the ordering of my matches. I had a
'match: *' as the first dynamic template, which disabled norms. Although this
template didn't explicitly define a type for any matched field, it would
automatically map the 'date' field as a string type. The "date_*" template
would then match but fail to set the type to date, as it had already been
defined. Simply reordering my dynamic templates so that the date matcher
came before the catch-all solved the issue :)
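For anyone hitting the same ordering problem, a rough sketch of the corrected order (template names, the norms setting, and the date format shown are illustrative, not the exact template from this thread):

```json
"dynamic_templates" : [
  { "date_fields" : {
      "match" : "date_*",
      "mapping" : { "type" : "date", "format" : "yyyy-MM-dd hh:mm" }
  } },
  { "catch_all" : {
      "match" : "*",
      "match_mapping_type" : "string",
      "mapping" : { "type" : "string", "omit_norms" : true }
  } }
]
```

Templates are evaluated in order, so the specific `date_*` matcher must come before the `*` catch-all.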
On 10 Jul 2014 22:52, "Costin Leau"  wrote:

> Make sure the template does match. This might not always be obvious;
> however, it's easy to test. First, check your template, and after
> defining it, send a request with a sample payload to see whether
> the doc gets properly created. A common mistake is defining the template
> after the index is created, which makes it useless; the template gets
> applied when the index is created (and thus becomes part of its
> mapping).
> Second, if the mapping appears correct, double-check your es-hadoop
> configuration and potentially turn on logging to see the payload sent by
> es-hadoop to elasticsearch.
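A minimal sequence for the check described above might look like this (index, template, and field names are illustrative):

```sh
# 1. Register the template FIRST -- it only applies to indices created afterwards
curl -XPUT 'localhost:9200/_template/date_template' -d @template.json

# 2. Implicitly create a fresh, matching index by indexing a sample document
curl -XPUT 'localhost:9200/index-name-test/doc/1' -d '{"date_created" : "2014-07-11 10:30"}'

# 3. Inspect the resulting mapping to confirm the template was applied
curl -XGET 'localhost:9200/index-name-test/_mapping?pretty'
```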
>
> Hope this helps,
>
> On 7/1/14 11:09 PM, Telax wrote:
>
>> Hello,
>>
>> I'm interested in using the EsOutputFormat class in a hadoop mapreduce
>> task.
>> During experimentation I have noticed that there is no direct handling
>> for 'date' objects.
>> My data contains a number of 'date' fields which must be transposed into
>> the Elasticsearch index; however, I am
>> currently unable to successfully transpose the fields which should be
>> of type 'date', as instead they are simply
>> submitted into the index as 'string' type.
>> Using templates, I have tried defining dynamic_date_formats as well as
>> explicitly specifying a date type and format
>> mapping for a matched field in a dynamic template which matches against
>> the names of those fields which should be 'date'
>> types.
>> In either case, fields indexed into my Elasticsearch cluster which
>> should be recognized as 'date' types are only
>> set as strings.
>>
>> Here is an example template similar to that with which I have been
>> experimenting.
>> {
>>   "template" : "index-name-*",
>>   "mappings" : {
>>     "_default_" : {
>>       "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"],
>>       "dynamic_templates" : [
>>         { "date_field_template" : {
>>             "match" : "date_*",
>>             "mapping" : {
>>               "type" : "date",
>>               "format" : "yyyy-MM-dd hh:mm"
>>             }
>>         } }
>>       ]
>>     }
>>   }
>> }
>>
>> Any help on this issue would be greatly appreciated.
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/10ff4422-
>> ccdb-4fc0-8ccb-34b4b5e5180a%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
> Costin
>



Re: _all and compress fields

2014-07-11 Thread vineeth mohan
Hello ,

I believe compression is enabled by default.
Disabling _all means that you won't be able to search across all fields
together.
That is, in a query_string query you will need to specify which field you
want to query, or the search will return 0 results.
This will certainly help in reducing the space.

I believe store is also disabled by default.
For highlighting, _source is used if store is disabled, so it won't affect
highlighting.
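To illustrate the query_string point above, a sketch (the field name is illustrative): with _all disabled, the query has to name a field explicitly, e.g. via default_field.

```json
{
  "query" : {
    "query_string" : {
      "default_field" : "title",
      "query" : "some search terms"
    }
  }
}
```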


Thanks
Vineeth


On Sat, Jul 12, 2014 at 12:20 AM, IronMike  wrote:

> I am looking for ways to reduce index size,
> so I turned on compression; it helped a little but not much.
>
> I did 2 more things that helped, but I'm not quite sure what the effects are:
>
> - I turned off the _all field for everything. What is the downside to this?
>
> - I turned off "store" for the "file" field, which handles text from
> binary files like PDFs. Should "store" be ON for attachments if I need
> highlighting in my queries?
>



Re: Elastic Search nodes / servers with constant high CPU utilization in idle state

2014-07-11 Thread webish
After changing to Oracle Java the problem seemed resolved: CPU on
both nodes dropped to 5% or less during the same state. However, in the
last 24 hrs history is repeating itself.

CPU on one node is at 75% and on the other at 50%.

On Wednesday, July 9, 2014 3:47:12 AM UTC-4, webish wrote:
>
> Ahh, ok.  Sorry.
>
> NodeA:
> https://gist.github.com/w3b1sh/ea6a2b3fbfc837d5d9d8
>
> NodeB:
> https://gist.github.com/w3b1sh/2ca12bd920ebf20644ef
>
> On Wednesday, July 9, 2014 3:36:53 AM UTC-4, Mark Walkom wrote:
>
> The output of hot threads is only from the one node, ie requesting it 
> isn't at a cluster level, if you can add the other node it'd help.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>  
>
> On 9 July 2014 17:32, webish  wrote:
>
> Definitely.
>
> Here are all the captures thus far in gists.
>
> *CLUSTER:*
> curl -XGET 'http://localhost:9200/_nodes/stats'
> https://gist.github.com/w3b1sh/a4759e5aa4efbe780fa7
>
> curl -XGET 'http://localhost:9200/'
> https://gist.github.com/w3b1sh/f7a727b1bae53772e56d
>
> *THREADS:*
> curl -XGET 'http://localhost:9200/_nodes/hot_threads'
> https://gist.github.com/w3b1sh/ea6a2b3fbfc837d5d9d8
>
> curl localhost:9200/_cat/thread_pool
> https://gist.github.com/w3b1sh/778194c19ec3ed724f1e
>
>
> On Wednesday, July 9, 2014 3:11:25 AM UTC-4, Ivan Brusic wrote:
>
> Yes, please use gist/pastebin in the future.
>
> Try using the hot threads API [1] to see if there are any threads that are 
> truly busy. If your system is truly idle, your thread pools should be 
> almost empty [2]. Your output above has only one management thread in use, 
> which could be simply the thread serving up the output.
>
> [1] http://www.elasticsearch.org/guide/en/elasticsearch/
> reference/current/cluster-nodes-hot-threads.html
> [2] http://www.elasticsearch.org/guide/en/elasticsearch/
> reference/current/cat-thread-pool.html
>  
> Cheers,
>
> Ivan
>
>
> On Tue, Jul 8, 2014 at 10:54 PM, webish  wrote:
>
> Thanks Mark.  My mistake.
>
> I can try switching to Oracle Java.  There is no TTL.  I've used Marvel 
> for development testing.  Perhaps I can install the plugin...
>
>
> On Wednesday, July 9, 2014 1:35:20 AM UTC-4, Mark Walkom wrote:
>
> It's better to pop long output like that into a gist/pastebin; it makes it
> easier to read the thread.
> It's also worth installing a monitoring plugin like ElasticHQ or marvel, 
> as they provide graphical insight into what is happening and will 
> extrapolate some of the raw figures out.
>
> If you can change to Oracle java you will get better performance. Are you 
> using TTL?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>  
>
> On 9 July 2014 14:54, webish  wrote:
>
> I'm seeing very high CPU utilization in an idle or close-to-idle state. I
> haven't seen it use less than 40% on each node over the course of days.
> I'm not certain how long it has been like this, but it could be weeks.
>
> Any help resolving this would be greatly appreciated!
>
> *Details:*
>
> There are 2 nodes in the cluster.
>
> AWS instances are:
> Instance type: r3.2xlarge
> vCPU: 8, ECU: 26, Memory: 61 GiB, Instance Storage: 1 x 160 GB SSD
>
>  curl -XGET 'http://localhost:9200/'
> {
>   "status" : 200,
>   "name" : "Eddie Brock",
>   "version" : {
> "number" : "1.1.0",
> "build_hash" : "2181e113dea80b4a9e31e58e9686658a2d46e363",
> "build_timestamp" : "2014-03-25T15:59:51Z",
> "build_snapshot" : false,
> "lucene_version" : "4.7"
>   },
>   "tagline" : "You Know, for Search"
> }
>
>
> % java -version
> java version "1.7.0_55"
> OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1~0.12.04.2
> )
>  OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
>
>
> curl -XGET 'http://localhost:9200/_nodes/stats'
> {
>"cluster_name":"ab_elastic",
>"nodes":{
>   "kpHPVkBCyjOkRaSpS5Q":{
>  "timestamp":1404880773859,
>  "name":"Milan",
>  "transport_address":"removed",
>  "host":"removed",
>  "ip":[
> "removed",
> "NONE"
>  ],
>  "indices":{
> "docs":{
>"count":17793781,
>"deleted":466537
> },
> "store":{
>"size_in_bytes":9902722189,
>"throttle_time_in_millis":8777423
> },
> "indexing":{
>"index_total":3548638,
>"index_time_in_millis":2610694,
>"index_current":1,
>"delete_total":0,
>"delete_time_in_millis":0,
>"delete_current":0
> },
> "get":{
>"total":0,
>"time_in_millis":0,
>"exists_total":0,
>"exists_time_in_millis":0,
>"missing_total":0,
>"missin

Re: Question about querying on a URL field

2014-07-11 Thread Jack Park
Exploring the web, I started applying a cleanup for special
characters, plus hints from various answers, trying these queries:
{"term":{"url":"http\\:\\/\\/google.com\\/"}}
{"term":{"field":{"url":"http\\:\\/\\/google.com\\/"}}}
{"term":{"search_string":{"url":"http\\:\\/\\/google.com\\/"}}}

with no real difference.
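As an aside, if the query body is being built by hand-escaping strings, letting a JSON library do the quoting avoids this class of parse error entirely. A small sketch (note: a term query will still only match the literal URL if the url field is mapped not_analyzed, since the standard analyzer tokenizes URLs into pieces):

```python
import json

# Build the query as a dict and let json.dumps handle all quoting;
# no manual backslash-escaping of ':' or '/' is needed.
query = {"query": {"term": {"url": "http://google.com/"}}}
body = json.dumps(query)
print(body)  # {"query": {"term": {"url": "http://google.com/"}}}
```

The resulting string can be sent as-is as the request body.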

On Fri, Jul 11, 2014 at 7:06 PM, Jack Park  wrote:
> This query
> {\"query\":{\"field\":{\"url\":\"http://google.com/\"}
>
> seems to fail.
> It represents a stringified JSON object without any processing on the
> simple url itself.
>
> A snippet of the giant error message is below.
>
> Thanks in advance for ideas.
> Jack
>
> DataProvider.getNodeByURL Error: 
> {"error":"SearchPhaseExecutionException[Failed
> to execute phase [query], all shards failed; shardFailures 
> {[p6M6zWBxReGkMx_qcCV
> PDg][topics][2]: SearchParseException[[topics][2]: from[-1],size[-1]: Parse 
> Fail
> ure [Failed to parse source 
> [{\"query\":{\"field\":{\"url\":\"http://google.com/
> \"}}}]]]; nested: QueryParsingException[[topics] Failed to parse query 
> [http://g
> oogle.com/]]; nested: ParseException[Cannot parse 'http://google.com/': 
> Lexical
> error at line 1, column 19.  Encountered:  after : \"\"]; nested: 
> TokenMgrE
> rror[Lexical error at line 1, column 19.  Encountered:  after : \"\"]; 
> }{[p
> 6M6zWBxReGkMx_qcCVPDg][topics][3]: SearchParseException[[topics][3]: 
> from[-1],si
> ze[-1]: Parse Failure [Failed to parse source 
> [{\"query\":{\"field\":{\"url\":\"
> http://google.com/\"}}}]]]; nested: QueryParsingException[[topics] Failed to 
> par



Re: What do these metrics mean?

2014-07-11 Thread Shannon Monasco
Thanks Ivan!
On Jul 11, 2014 5:56 PM, "Ivan Brusic"  wrote:

> The default in 0.90 (not sure about 0.19.x) should still be a fixed search
> thread pool:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/modules-threadpool.html
>
> I find that the current queries and active threads tends to be the same
> number. When I look at my graphs, they have the same movements.
>
> If you do not have any queued requests you might want to set a lower cap
> if your cluster is experiencing slowness.
>
> BTW, the best course of action that you can take is to simply upgrade and
> not worry about thread settings. The Lucene 4 improvements (Elasticsearch
> 0.90 I believe) were monumental in the space savings. New versions will
> offer tons of other improvements, but the 0.90 release was especially great!
>
> Cheers,
>
> Ivan
>
>
> On Fri, Jul 11, 2014 at 3:09 PM, Shannon Monasco 
> wrote:
>
>> I have never seen any queuing on search threads.  Not sure if in .19 it
>> defaults to cache or not but that's the behavior I see.
>>
>> What about current queries from indices stats?
>> On Jul 11, 2014 2:53 PM, "Ivan Brusic"  wrote:
>>
>>> Your second paragraph is correct. The threads are the total number of
>>> search threads at your disposal, active is the number of ongoing threads,
>>> and queue is the number of requests that cannot run because your thread
>>> pool is exhausted, which should be when active == threads, but not always
>>> the case.
>>>
>>> The default number of search threads is based upon the number of
>>> processors (3x # of available processors). There is no good metric for
>>> determining a balance since searches can be either lightweight
>>> (milliseconds) or heavyweight (minutes), but I would argue that the key
>>> metric to monitor is your queue. Is it normally empty? Spiky behavior?
>>> Requests constantly queued?
>>>
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>
>>> Cheers,
>>>
>>> Ivan
>>>
>>>
>>> On Fri, Jul 11, 2014 at 1:19 PM, smonasco  wrote:
>>>
 I should probably preface everything with I'm running a 5 node cluster
 with version 0.19.3 and should be up to version 1.1.2 by the middle of
 August, but I have some confusion around metrics I'm seeing, what they mean
 and what are good values.

 In thread_pools I see threads, active and queued.  Queued + active !=
 threads.  I assume this really is a work pool and you have active threads,
 a thread count in the work pool and queued work.  So some explanation
 around this would be nice.

 I've correlated some spikes in search threads with heap mem utilization
 explosions.  current searches sort of also correlate, but I have more
 current searches than search threads and there is no search threadpool
 queueing.

 I'm not sure how current searches correlate (or if they should/do) with
 search threads.

 I've observed the following:

 Devastating: 10,000 current searches on worst index sustained over
 hours with not much change ending at the same time as spikes of > 1000
 search threads (where we generally average < 50) and a heap explosion.

 Oddly OK: current searches averaging 15 on worst index spiking to 105
 with search threads averaging 50 with maxes of 300 spiking to averages of
 120 and maxes of > 1000


 So...  I guess, what are good ranges for search threads and current
 searches?

 --Shannon Monasco


Question about querying on a URL field

2014-07-11 Thread Jack Park
This query
{\"query\":{\"field\":{\"url\":\"http://google.com/\"}

seems to fail.
It represents a stringified JSON object without any processing on the
simple url itself.

A snippet of the giant error message is below.

Thanks in advance for ideas.
Jack

DataProvider.getNodeByURL Error: {"error":"SearchPhaseExecutionException[Failed
to execute phase [query], all shards failed; shardFailures {[p6M6zWBxReGkMx_qcCV
PDg][topics][2]: SearchParseException[[topics][2]: from[-1],size[-1]: Parse Fail
ure [Failed to parse source [{\"query\":{\"field\":{\"url\":\"http://google.com/
\"}}}]]]; nested: QueryParsingException[[topics] Failed to parse query [http://g
oogle.com/]]; nested: ParseException[Cannot parse 'http://google.com/': Lexical
error at line 1, column 19.  Encountered:  after : \"\"]; nested: TokenMgrE
rror[Lexical error at line 1, column 19.  Encountered:  after : \"\"]; }{[p
6M6zWBxReGkMx_qcCVPDg][topics][3]: SearchParseException[[topics][3]: from[-1],si
ze[-1]: Parse Failure [Failed to parse source [{\"query\":{\"field\":{\"url\":\"
http://google.com/\"}}}]]]; nested: QueryParsingException[[topics] Failed to par



Re: What do these metrics mean?

2014-07-11 Thread Ivan Brusic
The default in 0.90 (not sure about 0.19.x) should still be a fixed search
thread pool:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/modules-threadpool.html

I find that the current queries and active threads tend to be the same
number. When I look at my graphs, they have the same movements.

If you do not have any queued requests you might want to set a lower cap if
your cluster is experiencing slowness.
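For reference, in 0.90-era elasticsearch.yml a capped search pool looks something like this (the sizes are illustrative placeholders, not recommendations):

```yaml
threadpool:
  search:
    type: fixed
    size: 48
    queue_size: 1000
```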

BTW, the best course of action that you can take is to simply upgrade and
not worry about thread settings. The Lucene 4 improvements (Elasticsearch
0.90 I believe) were monumental in the space savings. New versions will
offer tons of other improvements, but the 0.90 release was especially great!

Cheers,

Ivan


On Fri, Jul 11, 2014 at 3:09 PM, Shannon Monasco  wrote:

> I have never seen any queuing on search threads.  Not sure if in .19 it
> defaults to cache or not but that's the behavior I see.
>
> What about current queries from indices stats?
> On Jul 11, 2014 2:53 PM, "Ivan Brusic"  wrote:
>
>> Your second paragraph is correct. The threads are the total number of
>> search threads at your disposal, active is the number of ongoing threads,
>> and queue is the number of requests that cannot run because your thread
>> pool is exhausted, which should be when active == threads, but not always
>> the case.
>>
>> The default number of search threads is based upon the number of
>> processors (3x # of available processors). There is no good metric for
>> determining a balance since searches can be either lightweight
>> (milliseconds) or heavyweight (minutes), but I would argue that the key
>> metric to monitor is your queue. Is it normally empty? Spiky behavior?
>> Requests constantly queued?
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Fri, Jul 11, 2014 at 1:19 PM, smonasco  wrote:
>>
>>> I should probably preface everything with I'm running a 5 node cluster
>>> with version 0.19.3 and should be up to version 1.1.2 by the middle of
>>> August, but I have some confusion around metrics I'm seeing, what they mean
>>> and what are good values.
>>>
>>> In thread_pools I see threads, active and queued.  Queued + active !=
>>> threads.  I assume this really is a work pool and you have active threads,
>>> a thread count in the work pool and queued work.  So some explanation
>>> around this would be nice.
>>>
>>> I've correlated some spikes in search threads with heap mem utilization
>>> explosions.  current searches sort of also correlate, but I have more
>>> current searches than search threads and there is no search threadpool
>>> queueing.
>>>
>>> I'm not sure how current searches correlate (or if they should/do) with
>>> search threads.
>>>
>>> I've observed the following:
>>>
>>> Devastating: 10,000 current searches on worst index sustained over hours
>>> with not much change ending at the same time as spikes of > 1000 search
>>> threads (where we generally average < 50) and a heap explosion.
>>>
>>> Oddly OK: current searches averaging 15 on worst index spiking to 105
>>> with search threads averaging 50 with maxes of 300 spiking to averages of
>>> 120 and maxes of > 1000
>>>
>>>
>>> So...  I guess, what are good ranges for search threads and current
>>> searches?
>>>
>>> --Shannon Monasco
>>>

Re: What types of SSDs?

2014-07-11 Thread Mark Walkom
You could set up hot/cold-based allocation: put your heavily accessed (hot)
indexes on the SSDs and the rest on the spinning disks.
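A rough sketch of that setup using allocation filtering (the attribute name "disk" and the index name are illustrative):

```sh
# In elasticsearch.yml, tag each node with a custom attribute:
#   on the SSD-backed nodes:      node.disk: ssd
#   on the spinning-disk nodes:   node.disk: hdd

# Then pin a hot index to the SSD nodes:
curl -XPUT 'localhost:9200/hot-index/_settings' -d '
{ "index.routing.allocation.require.disk" : "ssd" }'
```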

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 11 July 2014 23:35, John Smith  wrote:

> Right now I have 4 boxes...
>
> 2x 32 cores 200GB RAM with RAID10 SATA1 + the Fusion IO
>
> 2x 24 cores 96GB RAM with RAID10 SAS but regular mechanical drives.
>
> I only test them as pairs. So it's clusters of 2
>
> On the surface all searches seem to perform quite close to each other.
> Only when looking at the stats in HQ and Marvel is the true story told. For
> instance, most warnings with Fusion IO are yellow at best, while with the
> SAS RAID 10 (regular SATA drives) they reach red.
>
> I'm hoping I can get some regular SSDs to put in the SAS boxes and see if
> that's better.
>
>
>
>
> On Thursday, 10 July 2014 18:00:11 UTC-4, Jörg Prante wrote:
>>
>> Did you consider SSD with RAID0 (Linux, ext4, noatime) and a SAS2 (6 Gb/s)
>> or SAS3 (12 Gb/s) controller?
>>
>> For personal use at home I have an LSI SAS 2008 with a 4 x 128 GB SSD RAID0,
>> with sustained 800 MB/s write and 950 MB/s read, on a commodity dual AMD C32
>> socket server mainboard. I do not test with JMeter, but on this single-node
>> hardware alone I observe 15k bulk index operations per second, and
>> scan/scroll over 45m docs takes less than 70 min.
>>
>> I'm waiting until SAS3 is affordable for me. For the future I have on my
>> list: an LSI SAS 3008 HBA and SAS3 SSDs. For personal home use, Fusion IO is
>> too heavy for my wallet. Even for commercial purposes I do not consider it
>> a cost-effective solution.
>>
>> Just a note: if you want to spend your money to accelerate ES, buy RAM. You
>> will get more performance than from drives. The reason is the lower latency:
>> low latency will speed up applications like ES more than the fastest I/O
>> drive can. That reminds me that I've been waiting for ages for DDR4
>> RAM...
>>
>> Jörg
>>
>>
>> On Thu, Jul 10, 2014 at 10:13 PM, John Smith  wrote:
>>
>>> Using 1.2.1
>>>
>>> I know each system and functionality is different but just curious when
>>> people say buy SSDs for ES, what types of SSDs are they buying?
>>>
>>> Fortunately for me I had some Fusion IO cards to test with, but just
>>> wondering if it's worth the price and if I should look into off the shelf
>>> SSDs like Samsung EVOs using SAS instead of pure SATA.
>>>
>>> So far from my testing it seems that all search operations, regardless of
>>> drive type, return in the same amount of time. So I suppose
>>> caching is playing a huge part here.
>>>
>>> Though when looking at the HQ indexing stats like query time, fetch
>>> time, refresh time, etc., the Fusion IO fares a bit better than regular
>>> SSDs using SATA.
>>>
>>> For instance, refresh time for Fusion IO is 250ms while for regular SSDs
>>> (SATA, not SAS; will test SAS when I get a chance) it's just above 1 second.
>>> Even with Fusion IO I do see some warnings on the index stats, but
>>> slightly better than regular SSDs.
>>>
>>> Some strategies I picked for my indexes...
>>> - New index per day, plus routing by "user"
>>> - New index per day for monster users.
>>>
>>> Using JMeter to test...
>>> - Achieved 3,500 index operations per second (Not bulk) avg document
>>> size 2,500 bytes (Fusion IO seemed to perform a bit better)
>>> - Created a total of 25 indexes totaling over 100,000,000 documents
>>> anywhere between 3,000,000 to 5,000,000 documents per index.
>>> - Scroll query to retrieve 15,000,000 documents out of the 100,000,000
>>> (all indexes) took 25 minutes regardless of drive type.
>>>
>>> P.s: I want to index 2,000,000,000 documents per year so about 4,000,000
>>> per day. So you can see why Fusion IO could be expensive :)
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>

Re: How to improve search performance in ES ?

2014-07-11 Thread Mark Walkom
Can you elaborate on how you're measuring and comparing these response
times and why you feel they are slow?
It might also help if you can put a sample query and document into
a gist/pastebin

Also, is your cluster under load when you run these queries? What metrics
are you gathering around that side?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 July 2014 05:01, coder  wrote:

> Hi Jörg,
>
> I have seen these links. I'm using the ngram tokenizer. The issue I'm
> facing is slow response time, and for that I need some suggestions: how can
> I improve it? Is there a better way to structure the queries? Also, I'm
> using a match query on a field in one of my filters, but I have read that
> term filters are more effective. Can you give me some insight into how I
> can use a term filter in this case, even if the field I want to filter on
> is not present in all the documents?
>
> Thanks
>
>
> On Saturday, 12 July 2014 00:09:50 UTC+5:30, Jörg Prante wrote:
>
>> For autocompletion, you should use the completion suggester
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/search-suggesters-completion.html
>>
>> or edge ngram tokenizer
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/analysis-edgengram-tokenizer.html
>>
>> Jörg
>>
>>
>> On Fri, Jul 11, 2014 at 8:11 PM, coder  wrote:
>>
>>> Hi,
>>>
>>> I'm working on improving the search response of ES but not able to do
>>> anything. My scenario is something like this:
>>>
>>> I'm using 3 ES queries to get relevant results for my autocompleter.
>>>
>>> 1. A function score query with a match query  ( To get a correct match
>>> if user typed query is available in documents based on popularity)
>>>
>>> 2. A multi match query  (To handle those scenarios in which a user types
>>> some text which is present in different fields in a document since my
>>> documents are multi fields like name, address, city, state, country )
>>>
>>> 3. A query string (In order to ensure if I missed user query by the
>>> above type I'll be able to search using more powerful but less accurate
>>> query string)
>>>
>>> Along with all 3 queries, I'm using 4 filters (clubbed using an AND
>>> filter).
>>>
>>> My performance is really bad and I want to improve it while still
>>> delivering relevant results in my autocompleter.
>>>
>>> Can anyone help me with how I can improve this? Is there any way I can
>>> club the queries for better performance?
>>>
>>> I have read that bool filters should be used instead of AND filters,
>>> since they use bitsets which are cached internally. I think this makes
>>> for one improvement: if in the first query ES stores the filter
>>> information in a bitset, it can reuse it in the other two queries. That
>>> will make things a little faster, but beyond that I'm not able to find
>>> any improvement in the queries themselves.
>>>
>>> Is there any way I can combine the match and multi-match queries
>>> (1 and 2) into a single effective query?
>>>
>>> Also, in place of query_string should I use some other query for faster
>>> execution.
>>>
>>> Any suggestions are welcome.
>>> Thanks
>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/5d99495b-20ef-46b6-a069-365574fdc0a9%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>



Re: Docker & Elasticsearch using Unicast

2014-07-11 Thread Mark Walkom
ES will try to connect via unicast based on whatever you have in your
config.
What does the discovery.zen.ping.unicast line look like?
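For reference, a working unicast setup in elasticsearch.yml usually pairs the host list with disabling multicast; the addresses below are placeholders, not a recommendation for this cluster:

```yaml
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.17.42.1:9300", "172.17.0.14:9300"]
```

Both nodes need to list addresses that are actually reachable from the other side of the docker0 bridge.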

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 July 2014 05:00, Tony P.  wrote:

> I've been playing with Elasticsearch and had a working cluster in a
> multicast environment using VMs. I recently tried adapting that to work
> within Docker and I'm running into a wall with the unicast configuration.
>
> Current setup is two nodes: one on the host and another in a docker
> container (dockerfile/elasticsearch)
>
> I'm running the container with:
> $ docker run -d -h "elasticsearch-node-01" --name="elasticsearch-node-01" \
> -p 9201:9200 -p 9301:9300 -v \
> /etc/elasticsearch/cluster/:/data \
> dockerfile/elasticsearch /elasticsearch/bin/elasticsearch \
> -Des.config=/data/elasticsearch.yml
>
> It spins up on the docker0 bridge with some IP, 172.17.0.xx
> docker0 is bound to 172.17.42.1
> 127.0.0.1 refers to the host in this example
>
> I can access the elasticsearch node in the container with any of the
> commands:
> $ curl 127.0.0.1:9201
> $ curl 172.17.42.1:9201
> $ curl 172.17.0.xx:9200
>
> But when I add it to a cluster via unicast, I see an exception thrown with
> "No route to host" being the reason.
>
> After trying a few different IPs to see which could be accessed through
> the bridge, it seems that the container doesn't know how to speak to the
> host to join the host node's cluster or tell the host node that it has a
> cluster that can be joined. The host node logs also show a similar error:
>
> [2014-07-10 12:00:37,264][INFO][discovery.zen] [elasticsearch-node-test]
> failed to send join request to master
> [[elasticsearch-node-01][I3LiEOyeSome3djzr37uuQ][elasticsearch-node-01][inet[/172.17.0.14:9300]]],
> reason [org.elasticsearch.transport.RemoteTransportException:
> [elasticsearch-node-01][inet[/172.17.0.14:9300]][discovery/zen/join];
> org.elasticsearch.transport.ConnectTransportException:
> [elasticsearch-node-test][inet[/128.59.222.215:9300]]
> connect_timeout[30s]; java.net.NoRouteToHostException: No route to host]
>
> There are two conversations I've found here with docker, elasticsearch and
> unicast but neither provide an answer to my issue:
> https://groups.google.com/d/msg/elasticsearch/OsGJcxuW1vI/qybPOrgE4fMJ
> https://groups.google.com/d/msg/elasticsearch/2p9jXbCwRC8/mm4BPt5iQfgJ
>
> Any ideas on what I'm doing wrong? Is it because elasticsearch is
> attempting to search based on the node's hostname (elasticsearch-node-01)
> instead of the IP address? The hostname, elasticsearch-node-01 isn't valid
> since it's just generated by passing it to docker. Should I use the IP as
> the hostname if that's what ES is using to add the node to the cluster?
>



Re: Percolator Memory Usage -- 10-1 Disk-Memory Usage. Why?

2014-07-11 Thread Adam Georgiou
To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.
On Jul 11, 2014 6:21 PM, "Adam Georgiou"  wrote:

> *Going to try and keep this concise.*
>
> *Issue (Potential bug?)*
>
>- My cluster has been running into memory issues; garbage collection
>loops, stopping the world, etc.
>- In a test cluster I ran a few experiments. After a `jmap` i've
>determined that the
>`org.elasticsearch.index.percolator.PercolatorQueriesRegistry` is taking up
>nearly 40% of my heap, even though my percolator queries are a fraction of
>the size of the *regular *documents I'm storing.
>- I understand that percolator queries are always kept in memory,
>and I'm trying to plan accordingly, but to put things in perspective the
>index I'm percolating on contains *documents that are ~**317M on disk
>and taking up ~3Gb in memory*. I've determined this ratio through jmap
>output and by just watching the heap size before and after opening the
>index with the queries.
>- My test cluster consists of a single node (v1.0.1) and the index I'm
>storing percolator queries in has 5 shards and *0 replicas*.
>
> *Question*
>
> A nearly 10-1 ratio of memory usage to disk usage seems wrong to me. Is
> there something specific about the way percolator documents are stored
> under the hood that makes them take up so much memory compared to the way
> their JSON representations are stored on disk?
>
> -Adam
>



slow response time on some queries

2014-07-11 Thread Patrick Müssig
Hi,

we are running a 2-node cluster with about 40 GB of data and 19,975,978 
documents, separated into 84 indexes; every index has 2 shards and one 
replica. In ElasticHQ we see these query times per node:

Node1: 6.5 ms, Node2: 16.33 ms

and these fetch times:

Node1: 1.69 ms, Node2: 3.12 ms

But our end-to-end response time in PHP is around 150 ms, sometimes more, 
sometimes less. We use Elastica as the client, and after spending time 
profiling this I figured out that the complete response from the ES server 
to the PHP client is the bottleneck. Why do we get such good performance in 
ElasticHQ but not in the PHP client? Our servers are on an internal network 
and all communication runs through this network.

Is the query the problem, or our network, or is there another issue that 
you can see?
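One way to separate server time from client/transport overhead is to time a raw HTTP request from the PHP host and compare it with the `took` field (server-side time in milliseconds) that Elasticsearch includes in every search response. The host name below is a placeholder:

```
curl -o /dev/null -s -w 'time_total: %{time_total}s\n' \
  'http://es-node1:9200/_search?q=*'
```

If `time_total` is close to 150 ms while `took` stays in single digits, the overhead is in the network or the client, not in the query.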

thanks for hints
Patrick



Percolator Memory Usage -- 10-1 Disk-Memory Usage. Why?

2014-07-11 Thread Adam Georgiou
*Going to try and keep this concise.*

*Issue (Potential bug?)*

   - My cluster has been running into memory issues; garbage collection 
   loops, stopping the world, etc.
   - In a test cluster I ran a few experiments. After a `jmap` i've 
   determined that the 
   `org.elasticsearch.index.percolator.PercolatorQueriesRegistry` is taking up 
   nearly 40% of my heap, even though my percolator queries are a fraction of 
   the size of the *regular *documents I'm storing.
   - I understand that percolator queries are always kept in memory,
   and I'm trying to plan accordingly, but to put things in perspective the 
   index I'm percolating on contains *documents that are ~**317M on disk 
   and taking up ~3Gb in memory*. I've determined this ratio through jmap 
   output and by just watching the heap size before and after opening the 
   index with the queries.
   - My test cluster consists of a single node (v1.0.1) and the index I'm 
   storing percolator queries in has 5 shards and *0 replicas*.

*Question*

A nearly 10-1 ratio of memory usage to disk usage seems wrong to me. Is 
there something specific about the way percolator documents are stored 
under the hood that makes them take up so much memory compared to the way 
their JSON representations are stored on disk?

-Adam



Re: What do these metrics mean?

2014-07-11 Thread Shannon Monasco
I have never seen any queuing on search threads. Not sure whether 0.19
defaults to a cached pool or not, but that's the behavior I see.

What about current queries from indices stats?
On Jul 11, 2014 2:53 PM, "Ivan Brusic"  wrote:

> Your second paragraph is correct. The threads are the total number of
> search threads at your disposal, active is the number of ongoing threads
> and queue are the number of threads that cannot be run since your thread
> pool is exhausted, which should be when active == threads, but not always
> the case.
>
> The default number of search threads is based upon the number of
> processors (3x # of available processors). There is no good metric for
> determining a balance since searches can be either lightweight
> (milliseconds) or heavyweight (minutes), but I would argue that the key
> metric to monitor is your queue. Is it normally empty? Spiky behavior?
> Requests constantly queued?
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>
> Cheers,
>
> Ivan
>
>
> On Fri, Jul 11, 2014 at 1:19 PM, smonasco  wrote:
>
>> I should probably preface everything with I'm running a 5 node cluster
>> with version 0.19.3 and should be up to version 1.1.2 by the middle of
>> August, but I have some confusion around metrics I'm seeing, what they mean
>> and what are good values.
>>
>> In thread_pools I see threads, active and queued.  Queued + active !=
>> threads.  I assume this really is a work pool and you have active threads,
>> a thread count in the work pool and queued work.  So some explanation
>> around this would be nice.
>>
>> I've correlated some spikes in search threads with heap mem utilization
>> explosions.  current searches sort of also correlate, but I have more
>> current searches than search threads and there is not search threadpool
>> queueing.
>>
>> I'm not sure how current searches correlate (or if they should/do) with
>> search threads.
>>
>> I've observed the following:
>>
>> Devastating: 10,000 current searches on the worst index sustained over hours
>> with not much change ending at the same time as spikes of > 1000 search
>> threads (where we generally average < 50) and a heap explosion.
>>
>> Oddly OK: current searches averaging 15 on worst index spiking to 105
>> with search threads averaging 50 with maxes of 300 spiking to averages of
>> 120 and maxes of > 1000
>>
>>
>> So...  I guess, what are good ranges for search threads and current
>> searches?
>>
>> --Shannon Monasco
>>



Re: What do these metrics mean?

2014-07-11 Thread Ivan Brusic
Your second paragraph is correct. The threads are the total number of
search threads at your disposal, active is the number of ongoing threads
and queue are the number of threads that cannot be run since your thread
pool is exhausted, which should be when active == threads, but not always
the case.

The default number of search threads is based upon the number of processors
(3x # of available processors). There is no good metric for determining a
balance since searches can be either lightweight (milliseconds) or
heavyweight (minutes), but I would argue that the key metric to monitor is
your queue. Is it normally empty? Spiky behavior? Requests constantly
queued?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
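If the queue does fill up regularly, the search pool can be resized in elasticsearch.yml; the setting names below are the 1.x forms and the numbers are purely illustrative, not recommendations:

```yaml
threadpool.search.type: fixed
threadpool.search.size: 48
threadpool.search.queue_size: 1000
```

A larger queue only hides overload; a persistently full queue usually means the cluster needs more capacity or lighter queries.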

Cheers,

Ivan


On Fri, Jul 11, 2014 at 1:19 PM, smonasco  wrote:

> I should probably preface everything with I'm running a 5 node cluster
> with version 0.19.3 and should be up to version 1.1.2 by the middle of
> August, but I have some confusion around metrics I'm seeing, what they mean
> and what are good values.
>
> In thread_pools I see threads, active and queued.  Queued + active !=
> threads.  I assume this really is a work pool and you have active threads,
> a thread count in the work pool and queued work.  So some explanation
> around this would be nice.
>
> I've correlated some spikes in search threads with heap mem utilization
> explosions.  current searches sort of also correlate, but I have more
> current searches than search threads and there is not search threadpool
> queueing.
>
> I'm not sure how current searches correlate (or if they should/do) with
> search threads.
>
> I've observed the following:
>
> Devastating: 10,000 current searches on the worst index sustained over hours
> with not much change ending at the same time as spikes of > 1000 search
> threads (where we generally average < 50) and a heap explosion.
>
> Oddly OK: current searches averaging 15 on worst index spiking to 105 with
> search threads averaging 50 with maxes of 300 spiking to averages of 120
> and maxes of > 1000
>
>
> So...  I guess, what are good ranges for search threads and current
> searches?
>
> --Shannon Monasco
>



Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Costin Leau
Hi,

I've opened up issue #230 to address your use case. Rather than offering a
dedicated field for the ID, I opted to introduce an "include", "exclude"
option to select (or remove) certain fields from a document before being
saved to es. This will basically allow documents to be filtered and thus
exclude the 'metadata' or fields that are not needed in ES directly through
es-hadoop.
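Assuming the include/exclude options from issue #230 land as `es.mapping.include`/`es.mapping.exclude` (names not final at the time of writing), usage might look like this, with `docId` being a hypothetical metadata field used as the document ID but kept out of the stored source:

```properties
es.mapping.id      = docId
es.mapping.exclude = docId
```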

Cheers,


On Fri, Jul 11, 2014 at 9:36 PM, Brian Thomas 
wrote:

> I was just curious if there was a way of doing this without adding the
> field; I can add it if necessary.
>
> As an alternative, what if, in addition to es.mapping.id, there were
> another property, say es.mapping.id.exclude, that kept the id field out of
> the source document? In Elasticsearch you can create and update documents
> without including the id in the source document, so I think it would make
> sense to be able to do that with elasticsearch-hadoop also.
>
> On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:
>
>> You need to specify the id of the document you want to update somehow.
>> Since in es-hadoop things are batch focused, each
>> doc needs its own id specified somehow hence the use of 'es.mapping.id'
>> to indicate its value.
>> Is there a reason why this approach does not work for you - any
>> alternatives that you thought of?
>>
>> Cheers,
>>
>> On 7/7/14 10:48 PM, Brian Thomas wrote:
>> > I am trying to update an elasticsearch index using
>> elasticsearch-hadoop.  I am aware of the *es.mapping.id*
>> > configuration where you can specify that field in the document to use
>> as an id, but in my case the source document does
>> > not have the id (I used elasticsearch's autogenerated id when indexing
>> the document).  Is it possible to specify the id
>> > to update without having to add a new field to the MapWritable object?
>> >
>> >
>>
>> --
>> Costin
>>



Dynamic Filtered Index Aliases

2014-07-11 Thread John Cherniavsky
I'm looking to deploy what is effectively user segmentation of my data. I 
have a moderately high number of users (say 10K), and I want each to be 
able to query an alias that is their own.

This is not meant as a security measure, but as a backstop. Developers on 
my team will not try to guess another user's index name, but I don't want 
to count on every query remembering to add user filtering.

From the docs:


---
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        {
            "add" : {
                "index" : "test1",
                "alias" : "alias2",
                "filter" : { "term" : { "user" : "kimchy" } }
            }
        }
    ]
}'
---


Is what I want except that I want the term == alias so



---
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        {
            "add" : {
                "index" : "test1",
                "alias" : "alias_[X]",
                "filter" : { "term" : { "user" : [X] } }
            }
        }
    ]
}'
---


then 

http://localhost:9200/alias_bob/_search


would let you search only bob's data

Does ES have anything like this? Is it coming?
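I'm not aware of a templated-alias feature, but with ~10K users the per-user aliases can be scripted. A sketch of generating the `_aliases` actions payload (Python purely for illustration; field name `user` assumed), which could then be POSTed to `/_aliases`:

```python
import json

def alias_actions(index, users, field="user"):
    # One filtered alias per user: alias_<user> only sees that user's docs.
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": "alias_%s" % user,
                    "filter": {"term": {field: user}},
                }
            }
            for user in users
        ]
    }

# Payload for two example users, ready to POST to /_aliases
print(json.dumps(alias_actions("test1", ["bob", "kimchy"]), indent=2))
```

The same script can be re-run whenever users are added, since adding an alias that already exists is harmless.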


Thanks,
John



What do these metrics mean?

2014-07-11 Thread smonasco
I should probably preface everything with I'm running a 5 node cluster with 
version 0.19.3 and should be up to version 1.1.2 by the middle of August, 
but I have some confusion around metrics I'm seeing, what they mean and 
what are good values.

In thread_pools I see threads, active and queued.  Queued + active != 
threads.  I assume this really is a work pool and you have active threads, 
a thread count in the work pool and queued work.  So some explanation 
around this would be nice.

I've correlated some spikes in search threads with heap mem utilization 
explosions.  current searches sort of also correlate, but I have more 
current searches than search threads and there is not search threadpool 
queueing.

I'm not sure how current searches correlate (or if they should/do) with 
search threads.

I've observed the following:

Devastating: 10,000 current searches on the worst index sustained over hours 
with not much change ending at the same time as spikes of > 1000 search 
threads (where we generally average < 50) and a heap explosion.

Oddly OK: current searches averaging 15 on worst index spiking to 105 with 
search threads averaging 50 with maxes of 300 spiking to averages of 120 
and maxes of > 1000


So...  I guess, what are good ranges for search threads and current 
searches?

--Shannon Monasco



Re: Can I use the java client of newer version to connect to a old version server?

2014-07-11 Thread Ivan Brusic
The code is suspicious since it has an explicit check for versions prior to
1.2

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/cluster/state/ClusterStateRequest.java#L121-L124

Don't know much else about the code to comment further.

Cheers,

Ivan


On Fri, Jul 11, 2014 at 3:30 AM, xzer LR  wrote:

> I am using TransportClient, the following is how I retrieve the client
> instance:
>
> Client client = new
> TransportClient(sb.build()).addTransportAddresses(esAddresses);
>
> On Friday, 11 July 2014 at 18:51:26 UTC+9, David Pilato wrote:
>>
>> Are you using a TransportClient or NodeClient?
>> If NodeClient, could you try with the TransportClient?
>>
>> --
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet  | @elasticsearchfr
>> 
>>
>>
>> On 11 July 2014 at 11:14:59, xzer LR (xia...@gmail.com) wrote:
>>
>> As a test result, I got exceptions when I tried to use the newest 1.2.2
>> java client to connect to a 1.0.3 cluster:
>>
>>  18:05:41.020 [elasticsearch[Slipstream][transport_client_worker][T#1]{New
>> I/O worker #1}] [INFO ] [] org.elasticsearch.client.transport[105] -
>> [Slipstream] failed to get local cluster state for
>> [#transport#-1][e-note][inet[/192.168.200.81:9300]], disconnecting...
>> org.elasticsearch.transport.RemoteTransportException: [server-cat][inet[/
>> 192.168.21.81:9300]][cluster/state]
>> java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 48
>> at org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(
>> AbstractChannelBuffer.java:236) ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(
>> ChannelBufferStreamInput.java:132) ~[elasticsearch-1.2.2.jar:na]
>> at 
>> org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141)
>> ~[elasticsearch-1.2.2.jar:na]
>> at 
>> org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
>> ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.common.io.stream.HandlesStreamInput.
>> readString(HandlesStreamInput.java:61) ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.common.io.stream.StreamInput.
>> readStringArray(StreamInput.java:362) ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.action.admin.cluster.state.
>> ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
>> ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(
>> MessageChannelHandler.java:209) ~[elasticsearch-1.2.2.jar:na]
>> at org.elasticsearch.transport.netty.MessageChannelHandler.
>> messageReceived(MessageChannelHandler.java:109)
>> ~[elasticsearch-1.2.2.jar:na]
>>
>> I didn't find any mention of a breaking change that would explain this exception.
>>
>> On Friday, 4 July 2014 at 15:31:07 UTC+9, David Pilato wrote:
>>>
>>>  Well. It depends.
>>>
>>> 1.0 is incompatible with 0.90
>>> 1.2 should work with 1.x IIRC.
>>>
>>> From 1.0, we try to keep this compatible. If not, release notes will
>>> tell you.
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>>
>>> On 4 Jul 2014 at 07:09, xzer LR wrote:
>>>
>>> For various reasons, we have several separate Elasticsearch clusters for
>>> our front-end applications. We want to upgrade our clusters to the newest
>>> version, but it is apparently impossible to upgrade all the clusters at
>>> the same time, which means our single application has to connect to
>>> multiple clusters with different versions.
>>>
>>> My question is whether the elasticsearch java client has the ability to
>>> work correctly with an old version server?

Re: Kibana3 - terms panel with range facet chart

2014-07-11 Thread William Findley
Hey, I'm looking to do a nearly identical thing. I need to analyze the 
ranges of service times as recorded in my logs.  I found this post when I 
was googling for an answer.  Is there an easy way to do it just yet?
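For reference, the range facet request behind the doc linked in the quoted message would look roughly like this (field name `upload_size` taken from the original question; byte boundaries illustrative, showing only the first few buckets):

```json
{
  "query": { "match_all": {} },
  "facets": {
    "upload_sizes": {
      "range": {
        "field": "upload_size",
        "ranges": [
          { "to": 10240 },
          { "from": 10240, "to": 1048576 },
          { "from": 1048576 }
        ]
      }
    }
  }
}
```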

On Thursday, January 9, 2014 12:26:27 PM UTC-5, Erik Paulsson wrote:
>
> Hey all,
>
> I just started using ElasticSearch with LogStash and Kibana.  I'm able to 
> extract fields from my log statements using logstash/grok.  In Kibana I 
> have taken some of these fields and created stats panels using them for 
> stats like total/mean/min/max which works great for just seeing a 
> calculated number value quickly.
> What I would like to do next is create a bar chart that can display the 
> count of occurrences for my extracted field within different ranges.  So 
> say my field is called "upload_size", I would like to create a pie chart 
> that displays the count of files uploaded within defined ranges.
> For example I would like to see counts of "upload_size" fields with values 
> in these ranges: 0-10KB, 10KB-100KB, 100KB-1MB, 1MB-10MB, 10MB-100MB, 
> 100MB-1GB, 1GB+ and plotted in a pie chart.
> I've experimented with the "terms" panel creating a pie chart but don't 
> don't see a way to define ranges.  It seems this would be possible using 
> ElasticSearch "range facets": 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-range-facet.html
>
> Is it possible to do this currently in Kibana3?  If not, is this on the 
> roadmap?  I am using Kibana3 milestone 4.
>
> Thanks,
> Erik
>



Re: automatic ID generation (noob question)

2014-07-11 Thread rtm443x
Good point!

thanks

On Friday, July 11, 2014 7:46:52 PM UTC+1, Glen Smith wrote:
>
> Yeah, since the response is referring to your local file name, it's pretty 
> clear that the problem here is with curl.exe - it's obviously not sending 
> the file contents in your second example, it's sending the file name.
>
> On Friday, July 11, 2014 10:01:39 AM UTC-4, rtm...@googlemail.com wrote:
>>
>> Hi all, 
>> first post. Am working through the ElasticSearch Server book (packt). Not 
>> getting what I expect with automatic id gen.
>> Using similar example from your website [1] <
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html>
>>  
>>
>>
>> With a file go.txt (given below) I entered this
>>
>>   curl.exe -XPOST http://localhost:9200/twitter/tweet 
>> 
>>  
>> -T go.txt
>>
>> got back this
>>
>> {"_index":"twitter","_type":"
>> tweet","_id":"VJcZR33ETjC1W8LX1CnL0w","_version":1,"created":true}
>>
>> Ok, it works as expected, but this
>>
>>   curl.exe -XPOST http://localhost:9200/twitter/tweet/ 
>> 
>>  
>> -T go.txt
>>
>> failed:
>>
>>
>> {"_index":"twitter","_type":"tweet","_id":"go.txt","_version":6,"created":false}
>>
>> Only difference is the trailing forward slash on the URL.
>> FYI contents of go.txt is straight off your website [1]
>>
>> ---
>> {
>> "user" : "kimchy",
>> "post_date" : "2009-11-15T14:12:12",
>> "message" : "trying out Elasticsearch"
>> }
>> ---
>>
>> Even [1] shows a trailing forward slash used, and it apparently succeeds.
>>
>> Furthermore, only found this when it failed with a forwardslash (ie: 
>> curl.exe -XPOST http://localhost:9200/twitter/tweet/ 
>> 
>>  
>> -T go.txt), I added "?pretty" for readability (ie: curl.exe -XPOST 
>> http://localhost:9200/twitter/tweet/ 
>> ?pretty
>>  
>> -T go.txt) and it succeeded! Which I'm sure it shouldn't do.
>>
>> Machine is windows server 2008 R2 (64-bit), curl is 7.33.0, java is 
>> 1.7.0.55. Everything is being run locally, single node etc. Really basic. 
>> Used a file as embedding the contents in-line is a bit scrappy in windows.
>>
>> Need more info? can anyone reproduce?
>>
>> thanks
>>
>> jan
>>
>
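
(For anyone landing on this thread later: the behaviour described is curl's, not Elasticsearch's. With `-T`/`--upload-file`, curl appends the local file name to the URL whenever the URL ends in a slash. An illustrative sketch, using the same hypothetical go.txt:)

```
# URL ends in "/": curl appends the file name, so this becomes
# POST /twitter/tweet/go.txt and the document _id is "go.txt"
curl.exe -XPOST http://localhost:9200/twitter/tweet/ -T go.txt

# no trailing slash (or any suffix such as "?pretty"): nothing is
# appended, and Elasticsearch auto-generates the id
curl.exe -XPOST http://localhost:9200/twitter/tweet -T go.txt

# sending the body explicitly sidesteps -T's URL rewriting entirely
curl.exe -XPOST http://localhost:9200/twitter/tweet/ -d @go.txt
```

That also explains why adding "?pretty" made the slash variant "succeed": the URL no longer ended in "/", so no file name was appended.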

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/77c26830-5649-49b8-8c06-5ee732c8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Indexing files from filesystem

2014-07-11 Thread Ivan Brusic
Hopefully you accept pull requests faster than the core team. :)

-- 
Ivan


On Fri, Jul 11, 2014 at 11:55 AM, David Pilato  wrote:

> I love your plan Ivan! :-)
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 11 juil. 2014 à 20:36, Ivan Brusic  a écrit :
>
> Never used FSRiver, but from what I read, it should be exactly what you
> want. The code is open-sourced, so I would just check out the project,
> update the Elasticsearch version to 1.2.1 and find whatever bugs come up.
> Then submit a pull request and contribute back to the project. :)
>
> Cheers,
>
> Ivan
>
>
> On Fri, Jul 11, 2014 at 1:13 AM, Daniel Berretz 
> wrote:
>
>> I just had a look at their website and the youtube video of their own
>> presentation, and I read a bit about how it works in general.
>> Right now it just looks like I give it a file, C:\Apache\logs.txt, and it
>> works with that.
>> What I'm looking for is something I can point at, for example, our company's
>> drive, which has subfolders like marketing and projects, which in turn have
>> subfolders, and so on, and have it index into elasticsearch the path and name
>> of each file in those subfolders; and if a file is a word document or a pdf,
>> then also put its content into elasticsearch. That way we can search not only
>> file names and paths but also the file contents.
>> I wrote a small tool for this in Delphi (because we develop in
>> Delphi), but it uses some libs we want to get rid of, so we can use that
>> system in our product as well for indexing documents. Logstash doesn't look
>> like it is made for that.
>> So is there a plugin or something else which is able to do this?
>>
>>
>> On Friday, July 11, 2014 9:20:50 AM UTC+2, Mark Walkom wrote:
>>
>>> Check out Logstash, it'll do most of what you want.
>>> http://logstash.net/
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 11 July 2014 17:15, Dan Ber  wrote:
>>>
  Hey,

 I just wondered if it is somehow possible to index files from a
 directory on HDD and their contents if they are textfiles or word documents
 and maybe even PDFs.
 I read about FSRiver but could not test it because it seems not to be
 working with es 1.2.1 due to a bug.

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/1e250b52-19a3-48b8-b11d-687317160930%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/0911f657-f9ff-40ae-a6d0-437b23a6edb7%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBGKzVvTiUTsGForwjm-tMGDRWynPSBLcFM7wVFoVuGOQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7A605A6B-532D-496F-BECB-C0D0B190D495%40pilato.fr
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: How to improve search performance in ES ?

2014-07-11 Thread coder
Hi Jörg,

I have seen these links. I'm using the ngram tokenizer. The issue I'm facing 
is slow response time, and I need some suggestions on how to improve it. Is 
there any way I can structure the query better? Also, I'm using a match query 
on a field in one of my filters, but I have read that term filters are more 
efficient. Can you give me some insight into how I can use a term filter in 
this case, even though the field I want to filter on is not present in all 
the documents?

Thanks
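
Not an authoritative answer, but one way to sketch the combination being asked about: fold the match and multi_match into a single bool query, and guard the term filter with a missing filter for documents that lack the field. Field names and values here are assumptions for illustration:

```json
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "match":       { "name": "user input" } },
            { "multi_match": { "query": "user input",
                               "fields": ["name", "address", "city",
                                          "state", "country"] } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "or": [
                { "term":    { "city": "london" } },
                { "missing": { "field": "city" } }
            ] }
          ]
        }
      }
    }
  }
}
```

The bool filter's bitsets are cached, so repeated requests with the same filters reuse them; the or/missing pair matches documents where the field is simply absent.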

On Saturday, 12 July 2014 00:09:50 UTC+5:30, Jörg Prante wrote:
>
> For autocompletion, you should use the completion suggester 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
>
> or edge ngram tokenizer
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
>
> Jörg
>
>
> On Fri, Jul 11, 2014 at 8:11 PM, coder > 
> wrote:
>
>> Hi,
>>
>> I'm working on improving the search response of ES but not able to do 
>> anything. My scenario is something like this:
>>
>> I'm using 3 ES queries to get relevant results for my autocompleter.
>>
>> 1. A function score query with a match query  ( To get a correct match if 
>> user typed query is available in documents based on popularity)
>>
>> 2. A multi match query  (To handle those scenarios in which a user types 
>> some text which is present in different fields in a document since my 
>> documents are multi fields like name, address, city, state, country )
>>
>> 3. A query string (In order to ensure if I missed user query by the above 
>> type I'll be able to search using more powerful but less accurate query 
>> string)
>>
>> Along with all the 3 queries, I'm using 4 filters (clubbed using AND 
>> filter).
>>
>> My performance is really bad and I want to improve it along with 
>> delivering relevant results in my autocompleter.
>>
>> Can anyone help me how can I improve this ? Any way I can club the 
>> queries for better performance ? 
>>
>> I have read that bool filters should be used instead of the AND filter, 
>> since they use bitsets which are cached internally. I think this makes for 
>> one improvement, because if in the first query ES stores the filter 
>> information in a bitset, it can reuse it in the other two queries. That will 
>> make things a little faster, but beyond that I'm not able to make any 
>> improvement based on the queries alone.
>>
>> Is there any way by which I can combine match and multi-match queries ( 1 
>> and 2) into a single effective query.
>>
>> Also, in place of query_string should I use some other query for faster 
>> execution.
>>
>> Any suggestions are welcome. 
>> Thanks
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/5d99495b-20ef-46b6-a069-365574fdc0a9%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/741a7bc5-ffd7-4ba7-9296-ff6fff8f559f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Docker & Elasticsearch using Unicast

2014-07-11 Thread Tony P.
I've been playing with Elasticsearch and had a working cluster in a 
multicast environment using VMs. I recently tried adapting that to work 
within Docker and I'm running into a wall with the unicast configuration.

Current setup is two nodes: one on the host and another in a docker 
container (dockerfile/elasticsearch)

I'm running the container with:
$ docker run -d -h "elasticsearch-node-01" --name="elasticsearch-node-01" \
-p 9201:9200 -p 9301:9300 -v \
/etc/elasticsearch/cluster/:/data \
dockerfile/elasticsearch /elasticsearch/bin/elasticsearch \
-Des.config=/data/elasticsearch.yml

It spins up on the docker0 bridge with some IP, 172.17.0.xx
docker0 is bound to 172.17.42.1
127.0.0.1 refers to the host in this example

I can access the elasticsearch node in the container with any of the 
commands:
$ curl 127.0.0.1:9201
$ curl 172.17.42.1:9201
$ curl 172.17.0.xx:9200

But when I add it to a cluster via unicast, I see an exception thrown with 
"No route to host" being the reason.

After trying a few different IPs to see which could be accessed through the 
bridge, it seems that the container doesn't know how to speak to the host 
to join the host node's cluster or tell the host node that it has a cluster 
that can be joined. The host node's logs also show a similar error:

[2014-07-10 12:00:37,264][INFO][discovery.zen] [elasticsearch-node-test] 
failed to send join request to master 
[[elasticsearch-node-01][I3LiEOyeSome3djzr37uuQ][elasticsearch-node-01][inet[/172.17.0.14:9300]]],
 
reason [org.elasticsearch.transport.RemoteTransportException: 
[elasticsearch-node-01][inet[/172.17.0.14:9300]][discovery/zen/join]; 
org.elasticsearch.transport.ConnectTransportException: 
[elasticsearch-node-test][inet[/128.59.222.215:9300]] connect_timeout[30s]; 
java.net.NoRouteToHostException: No route to host]

I've found two conversations here about docker, elasticsearch and unicast, 
but neither provides an answer to my issue:
https://groups.google.com/d/msg/elasticsearch/OsGJcxuW1vI/qybPOrgE4fMJ
https://groups.google.com/d/msg/elasticsearch/2p9jXbCwRC8/mm4BPt5iQfgJ

Any ideas on what I'm doing wrong? Is it because elasticsearch is 
attempting to search based on the node's hostname (elasticsearch-node-01) 
instead of the IP address? The hostname, elasticsearch-node-01 isn't valid 
since it's just generated by passing it to docker. Should I use the IP as 
the hostname if that's what ES is using to add the node to the cluster?
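
Not a definitive fix, but the usual culprit in this setup is the publish address: the container-side node advertises its internal 172.17.0.x address, which the other node may not be able to route back to. A sketch of the relevant settings, with placeholder addresses:

```yaml
# elasticsearch.yml for the container node (addresses are placeholders)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["<host-ip>:9300"]
# advertise an address the other node can actually reach,
# not the container-internal 172.17.0.x one
network.publish_host: <host-ip-or-bridge-ip>
```

Note that mapping transport port 9300 to a different host port (9301 in the docker run above) also breaks the advertised address; mapping 9300:9300, or running the container with --net=host, sidesteps that.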

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d52a92e3-0956-48b6-8816-cea5ed31a412%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch Mapping Issue

2014-07-11 Thread shriyansh jain
Hey,

I am using the ELK stack for log processing: Logstash 1.4.2 (single instance) 
and Elasticsearch 1.2.2 (cluster of 2 nodes), with Redis as a broker between 
logstash and elasticsearch.

I am able to parse my logs using Logstash and see the parsed documents in 
Redis, but the documents never make it into Elasticsearch.

The following is my Central logstash server configuration file.

input {
  redis {
    host      => "xx.xx.xx.xx"
    type      => "redis-input"
    data_type => "list"
    key       => "logstash"
    threads   => 5
  }
  redis {
    host      => "xx.xx.xx.xx"
    type      => "redis-input"
    data_type => "list"
    key       => "logstash"
    threads   => 5
  }
}

filter {
  date {
    match => [ "timestamp_nsstats",
               "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[,]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?" ]
  }
}

output {
  stdout { }
  elasticsearch {
    cluster   => "logstash"
    host      => "xx.xx.xx.xx"
    protocol  => "http"
    node_name => "Node1"
    index     => "logstash-%{+YYYY.MM.dd}"
  }
}
The following is the health of Elasticsearch Instances.


{
  "cluster_name": "logstash",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 3,
  "active_shards": 6,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}

I am not able to figure out what is preventing logstash from reaching 
elasticsearch. Do I need to make any changes to the configuration file to 
make logstash aware of the elasticsearch server?

Please let me know what you think might be causing the issue. If you need 
any other information, please let me know.
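
A couple of checks that often narrow this down (using the same xx.xx.xx.xx placeholder as the config):

```
# is the logstash-* index ever created, and does it hold documents?
curl 'http://xx.xx.xx.xx:9200/_cat/indices?v'

# can the logstash machine reach elasticsearch over HTTP at all?
curl 'http://xx.xx.xx.xx:9200/'
```

If the index never appears, the problem is on the output side of logstash; with protocol => "http" the output talks to port 9200, so that port has to be reachable from the logstash host.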


Thank you,

Shriyansh

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1c586b5d-c53e-4899-b9cb-57ce0f57a262%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Indexing files from filesystem

2014-07-11 Thread David Pilato
I love your plan Ivan! :-)

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 11 juil. 2014 à 20:36, Ivan Brusic  a écrit :

Never used FSRiver, but from what I read, it should be exactly what you want. 
The code is open-sourced, so I would just check out the project, update the 
Elasticsearch version to 1.2.1 and find whatever bugs come up. Then submit a 
pull request and contribute back to the project. :)

Cheers,

Ivan


> On Fri, Jul 11, 2014 at 1:13 AM, Daniel Berretz  
> wrote:
> I just had a look at their website and the youtube video of their own 
> presentation, and I read a bit about how it works in general.
> Right now it just looks like I give it a file, C:\Apache\logs.txt, and it 
> works with that. 
> What I'm looking for is something I can point at, for example, our company's 
> drive, which has subfolders like marketing and projects, which in turn have 
> subfolders, and so on, and have it index into elasticsearch the path and name 
> of each file in those subfolders; and if a file is a word document or a pdf 
> then also put its content into elasticsearch. That way we can search not only 
> file names and paths but also the file contents.
> I wrote a small tool for this in Delphi (because we develop in Delphi), 
> but it uses some libs we want to get rid of, so we can use that system in our 
> product as well for indexing documents. Logstash doesn't look like it is made 
> for that. 
> So is there a plugin or something else which is able to do this?
> 
> 
>> On Friday, July 11, 2014 9:20:50 AM UTC+2, Mark Walkom wrote:
>> Check out Logstash, it'll do most of what you want.
>> http://logstash.net/
>> 
>> Regards,
>> Mark Walkom
>> 
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>> 
>> 
>>> On 11 July 2014 17:15, Dan Ber  wrote:
>>> Hey,
>>> 
>>> I just wondered if it is somehow possible to index files from a directory 
>>> on HDD and their contents if they are textfiles or word documents and maybe 
>>> even PDFs.
>>> I read about FSRiver but could not test it because it seems not to be 
>>> working with es 1.2.1 due to a bug.
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to elasticsearc...@googlegroups.com.
>>> 
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/1e250b52-19a3-48b8-b11d-687317160930%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/0911f657-f9ff-40ae-a6d0-437b23a6edb7%40googlegroups.com.
> 
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBGKzVvTiUTsGForwjm-tMGDRWynPSBLcFM7wVFoVuGOQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7A605A6B-532D-496F-BECB-C0D0B190D495%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


_all and compress fields

2014-07-11 Thread IronMike
I am looking for ways to reduce index size, so I turned on compression; it 
helped a little, but not much.

I did 2 more things that helped, but I'm not quite sure what their effects are:

- I turned off the _all field entirely. What is the downside of this?

- I turned off "store" for the "file" field, which holds text extracted from 
binary files like PDFs. Should "store" be ON for attachments if I need 
highlighting in my queries?
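
For what it's worth, a sketch of the mapping choices being discussed (type and field names are assumptions, following the mapper-attachments convention, where highlighting the extracted text generally requires the sub-field to be stored with term vectors, or _source left enabled):

```json
{
  "mappings": {
    "doc": {
      "_all": { "enabled": false },
      "properties": {
        "file": {
          "type": "attachment",
          "fields": {
            "file": {
              "store": "yes",
              "term_vector": "with_positions_offsets"
            }
          }
        }
      }
    }
  }
}
```

The main downside of disabling _all is that queries with no explicit field (e.g. a bare query_string search) can no longer match across all fields.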

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/51994e55-fba3-4968-b146-834f1ff97fba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: automatic ID generation (noob question)

2014-07-11 Thread Glen Smith
Yeah, since the response is referring to your local file name, it's pretty 
clear that the problem here is with curl.exe - it's obviously not sending 
the file contents in your second example, it's sending the file name.

On Friday, July 11, 2014 10:01:39 AM UTC-4, rtm...@googlemail.com wrote:
>
> Hi all, 
> first post. Am working through the ElasticSearch Server book (packt). Not 
> getting what I expect with automatic id gen.
> Using similar example from your website [1] <
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html>
>  
>
>
> With a file go.txt (given below) I entered this
>
>   curl.exe -XPOST http://localhost:9200/twitter/tweet 
> 
>  
> -T go.txt
>
> got back this
>
> {"_index":"twitter","_type":"
> tweet","_id":"VJcZR33ETjC1W8LX1CnL0w","_version":1,"created":true}
>
> Ok, it works as expected, but this
>
>   curl.exe -XPOST http://localhost:9200/twitter/tweet/ 
> 
>  
> -T go.txt
>
> failed:
>
>
> {"_index":"twitter","_type":"tweet","_id":"go.txt","_version":6,"created":false}
>
> Only difference is the trailing forward slash on the URL.
> FYI contents of go.txt is straight off your website [1]
>
> ---
> {
> "user" : "kimchy",
> "post_date" : "2009-11-15T14:12:12",
> "message" : "trying out Elasticsearch"
> }
> ---
>
> Even [1] shows a trailing forward slash used, and it apparently succeeds.
>
> Furthermore, only found this when it failed with a forwardslash (ie: 
> curl.exe -XPOST http://localhost:9200/twitter/tweet/ 
> 
>  
> -T go.txt), I added "?pretty" for readability (ie: curl.exe -XPOST 
> http://localhost:9200/twitter/tweet/ 
> ?pretty
>  
> -T go.txt) and it succeeded! Which I'm sure it shouldn't do.
>
> Machine is windows server 2008 R2 (64-bit), curl is 7.33.0, java is 
> 1.7.0.55. Everything is being run locally, single node etc. Really basic. 
> Used a file as embedding the contents in-line is a bit scrappy in windows.
>
> Need more info? can anyone reproduce?
>
> thanks
>
> jan
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1356e906-2355-4d62-a0cf-ab3e94da1993%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elastic search dynamic number of replicas from Java API

2014-07-11 Thread joergpra...@gmail.com
They work both ways.

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFLPD9NH%2BTf4aoHCTPdZ4EQ2CnDB75VaeZv%3D%3DV%2BtUUBVA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to improve search performance in ES ?

2014-07-11 Thread joergpra...@gmail.com
For autocompletion, you should use the completion suggester

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

or edge ngram tokenizer

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html

Jörg


On Fri, Jul 11, 2014 at 8:11 PM, coder  wrote:

> Hi,
>
> I'm working on improving the search response of ES but not able to do
> anything. My scenario is something like this:
>
> I'm using 3 ES queries to get relevant results for my autocompleter.
>
> 1. A function score query with a match query  ( To get a correct match if
> user typed query is available in documents based on popularity)
>
> 2. A multi match query  (To handle those scenarios in which a user types
> some text which is present in different fields in a document since my
> documents are multi fields like name, address, city, state, country )
>
> 3. A query string (In order to ensure if I missed user query by the above
> type I'll be able to search using more powerful but less accurate query
> string)
>
> Along with all the 3 queries, I'm using 4 filters (clubbed using AND
> filter).
>
> My performance is really bad and I want to improve it along with
> delivering relevant results in my autocompleter.
>
> Can anyone help me how can I improve this ? Any way I can club the queries
> for better performance ?
>
> I have read that bool filters should be used instead of the AND filter, since
> they use bitsets which are cached internally. I think this makes for one
> improvement, because if in the first query ES stores the filter information
> in a bitset, it can reuse it in the other two queries. That will make
> things a little faster, but beyond that I'm not able to make any improvement
> based on the queries alone.
>
> Is there any way by which I can combine match and multi-match queries ( 1
> and 2) into a single effective query.
>
> Also, in place of query_string should I use some other query for faster
> execution.
>
> Any suggestions are welcome.
> Thanks
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5d99495b-20ef-46b6-a069-365574fdc0a9%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEBchn0AKB_heFFjr%2B%3Df1X_CzfJBGFnFQf_rEpgAiHUvA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elastic search dynamic number of replicas from Java API

2014-07-11 Thread Gonçalo Luiz
Thanks for the clear and simple explanation.

However, will the cluster (with auto-expand replicas) ever go green if it
has been grown from 2 to 3 nodes (triggering the replica count to grow to two)
and is then downsized to two nodes again? In other words, does the auto-expand
replicas setting work both ways, or just upwards?

Thanks again.

G.
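
For the record, the setting under discussion can be applied (or changed) on a live index via the update-settings API; a sketch with a hypothetical index name:

```json
PUT /my_index/_settings
{
  "index": { "auto_expand_replicas": "0-all" }
}
```

With "0-all" the replica count tracks the node count in both directions, so shrinking back to two nodes reduces replicas again and the cluster can return to green — which matches Jörg's "They work both ways" reply.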
On 11 Jul 2014 12:11, "Glen Smith"  wrote:

> Hi Goncalo,
>
> I think it's important that you understand: multiple copies of a shard
> will never be located on the same node.
> Not two replicas, and not the primary and one replica.
> To witness this, run a server on your local machine, and create an index
> with the defaults - 5 shards, one replica.
> You will see that your cluster is "yellow", and has 5 unallocated shards.
>
> Hope that helps create a better mental picture of shard allocation.
>
>
> On Friday, July 11, 2014 2:00:47 AM UTC-4, Gonçalo Luiz wrote:
>>
>> Hi Ivan,
>>
>> Does this mean that if a node comes back while a replication is underway
>> we'll end up with one node holding 2 replicas and one node holding none?
>>
>> Scenario:
>>
>> Node A - Replica 2
>> Node B - Replica 3
>> Node C - Replica 1
>>
>> If node A dies and Node B get's Replica 2, as soon as node A (or a
>> replacement) is brought up, is the final configuration likely to be
>>
>> Node A (or replcament) - No replicas
>> Node B - Replicas 3 and 2
>> Node C - Replica 1
>>
>> or is there a re-balance that takes place ?
>>
>> Thanks,
>> Gonçalo
>>
>> Gonçalo Luiz
>>
>>
>> On 10 July 2014 22:11, Ivan Brusic  wrote:
>>
>>> It's only been around for 3.5 years: https://github.com/
>>> elasticsearch/elasticsearch/issues/623 :)
>>>
>>> I should clarify part of my previous statement.
>>>
>>> *"By default, the ongoing recovery is not cancelled when the missing
>>> node rejoins the cluster. You can change the gateway settings [2] to
>>> control when recovery kicks in."*
>>>
>>> What I meant to say is that an ongoing recovery is never cancelled once
>>> it has commenced, no matter what settings. By default, recovery happens
>>> immediately, but can be changed with the gateway settings.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Thu, Jul 10, 2014 at 1:48 PM, joerg...@gmail.com 
>>> wrote:
>>>
 Indeed,  auto_expand_replicas "all" triggers an update cluster
 settings action each time a node is added.

 Still blown by the many settings Elasticsearch provides. Feeling small.
 Homework: collecting a gist textfile of all ES 1.2 settings.

 Jörg


  On Thu, Jul 10, 2014 at 9:57 PM, Ivan Brusic  wrote:

>  Sticking to your use case, you might want to use
> the auto_expand_replicas setting to "all" [1]: Never used it, but it 
> sounds
> what you are looking for.
>
> By default, the ongoing recovery is not cancelled when the missing
> node rejoins the cluster. You can change the gateway settings [2] to
> control when recovery kicks in.
>
> [1] http://www.elasticsearch.org/guide/en/elasticsearch/
> reference/current/indices-update-settings.html
> [2] http://www.elasticsearch.org/guide/en/elasticsearch/
> reference/current/modules-gateway.html
>
> Cheers,
>
> Ivan
>
>
> On Thu, Jul 10, 2014 at 12:39 PM, Gonçalo Luiz 
> wrote:
>
>> I get it know.
>>
>> I agree that setting the number of replicas is connected to the
>> deployment reality in each case and it's derived variables and thus there
>> is no one formula to fit all cases (it would't be a setting in that 
>> case).
>>
>> What I was trying to cover was the theoretical / extreme case where
>> any node may fail at any time, and what the best way is to minimize
>> the chance of losing data. Also, scaling down the
>> installation (potentially down to one node) without having to worry about
>> selecting nodes that hold different replicated shards is an example that
>> can benefit from such a configuration.
>>
>> I'm however not clear yet on what happens when a node goes down
>> (triggering extra replication amongst the survivors) and then comes up
>> again. Is the ongoing replication cancelled and the returning node 
>> brought
>> up to date?
>>
>> Thanks for your valuable input.
>>
>> G.
>> On 10 Jul 2014 18:07, "joerg...@gmail.com" 
>> wrote:
>>
>>> All I say is that it depends on the probability of the event of
>>> three nodes failing simultaneously, not on the total number of nodes 
>>> having
>>> a replica. You can even have 5 nodes and the probability of the event 
>>> of 4
>>> nodes failing simultaneously, and so on.
>>>
>>> As an illustration, suppose you have a data center with two
>>> independent electric circuits and the probability of failure corresponds
>>> with power outage, then it is enough to distribute nodes equally over
>>> servers using the two independent power lines in th

Re: Indexing files from filesystem

2014-07-11 Thread Ivan Brusic
Never used FSRiver, but from what I read, it should be exactly what you
want. The code is open-sourced, so I would just check out the project,
update the Elasticsearch version to 1.2.1 and find whatever bugs come up.
Then submit a pull request and contribute back to the project. :)

Cheers,

Ivan


On Fri, Jul 11, 2014 at 1:13 AM, Daniel Berretz 
wrote:

> I just had a look at their website and the YouTube video of their own
> presentation, and I read a bit about how it works in general.
> To me, it looks like I just give it a file such as C:\Apache\logs.txt and it
> works with it.
> What I am looking for is something where I can, for example, say: check our
> company's drive, which has subfolders like marketing and projects that in
> turn have subfolders, and index into Elasticsearch the path and name of
> each file in each of those subfolders; and if a file is a Word document or a
> PDF, also put its content into Elasticsearch. That way we can search not
> only file names and paths but also file contents.
> I wrote a small tool for this in Delphi (because we develop in Delphi), but
> it uses some libraries we want to get rid of, so that we can use the system
> in our product for indexing documents as well. Logstash doesn't look like it
> is made for that.
> So is there a plugin or something else which is able to do so?
>
>
> On Friday, July 11, 2014 9:20:50 AM UTC+2, Mark Walkom wrote:
>
>> Check out Logstash, it'll do most of what you want.
>> http://logstash.net/
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 11 July 2014 17:15, Dan Ber  wrote:
>>
>>> Hey,
>>>
>>> I just wondered if it is somehow possible to index files from a
>>> directory on HDD, including their contents if they are text files, Word
>>> documents, or maybe even PDFs.
>>> I read about FSRiver but could not test it because it does not seem to
>>> work with ES 1.2.1 due to a bug.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/1e250b52-19a3-48b8-b11d-687317160930%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>



Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious whether there was a way of doing this without adding the 
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another 
property, say es.mapping.id.exclude, that keeps the id field out of the 
source document?  In Elasticsearch, you can create and update documents 
without having to include the id in the source document, so I think it would 
make sense to be able to do that with elasticsearch-hadoop as well.

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:
>
> You need to specify the id of the document you want to update somehow. 
> Since es-hadoop is batch focused, each doc needs its own id specified, 
> hence the use of 'es.mapping.id' to indicate its value. 
> Is there a reason why this approach does not work for you - any 
> alternatives that you have thought of? 
>
> Cheers, 
>
> On 7/7/14 10:48 PM, Brian Thomas wrote: 
> > I am trying to update an elasticsearch index using elasticsearch-hadoop. 
>  I am aware of the *es.mapping.id* 
> > configuration where you can specify that field in the document to use as 
> an id, but in my case the source document does 
> > not have the id (I used elasticsearch's autogenerated id when indexing 
> the document).  Is it possible to specify the id 
> > to update without having to add a new field to the MapWritable object? 
> > 
> > 
> -- 
> Costin 
>



Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious whether there was a way of doing this without adding the 
field; I can add it if necessary.

As an alternative, what if, in addition to es.mapping.id, there were another 
property, say es.mapping.id.include.in.src, to control whether the id field 
actually gets included in the source document?  In Elasticsearch, you can 
create and update documents without having to include the id in the source 
document, so I think it would make sense to be able to do that with 
elasticsearch-hadoop as well.

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:
>
> You need to specify the id of the document you want to update somehow. 
> Since es-hadoop is batch focused, each doc needs its own id specified, 
> hence the use of 'es.mapping.id' to indicate its value. 
> Is there a reason why this approach does not work for you - any 
> alternatives that you have thought of? 
>
> Cheers, 
>
> On 7/7/14 10:48 PM, Brian Thomas wrote: 
> > I am trying to update an elasticsearch index using elasticsearch-hadoop. 
>  I am aware of the *es.mapping.id* 
> > configuration where you can specify that field in the document to use as 
> an id, but in my case the source document does 
> > not have the id (I used elasticsearch's autogenerated id when indexing 
> the document).  Is it possible to specify the id 
> > to update without having to add a new field to the MapWritable object? 
> > 
> > 
> -- 
> Costin 
>



How to improve search performance in ES ?

2014-07-11 Thread coder
Hi,

I'm working on improving the search response time of ES but have not been 
able to make much progress. My scenario is as follows:

I'm using 3 ES queries to get relevant results for my autocompleter.

1. A function score query with a match query (to get a correct match if the 
user-typed query is present in documents, ranked by popularity).

2. A multi match query (to handle cases where the user types text that is 
present in different fields of a document, since my documents have multiple 
fields like name, address, city, state, country).

3. A query string query (to ensure that if the above two miss the user's 
query, I can still find it with the more powerful but less accurate 
query_string).

Along with all three queries, I'm using four filters (combined using an and 
filter).

My performance is really bad and I want to improve it while still delivering 
relevant results in my autocompleter.

Can anyone help me improve this? Is there any way I can combine the queries 
for better performance?

I have read that bool filters should be used instead of the and filter, 
since they use bitsets which are cached internally. I think this is one 
improvement, because if ES stores the filter bitsets during the first query, 
it can reuse them in the other two. That should make things a little faster, 
but beyond the filters I have not found a way to improve the queries.

Is there any way I can combine the match and multi-match queries (1 and 2) 
into a single, more effective query?

Also, should I use some other query in place of query_string for faster 
execution?

Any suggestions are welcome. 
Thanks
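
For reference, the combined shape I have in mind would be something like this 
(ES 1.x syntax; the field names and filter values are made up for 
illustration):

```json
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [
            { "match" : { "name" : { "query" : "user input", "boost" : 2 } } },
            { "multi_match" : {
                "query" : "user input",
                "fields" : ["name", "address", "city", "state", "country"]
            } }
          ],
          "minimum_should_match" : 1
        }
      },
      "filter" : {
        "bool" : {
          "must" : [
            { "term" : { "country" : "us" } },
            { "term" : { "active" : true } }
          ]
        }
      }
    }
  }
}
```

As I understand it, the bool filter's bitsets would then be cached and reused 
across requests, unlike the and filter.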




Re: How to add several name fields to an unmach definition in a mapping definition?

2014-07-11 Thread Ivan Brusic
Besides stop words, you can use a bool query where one clause is a match_all
and the other clause is a must_not with the terms in question.

Something like:
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "must_not": [
        {
          "terms": {
            "string_generic": [
              "Hash",
              "video_hash"
            ]
          }
        }
      ]
    }
  }
}

Lucene used to have a limitation where you could not execute a pure negative
query by itself. The trick was simply to use a match_all in conjunction with
the not query. I am not sure whether the limitation still exists (it probably
does), but I find it easier to read if you throw in the explicit match_all.
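
Coming back to the mapping side of the question: dynamic templates also 
accept "match_pattern" : "regex", so a single regex match might be able to 
exclude several names at once, something like (untested sketch, adapt the 
template name and field names):

```json
{
  "string_generic" : {
    "match_pattern" : "regex",
    "match" : "^(?!Hash$|video_hash$).*",
    "mapping" : {
      "type" : "string"
    }
  }
}
```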

Cheers,

Ivan



On Thu, Jul 10, 2014 at 7:37 PM, vineeth mohan 
wrote:

> Hi Joaquin ,
>
> I believe you don't want certain words to be indexed.
> If that is the case , add those words as stop words -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html
>
> Thanks
> Vineeth
>
>
> On Fri, Jul 11, 2014 at 3:12 AM, JB Mere  wrote:
>
>>
>> Hello
>>
>> Yes, sure, sorry for being quite unclear. Let me try to explain the issue
>> better:
>>
>> Imagine I'm defining the following mapping:
>> curl -XPUT 'http://localhost:9200/inteco5/inteco5/_mapping' -d '{
>>   "inteco5" : {
>>     "dynamic_templates" : [
>>       {
>>         "date_fields" : {
>>           "match" : "*Date*",
>>           "mapping" : {
>>             "type" : "date",
>>             "format" : "dateOptionalTime"
>>           }
>>         }
>>       },
>>       {
>>         "location" : {
>>           "match" : "asasec_loc",
>>           "mapping" : {
>>             "type" : "geo_point"
>>           }
>>         }
>>       },
>>       {
>>         "string_generic" : {
>>           "match" : "*",
>>           "unmatch" : "Hash",
>>           "mapping" : {
>>             "type" : "string"
>>           }
>>         }
>>       }
>>     ]
>>   }
>> }'
>>
>>
>> For sure it works, but what if I want to define, in the string part, that
>> every field shall be a string except those named Hash AND video_hash?
>>
>> Then, it should be like
>> "match" : "*",
>> "unmatch" : ["Hash", "video_hash"],
>> but it does not work at all.
>>
>> Hopefully It is now clearer.
>> Do you have any idea on how to specify it?
>>
>> Thanks a lot.
>>
>> Joaquin
>>
>>
>>
>>
>> El jueves, 10 de julio de 2014 19:47:01 UTC+2, vineeth mohan escribió:
>>>
>>> Hello Joaquin ,
>>>
>>> Your question is not very clear.
>>> Can you brief it once more.
>>>
>>> Thanks
>>>Vineeth
>>>
>>>
>>> On Thu, Jul 10, 2014 at 9:58 PM, JB Mere  wrote:
>>>
  Hello to everybody and sorry for this, maybe naive question.

 I'm defining a mapping for an ES 1.X index database

 My interest should be to do like:
 ...
 "unmatch" : ["Hash","video_hash"],
 "match" : "*",
 ...

 but unfortunately it doesn't work?

 Can you figure out how to get it done ?

 Thanks in advance

 Joaquin


>>>
>>
>
>


Sorting on Parent/Child attributes

2014-07-11 Thread Brian Rook
Hello,

I'm looking for a solution to a problem I'm having. Let's say I have two 
types, Person and Pet, in an index called customers.
Person
-account
-firstname
-lastname
-SSN

Pet
-name
-type
-id
-account

I would like to query/filter on fields in both person and pet in order to 
retrieve people and their associated pet.  Additionally, I need to sort on 
a field that could be in either person or pet.

For example, retrieve all people/pets that have wildcard person.firstname 
'*ave*' and pet.type wildcard '*terrier*' and sort on pet.name.  Or 
wildcard search on person.SSN = '*55*' and pet.name='*mister*' and sort on 
person.lastname.

I currently have a solution where I search/sort on person or pet depending 
on the sort field I am using. I use has_child/has_parent to handle the 
fields that are on the 'other' type, then use an id field to retrieve the 
entities of the other type. So if I sort on person.firstname, I query on 
person with a child (pet) clause, sort on person.firstname, then use the 
accounts to retrieve the pets (by account) in another query. This is not 
ideal: it is ugly, and I suspect it will be difficult to maintain if this 
query's requirements change in the future.

I suspect that I can do a query at the 'customers' level and use 'type' 
queries on the fields that I need for person and pet, similar to this:
http://joelabrahamsson.com/grouping-in-elasticsearch-using-child-documents/

However, I'm not sure how I would implement the sort. I suspect that I 
could use a custom scoring script, but I am not sure how I would score text 
fields.


Any thoughts?
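
For reference, the first step of my current two-query approach looks roughly 
like this (ES 1.x syntax, field names simplified); the accounts from the 
hits then drive the second query for the pets:

```json
{
  "query" : {
    "filtered" : {
      "query" : { "wildcard" : { "firstname" : "*ave*" } },
      "filter" : {
        "has_child" : {
          "type" : "pet",
          "query" : { "wildcard" : { "type" : "*terrier*" } }
        }
      }
    }
  },
  "sort" : [ { "lastname" : { "order" : "asc" } } ]
}
```

The sort here can only reference parent fields, which is exactly the 
limitation I would like to get around.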



Re: Attachment field type parsed properly but cant see metadata in iformation ...

2014-07-11 Thread David Pilato
1) 
As you are a Java dev, I'd recommend using Tika directly in your code to 
extract the data you need and produce JSON that matches your needs exactly.
Something like this: 
https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L688-L695

That way, you won't need to send a full binary doc to elasticsearch just to 
index some metadata or raw text.

That said, you could look at Source exclude: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude
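
For example, something along these lines (the type name here is a 
placeholder), so the base64 "file" field is indexed but not stored back in 
_source:

```json
{
  "doc" : {
    "_source" : {
      "excludes" : [ "file" ]
    },
    "properties" : {
      "file" : { "type" : "attachment" }
    }
  }
}
```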

2)
The mapper attachment plugin never modifies the source document.
But if you ask for stored fields at search time, in addition to the default 
"_source" field, you should get your values back.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#search-request-fields

HTH

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 11 juillet 2014 à 15:53:54, David Marko (dmarko...@gmail.com) a écrit:

I'm uploading attachments to be parsed in ES using the Java API. I have ES 1.2.2 
with the elasticsearch-mapper-attachments plugin properly installed. The code 
works fine and I can search by attachment content, but ...

1. The file content is stored in Elasticsearch. Is there a way to avoid 
this? I.e. to index the content but not store it?

I have this mapping code (not full code):

XContentBuilder map = jsonBuilder().startObject()
        .startObject(idxType)
            .startObject("properties")
                .startObject("file")
                    .field("type", "attachment")
                    .field("store", "no")
                .endObject()
            .endObject()
        .endObject();

and indexing using this:

BytesReference json = jsonBuilder()
        .startObject()
            .field("_id", filePath)
            .field("file", data64)
        .endObject().bytes();

IndexResponse idxResp = client.prepareIndex()
        .setIndex(idxName).setType(idxType).setId(filePath)

2) I can't see the file metadata created as described in the docs. I 
understand that they are (or should be) created automatically?

The docs say these fields should appear ...

"fields" : {
    "file"           : { "index" : "no" },
    "title"          : { "store" : "yes" },
    "date"           : { "store" : "yes" },
    "author"         : { "analyzer" : "myAnalyzer" },
    "keywords"       : { "store" : "yes" },
    "content_type"   : { "store" : "yes" },
    "content_length" : { "store" : "yes" },
    "language"       : { "store" : "yes" }
}



Re: automatic ID generation (noob question)

2014-07-11 Thread David Pilato
My advice: don't use curl.exe on Windows.
You should take a look at Marvel, which comes with Sense.

It's free for development: http://www.elasticsearch.org/overview/marvel/

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 11 juillet 2014 à 16:01:43, rtm4...@googlemail.com (rtm4...@googlemail.com) 
a écrit:

Hi all,
first post. I am working through the Elasticsearch Server book (Packt) and am 
not getting what I expect with automatic id generation.
I am using a similar example from your website [1].


With a file go.txt (given below) I entered this

  curl.exe -XPOST http://localhost:9200/twitter/tweet -T go.txt

got back this

{"_index":"twitter","_type":"tweet","_id":"VJcZR33ETjC1W8LX1CnL0w","_version":1,"created":true}

Ok, it works as expected, but this

  curl.exe -XPOST http://localhost:9200/twitter/tweet/ -T go.txt

failed:

{"_index":"twitter","_type":"tweet","_id":"go.txt","_version":6,"created":false}

The only difference is the trailing slash on the URL.
FYI contents of go.txt is straight off your website [1]

---
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
---

Even [1] shows a trailing forward slash used, and it apparently succeeds.

Furthermore, after it failed with the trailing slash (i.e.: curl.exe 
-XPOST http://localhost:9200/twitter/tweet/ -T go.txt), I added "?pretty" for 
readability (i.e.: curl.exe -XPOST http://localhost:9200/twitter/tweet/?pretty -T 
go.txt) and it succeeded! Which I'm sure it shouldn't.

Machine is Windows Server 2008 R2 (64-bit), curl is 7.33.0, Java is 1.7.0_55. 
Everything is run locally, single node etc. Really basic. I used a file because 
embedding the contents inline is a bit scrappy on Windows.

Need more info? can anyone reproduce?

thanks

jan



elasticsearch heapdump using jmap

2014-07-11 Thread NC
Over the last few days I have noticed more GC in my ES cluster and was 
trying to take a look at the heap. I'm running ES under the service wrapper 
on EC2, with the Amazon Linux AMI.

I always get this message with jmap -J-d64 -histo:

"Unable to open socket file: target process not responding or HotSpot VM 
not loaded." 

I tried with the -F option - that works, but it wasn't able to list all 
objects, since it fails with a bad address error (I think with -F that is 
always a strong possibility).

If anyone has successfully taken a heap dump, help will be appreciated!

TIA
Nirmal



Re: Why does the TransportClient depend on Lucene core?

2014-07-11 Thread joergpra...@gmail.com
I'd like to release such a TransportClient, as part of a modularization of
the 1.2.1 (probably 1.3) codebase. But there is no ETA.

I plan to release three flavors of a modularized TransportClient, an ingest
client that can only write documents to a cluster (depending on Jackson
JSON and Google Guava), a search client that can submit queries (this will
require at least the Lucene query builder classes) and an admin client that
can also execute control commands.

Jörg


On Fri, Jul 11, 2014 at 5:30 PM, Ming Fang  wrote:

> Are there any plans to release a client only Maven artifact?
> It is very strange for a client application to include the entire
> Elasticsearch server.
>
>
> On Friday, June 27, 2014 8:38:20 AM UTC-4, David Pilato wrote:
>
>> The TransportClient is part of the full elasticsearch distribution.
>> We don't have another official client for that.
>>
>> If you really need to have another Lucene version in your class path, I'm
>> afraid you need to use HTTP REST Layer and not Transport layer.
>>
>> Maybe you should look at JEST in that case? http://www.
>> elasticsearch.org/guide/en/elasticsearch/client/
>> community/current/clients.html#community-java
>>
>> --
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet  | @elasticsearchfr
>> 
>>
>>
>> Le 27 juin 2014 à 13:32:55, Jeroen Reijn (j.r...@1hippo.com) a écrit:
>>
>>  Hi all,
>>
>> while trying to use the TransportClient to connect to a remote
>> elasticsearch cluster in a project that already contains Lucene libraries
>> it fails to create a client connection with the following error.
>>
>>
>>
>> [INFO] [talledLocalContainer] Jun 27, 2014 12:43:53 PM org.apache.catalina.core.StandardContext loadOnStartup
>> [INFO] [talledLocalContainer] SEVERE: Servlet /app threw load() exception
>> [INFO] [talledLocalContainer] at org.elasticsearch.Version.<clinit>(Version.java:119)
>> [INFO] [talledLocalContainer] at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:169)
>> [INFO] [talledLocalContainer] at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:125)
>>
>>
>> Caused by: java.lang.NoSuchFieldError: LUCENE_41
>>
>>
>>
>>
>> I guess this has to do with the assignment of
>>
>>  Version version = Version.CURRENT;
>>
>>
>> Is there a particular reason why the transport client depends on the
>> Lucene core library which contains the versions?
>>
>> Is there a way I can work around this besides not having the old lucene
>> libraries in my classpath?
>>
>> Kind regards,
>>
>> Jeroen
>>
>



Re: Why can't I do an "if" in my script?

2014-07-11 Thread Joffrey Hercule
I found the issue myself :)
The "else" branch is mandatory.

So now it works:

{
  "query" : {
    "match_all" : {}
  },
  "sort" : {
    "_script" : {
      "script" : "if(doc['users'].values.contains((long)1)){return 'foo'}else{return 'bar'}",
      "type" : "string",
      "order" : "desc"
    }
  }
}
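
For what it's worth, the same script can presumably also be written as a 
ternary expression, which guarantees a value on every path without an 
explicit else block:

```json
{
  "query" : {
    "match_all" : {}
  },
  "sort" : {
    "_script" : {
      "script" : "doc['users'].values.contains((long)1) ? 'foo' : 'bar'",
      "type" : "string",
      "order" : "desc"
    }
  }
}
```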

Le vendredi 11 juillet 2014 17:18:13 UTC+2, Joffrey Hercule a écrit :
>
> Hi all,
>
> i'm trying to test if a value is in an array but i don't know why my if 
> clause doesn't work...
>
> doc['users'] is an array of long values in my example. 
> So If i do : 
>
> {
>   "query" : {
>     "match_all" : {}
>   },
>   "sort" : {
>     "_script" : {
>       "script" : "doc['users'].values.contains((long)1)",
>       "type" : "string",
>       "order" : "desc"
>     }
>   }
> }
>
> I retrieve "true" or "false" in my result.
>
> But if i try :
>
> {
>   "query" : {
>     "match_all" : {}
>   },
>   "sort" : {
>     "_script" : {
>       "script" : "if(doc['users'].values.contains((long)1)){return 'foo'}",
>       "type" : "string",
>       "order" : "desc"
>     }
>   }
> }
>
> I got an exception "SearchPhaseExecutionException": 
> org.elasticsearch.index.fielddata.fieldcomparator.StringScriptDataComparator$InnerSource
> I tried with if(doc['users'].values.contains((long)1) == true) too but 
> same issue...
>
> Someone knows why ?
> Thanks in advance !
>



Re: Why does the TransportClient depend on Lucene core?

2014-07-11 Thread Ming Fang
Are there any plans to release a client only Maven artifact?
It is very strange for a client application to include the entire 
Elasticsearch server.

On Friday, June 27, 2014 8:38:20 AM UTC-4, David Pilato wrote:
>
> The TransportClient is part of the full elasticsearch distribution.
> We don't have another official client for that.
>
> If you really need to have another Lucene version in your class path, I'm 
> afraid you need to use HTTP REST Layer and not Transport layer.
>
> Maybe you should look at JEST in that case? 
> http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/clients.html#community-java
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> Le 27 juin 2014 à 13:32:55, Jeroen Reijn (j.r...@1hippo.com ) 
> a écrit:
>
>  Hi all,
>
> while trying to use the TransportClient to connect to a remote 
> elasticsearch cluster in a project that already contains Lucene libraries 
> it fails to create a client connection with the following error.
>
>  
>
> [INFO] [talledLocalContainer] Jun 27, 2014 12:43:53 PM org.apache.catalina.core.StandardContext loadOnStartup
> [INFO] [talledLocalContainer] SEVERE: Servlet /app threw load() exception
> [INFO] [talledLocalContainer] at org.elasticsearch.Version.<clinit>(Version.java:119)
> [INFO] [talledLocalContainer] at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:169)
> [INFO] [talledLocalContainer] at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:125)
>
>
> Caused by: java.lang.NoSuchFieldError: LUCENE_41
>
>  
>
>
> I guess this has to do with the assignment of 
>
>  Version version = Version.CURRENT;
>  
>
> Is there a particular reason why the transport client depends on the 
> Lucene core library which contains the versions?
>
> Is there a way I can work around this besides not having the old lucene 
> libraries in my classpath?
>
> Kind regards,
>
> Jeroen
>  --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/e29b3445-bef2-440c-b439-65a9a69a8291%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8f6d868c-69e4-4fcc-8424-13cb5e30eb78%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Why can't I do an "if" in my script?

2014-07-11 Thread Joffrey Hercule
Hi all,

I'm trying to test whether a value is in an array, but I don't know why my if 
clause doesn't work.

doc['users'] is an array of long values in my example.
So if I do:

{
  "query" : {
"match_all" : {}
},
"sort" : {
"_script" : {
  "script" : "doc['users'].values.contains((long)1)",
"type" : "string",
"order" : "desc"
}
}
}

I get back "true" or "false" in my results.

But if I try:

{
  "query" : {
"match_all" : {}
},
"sort" : {
"_script" : {
  "script" : "if(doc['users'].values.contains((long)1)){return 'foo'}",
"type" : "string",
"order" : "desc"
}
}
}

I get a "SearchPhaseExecutionException": 
org.elasticsearch.index.fielddata.fieldcomparator.StringScriptDataComparator$InnerSource
I tried if(doc['users'].values.contains((long)1) == true) too, but same 
issue.

Does anyone know why?
Thanks in advance!
Thanks in advance !
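One likely explanation (a guess from the stack trace, not a confirmed diagnosis): when the if branch is not taken, the script returns nothing, and the string sort comparator cannot handle a null value. Making the script yield a string on every path, for example with MVEL's ternary operator, sidesteps that. A sketch of the corrected request body:

```python
import json

# Sort script that yields a string for *every* document: the ternary
# guarantees a non-null value even when contains() is false.
search_body = {
    "query": {"match_all": {}},
    "sort": {
        "_script": {
            "script": "doc['users'].values.contains((long)1) ? 'foo' : 'bar'",
            "type": "string",
            "order": "desc",
        }
    },
}
payload = json.dumps(search_body)  # body to POST to /_search
```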

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bd1e6451-b1d3-4b30-be70-ce141409f9d0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: has_child query on an alias.

2014-07-11 Thread Paweł Krzaczkowski
Hi,

My bad ... someone deleted the aliases without my knowledge ... that's why it 
didn't work :) It works as it should ... the problem was with the alias 
definition :)

Pawel
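For anyone hitting the same symptom: it is easy to verify and (re)create such an alias. A sketch of the aliases API call that points index_year at several monthly indexes (assuming the 1.x _aliases endpoint; index names are the ones from the quoted message):

```python
import json

# One _aliases call adds index_year to each monthly index; GET /_aliases
# afterwards verifies what is actually defined on the cluster.
actions = {
    "actions": [
        {"add": {"index": idx, "alias": "index_year"}}
        for idx in ["index_1405", "index_1406", "index_1407"]
    ]
}
payload = json.dumps(actions)  # body to POST to /_aliases
```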

W dniu piątek, 11 lipca 2014 14:16:08 UTC+2 użytkownik Paweł Krzaczkowski 
napisał:
>
> Hi,
>
> I don't know if this is a bug or it should work like that ... but it seems 
> like a bug.
>
> Let's say we have monthly indexes
>
> ...
> index_1405
> index_1406
> index_1407
> ...
>
> All of them are assigned to an alias index_year - that is pointing to last 
> 12 indexes.
>
> We have two types
>  - question
>  - answer
>
> question is a parent to answer.
>
> Our query:
>
> {
>   "min_score": 1,
>   "query": {
> "function_score": {
>   "query": {
> "has_child": {
>   "query": {
> "match_all": {}
>   },
>   "score_mode" : "sum",
>   "type": "answer"
> }
>   }
> }
>   }
> }
>
> So sending this query to http://127.0.0.1:9200/index_year/_search returns 
> records only from the last index, index_1407:
>
>
> {
>   "took" : 2,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 1,
> "successful" : 1,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 3,
> "max_score" : 4.0,
> "hits" : [ ... ]
>   }
> }
>
> Sending this query to 
> http://127.0.0.1:9200/index_1405,index_1406,index_1407/_search returns 
> results from all indexes.
>
> {
>   "took" : 3,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 2,
> "successful" : 2,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 17,
> "max_score" : 4.0,
> "hits" : [  ]
>   }
> }
>
> As you can see, this one returns 17 records, from all indexes.
>
> So my question is ... is there some kind of limit on has_child query and 
> aliases ??
>
> Elasticsearch 1.2.1
>
> Best regards,
> Pawel
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f2da2e4-923f-4703-a5d9-7a3eca6cea63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch dynamic scripting vs static script - deployment

2014-07-11 Thread Alex S.V.
Hi,

We were also hacked on our staging server because of open ports :)
I find dynamic scripting flexible for applications, but static scripting 
causes a bunch of problems:

1. Do I have to deploy scripts into a special directory on the Elasticsearch 
node? We use Capistrano for web-app deployment, which is an easy procedure, 
but we would need to provide additional access to the Elasticsearch node's 
filesystem.
2. How do I manage script versions? Just append _v1, _v2, etc. suffixes to 
the filename?
3. Should I deploy to one node, or to each node? If I must deploy to each 
node, what happens if one node has a script and another doesn't?

Regards,
Alex
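On questions 2 and 3: to the best of my knowledge there is no built-in script versioning in 1.x, so a filename-suffix convention (an assumption on my part, not an official feature) works, and the file has to be present in config/scripts on every node that can execute the search; a node missing the file will fail its shards of the query. A sketch of a sort clause referencing such a file-based script by name:

```python
import json

# Hypothetical file script: config/scripts/users_contains_v2.mvel,
# deployed to EVERY node. Bump the suffix instead of editing in place.
SCRIPT_NAME = "users_contains_v2"

search_body = {
    "query": {"match_all": {}},
    "sort": {"_script": {"script": SCRIPT_NAME, "type": "string", "order": "desc"}},
}
payload = json.dumps(search_body)  # body to POST to /_search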


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/20681e2f-bb8b-4602-8b19-ed27b661a88b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How does "size" work under the hood?

2014-07-11 Thread Nikolas Everett
The first count query might not be needed because a normal query comes back
with a count anyway.

The size parameter ultimately translates into the size of a min-heap.
Worst-case space is just the size parameter times the number of shards.
Worst-case time is log(size) * the number of documents matched. If size is
much smaller than the number of documents matched, then the average case
tends to depend more on the number of documents matched than on the size,
because most matches aren't better than whatever is already collected in the
heap. At some point the heap has to be sorted, sent back to the one node,
merged, then fetched. The fetch may end up being slower than all of the
rest of it, and may use more memory if you are loading source.
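The shard-side collection described above can be sketched with a bounded min-heap (a toy illustration, not Elasticsearch's actual Lucene collector): each shard keeps only the best `size` hits, so space is O(size) per shard and each matched document costs at most O(log size):

```python
import heapq

def top_k(scored_docs, size):
    """Keep only the best `size` (score, doc_id) pairs, as a shard collector does."""
    heap = []  # min-heap: heap[0] is the worst hit currently kept
    for score, doc_id in scored_docs:
        if len(heap) < size:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))  # evict the current worst
    return sorted(heap, reverse=True)  # final sort before returning to the coordinator

# Per-shard top-3 out of many matches; the coordinating node then merges
# the per-shard results and fetches only the winning documents.
shard_hits = [(0.1, "a"), (2.5, "b"), (1.7, "c"), (0.3, "d"), (3.2, "e")]
top = top_k(shard_hits, 3)  # → [(3.2, 'e'), (2.5, 'b'), (1.7, 'c')]
```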

Nik


On Fri, Jul 11, 2014 at 10:32 AM, x0ne  wrote:

> I am trying to figure out the best way to issue my queries as to not flood
> the heap with data I may not care about. Before each query, I do a count
> search type to identify how many results I am potentially dealing with.
> When I specify a "size" in my search query, how exactly does that impact
> results and the heap? If I run a query that matches 50k documents and I am
> only interested in 25 (specified by size), are all 50K still loaded into
> memory? Is there a way to get just the top 25 results off the query match
> without loading all hits into memory or is that how size actually works?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9837a8e5-ddf6-4684-8fe0-dd6909bcee48%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3TJV-ZQ-wZMrfMR%2B9EQV_V48onfCCmqr9P64T5eusSyA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


How does "size" work under the hood?

2014-07-11 Thread x0ne
I am trying to figure out the best way to issue my queries as to not flood 
the heap with data I may not care about. Before each query, I do a count 
search type to identify how many results I am potentially dealing with. 
When I specify a "size" in my search query, how exactly does that impact 
results and the heap? If I run a query that matches 50k documents and I am 
only interested in 25 (specified by size), are all 50K still loaded into 
memory? Is there a way to get just the top 25 results off the query match 
without loading all hits into memory or is that how size actually works?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9837a8e5-ddf6-4684-8fe0-dd6909bcee48%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


has_child query on an alias.

2014-07-11 Thread Paweł Krzaczkowski
Hi,

I don't know if this is a bug or it should work like that ... but it seems 
like a bug.

Let's say we have monthly indexes

...
index_1405
index_1406
index_1407
...

All of them are assigned to an alias, index_year, that points to the last 
12 indexes.

We have two types
 - question
 - answer

question is a parent to answer.

Our query:

{
  "min_score": 1,
  "query": {
"function_score": {
  "query": {
"has_child": {
  "query": {
"match_all": {}
  },
  "score_mode" : "sum",
  "type": "answer"
}
  }
}
  }
}

So sending this query to http://127.0.0.1:9200/index_year/_search returns 
records only from the last index, index_1407:


{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
  },
  "hits" : {
"total" : 3,
"max_score" : 4.0,
"hits" : [ ... ]
  }
}

Sending this query to 
http://127.0.0.1:9200/index_1405,index_1406,index_1407/_search returns 
results from all indexes.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
  },
  "hits" : {
"total" : 17,
"max_score" : 4.0,
"hits" : [  ]
  }
}

As you can see, this one returns 17 records, from all indexes.

So my question is ... is there some kind of limit on has_child queries and 
aliases?

Elasticsearch 1.2.1

Best regards,
Pawel

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/77c62bf9-7fb5-49c0-8330-0e9c3770bf4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to translate “and or” where clause from sql query to elasticsearch filter

2014-07-11 Thread Artem Frolov
I've done what you asked.
Mapping and data-population GIST: 
https://gist.github.com/ArFeRR/3031c1ce8f95549ad86d

GIST with my search query: 
https://gist.github.com/ArFeRR/f69ebe24ddc543b7bffd

(It should return one record with resolution 1920x1080 and weight 2,9 kg: 
the notebook named "Lenovo IdeaPad Z710A".)
But it returns an empty result.

Please help me achieve the behaviour I need!

пятница, 11 июля 2014 г., 11:48:40 UTC+3 пользователь David Pilato написал:
>
> A full script would allow any user on the mailing list to recreate from 
> scratch your issue without the need of building a script by ourselves which 
> is really time consuming.
>
> So, basically a script should look like this:
>
> // Remove test data
> DELETE test
>
> // If needed, add your settings/mappings
> PUT test
> {
>   "settings": {},
>   "mappings": {}
> }
>
> // Index some data
> PUT test/doc/1
> {
>   "foo":"bar"
> }
>
> PUT test/doc/x?refresh
> {
>   "foo":"bar"
> }
>
> // Run the query
> GET test/doc/_search
> {
> }
>
> With that, we can definitely help you I think.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> Le 11 juillet 2014 à 10:44:01, Artem Frolov (kee...@gmail.com 
> ) a écrit:
>
>  Can you answer what's wrong with the gist, i've provided? I can't figure 
> out...
> I provide you all the data I have, related to this issue. Look:
> 1) The elasticsearch index, containing the products and its options, which 
> have to be filtered: https://gist.github.com/ArFeRR/de86b8b0a5f2bc7dfd86
> 2) The JSON query for the filtration: 
> https://gist.github.com/ArFeRR/e159ef1047122a617b88
> 3)The ELastica.io code to genereate the json above on PHP:
> https://gist.github.com/ArFeRR/cebb2bf54232069d817b#file-gistfile1-php
>
> (I've changed the data to make it more real)
>
> All the json works fine, i've tested it! Can't figure out what's wrong and 
> what I have to do to provide a "FULL working GIST" for you
>
>
>
> четверг, 10 июля 2014 г., 18:37:19 UTC+3 пользователь David Pilato 
> написал: 
>>
>>  Have a look at this page to see how you can build a full working GIST 
>> which could help us to reproduce your use case.
>>  
>>  When your GIST will be updated, please update this thread so I can look 
>> at it.
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet  | @elasticsearchfr 
>> 
>>  
>>
>> Le 10 juillet 2014 à 11:59:22, Artem Frolov (kee...@gmail.com) a écrit:
>>
>>  the GIST:
>>  https://gist.github.com/ArFeRR/630acb216b8d95168b73
>>
>> четверг, 10 июля 2014 г., 12:51:49 UTC+3 пользователь Artem Frolov 
>> написал: 
>>>
>>> here's my try to solve it:
>>>
>>> {
>>>"filtered":{
>>>   "filter":{
>>>  "nested":{
>>> "path":"productsOptionValues",
>>> "filter":{
>>>"and":[
>>>   {
>>>  "or":[
>>> {
>>>"and":[
>>>   {
>>>  "term":{
>>> "productsOptionValues.productOption"
>>> :"weight"
>>>  }
>>>   },
>>>   {
>>>  "term":{
>>> "productsOptionValues.value":
>>> "500 kg"
>>>  }
>>>   }
>>>]
>>> },
>>> {
>>>"and":[
>>>   {
>>>  "term":{
>>> "productsOptionValues.productOption"
>>> :"weight"
>>>  }
>>>   },
>>>   {
>>>  "term":{
>>> "productsOptionValues.value":"50kg"
>>>  }
>>>   }
>>>]
>>> }
>>>  ]
>>>   },
>>>   {
>>>  "or":[
>>> {
>>>"and":[
>>>   {
>>>  "term":{
>>> "productsOptionValues.productOption"
>>> :"magic"
>>>  }
>>>   },
>>>   {
>>>  "term":{
>>> "productsOptionValues.value":"no"
>>>  }
>>>   }
>>>]
>>>

automatic ID generation (noob question)

2014-07-11 Thread rtm443x
Hi all, 
first post. Am working through the ElasticSearch Server book (packt). Not 
getting what I expect with automatic id gen.
Using a similar example from your website [1], with a file go.txt (given 
below) I entered this:

  curl.exe -XPOST http://localhost:9200/twitter/tweet -T go.txt

got back this

{"_index":"twitter","_type":"
tweet","_id":"VJcZR33ETjC1W8LX1CnL0w","_version":1,"created":true}

Ok, it works as expected, but this

  curl.exe -XPOST http://localhost:9200/twitter/tweet/ -T go.txt

failed:

{"_index":"twitter","_type":"tweet","_id":"go.txt","_version":6,"created":false}

The only difference is the trailing slash on the URL.
FYI contents of go.txt is straight off your website [1]

---
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
---

Even [1] shows a trailing forward slash used, and it apparently succeeds.

Furthermore, having found that it failed with a trailing forward slash (i.e. 
curl.exe -XPOST http://localhost:9200/twitter/tweet/ -T go.txt), I added 
"?pretty" for readability (i.e. curl.exe -XPOST 
http://localhost:9200/twitter/tweet/?pretty -T go.txt) and it succeeded! 
Which I'm sure it shouldn't do.

Machine is windows server 2008 R2 (64-bit), curl is 7.33.0, java is 
1.7.0.55. Everything is being run locally, single node etc. Really basic. 
Used a file as embedding the contents in-line is a bit scrappy in windows.

Need more info? can anyone reproduce?

thanks

jan
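A plausible explanation (an assumption based on curl's documented -T behaviour, not verified against this exact setup): when the upload URL ends with a /, curl appends the local file name to it, so the failing command actually targets /twitter/tweet/go.txt, which matches the "_id":"go.txt" in the response. Appending ?pretty means the URL no longer ends in /, which would be why that variant works. A toy sketch of that URL handling:

```python
# Mimic (as I understand it) curl's URL handling for -T uploads:
# the file name is appended only when the URL ends with "/".
def curl_T_effective_url(url, filename):
    return url + filename if url.endswith("/") else url

assert curl_T_effective_url("http://localhost:9200/twitter/tweet", "go.txt") \
    == "http://localhost:9200/twitter/tweet"          # auto-id POST: works
assert curl_T_effective_url("http://localhost:9200/twitter/tweet/", "go.txt") \
    == "http://localhost:9200/twitter/tweet/go.txt"   # indexes with _id "go.txt"
assert curl_T_effective_url("http://localhost:9200/twitter/tweet/?pretty", "go.txt") \
    == "http://localhost:9200/twitter/tweet/?pretty"  # no trailing "/": works again
```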

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/038004c1-114e-455a-a6c8-5c1aff3a7cf2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Attachment field type parsed properly but can't see metadata information ...

2014-07-11 Thread David Marko
I'm uploading attachments to be parsed in ES using the Java API. I have ES 
1.2.2 with the proper elasticsearch-mapper-attachments plugin installed. The 
code works fine and I can search by attachment content, but ...

1. The file content is stored in Elasticsearch. Is there a way to avoid 
this, i.e. index the content but not store it?

I have this mapping code (not full code):

XContentBuilder map = jsonBuilder().startObject()
.startObject(idxType)
  .startObject("properties")
.startObject("file")
  .field("type", "attachment")
  .field("store","no")
.endObject()
  .endObject()
 .endObject();

and indexing by using this:

BytesReference json = jsonBuilder()
.startObject()
.field("_id", filePath)
 .field("file", data64)
.endObject().bytes();

IndexResponse idxResp = client.prepareIndex().setIndex(idxName).setType(
idxType).setId(filePath)

2. I can't see the file metadata created as described in the docs. I 
understand they are (or should be) created automatically?

The docs say these fields should appear ...

 "fields" : {
"file" : {"index" : "no"},
"title" : {"store" : "yes"},
"date" : {"store" : "yes"},
"author" : {"analyzer" : "myAnalyzer"},
"keywords" : {"store" : "yes"},
"content_type" : {"store" : "yes"},
"content_length" : {"store" : "yes"},
"language" : {"store" : "yes"}
   }
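On question 1: even with "store":"no", the original JSON (including the base64 content) is usually still kept in _source, which is the usual reason content appears "stored"; excluding the field from _source avoids that. On question 2: per the docs quoted above, the metadata sub-fields are declared under the attachment field's "fields". A sketch of the combined mapping as a plain dict (an assumption following the mapper-attachments docs, not tested against 1.2.2; the type name "doc" is hypothetical, standing in for idxType):

```python
import json

mapping = {
    "doc": {  # hypothetical type name (idxType in the Java code above)
        "_source": {"excludes": ["file"]},   # keep the base64 payload out of _source
        "properties": {
            "file": {
                "type": "attachment",
                "fields": {
                    "file": {"store": "no"},        # extracted text: indexed, not stored
                    "title": {"store": "yes"},
                    "date": {"store": "yes"},
                    "author": {"store": "yes"},
                    "content_type": {"store": "yes"},
                },
            }
        },
    }
}
payload = json.dumps(mapping)  # body for PUT /idxName/_mapping/doc
```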

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5a29d66f-99d8-48e4-b93c-7caf61b93214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Updating a nested object (or other solutions ?)

2014-07-11 Thread Aleks
Hi there,

I'm working for an e-commerce website and I want to visualize stock 
evolution over time.
This stock should be viewable grouped by different hierarchy levels.

Here is the mapping I did to represent the sale (= operation) hierarchy:

{
   "operations": {
  "mappings": {
 "operation": {
"properties": {
   "Sites": {
  "properties": {
 "siteid": {
"type": "long"
 },
 "siteuniversid": {
"type": "long"
 },
 "univers": {
"properties": {
   "id": {
  "type": "long"
   },
   "level": {
  "type": "long"
   },
   "name": {
  "type": "string"
   },
   "productFamilies": {
  "properties": {
 "id": {
"type": "long"
 },
 "name": {
"type": "string"
 },
 "references": {
"type": "nested",
"properties": {
   "name": {
  "type": "string"
   },
   "referenceid": {
  "type": "long"
   },
   "stockevol": {
  "properties": {
 "date": {
"type": "date",
 "format": "yyyy-MM-dd HH:mm:ss"
 },
 "initial": {
"type": "integer"
 },
 "real": {
"type": "integer"
 },
 "sold": {
"type": "integer"
 }
  }
   }
}
 }
  }
   },
   "univers": {
  "properties": {
 "id": {
"type": "long"
 },
 "level": {
"type": "long"
 },
 "name": {
"type": "string"
 },
 "productFamilies": {
"properties": {
   "id": {
  "type": "long"
   },
   "name": {
  "type": "string"
   },
   "references": {
  "type": "nested",
  "properties": {
 "name": {
"type": "string"
 },
 "referenceid": {
"type": "long"
 },
 "stockevol": {
"properties": {
   "date": {
  "type": "date",
   "format": "yyyy-MM-dd HH:mm:ss"
   },
   "initial": {
  "type": "integer"
   },
   "real": {
 

Re: What types of SSDs?

2014-07-11 Thread John Smith
Right now I have 4 boxes...

2x 32 cores 200GB RAM with RAID10 SATA1 + the Fusion IO

2x 24 cores 96GB RAM with RAID10 SAS but regular mechanical drives.

I only test them as pairs. So it's clusters of 2

On the surface, all searches seem to perform quite close to each other. Only 
when looking at the stats in HQ and Marvel is the true story told. For 
instance, most warnings with Fusion IO are yellow at best, while with the 
SAS RAID10 (regular SATA drives) they reach red.

I'm hoping I can get some regular SSDs to put in the SAS boxes and see if 
they fare better.




On Thursday, 10 July 2014 18:00:11 UTC-4, Jörg Prante wrote:
>
> Did you consider SSD with RAID0 (Linux, ext4, noatime) and SAS2 (6g/s) or 
> SAS3 (12g/s) controller?
>
> I have for personal use at home LSI SAS 2008 of 4x128g SSD RAID0 with 
> sustained 800 MB/s write and 950 MB/s read, on a commodity dual AMD C32 
> socket server mainboard. I do not test with JMeter but on this single node 
> hardware alone I observe 15k bulk index operations per second, and 
> scan/scroll over 45m docs takes less than 70 min.
>
> I'm waiting until SAS3 is affordable for me. For the future I have on my 
> list: LSI SAS 3008 HBA and SAS3 SSDs. For personal home use, Fusion IO is 
> too heavy for my wallet. Even for commercial purpose I do not consider it 
> as a cost effective solution.
>
> Just a note: if you want spend your money to accelerate ES, buy RAM. You 
> will get more performance than from drives. Reason is the lower latency. 
> Low latency will speed up applications like ES more than the fastest I/O 
> drive is able to. That reminds me that I'm waiting since ages for DDR4 
> RAM...
>
> Jörg
>
>
> On Thu, Jul 10, 2014 at 10:13 PM, John Smith  > wrote:
>
>> Using 1.2.1
>>
>> I know each system and functionality is different but just curious when 
>> people say buy SSDs for ES, what types of SSDs are they buying?
>>
>> Fortunately for me I had some Fusion IO cards to test with, but just 
>> wondering if it's worth the price and if I should look into off the shelf 
>> SSDs like Samsung EVOs using SAS instead of pure SATA.
>>
>> So far from my testing it seems that all search operation regardless of 
>> the drive type seem to return in the same amount of time. So I suppose 
>> caching is playing a huge part here.
>>
>> Though when looking at the HQ indexing stats like query time, fetch time, 
>> refresh time etc... The Fusion IO fares a bit better then regular SSDs 
>> using SATA.
>>
>> For instance refresh time for Fusion IO is 250ms while for regular SSDs 
>> (SATA NOT SAS, will test SAS when I get a chance) it's just above 1 second.
>> Even with fusion IO I do see some warnings on the index stats, but 
>> slightly better then regular SSDs
>>
>> Some strategies I picked for my indexes...
>> - New index per day, plus routing by "user"
>> - New index per day for monster users.
>>
>> Using JMeter to test...
>> - Achieved 3,500 index operations per second (Not bulk) avg document size 
>> 2,500 bytes (Fusion IO seemed to perform a bit better)
>> - Created a total of 25 indexes totaling over 100,000,000 documents 
>> anywhere between 3,000,000 to 5,000,000 documents per index.
>> - Scroll query to retrieve 15,000,000 documents out of the 100,000,000 
>> (all indexes) took 25 minutes regardless of drive type.
>>
>> P.s: I want to index 2,000,000,000 documents per year so about 4,000,000 
>> per day. So you can see why Fusion IO could be expensive :)
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/24928d08-6354-4661-8164-9ff665709285%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/13e20470-a38e-4d89-be98-5d6e26b0f0aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Marvel index.html is giving Blank page

2014-07-11 Thread David Pilato
It should work on a windows machine as well.
What was your error on windows ?

May be a firewall issue?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Le 11 juil. 2014 à 12:37, srinu konda  a écrit :

Hi David,

Thank you

Actually I was using a Windows machine. Now I have installed the Marvel 
plugin on the server, and from there I am able to connect.

Regards,
Srinivas Konda.


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2540f6ca-054d-4317-8365-6704e142a6d6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/F7765332-CC70-4781-B751-7E86CE188A8D%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Elastic search dynamic number of replicas from Java API

2014-07-11 Thread Glen Smith
Hi Goncalo,

I think it's important that you understand: multiple copies of a shard will 
never be located on the same node.
Not two replicas, and not the primary and one replica.
To witness this, run a single server on your local machine and create an 
index with the defaults: 5 shards, one replica.
You will see that your cluster is "yellow" and has 5 unassigned shards.

Hope that helps create a better mental picture of shard allocation.
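The arithmetic behind that yellow cluster is simple; a sketch (my own illustration, not an Elasticsearch API):

```python
def unassigned_replicas(shards, replicas, nodes):
    """Shard copies that cannot be placed, given that no two copies of
    the same shard may share a node."""
    copies_per_shard = 1 + replicas            # primary + replicas
    placeable = min(copies_per_shard, nodes)   # at most one copy per node
    return shards * (copies_per_shard - placeable)

assert unassigned_replicas(5, 1, 1) == 5   # default index on a single node: yellow
assert unassigned_replicas(5, 1, 2) == 0   # a second node joins: green
```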
 

On Friday, July 11, 2014 2:00:47 AM UTC-4, Gonçalo Luiz wrote:
>
> Hi Ivan,
>
> Does this mean that if a node comes back and a replication is underway 
> we'll end up with one node holding 2 replicas and 1 node holding none?
>
> Scenario:
>
> Node A - Replica 2
> Node B - Replica 3
> Node C - Replica 1
>
> If node A dies and Node B gets Replica 2, as soon as node A (or a 
> replacement) is brought up, is the final configuration likely to be
>
> Node A (or replcament) - No replicas
> Node B .- Replica 3 and 2 
> Node C - Replica 1
>
> or is there a re-balance that takes place ?
>
> Thanks,
> Gonçalo
>
> Gonçalo Luiz
>
>
> On 10 July 2014 22:11, Ivan Brusic > wrote:
>
>> It's only been around for 3.5 years: 
>> https://github.com/elasticsearch/elasticsearch/issues/623 :)
>>
>> I should clarify part of my previous statement.
>>
>> *"By default, the ongoing recovery is not cancelled when the missing node 
>> rejoins the cluster. You can change the gateway settings [2] to control 
>> when recovery kicks in."*
>>
>> What I meant to say is that an ongoing recovery is never cancelled once 
>> it has commenced, no matter what settings. By default, recovery happens 
>> immediately, but can be changed with the gateway settings.
>>
>> -- 
>> Ivan
>>
>>
>> On Thu, Jul 10, 2014 at 1:48 PM, joerg...@gmail.com  <
>> joerg...@gmail.com > wrote:
>>
>>> Indeed,  auto_expand_replicas "all" triggers an update cluster settings 
>>> action each time a node is added.
>>>
>>> Still blown by the many settings Elasticsearch provides. Feeling small. 
>>> Homework: collecting a gist textfile of all ES 1.2 settings.
>>>
>>> Jörg
>>>
>>>
>>>  On Thu, Jul 10, 2014 at 9:57 PM, Ivan Brusic >> > wrote:
>>>
  Sticking to your use case, you might want to use 
 the auto_expand_replicas setting to "all" [1]: Never used it, but it 
 sounds 
 what you are looking for.

 By default, the ongoing recovery is not cancelled when the missing node 
 rejoins the cluster. You can change the gateway settings [2] to control 
 when recovery kicks in.

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
 [2] 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html
  
 Cheers,

 Ivan


 On Thu, Jul 10, 2014 at 12:39 PM, Gonçalo Luiz >>> > wrote:

> I get it know.
>
> I agree that setting the number of replicas is connected to the 
> deployment reality in each case and it's derived variables and thus there 
> is no one formula to fit all cases (it would't be a setting in that case).
>
> What I was trying to cover was the theoretical / extreme case where 
> any node may fail at any time and what is the best way to go to minimize 
> the chance of losing data. Also, in the case you want to scale down the 
> installation (pottentially down to one node) without having to worry 
> about 
> selecting nodes that hold different replicated shards is an example that 
> can beneffit from such configuration.
>
> I'm however not clear yet on what happens when a node goes down 
> (triggering extra replication amongst the survivors) and then comes up 
> again. Is the ongoing replication cancelled and the returning node 
> brought 
> up to date?
>
> Thanks for your valuable input.
>
> G.
> On 10 Jul 2014 18:07, "joerg...@gmail.com " <
> joerg...@gmail.com > wrote:
>
>> All I say is that it depends on the probability of the event of three 
>> nodes failing simultaneously, not on the total number of nodes having a 
>> replica. You can even have 5 nodes and the probability of the event of 4 
>> nodes failing simultaneously, and so on.
>>
>> As an illustration, suppose you have a data center with two 
>> independent electric circuits and the probability of failure corresponds 
>> with power outage, then it is enough to distribute nodes equally over 
>> servers using the two independent power lines in the racks. If one 
>> electric 
>> circuit (plus UPS) fails, half of the nodes go down. With replica level 
>> 1, 
>> ES cluster will keep all the data. There is no need to set replica level 
>> equal to node number.
>>
>> Jörg
>>
>>
>> On Thu, Jul 10, 2014 at 8:55 AM, Gonçalo Luiz > > wrote:
>>
>>> Hi Joe,
>>>
>>> Thanks for your reply.
>>> On this thougth:
>>>
>>> "
>

How to "translate" the "where" clause from sql query to elasticsearch filter

2014-07-11 Thread Artem Frolov
 

Hello! I'm implementing the filter for an internet shop using 
Elasticsearch. I have an EAV model in my RDBMS.

I have a WHERE clause in my SQL query which has to be translated into an 
Elasticsearch bool filter. It works just fine in my RDBMS.

here's it is:

WHERE (option = "weight" AND value = "50kg")
   OR (option = "weight" AND value = "500kg")
  AND (option = "magic" AND value = "no")

I have written the AND filters for the inner ANDs of the query, but now I 
need to put them into the bool filter.

I tried this (using the Elastica PHP library):

$boolFilter = new \Elastica\Filter\Bool();
$boolFilter->addShould($innerFilterAnd1);
$boolFilter->addShould($innerFilterAnd2);
$boolFilter->addMust($innerFilterAnd3);

It returns nothing.

Please, help!
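A plausible cause (a guess, since the referenced gists aren't reproduced here): with nested EAV objects, each nested document holds a single option/value pair, so an and over two different options inside one nested filter can never match. Wrapping each option/value pair in its own nested filter and combining those at the top level avoids that. A sketch, reusing the productsOptionValues field names from the earlier thread:

```python
import json

def option_is(option, value):
    """One nested filter per option/value pair: both terms must match
    inside the SAME nested object."""
    return {
        "nested": {
            "path": "productsOptionValues",
            "filter": {"and": [
                {"term": {"productsOptionValues.productOption": option}},
                {"term": {"productsOptionValues.value": value}},
            ]},
        }
    }

# Mirrors the SQL: (weight=50kg OR weight=500kg) AND magic=no
bool_filter = {
    "bool": {
        "should": [option_is("weight", "50kg"), option_is("weight", "500kg")],
        "must": [option_is("magic", "no")],
    }
}
payload = json.dumps({"query": {"filtered": {"filter": bool_filter}}})
```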

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fb5db090-e087-4481-8d6d-3d1783543d85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Impossible to implement real custom boost query when the weight is in the child document?

2014-07-11 Thread cdebry
I found a workaround using rescore. It's not ideal, but with a large enough
window, it should yield good results. Here's your query again, rewritten
with a rescore.

GET index/document/_search
{
  "query": {
    "match": {
      "title": "basketball"
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "score_mode": "multiply",
      "rescore_query": {
        "has_child": {
          "type": "document_boost",
          "query": {
            "function_score": {
              "script_score": {
                "script": "doc['document_boost.popular_boost_recent'].value"
              }
            }
          }
        }
      }
    }
  }
}



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Impossible-to-implement-real-custom-boost-query-when-the-weight-is-in-the-child-document-tp4057206p4059644.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1405051569505-4059644.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


Apply sort on specific index

2014-07-11 Thread navneet.bits
Hi

I have two indexes on my setup - Index-A and Index-B. I want the results of
the query to be sorted. The field on which I want to sort (say
firstName.raw) is present in both the indexes.

My question: is there a way to sort results from only one of the indexes
(say Index-A)? I don't want to sort results from Index-B.

Side note 1: I wanted to do the same with filters as well, i.e. apply a
filter on a field only to results from one of the indexes. I was able to do
this using the indices filter - 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-indices-filter.html#query-dsl-indices-filter.
How can I do something similar for sorting?

Thanks for your help!

Regards
Navneet



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Apply-sort-on-specific-index-tp4059603.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1405012249438-4059603.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


Re: Impossible to implement real custom boost query when the weight is in the child document?

2014-07-11 Thread cdebry
I have the exact same issue except that I need to boost a child query based
on a value in the parent. Sadly, I went through the same exercise and came
to the same conclusions.

I agree that the last query is the correct approach. At first, I assumed
that the "has_child" filter was out of scope within the function; however, it
recognized the field name without throwing an error. The issue is that it's
not returning the field value, so it defaults to 1 and effectively doesn't
have any impact on the score.

This definitely seems to be a bug. Have you logged it?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Impossible-to-implement-real-custom-boost-query-when-the-weight-is-in-the-child-document-tp4057206p4059633.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1405035420852-4059633.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


Re: Marvel index.html is giving Blank page

2014-07-11 Thread srinu konda
Hi David,

Thank you

Actually I was using a Windows machine. Now I have installed the Marvel 
plugin on the server, and from there I am able to connect.

Regards,
Srinivas Konda.


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2540f6ca-054d-4317-8365-6704e142a6d6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can I use the java client of newer version to connect to a old version server?

2014-07-11 Thread xzer LR
I am using TransportClient, the following is how I retrieve the client 
instance:

Client client = new TransportClient(sb.build()).addTransportAddresses(esAddresses);

On Friday, July 11, 2014 at 6:51:26 PM UTC+9, David Pilato wrote:
>
> Are you using a TransportClient or NodeClient?
> If NodeClient, could you try with the TransportClient?
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> On July 11, 2014 at 11:14:59, xzer LR (xia...@gmail.com) wrote:
>
> As a test result, I got exceptions when I tried to use the newest 1.2.2 
> java client to connect to a 1.0.3 cluster: 
>
>  18:05:41.020 
> [elasticsearch[Slipstream][transport_client_worker][T#1]{New I/O worker 
> #1}] [INFO ] [] org.elasticsearch.client.transport[105] - [Slipstream] 
> failed to get local cluster state for 
> [#transport#-1][e-note][inet[/192.168.200.81:9300]], disconnecting...
> org.elasticsearch.transport.RemoteTransportException: 
> [server-cat][inet[/192.168.21.81:9300]][cluster/state]
> java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 48
> at 
> org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141) 
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
>  
> ~[elasticsearch-1.2.2.jar:na]
> at 
> org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
>  
> ~[elasticsearch-1.2.2.jar:na]
>
> I didn't find any mentioned breaking change related to this exception.
>
> On Friday, July 4, 2014 at 3:31:07 PM UTC+9, David Pilato wrote:
>>
>>  Well. It depends.
>>
>> 1.0 is incompatible with 0.90
>> 1.2 should work with 1.x IIRC.
>>
>> From 1.0, we try to keep this compatible. If not, release notes will tell 
>> you.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>  
>> On July 4, 2014 at 07:09, xzer LR wrote:
>>
>> For various reasons, we have several separate elasticsearch clusters for 
>> our front applications. We want to upgrade our clusters' version to the 
>> newest version but apparently it is impossible to upgrade all the clusters 
>> at the same time, which means our single application have to connect to 
>> multiple clusters with different versions.  
>>
>> My question is whether the elasticsearch java client has the ability to 
>> work correctly with an old version server?
>>  --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/baa98ec5-ffcf-46f9-bfdd-7afbd213b19d%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>  
>  --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/77e32825-812a-46c8-82b4-93a5e4b12788%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/df3afd7e-b0a5-4d26-a777-fc887427bbed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reading and writing the same document too fast --> data loss

2014-07-11 Thread Michael McCandless
Maybe you need to use versioning, to ensure the 3rd write doesn't undo
(overwrite) the changes of the 2nd write?

See
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/optimistic-concurrency-control.html

Mike McCandless

http://blog.mikemccandless.com
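The version-check-and-retry loop this suggests can be sketched with a toy in-memory store (the store is a stand-in made up here to show the control flow; real code would pass the version you read back to the index call and catch the version-conflict error from the client):

```python
import copy

class VersionConflict(Exception):
    """Raised when a write carries a stale version, like ES's version conflict."""

class ToyStore:
    """In-memory stand-in for an index with optimistic concurrency control."""
    def __init__(self):
        self.docs = {}  # doc_id -> (version, doc)

    def get(self, doc_id):
        version, doc = self.docs.get(doc_id, (0, None))
        return version, copy.deepcopy(doc)

    def index(self, doc_id, doc, version):
        current, _ = self.docs.get(doc_id, (0, None))
        if version != current:  # another writer got in first
            raise VersionConflict(f"expected version {current}, got {version}")
        self.docs[doc_id] = (current + 1, copy.deepcopy(doc))

def add_id(store, doc_id, new_id, retries=5):
    """Read-modify-write that retries when the write races another writer."""
    for _ in range(retries):
        version, doc = store.get(doc_id)
        doc = doc or {"ids": []}
        doc["ids"].append(new_id)
        try:
            store.index(doc_id, doc, version)
            return
        except VersionConflict:
            continue  # re-read at the new version and try again
    raise RuntimeError("gave up after repeated conflicts")

store = ToyStore()
for new_id in ("#1", "#2", "#3"):
    add_id(store, "hello-text", new_id)
version, doc = store.get("hello-text")
# doc["ids"] now holds all three IDs; a stale write would have raised instead
```

With the real client, the same loop re-fetches the document and retries whenever the index call reports a version conflict, so no concurrent update is silently overwritten.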


On Fri, Jul 11, 2014 at 6:24 AM, Peter Webber 
wrote:

>
> Hello,
>
>
> We store texts in Elasticsearch, where each text has an ID attached. Every
> day we run a batch job to add new documents. Sometimes a new document
> consists of a text that we already have in the database, but it has a
> different ID. In such a case we need to read the document that's already
> indexed and add the new ID to this existing document.
>
> Now consider the following scenario:
>
> DocumentA with text "Hello" and  ID #1 is indexed.
>
> We now add documentB with text "Hello" and ID #2
> To do this we find documentA which has the same text, read it, add ID #2
> and save it again.
>
> Then we want to add documentC with text "Hello" and ID #3
> To do this we find documentA which has the same text, read it, add ID #3
> and save it again.
>
> What do we get as a result? It's a bit unpredictable but quite often:
> DocumentA with text "Hello" and IDs #1 and #3. This means ID #2 is missing.
>
> It seems like the first write (with ID #2) has not been completed, when
> the second read is done.
>
>
> I guess we are not the first to encounter these issues. What are common
> strategies to deal with this?
>
>
> Regards
> Peter
>
>
>
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3da9cec9-e48d-4c55-b7ed-330235322c4f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD7smRe7nZ88G4GfkJK5-E5X-3yBtMhx9B705p8EzzHLqdcinQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Reading and writing the same document too fast --> data loss

2014-07-11 Thread Peter Webber

Hello,


We store texts in Elasticsearch, where each text has an ID attached. Every 
day we run a batch job to add new documents. Sometimes a new document 
consists of a text that we already have in the database, but it has a 
different ID. In such a case we need to read the document that's already 
indexed and add the new ID to this existing document.

Now consider the following scenario:

DocumentA with text "Hello" and  ID #1 is indexed.

We now add documentB with text "Hello" and ID #2
To do this we find documentA which has the same text, read it, add ID #2 
and save it again.

Then we want to add documentC with text "Hello" and ID #3
To do this we find documentA which has the same text, read it, add ID #3 
and save it again.

What do we get as a result? It's a bit unpredictable but quite often:
DocumentA with text "Hello" and IDs #1 and #3. This means ID #2 is missing.

It seems like the first write (with ID #2) has not been completed, when the 
second read is done.


I guess we are not the first to encounter these issues. What are common 
strategies to deal with this?


Regards
Peter






-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3da9cec9-e48d-4c55-b7ed-330235322c4f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread Mike Theairkit
Thanks for the answer!
I will test with these options.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/90d6b137-8d88-4e36-a876-c01203b7baed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can I use the java client of newer version to connect to a old version server?

2014-07-11 Thread David Pilato
Are you using a TransportClient or NodeClient?
If NodeClient, could you try with the TransportClient?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On July 11, 2014 at 11:14:59, xzer LR (xiao...@gmail.com) wrote:

As a test result, I got exceptions when I tried to use the newest 1.2.2 java 
client to connect to a 1.0.3 cluster:

18:05:41.020 [elasticsearch[Slipstream][transport_client_worker][T#1]{New I/O 
worker #1}] [INFO ] [] org.elasticsearch.client.transport[105] - [Slipstream] 
failed to get local cluster state for 
[#transport#-1][e-note][inet[/192.168.200.81:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: 
[server-cat][inet[/192.168.21.81:9300]][cluster/state]
java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 48
at 
org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141) 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272) 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
 ~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
 ~[elasticsearch-1.2.2.jar:na]

I didn't find any mentioned breaking change related to this exception.

On Friday, July 4, 2014 at 3:31:07 PM UTC+9, David Pilato wrote:
Well. It depends.

1.0 is incompatible with 0.90
1.2 should work with 1.x IIRC.

From 1.0, we try to keep this compatible. If not, release notes will tell you.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On July 4, 2014 at 07:09, xzer LR wrote:

For various reasons, we have several separate elasticsearch clusters for our 
front applications. We want to upgrade our clusters' version to the newest 
version but apparently it is impossible to upgrade all the clusters at the same 
time, which means our single application have to connect to multiple clusters 
with different versions. 

My question is whether the elasticsearch java client has the ability to work 
correctly with an old version server?
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/baa98ec5-ffcf-46f9-bfdd-7afbd213b19d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/77e32825-812a-46c8-82b4-93a5e4b12788%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53bfb38f.1190cde7.70e%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread David Pilato
Some comments:

Unrelated to your "issue": I would not boost by 1. It's too much in my 
opinion and could potentially have the opposite effect to what you are 
expecting, IIRC. Boosting by 1.5, 2 and 3 should be enough.
I was expecting more OR queries but it looks like you have a reasonable list of 
terms here.

May be you could try some options from here:

- If only the first queries are slow, try to use warmers so the first query a 
user will send will get a better response time: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html#indices-warmers
- Remove highlighting and see how it performs
- Remove aggs and see how it performs

Once you have the culprit, you can try to focus on that.

HTH

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On July 11, 2014 at 10:50:44, Mike Theairkit (theair...@gmail.com) wrote:

Typical query: https://gist.github.com/anonymous/20fc650ca2ada3928b0b
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/81aaa013-e4a7-4377-9b34-bdc158c49835%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53bfb156.4db127f8.70e%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Lumberjack losing grokked fields??

2014-07-11 Thread Siddharth Trikha
I am using logstash 1.4.1 on my client machine (where logs are present) and 
server machine (where logstash parses events).

On *client* I read logs:

input {
  file {
    path => "/root/Desktop/Logstash-Input/**/*_log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => ["path",
      "/root/Desktop/Logstash-Input/(?<server>[^/]+)/(?<logtype>[^/]+)/(?<logdate>[\d]+.[\d]+.[\d]+)/(?<logfilename>.*)_log"]
  }
}

output {
  lumberjack {
    hosts => ["192.168.105.71"]
    port => 4545
    ssl_certificate => "./logstash.pub"
  }

  stdout { codec => rubydebug }
}
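As a sanity check of the path parsing: grok's named captures boil down to a named-group regex, and the group names below are inferred from the fields that show up in the rubydebug output (server, logtype, logdate, logfilename):

```python
import re

# Named-group equivalent of the grok pattern applied to "path".
path_re = re.compile(
    r"/root/Desktop/Logstash-Input/"
    r"(?P<server>[^/]+)/"
    r"(?P<logtype>[^/]+)/"
    r"(?P<logdate>\d+\.\d+\.\d+)/"
    r"(?P<logfilename>.*)_log"
)

m = path_re.match("/root/Desktop/Logstash-Input/Server2/CronLog/2014.05.31/cron_log")
fields = m.groupdict()
# -> {'server': 'Server2', 'logtype': 'CronLog', 'logdate': '2014.05.31', 'logfilename': 'cron'}
```

So the client-side extraction itself works; the console output that follows confirms the fields are present before the events hit the lumberjack output.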

*Console:*

filter received {:event=>{"message"=>"2014-05-26T00:00:01+05:30 bxas1 crond[268]: (rt) CMD (2014/05/31/server2/cron/log)", "@version"=>"1", "@timestamp"=>"2014-07-11T09:14:28.740Z", "host"=>"cmd", "path"=>"/root/Desktop/Logstash-Input/Server2/CronLog/2014.05.31/cron_log"}, :level=>:debug, :file=>"(eval)", :line=>"18"}
{
        "message" => "2014-05-26T00:00:01+05:30 bxas1 crond[268]: (rt) CMD (2014/05/31/server2/cron/log)",
       "@version" => "1",
     "@timestamp" => "2014-07-11T09:14:28.735Z",
           "host" => "cmd",
           "path" => "/root/Desktop/Logstash-Input/Server2/CronLog/2014.05.31/cron_log",
         "server" => "Server2",
        "logtype" => "CronLog",
        "logdate" => "2014.05.31",
    "logfilename" => "cron"
}
*On server:*

input {
  lumberjack {
    port => 4545
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
    codec => "json"
  }
}

filter {
  if [server] == "Server2" and [logtype] == "CronLog" {
    grok {
      match => ["message", "Pattern.."]
      add_tag => "server2-cronlog"
    }
  } else if [server] == "Server2" and [logtype] == "AuthLog" {
    grok {
      match => ["message", "...Pattern.."]
    }
  }
}


*Server-Console:*

filter received {:event=>{"message"=>"2014-07-11T09:29:59.730+ cmd 2014-05-26T00:00:01+05:30 bx920as1 crond[268]: (rorit) CMD (2014/05/31/server2/cron/log)", "@version"=>"1", "@timestamp"=>"2014-07-11T09:30:41.772Z"}, :level=>:debug, :file=>"(eval)", :line=>"30"}
output received {:event=>{"message"=>"2014-07-11T09:29:59.730+ cmd 2014-05-26T00:00:01+05:30 bx920as1 crond[26388]: (rorit) CMD (2014/05/31/server2/cron/log)", "@version"=>"1", "@timestamp"=>"2014-07-11T09:30:41.772Z"}, :level=>:debug, :file=>"(eval)", :line=>"100"}
{
       "message" => "2014-07-11T09:29:59.730+ cmd 2014-05-26T00:00:01+05:30 bx920as1 crond[268]: (rorit) CMD (2014/05/31/server2/cron/log)",
      "@version" => "1",
    "@timestamp" => "2014-07-11T09:30:41.772Z"
}
So, as one can see, the fields grokked on the client machine are lost after 
shipping via lumberjack. Is this a bug?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/136d308e-3f02-42cd-bf8d-0bcd9d665e74%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can I use the java client of newer version to connect to a old version server?

2014-07-11 Thread xzer LR
As a test result, I got exceptions when I tried to use the newest 1.2.2 
java client to connect to a 1.0.3 cluster:

18:05:41.020 [elasticsearch[Slipstream][transport_client_worker][T#1]{New 
I/O worker #1}] [INFO ] [] org.elasticsearch.client.transport[105] - 
[Slipstream] failed to get local cluster state for 
[#transport#-1][e-note][inet[/192.168.200.81:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: 
[server-cat][inet[/192.168.21.81:9300]][cluster/state]
java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 48
at 
org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141) 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272) 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
 
~[elasticsearch-1.2.2.jar:na]
at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
 
~[elasticsearch-1.2.2.jar:na]

I didn't find any mentioned breaking change related to this exception.

On Friday, July 4, 2014 at 3:31:07 PM UTC+9, David Pilato wrote:
>
> Well. It depends.
>
> 1.0 is incompatible with 0.90
> 1.2 should work with 1.x IIRC.
>
> From 1.0, we try to keep this compatible. If not, release notes will tell 
> you.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On July 4, 2014 at 07:09, xzer LR wrote:
>
> For various reasons, we have several separate elasticsearch clusters for our 
> front applications. We want to upgrade our clusters' version to the newest 
> version but apparently it is impossible to upgrade all the clusters at the 
> same time, which means our single application have to connect to multiple 
> clusters with different versions. 
>
> My question is whether the elasticsearch java client has the ability to 
> work correctly with an old version server?
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/baa98ec5-ffcf-46f9-bfdd-7afbd213b19d%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/77e32825-812a-46c8-82b4-93a5e4b12788%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread Mike Theairkit
Typical query: https://gist.github.com/anonymous/20fc650ca2ada3928b0b

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/81aaa013-e4a7-4377-9b34-bdc158c49835%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to translate “and or” where clause from sql query to elasticsearch filter

2014-07-11 Thread David Pilato
A full script would allow anyone on the mailing list to recreate your issue 
from scratch, without having to build a script ourselves, which is really 
time-consuming.

So, basically a script should look like this:

// Remove test data
DELETE test

// If needed, add your settings/mappings
PUT test
{
  "settings": {},
  "mappings": {}
}

// Index some data
PUT test/doc/1
{
  "foo":"bar"
}

PUT test/doc/x?refresh
{
  "foo":"bar"
}

// Run the query
GET test/doc/_search
{
}

With that, we can definitely help you I think.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On July 11, 2014 at 10:44:01, Artem Frolov (kee...@gmail.com) wrote:

Can you tell me what's wrong with the gist I've provided? I can't figure it out...
I've provided all the data I have related to this issue. Look:
1) The elasticsearch index, containing the products and their options, which 
have to be filtered: https://gist.github.com/ArFeRR/de86b8b0a5f2bc7dfd86
2) The JSON query for the filtering: 
https://gist.github.com/ArFeRR/e159ef1047122a617b88
3) The Elastica code to generate the JSON above in PHP:
https://gist.github.com/ArFeRR/cebb2bf54232069d817b#file-gistfile1-php

(I've changed the data to make it more realistic.)

All the JSON works fine, I've tested it! I can't figure out what's wrong and 
what I have to do to provide a "FULL working GIST" for you.



On Thursday, July 10, 2014 at 18:37:19 UTC+3, David Pilato wrote:
Have a look at this page to see how you can build a full working GIST which 
could help us to reproduce your use case.

When your GIST will be updated, please update this thread so I can look at it.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On July 10, 2014 at 11:59:22, Artem Frolov (kee...@gmail.com) wrote:

the GIST:
https://gist.github.com/ArFeRR/630acb216b8d95168b73

On Thursday, July 10, 2014 at 12:51:49 UTC+3, Artem Frolov wrote:
here's my try to solve it:

{
  "filtered": {
    "filter": {
      "nested": {
        "path": "productsOptionValues",
        "filter": {
          "and": [
            {
              "or": [
                {
                  "and": [
                    { "term": { "productsOptionValues.productOption": "weight" } },
                    { "term": { "productsOptionValues.value": "500 kg" } }
                  ]
                },
                {
                  "and": [
                    { "term": { "productsOptionValues.productOption": "weight" } },
                    { "term": { "productsOptionValues.value": "50kg" } }
                  ]
                }
              ]
            },
            {
              "or": [
                {
                  "and": [
                    { "term": { "productsOptionValues.productOption": "magic" } },
                    { "term": { "productsOptionValues.value": "no" } }
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  }
}

but it's an equivalent of:

 WHERE ((
option = "weight" AND value = "50kg"
)
OR (
option = "weight" AND value = "500kg"
))
AND (
option = "magic" AND value = "no"
)

it's the wrong tree... I need the AND/OR logic to be on one branch of the JSON 
tree. I don't know if it's possible. Please help to translate the logic from the 
query's WHERE condition!


On Thursday, July 10, 2014 at 12:23:05 UTC+3, David Pilato wrote:
It could help if you could gist a full SENSE/curl script recreation

Best

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On July 10, 2014 at 11:15, Artem Frolov wrote:

I have a WHERE clause in my SQL query which has to be translated into an 
elasticsearch bool filter.

here's the where clause:

WHERE (
option = "weight" AND value = "50kg"
)
OR (
option = "weight" AND value = "500kg"
)
AND (
option = "magic" AND value = "no"
)
I have written the AND filters for inner ANDs of query, but now I need 

Re: how to translate “and or” where clause from sql query to elasticsearch filter

2014-07-11 Thread Artem Frolov
Can you tell me what's wrong with the gist I've provided? I can't figure it 
out...
I've provided all the data I have related to this issue. Look:
1) The elasticsearch index, containing the products and their options, which 
have to be filtered: https://gist.github.com/ArFeRR/de86b8b0a5f2bc7dfd86
2) The JSON query for the filtering: 
https://gist.github.com/ArFeRR/e159ef1047122a617b88
3) The Elastica code to generate the JSON above in PHP:
https://gist.github.com/ArFeRR/cebb2bf54232069d817b#file-gistfile1-php

(I've changed the data to make it more realistic.)

All the JSON works fine, I've tested it! I can't figure out what's wrong and 
what I have to do to provide a "FULL working GIST" for you.



On Thursday, 10 July 2014 at 18:37:19 UTC+3, David Pilato wrote:
>
> Have a look at this page to see how you can build a full working gist 
> which could help us reproduce your use case.
>
> Once your gist is updated, please update this thread so I can look at 
> it.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> On 10 July 2014 at 11:59:22, Artem Frolov (kee...@gmail.com) wrote:
>
> the GIST:
> https://gist.github.com/ArFeRR/630acb216b8d95168b73
>
> On Thursday, 10 July 2014 at 12:51:49 UTC+3, Artem Frolov wrote: 
>>
>> here's my try to solve it:
>>
>> {
>>"filtered":{
>>   "filter":{
>>  "nested":{
>> "path":"productsOptionValues",
>> "filter":{
>>"and":[
>>   {
>>  "or":[
>> {
>>"and":[
>>   {
>>  "term":{
>> "productsOptionValues.productOption":
>> "weight"
>>  }
>>   },
>>   {
>>  "term":{
>> "productsOptionValues.value":"500 kg"
>>  }
>>   }
>>]
>> },
>> {
>>"and":[
>>   {
>>  "term":{
>> "productsOptionValues.productOption":
>> "weight"
>>  }
>>   },
>>   {
>>  "term":{
>> "productsOptionValues.value":"50kg"
>>  }
>>   }
>>]
>> }
>>  ]
>>   },
>>   {
>>  "or":[
>> {
>>"and":[
>>   {
>>  "term":{
>> "productsOptionValues.productOption":
>> "magic"
>>  }
>>   },
>>   {
>>  "term":{
>> "productsOptionValues.value":"no"
>>  }
>>   }
>>]
>> }
>>  ]
>>   }
>>]
>> }
>>  }
>>   }
>>}
>> }
>>
>> but it's an equivalent of:
>>
>> WHERE ((option = "weight" AND value = "50kg") OR (option = "weight" AND value 
>> = "500kg")) AND (option = "magic" AND value = "no")
>>
>> That's the wrong tree... I need the AND/OR logic to be on one branch of the 
>> JSON tree, and I don't know if that's possible. Please help me translate the 
>> logic of the query's WHERE condition!
>>
>>
>> On Thursday, 10 July 2014 at 12:23:05 UTC+3, David Pilato wrote: 
>>>
>>>  It could help if you could gist a full SENSE/curl script recreation
>>>
>>> Best
>>>
>>> --
>>> David ;-) 
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>  
>>> On 10 Jul 2014, at 11:15, Artem Frolov wrote:
>>>
>>>   I have a WHERE clause in my SQL query, which have to be translated 
>>> into the elasticsearch bool filter.
>>>
>>> here's the where clause:
>>>
>>> WHERE (option = "weight" AND value = "50kg") OR (option = "weight" AND value 
>>> = "500kg") AND (option = "magic" AND value = "no")
>>>
>>> I have written the AND filters for the inner ANDs of the query, but now I need 
>>> to put them into the bool filter.
>>>
>>> Tried to:
>>>
>>> $boolFilter = new \Elastica\Filter\Bool();
>>> $boolFilter->addShould($innerFilterAnd1);
>>> $boolFilter->addShould($innerFilterAnd2);
>>> $boolFilter->addMust($innerFilterAnd3);
>>>
>>> returns nothing.
>>>
>>> Plea

Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread David Pilato
What does your query look like?

Note: please don't add attachments to your emails; prefer putting them on 
gist.github.com, for example.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 11 July 2014 at 10:29:11, Mike Theairkit (theair...@gmail.com) wrote:

I ran the tests again and captured hot_threads when the problem occurs.
See attachment.
So, when the node is slow, there are threads using more than 60% CPU, vs. 
~2-5% CPU during normal operation.
Can you help me interpret this log?

About warmers: I see the slow behaviour when using warmers.
For now there are no warmers.

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f948cd4a-33ed-442b-8bf4-e4b30a61d3a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread Mike Theairkit
I ran the tests again and captured hot_threads when the problem occurs.
See attachment.
So, when the node is slow, there are threads using more than 60% CPU, vs. 
~2-5% CPU during normal operation.
Can you help me interpret this log?

About warmers: I see the slow behaviour when using warmers.
For now there are no warmers.

::: 
[Kafka][5yLycO0mRS2JlNKog3PElA][search2][inet[/172.16.76.22:9300]]{master=true}
   
   51.9% (259.3ms out of 500ms) cpu usage by thread 
'elasticsearch[Kafka][search][T#37]'
 2/10 snapshots sharing following 19 elements
   
org.apache.lucene.search.DisjunctionScorer.heapAdjust(DisjunctionScorer.java:55)
   
org.apache.lucene.search.DisjunctionScorer.nextDoc(DisjunctionScorer.java:131)
   
org.apache.lucene.search.FilteredQuery$QueryFirstScorer.score(FilteredQuery.java:165)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
   
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:330)
   
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
   
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:744)
 8/10 snapshots sharing following 18 elements
   
org.apache.lucene.search.DisjunctionScorer.nextDoc(DisjunctionScorer.java:128)
   
org.apache.lucene.search.FilteredQuery$QueryFirstScorer.score(FilteredQuery.java:165)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
   
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:330)
   
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
   
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:744)
   
   50.5% (252.3ms out of 500ms) cpu usage by thread 
'elasticsearch[Kafka][search][T#12]'
 4/10 snapshots sharing following 17 elements
   
org.apache.lucene.search.DisjunctionScorer.heapAdjust(DisjunctionScorer.java:55)
   
org.apache.lucene.search.DisjunctionScorer.nextDoc(DisjunctionScorer.java:131)
   
org.apache.lucene.search.FilteredQuery$QueryFirstScorer.sc

Re: Indexing files from filesystem

2014-07-11 Thread Daniel Berretz
I just had a look at their website, the youtube video of their own 
presentation and I red a bit about it in generally, how it works.
For me, it now just looks like I give him a file C:\Apache\logs.txt and it 
works with it. 
What I look for is something I can for example say: Check our company´s 
drive where are sub folders like marketing, projects with again have sub 
folders and so on and index me into elasticsearch the path and the name to 
each file in each of those subfolders and if it is a word document or a pdf 
then also put its content into elasticsearch. So we can search not only for 
file names and path but also in the file contents.
I did a small tool for it written in Delphi (because we develop in Delphi) 
but it uses some libs we want to get rid of so we can use that system in 
our product as well for indexing documents. Logstash doesn´t look like it 
is made for that. 
So is there a plugin or something else which is able to do so?
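In the meantime, the directory-walking part is straightforward; here is a rough Python sketch that collects one document per file, ready for later bulk indexing (the field names are illustrative, and the Word/PDF content extraction and the actual Elasticsearch calls are left out):

```python
import os

def collect_docs(root):
    """Walk root recursively and build one indexable document per file."""
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            doc = {"path": path, "name": name}
            # Plain text: store the content for full-text search. Word/PDF
            # would need a separate extractor (e.g. an attachment pipeline).
            if name.lower().endswith(".txt"):
                with open(path, encoding="utf-8", errors="replace") as f:
                    doc["content"] = f.read()
            docs.append(doc)
    return docs
```

Each resulting dict could then be sent to Elasticsearch via the bulk API; the path/name fields make file names and locations searchable even when the content isn't extracted.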

On Friday, July 11, 2014 9:20:50 AM UTC+2, Mark Walkom wrote:
>
> Check out Logstash, it'll do most of what you want.
> http://logstash.net/
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
On 11 July 2014 17:15, Dan Ber wrote:
>
>> Hey,
>>
>> I just wondered if it is somehow possible to index files from a directory 
>> on HDD, and their contents if they are text files, Word documents, or maybe 
>> even PDFs.
>> I read about FSRiver but could not test it because it does not seem to be 
>> working with ES 1.2.1 due to a bug.
>>  
>>
>
>



Re: Marvel index.html is giving Blank page

2014-07-11 Thread David Pilato
The issue is that you did not follow the installation guide! 
http://www.elasticsearch.org/overview/marvel/download/

Why did you do it this way?
Was the documented way not working for you?

Could you try again to follow the install guide before doing anything by 
yourself?
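For reference, the install guide boils down to running the plugin script rather than copying the jar by hand; for Marvel on a 1.x cluster the documented command is:

```sh
# Run from the Elasticsearch home directory, then restart the node:
bin/plugin -i elasticsearch/marvel/latest
```

This unpacks Marvel into the plugins directory (not lib), which is why dropping the jar into lib leaves the _plugin URL serving a blank page.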

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 11 July 2014 at 09:46:30, srinu konda (konda.srin...@gmail.com) wrote:

I downloaded Marvel, which contains the marvel-1.1.1.jar file, and put it in 
the Elasticsearch lib folder. I restarted the ES server and tried to access 
http://localhost:9200/_plugin/marvel/sense/index.html, but the same blank 
screen is still coming up.

Please let me know what might be the issue.

Thanks,
srinivas K.



Re: Kibana Comparison between two indexes data

2014-07-11 Thread srinu konda
Please let me know whether the above scenario is possible.

Thanks,
Srinivas K.



Re: Marvel index.html is giving Blank page

2014-07-11 Thread srinu konda
I downloaded Marvel, which contains the marvel-1.1.1.jar file, and put it 
in the Elasticsearch lib folder. I restarted the ES server and tried to 
access http://localhost:9200/_plugin/marvel/sense/index.html, but the same 
blank screen is still coming up.

Please let me know what might be the issue.

Thanks,
srinivas K.



Re: Node works slow about 10 seconds after initialization

2014-07-11 Thread David Pilato
Any clue when looking at hot_threads?

Maybe it's just that your segments are not warmed yet and running the first 
queries takes some time.
Do you have warmers?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 11 July 2014 at 08:39:24, Mike Theairkit (theair...@gmail.com) wrote:

I checked my configs; they are now identical, with the heap set to 4 GB.

The problem of slow performance after initialization still persists.

I've attached the new cluster settings to this message.



Re: Aggregation using the results of an aggregation?

2014-07-11 Thread Rémi Nonnon
Hi, 

If I'm not wrong, it's not yet possible to use the result of one aggregation 
in another.
You have to do that outside Elasticsearch.
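A tiny client-side sketch in Python, assuming the aggregation response shape suggested by the question below (the flattened response dict and bucket names are illustrative, not an exact Elasticsearch response):

```python
def stock_to_sales(aggs):
    """Derive the stocktosales ratio client-side from the aggregation
    results: onhand (avg) divided by sales (sum)."""
    sales = aggs["sales"]["value"]
    onhand = aggs["onhand"]["value"]
    # Guard against division by zero when there were no sales.
    return onhand / sales if sales else None

# Example aggregation results as they might come back for one product:
resp_aggs = {"sales": {"value": 200.0}, "onhand": {"value": 50.0}}
print(stock_to_sales(resp_aggs))  # -> 0.25
```

In practice you would loop over the product buckets of the real response and compute the ratio for each.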

Regards,
Rémi

On Friday, 11 July 2014 at 02:39:16 UTC+2, Greg Day wrote:
>
> Hi guys
>
> Im wondering if it is possible to use the results of an aggregation in a 
> aggregation calculation?
> eg: How would I calculate stocktosales in the example below?
>
> 
> "aggregations": {
>   "product": {
>     "aggregations": {
>       "sales": {
>         "sum": { "field": "Sale" }
>       },
>       "onhand": {
>         "avg": { "field": "Onhand" }
>       },
>       "stocktosales": {
>         /* should be onhand / sales */
>       }
>     }
>   }
> }
>
> Thanks!
>
> Greg
>



Re: Indexing files from filesystem

2014-07-11 Thread Mark Walkom
Check out Logstash, it'll do most of what you want.
http://logstash.net/

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 11 July 2014 17:15, Dan Ber  wrote:

> Hey,
>
> I just wondered if it is somehow possible to index files from a directory
> on HDD, and their contents if they are text files, Word documents, or maybe
> even PDFs.
> I read about FSRiver but could not test it because it does not seem to be
> working with ES 1.2.1 due to a bug.
>



Indexing files from filesystem

2014-07-11 Thread Dan Ber
Hey,

I just wondered if it is somehow possible to index files from a directory 
on HDD, and their contents if they are text files, Word documents, or maybe 
even PDFs.
I read about FSRiver but could not test it because it does not seem to be 
working with ES 1.2.1 due to a bug.



Re: Marvel index.html is giving Blank page

2014-07-11 Thread David Pilato
Why not follow the Marvel install instructions?

Did you have any issue using the plugin command?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 11 Jul 2014, at 09:07, srinu konda wrote:

Hi,

I am using the Elasticsearch (elasticsearch-1.1.1) Marvel plugin on my Windows 
machine. I downloaded marvel-1.1.1.jar and placed it in the 
elasticsearch-1.1.1\lib folder. I restarted my ES server and tried to access 
http://localhost:9200/_plugin/marvel/sense/index.html, but it gives a blank 
white page.


Please help me resolve this issue. Please find attached a screenshot of that 
blank screen.

Thanks and Regards,
Srinivas k.

