Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Maco Ma
I tried your script with iwc.setRAMBufferSizeMB(4) set and a 48G 
heap size. The speed is around 430 docs/sec before the first flush and 
the final speed is 350 docs/sec. Not sure what configuration Solr uses such 
that its ingestion speed can reach 800 docs/sec.

Maco

On Wednesday, June 18, 2014 6:09:07 AM UTC+8, Michael McCandless wrote:
>
> I tested roughly your Scenario 2 (100K unique fields, 100 fields per 
> document) with a straight Lucene test (attached, but not sure if the list 
> strips attachments).  Net/net I see ~100 docs/sec with one thread ... which 
> is very slow.
>
> Lucene stores quite a lot for each unique indexed field name and it's 
> really a bad idea to plan on having so many unique fields in the index: 
> you'll spend lots of RAM and CPU.
>
> Can you describe the wider use case here?  Maybe there's a more performant 
> way to achieve it...
>
>
>
> On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin  > wrote:
>
>> Hi, Mark:
>>
>> We are doing single document ingestion. We did a performance comparison 
>> between Solr and Elasticsearch (ES).
>> The performance for ES degrades dramatically when we increase the 
>> metadata fields, whereas Solr performance remains the same. 
>> The benchmark uses a very small data set (i.e. 10k documents; the 
>> index size is only 75MB). The machine is a high-spec machine with 48GB 
>> memory.
>> You can see ES performance drop 50% even when the machine has plenty of 
>> memory. ES consumes all the machine memory when the metadata fields increase 
>> to 100k. 
>> This behavior seems abnormal since the data is really tiny.
>>
>> We also tried with larger data sets (i.e. 100k and 1 Mil documents); ES 
>> threw OOM errors for scenario 2 with the 1 Mil doc set. 
>> We want to know whether this is a bug in ES and/or whether there is any 
>> workaround (config step) we can use to eliminate the performance 
>> degradation. 
>> Currently ES performance does not meet the customer requirement, so we 
>> want to see if there is any way we can bring ES performance to the same 
>> level as Solr.
>>
>> Below is the configuration setting and benchmark results for 10k document 
>> set.
>> scenario 0 means there are 1000 different metadata fields in the system.
>> scenario 1 means there are 10k different metadata fields in the system.
>> scenario 2 means there are 100k different metadata fields in the system.
>> scenario 3 means there are 1M different metadata fields in the system.
>>
>>- disable hard-commit & soft commit + use a *client* to do commit (ES 
>>& Solr) every 10 seconds
>>- ES: flush, refresh are disabled
>>   - Solr: autoSoftCommit are disabled
>>- monitor load on the system (cpu, memory, etc) or the ingestion 
>>speed change over time
>>- monitor the ingestion speed (is there any degradation over time?) 
>>- new ES config: new_ES_config.sh; new ingestion: new_ES_ingest_threads.pl
>>- new Solr ingestion: new_Solr_ingest_threads.pl
>>- flush interval: 10s
>>
>>
>> Results by number of different metadata fields (ES vs. Solr):
>>
>> Scenario 0: 1000 fields
>>   ES:   12 secs -> 833 docs/sec; CPU: 30.24%; Heap: 1.08G; index size: 36M; iowait: 0.02%
>>         time(secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>>   Solr: 13 secs -> 769 docs/sec; CPU: 28.85%; Heap: 9.39G
>>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>>
>> Scenario 1: 10k fields
>>   ES:   29 secs -> 345 docs/sec; CPU: 40.83%; Heap: 5.74G; index size: 36M; iowait: 0.02%
>>         time(secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>>   Solr: 12 secs -> 833 docs/sec; CPU: 28.62%; Heap: 9.88G
>>         time(secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>>
>> Scenario 2: 100k fields
>>   ES:   17 mins 44 secs -> 9.4 docs/sec; CPU: 54.73%; Heap: 47.99G; index size: 75M; iowait: 0.02%
>>         time(secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>>   Solr: 13 secs -> 769 docs/sec; CPU: 29.43%; Heap: 9.84G
>>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>>
>> Scenario 3: 1M fields
>>   ES:   183 mins 8 secs -> 0.9 docs/sec; CPU: 40.47%; Heap: 47.99G
>>         time(secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>>   Solr: 15 secs -> 666.7 docs/sec; CPU: 45.10%; Heap: 9.64G
>>         time(secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>>
>> Thanks!
>> Cindy
>>

Retrieving data from Database using Elastic search

2014-06-17 Thread srinu konda
Hi,
I am trying to pull data from a database using Elasticsearch.

I have created the river below:

PUT http://localhost:9200/jdbc_river/river1/_meta
{
"type":"jdbc",
"jdbc":
{
"url":"Mysql URL",
"driver":"Mysql driver",
"username":"username",
"password":"XXX",
"sql":"select * from orders"
}
}

Note: the above code works fine, but when I try to create one more 
river (below), it is not working for me.

PUT http://localhost:9200/jdbc1/river2/_meta
{
"type":"jdbc1",
"jdbc1":
{
"url":"Mysql URL",
"driver":"Mysql driver",
"username":"username",
"password":"XXX",
"sql":"select * from employee"
}
}
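
For comparison, a hedged sketch of how I understand a second JDBC river should 
be registered: the "type" names the river plugin and stays "jdbc"; only the 
river name changes, and rivers are conventionally registered under the _river 
index (placeholders kept from above):

PUT http://localhost:9200/_river/river2/_meta
{
"type":"jdbc",
"jdbc":
{
"url":"Mysql URL",
"driver":"Mysql driver",
"username":"username",
"password":"XXX",
"sql":"select * from employee"
}
}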

Please help me; I am trying to display the Elasticsearch results using Kibana.

Thanks & Regards,
Srinivas.



Re: Splunk vs. Elastic search performance?

2014-06-17 Thread Sabareesh SS
What are the different ways I can make good use of Elasticsearch?

On Saturday, April 19, 2014 3:03:59 AM UTC+5:30, Frank Flynn wrote:
>
> We have a large Splunk instance.  We load about 1.25 Tb of logs a day.  We 
> have about 1,300 loaders (servers that collect and load logs - they may do 
> other things too).
>
> As I look at Elasticsearch / Logstash / Kibana does anyone know of a 
> performance comparison guide?  Should I expect to run on very similar 
> hardware?  More? or Less?
>
> Sure it depends on exactly what we're doing, the exact queries and the 
> frequency we'd run them but I'm trying to get any kind of idea before we 
> start.
>
> Are there any white papers or other documents about switching?  It seems 
> an obvious choice but I can only find very few performance comparisons 
> (I did see that Elasticsearch just hired "the former VP of Products at 
> Splunk, Gaurav Gupta" - but there were few numbers in that article either).
>
> Thanks,
> Frank
>



Can I sort has_child query result by child's numeric field?

2014-06-17 Thread fiefdx yang
I found this:
http://www.slideshare.net/martijnvg/document-relationsbbuz2013
It uses custom_score, but that does not work in version 1.2.1.
I tried function_score instead, but get an exception:
nested: PropertyAccessException[[Error: could not access: offer; in class: 
org.elasticsearch.search.lookup.DocLookup]\n[Near : {... 
doc[offer.price].value }
"offer" is the child's index type.
If anyone knows how to do this, please help. Thanks!
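
A sketch of roughly what I am after, for reference; the field name "price", the 
quoted doc lookup doc['price'].value (instead of doc[offer.price].value), and 
the 1.x child-scoring parameter name "score_type" are my assumptions:

{
  "query": {
    "has_child": {
      "type": "offer",
      "score_type": "max",
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "script_score": { "script": "doc['price'].value" }
        }
      }
    }
  }
}

With this shape, sorting parents by _score would order them by the highest 
child price.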



Bulk API possible bug

2014-06-17 Thread pablitomusa
Hi guys,
today I was using the bulk API and the data was loading just fine into 
Elasticsearch.
However, when querying Elasticsearch, the resulting JSON (apparently OK) was 
invalid, with an extra comma:

 "hits": [{
...
"_source":{...},
  },
  {
...
"_source":{...},
  },
}]

After a long time, I found out that my data file for the bulk had a ',' at 
the end of the data line, like this:
{"index":{"_index":"XXX","_id":222,"_type":"YYY"}}
{"id":222, "test": name},

I am not sure whether that is the expected behavior, but it took me a long time 
to find out :(
First, there was no error from the bulk request; second, Elasticsearch does not 
raise any error on query (although it returns invalid JSON with the information).
Finally, Kibana showed no error when trying to use the index.
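
For reference, a well-formed bulk body is newline-delimited JSON: one action 
line and one source line per document, each a complete JSON object with no 
trailing comma and all string values quoted. A minimal sketch reusing the 
placeholder names from above:

curl -XPOST 'localhost:9200/_bulk' --data-binary '
{"index":{"_index":"XXX","_type":"YYY","_id":222}}
{"id":222, "test":"name"}
'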

Thanks,
Pablo



Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Maco Ma
Hi Mike,

new_ES_config.sh (defines the templates and disables refresh/flush):
curl -XPOST localhost:9200/doc -d '{
  "mappings" : {
  "type" : {
  "_source" : { "enabled" : false },
  "dynamic_templates" : [
{"t1":{
  "match" : "*_ss",
  "mapping":{
"type": "string",
"store":false,
"norms" : {"enabled" : false}
}
}},
{"t2":{
  "match" : "*_dt",
  "mapping":{
"type": "date",
"store": false
}
}},
{"t3":{
  "match" : "*_i",
  "mapping":{
"type": "integer",
"store": false
}
}}
]
  }
}
  }'

curl -XPUT localhost:9200/doc/_settings -d '{
  "index.refresh_interval" : "-1"
}'

curl -XPUT localhost:9200/doc/_settings -d '{
  "index.translog.disable_flush" : true
}'
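
To double-check that the settings and templates took effect, the standard GET 
endpoints can be used:

curl -XGET 'localhost:9200/doc/_settings?pretty'
curl -XGET 'localhost:9200/doc/_mapping?pretty'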

new_ES_ingest_threads.pl (spawns 10 threads that use curl to ingest 
the docs, plus one thread to flush/optimize periodically):

my $num_args = $#ARGV + 1;
if ($num_args < 1 || $num_args > 2) {
  print "\n usage: $0 [src_dir] [thread_count]\n";
  exit;
}

my $INST_HOME="/scratch/aime/elasticsearch-1.2.1";

my $pid = qx(jps | sed -e '/Elasticsearch/p' -n | sed 's/ .*//');
chomp($pid);
if( "$pid" eq "")
{
  print "Instance is not up\n";
  exit;
}


my $dir = $ARGV[0];
my $td_count = 10;
$td_count = $ARGV[1] if($num_args == 2);
my $lf = "ingest_$dir.log"; # log file name (assumed; $lf was never defined in the original script)
open(FH, ">$lf");
print FH "source dir: $dir\nthread_count: $td_count\n";
print FH localtime()."\n";

use threads;
use threads::shared;

my $flush_intv = 10;

my $no:shared=0;
my $total = 1;
my $intv = 1000;
my $tstr:shared = "";
my $ltime:shared = time;

sub commit {
  $SIG{'KILL'} = sub {`curl -XPOST 
'http://localhost:9200/doc/_flush'`;print "forced commit done on 
".localtime()."\n";threads->exit();};

  while ($no < $total )
  {
`curl -XPOST 'http://localhost:9200/doc/_flush'`;
`curl -XPOST 'http://localhost:9200/doc/_optimize'`;
print "commit on ".localtime()."\n";
sleep($flush_intv);
  }
  `curl -XPOST 'http://localhost:9200/doc/_flush'`;
  print "commit done on ".localtime()."\n";
}

sub do {
  my $c = -1;
  while(1)
  {
{
  lock($no);
  $c=$no;
  $no++;
}
last if($c >= $total);
`curl -XPOST -s localhost:9200/doc/type/$c --data-binary 
\@$dir/$c.json`;
if( ($c +1) % $intv == 0 )
{
  lock($ltime);
  $curtime = time;
  $tstr .= ($curtime - $ltime)." ";
  $ltime = $curtime;
}
  }
}

# start the monitor processes
my $sarId = qx(sar -A 5 10 -o sar5sec_$dir.out > /dev/null &\necho \$!);
my $jgcId = qx(jstat -gc $pid 2s > jmem_$dir.out &\necho \$!);

my $ct = threads->create(\&commit);
my $start = time;
my @ts=();
for $i (1..$td_count)
{
  my $t = threads->create(\&do);
  push(@ts, $t);
}

for my $t (@ts)
{
  $t->join();
}

$ct->kill('KILL');
my $fin = time;

qx(kill -9 $sarId\nkill -9 $jgcId);

print FH localtime()."\n";
$ct->join();
print FH qx(curl 'http://localhost:9200/doc/type/_count?q=*');
close(FH);

new_Solr_ingest_threads.pl is similar to new_ES_ingest_threads.pl 
and uses different parameters for the curl commands. Only the 
differences are posted here:

sub commit {
  while ($no < $total )
  {
`curl  'http://localhost:8983/solr/collection2/update?commit=true'`;
`curl  'http://localhost:8983/solr/collection2/update?optimize=true'`;
print "commit on ".localtime()."\n";
sleep(10);
  }
  `curl  'http://localhost:8983/solr/collection2/update?commit=true'`;
  print "commit done on ".localtime()."\n";
}


sub do {
  my $c = -1;
  while(1)
  {
{
  lock($no);
  $c=$no;
  $no++;
}
last if($c >= $total);
`curl  -s 'http://localhost:8983/solr/collection2/update/json' 
--data-binary \@$dir/$c.json -H 'Content-type:application/json'`;
if( ($c +1) % $intv == 0 )
{
  lock($ltime);
  $curtime = time;
  $tstr .= ($curtime - $ltime)." ";
  $ltime = $curtime;
}
  }
}


B&R
Maco

On Wednesday, June 18, 2014 4:44:35 AM UTC+8, Michael McCandless wrote:
>
> Hi,
>
> Could you post the scripts you linked to (new_ES_config.sh, 
> new_ES_ingest_threads.pl, new_Solr_ingest_threads.pl) inlined?  I can't 
> download them from where you linked.
>
> Optimizing every 10 seconds or 10 minutes is really not a good idea in 
> general, but I guess if you're doing the same with ES and Solr then the 
> comparison is at least "fair".
>
> It's odd you see such a slowdown with ES...
>
> Mike
>
> On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin  > wrote:
>
>> Hi, Mark:
>>
>> We are doing single document ingestion. We did a performance comparison 
>> between Solr and Elastic Search 

Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Cindy Hsin
The way we make Solr ingest faster (single document ingest) is by turning off 
the engine's soft commit and hard commit and using a client to commit the 
changes every 10 seconds. 

Solr ingest speed remains at 800 docs per second, whereas ES ingest speed 
drops by half when we increase the fields (i.e. from 1000 to 10k).
I have asked Maco to send you the requested script so you can do more 
analysis.

If you can help solve the first level of ES performance degradation (i.e. 
1000 to 10k) as a starting point, that would be best.

We do have a real customer scenario that requires a large number of metadata 
fields, which is why this is a blocking issue for the stack evaluation 
between Solr and Elasticsearch.

Thanks!
Cindy

On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>
> I am trying to measure the performance of ingesting documents that have lots of 
> fields.
>
>
> The latest elasticsearch 1.2.1:
> Total docs count: 10k (a small set definitely)
> ES_HEAP_SIZE: 48G
> settings:
>
> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}
>
> mappings:
>
> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}
>
> All fields in the documents match the templates in the mappings.
>
> Since I disabled the flush & refresh, I submitted the flush command (along 
> with an optimize command after it) in the client program every 10 seconds. (I 
> tried another interval, 10 mins, and got similar results.)
>
> Scenario 0 - 10k docs have 1000 different fields:
> Ingestion took 12 secs.  Only 1.08G heap memory is used (this counts only the 
> used heap).
>
>
> Scenario 1 - 10k docs have 10k different fields (10x the fields compared 
> with scenario 0):
> This time ingestion took 29 secs.   Only 5.74G heap memory is used.
>
> Not sure why the performance degrades sharply.
>
> If I try to ingest the docs having 100k different fields, it will take 17 
> mins 44 secs.  We only have 10k docs in total and are not sure why ES performs 
> so badly. 
>
> Can anyone give suggestions to improve the performance?
>
>
>
>
>
>
>
>



Re: exclude some documents (and category filter combination) for some queries

2014-06-17 Thread Srinivasan Ramaswamy
Yeah, I forgot to include my actual result. My "not" filter was not working
at all: I got all 3 designs back (100, 101 and 102).

I followed the syntax in the link you sent and it worked :) I tried similar
syntax a few times before I posted the question, but I didn't have a
"filter" clause inside the "nested" clause (basically two "filter"
clauses). That did the trick!
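
For anyone landing here later, the shape that worked is, per the docs Ivan 
linked, a "nested" filter whose body itself contains a "filter"; a sketch using 
the field names from the example quoted below (exact values assumed):

GET /_search
{
   "query": {
      "filtered": {
         "query": {
            "multi_match": {
               "query": "dog",
               "fields": ["tags", "caption"],
               "minimum_should_match": "50%"
            }
         },
         "filter": {
            "nested": {
               "path": "products",
               "filter": {
                  "and": [
                     { "not": { "term": { "products.category": 1 } } },
                     { "not": { "term": { "products.category": 3 } } }
                  ]
               }
            }
         }
      }
   }
}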

Thanks a lot !
Srini



On Tue, Jun 17, 2014 at 4:46 PM, Ivan Brusic  wrote:

> I jumped the gun when I thought I realized the issue.
>
> You listed your expected result, but not your actual result. Are you
> actually using nested documents? If so, you would need to use nested
> queries/filters:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html
>
> Are you looking for a workaround for the issue referenced? You would
> either need to do the extra filtering on the client side or push all the
> nested values into the parent and query on that field.
>
> --
> Ivan
>
>
>
>
>
> On Fri, Jun 13, 2014 at 11:52 AM, Srinivasan Ramaswamy  > wrote:
>
>> Hi Ivan
>>
>> Thanks for your reply. Yeah, I do understand that currently elasticsearch
>> returns the whole nested doc.
>> Can you help me how can i get the negative query with multiple categories
>> working ?
>>
>> Thanks
>> Srini
>>
>>
>> On Fri, Jun 13, 2014 at 10:58 AM, Ivan Brusic  wrote:
>>
>>> Currently not possible. Elasticsearch will return all the nested
>>> documents as long as one of the nested documents satisfies the query.
>>>
>>> https://github.com/elasticsearch/elasticsearch/issues/3022
>>>
>>> The issue is my personal #1 requested feature. Frustrating considering
>>> there has been a working implementation since version 0.90.5: 1.0, 1.1, 1.2
>>> and still nothing.
>>>
>>> --
>>> Ivan
>>>
>>>
>>>
>>>
>>> On Thu, Jun 12, 2014 at 2:17 PM, Srinivasan Ramaswamy <
>>> ursva...@gmail.com> wrote:
>>>
 any thoughts anyone ?


 On Wednesday, June 11, 2014 11:15:18 PM UTC-7, Srinivasan Ramaswamy
 wrote:
>
> I would like to exclude some documents belonging to certain category
> from the results only for certain search queries. I have a ES client layer
> where i am thinking of implementing this logic as a "not" filter depending
> on the search query. Let me give an example.
>
> sample index
>
> designId: 100
> tags: ["dog", "cute"]
> caption : cute dog in the garden
> products : [ { productId: "200", category: 1}, {productId: "201",
> category: 2} ]
>
> designId: 101
> tags: ["brown", "dog"]
> caption :  little brown dog
> products : [ {productId: "202", category: 3} ]
>
> designId: 102
> tags: ["black", "dog"]
> caption :  little black dog
> products : [ { productId: "202", category: 4}, {productId: "203",
> category: 5} ]
>
> products is a nested field inside each design.
>
> I would like to write a query to get all matches for "dog", (not for
> other keywords) but filter out few categories from the result. As ES
> returns the whole nested document even if only one nested document matches
> the query, my expected result is
>
> designId: 100
> tags: ["dog", "cute"]
> caption : cute dog in the garden
> products : [ { productId: "200", category: 1}, {productId: "201",
> category: 2} ]
>
> designId: 102
> tags: ["black", "dog"]
> caption :  little black dog
> products : [ { productId: "202", category: 4}, {productId: "203",
> category: 5} ]
>  Here is the query i tried but it doesn't work. Can anyone help me
> point out the mistake ?
>
> GET /_search/
> {
>"query": {
>   "filtered": {
>  "filter": {
>   "and": [
>  {
>  "not": {
>"term": {
>   "category": 1
>}
>  }
>  },
>  {
>  "not": {
>"term": {
>   "category": 3
>}
>  }
>  }
>   ]
>
>  },
>  "query": {
> "multi_match": {
>"query": "dog",
>"fields": [
>   "tags",
>   "caption"
>],
>"minimum_should_match": "50%"
> }
>  }
>   }
>}
> }
>

Re: Kibana chart data view understanding

2014-06-17 Thread fred . grummit
Yes, I tried all of those steps. There doesn't seem to be a way to get the 
current Kibana to render multiple lines from different JSON attributes in 
the same histogram when the documents contain numeric values in the format 
described.

The nearest similar problem is: 
https://github.com/elasticsearch/kibana/issues/199 
or https://github.com/elasticsearch/kibana/issues/150
The nearest solution I can find is this 
diff 
https://github.com/tvvmb/kibana/commit/52b4f62711176e4fc0048a6d27e42871f681b32a
Related to https://github.com/elasticsearch/kibana/pull/374

I was wondering if this functionality is in the main Kibana release?




On Tuesday, June 17, 2014 4:07:10 PM UTC+8, Mark Walkom wrote:
>
> Where have you gotten so far with KB?
>
> Try this;
>
>1. Create a new blank dashboard from the default homepage
>2. Configure that (top right) to point to the index and your timestamp 
>field, then save that 
>3. On the main dashboard page add a new row, then save
>4. Add a new panel
>
> This is where things can get tricky as you will have to figure out what 
> panel type to use, but I think you may want to start with a histogram.
> Play around from there. It is a bit tough when you start, but you will 
> pick it up pretty easily!
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 17 June 2014 14:31, > wrote:
>
>> I have a problem trying to visualise the data below in Kibana.
>> Each document describes a test run audit entry with passing, failing and 
>> pending tests along with a timestamp, project identifier and host name.
>> The curls below setup four documents and they are correctly returned if I 
>> do http://localhost:9200/someaudits/_search?pretty=true
>>
>> I would like to use kibana to display a single graph with:
>> The X axis using @timestamp
>> The Y axis showing four separate lines for passed, failed, pending and 
>> (passed + failed + pending)
>> Each document (and its timestamp value) should contain a tag that 
>> references the document itself.
>> Documents and their pass/fail/pending values should not be totalised, so 
>> they remain distinct on the graph.
>>
>> However, the sticking point is that I cannot see what to click (and in 
>> what order) to set up the graph view from a blank Kibana instance located at 
>> http://localhost:9200/_plugin/kibana/
>> I've read the Kibana-related tutorials but I'm just not grokking it.
>>
>>
>>
>> # Delete the whole index:
>> curl -XDELETE http://localhost:9200/someaudits
>>
>> # Create the index:
>> curl -XPOST 'localhost:9200/someaudits/'
>>
>> # Use this mapping:
>> curl -XPUT http://localhost:9200/someaudits/testaudit/_mapping -d '
>> {
>>   "testaudit" : {
>>"properties" : {
>>"@timestamp" : {"format" : "dateOptionalTime", "type" : "date" },
>> "project" : {"type": "string" },
>> "host" : {"type": "string" },
>> "passed" : { "type" : "integer" },
>> "failed" : { "type" : "integer" },
>> "pending" : { "type" : "integer" }
>>}
>>   }
>>  }
>> '
>>
>> # Add some data:
>> curl -XPUT 'http://localhost:9200/someaudits/testaudit/1' -d '
>> {
>> "@timestamp" : "2014-06-17T02:10:08.593Z",
>> "project" : "test",
>> "host" : "mymachine",
>> "passed" : 10,
>> "failed" : 20,
>> "pending" : 1
>> }'
>>
>> curl -XPUT 'http://localhost:9200/someaudits/testaudit/2' -d '
>> {
>> "@timestamp" : "2014-06-17T02:15:08.593Z",
>> "project" : "test",
>> "host" : "mymachine",
>> "passed" : 0,
>> "failed" : 30,
>> "pending" : 0
>> }'
>>
>> curl -XPUT 'http://localhost:9200/someaudits/testaudit/3' -d '
>> {
>> "@timestamp" : "2014-06-17T02:20:08.593Z",
>> "project" : "test",
>> "host" : "mymachine",
>> "passed" : 50,
>> "failed" : 0,
>> "pending" : 1
>> }'
>>
>> curl -XPUT 'http://localhost:9200/someaudits/testaudit/4' -d '
>> {
>> "@timestamp" : "2014-06-17T02:10:18.593Z",
>> "project" : "another test",
>> "host" : "mymachine",
>> "passed" : 0,
>> "failed" : 1,
>> "pending" : 0
>> }'
>>
>
>


Re: Stats aggregation, return documents at min/max.

2014-06-17 Thread Dan Isla
I was able to solve my own problem using a terms sub-aggregation on the 'x' 
values in the interval bucket.

{
"size": 0,
"aggs" : {
"vals": {
"filter": {"term" : { "component" : "data_to_plot" }},
"aggs": {
"values_over_time" : {
"date_histogram" : {
"field" : "time_seconds",
"interval" : "1500s"
},
"aggs": {
"time_y_min": {
"terms": {
"field": "time_seconds",
"order": {"y_min": "asc"},
"size": 1
},
"aggs": {
"y_min": {"min": {"field": "y" } }
}
},
"time_y_max": {
"terms": {
"field": "time_seconds",
"order": {"y_max": "desc"},
"size": 1
},
"aggs": {
"y_max": {"max": {"field": "y" } }
}
}
}
}
}
}
}
}



On Thursday, June 12, 2014 4:08:13 PM UTC-7, Dan Isla wrote:
>
> I'm trying to combine a date_histogram agg with a stats agg to find the 
> min/max within a bucket of documents, so far so good, but now I need a 
> field from the document where the Min/Max was actually found.
>
> My index basically has millions of x,y datapoints that I want to search 
> for and plot.
>
> What I want are the 2 documents from each bucket where the 'y' value was a 
> min and a max.  
>
> This query successfully gave me buckets divided by 300s intervals and the 
> min/max y values within those buckets. Problem is, I have no way of linking 
> those min/max values to their corresponding 'x' value for plotting. 
>
> {
> "size": 0,
> "aggs" : {
> "vals" : {
> "filter" : { "term" : { "component" : "data_to_plot" } },
>  "aggs" : {
> "values_over_time" : {
> "date_histogram" : {
> "field" : "x",
> "interval" : "300s"
> },
> "aggs": {
>   "stats_y": {"stats": {"field": "y"} }, 
>  
> }
> }
> }
> }
> }  
> }
>
> Any suggestions?
>
> Thanks!
>



Re: exclude some documents (and category filter combination) for some queries

2014-06-17 Thread Ivan Brusic
I jumped the gun when I thought I realized the issue.

You listed your expected result, but not your actual result. Are you
actually using nested documents? If so, you would need to use nested
queries/filters:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html

Are you looking for a workaround for the issue referenced? You would either
need to do the extra filtering on the client side or push all the nested
values into the parent and query on that field.

-- 
Ivan





On Fri, Jun 13, 2014 at 11:52 AM, Srinivasan Ramaswamy 
wrote:

> Hi Ivan
>
> Thanks for your reply. Yeah, I do understand that currently elasticsearch
> returns the whole nested doc.
> Can you help me how can i get the negative query with multiple categories
> working ?
>
> Thanks
> Srini
>
>
> On Fri, Jun 13, 2014 at 10:58 AM, Ivan Brusic  wrote:
>
>> Currently not possible. Elasticsearch will return all the nested
>> documents as long as one of the nested documents satisfies the query.
>>
>> https://github.com/elasticsearch/elasticsearch/issues/3022
>>
>> The issue is my personal #1 requested feature. Frustrating considering
>> there has been a working implementation since version 0.90.5: 1.0, 1.1, 1.2
>> and still nothing.
>>
>> --
>> Ivan
>>
>>
>>
>>
>> On Thu, Jun 12, 2014 at 2:17 PM, Srinivasan Ramaswamy > > wrote:
>>
>>> any thoughts anyone ?
>>>
>>>
>>> On Wednesday, June 11, 2014 11:15:18 PM UTC-7, Srinivasan Ramaswamy
>>> wrote:

 I would like to exclude some documents belonging to certain category
 from the results only for certain search queries. I have a ES client layer
 where i am thinking of implementing this logic as a "not" filter depending
 on the search query. Let me give an example.

 sample index

 designId: 100
 tags: ["dog", "cute"]
 caption : cute dog in the garden
 products : [ { productId: "200", category: 1}, {productId: "201",
 category: 2} ]

 designId: 101
 tags: ["brown", "dog"]
 caption :  little brown dog
 products : [ {productId: "202", category: 3} ]

 designId: 102
 tags: ["black", "dog"]
 caption :  little black dog
 products : [ { productId: "202", category: 4}, {productId: "203",
 category: 5} ]

 products is a nested field inside each design.

 I would like to write a query to get all matches for "dog", (not for
 other keywords) but filter out few categories from the result. As ES
 returns the whole nested document even if only one nested document matches
 the query, my expected result is

 designId: 100
 tags: ["dog", "cute"]
 caption : cute dog in the garden
 products : [ { productId: "200", category: 1}, {productId: "201",
 category: 2} ]

 designId: 102
 tags: ["black", "dog"]
 caption :  little black dog
 products : [ { productId: "202", category: 4}, {productId: "203",
 category: 5} ]
  Here is the query i tried but it doesn't work. Can anyone help me
 point out the mistake ?

 GET /_search/
 {
"query": {
   "filtered": {
  "filter": {
   "and": [
  {
  "not": {
"term": {
   "category": 1
}
  }
  },
  {
  "not": {
"term": {
   "category": 3
}
  }
  }
   ]

  },
  "query": {
 "multi_match": {
"query": "dog",
"fields": [
   "tags",
   "caption"
],
"minimum_should_match": "50%"
 }
  }
   }
}
 }


Re: Nested vs Parent-Child - index and search side differences

2014-06-17 Thread Adrien Grand
Your understanding is correct. To add more to it, nested documents are
stored in contiguous blocks in the index, making it very fast to resolve
the parent given a child and vice-versa. On the other hand, for parent/child
there is sort of a hash table maintained on top of the index to match
parents with children. This makes indexing more flexible but search much
slower.

About your questions:
1. Indeed you cannot sort by fields of a child doc.
2. Correct.

My recommendation would be to only use parent/child when nested documents
are not applicable. Nested documents are much faster and more memory-efficient
at search time. But sometimes the need to reindex all nested documents might
prove impractical, in which case parent/child might be an alternative.
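
To make the two shapes concrete, here are hedged sketches with hypothetical 
names (a "comments" nested object vs. a "comment" child type):

{
  "query": {
    "nested": {
      "path": "comments",
      "query": { "match": { "comments.text": "great" } }
    }
  }
}

{
  "query": {
    "has_child": {
      "type": "comment",
      "query": { "match": { "text": "great" } }
    }
  }
}

The nested form resolves inside the same block of Lucene documents; the 
has_child form goes through the parent/child lookup described above.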



On Tue, Jun 17, 2014 at 8:49 PM, Srinivasan Ramaswamy 
wrote:

> I searched the forum and the internet in general, but I couldn't find clear
> answers about the differences in scoring. And most of the answers are
> pretty old. I would like to know all the important current
> differences/comparison between nested and parent-child documents.
>
> What i understood so far
> 1. parent and child are two different lucene docs (and guaranteed to be in
> same shard). nested docs are stored as a separate doc using some internal
> representation and they are also in the same shard as parent doc.
> 2. using nested document gives significantly better performance compared
> to parent-child documents.
> 3. any update to a nested document will trigger the whole parent document
> to be reindexed, but any update to child will reindex only the child doc
> 4. when you apply a filter on nested field, the filter will work but all
> the nested docs will be returned along with the parent (it's a feature in
> progress).
> We do not have this problem with parent-child.
>
> Questions or need to confirm my understanding
> 1. using nested documents will let me sort the documents based on fields
> in the nested documents; on the other hand I cannot sort by fields in child
> docs. (feature in progress)
> 2. filtering results based on a field is possible with both nested and
> parent-child documents
>
> I am curious to know other differences from ranking/scoring perspective. I
> would ideally like to score the parent documents by an aggregate function
> (sum or avg) of a nested/child field. Any thoughts anyone ?
>
> Thanks
> Srini
>



-- 
Adrien Grand



Re: Problem setting up cluster with NAT address

2014-06-17 Thread Mark Walkom
You can only define one address for ES to use.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 18 June 2014 00:12, pmartins  wrote:

> Hi,
>
> Thanks for the reply.
>
> The firewall on the node is off, and it can't communicate with itself. The
> problem is:
>
> vm-motisqaapp02 has the local address 172.16.3.81 with the NAT 10.10.1.135.
> But with the current data center definitions it can't resolve
> 10.10.1.135, so it doesn't recognize itself.
>
> Can I configure different addresses for network.publish_host? One for
> communicating with outside nodes and another for itself?
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Problem-setting-up-cluster-with-NAT-address-tp4057849p4057867.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>



Re: Storing auto generated _id under different name

2014-06-17 Thread Adrien Grand
No, it isn't possible.

Why would you like to have the id of the document included in _source?
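
Note that even when the id is auto-generated and absent from _source, every 
search hit already returns it as metadata next to the source, e.g. (index/type 
names hypothetical):

curl -XGET 'localhost:9200/myindex/mytype/_search?pretty'
# each hit carries "_index", "_type", "_id" and "_score" alongside "_source"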


On Tue, Jun 17, 2014 at 8:16 PM, Johny Lam  wrote:

> Is it possible to have the _id be auto-generated and store it so that it's
> in the _source field under a different name, like say "id" instead of "_id"?
>



-- 
Adrien Grand



Re: Scroll Questions

2014-06-17 Thread joergpra...@gmail.com
1. yes

2. facet/aggregations are not very useful while scrolling (I doubt they
even work at all) because scrolling works at the shard level and aggregations
work at the index level

3. a scroll request takes resources. The purpose of ClearScrollRequest is
to release those resources explicitly. This is indeed a rare situation when
you need explicit clearing. The time delay of releasing scrolls implicitly
can be controlled by the requests.

4. yes, the scroll id is an encoding of the combined state of all the
shards that participate in the scroll. Even if the ID looks as if it has
not changed, you should always use the latest reference to the scroll ID in
the response, or you may clutter the nodes with unreleased scroll resources.

Scrolling is very different from search, because there is shard-level
machinery that iterates over the Lucene segments and keeps them open. This
tends to ramp up lots of server-side resources, which may be long-lived - a
challenge for resource management. There is a reaper thread that wakes up
from time to time to take care of stray scroll searches. You observed this
as a "time delay". Ordinary search actions never keep resources open at
shard level.

Using scroll search for creating large CSV exports is adequate because it
iterates through the result set doc by doc. But if you replace a full-fledged
search that has facets/filters/aggregations/sorting with a scroll search,
you will only create large overheads (if it is even possible).

A null scroll ID is a matter of API design. By using hit length check for
0, you can use the same condition for other queries, so it is convenient
and not confusing. Null scroll IDs are always prone to NPEs.
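
A minimal curl sketch of that lifecycle (index name hypothetical, scroll IDs 
elided):

# start a scan+scroll; the response carries a _scroll_id and, with SCAN, no hits yet
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '{"query":{"match_all":{}},"size":100}'

# fetch the next batch with the most recent scroll id; repeat until hits come back empty
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID_FROM_LAST_RESPONSE'

# or release the server-side resources explicitly instead of waiting for the reaper
curl -XDELETE 'localhost:9200/_search/scroll' -d 'SCROLL_ID'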

Jörg



On Tue, Jun 17, 2014 at 7:46 PM, mooky  wrote:

> Having hit a bunch of issues using scroll, I thought I better improve my
> understanding of how scroll is supposed to be used (and how its not
> supposed to be used).
>
>
>1. Does it make sense to execute a search request with scroll, but
>SearchType != SCAN?
>2. Does it make sense to execute a search request with scroll, and
>also with facet/aggregations?
>3. What is the difference between scrolling to the end of the results
>(ie calling until hits.length ==0) and issuing a specific
>ClearScrollRequest? It appears to me that the ClearScrollRequest
>immediately clears the scroll - whereas there is some time delay before a
>scroll is cleaned up after reaching the end of the results. ( I can see
>this in my tests because the ElasticsearchIntegrationTest fails on teardown
>unless I perform an explicit ClearScrollRequest or I put a delay of some
>number of seconds). From reading the docs, I am not sure if this a bug or
>expected behaviour.
>4. Does the scrollId represent the cursor, or the cursor
>page/iteration state? I have read documentation/mailing list explanations
>that have words to the effect "you must pass the scrollId from the previous
>response into the subsequent request" - which suggests the id represents
>some cursor state - ie performing a scroll request with a given scrollId
>will always return the same results. My observation, however, is that the
>scrollId does not change (ie I get back the same scrollId I passed in) so
>each scroll request with the same scrollId advances the 'cursor' until no
>results are returned. I have also read stuff on the mailing list that
>implied multiple calls could be made in parallel with the same scrollId to
>load all the results faster (which would imply the scrollId is *not* 
> expected
>to change). So which is correct? :)
>
>
> To explain the background for my questions: I have two requirements :
> 1) I get an update event that leads me to go find items in the index that
> need re-indexing. I perform a search on the index, I get the id's and I
> load the original data from the source system(s) to reconstruct the
> document and index it. This seems to be exactly what SCAN and SCROLL is
> meant for. (However, the SCAN search type is different in that it always
> returns zero hits from the original search request - only the scroll
> requests seem to
>
> 2) The user normally performs a search, and naturally we limit how many
> results we serve to the client. However, occasionally, the user wants to
> return all the data for a given search/filter (say, to export to excel or
> whatever), so it seems like a good idea to use the scroll rather than
> paging through the results using from&size as we know we will get a
> consistent results even if documents are being added/removed/updated on the
> server.
> From a functionality perspective, I want to make sure the scrolling search
> request is the same as the non-scrolling search request so the user gets
> the same results - so from a code perspective, ideally I really want to
> make the codepath the same (save for adding the scroll keepAlive param).
> However, perhaps there are things I perform with my normal sea

Upgrade from 1.0.3 to 1.1.2 and cat shards (and cat indices) give NullPointerException during upgrade

2014-06-17 Thread Christopher Crammond
Has anyone observed a problem when upgrading from ElasticSearch 1.0.3 to 
1.1.2 regarding the cat shards (or cat indices)?  We recently rolled this 
out in one of our environments and experienced a NullPointerException (NPE) 
for some of the cat API calls (for instance: cat/shards or cat/indices). 
 The good news is that this went away once all nodes were on the same 
version.

My question: is it correct to assume that this is expected behavior?  Also, 
is there anything that I could do to mitigate this (upgrade faster ;)?

Problems experienced were with:

*cat/shards*
*cat/indices*
*cat/nodes*

Oddly, *cat/indices/my-index?H* did return the expected values (or at least 
when I tested it did).




Re: Facet cache size and other memory metrics

2014-06-17 Thread smonasco
This has something like what I'd like to find in the "Cache stats per field" 
section of https://github.com/bleskes/elasticfacets, but I'm unsure whether 
it's any good for 1.x.
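
In stock 1.x, as far as I can tell, the fielddata stats endpoints give a 
per-node and per-field breakdown of the field cache heap usage:

curl -XGET 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
curl -XGET 'localhost:9200/_cat/fielddata?v'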

On Tuesday, June 17, 2014 4:10:13 PM UTC-6, smonasco wrote:
>
> For instance, I think I remember some plugin that would give you an idea 
> how big an impact a facet might have on your field cache, and I think that 
> was supposed to become part of Elasticsearch itself, but I may be dreaming.
>
> On Tuesday, June 17, 2014 4:06:35 PM UTC-6, smonasco wrote:
>>
>> Hi,
>>
>> We're having problems with some nodes hitting the maximum heap size and 
>> were looking into ways to get visibility into the field cache impact of 
>> different indexes/shards.
>>
>> Any suggestions?
>>
>> --Shannon Monasco
>>
>



Re: Facet cache size and other memory metrics

2014-06-17 Thread smonasco
For instance, I think I remember some plugin that would give you an idea how 
big an impact a facet might have on your field cache, and I think that was 
supposed to become part of Elasticsearch itself, but I may be dreaming.

On Tuesday, June 17, 2014 4:06:35 PM UTC-6, smonasco wrote:
>
> Hi,
>
> We're having problems with some nodes hitting the maximum heap size and 
> were looking into ways to get visibility into the field cache impact of 
> different indexes/shards.
>
> Any suggestions?
>
> --Shannon Monasco
>



Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Michael McCandless
I tested roughly your Scenario 2 (100K unique fields, 100 fields per
document) with a straight Lucene test (attached, but not sure if the list
strips attachments).  Net/net I see ~100 docs/sec with one thread ... which
is very slow.

Lucene stores quite a lot for each unique indexed field name and it's
really a bad idea to plan on having so many unique fields in the index:
you'll spend lots of RAM and CPU.

Can you describe the wider use case here?  Maybe there's a more performant
way to achieve it...



On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin  wrote:

> Hi, Mark:
>
> We are doing single document ingestion. We did a performance comparison
> between Solr and Elasticsearch (ES).
> The performance for ES degrades dramatically when we increase the metadata
> fields, whereas Solr performance remains the same.
> The benchmark uses a very small data set (i.e. 10k documents; the
> index size is only 75MB). The machine is a high-spec machine with 48GB
> memory.
> You can see ES performance drop 50% even when the machine has plenty of
> memory. ES consumes all the machine memory when the metadata fields increase
> to 100k.
> This behavior seems abnormal since the data is really tiny.
>
> We also tried with larger data sets (i.e. 100k and 1 Mil documents); ES
> threw OOM errors for scenario 2 with the 1 Mil doc set.
> We want to know whether this is a bug in ES and/or whether there is any
> workaround (config step) we can use to eliminate the performance degradation.
> Currently ES performance does not meet the customer requirement, so we want
> to see if there is any way we can bring ES performance to the same level as
> Solr.
>
> Below is the configuration setting and benchmark results for 10k document
> set.
> scenario 0 means there are 1000 different metadata fields in the system.
> scenario 1 means there are 10k different metadata fields in the system.
> scenario 2 means there are 100k different metadata fields in the system.
> scenario 3 means there are 1M different metadata fields in the system.
>
>- disable hard-commit & soft commit + use a *client* to do commit (ES
>& Solr) every 10 seconds
>- ES: flush, refresh are disabled
>   - Solr: autoSoftCommit are disabled
>- monitor load on the system (cpu, memory, etc) or the ingestion speed
>change over time
>- monitor the ingestion speed (is there any degradation over time?)
>- new ES config: new_ES_config.sh; new ingestion: new_ES_ingest_threads.pl
>- new Solr ingestion: new_Solr_ingest_threads.pl
>- flush interval: 10s
>
>
> Results by number of different metadata fields (ES vs. Solr):
>
> Scenario 0: 1000 fields
>   ES:   12 secs -> 833 docs/sec; CPU: 30.24%; Heap: 1.08G; index size: 36M; iowait: 0.02%
>         time(secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>   Solr: 13 secs -> 769 docs/sec; CPU: 28.85%; Heap: 9.39G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 1: 10k fields
>   ES:   29 secs -> 345 docs/sec; CPU: 40.83%; Heap: 5.74G; index size: 36M; iowait: 0.02%
>         time(secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>   Solr: 12 secs -> 833 docs/sec; CPU: 28.62%; Heap: 9.88G
>         time(secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>
> Scenario 2: 100k fields
>   ES:   17 mins 44 secs -> 9.4 docs/sec; CPU: 54.73%; Heap: 47.99G; index size: 75M; iowait: 0.02%
>         time(secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>   Solr: 13 secs -> 769 docs/sec; CPU: 29.43%; Heap: 9.84G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 3: 1M fields
>   ES:   183 mins 8 secs -> 0.9 docs/sec; CPU: 40.47%; Heap: 47.99G
>         time(secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>   Solr: 15 secs -> 666.7 docs/sec; CPU: 45.10%; Heap: 9.64G
>         time(secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>
> Thanks!
> Cindy
>


Facet cache size and other memory metrics

2014-06-17 Thread smonasco
Hi,

We're having problems with some nodes hitting the maximum heap size and 
were looking into ways to get visibility into the field cache impact of 
different indexes/shards.

Any suggestions?

--Shannon Monasco



Re: ingest performance degrades sharply along with the documents having more fields

2014-06-17 Thread Michael McCandless
Hi,

Could you post the scripts you linked to (new_ES_config.sh,
new_ES_ingest_threads.pl, new_Solr_ingest_threads.pl) inlined?  I can't
download them from where you linked.

Optimizing every 10 seconds or 10 minutes is really not a good idea in
general, but I guess if you're doing the same with ES and Solr then the
comparison is at least "fair".

It's odd you see such a slowdown with ES...

Mike

On Fri, Jun 13, 2014 at 2:40 PM, Cindy Hsin  wrote:

> Hi, Mark:
>
> We are doing single document ingestion. We did a performance comparison
> between Solr and Elasticsearch (ES).
> The performance for ES degrades dramatically when we increase the metadata
> fields, whereas Solr performance remains the same.
> The benchmark uses a very small data set (i.e. 10k documents; the
> index size is only 75MB). The machine is a high-spec machine with 48GB
> memory.
> You can see ES performance drop 50% even when the machine has plenty of
> memory. ES consumes all the machine memory when the metadata fields increase
> to 100k.
> This behavior seems abnormal since the data is really tiny.
>
> We also tried with larger data sets (i.e. 100k and 1 Mil documents); ES
> threw OOM errors for scenario 2 with the 1 Mil doc set.
> We want to know whether this is a bug in ES and/or whether there is any
> workaround (config step) we can use to eliminate the performance degradation.
> Currently ES performance does not meet the customer requirement, so we want
> to see if there is any way we can bring ES performance to the same level as
> Solr.
>
> Below is the configuration setting and benchmark results for 10k document
> set.
> scenario 0 means there are 1000 different metadata fields in the system.
> scenario 1 means there are 10k different metadata fields in the system.
> scenario 2 means there are 100k different metadata fields in the system.
> scenario 3 means there are 1M different metadata fields in the system.
>
>- disable hard-commit & soft commit + use a *client* to do commit (ES
>& Solr) every 10 seconds
>- ES: flush, refresh are disabled
>   - Solr: autoSoftCommit are disabled
>- monitor load on the system (cpu, memory, etc) or the ingestion speed
>change over time
>- monitor the ingestion speed (is there any degradation over time?)
>- new ES config: new_ES_config.sh; new ingestion: new_ES_ingest_threads.pl
>- new Solr ingestion: new_Solr_ingest_threads.pl
>- flush interval: 10s
>
>
> Results by number of different metadata fields (ES vs. Solr):
>
> Scenario 0: 1000 fields
>   ES:   12 secs -> 833 docs/sec; CPU: 30.24%; Heap: 1.08G; index size: 36M; iowait: 0.02%
>         time(secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
>   Solr: 13 secs -> 769 docs/sec; CPU: 28.85%; Heap: 9.39G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 1: 10k fields
>   ES:   29 secs -> 345 docs/sec; CPU: 40.83%; Heap: 5.74G; index size: 36M; iowait: 0.02%
>         time(secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
>   Solr: 12 secs -> 833 docs/sec; CPU: 28.62%; Heap: 9.88G
>         time(secs) for each 1k docs: 1 1 1 1 2 1 1 1 1 2
>
> Scenario 2: 100k fields
>   ES:   17 mins 44 secs -> 9.4 docs/sec; CPU: 54.73%; Heap: 47.99G; index size: 75M; iowait: 0.02%
>         time(secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
>   Solr: 13 secs -> 769 docs/sec; CPU: 29.43%; Heap: 9.84G
>         time(secs) for each 1k docs: 2 1 1 1 1 1 1 1 2 2
>
> Scenario 3: 1M fields
>   ES:   183 mins 8 secs -> 0.9 docs/sec; CPU: 40.47%; Heap: 47.99G
>         time(secs) for each 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
>   Solr: 15 secs -> 666.7 docs/sec; CPU: 45.10%; Heap: 9.64G
>         time(secs) for each 1k docs: 2 1 1 1 1 2 1 1 3 2
>
> Thanks!
> Cindy
>



Cannot load index from external file using logstash

2014-06-17 Thread Eitan Vesely
Hi all,

I am running one instance of elastic and one of logstash in parallel on the
same computer.

When trying to load a file into elastic using logstash running the
config file below, I get the following output messages on elastic and no file is
loaded.
(When input is configured to be stdin, everything seems to be working just
fine.)

Any ideas?
"
[2014-06-17 22:42:24,748][INFO ][cluster.service  ] [Masked Marvel] 
removed 
{[logstash-Eitan-PC-5928-2010][Ql5fyvEGQyO96R9NIeP32g][Eitan-PC][inet[Eitan-PC/10.0.0.5:9301]]{client=true,
 
data=false},}, reason: 
zen-disco-node_failed([logstash-Eitan-PC-5928-2010][Ql5fyvEGQyO96R9NIeP32g][Eitan-PC][inet[Eitan-PC/10.0.0.5:9301]]{client=true,
 
data=false}), reason transport disconnected (with verified connect)

[2014-06-17 22:43:00,686][INFO ][cluster.service  ] [Masked Marvel] 
added 
{[logstash-Eitan-PC-5292-4014][m0Tg-fcmTHW9aP6zHeUqTA][Eitan-PC][inet[/10.0.0.5:9301]]{client=true,
 
data=false},}, reason: zen-disco-receive(join from 
node[[logstash-Eitan-PC-5292-4014][m0Tg-fcmTHW9aP6zHeUqTA][Eitan-PC][inet[/10.0.0.5:9301]]{client=true,
 
data=false}])
"

input {
  file {
    path => "c:\testLog.txt"
  }
}

output {
  elasticsearch {
    host => "localhost"
    index => "amat1"
  }
}



How to set replication type equal to Async for an index

2014-06-17 Thread pranav amin
Hi,

We are planning to create an index with 2 replicas, and in order to have
better performance we are thinking of making the replication async.

I'm creating the index this way -

curl -XPUT 'http://localhost:9200/xyz/' -d '{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 2,
"replication" : "async"
}
}'
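
For reference, a quick way to read back what settings were actually applied is the settings endpoint (a sketch; the exact output shape varies by version):

curl -XGET 'http://localhost:9200/xyz/_settings?pretty'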


Is the above correct? How do I confirm that my replication is async? Is
there any curl command for confirmation?

Thanks
Pranav.



Nested vs Parent-Child - index and search side differences

2014-06-17 Thread Srinivasan Ramaswamy
I searched the forum and the internet in general, but I couldn't find clear
answers about the differences in scoring, and most of the answers are
pretty old. I would like to know all the important current
differences between nested and parent-child documents.

What I understood so far:
1. parent and child are two different lucene docs (and guaranteed to be in
the same shard). nested docs are stored as separate docs using some internal
representation, and they are also in the same shard as the parent doc.
2. using nested documents gives significantly better performance compared to
parent-child documents.
3. any update to a nested document will trigger the whole parent document
to be reindexed, but any update to a child will reindex only the child doc.
4. when you apply a filter on a nested field, the filter will work but all
the nested docs will be returned along with the parent (a feature in
progress). we do not have this problem with parent-child.

Questions, or points where I need to confirm my understanding:
1. using nested documents will let me sort the documents based on fields in
the nested documents; on the other hand, I cannot sort by fields in child
docs (a feature in progress).
2. filtering results based on a field is possible with both nested and
parent-child documents.

I am curious to know the other differences from a ranking/scoring perspective. I
would ideally like to score the parent documents by an aggregate function
(sum or avg) of a nested/child field. Any thoughts, anyone?
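
For what it's worth, one relevant building block on the nested side is the nested query's score_mode, which controls how the scores of matching nested docs are aggregated into the parent score. A sketch (path and field names hypothetical; the modes are typically avg, total, max and none, with exact naming varying by version):

{
  "query": {
    "nested": {
      "path": "reviews",
      "score_mode": "total",
      "query": { "match": { "reviews.text": "great" } }
    }
  }
}

Note this aggregates the child query scores rather than a raw field value; scoring by a field value would additionally need something like a function_score query inside the nested query.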

Thanks
Srini



updating a document using Java API

2014-06-17 Thread ESUser
Hi All,

As part of an update request, I need to add new fields to a document (search
to get the doc-id and then update).

With the Lucene APIs it can be achieved by 1) reading the doc into a temp
document, 2) updating the temp document, 3) deleting the original document from
the index and adding the temp document.

How does update work in ES? Is it also a delete plus add of a new document?
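
For reference, ES exposes a partial-update endpoint: it fetches the stored _source, merges the partial doc into it, and reindexes the whole document (so it relies on _source being enabled). A sketch, with index/type/id and field names hypothetical:

curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
  "doc" : { "new_field" : "new value" }
}'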

My index has some un-stored fields. Is there a way to keep the un-stored
fields after the update, or are they lost in ElasticSearch?

Would appreciate any pointers you might have.

Thanks!

Neera




Storing auto generated _id under different name

2014-06-17 Thread Johny Lam
Is it possible to have the _id be auto-generated and store it so that it's 
in the _source field under a different name, like say "id" instead of "_id"?



Re: Scroll Questions

2014-06-17 Thread mooky
One more question I forgot:

Rather than looking at hits.length to know if the end of the scroll has 
been reached, would it not be better to return a null scrollId when the end 
of the cursor has been reached? On the surface it seems that would be
a) more intuitive
b) be the same regardless of which SearchType you are using
c) not be affected by the search itself returning zero results

Cheers.



On Tuesday, 17 June 2014 18:46:07 UTC+1, mooky wrote:
>
> Having hit a bunch of issues using scroll, I thought I had better improve my
> understanding of how scroll is supposed to be used (and how it's not
> supposed to be used).
>
>
>1. Does it make sense to execute a search request with scroll, but 
>SearchType != SCAN?
>2. Does it make sense to execute a search request with scroll, and 
>also with facet/aggregations?
>3. What is the difference between scrolling to the end of the results 
>(ie calling until hits.length ==0) and issuing a specific 
>ClearScrollRequest? It appears to me that the ClearScrollRequest 
>immediately clears the scroll - whereas there is some time delay before a 
>scroll is cleaned up after reaching the end of the results. ( I can see 
>this in my tests because the ElasticsearchIntegrationTest fails on 
> teardown 
>unless I perform an explicit ClearScrollRequest or I put a delay of some 
>number of seconds). From reading the docs, I am not sure if this is a bug or
>expected behaviour.
>4. Does the scrollId represent the cursor, or the cursor 
>page/iteration state? I have read documentation/mailing list explanations 
>that have words to the effect "you must pass the scrollId from the 
> previous 
>response into the subsequent request" - which suggests the id represents 
>some cursor state - ie performing a scroll request with a given scrollId 
>will always return the same results. My observation, however, is that the 
>scrollId does not change (ie I get back the same scrollId I passed in) so 
>each scroll request with the same scrollId advances the 'cursor' until no 
>results are returned. I have also read stuff on the mailing list that 
>implied multiple calls could be made in parallel with the same scrollId to 
>load all the results faster (which would imply the scrollId is *not* 
> expected 
>to change). So which is correct? :)
>
>
> To explain the background for my questions: I have two requirements :
> 1) I get an update event that leads me to go find items in the index that 
> need re-indexing. I perform a search on the index, I get the id's and I 
> load the original data from the source system(s) to reconstruct the 
> document and index it. This seems to be exactly what SCAN and SCROLL is 
> meant for. (However, the SCAN search type is different in that it always
> returns zero hits from the original search request - only the scroll
> requests seem to return hits.)
>
> 2) The user normally performs a search, and naturally we limit how many 
> results we serve to the client. However, occasionally, the user wants to 
> return all the data for a given search/filter (say, to export to excel or 
> whatever), so it seems like a good idea to use the scroll rather than 
> paging through the results using from&size as we know we will get a 
> consistent results even if documents are being added/removed/updated on the 
> server.
> From a functionality perspective, I want to make sure the scrolling search 
> request is the same as the non-scrolling search request so the user gets 
> the same results - so from a code perspective, ideally I really want to 
> make the codepath the same (save for adding the scroll keepAlive param). 
> However, perhaps there are things I perform with my normal search (e.g. 
> aggregations, SearchType.DEFAULT, etc) that just don't make sense when 
> scrolling?
>
> Many thanks.
>



Scroll Questions

2014-06-17 Thread mooky
Having hit a bunch of issues using scroll, I thought I had better improve my
understanding of how scroll is supposed to be used (and how it's not
supposed to be used).


   1. Does it make sense to execute a search request with scroll, but 
   SearchType != SCAN?
   2. Does it make sense to execute a search request with scroll, and also 
   with facet/aggregations?
   3. What is the difference between scrolling to the end of the results 
   (ie calling until hits.length ==0) and issuing a specific 
   ClearScrollRequest? It appears to me that the ClearScrollRequest 
   immediately clears the scroll - whereas there is some time delay before a 
   scroll is cleaned up after reaching the end of the results. ( I can see 
   this in my tests because the ElasticsearchIntegrationTest fails on teardown 
   unless I perform an explicit ClearScrollRequest or I put a delay of some 
   number of seconds). From reading the docs, I am not sure if this is a bug or
   expected behaviour.
   4. Does the scrollId represent the cursor, or the cursor page/iteration 
   state? I have read documentation/mailing list explanations that have words 
   to the effect "you must pass the scrollId from the previous response into 
   the subsequent request" - which suggests the id represents some cursor 
   state - ie performing a scroll request with a given scrollId will always 
   return the same results. My observation, however, is that the scrollId does 
   not change (ie I get back the same scrollId I passed in) so each scroll 
   request with the same scrollId advances the 'cursor' until no results are 
   returned. I have also read stuff on the mailing list that implied multiple 
   calls could be made in parallel with the same scrollId to load all the 
   results faster (which would imply the scrollId is *not* expected to 
   change). So which is correct? :)
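
For concreteness, the scan+scroll flow these questions refer to looks roughly like this in REST form (index name hypothetical; the scroll id in the second request is the _scroll_id from the previous response):

curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '{
  "query" : { "match_all" : {} },
  "size" : 100
}'

curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from previous response>'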


To explain the background for my questions: I have two requirements :
1) I get an update event that leads me to go find items in the index that 
need re-indexing. I perform a search on the index, I get the id's and I 
load the original data from the source system(s) to reconstruct the 
document and index it. This seems to be exactly what SCAN and SCROLL is 
meant for. (However, the SCAN search type is different in that it always
returns zero hits from the original search request - only the scroll
requests seem to return hits.)

2) The user normally performs a search, and naturally we limit how many 
results we serve to the client. However, occasionally, the user wants to 
return all the data for a given search/filter (say, to export to excel or 
whatever), so it seems like a good idea to use the scroll rather than 
paging through the results using from&size, as we know we will get
consistent results even if documents are being added/removed/updated on the
server.
From a functionality perspective, I want to make sure the scrolling search
request is the same as the non-scrolling search request so the user gets 
the same results - so from a code perspective, ideally I really want to 
make the codepath the same (save for adding the scroll keepAlive param). 
However, perhaps there are things I perform with my normal search (e.g. 
aggregations, SearchType.DEFAULT, etc) that just don't make sense when 
scrolling?

Many thanks.



Re: No handler found for uri when creating a mapping

2014-06-17 Thread Ivan Brusic
An index can contain multiple types, so the type is not needed in
the URL.

Try simply 192.168.1.103:9200/nxtxnlogs
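
For example, creating the index and the mapping in one request would look like this (a sketch, reusing part of the mapping from the original message):

curl -XPUT '192.168.1.103:9200/nxtxnlogs' -d '{
  "mappings" : {
    "transaction" : {
      "properties" : {
        "txn_id" : { "type" : "long" },
        "logged_at" : { "type" : "string" }
      }
    }
  }
}'

And an empty index is just a PUT with no body:

curl -XPUT '192.168.1.103:9200/nxtxnlogs'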

Cheers,

Ivan


On Tue, Jun 17, 2014 at 1:14 AM, Abhishek Mukherjee <4271...@gmail.com>
wrote:

> Hi,
>
> I am following the ES Definitive guide. I am trying to create a mapping
> for an index and type as follows.
>
> curl -XPUT '192.168.1.103:9200/nxtxnlogs/transaction/' -d '{
>  "mappings" : {
>   "_default_" : {
>    "properties" : {
>     "txn_id" : { "type" : "long" },
>     "logged_at" : { "type" : "string" },
>     "key_name" : { "type" : "string" },
>     "des" : { "type" : "string", "index" : "not_analyzed" },
>     "params" : { "type" : "string" }
>    }
>   }
>  }
> }'
>
> But I get this error.
>
> No handler found for uri [/nxtxnlogs/transaction/] and method [PUT].
>
> Also, how do I create an empty index?
>
> I apologise if the questions are basic, but I can't find this in the
> documentation.
>
> Regards
> Abhishek
>



Re: Query Performance

2014-06-17 Thread ravimbhatt
Hi Binh,

I did some tests and here are the findings:

Moving to c3.4xlarge reduces time by 300 ms, so that takes the overall 90th
percentile down to ~1.5 seconds. CPU is still in the high 80s-90s.

Making all queries filtered and removing the script from the 2nd query's 2nd
aggregation reduced the CPU footprint (high 50s-60s) and improved overall
timings by close to 200 ms. I am at ~1.3 seconds for all 3 queries.

I guess the only next steps now are to play with shard size, or more machines?

Thanks!
Ravi

On Tuesday, 17 June 2014 15:52:31 UTC+1, ravim...@gmail.com wrote:
>
> Hi Binh,
>
> thanks for helping. 
>
> My record size for 1st query is 4 fields. 3 of them integers and a date. 
> so the _source is not big enough to raise concerns. I will anyways try your 
> suggestion and report any improvements here. 
>
> For the 2nd query: i have 15gb of RAM. only 20% of which gets utilised 
> during the tests. Thanks for all three suggestion, Will definitely try that 
> and come back here. Good catch for using script in simSum, thanks. I need 
> just the sum of that field, which does not need a script. Will change that 
> and see what happens. 
>
> For the 3rd query, i do not care about the _score of returned values. Will 
> give that a try as well. 
>
> Thanks a lot. 
>
> Ravi
>
>
> On Tuesday, 17 June 2014 15:28:21 UTC+1, Binh Ly wrote:
>>
>> For the first query, since you don't care about the _score, move the bool 
>> query into a filter. If you only need field1 and field2 and your _source is 
>> big, might be able to save some network payload using source filtering only 
>> for those 2 fields.
>>
>> For the second query, if you have a lot RAM and say col_a and col_b are 
>> not big values (long strings) and not high cardinality, you can try to 
>> switch all _source.col_a (or _source.blah) to doc['col_a'].value in your 
>> scripts. This syntax will load the field values into memory and should 
>> perform faster than _source.blah. And your last stats agg (simSum), not 
>> sure why that needs to be a script - can it just be a stats-field on col_x? 
>> Also if the second query does not need to return hits (i.e. you only need 
>> info from the aggs), you can set search_type=count to further optimize it.
>>
>> For the third query, if you don't care about _score, move the query part 
>> into the filter part.
>>
>



Re: Boost field does not work..

2014-06-17 Thread Ivan Brusic
How do you know that the search is not working? Can you post an example
query and perhaps an example explanation?

If you are searching against the all field, you can set include_in_all to
false for that field. You are better off not searching a field instead of
trying to set a boost.
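
For illustration, setting include_in_all on the comment title from the mapping quoted below would look roughly like this (a sketch; changing an existing field's mapping generally means reindexing):

curl -XPUT http://localhost:9200/bbs/comment/_mapping -d '{
  "comment": {
    "properties": {
      "title": {
        "type": "string",
        "include_in_all": false
      }
    }
  }
}'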

Cheers,

Ivan


On Mon, Jun 16, 2014 at 9:35 PM, Felix Xu  wrote:

> Hi Guys,
> I have two types in a index, one is used for indexing topics and another
> one is used for indexing comments.
> Here is the sample mapping:
>
> *Topic*:
>
> curl -XPUT http://localhost:9200/bbs/topic/_mapping -d'
> {
> "topic": {
> "_timestamp": {
> "enabled": true
> },
> "properties": {
> "title": {
> "type": "string",
> "store": true,
> "term_vector": "with_positions_offsets",
> "indexAnalyzer": "ik",
> "searchAnalyzer": "ik",
> "include_in_all": true,
> *"boost": 8*
> },
> "content": {
> "type": "string",
> "store": true,
> "term_vector": "with_positions_offsets",
> "indexAnalyzer": "ik",
> "searchAnalyzer": "ik",
> "include_in_all": true,
> *"boost": 4*
> }
> }
> }
>
>
> *Comment*:
>
> curl -XPUT http://localhost:9200/bbs/comment/_mapping -d'
> {
> "comment": {
> "_timestamp": {
> "enabled": true
> },
> "properties": {
> "title": {
> "type": "string",
> "store": true,
> "term_vector": "with_positions_offsets",
> "indexAnalyzer": "ik",
> "searchAnalyzer": "ik",
> "include_in_all": true,
> *"boost": 0*
> },
> "content": {
> "type": "string",
> "store": true,
> "term_vector": "with_positions_offsets",
> "indexAnalyzer": "ik",
> "searchAnalyzer": "ik",
> "include_in_all": true,
> *"boost": 4*
> }
> }
> }
>
>
> I want to search these two types (title and content of Topic, and only the content
> of Comment) at the same time; however, I do not want to match the "title"
> field of comments, since the title is the same as its corresponding
> topic, and matching the title field of a comment does not make any sense.
> I have tried to set the boost value of the Comment's title field to zero, but
> it seems it does not work.
> I think a simple solution is to set the "title" of Comment with
> "not_analyzed", but I also want to highlight the matching words in the
> title, so it's better to also index the title field but let it have little
> effect on scoring.
> Could someone please give me some hints? Thanks!
>



Re: Type Ahead feature for contact list

2014-06-17 Thread Itamar Syn-Hershko
Take a look here: http://www.elasticsearch.org/blog/you-complete-me/
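
The gist of that post is the completion suggester: a dedicated completion field in the mapping plus the _suggest endpoint. A sketch (index, type and field names hypothetical):

curl -XPUT 'localhost:9200/contacts/person/_mapping' -d '{
  "person" : {
    "properties" : {
      "name_suggest" : { "type" : "completion" }
    }
  }
}'

curl -XPOST 'localhost:9200/contacts/_suggest' -d '{
  "contact-suggest" : {
    "text" : "joh",
    "completion" : { "field" : "name_suggest" }
  }
}'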


--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Jun 17, 2014 at 8:03 PM, Omi  wrote:

> Hello All
>
> I am quite new to elasticsearch and have been reading elasticsearch related
> documents for a few days. I am creating a contact list search for my
> application, where contacts are stored in "lastname, firstname" format.
>
> I am facing a problem while searching the name using java client.
>
> for example few contacts in my application are:
>
> Smith (Mik), Mike
> Smith, John
> Gomes, Madona
> Fernandis, Madona
> Trav (Mik), John
>
> Now, when I search the name with java client api, the search term split in
> tokens and return wrong results:
>
> Case 1:
> QueryBuilder qb = QueryBuilders.queryString("John Gomes*");
> Expected Result: 0
> Actual Result: "Smith, John", "Gomes, Madona" and "Trav (Mik), John"
>
> Case 2:
> QueryBuilder qb = QueryBuilders.queryString("Smi* John");
> Expected Result: "Smith, John"
> Actual Result: "Smith, John" , "Smith (Mik), Mike" and "Trav (Mik), John"
>
> Case 3:
> QueryBuilder qb = QueryBuilders.queryString("Gomes Madona");
> Expected Result: "Gomes, Madona"
> Actual Result: "Gomes, Madona" and "Fernandis, Madona"
>
>
> This is happening because the search term is split into two tokens and it's
> searching in result for 2 separate words.
>
> I tried with "not_analyzed" in mapping for name field but it restricted me
> to search case sensitive and in record stored order only, but as per my
> usecase user can search with any order in name. In case of two or more
> words query I need to display exact results and in case of one word query I
> have to display results containing search term.
>
> Please suggest how to get expected results. What config changes I need to
> do to get correct results.
>
> Thanks in advance.
>
> Regards,
> Omi
>



Type Ahead feature for contact list

2014-06-17 Thread Omi
Hello All 

I am quite new to elasticsearch and have been reading elasticsearch related
documents for a few days. I am creating a contact list search for my
application, where contacts are stored in "lastname, firstname" format.

I am facing a problem while searching the name using the java client.

For example, a few contacts in my application are:

Smith (Mik), Mike 
Smith, John 
Gomes, Madona 
Fernandis, Madona 
Trav (Mik), John 

Now, when I search the name with the java client api, the search term is split
into tokens and returns wrong results:

Case 1: 
QueryBuilder qb = QueryBuilders.queryString("John Gomes*"); 
Expected Result: 0 
Actual Result: "Smith, John", "Gomes, Madona" and "Trav (Mik), John" 

Case 2: 
QueryBuilder qb = QueryBuilders.queryString("Smi* John"); 
Expected Result: "Smith, John" 
Actual Result: "Smith, John" , "Smith (Mik), Mike" and "Trav (Mik), John" 

Case 3: 
QueryBuilder qb = QueryBuilders.queryString("Gomes Madona"); 
Expected Result: "Gomes, Madona" 
Actual Result: "Gomes, Madona" and "Fernandis, Madona" 


This is happening because the search term is split into two tokens and the
search looks for 2 separate words.

I tried with "not_analyzed" in mapping for name field but it restricted me 
to search case sensitive and in record stored order only, but as per my 
usecase user can search with any order in name. In case of two or more 
words query I need to display exact results and in case of one word query I 
have to display results containing search term. 

Please suggest how to get the expected results. What config changes do I need
to make to get correct results?

Thanks in advance. 

Regards, 
Omi 



Re: node failures

2014-06-17 Thread Kireet Reddy
As soon as we restarted indexing, we saw a lot of merge activity and the
deleted documents percentage went down to around 25%. Does indexing activity trigger
merges? Currently there is not much merge activity, but some indices still
have high deleted document counts. E.g. we have one index with a doc count around 17m
and a deleted count at 15m, but no merge activity. I am wondering if merges aren't
scheduled for that index because writes to that index are infrequent.
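
For reference, merging away deletes on such an index can also be forced by hand (a sketch, index name hypothetical; as the quoted reply below notes, this is resource intensive):

curl -XPOST 'localhost:9200/myindex/_optimize?only_expunge_deletes=true'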

On Jun 16, 2014, at 3:16 PM, Mark Walkom  wrote:

> TTL does use a lot of resources as it constantly scans for expired docs. It'd 
> be more efficient to switch to daily indexes and then drop them, though that 
> might not fit your business requirements.
> 
> You can try forcing an optimise on an index,
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html;
> it's very resource intensive, but if it reduces your segment count
> then it may allude to where the problem lies.
> 
> Regards,
> Mark Walkom
> 
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
> 
> 
> On 17 June 2014 07:07, Kireet Reddy  wrote:
> java version is 1.7.0_55. the servers have a 32GB heap, 96GB of memory, 12 
> logical cores, and 4 spinning disks.
> 
> Currently we have about 450GB of data on each machine, average doc size is 
> about 1.5KB. We create an index (4 shards, 1 replica) every N days. Right now 
> we have 12 indices, meaning about 24 shards/node (12*4*2 / 4). 
> 
> Looking at ElasticHQ, I noticed some warnings around documents deleted. Our 
> percentages are in the 70s and the pass level is 10% (!). Due to our business 
> requirements, we have to use TTL. My understanding is this leads to a lot of 
> document deletions and increased merge activity. However it seems that maybe 
> segments with lots of deletes aren't being merged? We stopped indexing 
> temporarily and there are no merges occurring anywhere in the system so it's 
> not a throttling issue. We are using almost all default settings, but is 
> there some setting in particular I should look at?
> 
> On Jun 10, 2014, at 3:41 PM, Mark Walkom  wrote:
> 
>> Are you using a monitoring plugin such as marvel or elastichq? If not then 
>> installing those will give you a better insight into your cluster.
>> You can also check the hot threads end point to check each node - 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
>> 
>> Providing a bit more info on your cluster setup may help as well, index size 
>> and count, server specs, java version, that sort of thing.
>> 
>> Regards,
>> Mark Walkom
>> 
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>> 
>> 
>> On 11 June 2014 00:41, Kireet Reddy  wrote:
>> On our 4 node test cluster (1.1.2), seemingly out of the blue we had one 
>> node experience very high cpu usage and become unresponsive and then after 
>> about 8 hours another node experienced the same issue. The processes 
>> themselves stayed alive, gc activity was normal, they didn't experience an 
>> OutOfMemoryError. The nodes left the cluster though, perhaps due to the 
>> unresponsiveness. The only errors in the log files were a bunch of messages 
>> like:
>> 
>> org.elasticsearch.search.SearchContextMissingException: No search context 
>> found for id ...
>> 
>> and errors about the search queue being full. We see the 
>> SearchContextMissingException occasionally during normal operation, but 
>> during the high cpu period it happened quite a bit.
>> 
>> I don't think we had an unusually high number of queries during that time 
>> because the other 2 nodes had normal cpu usage and for the prior week things 
>> ran smoothly.
>> 
>> We are going to restart testing, but is there anything we can do to better 
>> understand what happened? Maybe change a particular log level or do 
>> something while the problem is happening, assuming we can reproduce the 
>> issue?
>> 

Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Jinyuan Zhou
I will check the value. However, it has a problem only when I use both the
es.mapping.id and 'dynamic/multi resource writes' features; used separately
they are fine.

Jinyuan (Jack) Zhou


On Tue, Jun 17, 2014 at 6:25 AM, Costin Leau  wrote:

> Most likely some of your data contains some invalid entries which
> result in an invalid JSON payload being sent to ES.
> Check your ID values and/or keep an eye on issue #217 which aims to
> provide more human-friendly messages for the user.
>
> Cheers.
>
> https://github.com/elasticsearch/elasticsearch-hadoop/issues/217
>
>
> On 6/17/14 2:42 AM, Jinyuan Zhou wrote:
>
>> Sure, I was able to run the following command against my remote es cluster.
>> hive -i init.hive -f search.hql.
>>
>> Below is the contents of init.hive, search.hql and data file in hdfs
>> /user/cloudera/hivework/foobar/foobar.data
>>
>> I replaced the value for es.nodes with a fake name. Other than that, it should
>> run without problems. I am using the feature called
>> 'dynamic/multi resource writes'. It works in this example, but when I also
>> add the 'es.mapping.id' = 'id' setting, I get the following error:
>>
>> Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
>> Unexpected character ('"' (code 34)): was expecting
>> comma to separate OBJECT entries
>>   at [Source: [B@7be1d686; line: 1, column: 53]
>>   at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
>>   at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)
>>
>>
>>
>> -init.hive
>>
>> set es.nodes=my.remote.escluster;
>> set es.port=9200;
>> set es.index.auto.create=yes;
>> set hive.cli.print.current.db=true;
>> set hive.exec.mode.local.auto=true;
>> set mapred.map.tasks.speculative.execution=false;
>> set mapred.reduce.tasks.speculative.execution=false;
>> set hive.mapred.reduce.tasks.speculative.execution=false;
>> add jar /home/cloudera/elasticsearch-hadoop-2.0.0/dist/
>> elasticsearch-hadoop-hive-2.0.0.jar;
>>
>> -search.hql
>>
>> use search;
>> DROP TABLE IF EXISTS foo;
>> CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> LOCATION '/user/cloudera/hivework/foobar';
>> select * from foo;
>> DROP TABLE IF EXISTS es_foo;
>> CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>> TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');
>>
>> INSERT OVERWRITE TABLE es_foo  SELECT * FROM foo;
>>
>> - /user/cloudera/hivework/foobar/foobar.data ---
>>
>> 1, bar1, first_bar
>> 2, bar2, first_bar
>> 3, foo_bar_1, second_bar
>> 4, foo_bar_12, second_bar
>>
>>
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau wrote:
>>
>> Thanks for sharing - can you also give an example of the table
>> initialization in init.hive vs myscript.hql?
>>
>> Cheers!
>>
>>
>> On 6/16/14 11:19 PM, Jinyuan Zhou wrote:
>>
>> Just to share a solution I learned on the hive side.
>>
>> The hive cli has an -i option that takes a file of hive commands to
>> initialize the session, so I can put a list of set commands as well as the
>> add jar ... command in one file, say init.hive, and then run the cli like
>> this: hive -i init.hive -f myscript.hql. Note that the table creation hql
>> inside myscript.hql doesn't have to set the es.* properties as long as they
>> appear in the init.hive file. This solves my problem.
>> Thanks,
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou <zhou.jiny...@gmail.com>
>> wrote:
>>
>>  Thanks Costin,
>>  I am aiming at modifying the existing hadoop cluster and
>>  hive installation and also at modularizing some common es.*
>>  properties in a separate common place. I know the first goal
>>  can be achieved with the hive cli --auxpath option and the
>>  hive table's TBLPROPERTIES. For the second goal, I am able
>>  to move some es.* settings from the TBLPROPERTIES
>>  declaration to hive's set statements. For example, I can put
>>
>>  set es.nodes=my.domain.com
>>
>>  in the same hql file and then skip the es.nodes setting in the
>>  TBLPROPERTIES of the external table declarations in the SAME
>>  hql. But I wish I could move the set statement to a separate
>>  file. I now realize this is rather a hive question.
>>  Regards,
>>  Jack
>>
>>  On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau <costin.l...@gmail.com>

cross_fields with decompound words

2014-06-17 Thread Bernhardt Scherer
Hi there,

I have a question concerning the decompounding of the search term.

Use case:

- Search is a multi match search on "name" and "category"
- cross_fields is used: each of the search terms need to be in one of the 
searched fields
- There is a product named "holzspiralbohrer"
- There is a product named "bohrer" in the category "holz"


- The user is looking for "holzspiralbohrer" but doesn't know the exact
terminology, and types in the search term "holzbohrer".

In order to deliver results for "holzbohrer" I wanted to decompound the 
search term into "holz" and "bohrer".

To deliver the proper result, I would need to define something like this 
rule:

IF a search word gets decompounded ("holzbohrer" -> "holz" "bohrer")
THEN the decompounded parts should BOTH match in ONE field

In the stated example:
-"holzspiralbohrer" should be in the search results
-"bohrer" in the category "holz" shouldn't be

Is there any way of realizing this?
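
One possible building block is doing the decompounding at index time with a dictionary_decompounder token filter, so that "holzspiralbohrer" also indexes the tokens "holz" and "bohrer". A sketch (index name and word_list purely illustrative; this alone does not yet enforce the both-parts-in-one-field rule above):

curl -XPUT 'localhost:9200/products' -d '{
  "settings" : {
    "analysis" : {
      "filter" : {
        "my_decompounder" : {
          "type" : "dictionary_decompounder",
          "word_list" : ["holz", "spiral", "bohrer"]
        }
      },
      "analyzer" : {
        "decompound_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["lowercase", "my_decompounder"]
        }
      }
    }
  }
}'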

Many thanks in advance!

Best regards,
Bernhardt




Re: Shard failure when scrolling - invalid results, but no error reported

2014-06-17 Thread mooky
With a bit of cunning use of the debugger, I managed to get the stacktrace 
that was lost.
It appears that maybe the ElasticsearchIntegrationTest environment is the 
cause...
This looks like a bug?

java.lang.AssertionError
 at org.elasticsearch.common.util.BigArrays$LongArrayWrapper.get(BigArrays.java:200)
 at org.elasticsearch.test.cache.recycler.MockBigArrays$LongArrayWrapper.get(MockBigArrays.java:374)
 at org.elasticsearch.common.util.BytesRefHash.get(BytesRefHash.java:66)
 at org.elasticsearch.common.util.BytesRefHash.set(BytesRefHash.java:101)
 at org.elasticsearch.common.util.BytesRefHash.add(BytesRefHash.java:145)
 at org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregator$WithOrdinals.collect(StringTermsAggregator.java:299)
 at org.elasticsearch.search.aggregations.AggregationPhase$AggregationsCollector.collect(AggregationPhase.java:164)
 at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:60)
 at org.apache.lucene.search.Scorer.score(Scorer.java:65)
 at org.apache.lucene.search.AssertingScorer.score(AssertingScorer.java:136)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
 at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
 at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:123)
 at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:282)
 at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:270)
 at org.elasticsearch.action.search.type.TransportSearchScrollQueryThenFetchAction$AsyncAction.executeQueryPhase(TransportSearchScrollQueryThenFetchAction.java:200)
 at org.elasticsearch.action.search.type.TransportSearchScrollQueryThenFetchAction$AsyncAction.access$600(TransportSearchScrollQueryThenFetchAction.java:75)
 at org.elasticsearch.action.search.type.TransportSearchScrollQueryThenFetchAction$AsyncAction$2.run(TransportSearchScrollQueryThenFetchAction.java:184)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)





Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-17 Thread Thomas
Hi,

I was wondering whether there is a proper utility class to parse the given
values and get the duration in milliseconds, for values such as 1m
(which means 1 minute), 1q (which means 1 quarter), etc.

I have found that elasticsearch utilizes the class TimeValue, but it only parses
up to weeks, and values such as WEEK, HOUR are not accepted. So is there any
utility class in the elasticsearch source that does the job? (for histograms,
ranges, wherever it is needed)

Thank you
Thomas



Shard failure when scrolling - invalid results, but no error reported

2014-06-17 Thread mooky

I have been having some problems with a failing test (using the
ElasticsearchIntegrationTest) that was testing scrolling.
It took quite a while to notice that my response contained an indication I was
getting a shard failure (my search response was OK, but the response
included shard failures).
I strongly suspect this is why I am not getting the results I expect from
the scroll (ie no results, totalHits=0).

When I dump the search response, I get a bit of a cryptic (at least for me) 
message and I can't figure out what the cause is:
e.g.

{
  "_scroll_id" : "c2NhbjswOzE7dG90YWxfaGl0czoxOw==",
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
"total" : 10,
"successful" : 9,
"failed" : 1,
"failures" : [ {
  "index" : "logistics",
  "shard" : 8,
  "status" : 500,
  "reason" : "QueryPhaseExecutionException[[logistics][8]: 
query[+ConstantScore(cache(_type:quota)) 
+(groupCompanyId:582D1EE6CDF54510AAA7CC2AA635F31B) 
+_all:cancelled*],from[0],size[500]: Query Failed [Failed to execute main 
query]]; nested: AssertionError; "
} ]
  },
  "hits" : {
"total" : 1000,
"max_score" : 0.0,
"hits" : [ ]
  }
}



1) I would have expected this error to manifest itself in the form of an 
exception (java api) rather than having to inspect shard failures manually 
(given the impact on the results).
I have seen all shards fail, but still no error reported. Is this expected?

2) How can I get a full stack trace of the error related to the shard 
failure? "nested: AssertionError;" is teasing me with the fact that there 
*is* more info, but it's being kept from me.



Re: Query Performance

2014-06-17 Thread ravimbhatt
Hi Binh,

thanks for helping. 

My record size for the 1st query is 4 fields, 3 of them integers and a date, so
the _source is not big enough to raise concerns. I will try your
suggestion anyway and report any improvements here.

For the 2nd query: I have 15gb of RAM, only 20% of which gets utilised
during the tests. Thanks for all three suggestions; I will definitely try them
and come back here. Good catch on using a script in simSum, thanks. I need
just the sum of that field, which does not need a script. I will change that
and see what happens.

For the 3rd query, I do not care about the _score of returned values. I will
give that a try as well.

Thanks a lot. 

Ravi


On Tuesday, 17 June 2014 15:28:21 UTC+1, Binh Ly wrote:
>
> For the first query, since you don't care about the _score, move the bool 
> query into a filter. If you only need field1 and field2 and your _source is 
> big, might be able to save some network payload using source filtering only 
> for those 2 fields.
>
> For the second query, if you have a lot RAM and say col_a and col_b are 
> not big values (long strings) and not high cardinality, you can try to 
> switch all _source.col_a (or _source.blah) to doc['col_a'].value in your 
> scripts. This syntax will load the field values into memory and should 
> perform faster than _source.blah. And your last stats agg (simSum), not 
> sure why that needs to be a script - can it just be a stats-field on col_x? 
> Also if the second query does not need to return hits (i.e. you only need 
> info from the aggs), you can set search_type=count to further optimize it.
>
> For the third query, if you don't care about _score, move the query part 
> into the filter part.
>



Re: Swap indexes?

2014-06-17 Thread 'Binh Ly' via elasticsearch
Not sure I fully understand, but you might be interested in 
snapshot/restore:

http://www.elasticsearch.org/blog/introducing-snapshot-restore/
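
For reference, the basic flow there is: register a repository, then snapshot into it (a sketch; repository name and location hypothetical):

curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type" : "fs",
  "settings" : { "location" : "/mnt/backups/my_backup" }
}'

curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'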



Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-17 Thread Georgi Ivanov
I don't know how you are doing the indexing.

Are you using bulk requests? Bulk inserts can greatly increase
indexing speed.
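
For example, a minimal bulk request over HTTP looks like this (a sketch; index/type/field names hypothetical):

curl -XPOST 'localhost:9200/_bulk' --data-binary @requests

where the file "requests" contains newline-delimited action and document lines, ending with a trailing newline:

{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "2" } }
{ "field1" : "value2" }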

You can also check the node client. It should have better indexing speed
because it will be a 1-hop operation, compared to two hops with the transport
client (assuming the Java API here).

You can hit the limits of the bulk thread pool (which can be increased) if you
are sending all indexing ops to one server only. One could try to hit all
master nodes on a round-robin basis.

You can monitor IOPS in marvel (or iostat locally on the server) to see if
you are not hitting the IO limit.

On my ES cluster I reach 50k indexing ops per second.


On Monday, June 9, 2014 5:40:53 PM UTC+2, pranav amin wrote:
>
> Hi all,
>
> While doing some prototyping in ES using SSDs we got some good Write TPS.
> But the Write TPS saturated after adding some more nodes!
>
>
> Here are the details I used for prototyping -
>
> Requirement: To read data as soon as possible since the read is followed 
> by write. 
> Version of ES:1.0.0
> Document Size:144 KB
> Use of SSD for Storage: Yes
> Benchmarking Tool: Soap UI or Jmeter
> VM: Ubuntu, 64 Bit OS
> Total Nodes: 12
> Total Shards: 60
> Threads: 200
> Replica: 2
> Index Shards: 20
> Total Index:1 
> Hardware configuration: 4 CPU, 6 GB RAM, 3 GB Heap
>
> Using the above setup we got Write TPS ~= 500. 
>
> We wanted to know by adding more node if we can increase our Write TPS. 
> But we couldn't. 
> * By adding 3 more nodes (i.e. Total Nodes = 15) the TPS just increased by
> 10, i.e. ~= 510.
> * Adding more Hardware like CPU, RAM and increasing Heap didn't help as 
> well [8 CPU, 12 GB RAM, 5 GB Heap].
>
> Can someone help out or point to ideas about what could be wrong? Conceptually ES
> should scale in terms of Write & Read TPS by adding more nodes. However, we
> aren't able to get that.
>
> Much appreciated if someone can point us in the right direction. Let me 
> know if more information is needed.
>
> Thanks
> Pranav.
>



Re: Query Performance

2014-06-17 Thread ravimbhatt
Hi Georgi,

Thanks for your response. I clear the caches before each test, i.e. a test
for 5000 unique ids. During the test period, the cache size reaches 2.5-3 gb
for the filter cache and 120+ mb for the field cache.

The response time (90th percentile) for query 1 is about 85-100
milliseconds. The max I see is about 375 milliseconds. Ideally I should
avoid sorting, but in this case I need this sorting. And given the times are
mostly under 100 ms, I guess that sort is fine.

Please let me know if you think otherwise. 

Thanks!
Ravi

On Tuesday, 17 June 2014 15:06:49 UTC+1, Georgi Ivanov wrote:
>
> Does the response time improve when the caches are full?
>
> Can you try the query without sort and see if things get better?
>
> I found that sorting in ES is not a good idea sometimes.
>
> Georgi
>
>
> On Monday, June 16, 2014 1:40:38 PM UTC+2, ravim...@gmail.com wrote:
>>
>> Hi All, 
>>
>> I am trying to improve my ES query performance. The goal is to get
>> response times for 3 related queries under a second. In my tests I have
>> seen the 90th percentile response time (took time) for the combined 3 queries
>> to be ~1.8 seconds. Here are the details:
>>
>> Cluster:
>> - 5 machines, 5 shards, currently on m3.2xlarge. (Had started with less
>> powerful boxes and went up one by one, starting from m3.large)
>> - 4 indexes:
>>  - one index with ~90 million records (total 19.3 GB on all shards)
>>  - one with ~24 million (total 6 GB on all shards)
>>  - the other two are at 780K and 340K (total 160 MB and 190 MB)
>> - All fields in the larger indexes are integers.
>> - Record size is small-ish.
>> - Indexes are compressed.
>> - I have given 15 GB to the ES instances.
>> - Indexes are stored on EBS volumes. Each instance has a 250 GB volume
>> with it. (Keeping SSDs as a last resort)
>>
>> The indexes are not changing (for now; in future they would change once a
>> day), so no indexing is taking place while we query. Therefore, I have
>> tried things like reducing the number of segments in the two larger
>> indexes. That helped to a point.
>>
>> *Querying Technique*:
>>
>> - use python ES client. 
>> - *3 small instance* forking *10 threads* at the same time. 
>> - Each thread would fire *3 queries* before reporting a time. 
>> - At time there would be *~100 concurren*t queries on the machines. 
>> settles around ~50-60. 
>> - I take *'took'* time from ES response to measure times. 
>> - I *discard 100 records* before measuring times. 
>> - A total of *5000 unique users* are used for which 3 ES queries would 
>> be fired. A total of *4900 users' times* are measured.  
>>
>> *Observations*:
>>
>> - RAM is never under stress. Well below 15 GB allotted. 
>> - CPU comes under strain, goes upto 85-95 region on all instances during 
>> the tests. 
>>
>> Queries:
>>
>> 1. On an index with ~24 million records:
>>
>> res = es.search( index="index1",
>> body={"query":{"bool":{"must":[{"term":{"cid":value}}]}}}, sort=[
>> "source:desc", "cdate:desc" ], size=100, fields=["wiid"], _source="true")
>>
>> I parse the results of this query to get certain fields out and pass them on
>> to the 2nd query. Let's call those fields: q1.field1 and q1.field2
>>
>> 2. On an index with ~90 million records:
>>
>> res1 = es.search(index="index2",
>> body={"query":{"filtered":{"filter":{"bool":{"must":{"terms":{"col_a":
>> q1.field1}},"must_not":{"terms":{"col_b":q1.field1
>> }},"aggs":{"i2B":{"terms":{"field":"col_b", "size": 1000
>> ,"shard_size":1, "order" : { "mss.sum":"desc"}
>> },"aggs":{"mss":{"stats":{"script":"ca = _source.col_a;
>> index=wiids.indexOf(ca); sval=0; if(index!=-1) sval=svalues.get(index);
>> else sval=-1; return _source.col_x*sval; ","params":{"wiids":
>> q1.field1,"svalues":q1.field2}}},"simSum":{"stats":{"script":"return
>> _source.col_x "}}, size=1)
>>
>> - it uses a filtered query.
>> - uses 2 aggregations.
>> - uses a script in an aggregation.
>> - uses shard_size.
>>
>> Again, I parse the results and get a field out. Let's call that field:
>> q2.field1
>>
>> 3. On an index with ~340K records:
>>
>>  res2 = es.search(index="index3", body= { "query" :  { "filtered" : {
>> "query":{ "terms":{ "wiid":q2.field1  }  }, "filter" : { "bool" : {
>> "must" : [ {  "range" : {"isInRange": { "gte" : 10  } } } , { "term" : {
>> "isCondA" : "false" } } , { "term" : { "isCondB" : "false"} }, { "term" : {
>> "isCondC" : "false" }  }  ]  }  } } } }   ,  size=1000)
>>
>> Please let me know if any other information would help you help me.
>>
>> Query 2 above is doing aggregations and using a custom script. This is
>> where times reach a few seconds, like 2-3 seconds or even 4+ seconds at
>> times.
>>
>> I can move to a high-end CPU machine and maybe the performance would
>> improve. I wanted to check if there is anything else that I am missing.
>> Thanks!
>> Ravi
>>
>>


Re: Query Performance

2014-06-17 Thread 'Binh Ly' via elasticsearch
For the first query, since you don't care about the _score, move the bool 
query into a filter. If you only need field1 and field2 and your _source is 
big, might be able to save some network payload using source filtering only 
for those 2 fields.

For the second query, if you have a lot of RAM and say col_a and col_b are not
big values (long strings) and not high cardinality, you can try to switch 
all _source.col_a (or _source.blah) to doc['col_a'].value in your scripts. 
This syntax will load the field values into memory and should perform 
faster than _source.blah. And your last stats agg (simSum), not sure why 
that needs to be a script - can it just be a stats-field on col_x? Also if 
the second query does not need to return hits (i.e. you only need info from 
the aggs), you can set search_type=count to further optimize it.

For the third query, if you don't care about _score, move the query part 
into the filter part.



Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread Chris Neal
Thanks very much guys!

Chris


On Tue, Jun 17, 2014 at 9:01 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> Scripting issues were due to MVEL, but with MVEL 2.2.0.Final, this has
> been fixed in ES.
>
> So yes, you can run ES on Java 8 JVM.
>
> Jörg
>
>
> On Tue, Jun 17, 2014 at 3:58 PM, Georgi Ivanov 
> wrote:
>
>> As far as I know, ES will work just fine with java 1.8,
>> except for script support.
>>
>> I read some articles on the Internet that scripting support is broken
>> with java 1.8
>>
>> But I would love to hear someone who actually tried :)
>>
>>
>> On Tuesday, June 17, 2014 3:19:37 PM UTC+2, Chris Neal wrote:
>>>
>>> Hi,
>>>
>>> I saw this blog post from April stating java 1.7u55 as being safe for
>>> Elasticsearch, but I didn't see anything about Java 1.8 support.  Just
>>> wondering if it was :)
>>>
>>> http://www.elasticsearch.org/blog/java-1-7u55-safe-use-
>>> elasticsearch-lucene/
>>>
>>> Thanks!
>>> Chris
>>>



Re: Problem setting up cluster with NAT address

2014-06-17 Thread pmartins
Hi,

Thanks for the reply.

The firewall on the node is off, and it can't communicate with itself. The
problem is:

vm-motisqaapp02 has the local address 172.16.3.81 with the NAT address
10.10.1.135. But, with the current data center definitions, it can't resolve
10.10.1.135, so it doesn't recognize itself.

Can I configure different addresses for network.publish_host? One for
communicating with outside nodes and another for itself?
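
For reference, a minimal sketch of the two related settings in
elasticsearch.yml (assuming ES 1.x; bind_host is the local interface the
node listens on, publish_host is the single address advertised to the
cluster - this alone does not fix the case where a node cannot reach its
own published NAT address):

network.bind_host: 172.16.3.81      # local interface to listen on
network.publish_host: 10.10.1.135   # address other nodes are told to use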






Re: Share a document across multiple indices

2014-06-17 Thread Georgi Ivanov
Will aliases help you in this case?

For example :
index1 : [doc1]
index2 : [doc2]


Create an alias "Docs" for index1 and index2


Then run queries against the alias?
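
For the subset use case in the original question, filtered aliases might be
the closer fit - a sketch, assuming the Python client, a hypothetical
"master" index, and a hypothetical "views" field on each document marking
which subsets it belongs to:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# index each document once, tagged with the views it should be visible in
es.index(index="master", doc_type="doc", id="A", body={"views": ["v1"], "text": "..."})
es.index(index="master", doc_type="doc", id="B", body={"views": ["v1", "v2"], "text": "..."})
es.index(index="master", doc_type="doc", id="C", body={"views": ["v2"], "text": "..."})

# one filtered alias per "secondary index"
es.indices.update_aliases(body={"actions": [
    {"add": {"index": "master", "alias": "view1",
             "filter": {"term": {"views": "v1"}}}},
    {"add": {"index": "master", "alias": "view2",
             "filter": {"term": {"views": "v2"}}}},
]})

# searches against "view1" only ever see documents A and B
res = es.search(index="view1", body={"query": {"match_all": {}}})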




On Monday, June 16, 2014 3:51:45 AM UTC+2, Martin Angers wrote:
>
> Hi,
>
> I'm wondering if this is a supported scenario in ElasticSearch, reading 
> the guide and API reference I couldn't find a way to achieve this.
>
> I'd like to index documents only once, say in a master index, and then 
> create secondary or "meta" indices that would only contain a subset of the 
> master index.
>
> For example, document A, B and C would be indexed once in the master 
> index. Then a secondary index would be able to see only documents A and B, 
> while another secondary index could see only documents B and C, etc. (and 
> by "see" I mean the search queries should only consider those documents)
>
> The idea being that documents could be relatively big, and they should not 
> be indexed multiple times.
>
> Does that make sense? Am I missing "the right way" to design such a 
> pattern? I am new to ES.
>
> Thanks,
> Martin
>
>



Re: Garbage collector logs long pauses

2014-06-17 Thread 'Binh Ly' via elasticsearch
You likely want to find out what's taking up your heap. The biggest consumer 
of heap is fielddata. This will tell you what is in your fielddata, and you 
can track it back to your code to see where you are using these fields:

curl localhost:9200/_nodes/stats/indices/fielddata/*?pretty
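
If fielddata turns out to be the culprit, it can also be dropped as a
stopgap while you rework the offending fields (assuming the ES 1.x
clear-cache API; the next query that uses those fields will reload it):

curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'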



Re: Swap indexes?

2014-06-17 Thread Georgi Ivanov
A tribe node, maybe?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-tribe.html#modules-tribe
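
A sketch of the tribe-node configuration in elasticsearch.yml (assuming ES
1.x and hypothetical cluster names; note that a tribe node federates search
across both clusters rather than handing the index over):

tribe:
  one:
    cluster.name: cluster_one   # cluster that builds the index
  two:
    cluster.name: cluster_two   # cluster that consumes it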

On Tuesday, June 17, 2014 10:31:00 AM UTC+2, Lee Gee wrote:
>
> Is it possible to have one ES instance create an index and then have a 
> second instance use that created index, without downtime?
>
> tia
> lee
>



Re: Query Performance

2014-06-17 Thread Georgi Ivanov
Does the response time improve once the caches are warm?

Can you try the query without sort and see if things get better?

I've found that sorting in ES is sometimes not a good idea.

Georgi


On Monday, June 16, 2014 1:40:38 PM UTC+2, ravim...@gmail.com wrote:
>
> Hi All, 
>
> I am trying to improve my ES query performance. The goal is to get 
> response times for the 3 related queries under a second! In my tests I have 
> seen the 90th percentile response time (*took time*) for the 3 combined 
> queries to be ~1.8 seconds. Here are the details: 
>
> *Cluster*: 
> - 5 Machines, 5 Shards, Currently on m3.2xlarge. (Had started with less 
> powerful boxes and went up one by one, started from m3.large)
> - 4 indexes. 
>  - one index with *~90 million* records (total *19.3 GB* on all shards.)
>  - one with *~24 million* (total *6GB* on all shards.)
>  - Other two are in 780K and 340K ( total *160MB* and *190MB*)
> - All *fields* in the larger indexes are *integers*.
> - Record size is small-ish.
> - indexes are *compressed*. 
> - I have given *15 GB to ES* instances. 
> - Indexes are stored on *EBS* volumes. Each instance has *250GB* volume 
> with it. (Keeping SSDs as last resort) 
>
> The indexes are not changing (for now, in future they would change once a 
> day). So no indexing is taking place while we query. *Therefore*, I have 
> tried things like *reducing number of segments* in the two larger 
> indexes. That helped to a point. 
>
> *Querying Technique*:
>
> - use python ES client. 
> - *3 small instances* forking *10 threads* at the same time. 
> - Each thread would fire *3 queries* before reporting a time. 
> - At times there would be *~100 concurrent* queries on the machines; it 
> settles around ~50-60. 
> - I take *'took'* time from ES response to measure times. 
> - I *discard 100 records* before measuring times. 
> - A total of *5000 unique users* are used, for each of whom the 3 ES queries are 
> fired. A total of *4900 users' times* are measured.  
>
> *Observations*:
>
> - RAM is never under stress. Well below 15 GB allotted. 
> - CPU comes under strain, goes upto 85-95 region on all instances during 
> the tests. 
>
> *Queries*: 
>
> *1. On an index with ~24 Million records*: 
>
> res = es.search( index="index1", 
> body={"query":{"bool":{"must":[{"term":{"cid":value}}]}}}, sort=[ 
> "source:desc", "cdate:desc" ], size=100, fields=["wiid"], _source="true")
>
> I parse results of these queries to get certain fields out and pass them on to 
> the 2nd query. Let's call those fields *q1.field1* and *q1.field2*.
>
> *2. On an index with ~90 million records:*
>
> res1 = es.search(index="index2",
> body={"query":{"filtered":{"filter":{"bool":{
> "must":{"terms":{"col_a":q1.field1}},
> "must_not":{"terms":{"col_b":q1.field1}}}}}},
> "aggs":{"i2B":{"terms":{"field":"col_b","size":1000,"shard_size":1,
> "order":{"mss.sum":"desc"}},
> "aggs":{"mss":{"stats":{"script":"ca = _source.col_a;
> index=wiids.indexOf(ca); sval=0; if(index!=-1) sval=svalues.get(index);
> else sval=-1; return _source.col_x*sval;",
> "params":{"wiids":q1.field1,"svalues":q1.field2}}},
> "simSum":{"stats":{"script":"return _source.col_x"}}}}}},
> size=1)
>
> - it uses *filtered query*.
> - uses *2 aggregations*
> - uses *script in aggregation*.  
> - use *shard_size* 
>
> Again, I parse results and get a field out. Let's call that field: 
> *q2.field1*
>
> 3. *On an index with ~340K records:*
>
>  res2 = es.search(index="index3", body= { "query" : { "filtered" : {
> "query": { "terms": { "wiid": q2.field1 } }, "filter" : { "bool" : {
> "must" : [ { "range" : { "isInRange": { "gte" : 10 } } }, { "term" : {
> "isCondA" : "false" } }, { "term" : { "isCondB" : "false" } }, { "term" : {
> "isCondC" : "false" } } ] } } } } }, size=1000)
>
> Please let me know if any other information would help you help me. 
>
> Query 2 above is doing aggregations and using a custom script. This is 
> where times reach a few seconds, like 2-3 seconds or even 4+ seconds at 
> times. 
>
> I can move to a high-end CPU machine and maybe the performance would 
> improve. I wanted to check if there is anything else that I am missing. 
>
> Thanks!
> Ravi
>
>



Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread joergpra...@gmail.com
Scripting issues were due to MVEL, but with MVEL 2.2.0.Final, this has been
fixed in ES.

So yes, you can run ES on Java 8 JVM.

Jörg


On Tue, Jun 17, 2014 at 3:58 PM, Georgi Ivanov 
wrote:

> As far as I know, ES will work just fine with java 1.8,
> except for script support.
>
> I read some articles on the Internet that scripting support is broken with
> java 1.8
>
> But I would love to hear someone who actually tried :)
>
>
> On Tuesday, June 17, 2014 3:19:37 PM UTC+2, Chris Neal wrote:
>>
>> Hi,
>>
>> I saw this blog post from April stating java 1.7u55 as being safe for
>> Elasticsearch, but I didn't see anything about Java 1.8 support.  Just
>> wondering if it was :)
>>
>> http://www.elasticsearch.org/blog/java-1-7u55-safe-use-
>> elasticsearch-lucene/
>>
>> Thanks!
>> Chris
>>



Re: Problem setting up cluster with NAT address

2014-06-17 Thread Georgi Ivanov
Doesn't sound like an elasticsearch issue ...

I would look at my FW rules



On Tuesday, June 17, 2014 2:17:20 PM UTC+2, pmartins wrote:
>
> Hi, 
>
> I'm having some problems setting up a 1.2.1 ES cluster. I have two nodes, 
> each one in a different data center/network. 
>
> One of the nodes is behind a NAT address, so I set network.publish_host to 
> the NAT address. 
>
> Both nodes connect to each other without problems. The issue is when the 
> node behind the NAT address tries to connect to itself. In my network, it 
> doesn't know its NAT address and can't resolve it. So I get the exception: 
>
> [2014-06-17 12:58:19,681][WARN ][cluster.service  ] 
> [vm-motisqaapp02] failed to reconnect to node 
> [vm-motisqaapp02][4oSfsIaBTSyQWdnxiTt7Cw][vm-motisqaapp02.***][inet[/10.10.1.135:9300]]{master=true}
>  
>
> org.elasticsearch.transport.ConnectTransportException: 
> [vm-motisqaapp02][inet[/10.10.1.135:9300]] connect_timeout[30s] 
> at 
> org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:727)
>  
>
> at 
> org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:656)
>  
>
> at 
> org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:624)
>  
>
> at 
> org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146)
>  
>
> at 
> org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:518)
>  
>
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source) 
> at java.lang.Thread.run(Unknown Source) 
> Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: 
> connection timed out: /10.10.1.135:9300 
> at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
>  
>
> at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
>  
>
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>  
>
> at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>  
>
> at 
> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  
>
> at 
> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  
>
> ... 3 more 
>
> vm-motisqaapp02's NAT address is 10.10.1.135, but locally it can't resolve 
> this 
> address. Is there any way that I can set up another IP to communicate locally? 
>
>
>
>



Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread Georgi Ivanov
As far as I know, ES will work just fine with java 1.8,
except for script support.

I read some articles on the Internet that scripting support is broken with 
java 1.8

But I would love to hear someone who actually tried :)


On Tuesday, June 17, 2014 3:19:37 PM UTC+2, Chris Neal wrote:
>
> Hi,
>
> I saw this blog post from April stating java 1.7u55 as being safe for 
> Elasticsearch, but I didn't see anything about Java 1.8 support.  Just 
> wondering if it was :)
>
>
> http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/
>
> Thanks!
> Chris
>  



Re: ElasticSearch - search statistic - like google analytics

2014-06-17 Thread Jacob Dalgaard
Hello Mark
Thank you for your reply. Then I will look into this approach.
 
Regards.
Jacob

On Tuesday, June 17, 2014 14.06.15 UTC+2, Mark Walkom wrote:

> ES doesn't store this natively, you'd have to put something in-between the 
> user and ES to capture and collate this information.
>
> Your LS idea seems like a good one to solve it.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 17 June 2014 20:41, Jacob Dalgaard wrote:
>
>> Hello
>> I am looking into using ElasticSearch as a search engine for one of the 
>> projects I am working on. There is still one thing which I need to find an 
>> answer for, and I hope someone in here can help.
>> The customer wants to be able to see some search statistics, like google 
>> analytics: most searched words, new search words and so on. 
>>  
>> Is there a way to easily set up this type of search statistics? 
>> My idea is something like ElasticSearch stores search history, about the 
>> search request made to the REST API. Then my customer can use Kibana or 
>> some other visual tool to monitor the search history of ElasticSearch.
>>  
>>  
>> Another approach could be to set up LogStash to pick up all IIS log entries 
>> for search requests, and put them in ElasticSearch. Then they could be 
>> viewed with Kibana. Is anyone aware of a logstash pattern for IIS?
>>  
>>  
>> Hope someone can help me with an answer for this.
>>  
>>  
>>
>> Regards Jacob
>>



Re: reverse_nested aggregation facing troubles when applied to array of nested objects

2014-06-17 Thread Adrian Luna
If not possible, is there any other way to aggregate by several fields? Or 
do I need to make 2 different aggregations and then merge them in my 
application?
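
For reference, a sketch of the two-parallel-aggregations route (hypothetical
aggregation names, an assumed client instance es, and the fields from the
example above; nested fields may additionally need a nested agg wrapper):

body = {"aggs": {
    "forms_a": {"terms": {"field": "object_of_type_a.form"}},
    "forms_b": {"terms": {"field": "object_of_type_b.form"}},
}}
res = es.search(index="myindex", search_type="count", body=body)

# merge the two bucket lists client-side
counts = {}
for agg in ("forms_a", "forms_b"):
    for bucket in res["aggregations"][agg]["buckets"]:
        counts[bucket["key"]] = counts.get(bucket["key"], 0) + bucket["doc_count"]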

On Tuesday, June 17, 2014 15:11:26 UTC+2, Adrian Luna wrote:
>
> Ok, just realized something. The problem wasn't related to this. But in 
> order to use the 1.2 version (which first exposes this reverse_nested 
> functionality), something seems to have changed from the 1.1 version I was 
> using before.
>
> Something I usually did before is aggregating by several fields using the 
> same aggregation name in order to "merge" the results (which, I must 
> admit, I have never seen documented). I mean:
>
> {
>  "aggs":{
>"forms":[
>  {"terms":{"field":"object_of_type_a.form"}},
>  {"terms":{"field":"object_of_type_b.form"}}
>]
>  }
> }
>
> Such thing was working on previous versions, but not anymore?
>
> Thanks in advance
>
> On Tuesday, June 17, 2014 14:18:08 UTC+2, Adrian Luna wrote:
>>
>> I have an issue where my mapping includes an array of nested objects. 
>> Let's imagine something simplified like this:
>>
>> 
>>
>





Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Costin Leau

Most likely some of your data contains invalid entries which result in 
an invalid JSON payload being sent to ES.
Check your ID values and/or keep an eye on issue #217 which aims to provide 
more human-friendly messages for the user.

Cheers.

https://github.com/elasticsearch/elasticsearch-hadoop/issues/217

On 6/17/14 2:42 AM, Jinyuan Zhou wrote:

Sure, I was able to run the following command against my remote es cluster:
hive -i init.hive -f search.hql.

Below is the contents of init.hive, search.hql and data file in hdfs 
/user/cloudera/hivework/foobar/foobar.data

I replaced the value for es.nodes with a fake name. Other than that, it should 
run without problems. I am using a feature called 'dynamic/multi resource 
writes'. It works in this example, but when I also add the 'es.mapping.id' = 
'id' setting, I get the following error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Unexpected 
character ('"' (code 34)): was expecting comma to separate OBJECT entries
  at [Source: [B@7be1d686; line: 1, column: 53]
 at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
 at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)


-init.hive

set es.nodes=my.remote.escluster;
set es.port=9200;
set es.index.auto.create=yes;
set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;
set mapred.map.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.reduce.tasks.speculative.execution=false;
add jar 
/home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;

-search.hql

use search;
DROP TABLE IF EXISTS foo;
CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/hivework/foobar';
select * from foo;
DROP TABLE IF EXISTS es_foo;
CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');

INSERT OVERWRITE TABLE es_foo  SELECT * FROM foo;

- /user/cloudera/hivework/foobar/foobar.data ---

1, bar1, first_bar
2, bar2, first_bar
3, foo_bar_1, second_bar
4, foo_bar_12, second_bar
~




Jinyuan (Jack) Zhou


On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau  wrote:

    Thanks for sharing - can you also give an example of the table
    initialization in init.hive vs myscript.hql?

    Cheers!


    On 6/16/14 11:19 PM, Jinyuan Zhou wrote:

        Just sharing a solution I learned on the hive side.

        hive cli has an -i option that takes a file of hive commands to
        initialize the session, so I can put a list of set commands as well
        as an add jar ... command in one file, say init.hive, then run the
        cli as: hive -i init.hive -f myscript.hql. Note that table creation
        hql inside myscript.hql doesn't have to set es.* properties as long
        as they appear in the init.hive file. This solves my problem.
        Thanks,


        Jinyuan (Jack) Zhou


        On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou  wrote:

            Thanks Costin,
            I am aiming at modifying the existing hadoop cluster and hive
            installation and also modularizing some common es.* properties
            in a separate common place. I know the first goal can be
            achieved with the hive cli --auxpath option and the hive
            table's TBLPROPERTIES. For the second goal, I am able to move
            some es.* settings from the TBLPROPERTIES declaration to hive's
            set statements. For example, I can put

            set es.nodes=my.domain.com

            in the same hql file, then skip the es.nodes setting in
            TBLPROPERTIES in the external table declarations in the SAME
            hql. But I wish I could move the set statement to a separate
            file. I now realize this is rather a hive question.
            Regards,
            Jack


            On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau  wrote:

                Could you please raise an issue with some type of example?
                Due to the way Hadoop (and Hive) works, things tend to be
                tricky in terms of configuring a job.

                The configuration needs to be created before a job is
                submitted, which in practice means "dynamic configurations"
                are basically impossible (this also has some security
                implications which are simply avoided this way).
                Thus either one specifies the configuration manually or
                loads a known location file (hive-site.xml,
                core-site.xml...) upfront, b

Elasticsearch support for Java 1.8?

2014-06-17 Thread Chris Neal
Hi,

I saw this blog post from April stating java 1.7u55 as being safe for
Elasticsearch, but I didn't see anything about Java 1.8 support.  Just
wondering if it was :)

http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/

Thanks!
Chris



Re: reverse_nested aggregation facing troubles when applied to array of nested objects

2014-06-17 Thread Adrian Luna
Ok, just realized something. The problem wasn't related to this. But in 
order to use the 1.2 version (which first exposes this reverse_nested 
functionality), something seems to have changed from the 1.1 version I was 
using before.

Something I usually did before is aggregating by several fields using the 
same aggregation name in order to "merge" the results (which, I must 
admit, I have never seen documented). I mean:

{
 "aggs":{
   "forms":[
 {"terms":{"field":"object_of_type_a.form"}},
 {"terms":{"field":"object_of_type_b.form"}}
   ]
 }
}

Such thing was working on previous versions, but not anymore?

Thanks in advance

On Tuesday, June 17, 2014 14:18:08 UTC+2, Adrian Luna wrote:
>
> I have an issue where my mapping includes an array of nested objects. 
> Let's imagine something simplified like this:
>
> 
>



Re: Index template requires settings object even if its value is empty

2014-06-17 Thread Brian
By the way, I got a little ahead of myself in the previous post. In 
particular:

"settings" : {
  "index.mapping.ignore_malformed" : true*,*
  *"index.query.default_field" : "message"*
},

Apparently, when I added the second setting above (index.query.default_field) 
and then removed the following option from my ES 1.2.1 start-up script, Kibana 
was no longer able to search on HTTP and required message:HTTP, because the 
_all field has also been disabled:

-Des.index.query.default_field=message

So I put the configuration option (above) back into my ES start-up script, 
and removed the default_field setting from the template (as it didn't seem 
to work). Not sure if this is a problem with my understanding (most likely) 
or a bug in ES (very unlikely). But I offer it to the experts for comment 
and correction.
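
For reference, a sketch of how such a template can be registered from the
Python client (the template name and logstash-* pattern are hypothetical;
whether index.query.default_field is honored per-index this way is exactly
the open question above):

es.indices.put_template(name="logstash-defaults", body={
    "template": "logstash-*",  # indices this template applies to
    "settings": {
        "index.mapping.ignore_malformed": True,
        "index.query.default_field": "message",
    },
})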

But however it should be, ES rocks and I've managed to get several people 
up and running with a one-button (as it were) build, install, load, and 
test. Awesome job, Elasticsearch.com! You make me look good!

Brian



Get X word before and after search word

2014-06-17 Thread Petr Janský
Hello,

I'm trying to find a way to get the words/terms around a search word. E.g., 
let's have a document with the text "The best search engine is ElasticSearch". 
I will search for "best" and want to get back info such as that the word 
"search" occurs x times as the next word after the search word.

Thx
Petr



reverse_nested aggregation facing troubles when applied to array of nested objects

2014-06-17 Thread Adrian Luna
I have an issue where my mapping includes an array of nested objects. Let's 
imagine something simplified like this:

{
  "properties": {
    "datetime": {"type": "date"},
    "tags": {"type": "object", "properties": {
      "object_of_type_a": {"type": "nested", "properties": {"##SOME FIELDS##"}},
      "object_of_type_b": {"type": "nested", "properties": {"##SOME FIELDS##"}}
    }}
  }
}

Both object_of_type_a and object_of_type_b are arrays of the actual nested 
object. 

So, one doc may look like:

{
"datetime":"17-06-2014T14:11",
"##other fields I don't care about right now##",
"tags":{
  "object_of_type_a":[{"form":"whatever",...},{"form":"another thing",...}],
  "object_of_type_b":[{"form":"something else",...},{"form":"others",...}],
}
}



Now imagine I want to aggregate over some field of one of the inner 
objects, but also obtain a histogram for each bucket based on the top-level 
field ("datetime").
 
"aggs": {
"top_agg": {
  "nested": {
"path": "tags.object_of_type_a"
  },
  "aggs": {
"medium_agg": {
  "terms": {
"size": 5,
"field": "tags.object_of_type_a.form"
  },
  "aggs": {
"reverse": {
  "reverse_nested": {},
  "aggs": {
"timeline": {
  "date_histogram": {
"field": "datetime",
"interval": "day"
  }
}
  }
}
  }
}
  }
}



Once I try to do so, I am getting an error:

Parse Failure [Aggregation definition for [object_of_type_a starts with a 
[START_ARRAY], expected a [START_OBJECT].]]; }


Is it possible to perform such an aggregation?
Thanks in advance. I really appreciate any help you can provide.



Problem setting up cluster with NAT address

2014-06-17 Thread pmartins
Hi,

I'm having some problems setting up a 1.2.1 ES cluster. I have two nodes,
each one in a different data center/network.

One of the nodes is behind a NAT address, so I set network.publish_host to
the NAT address.

Both nodes connect to each other without problems. The issue is when the
node behind the NAT address tries to connect to itself. In my network, it
doesn't know its NAT address and can't resolve it. So I get the exception:

[2014-06-17 12:58:19,681][WARN ][cluster.service  ]
[vm-motisqaapp02] failed to reconnect to node
[vm-motisqaapp02][4oSfsIaBTSyQWdnxiTt7Cw][vm-motisqaapp02.***][inet[/10.10.1.135:9300]]{master=true}
org.elasticsearch.transport.ConnectTransportException:
[vm-motisqaapp02][inet[/10.10.1.135:9300]] connect_timeout[30s]
at
org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:727)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:656)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:624)
at
org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146)
at
org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:518)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException:
connection timed out: /10.10.1.135:9300
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more

vm-motisqaapp02's NAT address is 10.10.1.135, but locally it can't resolve this
address. Is there any way that I can set up another IP to communicate locally? 






Re: ElasticSearch - search statistic - like google analytics

2014-06-17 Thread Mark Walkom
ES doesn't store this natively, you'd have to put something in-between the
user and ES to capture and collate this information.

Your LS idea seems like a good one to solve it.
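
One possible shape for the in-between layer - a sketch in Python (assuming
the official client; "search-stats" and the field names are hypothetical)
that records every search in a separate stats index before forwarding it,
so Kibana can later aggregate on the captured queries:

import datetime

from elasticsearch import Elasticsearch

es = Elasticsearch()

def tracked_search(index, body, **kwargs):
    # record the raw request in a stats index, then forward it to ES
    es.index(index="search-stats", doc_type="search", body={
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "target_index": index,
        "query": body,  # raw request body, for later inspection
    })
    return es.search(index=index, body=body, **kwargs)

res = tracked_search("products", {"query": {"match": {"title": "laptop"}}})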

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 June 2014 20:41, Jacob Dalgaard  wrote:

> Hello
> I am looking into using ElasticSearch as a search engine for one of the
> projects I am working on. There is still one thing which I need to find an
> answer for, and I hope someone in here can help.
> The customer wants to be able to see some search statistics, like google
> analytics: most searched words, new search words and so on.
>
> Is there a way to easily set up this type of search statistics?
> My idea is something like ElasticSearch stores search history, about the
> search request made to the REST API. Then my customer can use Kibana or
> some other visual tool to monitor the search history of ElasticSearch.
>
>
> Another approach could be to set up LogStash to pick up all IIS log entries
> for search requests, and put them in ElasticSearch. Then they could be
> viewed with Kibana. Is anyone aware of a logstash pattern for IIS?
>
>
> Hope someone can help me with an answer for this.
>
>
>
> Regards Jacob
>



Re: Garbage collector logs long pauses

2014-06-17 Thread Mark Walkom
Upgrade to a newer version of ES, also upgrade java, and if you can,
increase your heap.
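
A sketch of the heap bump (assuming the stock start-up scripts, which honor
the ES_HEAP_SIZE environment variable; with 32G of RAM there is room to go
well past 8G while still leaving plenty for the filesystem cache):

# illustrative value only
export ES_HEAP_SIZE=16g
bin/elasticsearch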

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 June 2014 21:00, Kevin Qi  wrote:

> *Hi,*
> *We are running Elasticsearch 0.90.7 on a Linux server (1 node cluster).*
> *From time to time, Elasticsearch stops responding and the issue looks
> related to the Garbage Collector. The log file is shown below:*
>
> *[2014-06-16 09:35:48,563][WARN ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674153][113273] duration [12.1s], collections
> [1]/[12.3s], total [12.1s]/[17.9h], memory [7.3gb]->[7.2gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [158.2mb]->[95.3mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:35:58,800][INFO ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674154][113274] duration [9.9s], collections
> [1]/[10.2s], total [9.9s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [95.3mb]->[58.6mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:36:11,236][WARN ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674155][113275] duration [12s], collections
> [1]/[12.4s], total [12s]/[17.9h], memory [7.2gb]->[7.3gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [58.6mb]->[138.1mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:36:23,879][WARN ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674156][113276] duration [12.3s], collections
> [1]/[12.6s], total [12.3s]/[17.9h], memory [7.3gb]->[7.2gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [138.1mb]->[113mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:36:34,043][INFO ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674157][113277] duration [9.8s], collections
> [1]/[10.1s], total [9.8s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [113mb]->[79mb]/[665.6mb]}{[Par Survivor Space] [0b]->[0b]/[83.1mb]}{[CMS
> Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:36:46,486][WARN ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674158][113278] duration [12.1s], collections
> [1]/[12.4s], total [12.1s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb],
> all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [79mb]->[107.2mb]/[665.6mb]}{[Par Survivor Space] [0b]->[0b]/[83.1mb]}{[CMS
> Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:36:56,649][INFO ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674159][113279] duration [9.9s], collections
> [1]/[10.1s], total [9.9s]/[18h], memory [7.2gb]->[7.2gb]/[7.9gb], all_pools
> {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [107.2mb]->[68.7mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
> *[2014-06-16 09:37:08,995][WARN ][monitor.jvm  ] [node01]
> [gc][ConcurrentMarkSweep][1674160][113280] duration [12s], collections
> [1]/[12.3s], total [12s]/[18h], memory [7.2gb]->[7.2gb]/[7.9gb], all_pools
> {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space]
> [68.7mb]->[79.7mb]/[665.6mb]}{[Par Survivor Space]
> [0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen]
> [34.7mb]->[34.7mb]/[82mb]}*
>
> *The garbage collector logs long pauses (around 10 seconds). Our system
> has total memory of 32G and we set ES_HEAP_SIZE to 8G. *
> *We are almost sure this issue comes from long GC runs.*
> *What can we do to prevent this behavior and run ES smoothly?*
>
> *Thanks,*
>
> *Kevin*
>


Garbage collector logs long pauses

2014-06-17 Thread Kevin Qi
*Hi,*
*We are running Elasticsearch 0.90.7 on a Linux server (1 node cluster).*
*From time to time, Elasticsearch stops responding and the issue looks 
related to the Garbage Collector. The log file is shown below:*

*[2014-06-16 09:35:48,563][WARN ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674153][113273] duration [12.1s], collections 
[1]/[12.3s], total [12.1s]/[17.9h], memory [7.3gb]->[7.2gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[158.2mb]->[95.3mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:35:58,800][INFO ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674154][113274] duration [9.9s], collections 
[1]/[10.2s], total [9.9s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[95.3mb]->[58.6mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:36:11,236][WARN ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674155][113275] duration [12s], collections 
[1]/[12.4s], total [12s]/[17.9h], memory [7.2gb]->[7.3gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[58.6mb]->[138.1mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:36:23,879][WARN ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674156][113276] duration [12.3s], collections 
[1]/[12.6s], total [12.3s]/[17.9h], memory [7.3gb]->[7.2gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[138.1mb]->[113mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:36:34,043][INFO ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674157][113277] duration [9.8s], collections 
[1]/[10.1s], total [9.8s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[113mb]->[79mb]/[665.6mb]}{[Par Survivor Space] [0b]->[0b]/[83.1mb]}{[CMS 
Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:36:46,486][WARN ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674158][113278] duration [12.1s], collections 
[1]/[12.4s], total [12.1s]/[17.9h], memory [7.2gb]->[7.2gb]/[7.9gb], 
all_pools {[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[79mb]->[107.2mb]/[665.6mb]}{[Par Survivor Space] [0b]->[0b]/[83.1mb]}{[CMS 
Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:36:56,649][INFO ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674159][113279] duration [9.9s], collections 
[1]/[10.1s], total [9.9s]/[18h], memory [7.2gb]->[7.2gb]/[7.9gb], all_pools 
{[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[107.2mb]->[68.7mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*
*[2014-06-16 09:37:08,995][WARN ][monitor.jvm  ] [node01] 
[gc][ConcurrentMarkSweep][1674160][113280] duration [12s], collections 
[1]/[12.3s], total [12s]/[18h], memory [7.2gb]->[7.2gb]/[7.9gb], all_pools 
{[Code Cache] [15.5mb]->[15.5mb]/[48mb]}{[Par Eden Space] 
[68.7mb]->[79.7mb]/[665.6mb]}{[Par Survivor Space] 
[0b]->[0b]/[83.1mb]}{[CMS Old Gen] [7.1gb]->[7.1gb]/[7.1gb]}{[CMS Perm Gen] 
[34.7mb]->[34.7mb]/[82mb]}*

*The garbage collector logs long pauses (around 10 seconds). Our system has 
total memory of 32G and we set ES_HEAP_SIZE to 8G. *
*We are almost sure this issue comes from long GC runs.*
*What can we do to prevent this behavior and run ES smoothly?*

*Thanks,*

*Kevin*



Re: River between embedded Elasticsearch and embedded Neo4j

2014-06-17 Thread Flavio Graf
Hi Jimmy, I'm interested in updating the ES index when Neo4j changes 
(automatically). Did you find a solution to your problem?
Cheers

On Friday, March 7, 2014 11:24:06 PM UTC+1, Jimmy Reeves wrote:
>
> I am currently using a river between Neo4j and ES in my project, 
> Configuring rivers for standalone datasources (Neo4j in my case) is quite 
> clear and easy to use.
> Right now I want to make a portable version of the tool so everyone can 
> set up a system really fast and test the use cases they are interested in.
> So here is a question:
> *-* Is there any way to set up a river between embedded Elasticsearch and 
> embedded Neo4j?
> A workaround like applying changes to ES data after a Neo4j transaction 
> completes could be done here, but river usage looks more graceful imo.
>
> Thanks.
>



ElasticSearch - search statistic - like google analytics

2014-06-17 Thread Jacob Dalgaard
Hello
I am looking into using ElasticSearch as a search engine for one of the 
projects I am working on. There is still one thing which I need to find an 
answer for, and I hope someone in here can help.
The customer wants to be able to see some search statistics, like google 
analytics: most searched words, new search words and so on. 
 
Is there a way to easily set up this type of search statistics? 
My idea is something like ElasticSearch stores search history, about the 
search request made to the REST API. Then my customer can use Kibana or 
some other visual tool to monitor the search history of ElasticSearch.
 
 
Another approach could be to set up LogStash to pick up all IIS log entries 
for search requests, and put them in ElasticSearch. Then they could be 
viewed with Kibana. Is anyone aware of a logstash pattern for IIS?
 
 
Hope someone can help me with an answer for this.
 
 

Regards Jacob



Re: Query Performance

2014-06-17 Thread ravimbhatt
Hello All, 

Any help on this, please? 

Thanks!
Ravi

On Monday, 16 June 2014 12:40:38 UTC+1, ravim...@gmail.com wrote:
>
> Hi All, 
>
> I am trying to improve my ES query performance. The goal is to get 
> response times for the 3 related queries under a second! In my tests I have 
> seen the 90th percentile response time (*took time*) for the 3 combined 
> queries to be ~1.8 seconds. Here are the details: 
>
> *Cluster*: 
> - 5 Machines, 5 Shards, Currently on m3.2xlarge. (Had started with less 
> powerful boxes and went up one by one, started from m3.large)
> - 4 indexes. 
>  - one index with *~90 million* records (total *19.3 GB* on all shards.)
>  - one with *~24 million* (total *6GB* on all shards.)
>  - Other two are in 780K and 340K ( total *160MB* and *190MB*)
> - All *fields* in the larger indexes are *integers*.
> - Record size is small-ish.
> - indexes are *compressed*. 
> - I have given *15 GB to ES* instances. 
> - Indexes are stored on *EBS* volumes. Each instance has *250GB* volume 
> with it. (Keeping SSDs as last resort) 
>
> The indexes are not changing (for now, in future they would change once a 
> day). So no indexing is taking place while we query. *Therefore*, I have 
> tried things like *reducing number of segments* in the two larger 
> indexes. That helped to a point. 
>
> *Querying Technique*:
>
> - use python ES client. 
> - *3 small instances* forking *10 threads* at the same time. 
> - Each thread would fire *3 queries* before reporting a time. 
> - At times there would be *~100 concurrent* queries on the machines; it 
> settles around ~50-60. 
> - I take *'took'* time from ES response to measure times. 
> - I *discard 100 records* before measuring times. 
> - A total of *5000 unique users* are used, for each of whom the 3 ES queries are 
> fired. A total of *4900 users' times* are measured.  
>
> *Observations*:
>
> - RAM is never under stress. Well below 15 GB allotted. 
> - CPU comes under strain, goes upto 85-95 region on all instances during 
> the tests. 
>
> *Queries*: 
>
> *1. On an index with ~24 Million records*: 
>
> res = es.search( index="index1", 
> body={"query":{"bool":{"must":[{"term":{"cid":value}}]}}}, sort=[ 
> "source:desc", "cdate:desc" ], size=100, fields=["wiid"], _source="true")
>
> I parse results of these queries to get certain fields out and pass them on to 
> the 2nd query. Let's call those fields *q1.field1* and *q1.field2*.
>
> *2. On an index with ~90 million records:*
>
> res1 = es.search(index="index2",
> body={"query":{"filtered":{"filter":{"bool":{
> "must":{"terms":{"col_a":q1.field1}},
> "must_not":{"terms":{"col_b":q1.field1}}}}}},
> "aggs":{"i2B":{"terms":{"field":"col_b","size":1000,"shard_size":1,
> "order":{"mss.sum":"desc"}},
> "aggs":{"mss":{"stats":{"script":"ca = _source.col_a;
> index=wiids.indexOf(ca); sval=0; if(index!=-1) sval=svalues.get(index);
> else sval=-1; return _source.col_x*sval;",
> "params":{"wiids":q1.field1,"svalues":q1.field2}}},
> "simSum":{"stats":{"script":"return _source.col_x"}}}}}},
> size=1)
>
> - it uses *filtered query*.
> - uses *2 aggregations*
> - uses *script in aggregation*.  
> - use *shard_size* 
>
> Again, I parse results and get a field out. Let's call that field: 
> *q2.field1*
>
> 3. *On an index with ~340K records:*
>
>  res2 = es.search(index="index3", body= { "query" : { "filtered" : {
> "query": { "terms": { "wiid": q2.field1 } }, "filter" : { "bool" : {
> "must" : [ { "range" : { "isInRange": { "gte" : 10 } } }, { "term" : {
> "isCondA" : "false" } }, { "term" : { "isCondB" : "false" } }, { "term" : {
> "isCondC" : "false" } } ] } } } } }, size=1000)
>
> Please let me know if any other information would help you help me. 
>
> Query 2 above is doing aggregations and using a custom script. This is 
> where times reach a few seconds, like 2-3 seconds or even 4+ seconds at 
> times. 
>
> I can move to a high-end CPU machine and maybe the performance would 
> improve. I wanted to check if there is anything else that I am missing. 
>
> Thanks!
> Ravi
>
>



Re: better places to store es.nodes and es.port in ES Hive integration?

2014-06-17 Thread Jinyuan Zhou
Sure, I was able to run the following command against my remote es cluster:
hive -i init.hive -f search.hql

Below are the contents of init.hive, search.hql, and the data file in hdfs at
/user/cloudera/hivework/foobar/foobar.data.

I replaced the value for es.nodes with a fake name. Other than that, it should
run without problems. I am using the feature called 'dynamic/multi resource
writes'. It works in this example, but when I also add the 'es.mapping.id' =
'id' setting, I get the following error:




Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
Unexpected character ('"' (code 34)): was expecting comma to separate
OBJECT entries at [Source: [B@7be1d686; line: 1, column: 53]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)


--- init.hive ---

set es.nodes=my.remote.escluster;
set es.port=9200;
set es.index.auto.create=yes;
set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;
set mapred.map.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.reduce.tasks.speculative.execution=false;
add jar
/home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;

--- search.hql ---

use search;
DROP TABLE IF EXISTS foo;
CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/hivework/foobar';
select * from foo;
DROP TABLE IF EXISTS es_foo;
CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');

INSERT OVERWRITE TABLE es_foo  SELECT * FROM foo;

--- /user/cloudera/hivework/foobar/foobar.data ---

1, bar1, first_bar
2, bar2, first_bar
3, foo_bar_1, second_bar
4, foo_bar_12, second_bar




Jinyuan (Jack) Zhou


On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau  wrote:

> Thanks for sharing - can you also give an example of the table
> initialization in init.hive vs myscript.hql?
>
> Cheers!
>
>
> On 6/16/14 11:19 PM, Jinyuan Zhou wrote:
>
>> Just sharing a solution I learned on the hive side.
>>
>> hive cli has an -i option that takes a file of hive commands to
>> initialize the session,
>> so I can put a list of set commands as well as the add jar ... command in one
>> file, say init.hive,
>> then run the cli like this: hive -i init.hive -f myscript.hql.  Note the table
>> creation hql inside myscript.hql doesn't have to
>> set es.* properties as long as they appear in the init.hive file.  This solves
>> my problem.
>> Thanks,
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou > > wrote:
>>
>> Thanks Costin,
>> I am aiming at not modifying the existing hadoop cluster and hive
>> installation, and also at modularizing some common es.*
>> properties in a separate common place. I know the first goal can be
>> achieved with the hive cli --auxpath option and
>> the hive table's TBLPROPERTIES. For the second goal, I am able to move
>> some es.* settings from the TBLPROPERTIES
>> declaration to hive's set statements. For example, I can put
>>
>> set es.nodes=my.domain.com
>>
>>
>> in the same hql file and then skip the es.nodes setting in TBLPROPERTIES in
>> the external table declarations in the SAME
>> hql. But I wish I could move the set statement into a separate file. I
>> now realize this is rather a hive question.
>> Regards,
>> Jack
>>
>>
>> On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau > > wrote:
>>
>> Could you please raise an issue with some type of example? Due to
>> the way Hadoop (and Hive) works,
>> things tend to be tricky in terms of configuring a job.
>>
>> The configuration needs to be created before a job is submitted
>> which in practice means "dynamic configurations"
>> are basically impossible (this also has some security
>> implications which are simply avoided this way).
>> Thus either one specifies the configuration manually or loads a
>> known location file (hive-site.xml,
>> core-site.xml...)
>> upfront, before the job is submitted.
>> This means when dealing with Hive, Pig, Cascading, etc... unless
>> one adds a pre-processor to the job content
>> (script, flow, etc...)
>> by the time es-hadoop kicks in, the job is already running and
>> thus its changes discarded.
>>
>> Cheers,
>>
>> On 6/14/14 1:57 AM, Jinyuan Zhou wrote:
>>
>> Hi,
>> I am playing with elasticsearch and hive integration. The
>> documentation says
>> to set configuration like es.nodes and es.port in
>> TBLPROPERTIES. It works,
>> but it can cause a lot of redundant code. If I have ten data sets
>> to index to the same es cluster,
>> I would have to repeat this information

Re: exclude some documents (and category filter combination) for some queries

2014-06-17 Thread Srinivasan Ramaswamy
Hi Ivan

Thanks for your reply. Yeah, I do understand that currently elasticsearch
returns the whole nested doc.
Can you help me get the negative query with multiple categories
working?

Thanks
Srini


On Fri, Jun 13, 2014 at 10:58 AM, Ivan Brusic  wrote:

> Currently not possible. Elasticsearch will return all the nested documents
> as long as one of the nested documents satisfies the query.
>
> https://github.com/elasticsearch/elasticsearch/issues/3022
>
> The issue is my personal #1 requested feature. Frustrating considering
> there has been a working implementation since version 0.90.5; 1.0, 1.1, 1.2
> and still nothing.
>
> --
> Ivan
>
>
>
>
> On Thu, Jun 12, 2014 at 2:17 PM, Srinivasan Ramaswamy 
> wrote:
>
>> any thoughts anyone ?
>>
>>
>> On Wednesday, June 11, 2014 11:15:18 PM UTC-7, Srinivasan Ramaswamy wrote:
>>>
>>> I would like to exclude some documents belonging to certain categories
>>> from the results, but only for certain search queries. I have an ES client layer
>>> where I am thinking of implementing this logic as a "not" filter depending
>>> on the search query. Let me give an example.
>>>
>>> sample index
>>>
>>> designId: 100
>>> tags: ["dog", "cute"]
>>> caption : cute dog in the garden
>>> products : [ { productId: "200", category: 1}, {productId: "201",
>>> category: 2} ]
>>>
>>> designId: 101
>>> tags: ["brown", "dog"]
>>> caption :  little brown dog
>>> products : [ {productId: "202", category: 3} ]
>>>
>>> designId: 102
>>> tags: ["black", "dog"]
>>> caption :  little black dog
>>> products : [ { productId: "202", category: 4}, {productId: "203",
>>> category: 5} ]
>>>
>>> products is a nested field inside each design.
>>>
>>> I would like to write a query to get all matches for "dog", (not for
>>> other keywords) but filter out few categories from the result. As ES
>>> returns the whole nested document even if only one nested document matches
>>> the query, my expected result is
>>>
>>> designId: 100
>>> tags: ["dog", "cute"]
>>> caption : cute dog in the garden
>>> products : [ { productId: "200", category: 1}, {productId: "201",
>>> category: 2} ]
>>>
>>> designId: 102
>>> tags: ["black", "dog"]
>>> caption :  little black dog
>>> products : [ { productId: "202", category: 4}, {productId: "203",
>>> category: 5} ]
>>>  Here is the query I tried, but it doesn't work. Can anyone help me
>>> point out the mistake?
>>>
>>> GET /_search/
>>> {
>>>"query": {
>>>   "filtered": {
>>>  "filter": {
>>>   "and": [
>>>  {
>>>  "not": {
>>>"term": {
>>>   "category": 1
>>>}
>>>  }
>>>  },
>>>  {
>>>  "not": {
>>>"term": {
>>>   "category": 3
>>>}
>>>  }
>>>  }
>>>   ]
>>>
>>>  },
>>>  "query": {
>>> "multi_match": {
>>>"query": "dog",
>>>"fields": [
>>>   "tags",
>>>   "caption"
>>>],
>>>"minimum_should_match": "50%"
>>> }
>>>  }
>>>   }
>>>}
>>> }
>>>
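For reference, a minimal sketch of one way to express the exclusion, assuming
products is mapped as nested and the intent is to keep designs that still have
at least one product outside the excluded categories (ES 1.x filtered-query
syntax; field names taken from the example above):

GET /_search
{
   "query": {
      "filtered": {
         "query": {
            "multi_match": {
               "query": "dog",
               "fields": ["tags", "caption"],
               "minimum_should_match": "50%"
            }
         },
         "filter": {
            "nested": {
               "path": "products",
               "filter": {
                  "not": {
                     "terms": { "products.category": [1, 3] }
                  }
               }
            }
         }
      }
   }
}

Note that, as Ivan points out, the whole products array is still returned for
each matching design; the nested filter only controls which designs match.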

Re: Update single field of a document

2014-06-17 Thread Aditya Tripathi
Thanks Mark,
I will start a new thread on this with a better description of the problem.


On Sun, Jun 15, 2014 at 3:30 PM, Mark Walkom 
wrote:

> The thread you are quoting here is nearly 4 years old; it might be better
> if you start a new thread, as it's possible the info contained in this one
> will be out of date.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 15 June 2014 19:41, Aditya  wrote:
>
>> Hi,
>> Wanted to know if ES has a solution for updateable fields.
>>
>> I did google a little but could not find anything recent on this. (Though
>> the latest in ES is pretty exciting :) )
>>
>> I had two questions if you have thought about them already:
>>
>> 1) Can we fully implement an updateable field using Lucene Codecs?
>> Without getting into details: We tried this by writing a custom postings
>> format and put the field in a key-value store. Our postings consumer, would
>> write directly to the key-value store. We could write directly to the store
>> without buffering anything in RAM as Lucene's Indexing chain invokes the
>> PerFieldPostingsFormat only at flush time - Ref:
>> FreqProxTermsWriterPerField flush method. However, Lucene also invokes the
>> custom PostingsConsumer/TermsConsumer at merge time. And both merge and
>> flush use the same methods of PostingsConumer and TermsConsumer
>> (startDoc,startTerm,finishDoc,finishTerm etc). And since in these methods
>> we did not buffer anything and wrote directly to the key-value store, we
>> wrote the new merged state also to the store directly. But a merged segment
>> is checkpointed and not yet committed (or fsynced), and we got into
>> inconsistencies with respect to copying data to other nodes; search would
>> also fail if the IndexReader did not open the new merged segment.
>>
>> I tried rectifying this problem by putting the new merged info (document
>> number remapping) into an in-memory structure (searchable by PartialProducer),
>> but we did not have any good event to flush this in-memory merge info to
>> the key-value store, so we did it at the next flush. However, Lucene can
>> commit a checkpointed merged segment without flushing anything. So, when
>> Lucene committed the merged segment without passing any signal to the
>> custom PostingsFormat, we got into inconsistency again.
>> There were more problems, like: how can you update a document in the same
>> segment, given that fields with a custom PostingsFormat are available for
>> update only after the segment is flushed.
>>
>> I am trying something more by using a DirectoryWrapper and a
>> SegmentInfosFormat, but have some doubts on the whole approach of providing
>> updateable fields using codecs.
>>
>> 2) I am sure you are aware of this patch -
>> https://issues.apache.org/jira/browse/LUCENE-5189 . Updateable fields
>> for NumericDocValue fields. We haven't tried this patch but just wanted to
>> know if ES has considered it to provide numeric updateable fields.
>>
>>
>> On Tuesday, 16 August 2011 06:38:24 UTC+5:30, kimchy wrote:
>>>
>>> Otis, are you referring to this:
>>> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html?
>>> And you think it's the same..., really? Are you sure you understand what it means to
>>> provide updatable fields, and then taking them to a distributed system?
>>> What I would love is to really think about "comparable" "features" before
>>> throwing them out here (similar to the "update processor" suggestion for
>>> notifications), with or without smilies.
>>>
>>> On Tue, Aug 16, 2011 at 3:19 AM, Otis Gospodnetic >> > wrote:
>>>
 Andy,

 In Solr land, ExternalFileField is designed for your use case (see
 http://search-lucene.com/?q=ExternalFileField )
 I *think* there is nothing like that in ES, but I'd love for somebody
 to point out that I'm wrong about this! :)

 Otis
 --
 Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html


 On Aug 15, 12:38 am, Andy  wrote:
 > I vote for this feature as well.
 >
 > I have a "popularity" field that holds the number of user votes a
 > document has received. I use it to influence result ranking. It is
 > frequently updated. Right now, every time a user votes on a document
 > I'd need to reindex the entire document, which is obviously very
 > inefficient.
 >
 > It'd be great to have a way to update certain fields without
 > reindexing the entire document. Solr has an ExternalFileField field
 > type for this purpose but it's not very user friendly.
 >
 > Don't know if it's possible to implement such an "update certain field
 > without reindexing the whole document" feature in ES but if it's
 > possible it'd be very useful.
 >
 > On Aug 13, 4:56 pm, Ridvan Gyundogan  wrote:
 >
 >
 >
 >
 >
 >
 >
 > > To be more concrete this is my use case, o

Re: Swap indexes?

2014-06-17 Thread Mark Walkom
Instance as in cluster, or node?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 June 2014 18:31, Lee Gee  wrote:

> Is it possible to have one ES instance create an index and then have a
> second instance use that created index, without downtime?
>
> tia
> lee
>


Re: Sorting on timestamps from multiple fields

2014-06-17 Thread Jurian Sluiman
Thanks for the response :)

I was testing it out with the _timestamp field, which I need to set 
manually for each item, but copy_to seems even better. Thanks for the 
insights!
--
Jurian

On Thursday, June 12, 2014 5:28:49 PM UTC+2, Jörg Prante wrote:
>
> If you have two (or more) date fields to sort on, look at the "copy_to" 
> mapping feature to copy them over to a third field, e.g. "sort_date". So you 
> have a single field you can happily sort on, without having to change 
> fields in the source.
>
> The same method works for tag/category fields in different indexes that are 
> meant for facets that can span more than one index.
>
> Jörg
>
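For reference, a minimal sketch of the copy_to mapping Jörg describes (the
index, type, and field names here are assumptions for illustration):

curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
  "mytype" : {
    "properties" : {
      "created_date" : { "type" : "date", "copy_to" : "sort_date" },
      "updated_date" : { "type" : "date", "copy_to" : "sort_date" },
      "sort_date" :    { "type" : "date" }
    }
  }
}'

A search can then sort on the single combined field, e.g.
"sort" : [ { "sort_date" : "desc" } ], without changing the source documents.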



Re: After upgrade to elastic search 1.2.1 getting org.elasticsearch.transport.RemoteTransportException: Failed to deserialize response of type [org.elasticsearch.action.admin.cluster.node.info.NodesIn

2014-06-17 Thread Martin Forssen
I have also encountered this, did the debugging and created an issue: 
https://github.com/elasticsearch/elasticsearch/issues/6325



Swap indexes?

2014-06-17 Thread Lee Gee
Is it possible to have one ES instance create an index and then have a 
second instance use that created index, without downtime?

tia
lee



version 1.2.1 has_child query with function_score can not access child's numeric field value

2014-06-17 Thread fiefdx yang


curl -XPOST 'localhost:9200/products/' -d '{
"index" : {
"number_of_shards" : 4,
"number_of_replicas" : 1
}
}'
echo ""

curl -XPOST localhost:9200/products/product/_mapping -d '{
    "product" : {
        "properties" : {
            "property1" : { "type" : "string" }
        }
    }
}'
echo ""

curl -XPOST localhost:9200/products/offer/_mapping -d '{
    "offer" : {
        "_parent" : { "type" : "product" },
        "properties" : {
            "color" : { "type" : "string" },
            "size" : { "type" : "integer" },
            "price" : { "type" : "float" }
        }
    }
}'
echo ""

curl -XPUT localhost:9200/products/product/1 -d'{
"property1": "value1"
}'
echo ""

curl -XPUT localhost:9200/products/product/2 -d'{
"property1": "value2"
}'
echo ""

curl -XPOST localhost:9200/products/offer/1?parent=1 -d '{
"color": "blue",
"size": 1,
"price": 99.4
}'
echo ""

curl -XPOST localhost:9200/products/offer/2?parent=1 -d '{
"color": "red",
"size": 2,
"price": 100.5
}'
echo ""

curl -XPOST localhost:9200/products/offer/3?parent=2 -d '{
"color": "blue",
"size": 3,
"price": 100.7
}'
echo ""

Search script:
curl -s -XPOST 'localhost:9200/products/product,offer/_search?pretty=true' 
-d '{
"query" : {
"has_child" : {
"type" : "offer",
"score_mode" : "max",
"query" : {
"function_score" : {
"boost_mode" : "replace",
"query" : {
"bool" : {
"must" : [
{ "term" : { "color" : "blue" } }
]
}
},
"script_score" : {
"script" : "doc['offer.price'].value"
}
}
}
}
}
}'
echo ""

I get this exception:
nested: PropertyAccessException[[Error: could not access: offer; in class: 
org.elasticsearch.search.lookup.DocLookup]\n[Near : {... 
doc[offer.price].value }

Can function_score get child's numeric field as parent's score like version 
0.90.x custom_score did?
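One guess, offered as an assumption rather than a confirmed fix: 1.x dropped
support for type-prefixed field names in lookups, so the script may work when
written against the bare field name, e.g.:

"script_score" : {
    "script" : "doc['price'].value"
}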



No handler found for uri when creating a mapping

2014-06-17 Thread Abhishek Mukherjee
Hi,

I am following the ES Definitive guide. I am trying to create a mapping for 
an index and type as follows.

curl -XPUT '192.168.1.103:9200/nxtxnlogs/transaction/' -d '{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "txn_id" : { "type" : "long" },
        "logged_at" : { "type" : "string" },
        "key_name" : { "type" : "string" },
        "des" : { "type" : "string", "index" : "not_analyzed" },
        "params" : { "type" : "string" }
      }
    }
  }
}';

But I get this error.

No handler found for uri [/nxtxnlogs/transaction/] and method [PUT].

Also, how do I create an empty index?

I apologise if the questions are basic, but I can't find this in the 
documentation.

Regards
Abhishek
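
For reference, a minimal sketch of the endpoints that do accept these requests
in ES 1.x (index and type names reused from the question; the request above
fails because PUT /index/type/ with no document id is not a registered
endpoint):

# Create an empty index:
curl -XPUT '192.168.1.103:9200/nxtxnlogs'

# Create the index and the mapping in one request (the URL stops at the
# index; the mappings live in the body):
curl -XPUT '192.168.1.103:9200/nxtxnlogs' -d '{
  "mappings" : {
    "transaction" : {
      "properties" : {
        "txn_id" : { "type" : "long" }
      }
    }
  }
}'

# Or add the mapping to an already-existing index:
curl -XPUT '192.168.1.103:9200/nxtxnlogs/_mapping/transaction' -d '{
  "transaction" : {
    "properties" : {
      "txn_id" : { "type" : "long" }
    }
  }
}'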



Re: Accessing Search Templates via Rest

2014-06-17 Thread Sebastian Gräser
Thank you very much : ) good to know!
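
For reference, even though stored templates cannot yet be listed over REST,
they can still be used via the _search/template endpoint (1.1+); a minimal
sketch assuming a file-based template named mytemplate under config/scripts,
with a hypothetical query_string parameter:

curl -XGET 'localhost:9200/_search/template' -d '{
  "template" : { "file" : "mytemplate" },
  "params" : { "query_string" : "some words" }
}'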

On Monday, June 16, 2014 15:46:43 UTC+2, Alexander Reelsen wrote:
>
> Hey,
>
> no, this is not yet possible, but this will be added sooner or later as 
> the search template API should behave like any other API.
>
>
> --Alex
>
>
> On Fri, Jun 13, 2014 at 9:51 AM, Sebastian Gräser  > wrote:
>
>> so I guess it's not possible?
>>
>> On Tuesday, June 10, 2014 16:58:31 UTC+2, Sebastian Gräser wrote:
>>
>>> Hello,
>>>
>>> maybe someone can help me. Is there a way to get the available search 
>>> templates via rest api? havent found a way yet, hope you can help me.
>>>
>>> Best regards
>>> Sebastian
>>>
>
>



Re: Kibana chart data view understanding

2014-06-17 Thread Mark Walkom
Where have you gotten so far with KB?

Try this;

   1. Create a new blank dashboard from the default homepage
   2. Configure that (top right) to point to the index and your timestamp
   field, then save that
   3. On the main dashboard page add a new row, then save
   4. Add a new panel

This is where things can get tricky as you will have to figure out what
panel type to use, but I think you may want to start with a histogram.
Play around from there. It is a bit tough when you start, but you will pick
it up pretty easily!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 June 2014 14:31,  wrote:

> I have a problem trying to visualise the data below in Kibana.
> Each document describes a test run audit entry with passing, failing and
> pending tests along with a timestamp, project identifier and host name.
> The curls below set up four documents, and they are correctly returned if I
> do http://localhost:9200/someaudits/_search?pretty=true
>
> I would like to use kibana to display a single graph with:
> The X axis using @timestamp
> The Y axis showing four separate lines for passed, failed, pending and
> (passed + failed + pending)
> Each document (and its timestamp value) should contain a tag that
> references the document itself.
> Documents and their pass/fail/pending values should not be totalised, so
> they remain distinct on the graph.
>
> However, the sticking point is that I cannot see what to click (and in
> what order) to set up the graph view from a blank Kibana instance located at
> http://localhost:9200/_plugin/kibana/
> I've read the Kibana-related tutorials but I'm just not grokking it.
>
>
>
> # Delete the whole index:
> curl -XDELETE http://localhost:9200/someaudits
>
> # Create the index:
> curl -XPOST 'localhost:9200/someaudits/'
>
> # Use this mapping:
> curl -XPUT http://localhost:9200/someaudits/testaudit/_mapping -d '
> {
>   "testaudit" : {
>"properties" : {
>"@timestamp" : {"format" : "dateOptionalTime", "type" : "date" },
> "project" : {"type": "string" },
> "host" : {"type": "string" },
> "passed" : { "type" : "integer" },
> "failed" : { "type" : "integer" },
> "pending" : { "type" : "integer" }
>}
>   }
>  }
> '
>
> # Add some data:
> curl -XPUT 'http://localhost:9200/someaudits/testaudit/1' -d '
> {
> "@timestamp" : "2014-06-17T02:10:08.593Z",
> "project" : "test",
> "host" : "mymachine",
> "passed" : 10,
> "failed" : 20,
> "pending" : 1
> }'
>
> curl -XPUT 'http://localhost:9200/someaudits/testaudit/2' -d '
> {
> "@timestamp" : "2014-06-17T02:15:08.593Z",
> "project" : "test",
> "host" : "mymachine",
> "passed" : 0,
> "failed" : 30,
> "pending" : 0
> }'
>
> curl -XPUT 'http://localhost:9200/someaudits/testaudit/3' -d '
> {
> "@timestamp" : "2014-06-17T02:20:08.593Z",
> "project" : "test",
> "host" : "mymachine",
> "passed" : 50,
> "failed" : 0,
> "pending" : 1
> }'
>
> curl -XPUT 'http://localhost:9200/someaudits/testaudit/4' -d '
> {
> "@timestamp" : "2014-06-17T02:10:18.593Z",
> "project" : "another test",
> "host" : "mymachine",
> "passed" : 0,
> "failed" : 1,
> "pending" : 0
> }'
>


Re: Percolation limits

2014-06-17 Thread Maciej Dziardziel
Thanks for the reply. I did some early testing and I am getting about 0.7-1.4s 
to get results (that's without any filtering yet), which is still within an 
acceptable range for me.
I'd still like to hear about people's experience with it. It seems this is 
a very rarely used feature.



On Monday, June 16, 2014 2:19:24 PM UTC+1, Luca Cavanna wrote:
>
> Hi Maciej,
> what you describe doesn't sound insane, just make sure you use proper 
> filtering as much as you can to limit the number of queries you execute 
> when percolating each document.
> Also, with the percolator available since 1.0 you can scale out just by 
> adding more nodes and have the percolator queries distributed over multiple 
> shards. That means that if you were to reach the limit of a single shard 
> you could always scale out.
>
> On Friday, June 13, 2014 5:15:05 PM UTC+2, Maciej Dziardziel wrote:
>>
>> Hi
>>
>> I wanted to ask those who use percolation: how many queries are you 
>> percolating?
>>
>> I need to set up some equivalent of percolation for about 100k queries. 
>> With some filtering,
>> probably up to 10k would actually have to be checked for each new document.
>> Is the idea of using ES percolation for that insane?
>>
>> Thanks
>> Maciej Dziardziel
>>
>
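
For reference, a minimal sketch of the filtering Luca describes, using the
1.x percolator API (the index name, document type, metadata field, and values
are assumptions for illustration):

# Register a percolator query with a metadata field for filtering:
curl -XPUT 'localhost:9200/myindex/.percolator/1' -d '{
  "query" : { "match" : { "body" : "elasticsearch" } },
  "topic" : "tech"
}'

# Percolate a document, only running queries whose metadata matches:
curl -XGET 'localhost:9200/myindex/mytype/_percolate' -d '{
  "doc" : { "body" : "a document about elasticsearch percolation" },
  "filter" : { "term" : { "topic" : "tech" } }
}'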



Re: Creating a browse interface from ES

2014-06-17 Thread joergpra...@gmail.com
If you can use the sort key of the term (internal Java collation key or ICU
collation key) instead of an absolute position number, there is no longer a
need to reindex. One advantage is that you can adjust the sort key to the
requirements (in Germany we have complex sort requirements that are not
compatible with Unicode canonical sort order).

One challenge left is creating the frequency count per term. In "register
search" each term in the result list should be paired with an occurrence
count (or even a prefix count). This can be achieved by iterating over the
result page (e.g. 20 entries) and executing a count query over the term (or
a prefix query for the count):

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html

The count API is very fast since no docs are returned.
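
A minimal sketch of such a count query (the index and field names are
assumptions for illustration):

curl -XGET 'localhost:9200/myindex/_count' -d '{
  "query" : { "prefix" : { "author_sort" : "smith" } }
}'

The response carries only the count and shard information, no hits.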

Sure, there are facets/aggregations but they have a slight disadvantage:
they do not return exact counts, only an estimated count. For "register
search" you need absolutely exact counts.

Jörg


On Tue, Jun 17, 2014 at 7:28 AM, Robin Sheat  wrote:

> joergpra...@gmail.com wrote on Mon 16-06-2014 at 13:12 [+0200]:
>
>
> > This is how I implement "register search"
>
> This is interesting. It could work for me.
>
> Though, I'm not sure I totally understand it. To find, say "Smith", I'd
> search for it, get its index, and then use the from/size stuff to bring
> up the list in that area. Is that essentially what you're using?
>
> If so, that seems like what I need. The only issue is that it'll require
> a total reindex every time something is added. But, I don't see a way
> around that even with some other ideas I'm exploring.
>
> --
> Robin Sheat
> Catalyst IT Ltd.
> ✆ +64 4 803 2204
> GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF
>



Re: Elasticsearch field mapping, dynamic_templates

2014-06-17 Thread sirkubax


Hi Alex,

That's more or less what I did:

 curl -XGET localhost:9200/_template?pretty > template_all

 edit template_all

 and put it back:
 curl -XPUT localhost:9200/_template/* -d @template_all

My ES is 1.0.1; I've seen that there is a major change in templates in ES
1.2. Do you think my task could be achieved faster? I had to dump the config,
edit it, and put it back. I'd wish to upload only the "testdate"
dynamic_template in one step.

The file:

cat template_all
{
"template" : "logstash-*",
"settings" : {
  "index.analysis.analyzer.default.stopwords" : "_none_",
  "index.refresh_interval" : "5s",
  "index.analysis.analyzer.default.type" : "standard"
},
"mappings" : {
  "_default_" : {
"dynamic_templates" : [
 { "testdate": {
   "match":  "testdate*",
   "mapping": {
"type":   "date",
   "format" : "-MM-dd HH:mm:ss.SS"
   }
}
  },
  {
  "string_fields" : {
"mapping" : {
  "type" : "multi_field",
  "fields" : {
"raw" : {
  "index" : "not_analyzed",
  "ignore_above" : 256,
  "type" : "string"
},
"{name}" : {
  "index" : "analyzed",
  "omit_norms" : true,
  "type" : "string"
}
  }
},
"match_mapping_type" : "string",
"match" : "*"
  }
} ],
"properties" : {
  "geoip" : {
"dynamic" : true,
"path" : "full",
"properties" : {
  "location" : {
"type" : "geo_point"
  }
},
"type" : "object"
  },
  "@version" : {
"index" : "not_analyzed",
"type" : "string"
  }
},
"_all" : {
  "enabled" : true
}
  }
}
}
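
One way the "testdate" part might be uploaded in one step, sketched under the
assumption that a second, higher-order template matching the same pattern is
merged on top of the stock one (worth verifying on 1.0.1, in particular
whether dynamic_templates arrays are merged rather than replaced):

curl -XPUT 'localhost:9200/_template/testdate_extra' -d '{
  "template" : "logstash-*",
  "order" : 1,
  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [
        { "testdate" : {
            "match" : "testdate*",
            "mapping" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss.SS" }
        } }
      ]
    }
  }
}'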



Re: Elastic Search and consistency

2014-06-17 Thread shikhar
On Fri, Jun 13, 2014 at 12:11 PM, shikhar  wrote:

> I take this back; I understand the ES model better now. So although the
> write-consistency-level check is only applied before the write is about
> to be issued, with sync replication the client can only get an ack if it
> succeeded on the primary shard as well as all replicas (as per the same
> cluster state that the check is performed on). In case it fails on some
> replica(s), the operation would be retried (together with the write-
> consistency-level check, using a possibly-updated cluster state).
>

FWIW I'm really not sure anymore. TransportShardReplicationOperationAction,
where this stuff is happening, has a bunch of logic in performReplicas(..)
where it decides to take updated cluster state into account, and there seem
to be exceptions for certain kinds of failures being tolerated.

Seems like this would be so much more straightforward if a write were to be
fanned out and then blocked, up to a maximum of the timeout, while checking
that the required number of replicas succeeded (with success on the primary
being required).
