Frequency of significant terms in documents matching a query

2014-12-14 Thread Graeme Pietersz
I understand how to use aggregations to get significant terms, with counts 
of the number of documents in which they occur.

I would also like to count the number of times these terms occur 
across all documents. I can use term vectors to count how often a term 
occurs in a single document, but I cannot see how to get the sum of 
occurrences across all documents matching a query, sorted by the number of 
occurrences - and I really need this for the significant terms 
(otherwise I will just get a list of common words).
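As far as I know there is no single API for this in ES 1.x; one workaround is to fetch term vectors for each matching document and sum the frequencies client-side for the significant terms already obtained from the aggregation. A minimal sketch (the response shape mirrors the `_termvectors` API; the documents, field name, and terms are made up for illustration):

```python
from collections import Counter

def sum_term_frequencies(termvector_responses, field, terms_of_interest):
    """Sum per-document term frequencies across all matching documents.

    termvector_responses: list of dicts shaped like ES _termvectors
    responses; terms_of_interest: e.g. the significant terms already
    obtained from an aggregation.
    """
    totals = Counter()
    for resp in termvector_responses:
        terms = resp["term_vectors"][field]["terms"]
        for t in terms_of_interest:
            if t in terms:
                totals[t] += terms[t]["term_freq"]
    # most_common() sorts by total occurrences, descending.
    return totals.most_common()

# Hypothetical term-vector responses for two matching documents:
docs = [
    {"term_vectors": {"body": {"terms": {"kernel": {"term_freq": 3},
                                         "panic": {"term_freq": 1}}}}},
    {"term_vectors": {"body": {"terms": {"kernel": {"term_freq": 2}}}}},
]
print(sum_term_frequencies(docs, "body", ["kernel", "panic"]))
# → [('kernel', 5), ('panic', 1)]
```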

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d98a16b6-1bca-4a3e-a99d-52391976c371%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


AWS machine for ES master

2014-12-14 Thread Yoav Melamed
Hello,

I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
Can I use smaller machines for the masters?
What would be enough?



Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread David Pilato
The implication is the memory that needs to be allocated on each shard.


David

 Le 14 déc. 2014 à 05:46, Ron Sher ron.s...@gmail.com a écrit :
 
 Again, why not use a very large count size? What are the implications of 
 using a very large count?
 Regarding performance - it seems doing one request with a very large count 
 performs better than using scan/scroll (with a count of 100 using 32 shards).
 
 On Wednesday, December 10, 2014 10:53:50 PM UTC+2, David Pilato wrote:
 No I did not say that. Or I did not mean that. Sorry if it was unclear.
 I said: don’t use large sizes:
 
 Never use size:1000 or from:1000. 
 
 
 You should read this: 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
 Le 10 déc. 2014 à 21:16, Ron Sher ron@gmail.com a écrit :
 
 So you're saying there's no impact on elasticsearch if I issue a large 
 size? 
 If that's the case then why shouldn't I just call size of 1M if I want to 
 make sure I get everything?
 
 On Wednesday, December 10, 2014 8:22:47 PM UTC+2, David Pilato wrote:
 Scan/scroll is the best option to extract a huge amount of data.
 Never use size:1000 or from:1000. 
 
 It's not realtime because you basically scroll over a given set of 
 segments and all new changes that will come in new segments won't be taken 
 into account during the scroll.
 Which is good because you won't get inconsistent results.
 
 About size, I'd try and test. It depends on your doc size, I believe.
 Try with 1 and see how it goes when you increase it. You may 
 discover that getting 10*1 docs is the same as 1*10. :)
 
 Best
 
 David
 
 Le 10 déc. 2014 à 19:09, Ron Sher ron@gmail.com a écrit :
 
 Hi,
 
 I was wondering about best practices to get all data according to some 
 filters.
 The options as I see them are:
 
 - Use a very big size that will return all accounts, i.e. use some value 
 like 1m to make sure I get everything back (even if I need just a few 
 hundred or tens of documents). This is the quickest way, development-wise.
 - Use paging - using size and from. This requires looping over the result, 
 and performance gets worse as we advance to later pages. Also, we need to 
 use preference if we want consistent results across pages. Also, it's not 
 clear what the recommended size is for each page.
 - Use scan/scroll - this gives consistent paging but also has several 
 drawbacks: if I use search_type=scan then it can't be sorted; using 
 scan/scroll is (maybe) less performant than paging (the documentation says 
 it's not for realtime use); again, it's not clear which size is recommended.
 
 So you see - many options and it's not clear which path to take.
 
 What do you think?
 
 Thanks,
 Ron
 


UI/Plugin to visualize output of Scoring Explain flat

2014-12-14 Thread vineeth mohan
Hi ,

I remember seeing a UI, from a plugin or otherwise, which visualizes the
output of the explain API for scoring as a neat d3 visualization of a
collapsible tree - http://bl.ocks.org/mbostock/4339083

If anyone remembers the link, please reply to this mail.

Thanks
  Vineeth



querying array of strings (multiword) with AND operator

2014-12-14 Thread Mathew Bolek
Hi,

I really don't know what's wrong, but I seem to be unable to find a way of 
querying array elements that contain multiple words with the AND operator.

ES version: 1.3.*,
only the default standard analyser in use

A dummy document:

POST /dummy/location
{
  "locationArray" : ["United Kindgom", "London"],
  "location" : "United Kingdom"
}

Its mapping:

GET /dummy/_mapping
{
   "dummy": {
      "mappings": {
         "location": {
            "properties": {
               "locationArray": {
                  "type": "string"
               },
               "location": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Query that matches location field OK:

GET /dummy/_search
{
  "query": {
    "match": {
      "location" : {
        "query" : "United Kingdom",
        "operator" : "AND"
      }
    }
  }
}

*Problematic query that returns no results trying to match the locationArray 
field:*

GET /dummy/_search
{
  "query": {
    "match": {
      "locationArray" : {
        "query" : "United Kingdom",
        "operator" : "AND"
      }
    }
  }
}

I'd really appreciate a hint - there must be something obvious I'm missing 
/ don't know about when querying that array field.

Cheers






[hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread CAI Longqi
 

Hello, I’m using elasticsearch-hadoop-2.0.2.jar and am running into this problem:


Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat
	at com.clqb.app.ElasticSearch.run(ElasticSearch.java:46)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.clqb.app.ElasticSearch.main(ElasticSearch.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsOutputFormat
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 9 more


Here’s my program: 


public class ElasticSearch extends Configured implements Tool {

    public static class AwesomeMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(NullWritable.get(), XmlUtils.xmlTextToMapWritable(value)); // XmlUtils is not shown here
        }
    }

    public static class AwesomeReducer extends Reducer<NullWritable, MapWritable, NullWritable, NullWritable> {
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.set("xmlinput.start", "<page>");
        conf.set("xmlinput.end", "</page>");
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "radio/artists");

        Job job = Job.getInstance(conf);
        job.setJarByClass(ElasticSearch.class);
        job.setInputFormatClass(XmlInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setMapperClass(AwesomeMapper.class);
        job.setReducerClass(AwesomeReducer.class);

        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ElasticSearch(), args);
        System.exit(exitCode);
    }
}


P.S. *I also made sure that I included `elasticsearch-hadoop-2.0.2.jar` in 
my `-libjars`.* Any suggestions?


Thanks



Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread Nikolas Everett
Search consumes O(offset + size) memory and O(ln(offset + size)*(offset +
size)) CPU. Scan/scroll has higher overhead but stays O(size) the whole
time. I don't know where the break-even point is.

The other thing is that scroll provides a consistent snapshot. That means
it consumes resources, so you shouldn't expose it to end users, but it
won't miss results or return repeats the way paging with an increasing
offset can.

You can certainly do large fetches with a big size, but it's less stable in
general.

Finally, scan/scroll has always been pretty quick for me. I usually use a
batch size in the thousands.

Nik
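The scan/scroll loop described above can be sketched as follows; the `search` and `scroll` callables here are stand-ins for whatever client you use (in ES 1.x the first request would carry `search_type=scan` plus a `scroll` keep-alive, and the first response returns only a scroll id, no hits):

```python
def scan_all(search, scroll, query, batch_size=1000, keep_alive="5m"):
    """Drain all hits for a query using scan/scroll semantics.

    search(query, size, keep_alive) opens the scroll and returns the first
    response; scroll(scroll_id, keep_alive) fetches the next batch. Both
    are assumed to return dicts shaped like ES responses.
    """
    hits = []
    resp = search(query, size=batch_size, keep_alive=keep_alive)
    scroll_id = resp["_scroll_id"]
    while True:
        page = scroll(scroll_id, keep_alive=keep_alive)
        batch = page["hits"]["hits"]
        if not batch:
            break  # an empty page signals the scroll is exhausted
        hits.extend(batch)
        # Each response may carry a fresh scroll id; reuse the old one if not.
        scroll_id = page.get("_scroll_id", scroll_id)
    return hits
```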
On Dec 14, 2014 4:13 AM, David Pilato da...@pilato.fr wrote:

 Implication is the memory needed to be allocated on each shard.


 David

 Le 14 déc. 2014 à 05:46, Ron Sher ron.s...@gmail.com a écrit :

 Again, why not use a very large count size? What are the implications of
 using a very large count?
 Regarding performance - it seems doing one request with a very large count
 performs better than using scan/scroll (with a count of 100 using 32 shards).


Re: AWS machine for ES master

2014-12-14 Thread Mark Walkom
Master-only nodes are very light; you can probably get away with 1 or 2GB
of heap.

Of course this will depend on your cluster topology and a few other things,
so it might be best to trial it.
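For reference, a dedicated master node is configured in `elasticsearch.yml` with settings roughly like these (ES 1.x setting names; the heap itself is set via `ES_HEAP_SIZE`):

```yaml
# elasticsearch.yml for a dedicated master node (ES 1.x)
node.master: true   # eligible to be elected master
node.data: false    # holds no shards

# With e.g. three small master-eligible nodes, guard against split-brain:
discovery.zen.minimum_master_nodes: 2
```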

On 14 December 2014 at 10:01, Yoav Melamed yo...@exelate.com wrote:

 Hello,

 I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
 Can I use smaller machines for the masters?
 What would be enough?



Re: AWS machine for ES master

2014-12-14 Thread Yoav Melamed
Thanks

On Sunday, December 14, 2014 11:01:58 AM UTC+2, Yoav Melamed wrote:

 Hello,

 I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
 Can I use smaller machines for the masters?
 What would be enough?




SearchParseException (marvel) - [No mapping found for [@timestamp] in order to sort on

2014-12-14 Thread Eugen Paraschiv
Hi, 
I'm using Elasticsearch 1.4.1 and the latest Marvel (1.2.1). 
I have Marvel installed on every node of the cluster and generating data 
into the daily index. 
When going into Marvel, I get the following exception: 
Caused by: org.elasticsearch.search.SearchParseException: [.marvel-2014.12.14][0]: from[-1],size[1]: Parse Failure [No mapping found for [@timestamp] in order to sort on]

So - this is referring specifically to the *.marvel-2014.12.14* index - an 
index created by Marvel itself, which should thus have the right structure. 
Am I missing something related to the Marvel setup?
Thank you, 
Eugen. 



Testing distributed characteristic of Elasticsearch

2014-12-14 Thread Luke Laird
Hi guys,
Don't get me wrong. This is absolutely not another post about benchmark of 
Elasticsearch.
First, I am pretty new to ES, so please be patient if I ask dumb questions. I 
am running a test, for academic use only, to show that ES's distributed 
design is an improvement over Lucene, on which ES is built. I want to show 
that with more than one node, the time we get from a search query is 
shorter, or 'faster'. In theory, with 2 nodes (2 hard disks) we get double 
the disk bandwidth (each normal disk peaks at ~50MB/s, well under the 
~128MB/s of 1Gb Ethernet, so Ethernet is not a bottleneck).

I have 2 physical nodes ( normal laptop ) connected directly via 1Gb 
Ethernet port, no router in between. My data is 20GB ( + 20 GB replica) of 
3 million records like this : http://pastebin.com/FDhfy6C3
( the source of data I get is http://www.mockaroo.com/67e33320 )

My strategy is to write as many as possible search queries and at the same 
time clear the cache. Something like

curl -XPOST "http://192.168.57.103:9200/myjson/_cache/clear"

curl -XPOST "http://192.168.57.103:9200/myjson/_flush?force=true"


curl -XGET "http://192.168.57.103:9200/myjson/myjson/_search?pretty" -d \
'{
    "query" : {
        "bool" : {
            "should" : [
                { "match" : { "first_name" : "Clarence" }},
                { "match" : { "last_name" : "Fernandez" }},
                { "match" : { "country" : "uk" }},
                { "match" : { "amount" : "$9001.19" }},
                { "match" : { "password_hash" : "Th94hnXtaYtZ" }}
            ]
        }
    }
}'

I am writing a script to generate as many of those match fields as possible, 
but I still want to ask whether what I am doing is right.
Any comment/opinion is really appreciated.
Thanks.
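A query-generation script along those lines could look like this sketch (the field pools here are hypothetical stand-ins; in practice the values would be sampled from the mockaroo dataset):

```python
import json
import random

# Hypothetical value pools; in practice, sample these from the dataset.
FIELD_POOLS = {
    "first_name": ["Clarence", "Alice", "Bob"],
    "last_name": ["Fernandez", "Smith"],
    "country": ["uk", "us", "de"],
}

def random_bool_query(n_clauses=3, seed=None):
    """Build a bool/should query body with n_clauses random match clauses."""
    rng = random.Random(seed)
    fields = rng.sample(sorted(FIELD_POOLS), k=min(n_clauses, len(FIELD_POOLS)))
    should = [{"match": {f: rng.choice(FIELD_POOLS[f])}} for f in fields]
    return {"query": {"bool": {"should": should}}}

# Each call yields a request body you can feed to curl -d '...'
print(json.dumps(random_bool_query(seed=42), indent=2))
```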




Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread Jonathan Foy
Just to reword what others have said: as I understand it, ES will allocate 
memory for [size] scores (per shard?) regardless of the final result count. 
If you're getting back 4986 results from a query, it'd be faster to use 
size: 4986 than size: 100.

What I've done in similar situations is to issue a count first with the same 
filter (which is very fast), then use the result of that as the size. It 
worked much better/faster than using a default large size.
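The count-then-fetch approach above can be sketched as follows (`search` is a stand-in for your client; in ES 1.x the cheap count step would be `search_type=count` or the `_count` endpoint):

```python
def fetch_exact(search, query):
    """Fetch all matches in one request, sized by a cheap count first.

    search(query, size) is assumed to return a dict shaped like an ES
    search response.
    """
    # Step 1: a size-0 search returns hits.total without materializing hits.
    total = search(query, size=0)["hits"]["total"]
    if total == 0:
        return []
    # Step 2: ask for exactly that many documents, no more.
    resp = search(query, size=total)
    return resp["hits"]["hits"]
```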



Re: querying array of strings (multiword) with AND operator

2014-12-14 Thread joergpra...@gmail.com
You have a typo:

POST /dummy/location
{
  "locationArray" : ["United Kindgom", "London"],
  "location" : "United Kingdom"
}

and I'm sure you mean:

POST /dummy/location
{
  "locationArray" : ["United Kingdom", "London"],
  "location" : "United Kingdom"
}

Jörg
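This also explains the behavior: Lucene flattens array values into one bag of terms per field, so an AND match succeeds whenever every query term appears somewhere in the array. A toy simulation (the standard analyzer is only roughly approximated here by lowercasing and word-splitting):

```python
import re

def and_match(query, field_values):
    """Approximate an ES match query with operator AND on an array field."""
    def tokens(text):
        # Rough stand-in for the standard analyzer: lowercase, word split.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    # Array values are flattened into a single field's term set.
    field_terms = set().union(*(tokens(v) for v in field_values))
    return tokens(query) <= field_terms  # AND: every query term must occur

print(and_match("United Kingdom", ["United Kindgom", "London"]))  # False (typo)
print(and_match("United Kingdom", ["United Kingdom", "London"]))  # True
```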




Re: Same query, different CPU util when run with Java API versus REST

2014-12-14 Thread joergpra...@gmail.com
Can you post the full query code, so this can be reproduced more easily?

Jörg

On Fri, Dec 12, 2014 at 6:44 PM, Jeff Potts jeffpott...@gmail.com wrote:

 I should mention that the Elasticsearch node, the Java service, and the
 JMeter test client are all on different machines.

 Jeff



Re: SearchParseException (marvel) - [No mapping found for [@timestamp] in order to sort on

2014-12-14 Thread Eugen Paraschiv
One more detail on this - the Marvel UI also displays the exact query 
that's failing. Running that query results in a more informative message - 
probably the root cause of the problem: 
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [fs.total.available_in_bytes]: failed to find mapping for fs.total.available_in_bytes
And indeed, *fs.total.available_in_bytes* isn't available in the 
marvel index. 
Now - the reading is available via *_nodes/stats* - so I'm assuming it's a 
mapping problem in Marvel. 
But - I removed the old mapping, restarted the entire cluster, and basically 
allowed Marvel to re-create the mapping it needs - so that should be 
correct. 
Any help is appreciated. 
Thanks. 
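To confirm what the index actually contains, the mapping can be inspected directly and compared against the nodes stats output (the host is a placeholder):

```shell
# Check which fields the Marvel index actually mapped
curl -XGET "http://localhost:9200/.marvel-2014.12.14/_mapping?pretty"

# Compare with what the nodes stats API reports for fs
curl -XGET "http://localhost:9200/_nodes/stats/fs?pretty"
```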






Re: [hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread Costin Leau

Hi,

It looks like es-hadoop is not part of your classpath (hence the NoClassDefFoundError). This might be due either to some 
misconfiguration of your classpath or to the way the Configuration object is used. It looks like you are using it 
correctly, though typically I use Job(Configuration) instead of getInstance() (static factory methods are always bad).

Alternatively, you can use other variants like the LIBJARS or HADOOP_CLASSPATH env variables; take note of what type of 
separator you use between your jars (, vs : vs ;).

Try to debug the classpath and see what you get - inspect the jar that is created and uploaded to HDFS, turn on logging 
on the hadoop side, and potentially use the distributed cache [1].

Embedding the libraries under lib/ also works (see [2]).

All of them have pros and cons; the idea is to get your sample running and then 
debug your env to see what the issue is.

Cheers,

[1] 
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v2.0
[2] 
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
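For the libjars route, the invocation looks roughly like this (the jar paths are placeholders; note that `-libjars` takes a comma-separated list, while `HADOOP_CLASSPATH` uses the platform path separator, and the generic options only work because the class implements Tool and goes through ToolRunner):

```shell
# Make the jar visible to the local client JVM (the one hitting the NCDFE)
export HADOOP_CLASSPATH=/path/to/elasticsearch-hadoop-2.0.2.jar

# And ship it to the task JVMs; -libjars must come right after the class name
hadoop jar myapp.jar com.clqb.app.ElasticSearch \
  -libjars /path/to/elasticsearch-hadoop-2.0.2.jar \
  /input/path /output/path
```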

On 12/14/14 3:32 PM, CAI Longqi wrote:

Hello, I’m using elasticsearch-hadoop-2.0.2.jar, and meet the problem:


Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat
    at com.clqb.app.ElasticSearch.run(ElasticSearch.java:46)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.clqb.app.ElasticSearch.main(ElasticSearch.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsOutputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more


Here’s my program:


public class ElasticSearch extends Configured implements Tool {

    public static class AwesomeMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(NullWritable.get(), XmlUtils.xmlTextToMapWritable(value)); // XmlUtils is not shown here
        }
    }

    public static class AwesomeReducer extends Reducer<NullWritable, MapWritable, NullWritable, NullWritable> {
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.set("xmlinput.start", "<page>");
        conf.set("xmlinput.end", "</page>");
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "radio/artists");

        Job job = Job.getInstance(conf);
        job.setJarByClass(ElasticSearch.class);
        job.setInputFormatClass(XmlInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setMapperClass(AwesomeMapper.class);
        job.setReducerClass(AwesomeReducer.class);

        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ElasticSearch(), args);
        System.exit(exitCode);
    }
}


p.s. *I also make sure that I have included `elasticsearch-hadoop-2.0.2.jar` in 
my `-libjars`*. Any suggestions?


Thanks

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
mailto:elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/762794c8-0bd0-4c16-b1dd-9c914a29a710%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/762794c8-0bd0-4c16-b1dd-9c914a29a710%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.


--
Costin


Re: Creating a custom plugin to return hashes of the terms or the terms of an Elasticsearch index

2014-12-14 Thread joergpra...@gmail.com
The termlist plugin can use filters with the 'term' parameter and
pagination with the 'size' parameter. So you can get smaller term lists,
for terms starting with 'a', 'b', 'c', ..., and you can limit the number of
entries returned, say with size=1000 or more. The 'term' filter should
be sufficient for most cases.

There are no hashes for terms.

Jörg
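A rough sketch of that prefix-based pagination from Python: the 'term' and 'size' parameter names come from Jörg's description above, while the `/{index}/_termlist` path is an assumption based on the plugin's README, so verify both against the plugin version you run.

```python
import string

def termlist_requests(index, size=1000):
    """Build one request plan per leading letter, following the advice above.

    Assumption: the plugin exposes GET /{index}/_termlist and accepts
    'term' and 'size' query parameters; check your plugin version.
    """
    return [
        {"path": "/%s/_termlist" % index,
         "params": {"term": letter, "size": size}}
        for letter in string.ascii_lowercase
    ]

plans = termlist_requests("myindex")
# Each plan can then be issued with urllib/requests and the partial
# term lists merged client-side, keeping each response well below the
# 30-40 MB that broke the single call.
```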

On Fri, Dec 12, 2014 at 2:45 PM, Rosen Nikolov rpniko...@gmail.com wrote:


   I am currently using an Elasticsearch plugin called termlist (
 https://github.com/jprante/elasticsearch-index-termlist)
 that returns the terms of an index. But it breaks when there are too many
 terms and the output
 information is larger than about 30-40 megabytes. I need my custom plugin
 to work for bigger amounts of output data.

I am thinking about creating a custom plugin to return hashes of terms
 instead of the actual terms to reduce the output data volume.

   So I have a couple of questions:
 1. I presume that Elasticsearch might already use hashes of terms
 internally in the index, so would it be possible to get those?
 2. If the above is not possible, what other options do I have to
 circumvent the 30-40 MB barrier?

Thank you in advance.





Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
Is there a way to do exact and full text searches without having to create 
two different fields?

The documentation 
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
 
states fields must have the mapping not_analyzed in order to avoid 
tokenization. This allows exact searches to be done.

In my case, I would like both full text search and exact searches. For 
example:

When searching for book titles, a user can input either:

I like ElasticSearch

-OR-

exact=I like ElasticSearch

The first case will return results from a full text search. 

The second case will return results only if the book title is exactly I 
like ElasticSearch. Case sensitivity does not matter.

To do this, I think I will have to create two fields called book_title 
and book_title_exact where book_title_exact will have a field mapping 
not_analyzed so that I can do exact matches.

Is this the proper way of handling my use case? Or is there a simpler way 
in ES without having to store a title twice?



Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread Nikolas Everett
Look at multifields. They let you send the field once and analyze it
multiple times. You also might want to use the keyword analyzer and lowercase
filter rather than not_analyzed. Folks are used to case insensitivity.

Nik
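A minimal sketch of that setup (the analyzer and subfield names, "lowercase_keyword" and "exact", are my own invention; the important parts are the keyword tokenizer plus lowercase filter, and the multi-field under "title"):

```python
# Index settings: a custom analyzer that keeps the whole value as one
# token (keyword tokenizer) but lowercases it, for case-insensitive
# exact matching.
settings = {
    "analysis": {
        "analyzer": {
            "lowercase_keyword": {
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}

# Mapping: "title" is sent once and analyzed twice - full-text on the
# main field, exact (lowercased) on the "exact" sub-field.
mappings = {
    "books": {
        "properties": {
            "title": {
                "type": "string",          # standard full-text analysis
                "fields": {
                    "exact": {
                        "type": "string",
                        "analyzer": "lowercase_keyword",
                    }
                },
            }
        }
    }
}
```

Queries would then hit `title` for full-text search and `title.exact` for the exact case.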
Is there a way to do exact and full text searches without having to create
two different fields?

The documentation (
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
states fields must have the mapping not_analyzed in order to avoid
tokenization. This allows exact searches to be done.

In my case, I would like both full text search and exact searches. For
example:

When searching for book titles, a user can input either:

I like ElasticSearch

-OR-

exact=I like ElasticSearch

The first case will return results from a full text search.

The second case will return results only if the book title is exactly I
like ElasticSearch. Case sensitivity does not matter.

To do this, I think I will have to create two fields called book_title
and book_title_exact where book_title_exact will have a field mapping
not_analyzed so that I can do exact matches.

Is this the proper way of handling my use case? Or is there a simpler way
in ES without having to store a title twice?




Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
Ah, thanks. I've set this up (using ES python bindings):

es.indices.put_mapping(index="myindex",
                       doc_type="books",
                       body={"books": {
                           "properties": {
                               "title": {
                                   "type": "string",
                                   "fields": {
                                       "name": {"type": "string"},
                                       "raw": {"type": "string", "index": "not_analyzed"}
                                   }
                               }
                           }
                       }})




Then I try to search for exact match:

es.search(index="myindex", doc_type="books",
          body={"query": {"filtered": {"filter": {"term": {"title": {"raw": "I like ElasticSearch"}}}}}})



But I get an ES error stating

nested: QueryParsingException[[myindex] [term] filter does not support 
[raw]]; }]')

It seems like I'm not searching for the raw correctly. How would I specify 
to search for the raw title (exact matching)?

On Sunday, December 14, 2014 6:49:09 PM UTC-5, Nikolas Everett wrote:

 Look at multifields. They let you send the field once and analyze it 
 multiple times. You also might want to use the keyword analyzer and lowercase 
 filter rather than not_analyzed. Folks are used to case insensitivity. 

 Nik
 Is there a way to do exact and full text searches without having to create 
 two different fields?

 The documentation (
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
  
 states fields must have the mapping not_analyzed in order to avoid 
 tokenization. This allows exact searches to be done.

 In my case, I would like both full text search and exact searches. For 
 example:

 When searching for book titles, a user can input either:

 I like ElasticSearch

 -OR-

 exact=I like ElasticSearch

 The first case will return results from a full text search. 

 The second case will return results only if the book title is exactly I 
 like ElasticSearch. Case sensitivity does not matter.

 To do this, I think I will have to create two fields called book_title 
 and book_title_exact where book_title_exact will have a field mapping 
 not_analyzed so that I can do exact matches.

 Is this the proper way of handling my use case? Or is there a simpler way 
 in ES without having to store a title twice?




Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
I think I just figured it out:

{"title.raw": "I like ElasticSearch"}

instead of 

"title": {"raw": "I like ElasticSearch"}
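For reference, the working request body can be sketched as follows (the `es.search` call is commented out; an elasticsearch-py client named `es` is assumed):

```python
def exact_title_query(title):
    # Multi-fields are addressed with dot notation ("title.raw"),
    # not by nesting the sub-field as {"title": {"raw": ...}}.
    return {
        "query": {
            "filtered": {
                "filter": {
                    "term": {"title.raw": title}
                }
            }
        }
    }

body = exact_title_query("I like ElasticSearch")
# es.search(index="myindex", doc_type="books", body=body)
```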

On Sunday, December 14, 2014 9:00:52 PM UTC-5, am wrote:

 Ah, thanks. I've set this up (using ES python bindings):

 es.indices.put_mapping(index="myindex",
                        doc_type="books",
                        body={"books": {
                            "properties": {
                                "title": {
                                    "type": "string",
                                    "fields": {
                                        "name": {"type": "string"},
                                        "raw": {"type": "string", "index": "not_analyzed"}
                                    }
                                }
                            }
                        }})




 Then I try to search for exact match:

 es.search(index="myindex", doc_type="books",
           body={"query": {"filtered": {"filter": {"term": {"title": {"raw": "I like ElasticSearch"}}}}}})



 But I get an ES error stating

 nested: QueryParsingException[[myindex] [term] filter does not support 
 [raw]]; }]')

 It seems like I'm not searching for the raw correctly. How would I specify 
 to search for the raw title (exact matching)?

 On Sunday, December 14, 2014 6:49:09 PM UTC-5, Nikolas Everett wrote:

 Look at multifields. They let you send the field once and analyze it 
 multiple times. You also might want to use the keyword analyzer and lowercase 
 filter rather than not_analyzed. Folks are used to case insensitivity. 

 Nik
 Is there a way to do exact and full text searches without having to 
 create two different fields?

 The documentation (
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
  
 states fields must have the mapping not_analyzed in order to avoid 
 tokenization. This allows exact searches to be done.

 In my case, I would like both full text search and exact searches. For 
 example:

 When searching for book titles, a user can input either:

 I like ElasticSearch

 -OR-

 exact=I like ElasticSearch

 The first case will return results from a full text search. 

 The second case will return results only if the book title is exactly I 
 like ElasticSearch. Case sensitivity does not matter.

 To do this, I think I will have to create two fields called book_title 
 and book_title_exact where book_title_exact will have a field mapping 
 not_analyzed so that I can do exact matches.

 Is this the proper way of handling my use case? Or is there a simpler way 
 in ES without having to store a title twice?





Re: Frequent updates to documents

2014-12-14 Thread Jinal Shah
Thanks for the reply, Nikolas.

Users want to search data instantly after saving, so we are unable to use 
batch updates. It is good to know that even an update to a single field means 
a whole-document reindex.
Thanks,
Jinal
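To make the trade-off concrete: even the dedicated Update API, which accepts only the changed fields, rewrites the full document internally. A sketch of such a partial-update body (the field name is illustrative):

```python
def partial_update_body(changed_fields):
    """Request body for the Update API (POST /{index}/{type}/{id}/_update).

    Sending only the changed fields saves bandwidth and a client-side
    read-modify-write cycle, but Elasticsearch still merges them into
    the stored _source and reindexes the whole document, so the
    deleted-docs count grows just the same.
    """
    return {"doc": changed_fields}

body = partial_update_body({"price": 9.99})
```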

On Friday, 12 December 2014 16:23:55 UTC+11, Jinal Shah wrote:

 Hi,

 We are using ES 1.0.3. In our application we do frequent updates to the 
 documents, and this causes the delete count to increase quickly and triggers 
 frequent merges.

 Due to Lucene version issues with our application and ES API, we are not 
 able to use the API and we have written a module to interact with ES. For 
 document creation and updates, we use HTTP POST and always send all fields 
 for updates also.

 Would sending just the updated fields, reduce the delete count? OR
 Does update to single field also triggers whole document re-indexing?

 Thanks in advance for your help!!

 Regards,
 Jinal Shah

 The information contained in this e-mail message and any accompanying 
 files is or may be confidential. If you are not the intended recipient, any 
 use, dissemination, reliance, forwarding, printing or copying of this 
 e-mail or any attached files is unauthorised. This e-mail is subject to 
 copyright. No part of it should be reproduced, adapted or communicated 
 without the written consent of the copyright owner. If you have received 
 this e-mail in error please advise the sender immediately by return e-mail 
 or telephone and delete all copies. Fairfax Media does not guarantee the 
 accuracy or completeness of any information contained in this e-mail or 
 attached files. Internet communications are not secure, therefore Fairfax 
 Media does not accept legal responsibility for the contents of this message 
 or attached files.







Re: analytics on data stored in ES

2014-12-14 Thread Ramchandra Phadake
Yes, in general the fetch can be improved using standalone clients. I am NOT 
saying that data nodes are a bottleneck as of now; indexing is not impacting 
the search. 

The point I am raising is data locality. Data is spread over a few shards 
across a few machines, and we need to perform processing on this data without 
an explicit fetch outside ES.
I want to return a sample of grouped entries after going over all matched 
entries within an index.

Thanks,
Ram

On Saturday, December 13, 2014 9:24:11 PM UTC+5:30, Arie wrote:

 Hi,

 Consider a non-data master node; as I understand it, this can improve data 
 handling and search speed a lot.





aggs terms is support *?

2014-12-14 Thread 唐坤
I have a scene.

-
Data A
{
name:foodA,
props:{
  color: red,
  taste: sweet,
  xx: xx,
  xx: xx
}
}

Data B
{
name:foodB,
props:{
  color: black,
  taste: sweet,
  xx: xx,
  xx: xx
}
}



The props field is dynamic; its keys are custom per document.


-

I want the result data to be:

{
aggregation: {
 color: [
{key:red, count: 1},
{key:black, count: 1}
 ],
 xx: [
{key:xx, count: xx},
 ]
 }
}



I use this DSL:
{
  aggs: {
all: {
  terms: {
field: props.*
  }
}
  }
}


but it returns nothing. I don't know how to make this work.
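One common workaround (not from this thread, so treat it as an assumption about your data model) is to index the dynamic attributes as a list of key/value pairs, typically mapped as `nested`, so a terms aggregation can address the fixed field names instead of a `props.*` wildcard:

```python
def to_kv(doc):
    # Reshape {"props": {"color": "red", ...}} into fixed-name
    # key/value pairs that a terms aggregation can address.
    return {
        "name": doc["name"],
        "props": [{"key": k, "value": v}
                  for k, v in sorted(doc["props"].items())],
    }

doc = to_kv({"name": "foodA", "props": {"color": "red", "taste": "sweet"}})

# Aggregate over the now-fixed field names. With a nested mapping,
# wrap "by_key" in a "nested" aggregation on the "props" path first.
agg = {
    "aggs": {
        "by_key": {
            "terms": {"field": "props.key"},
            "aggs": {"by_value": {"terms": {"field": "props.value"}}},
        }
    }
}
```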



Marvel Index Taking too much disk space

2014-12-14 Thread Chetan Dev
Hi,

I installed the Marvel plugin, 
but it's creating its own indexes, which are even larger than my original 
indexed data.
Is there a way to delete these indexes on a daily basis, or some other way?

Thanks



Re: Marvel Index Taking too much disk space

2014-12-14 Thread David Pilato
You can use Curator for that. See https://github.com/elasticsearch/curator

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs
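If you'd rather not add a tool, the same daily cleanup can be sketched with the Python client. This is a do-it-yourself alternative to Curator, not its actual behavior; it assumes Marvel's default `.marvel-YYYY.MM.DD` daily index naming (adjust the prefix if yours differs):

```python
from datetime import date, timedelta

def marvel_indices_to_delete(index_names, keep_days, today=None):
    """Pick daily .marvel-YYYY.MM.DD indices older than keep_days days."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    doomed = []
    for name in index_names:
        if not name.startswith(".marvel-"):
            continue
        try:
            day = date(*map(int, name[len(".marvel-"):].split(".")))
        except (ValueError, TypeError):
            continue  # not a dated index, e.g. .marvel-kibana
        if day < cutoff:
            doomed.append(name)
    return doomed

# Then delete each one, e.g. with elasticsearch-py:
#   for name in marvel_indices_to_delete(names, keep_days=7):
#       es.indices.delete(index=name)
```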



 On 15 Dec 2014, at 07:31, Chetan Dev cheten@carwale.com wrote:
 
 Hi,
 
 I installed the Marvel plugin, 
 but it's creating its own indexes, which are even larger than my original 
 indexed data.
 Is there a way to delete these indexes on a daily basis, or some other way?
 
 Thanks
 
