Frequency of significant terms in documents matching a query

2014-12-14 Thread Graeme Pietersz
I understand how to use aggregations to get significant terms, with counts 
of the number of documents in which they occur.

I would also like to count the number of times these terms occur 
across all documents. I can use term vectors to count how often a term 
occurs in a single document, but I cannot see how to get the sum of 
occurrences across all documents matching a query, sorted by the number of 
occurrences - and I really need this for the significant terms 
(otherwise I will just get a list of common words).
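As far as I know there is no single API for this in ES 1.x; one workaround is to fetch term vectors for each matching document and sum the frequencies client-side for the significant terms already obtained from the aggregation. A minimal sketch (the response shape mirrors the `_termvectors` API; the documents, field name, and terms are made up for illustration):

```python
from collections import Counter

def sum_term_frequencies(termvector_responses, field, terms_of_interest):
    """Sum per-document term frequencies across all matching documents.

    termvector_responses: list of dicts shaped like ES _termvectors
    responses; terms_of_interest: e.g. the significant terms already
    obtained from an aggregation.
    """
    totals = Counter()
    for resp in termvector_responses:
        terms = resp["term_vectors"][field]["terms"]
        for t in terms_of_interest:
            if t in terms:
                totals[t] += terms[t]["term_freq"]
    # most_common() sorts by total occurrences, descending.
    return totals.most_common()

# Hypothetical term-vector responses for two matching documents:
docs = [
    {"term_vectors": {"body": {"terms": {"kernel": {"term_freq": 3},
                                         "panic": {"term_freq": 1}}}}},
    {"term_vectors": {"body": {"terms": {"kernel": {"term_freq": 2}}}}},
]
print(sum_term_frequencies(docs, "body", ["kernel", "panic"]))
# → [('kernel', 5), ('panic', 1)]
```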

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d98a16b6-1bca-4a3e-a99d-52391976c371%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


AWS machine for ES master

2014-12-14 Thread Yoav Melamed
Hello,

I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
Can I use smaller machines for the masters?
What would be enough?



Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread David Pilato
The implication is the memory that needs to be allocated on each shard.


David

 Le 14 déc. 2014 à 05:46, Ron Sher ron.s...@gmail.com a écrit :
 
 Again, why not use a very large count size? What are the implications of 
 using a very large count?
 Regarding performance - it seems doing one request with a very large count 
 performs better than using scan/scroll (with a count of 100 using 32 shards).
 
 On Wednesday, December 10, 2014 10:53:50 PM UTC+2, David Pilato wrote:
 No I did not say that. Or I did not mean that. Sorry if it was unclear.
 I said: don’t use large sizes:
 
 Never use size:1000 or from:1000. 
 
 
 You should read this: 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
 
 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr | @scrutmydocs
 
 
 
 Le 10 déc. 2014 à 21:16, Ron Sher ron@gmail.com a écrit :
 
 So you're saying there's no impact on elasticsearch if I issue a large 
 size? 
 If that's the case then why shouldn't I just call size of 1M if I want to 
 make sure I get everything?
 
 On Wednesday, December 10, 2014 8:22:47 PM UTC+2, David Pilato wrote:
 Scan/scroll is the best option to extract a huge amount of data.
 Never use size:1000 or from:1000. 
 
 It's not realtime because you basically scroll over a given set of 
 segments and all new changes that will come in new segments won't be taken 
 into account during the scroll.
 Which is good because you won't get inconsistent results.
 
 About size, I'd try and test. It depends on your doc size, I believe.
 Try with 1 and see how it goes when you increase it. You may 
 discover that getting 10*1 docs is the same as 1*10. :)
 
 Best
 
 David
 
 Le 10 déc. 2014 à 19:09, Ron Sher ron@gmail.com a écrit :
 
 Hi,
 
 I was wondering about best practices to get all data according to some 
 filters.
 The options as I see them are:
 
 - Use a very big size that will return all accounts, i.e. use some value 
 like 1m to make sure I get everything back (even if I need just a few 
 hundred or tens of documents). This is the quickest way, development-wise.
 - Use paging - using size and from. This requires looping over the result, 
 and performance gets worse as we advance to later pages. Also, we need to 
 use preference if we want consistent results across pages. Also, it's not 
 clear what the recommended size is for each page.
 - Use scan/scroll - this gives consistent paging but also has several 
 drawbacks: if I use search_type=scan then it can't be sorted; using 
 scan/scroll is (maybe) less performant than paging (the documentation says 
 it's not for realtime use); again, it's not clear which size is recommended.
 
 So you see - many options and it's not clear which path to take.
 
 What do you think?
 
 Thanks,
 Ron
 


UI/Plugin to visualize output of Scoring Explain flat

2014-12-14 Thread vineeth mohan
Hi ,

I remember seeing a UI, from a plugin or otherwise, which visualizes the
output of the explain API for scoring as a neat d3 visualization of a
collapsible tree - http://bl.ocks.org/mbostock/4339083

If anyone remembers the link, please reply to this mail.

Thanks
  Vineeth



querying array of strings (multiword) with AND operator

2014-12-14 Thread Mathew Bolek
Hi,

I really don't know what's wrong, but I seem to be unable to find a way of 
querying array elements that contain multiple words with the AND operator.

ES version: 1.3.*,
only the default standard analyser in use

A dummy document:

POST /dummy/location
{
  "locationArray" : ["United Kindgom", "London"],
  "location" : "United Kingdom"
}

Its mapping:

GET /dummy/_mapping
{
   "dummy": {
      "mappings": {
         "location": {
            "properties": {
               "locationArray": {
                  "type": "string"
               },
               "location": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Query that matches location field OK:

GET /dummy/_search
{
  "query": {
    "match": {
      "location" : {
        "query" : "United Kingdom",
        "operator" : "AND"
      }
    }
  }
}

*Problematic query that returns no results trying to match the locationArray 
field:*

GET /dummy/_search
{
  "query": {
    "match": {
      "locationArray" : {
        "query" : "United Kingdom",
        "operator" : "AND"
      }
    }
  }
}

I'd really appreciate a hint - there must be something obvious I'm missing 
/ don't know about when querying that array field.

Cheers






[hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread CAI Longqi
 

Hello, I’m using elasticsearch-hadoop-2.0.2.jar and am running into this problem:


Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat
	at com.clqb.app.ElasticSearch.run(ElasticSearch.java:46)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.clqb.app.ElasticSearch.main(ElasticSearch.java:60)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsOutputFormat
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 9 more


Here’s my program: 


public class ElasticSearch extends Configured implements Tool {

    public static class AwesomeMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(NullWritable.get(), XmlUtils.xmlTextToMapWritable(value)); // XmlUtils is not shown here
        }
    }

    public static class AwesomeReducer extends Reducer<NullWritable, MapWritable, NullWritable, NullWritable> {
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.set("xmlinput.start", "<page>");
        conf.set("xmlinput.end", "</page>");
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "radio/artists");

        Job job = Job.getInstance(conf);
        job.setJarByClass(ElasticSearch.class);
        job.setInputFormatClass(XmlInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setMapperClass(AwesomeMapper.class);
        job.setReducerClass(AwesomeReducer.class);

        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ElasticSearch(), args);
        System.exit(exitCode);
    }
}


P.S. *I also made sure that I included `elasticsearch-hadoop-2.0.2.jar` in 
my `-libjars`.* Any suggestions?


Thanks



Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread Nikolas Everett
Search consumes O(offset + size) memory and O(ln(offset + size)*(offset +
size)) CPU. Scan/scroll has higher overhead but stays O(size) the whole
time. I don't know where the break-even point is.

The other thing is that scroll provides a consistent snapshot. That means
it consumes resources, so you shouldn't expose it to end users, but it
won't miss results or return repeats the way paging with an increasing
offset can.

You can certainly do large fetches with a big size, but it's less stable in
general.

Finally, scan/scroll has always been pretty quick for me. I usually use a
batch size in the thousands.

Nik
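The scan/scroll loop described above can be sketched as follows; the `search` and `scroll` callables here are stand-ins for whatever client you use (in ES 1.x the first request would carry `search_type=scan` plus a `scroll` keep-alive, and the first response returns only a scroll id, no hits):

```python
def scan_all(search, scroll, query, batch_size=1000, keep_alive="5m"):
    """Drain all hits for a query using scan/scroll semantics.

    search(query, size, keep_alive) opens the scroll and returns the first
    response; scroll(scroll_id, keep_alive) fetches the next batch. Both
    are assumed to return dicts shaped like ES responses.
    """
    hits = []
    resp = search(query, size=batch_size, keep_alive=keep_alive)
    scroll_id = resp["_scroll_id"]
    while True:
        page = scroll(scroll_id, keep_alive=keep_alive)
        batch = page["hits"]["hits"]
        if not batch:
            break  # an empty page signals the scroll is exhausted
        hits.extend(batch)
        # Each response may carry a fresh scroll id; reuse the old one if not.
        scroll_id = page.get("_scroll_id", scroll_id)
    return hits
```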
On Dec 14, 2014 4:13 AM, David Pilato da...@pilato.fr wrote:

 Implication is the memory needed to be allocated on each shard.


 David

 Le 14 déc. 2014 à 05:46, Ron Sher ron.s...@gmail.com a écrit :

 Again, why not use a very large count size? What are the implications of
 using a very large count?
 Regarding performance - it seems doing one request with a very large count
 performs better than using scan/scroll (with a count of 100 using 32 shards).


Re: AWS machine for ES master

2014-12-14 Thread Mark Walkom
Master-only nodes are very light; you can probably get away with 1 or 2GB
of heap.

Of course this will depend on your cluster topology and a few other things,
so it might be best to trial it.
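For reference, a dedicated master node is configured in `elasticsearch.yml` with settings roughly like these (ES 1.x setting names; the heap itself is set via `ES_HEAP_SIZE`):

```yaml
# elasticsearch.yml for a dedicated master node (ES 1.x)
node.master: true   # eligible to be elected master
node.data: false    # holds no shards

# With e.g. three small master-eligible nodes, guard against split-brain:
discovery.zen.minimum_master_nodes: 2
```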

On 14 December 2014 at 10:01, Yoav Melamed yo...@exelate.com wrote:

 Hello,

 I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
 Can I use smaller machines for the masters?
 What would be enough?



Re: AWS machine for ES master

2014-12-14 Thread Yoav Melamed
Thanks

On Sunday, December 14, 2014 11:01:58 AM UTC+2, Yoav Melamed wrote:

 Hello,

 I run an Elasticsearch cluster in AWS on c3.8xlarge machines.
 Can I use smaller machines for the masters?
 What would be enough?




SearchParseException (marvel) - [No mapping found for [@timestamp] in order to sort on

2014-12-14 Thread Eugen Paraschiv
Hi, 
I'm using Elasticsearch 1.4.1 and the latest Marvel (1.2.1). 
I have Marvel installed on every node of the cluster and generating data 
into the daily index. 
When going into Marvel, I get the following exception: 
Caused by: org.elasticsearch.search.SearchParseException: [.marvel-2014.12.14][0]: from[-1],size[1]: Parse Failure [No mapping found for [@timestamp] in order to sort on]

So - this is referring specifically to the *.marvel-2014.12.14* index - an 
index created by Marvel itself, which should thus have the right structure. 
Am I missing something related to the Marvel setup?
Thank you, 
Eugen. 



Testing distributed characteristic of Elasticsearch

2014-12-14 Thread Luke Laird
Hi guys,
Don't get me wrong. This is absolutely not another post about benchmark of 
Elasticsearch.
First, I am pretty new to ES, so please be patient if I ask dumb questions. I 
am running a test, for academic use only, to show that ES's distributed 
design is an improvement over Lucene, on which ES is built. I want to show 
that with more than one node, the time we get from a search query is 
shorter, or 'faster'. In theory, with 2 nodes (2 hard disks) we get double 
the disk bandwidth (each normal disk peaks at ~50MB/s, well under the 
~128MB/s of 1Gb Ethernet, so Ethernet is not a bottleneck).

I have 2 physical nodes ( normal laptop ) connected directly via 1Gb 
Ethernet port, no router in between. My data is 20GB ( + 20 GB replica) of 
3 million records like this : http://pastebin.com/FDhfy6C3
( the source of data I get is http://www.mockaroo.com/67e33320 )

My strategy is to write as many as possible search queries and at the same 
time clear the cache. Something like

curl -XPOST "http://192.168.57.103:9200/myjson/_cache/clear"

curl -XPOST "http://192.168.57.103:9200/myjson/_flush?force=true"


curl -XGET "http://192.168.57.103:9200/myjson/myjson/_search?pretty" -d \
'{
    "query" : {
        "bool" : {
            "should" : [
                { "match" : { "first_name" : "Clarence" }},
                { "match" : { "last_name" : "Fernandez" }},
                { "match" : { "country" : "uk" }},
                { "match" : { "amount" : "$9001.19" }},
                { "match" : { "password_hash" : "Th94hnXtaYtZ" }}
            ]
        }
    }
}'

I am writing a script to generate as many of those match fields as possible, 
but I still want to ask whether what I am doing is right.
Any comment/opinion is really appreciated.
Thanks.
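A query-generation script along those lines could look like this sketch (the field pools here are hypothetical stand-ins; in practice the values would be sampled from the mockaroo dataset):

```python
import json
import random

# Hypothetical value pools; in practice, sample these from the dataset.
FIELD_POOLS = {
    "first_name": ["Clarence", "Alice", "Bob"],
    "last_name": ["Fernandez", "Smith"],
    "country": ["uk", "us", "de"],
}

def random_bool_query(n_clauses=3, seed=None):
    """Build a bool/should query body with n_clauses random match clauses."""
    rng = random.Random(seed)
    fields = rng.sample(sorted(FIELD_POOLS), k=min(n_clauses, len(FIELD_POOLS)))
    should = [{"match": {f: rng.choice(FIELD_POOLS[f])}} for f in fields]
    return {"query": {"bool": {"should": should}}}

# Each call yields a request body you can feed to curl -d '...'
print(json.dumps(random_bool_query(seed=42), indent=2))
```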




Re: Looking for a best practice to get all data according to some filters

2014-12-14 Thread Jonathan Foy
Just to reword what others have said: as I understand it, ES will allocate 
memory for [size] scores (per shard?) regardless of the final result count. 
If you're getting back 4986 results from a query, it'd be faster to use 
size: 4986 than size: 100.

What I've done in similar situations is to issue a count first with the same 
filter (which is very fast), then use the result of that as the size. It 
worked much better/faster than using a default large size.
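The count-then-fetch approach above can be sketched as follows (`search` is a stand-in for your client; in ES 1.x the cheap count step would be `search_type=count` or the `_count` endpoint):

```python
def fetch_exact(search, query):
    """Fetch all matches in one request, sized by a cheap count first.

    search(query, size) is assumed to return a dict shaped like an ES
    search response.
    """
    # Step 1: a size-0 search returns hits.total without materializing hits.
    total = search(query, size=0)["hits"]["total"]
    if total == 0:
        return []
    # Step 2: ask for exactly that many documents, no more.
    resp = search(query, size=total)
    return resp["hits"]["hits"]
```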



Re: querying array of strings (multiword) with AND operator

2014-12-14 Thread joergpra...@gmail.com
You have a typo:

POST /dummy/location
{
  "locationArray" : ["United Kindgom", "London"],
  "location" : "United Kingdom"
}

and I'm sure you mean:

POST /dummy/location
{
  "locationArray" : ["United Kingdom", "London"],
  "location" : "United Kingdom"
}

Jörg
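This also explains the behavior: Lucene flattens array values into one bag of terms per field, so an AND match succeeds whenever every query term appears somewhere in the array. A toy simulation (the standard analyzer is only roughly approximated here by lowercasing and word-splitting):

```python
import re

def and_match(query, field_values):
    """Approximate an ES match query with operator AND on an array field."""
    def tokens(text):
        # Rough stand-in for the standard analyzer: lowercase, word split.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    # Array values are flattened into a single field's term set.
    field_terms = set().union(*(tokens(v) for v in field_values))
    return tokens(query) <= field_terms  # AND: every query term must occur

print(and_match("United Kingdom", ["United Kindgom", "London"]))  # False (typo)
print(and_match("United Kingdom", ["United Kingdom", "London"]))  # True
```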




Re: Same query, different CPU util when run with Java API versus REST

2014-12-14 Thread joergpra...@gmail.com
Can you post the full query code, so this can be reproduced more easily?

Jörg

On Fri, Dec 12, 2014 at 6:44 PM, Jeff Potts jeffpott...@gmail.com wrote:

 I should mention that the Elasticsearch node, the Java service, and the
 JMeter test client are all on different machines.

 Jeff



Re: SearchParseException (marvel) - [No mapping found for [@timestamp] in order to sort on

2014-12-14 Thread Eugen Paraschiv
One more detail on this - the Marvel UI also displays the exact query 
that's failing. Running that query results in a more informative message - 
probably the root cause of the problem: 
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [fs.total.available_in_bytes]: failed to find mapping for fs.total.available_in_bytes
And indeed, *fs.total.available_in_bytes* isn't available in the 
marvel index. 
Now - the reading is available via *_nodes/stats* - so I'm assuming it's a 
mapping problem in Marvel. 
But - I removed the old mapping, restarted the entire cluster, and basically 
allowed Marvel to re-create the mapping it needs - so that should be 
correct. 
Any help is appreciated. 
Thanks. 
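To confirm what the index actually contains, the mapping can be inspected directly and compared against the nodes stats output (the host is a placeholder):

```shell
# Check which fields the Marvel index actually mapped
curl -XGET "http://localhost:9200/.marvel-2014.12.14/_mapping?pretty"

# Compare with what the nodes stats API reports for fs
curl -XGET "http://localhost:9200/_nodes/stats/fs?pretty"
```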






Re: [hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread Costin Leau

Hi,

It looks like es-hadoop is not part of your classpath (hence the NoClassDefFoundError). This might be due either to some 
misconfiguration of your classpath or to the way the Configuration object is used. It looks like you are using it 
correctly, though typically I use Job(Configuration) instead of getInstance() (static factory methods are always bad).

Alternatively, you can use other variants like the LIBJARS or HADOOP_CLASSPATH env variables; take note of what type of 
separator you use between your jars (, vs : vs ;).

Try to debug the classpath and see what you get - inspect the jar that is created and uploaded to HDFS, turn on logging 
on the hadoop side, and potentially use the distributed cache [1].

Embedding the libraries under lib/ also works (see [2]).

All of them have pros and cons; the idea is to get your sample running and then 
debug your env to see what the issue is.

Cheers,

[1] 
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v2.0
[2] 
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
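For the libjars route, the invocation looks roughly like this (the jar paths are placeholders; note that `-libjars` takes a comma-separated list, while `HADOOP_CLASSPATH` uses the platform path separator, and the generic options only work because the class implements Tool and goes through ToolRunner):

```shell
# Make the jar visible to the local client JVM (the one hitting the NCDFE)
export HADOOP_CLASSPATH=/path/to/elasticsearch-hadoop-2.0.2.jar

# And ship it to the task JVMs; -libjars must come right after the class name
hadoop jar myapp.jar com.clqb.app.ElasticSearch \
  -libjars /path/to/elasticsearch-hadoop-2.0.2.jar \
  /input/path /output/path
```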

On 12/14/14 3:32 PM, CAI Longqi wrote:

Hello, I’m using elasticsearch-hadoop-2.0.2.jar, and meet the problem:


Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat
    at com.clqb.app.ElasticSearch.run(ElasticSearch.java:46)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.clqb.app.ElasticSearch.main(ElasticSearch.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsOutputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more


Here’s my program:


public class ElasticSearch extends Configured implements Tool {

    public static class AwesomeMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(NullWritable.get(), XmlUtils.xmlTextToMapWritable(value)); // XmlUtils is not shown here
        }
    }

    public static class AwesomeReducer extends Reducer<NullWritable, MapWritable, NullWritable, NullWritable> {
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.set("xmlinput.start", "<page>");
        conf.set("xmlinput.end", "</page>");
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "radio/artists");

        Job job = Job.getInstance(conf);
        job.setJarByClass(ElasticSearch.class);
        job.setInputFormatClass(XmlInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setMapperClass(AwesomeMapper.class);
        job.setReducerClass(AwesomeReducer.class);

        Path outputPath = new Path(args[1]);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ElasticSearch(), args);
        System.exit(exitCode);
    }
}


p.s. *I also make sure that I have included `elasticsearch-hadoop-2.0.2.jar` in 
my `-libjars`*. Any suggestions?


Thanks

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
mailto:elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/762794c8-0bd0-4c16-b1dd-9c914a29a710%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/762794c8-0bd0-4c16-b1dd-9c914a29a710%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.


--
Costin


Re: Creating a custom plugin to return hashes of the terms or the terms of an Elasticsearch index

2014-12-14 Thread joergpra...@gmail.com
The termlist plugin can use filters with the 'term' parameter and
pagination with the 'size' parameter. So you can get smaller term lists,
for terms starting with 'a', 'b', 'c', ..., and you can limit the number of
entries returned, say with size=1000 or more. The 'term' filter should
be sufficient for most cases.

There are no hashes for terms.

Jörg
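A rough sketch of that prefix-based pagination from Python: the 'term' and 'size' parameter names come from Jörg's description above, while the `/{index}/_termlist` path is an assumption based on the plugin's README, so verify both against the plugin version you run.

```python
import string

def termlist_requests(index, size=1000):
    """Build one request plan per leading letter, following the advice above.

    Assumption: the plugin exposes GET /{index}/_termlist and accepts
    'term' and 'size' query parameters; check your plugin version.
    """
    return [
        {"path": "/%s/_termlist" % index,
         "params": {"term": letter, "size": size}}
        for letter in string.ascii_lowercase
    ]

plans = termlist_requests("myindex")
# Each plan can then be issued with urllib/requests and the partial
# term lists merged client-side, keeping each response well below the
# 30-40 MB that broke the single call.
```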

On Fri, Dec 12, 2014 at 2:45 PM, Rosen Nikolov rpniko...@gmail.com wrote:


   I am currently using an Elasticsearch plugin called termlist (
 https://github.com/jprante/elasticsearch-index-termlist)
 that returns the terms of an index. But it breaks when there are too many
 terms and the output
 information is larger than about 30-40 megabytes. I need my custom plugin
 to work for bigger amounts of output data.

I am thinking about creating a custom plugin to return hashes of terms
 instead of the actual terms to reduce the output data volume.

   So I have a couple of questions:
 1. I presume that Elasticsearch might already use hashes of terms
 internally in the index, so would it be possible to get those?
 2. If the above is not possible, what other options do I have to
 circumvent the 30-40 MB barrier?

Thank you in advance.





Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
Is there a way to do exact and full text searches without having to create 
two different fields?

The documentation 
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
 
states fields must have the mapping not_analyzed in order to avoid 
tokenization. This allows exact searches to be done.

In my case, I would like both full text search and exact searches. For 
example:

When searching for book titles, a user can input either:

I like ElasticSearch

-OR-

exact=I like ElasticSearch

The first case will return results from a full text search. 

The second case will return results only if the book title is exactly I 
like ElasticSearch. Case sensitivity does not matter.

To do this, I think I will have to create two fields called book_title 
and book_title_exact where book_title_exact will have a field mapping 
not_analyzed so that I can do exact matches.

Is this the proper way of handling my use case? Or is there a simpler way 
in ES without having to store a title twice?



Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread Nikolas Everett
Look at multifields. They let you send the field once and analyze it
multiple times. You also might want to use the keyword analyzer and lowercase
filter rather than not_analyzed. Folks are used to case insensitivity.

Nik
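A minimal sketch of that setup (the analyzer and subfield names, "lowercase_keyword" and "exact", are my own invention; the important parts are the keyword tokenizer plus lowercase filter, and the multi-field under "title"):

```python
# Index settings: a custom analyzer that keeps the whole value as one
# token (keyword tokenizer) but lowercases it, for case-insensitive
# exact matching.
settings = {
    "analysis": {
        "analyzer": {
            "lowercase_keyword": {
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}

# Mapping: "title" is sent once and analyzed twice - full-text on the
# main field, exact (lowercased) on the "exact" sub-field.
mappings = {
    "books": {
        "properties": {
            "title": {
                "type": "string",          # standard full-text analysis
                "fields": {
                    "exact": {
                        "type": "string",
                        "analyzer": "lowercase_keyword",
                    }
                },
            }
        }
    }
}
```

Queries would then hit `title` for full-text search and `title.exact` for the exact case.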
Is there a way to do exact and full text searches without having to create
two different fields?

The documentation (
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
states fields must have the mapping not_analyzed in order to avoid
tokenization. This allows exact searches to be done.

In my case, I would like both full text search and exact searches. For
example:

When searching for book titles, a user can input either:

I like ElasticSearch

-OR-

exact=I like ElasticSearch

The first case will return results from a full text search.

The second case will return results only if the book title is exactly I
like ElasticSearch. Case sensitivity does not matter.

To do this, I think I will have to create two fields called book_title
and book_title_exact where book_title_exact will have a field mapping
not_analyzed so that I can do exact matches.

Is this the proper way of handling my use case? Or is there a simpler way
in ES without having to store a title twice?




Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
Ah, thanks. I've set this up (using ES python bindings):

es.indices.put_mapping(index="myindex",
                       doc_type="books",
                       body={"books": {
                           "properties": {
                               "title": {
                                   "type": "string",
                                   "fields": {
                                       "name": {"type": "string"},
                                       "raw": {"type": "string", "index": "not_analyzed"}
                                   }
                               }
                           }
                       }})




Then I try to search for exact match:

es.search(index="myindex", doc_type="books",
          body={"query": {"filtered": {"filter": {"term": {"title": {"raw": "I like ElasticSearch"}}}}}})



But I get an ES error stating

nested: QueryParsingException[[myindex] [term] filter does not support 
[raw]]; }]')

It seems like I'm not searching for the raw correctly. How would I specify 
to search for the raw title (exact matching)?

On Sunday, December 14, 2014 6:49:09 PM UTC-5, Nikolas Everett wrote:

 Look at multifields. They let you send the field once and analyze it 
 multiple times. You also might want to use the keyword analyzer and lowercase 
 filter rather than not_analyzed. Folks are used to case insensitivity. 

 Nik
 Is there a way to do exact and full text searches without having to create 
 two different fields?

 The documentation (
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
  
 states fields must have the mapping not_analyzed in order to avoid 
 tokenization. This allows exact searches to be done.

 In my case, I would like both full text search and exact searches. For 
 example:

 When searching for book titles, a user can input either:

 I like ElasticSearch

 -OR-

 exact=I like ElasticSearch

 The first case will return results from a full text search. 

 The second case will return results only if the book title is exactly I 
 like ElasticSearch. Case sensitivity does not matter.

 To do this, I think I will have to create two fields called book_title 
 and book_title_exact where book_title_exact will have a field mapping 
 not_analyzed so that I can do exact matches.

 Is this the proper way of handling my use case? Or is there a simpler way 
 in ES without having to store a title twice?




Re: Is there a way to do exact and full-text searching without creating two different fields?

2014-12-14 Thread am
I think I just figured it out:

{"title.raw": "I like ElasticSearch"}

instead of 

"title": {"raw": "I like ElasticSearch"}
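For reference, the working request body can be sketched as follows (the `es.search` call is commented out; an elasticsearch-py client named `es` is assumed):

```python
def exact_title_query(title):
    # Multi-fields are addressed with dot notation ("title.raw"),
    # not by nesting the sub-field as {"title": {"raw": ...}}.
    return {
        "query": {
            "filtered": {
                "filter": {
                    "term": {"title.raw": title}
                }
            }
        }
    }

body = exact_title_query("I like ElasticSearch")
# es.search(index="myindex", doc_type="books", body=body)
```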

On Sunday, December 14, 2014 9:00:52 PM UTC-5, am wrote:

 Ah, thanks. I've set this up (using ES python bindings):

 es.indices.put_mapping(index="myindex",
                        doc_type="books",
                        body={"books": {
                            "properties": {
                                "title": {
                                    "type": "string",
                                    "fields": {
                                        "name": {"type": "string"},
                                        "raw": {"type": "string", "index": "not_analyzed"}
                                    }
                                }
                            }
                        }})




 Then I try to search for exact match:

 es.search(index="myindex", doc_type="books",
           body={"query": {"filtered": {"filter": {"term": {"title": {"raw": "I like ElasticSearch"}}}}}})



 But I get an ES error stating

 nested: QueryParsingException[[myindex] [term] filter does not support 
 [raw]]; }]')

 It seems like I'm not searching for the raw correctly. How would I specify 
 to search for the raw title (exact matching)?

 On Sunday, December 14, 2014 6:49:09 PM UTC-5, Nikolas Everett wrote:

 Look at multifields. They let you send the field once and analyze it 
 multiple times. You also might want to use the keyword analyzer and lowercase 
 filter rather than not_analyzed. Folks are used to case insensitivity. 

 Nik
 Is there a way to do exact and full text searches without having to 
 create two different fields?

 The documentation (
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_exact_values.html)
  
 states fields must have the mapping not_analyzed in order to avoid 
 tokenization. This allows exact searches to be done.

 In my case, I would like both full text search and exact searches. For 
 example:

 When searching for book titles, a user can input either:

 I like ElasticSearch

 -OR-

 exact=I like ElasticSearch

 The first case will return results from a full text search. 

 The second case will return results only if the book title is exactly I 
 like ElasticSearch. Case sensitivity does not matter.

 To do this, I think I will have to create two fields called book_title 
 and book_title_exact where book_title_exact will have a field mapping 
 not_analyzed so that I can do exact matches.

 Is this the proper way of handling my use case? Or is there a simpler way 
 in ES without having to store a title twice?





Re: Frequent updates to documents

2014-12-14 Thread Jinal Shah
Thanks for the reply, Nikolas.

Users want to search data instantly after saving, so we are unable to use 
batch updates. It is good to know that even an update to a single field means 
a whole-document reindex.
Thanks,
Jinal
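To make the trade-off concrete: even the dedicated Update API, which accepts only the changed fields, rewrites the full document internally. A sketch of such a partial-update body (the field name is illustrative):

```python
def partial_update_body(changed_fields):
    """Request body for the Update API (POST /{index}/{type}/{id}/_update).

    Sending only the changed fields saves bandwidth and a client-side
    read-modify-write cycle, but Elasticsearch still merges them into
    the stored _source and reindexes the whole document, so the
    deleted-docs count grows just the same.
    """
    return {"doc": changed_fields}

body = partial_update_body({"price": 9.99})
```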

On Friday, 12 December 2014 16:23:55 UTC+11, Jinal Shah wrote:

 Hi,

 We are using ES 1.0.3. In our application we do frequent updates to the 
 documents, and this causes the delete count to increase quickly and triggers 
 frequent merges.

 Due to Lucene version issues with our application and ES API, we are not 
 able to use the API and we have written a module to interact with ES. For 
 document creation and updates, we use HTTP POST and always send all fields 
 for updates also.

 Would sending just the updated fields, reduce the delete count? OR
 Does update to single field also triggers whole document re-indexing?

 Thanks in advance for your help!!

 Regards,
 Jinal Shah

 The information contained in this e-mail message and any accompanying 
 files is or may be confidential. If you are not the intended recipient, any 
 use, dissemination, reliance, forwarding, printing or copying of this 
 e-mail or any attached files is unauthorised. This e-mail is subject to 
 copyright. No part of it should be reproduced, adapted or communicated 
 without the written consent of the copyright owner. If you have received 
 this e-mail in error please advise the sender immediately by return e-mail 
 or telephone and delete all copies. Fairfax Media does not guarantee the 
 accuracy or completeness of any information contained in this e-mail or 
 attached files. Internet communications are not secure, therefore Fairfax 
 Media does not accept legal responsibility for the contents of this message 
 or attached files.







Re: analytics on data stored in ES

2014-12-14 Thread Ramchandra Phadake
Yes, in general the fetch can be improved using standalone clients. I am NOT 
saying that data nodes are a bottleneck as of now; indexing is not impacting 
the search. 

The point I am raising is data locality. Data is spread over a few shards 
across a few machines, and we need to perform processing on this data without 
an explicit fetch outside ES.
I want to return a sample of grouped entries after going over all matched 
entries within an index.

Thanks,
Ram

On Saturday, December 13, 2014 9:24:11 PM UTC+5:30, Arie wrote:

 Hi,

 Consider a non-data master node; as I understand it, this can improve data 
 handling and search speed a lot.





aggs terms is support *?

2014-12-14 Thread 唐坤
I have a scene.

-
Data A
{
name:foodA,
props:{
  color: red,
  taste: sweet,
  xx: xx,
  xx: xx
}
}

Data B
{
name:foodB,
props:{
  color: black,
  taste: sweet,
  xx: xx,
  xx: xx
}
}



The props field is dynamic; its keys are custom per document.


-

I want the result data to be:

{
aggregation: {
 color: [
{key:red, count: 1},
{key:black, count: 1}
 ],
 xx: [
{key:xx, count: xx},
 ]
 }
}



I use this DSL:
{
  aggs: {
all: {
  terms: {
field: props.*
  }
}
  }
}


but it returns nothing. I don't know how to make this work.
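One common workaround (not from this thread, so treat it as an assumption about your data model) is to index the dynamic attributes as a list of key/value pairs, typically mapped as `nested`, so a terms aggregation can address the fixed field names instead of a `props.*` wildcard:

```python
def to_kv(doc):
    # Reshape {"props": {"color": "red", ...}} into fixed-name
    # key/value pairs that a terms aggregation can address.
    return {
        "name": doc["name"],
        "props": [{"key": k, "value": v}
                  for k, v in sorted(doc["props"].items())],
    }

doc = to_kv({"name": "foodA", "props": {"color": "red", "taste": "sweet"}})

# Aggregate over the now-fixed field names. With a nested mapping,
# wrap "by_key" in a "nested" aggregation on the "props" path first.
agg = {
    "aggs": {
        "by_key": {
            "terms": {"field": "props.key"},
            "aggs": {"by_value": {"terms": {"field": "props.value"}}},
        }
    }
}
```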



Marvel Index Taking too much disk space

2014-12-14 Thread Chetan Dev
Hi,

I installed the Marvel plugin, 
but it's creating its own indexes, which are even larger than my original 
indexed data.
Is there a way to delete these indexes on a daily basis, or some other way?

Thanks



Re: Marvel Index Taking too much disk space

2014-12-14 Thread David Pilato
You can use Curator for that. See https://github.com/elasticsearch/curator

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs
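If you'd rather not add a tool, the same daily cleanup can be sketched with the Python client. This is a do-it-yourself alternative to Curator, not its actual behavior; it assumes Marvel's default `.marvel-YYYY.MM.DD` daily index naming (adjust the prefix if yours differs):

```python
from datetime import date, timedelta

def marvel_indices_to_delete(index_names, keep_days, today=None):
    """Pick daily .marvel-YYYY.MM.DD indices older than keep_days days."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    doomed = []
    for name in index_names:
        if not name.startswith(".marvel-"):
            continue
        try:
            day = date(*map(int, name[len(".marvel-"):].split(".")))
        except (ValueError, TypeError):
            continue  # not a dated index, e.g. .marvel-kibana
        if day < cutoff:
            doomed.append(name)
    return doomed

# Then delete each one, e.g. with elasticsearch-py:
#   for name in marvel_indices_to_delete(names, keep_days=7):
#       es.indices.delete(index=name)
```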



 On 15 Dec 2014, at 07:31, Chetan Dev cheten@carwale.com wrote:
 
 Hi,
 
 I installed the Marvel plugin, 
 but it's creating its own indexes, which are even larger than my original 
 indexed data.
 Is there a way to delete these indexes on a daily basis, or some other way?
 
 Thanks
 
