Updating a single field in large documents
What strategy would you recommend when you need to frequently update a single field in a large document? What can you do to improve update performance in a case like that? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f8f9269-fd44-45ba-bbb7-eece817afefd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
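For what it's worth, the Update API applies a partial change server-side, so only the changed field needs to be sent over the wire (the document is still reindexed internally). A sketch of the request body for POST /index/type/id/_update; the field name is a made-up example:

```python
import json

# Sketch: body for a partial update via the Update API.
# The field name "view_count" is a hypothetical example.
def partial_update_body(field, value):
    """Build an Update API body that touches only one field."""
    return {"doc": {field: value}}

body = partial_update_body("view_count", 42)
print(json.dumps(body))
```

Note that this saves network traffic and a client-side get/modify/put round-trip, but not the server-side reindex cost, which is inherent to how Lucene stores documents.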
update mapping
Is the way to update the mapping of a large index as follows: create an empty index with the new mapping, copy the old data into the new index, then alias the new index to the previous name? If so, what are the recommended tools? Ideally there would be a user interface for IT people to use. Thanks
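The final alias step of the procedure above can be done atomically with the _aliases endpoint. A sketch of the request body for POST /_aliases; the index and alias names are hypothetical:

```python
import json

# Sketch: _aliases body that atomically moves an alias from the old index to
# the new one. Names ("logs", "logs_v1", "logs_v2") are made-up examples.
def alias_swap_body(alias, old_index, new_index):
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add":    {"index": new_index, "alias": alias}},
    ]}

print(json.dumps(alias_swap_body("logs", "logs_v1", "logs_v2")))
```

Because both actions are applied in one request, clients querying through the alias never see a moment with no index (or both indices) behind it.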
Re: Cluster discovery on Amazon EC2 problem - need urgent help
I am pretty sure you can open the ports for the security group the ELB belongs to, regardless of the AZ (AZ, not region), unless you are using network ACLs. Anyway, not really ES... PM me if you want to continue the AWS discussion :-) On 16/10/2014 3:37 pm, Zoran Jeremic zoran.jere...@gmail.com wrote: For zone availability, I had to go with everything in one zone. The main reason was the problem of connecting ELB-controlled application instances with backend instances (MySQL, MongoDB and Elasticsearch). It's not possible to add a rule to the backend instances using port + ELB security group if the instances are in different zones, so I had to keep everything in one zone.
Re: ElasticSearch- IndexReaders cannot exceed 2147483647
Dear all, Thanks for your replies. The conclusion is that we cannot store more than 2,147,483,647 records per shard as of now. The only option is to increase the shard count. Thanks, Prasath Rajan. On Tuesday, October 14, 2014 9:34:33 PM UTC+5:30, Jörg Prante wrote: You cannot store more than 2G docs per shard in Lucene 4.x codecs. This is a documented Lucene limit: "Similarly, Lucene uses a Java int to refer to document numbers, and the index file format uses an Int32 on-disk to store document numbers. This is a limitation of both the index file format and the current implementation. Eventually these should be replaced with either UInt64 values, or better yet, VInt values which have no limit." https://lucene.apache.org/core/4_9_1/core/org/apache/lucene/codecs/lucene49/package-summary.html#Limitations Jörg. On Tue, Oct 14, 2014 at 5:37 PM, Prasanth R prasanth...@gmail.com wrote: Thanks for the reply. My scenario here is: 1) no nested docs; 2) I don't have any limit per shard. I didn't know about the internal limit of ES. On Oct 14, 2014 8:23 PM, Alexandre Rafalovitch araf...@gmail.com wrote: On 14 October 2014 10:33, Prasanth R wrote: "There is no upper limit..." Well, then you must have an infinitely scalable architecture and a decision for when the content starts getting sharded. So the question is what your individual shard is allowed to grow to, that is, how many documents (including nested ones) you are expecting to have in a single shard. Because Elasticsearch has an internal limit, and you just hit it. So the question is whether it is intentional, unintentional, or the result of a bug. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
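Since 2,147,483,647 is Integer.MAX_VALUE (2^31 - 1), and the shard count is fixed once an index is created in ES 1.x, increasing it means creating a new index and reindexing into it. A minimal sketch of the settings body (the counts are arbitrary examples):

```python
import json

# Sketch: settings body for the replacement index (sent with PUT /new_index).
# Shard and replica counts here are arbitrary examples; in ES 1.x the shard
# count cannot be changed later, so it must be set up front.
def create_index_settings(shards, replicas=1):
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}

print(json.dumps(create_index_settings(10)))
```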
Re: What MongoDB can do and ES cannot?
Hi Clinton, Considering the enormous amount of value added to ES since this original question was posted, I am wondering whether the answer has tilted in favor of Elasticsearch. Can we safely say that Elasticsearch can be considered a primary data store? -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/What-MongoDB-can-do-and-ES-cannot-tp4032654p4064962.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: Using a nested object property within custom_filters_score script
Hi Veda, I ran into a similar issue to yours. Have you found a solution to your problem? Thanks, Vincent
Re: Many indices.fielddata.breaker errors in logs and cluster slow...
This is caused by elasticsearch trying to load fielddata. Fielddata is used for sorting and faceting/aggregations. When a query has a sort parameter, the node will try to load the fielddata for that field for all documents in the shard, not just those included in the query result. The breaker is tripped when ES estimates there is not enough heap available to load the fielddata, so it rejects the query rather than running the node out of heap space. You should probably start by looking at the queries that are being run to determine what's triggering the error. To deal with it, the options I'm aware of are to add heap space, add more nodes, or look at using doc_values to move fielddata off the heap. Kimbro. On Wed, Oct 15, 2014 at 10:42 PM, Robin Clarke robi...@gmail.com wrote: I'm still having this problem... has anybody got an idea what the cause / solution might be? Thank you! :) On Tuesday, 7 October 2014 14:29:22 UTC+2, Robin Clarke wrote: I'm getting a lot of these errors in my Elasticsearch logs, and am also experiencing a lot of slowness on the cluster...

New used memory 7670582710 [7.1gb] from field [machineName.raw] would be larger than configured breaker: 7666532352 [7.1gb], breaking
...
New used memory 7674188379 [7.1gb] from field [@timestamp] would be larger than configured breaker: 7666532352 [7.1gb], breaking

I've looked at the documentation about memory limits (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html), but I don't really understand what is causing this, and more importantly how to avoid it. My cluster is 10 machines @ 32GB memory and 8 CPU cores each. I have one ES node on each machine with 12GB memory allocated. On each machine there is additionally one logstash agent (1GB) and one redis server (2GB). I have 10 indexes open with one replica per shard, so each node should only be holding 22 shards (two more for kibana-int). I'm using Elasticsearch 1.3.3 and Logstash 1.4.2. Thanks for your help!
-Robin-
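Kimbro's doc_values suggestion amounts to a mapping change. A hypothetical sketch (index, type, and field names invented, loosely echoing the machineName.raw field from the logs), assuming ES 1.x, where doc_values can be enabled per field on not_analyzed string and numeric/date fields:

```python
import json

# Sketch: a mapping that keeps fielddata on disk (doc_values) instead of the
# heap, so sorting/aggregating on the field no longer trips the breaker.
# All names here are hypothetical examples.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "machineName": {
                    "type": "string",
                    "index": "not_analyzed",   # doc_values requires not_analyzed strings
                    "doc_values": True,
                }
            }
        }
    }
}
print(json.dumps(mapping))
```

Existing documents would need to be reindexed for the change to take effect, since doc_values are written at index time.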
Re: Announcing elasticsearch plugin for Liferay - elasticray
Elastic (without "search") should be OK, I believe, at least according to the official source: http://www.elasticsearch.org/trademarks/ Regards, Alex. On 17 October 2014 00:33, k...@rknowsys.com wrote: Hi all, I am glad to announce that we have started a GitHub project: https://github.com/R-Knowsys/elasticray For Liferay users: please test it in dev/staging environments. We are working on a v1.0 RC, and once tested this should be production ready from v1.0 (first week of Nov '14). We are fixing some minor issues and should have a 1.0 RC by next week. Query: do we have any trademark issues with naming the plugin elasticray? We chose that name to clearly indicate that this is an elasticsearch plugin for Liferay. Posting this query here as I did not find any other obvious place to ask. Thanks, kc www.rknowsys.com
River MongoDB-Elasticsearch (parent/child)
Hello, I'm looking for a way to create a parent/child relation with the script of the mongodb-ES river plugin. I don't know whether the parent/child relation must already be present in MongoDB to do that. For now, I just have the field parent_id in all documents, with an ID that is the same between a parent and its children. The type (parent or child) is in the field estype, used to dispatch documents into the right type.

My mapping:

POST mongo_index_log
{
  "mappings": {
    "parent": {},
    "child": { "_parent": { "type": "parent" } }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

My river:

PUT _river/mongo_index_log/_meta
{
  "type": "mongodb",
  "mongodb": {
    "servers": [ { "host": "127.0.0.1", "port": 27017 } ],
    "options": { "secondary_read_preference": true },
    "db": "test",
    "collection": "mongodb_base",
    "script": "ctx._type = ctx.document.estype; if (ctx._type == 'fils') { ctx._parent = ctx.document.parent_id; }"
  },
  "index": { "name": "mongo_index_log", "type": "mongo_type" }
}

In this case the split into the right type works, but the parent/child relation returns nothing when I run the following query (I expect the parents that have at least one child):

POST mongo_index_log/parent/_search
{
  "query": {
    "bool": {
      "must": [
        { "top_children": { "type": "child", "query": { "match_all": {} } } }
      ]
    }
  }
}

I also tried changing the _id of the parent like this:

"script": "ctx._type = ctx.document.estype; if (ctx._type == 'fils') { ctx._parent = ctx.document.parent_id; } else { ctx._id = ctx.document.parent_id; }"

But in this case the split into the right type doesn't work either, and so neither does the parent/child relation. Any ideas? I looked at this link, but found nothing about my problem: https://github.com/richardwilly98/elasticsearch-river-mongodb/tree/master/manual-testing/issues/64 Have a good day, -- Ludovic M
Understanding HEAP usage
Hi, We are using Elasticsearch for one of our applications, as part of which we have indexed about 3M documents and built two indices around them. We use a cluster of 2 nodes, each with 7.5 GB RAM, and have dedicated 4 GB to ES. What we are seeing is that on one of the nodes the heap used by ES is more than 60% of what is allocated, even though the most obvious consumers like the filter cache and field-data cache are very low to almost zero. So I am trying to understand what else could be consuming memory in ES. Any pointers on what else I should be looking at? Here is a snapshot from ElasticHQ (node 1 / node 2):

Field Size: 0.0 / 0.0
Field Evictions: 0 / 0
Filter Cache Size: 24.0B / 24.0B
Filter Evictions: 0 per query / 0 per query
% ID Cache: 0% / 0%
Total Memory: 7gb / 7gb
Heap Size: 4gb / 4gb
Heap % of RAM: 54.5% / 54.5%
% Heap Used: 66.3% / 26%
GC MarkSweep Frequency: 0s / 0s
GC MarkSweep Duration: 0ms / 0ms
GC ParNew Frequency: 0s / 0s
GC ParNew Duration: 0ms / 0ms

Thanks, Karthik
Re: Filter by specific value without mapping
I tried to remove the mapping and make the fields not_analyzed:

curl -XPUT "http://$HOST:9200/reports" -d '
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "store_generic": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }
      ]
    }
  }
}'

But filtering still returns empty results for "approved", "not approved" or . Search example:

curl -X GET 'http://localhost:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "general.approval": "approved" } }
}'

Should I use a different filtering syntax for this case? On Tuesday, October 7, 2014 5:17:02 PM UTC+3, Ivan Brusic wrote: The fields do not need a custom analyzer, they just need to be marked as not_analyzed. You can set up a dynamic template that states that any new field should be not analyzed: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates You can still hardcode the mapping for specific fields. Cheers, Ivan. On Oct 7, 2014 2:57 AM, Vladimir Krylov s6n...@gmail.com wrote: What I'm trying to do is get data by filtering on a term with exact matching. I have ES 1.3.2 and I cannot define a mapping up front, as the attributes are dynamic (different users have different attributes). My data:

{ "id": 111, "org_id": 11, "approval": "approved", ... }
{ "id": 112, "org_id": 11, "approval": "not approved", ... }

This request returns results:

curl -X GET 'host:9200/data/_search?pretty' -d '{
  "filter": { "term": { "approval": "approved" } }
}'

But this does not:

curl -X GET 'host:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "approval": "not approved" } }
}'

It's a duplicate of ticket https://github.com/elasticsearch/elasticsearch/issues/8006#issuecomment-58160111, from which I was directed here. David Pilato suggested that the index has probably indexed "not" and "approved" separately, so there is no exact match for "not approved". I tried searching for the word "not" and it works:

curl -X GET 'host:9200/reports/_search?pretty' -d '{
  "filter": { "term": { "approval": "not" } }
}'

So, how can I filter by exact match on "not approved"?
Set _score field value in Elasticsearch
Hello all, I'm trying to achieve one piece of functionality in Elasticsearch but am not able to do it. In SQL we could do something like: SELECT _score AS score_1 FROM sometable. I am trying to copy the value of the score into another field, so that Elasticsearch returns two fields having the same value: _score and score_1. I have already tried custom score queries, but they change the value of the _score field itself, which I DO NOT WANT. I'm already happy with the score returned in _score; I just want the same value duplicated into another field, for example score_1. Is this possible? Is there any functionality provided in Elasticsearch for this?
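One possibility worth checking (not verified here against a live cluster) is script_fields: _score is available inside the script, so the score can be echoed into a second field without touching _score itself. A sketch of such a request body:

```python
import json

# Sketch: a search body whose script_fields entry copies _score into a second
# field, score_1, leaving the normal _score in the response untouched.
query = {
    "query": {"match_all": {}},
    "script_fields": {
        "score_1": {"script": "_score"},
    },
}
print(json.dumps(query))
```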
APT repository sync
Hi, Can someone point me in the right direction for running a local mirror of the elasticsearch APT repositories? Specifically, is there an rsync endpoint available? Thanks! Yapeng
Re: Understanding HEAP usage
Measuring heap usage in Java applications is very different from measuring memory usage for other software. 1. Usually Java allocates all the heap it is going to need up front at startup; at least, we do that in server applications. 2. Java's garbage collection is lazy, so heap usage climbs slowly over time; if you zoom out, it looks like a saw tooth. So it is perfectly normal for one server to be using more heap than another: it is simply at a different place in the saw tooth. It is interesting to compare the depth of the valleys in the saw tooth and the time between peaks. There are other interesting things you can look at too, but one snapshot of % heap used isn't one of them. Nik. On Fri, Oct 17, 2014 at 6:27 AM, karthik jayanthi karthikjayanthi.i...@gmail.com wrote: Hi, We are using Elasticsearch for one of our applications...
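To observe the saw tooth Nik describes rather than a single snapshot, one can sample heap usage over time from the nodes-stats API. The helper below only parses the response shape; the sample payload is a trimmed, made-up example, and the HTTP call itself is left out:

```python
# Sketch: extract heap_used_percent per node from a /_nodes/stats/jvm
# response, so heap can be sampled repeatedly and plotted over time.
def heap_used_percent(stats):
    return {node_id: node["jvm"]["mem"]["heap_used_percent"]
            for node_id, node in stats["nodes"].items()}

# Trimmed, invented example of the response shape.
sample = {"nodes": {"abc123": {"jvm": {"mem": {"heap_used_percent": 66}}}}}
print(heap_used_percent(sample))  # {'abc123': 66}
```

Polling this every minute or so and charting the result makes the valleys and peaks Nik mentions directly visible.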
Re: copy index
You can use the knapsack plugin to export/import data and change mappings (and much more!). For a 1:1 online copy, yes, just one curl command is necessary. https://github.com/jprante/elasticsearch-knapsack Jörg. On Thu, Oct 16, 2014 at 7:55 PM, euneve...@gmail.com wrote: Hi, I can see there are lots of utilities to copy the contents of an index, such as elasticdump, reindexer, streames, etc., and they mostly use scan/scroll. Is there a single curl command to copy an index to a new index? Without too much investigation it looks like scan/scroll requires repeated calls; can you please confirm? If that is the case, what is the simplest supported utility? Alternatively, is there a plugin with a front end to choose the from- and to-index? Thanks in advance
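To confirm the question above: yes, scan/scroll does require repeated calls; each scroll request returns one batch until an empty batch signals the end. A minimal sketch of the loop those copy utilities implement, with stand-in callables replacing the actual HTTP round-trips:

```python
# Sketch of the scan/scroll copy loop: keep fetching scroll batches until an
# empty one comes back, bulk-indexing each batch into the target index.
# scroll_batches and bulk_index are stand-ins for the real HTTP calls.
def copy_index(scroll_batches, bulk_index):
    copied = 0
    for batch in scroll_batches:   # each batch is one scroll round-trip
        if not batch:              # empty batch: the scroll is exhausted
            break
        bulk_index(batch)
        copied += len(batch)
    return copied

target = []
print(copy_index([[{"_id": 1}, {"_id": 2}], [{"_id": 3}], []], target.extend))  # 3
```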
Re: ElasticSearch spark esRDD not returning the aggregate values in aggregated query
Siva, Try the latest build of elasticsearch-hadoop, version 2.1.0.Beta2: http://www.elasticsearch.org/overview/hadoop/download/ The esRDD has been changed to Spark's PairRDD: https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions The RDD will now contain key/value tuples that look like (String, Map[String, Any]), so you could start to walk the JSON key/value hierarchy with something like: esRDD.flatMap { args => args._2.get("aggregations") } (the syntax above is not exact, since your specific query result may have a different first key/value pair as the first object). Best, Jeff Steinmetz Director of Data Science Ekho, Inc. www.ekho.me @jeffsteinmetz On Wednesday, September 17, 2014 6:13:37 AM UTC-7, siva pradeep wrote: Hi, I have a query which filters the rows and then applies an aggregation. When I run the query in Sense it gives me the expected result, but when I run the same query using elasticsearch-spark_2.10 I get the rows filtered by the query but not the aggregation result. I am sure I am missing something but unable to figure out what.
Here is the query:

GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "filtered": {
            "query": {
              "range": {
                "@timestamp": {
                  "from": "2014-09-03T01:40:37.437Z",
                  "to": "2014-09-03T01:45:11.437Z"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "fields": ["cid", "entity"],
  "aggs": {
    "cid": {
      "terms": { "field": "cid", "min_doc_count": 2, "size": 100 },
      "aggs": {
        "tn": { "terms": { "field": "entity" } }
      }
    }
  }
}

Query result:

{
  "took": 10005,
  "timed_out": false,
  "_shards": { "total": 10, "successful": 10, "failed": 0 },
  "hits": { "total": 2430, "max_score": 0, "hits": [] },
  "aggregations": {
    "cid": {
      "buckets": [
        {
          "key": "01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
          "doc_count": 2,
          "tn": { "buckets": [ { "key": "15052563268", "doc_count": 2 } ] }
        }
      ]
    }
  }
}

Spark program (the query value is the same JSON as above, as an escaped Scala string):

object PresenceFilter extends App {
  val query: String = """{ "query": { "bool": { "must": [ ... ] } }, "size": 0, "fields": ["cid","entity"], "aggs": { ... } }"""

  val sparkConf = new SparkConf()
    .setAppName("PresenceAnalysis")
    .setMaster("local[4]")
    .set("es.nodes", "prs-wch-10.sys.comcast.net")
    .set("es.port", "9200")
    .set("es.resource", "spresence-2014.09.03/presence")
    .set("es.endpoint", "_search")
    // .set("es.query", query)
  val sc = new SparkContext(sparkConf)
  sc.esRDD.count // returns 2430 rows
}

How do I get the aggregation part of the result (the "aggregations" object shown above) into the program? Please advise. Thanks, Siva P
Re: Sorting by nested fields
Has anybody got another idea? Or is it not possible at all?
Minimum double score in a native script
Hi! I am writing a Java plugin with a customized (native) score script returning a double; basically, I wrote a class extending AbstractDoubleSearchScript. For documents which don't pass a specific test, the score should be the lowest possible, meaning they should end up at the bottom of the results. It is hard for me to find a lower bound for my scores, since they are logarithms of probabilities (the theoretical lower bound is log(0)). In the runAsDouble() method I have tried returning Double.NEGATIVE_INFINITY and also -Double.MAX_VALUE, since Double.MIN_VALUE is not actually the minimum negative value (I guess the name of that constant is not consistent with Integer.MIN_VALUE, but that's a different story). When I return the aforementioned constants I get an error:

java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=58514550 (got docID=2147483647)
    at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
    at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
    at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:340)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:308)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:305)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

I am using ES 1.2.0 on a single machine.
and the query is formed like this:

{
  "query": {
    "function_score": {
      "query": { // some filters
      },
      "script_score": {
        "script": "my_script",
        "lang": "native",
        "params": { // some parameters
        }
      },
      "score_mode": "first",
      "boost_mode": "replace"
    }
  }
}

Cheers
How to count tuples of 3 variables, sorted
Greetings community, I'm new to elasticsearch, so first of all sorry for my questions being so basic. I developed a flow collector which dumps flows to my elasticsearch server. Right now I use Kibana to perform the Top 10 destination and Top 10 source IP filters, and such. But the query I'm having more difficulty with is finding the Top 10 combinations of (source + dest + dest_port), so that I can know what the top flows are, and from which IPs to which destinations and ports. Example:

{
  "aggs": {
    "tupulo_teste": {
      "value_count": {
        "field": "SRC_ADDR",
        "field": "DST_ADDR",
        "field": "DST_PORT"
      }
    }
  }
}

This does not compute all combinations of (SRC_ADDR, DST_ADDR, DST_PORT), nor even sort them to give the Top 10 hits. If you are familiar with Splunk, I need the equivalent of *stats count by a,b,c | sort 10 -count*. I've tried:

{
  "aggs": {
    "src": {
      "terms": {"field": "SRC_ADDR"},
      "aggs": {
        "dst": {
          "terms": {"field": "DST_ADDR"},
          "aggs": {
            "dstprt": {
              "terms": {"field": "DST_PORT"}
            }
          }
        }
      }
    }
  }
}

but this produces a strange and long combination, also without sorting. Can someone please help me on how to do this result combination, with a sort by occurrence count? Thank you
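One common workaround in ES 1.x is a single terms aggregation whose key is a script concatenating the three fields, so the buckets themselves are the (src, dst, port) tuples, ordered by count. A sketch of the request body (the "|" separator and aggregation name are assumptions):

```python
import json

# Sketch: one terms aggregation keyed on the concatenation of the three
# fields -- roughly the equivalent of Splunk's "stats count by a,b,c | sort
# 10 -count". Script-based terms are slower than a plain field, but they do
# give real tuples in a single, count-ordered bucket list.
body = {
    "size": 0,
    "aggs": {
        "top_flows": {
            "terms": {
                "script": "doc['SRC_ADDR'].value + '|' + "
                          "doc['DST_ADDR'].value + '|' + "
                          "doc['DST_PORT'].value",
                "size": 10,                    # top 10 combinations
                "order": {"_count": "desc"}    # the default, shown explicitly
            }
        }
    }
}
print(json.dumps(body, indent=2))
```

Each returned bucket key can then be split on "|" client-side to recover the tuple.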
Kibana group by terms
I'm using Kibana w/ logstash to view web server logs. I'd like to add a graph that displays uniques of the *entire* User-Agent string. I've tried adding a terms graph, but that breaks the UA string into separate words, which is less than desirable in this situation. Is there a way to do this?
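The usual fix is to give the field an unanalyzed sub-field and point the terms panel at that, so buckets are whole strings rather than words. A sketch of such an ES 1.x mapping (the index/type/field names here are assumptions, not from the post):

```python
import json

# Sketch: the "agent" field keeps its analyzed form for search, and gains a
# not_analyzed "raw" sub-field. A Kibana terms panel on "agent.raw" then
# buckets on the entire User-Agent string.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "agent": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```

The stock logstash index template already does something like this, creating `.raw` variants of string fields.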
managing snapshots
I'm investigating snapshots and came across some things that aren't clear in the docs. My understanding is that snapshots are incremental and only transfer things that were changed since the last snapshot. (Is that shards, Lucene segments, something else?) One thing that isn't clear: if I create the following

:9200/_snapshot/es_snapshots/snap_yesterday
:9200/_snapshot/es_snapshots/snap_today

can I restore snap_yesterday to get that state back, OR snap_today for today's snapshot? I read in the groups that a new snapshot only stores the changed files. That also leads to the question of how to manage the snapshots. Is there some cleaning I need to (or can) do? If so, how can I ensure that the state of a snapshot is still usable if I delete older ones? Lastly, I was thinking about the idea of multiple snapshot repositories. So in the examples above I might replace es_snapshots with a date, such as:

:9200/_snapshot/yesterday/snap_1
:9200/_snapshot/yesterday/snap_2
:9200/_snapshot/today/snap_1
:9200/_snapshot/today/snap_2

Then I could delete yesterday at some point (or anything older than X days). Are there any thoughts around that, or am I really misunderstanding things? @matthias
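For what it's worth, deleting a snapshot through the DELETE _snapshot API is designed to remove only the files no longer referenced by any remaining snapshot, so newer snapshots stay restorable; explicit per-repository directories shouldn't be needed just for cleanup. A sketch of a date-based retention pass (snapshot naming scheme is an assumption):

```python
from datetime import date, timedelta

# Sketch of a retention policy: snapshot names carry their date (e.g.
# "snap_2014-10-15"); anything older than keep_days becomes a deletion
# candidate, to be removed via DELETE /_snapshot/<repo>/<name>.
def expired(snapshots, today, keep_days=7):
    cutoff = today - timedelta(days=keep_days)
    out = []
    for name in snapshots:
        snap_date = date.fromisoformat(name.split("_", 1)[1])
        if snap_date < cutoff:
            out.append(name)
    return out

snaps = ["snap_2014-10-01", "snap_2014-10-10", "snap_2014-10-16"]
assert expired(snaps, date(2014, 10, 17)) == ["snap_2014-10-01"]
```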
Re: Get only ids with no source Java API
Have you tried setting no fields to be returned, or the explicit setNoFields() method? http://jenkins.elasticsearch.org/job/Elasticsearch%20Master%20Branch%20Javadoc/Elasticsearch_API_Documentation/org/elasticsearch/action/search/SearchRequestBuilder.html#setNoFields() -- Ivan

On Thu, Oct 16, 2014 at 2:45 AM, Ilija Subasic subasic.il...@gmail.com wrote: Is there a way in elasticsearch, using the Java API, to get only the ids of the documents returned for a given query? I run

SearchResponse sr = esClient.prepareSearch(index).setSize(resultSize).setQuery(q).setScroll(new TimeValue(1)).setQuery(fqb).setFetchSource(false).get();

but I get empty hits (`sr.getHits().getHits().length == 0`) although the total count of returned hits is 2 (`sr.getHits().getTotalHits() == 2`). I understand that nothing is returned by elasticsearch because I set fetch source to false, but the ids should somehow be available. My current solution is:

SearchResponse sr = esClient.prepareSearch(index).setSize(resultSize).setQuery(q).setScroll(new TimeValue(1)).setQuery(fqb).setFetchSource("_id", null).get();

However I think that gets the _id field from source, and for speed I would like to avoid this if possible. Thanks, Ilija
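Seen through the REST API, the same idea looks like this: with source fetching disabled the hits array is still populated, and every hit carries its _id in the metadata, so no source field needs to be read. A sketch with a hand-made response (not real client code):

```python
# Sketch: "_source": false suppresses the document body, but each hit still
# includes _index/_type/_id metadata, so ids can be collected directly.
body = {"query": {"match_all": {}}, "_source": False}

# Hand-made example of what a response shaped like ES 1.x would contain:
response = {
    "hits": {
        "total": 2,
        "hits": [{"_id": "a1"}, {"_id": "b2"}],
    }
}

ids = [hit["_id"] for hit in response["hits"]["hits"]]
assert ids == ["a1", "b2"]
```

The Java-API equivalent is what Ivan suggests: ask for no fields at all and read the ids off the returned SearchHit objects.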
Re: Scaling strategies without shard splitting
Hey Nik - Thanks for the response. - Ian On Mon, Oct 13, 2014 at 4:28 PM, Nikolas Everett nik9...@gmail.com wrote: On Mon, Oct 13, 2014 at 11:12 AM, Ian Rose ianr...@fullstory.com wrote: Hi - My team has used Solr in its single-node configuration (without SolrCloud) for a few years now. In our current product we are now looking at transitioning to SolrCloud, but before we make that leap I wanted to also take a good look at whether Elasticsearch would be a better fit for our needs. Although ES has some nice advantages (such as automatic shard rebalancing), I'm trying to figure out how to live in a world without shard splitting. In brief, our situation is as follows: - We use one index (collection in Solr) per customer. - The indexes are going to vary quite a bit in size, following something like a power-law distribution, with many small indexes (let's guess 250k documents), some medium-sized indexes (up to a few million documents), and a few large indexes (hundreds of millions of documents). - So the number of shards required per index will vary greatly and will be hard to predict accurately at creation time. How do people generally approach this kind of problem? Do you just make a best guess at the appropriate number of shards for each new index and then do a full re-index (with more shards) if the number of documents grows bigger than expected? I'm in a pretty similar boat and have done just fine without shard splitting. I maintain the search index for about 900 wikis http://noc.wikimedia.org/conf/all.dblist. Each wiki gets two Elasticsearch indexes, and those indexes vary in size, update rate, and query rate a ton. Most wikis get a single shard for all of their indexes, but many of them use more https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/747fc7436226774d1735775c2ef41c911d59b5d2/wmf-config%2FInitialiseSettings.php#L13828. I basically just guesstimated and reindexed the ones that were too big into more shards.
We have a script that creates a new index with the new configuration, copies all the documents from the old index to the new one, and then swaps the aliases (that we use for updates and queries) to the new index. Then it re-does any updates or deletes that occurred since the copy script started. Having something like that is pretty common. I rarely use it to change sharding configuration - it's much more common that I'll use it to change how a field in the document is analyzed. Elasticsearch also has another way to handle this problem (we don't use it for other reasons) where you create a single index for all customers and then filter them at query time. You also add routing values to your documents and queries so all documents from the same customer get routed to the same shard. That way you can serve queries for a single customer out of one shard, which is pretty cool. For larger customers that don't fit on a single shard you still create indexes just for them. One thing to watch out for, though, is that Elasticsearch doesn't use the shard's size when determining where to place the shard. It'll check to make sure the shard won't fill the disk beyond some percentage, but it won't try to spread out the large shards, so you can get somewhat unbalanced disk usage. I have an open pull request for something to do that, so this probably won't be true forever, but it is true for now. How big are your documents, and how frequently do you think you'll need shard splitting? If your documents are pretty small you may be able to get away with just reindexing all of them for the customer when you need more shards, like I do. It sure isn't optimal but it gets the job done. Another way to do things is, once your customers get too big, you create a new index and route all of their new data there. You then have to query both indexes. This is _kind of_ how people handle log messages, and it might work, depending on your use case.
Nik
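The reindex-and-swap pattern Nik describes hinges on moving the alias in a single atomic _aliases call, so queries never see a half-built index. A sketch of that final step (index and alias names are made up for illustration):

```python
# Sketch: after copying documents from customer_v1 to customer_v2, one
# _aliases request removes and adds the alias atomically, so readers and
# writers flip over in a single step.
alias_swap = {
    "actions": [
        {"remove": {"index": "customer_v1", "alias": "customer"}},
        {"add":    {"index": "customer_v2", "alias": "customer"}},
    ]
}

def target_indices(actions_body):
    """Return the indices the alias will point to after the swap."""
    return [a["add"]["index"] for a in actions_body["actions"] if "add" in a]

assert target_indices(alias_swap) == ["customer_v2"]
```

Because clients only ever talk to the alias, the old index can be deleted at leisure once the catch-up pass (re-applying updates made during the copy) is done.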
Re: Filters: odd behavior
They are indeed executed in the defined order. Filters that are more specific should be placed early on, and those that cannot be cached (geo/time-based) should be placed last. Cheers, Ivan

On Thu, Oct 16, 2014 at 5:16 AM, @mromagnoli marce.romagn...@gmail.com wrote: Hi everyone, I have a doubt about filters. If I have more than one filter in a filtered query, are they executed in the defined order? And are they filtering in a 'chain' mode, i.e. using the results of the previous filters? Thanks in advance as always.
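Ivan's ordering advice, sketched as an ES 1.x request body (field names and values are assumptions): the cheap, cacheable term filter goes first, and the uncacheable geo filter last, where it only sees the documents that survived the earlier clauses.

```python
# Sketch of filter ordering inside an ES 1.x `and` filter: specific and
# cacheable clauses first, expensive/uncacheable (time-based, geo) last.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "and": [
                    {"term": {"status": "active"}},               # specific, cached
                    {"range": {"timestamp": {"gte": "now-1h"}}},  # time-based
                    {"geo_distance": {"distance": "10km",
                                      "location": {"lat": 40.7, "lon": -74.0}}},
                ]
            }
        }
    }
}

clause_order = [next(iter(f)) for f in query["query"]["filtered"]["filter"]["and"]]
assert clause_order == ["term", "range", "geo_distance"]
```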
elasticsearch fields and elasticsearch-hadoop
Is there an easy way to rename the fields on an index? I have a field named searchTerm that I use for some event tracking, but the elasticsearch-hadoop library assumes all elasticsearch fields are lowercase and converts all field names to lower case. When hadoop tries to retrieve the data from the index, the field doesn't match and I just get a null value back. So the question is: can I rename this field from searchTerm to search_term or searchterm in some easy way? Or do I need to set up a new index, pull all the records from the current index, rename the fields to lowercase, and insert them into the new index? Thanks, Akil
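Elasticsearch fields can't be renamed in place, so the copy-and-rename route is the usual answer. The transform step is trivial; a sketch (fetching and re-indexing via scan/scroll and bulk are omitted):

```python
# Sketch of the per-document transform for a copy-based rename: lowercase
# every field name before indexing into the new index.
def lowercase_keys(doc):
    return {k.lower(): v for k, v in doc.items()}

src = {"searchTerm": "elasticsearch", "userId": 42}
assert lowercase_keys(src) == {"searchterm": "elasticsearch", "userid": 42}
```

After the copy, an alias pointing at the new index lets existing consumers keep their old index name.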
Re: Filters: odd behavior
And there is post-filter as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Future of cardinality aggregation feature.
Guys, I see that the cardinality aggregation is marked as an experimental feature. We are using this feature and find it very useful, but I would like to know whether it will be supported going forward, or if there is any chance of it being removed. Thanks in advance. Regards, -G
Re: Scaling strategies without shard splitting
In my use case I have indexed a union catalog for some hundred libraries, where each library can have a search service, plus their own catalog data that they do not want to share. Elasticsearch offers far more flexibility and performance than Solr, with the ability to extend the cluster automatically by adding nodes (without configuration change), combined with automatic rebalancing of shards, plus the features of index aliases and shard over-allocation; an explanation is here: http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html With index aliases, I do not have to perform evil things like shard splitting. No index copy required, no full re-index. That is, I can organize some library catalog index over the machines, and address an index view for each library by assigning several index aliases (e.g. collection names or library identifiers) to the library catalog segments they are interested in, with term filters. Index updates come from a single point of a primary database, plus data packages the libraries can upload. If the amount of input data exceeds the capacity, I can simply start a new node, without touching the configuration. Also, releasing new index versions is a snap with Elasticsearch. The index names carry timestamp information (e.g. ddMMyyHH) and it is easy to organize index versions like rolling windows, with the latest index being the current one to search. Old indices are dropped if they are no longer needed. Jörg
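Jörg's per-library index views can be sketched as filtered aliases: one shared catalog index, plus one alias per library whose term filter restricts it to that library's documents. The names and filter field below are assumptions:

```python
# Sketch: a filtered alias gives each library its own "virtual index" over a
# shared catalog index, without copying or re-indexing anything.
def library_alias(index, library_id):
    return {
        "actions": [{
            "add": {
                "index": index,
                "alias": "catalog_%s" % library_id,
                "filter": {"term": {"library_id": library_id}},
            }
        }]
    }

body = library_alias("catalog_2014101800", "lib42")
assert body["actions"][0]["add"]["alias"] == "catalog_lib42"
```

Queries against `catalog_lib42` then behave exactly like queries against a dedicated index, while capacity is managed once, for the shared index.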
Reduce Disk Space Requirements
Details: Elasticsearch version used: 1.3.4 Docs to index: ~2.2 million Growth in docs: a few hundred docs every week. Number of fields per doc: ~10-15 Tokenizers used: ngram (min: 2, max: 15), path_hierarchy Filters used: word_delimiter, pattern_capture, lowercase, unique Size on disk: ~150 GB (no replicas are active) Problem: Unfortunately, I don't have the luxury of a lot of free disk space at my disposal. Why? [Let me just say I work for a too-big-to-fail organization, if you know what I mean :-)] I need to reduce my index storage footprint by at least 50%. Solutions tried: 1. Ran _flush and _optimize on the index. Didn't affect the size on disk. 2. Decreased the number of primary shards from 5 to 2 (realized this is a useless attempt, as the number of shards doesn't affect disk space). 3. Looked into archiving the index after closing (can't use this solution, as I want our users to search through all of the 2.2 million docs, so I can't archive partial docs). Can you guys suggest any other options to reduce index disk size? Your inputs are much appreciated. Thanks, Parth Gandhi
Re: Reduce Disk Space Requirements
ngram min=2 kills your index space. Use min=3 or higher. Also maybe edge ngram tokenizer might be an alternative. Jörg
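A rough way to see why the ngram settings dominate disk usage: count the grams a single token emits for each size in the min..max range. Raising min_gram trims the shortest grams, which are also the most expensive ones, since 2-grams match almost everything and carry huge postings lists.

```python
# Rough illustration: number of ngrams emitted for one token of the given
# length, summed over gram sizes min_gram..max_gram.
def ngram_count(length, min_gram, max_gram):
    return sum(max(0, length - n + 1)
               for n in range(min_gram, max_gram + 1))

# A 15-character token with the posted settings (min 2, max 15) vs. min 3:
assert ngram_count(15, 2, 15) == 105
assert ngram_count(15, 3, 15) == 91
```

The per-token saving looks modest, but multiplied across every token in 2.2 million documents, and combined with the fat postings of 2-grams, the difference on disk is substantial.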
Re: Java 8 recommended version?
Hi Jilles, 1.7u55 has indeed been the recommended version for a long time, but JDK 8u25 is fine too. The page that you linked is from elasticsearch-hadoop and might be a bit outdated; we are trying to keep up-to-date information about recommended JVMs at the following URL: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html#jvm-version For reference, we are also trying to improve our startup scripts so that they fail to start if you are using a JVM with known issues. See for instance https://github.com/elasticsearch/elasticsearch/pull/7580 On Thu, Oct 16, 2014 at 1:03 PM, Jilles van Gurp jillesvang...@gmail.com wrote: I know JDK 7u55 was labeled as OK some time ago, and this is still listed as the official requirement: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/requirements.html However, time has moved on and I was wondering what the testing status and advice is for more recent JDKs. Particularly, I'd like to know whether Oracle JDK 8u25 is safe for production use (on CentOS 7). We've used JDK 8u20 without issues on our dev servers, but it would be nice to have some guidance on this since we are moving to production soon. The reason we're using Java 8 is that we are using it for our apps as well, and it is kind of nice to have just one JDK to worry about. Also, I suspect there may be some performance benefits given the amount of change that went into e.g. HotSpot. In general, an overview of common VMs and their status with respect to elasticsearch would be nice to have somewhere. There are quite a few different suppliers of VMs at this point, and picking one seems to be a bit of a black art currently. There's OpenJDK, Oracle JDK, Azul's Zulu (essentially OpenJDK as far as I know), and Azul's Zulu Enterprise. You can get each of these for Java 6, 7, and 8. Especially for OpenJDK, it also matters how it was built.
Jilles -- Adrien Grand
Re: Multi Field Aggregation
Hello, I'm having the exact same problem. Have you managed to find a solution? My thread is here: LINK https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/Oum03VSBzHQ Thanks

On Thursday, October 16, 2014 1:57:35 PM UTC+1, Alastair James wrote: Hi there. I am trying to create an aggregation that mimics the following SQL query: SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2 ORDER BY SUM(metric) DESC On the face of it, I could create a terms aggregation for col1, add a terms aggregation for col2 inside it, and the metric aggregations inside that. I could then dynamically build the SQL-like result grid and sort it myself. However, this breaks down for large result sets, or a paginated view of a larger result. The problem is that the ES aggregation system always returns the top N results for each parent and child bucket; thus for each value of col1 I have N values of col2. What I really want is to consider all possible combinations of col1 and col2, in the same way SQL does, and return the top N based on some other metric - in ES speak, a single aggregation where the keys are tuples of (col1, col2). I suppose one way would be to use a script terms aggregation to concatenate each value of col1 and col2, however that's going to be slow. Does anyone else have any ideas? Ideally there would be a tuple aggregation built in, e.g.:

"my_agg": { "tuple": { "fields": ["col1", "col2"] } }

which would produce keys that are objects like:

{ "col1": "value1", "col2": "value2" }

Does anyone know if this would be possible to write as a plugin?
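The script-concatenation workaround mentioned in the thread has a client-side half that is easy to overlook: the buckets come back keyed by the joined string and must be split back into tuples. A sketch with hand-made buckets (the "|" separator is an assumption):

```python
# Sketch: buckets from a terms aggregation whose script concatenated col1
# and col2 with "|". Splitting the key recovers the tuple, and the buckets
# are already sorted by doc_count, matching the SQL ORDER BY COUNT(*) case.
buckets = [
    {"key": "us|chrome", "doc_count": 40},
    {"key": "de|firefox", "doc_count": 25},
]
tuples = [tuple(b["key"].split("|")) + (b["doc_count"],) for b in buckets]
assert tuples[0] == ("us", "chrome", 40)
```

Sorting by a different metric (the SUM(metric) case) would instead need a sub-aggregation per bucket and an "order" clause on the terms aggregation.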
Re: best practice for thread pool queue size
Yes, the particular error is from July. How can I determine the optimal setting for queue size?

On Monday, October 13, 2014 3:21:32 PM UTC-7, Mark Walkom wrote: Increasing queues isn't going to help if there are underlying problems stopping the processing. Based on those errors it looks like you may have network issues, but they are from July? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 14 October 2014 08:16, Zaki Agha za...@roblox.com wrote: Hi, we have several Elasticsearch clusters. Recently we faced an issue in which one of our nodes experienced queueing; in fact, the queue length was greater than 1000, and subsequent requests were rejected because the queue was full. Should we increase the default queue size? I understand that there are several queues within Elasticsearch (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html):

1. Index - default queue_size 200
2. Bulk - default 50
3. Get - default 1000
4. Search - default 1000
5. Suggest - default 1000
6. Percolate - default 1000

Errors:

Error #1:
[[LApp45][SiyuJOHVRRG1udLiFwM9Yw][es1][inet[/xxx.xxx.xxx.xxx:9300]]], id [84124759]
[2014-07-13 04:13:35,332][WARN ][transport] [es2] Received response for a request that has timed out, sent [55372ms] ago, timed out [25372ms] ago, action [discovery/zen/fd/ping], node [[LApp37][FKVv20F4RSiEsxJ4Bo8rMA][es3][inet[/xxx.xxx.xxx.xxx:9300]]], id [80874233]

Error #2:
[2014-07-13 06:28:26,043][WARN ][transport] [es2] Received response for a request that has timed out, sent [55795ms] ago, timed out [25795ms] ago, action [discovery/zen/fd/ping], node

Error #3:
[2014-07-13 06:28:26,049][WARN ][transport] [es2] Received response for a request that has timed out, sent [56023ms] ago, timed out [26023ms] ago, action [discovery/zen/fd/ping], node [[es3][FKVv20F4RSiEsxJ4Bo8rMA][es3][inet[/xxx.xxx.xxx.xxx:9300]]], id [84124758]

Error #4 (there are several errors of this type, all for the same index, aggregated_user_game_points):
[2014-07-13 06:28:26,153][DEBUG][action.search.type] [es2] [aggregated_user_game_points][3], node[8qI5LGo2TxG1S-mQUgEA_w], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3367563e] lastShard [true] org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@71bd1bf (rest of the error message omitted)
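If, after ruling out the underlying problem, a larger search queue is still wanted, the shape of the change is a cluster settings update. A sketch of the settings body (whether threadpool settings are dynamically updatable depends on the ES version; the value 2000 is an arbitrary assumption, and as Mark notes, a bigger queue only hides back-pressure):

```python
# Sketch of raising the search thread pool queue via cluster settings.
# A transient setting reverts on full cluster restart; use "persistent"
# (or elasticsearch.yml) to keep it.
settings = {
    "transient": {
        "threadpool.search.queue_size": 2000
    }
}
assert settings["transient"]["threadpool.search.queue_size"] > 1000
```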
For more options, visit https://groups.google.com/d/optout.
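[Editor's note] Before raising any queue size, it helps to see which pool is actually rejecting work. The sketch below assumes Elasticsearch 1.x, where the `threadpool.*` settings are dynamically updatable via the cluster settings API; a larger queue only buys headroom if the underlying slowness (here, the network timeouts) is also addressed, since queued requests still consume memory while they wait.

```shell
# Watch queue depth and rejection counts for the search pool on each node:
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queued,search.rejected'

# If rejections persist and the nodes have headroom, raise the search queue.
# The value 2000 is illustrative, not a recommendation:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "threadpool.search.queue_size": 2000 }
}'
```

These commands need a running 1.x cluster, so treat them as an operational sketch rather than something to paste blindly.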
Re: [ANN] Elasticsearch CSV plugin for formatting search responses as CSV
This is priceless. Thank you.

On Wednesday, July 16, 2014 12:23:11 AM UTC+1, Jörg Prante wrote:

Hi,

I wrote a little plugin for formatting search responses as CSV (comma-separated values). This format is useful for extracting some (or all) fields from ES JSON and wrapping them into a tabular display, e.g. for exporting to spreadsheet tools.

More info: https://github.com/jprante/elasticsearch-csv

In the hope it's useful,
Jörg

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/45604748-47dd-4203-853b-8c64ec93f7b9%40googlegroups.com.
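[Editor's note] For readers who cannot install the plugin, the transformation it performs (hits in a JSON search response flattened to CSV rows) can be hand-rolled for simple flat documents. This is a stdlib-only sketch, not the plugin's own code; the sample response is made up for illustration.

```python
import csv
import io
import json

def hits_to_csv(response_json, fields):
    """Flatten the hits of an Elasticsearch _search response body into CSV text.

    `fields` picks which _source keys become columns; keys not listed are ignored.
    """
    response = json.loads(response_json)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        writer.writerow(hit["_source"])
    return out.getvalue()

# A made-up two-hit response, standing in for a real _search body:
sample = json.dumps({
    "hits": {"total": 2, "hits": [
        {"_id": "1", "_source": {"user": "alice", "score": 9}},
        {"_id": "2", "_source": {"user": "bob", "score": 7}},
    ]}
})
print(hits_to_csv(sample, ["user", "score"]))
```

Nested `_source` documents would need flattening first; the plugin handles the general case, which is why it exists.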
word delimiter
Hello,

I am experimenting with word_delimiter and have an example with a special character that is indexed. The character is in the type table for the word delimiter. Analysis of the tokenization looks good, but when I attempt to do a match query it doesn't seem to respect the tokenization as expected. The example indexes 'HER2+ Breast Cancer'. Tokenization is 'her2+', 'breast', 'cancer', which is good. However, searching for 'HER2\\+' results in a hit, and so does 'HER2\\-'.

#!/bin/sh
curl -XPUT 'http://localhost:9200/specialchars' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "filter": {
        "special_character_spliter": {
          "type": "word_delimiter",
          "split_on_numerics": false,
          "type_table": ["+ => ALPHA", "- => ALPHA"]
        }
      },
      "analyzer": {
        "schar_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "special_character_spliter"]
        }
      }
    }
  },
  "mappings": {
    "specialchars": {
      "properties": {
        "msg": { "type": "string", "analyzer": "schar_analyzer" }
      }
    }
  }
}'

curl -XPOST 'localhost:9200/specialchars/specialchars/1' -d '{"msg": "HER2+ Breast Cancer"}'
curl -XPOST 'localhost:9200/specialchars/specialchars/2' -d '{"msg": "Non-Small Cell Lung Cancer"}'
curl -XPOST 'localhost:9200/specialchars/specialchars/3' -d '{"msg": "c.2573TG NSCLC"}'
curl -XPOST 'localhost:9200/specialchars/_refresh'

curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2+ Breast Cancer'
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'Non-Small Cell Lung Cancer'
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'c.2573TG NSCLC'

printf 'HER2+\n'
curl -XGET 'localhost:9200/specialchars/_search?pretty' -d '{
  "query": { "match": { "msg": { "query": "HER2\\+" } } }
}'

printf 'HER2-\n'
curl -XGET 'localhost:9200/specialchars/_search?pretty' -d '{
  "query": { "match": { "msg": { "query": "HER2\\-" } } }
}'

curl -X DELETE 'localhost:9200/specialchars'

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/becb02b7-72f0-42dd-b347-5f031fa154d3%40googlegroups.com.
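[Editor's note] A match query analyzes its query string with the field's own analyzer, so the first thing to check is whether the exact bytes that reach Elasticsearch (backslash included, after shell and JSON escaping) produce the same tokens as the indexed text. A debugging sketch, assuming the index above exists:

```shell
# Index-time tokens for the stored document:
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2+ Breast Cancer'

# Query-time tokens for the escaped strings exactly as the match query sees them.
# If the backslash survives escaping, it is not in the type_table and may act as
# a subword delimiter, giving different tokens than 'her2+':
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2\+'
curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 'HER2\-'
```

If the token lists differ, that explains why both escaped queries behave alike. Note also that the backslash escaping shown in the original queries is query_string syntax; a match query takes its text literally, so 'HER2+' needs no escaping there. These commands need a running cluster, so they are a sketch rather than a verified transcript.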