Re: XContentBuilder.copyCurrentStructure() fails with .JsonParseException: Unexpected end-of-input expected close marker for OBJECT

2014-12-19 Thread Bharathi Raja
Hi,
CREATE EXTERNAL TABLE message (
  messageId string,
  messageSize int,
  sender string,
  recipients array<string>,
  messageParts array<struct<extension: string, size: int>>,
  headers map<string, string>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/Loca1';

JSON:
{
"messageId": "34dd0d3c-f53b-11e0-ac12-d3e782dff199",
"messageSize": 12345,
"sender": "al...@example.com",
"recipients": [
"j...@example.com",
"b...@example.com"
],
"messageParts": [
{
"extension": "pdf",
"size": 4567
},
{
"extension": "jpg",
"size": 9451
}
],
"headers": {
"Received-SPF": "pass",
"X-Broadcast-Id": "9876"
}
}


I get a JsonParseException with unexpected end-of-input. Could you please 
help me correct this?
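One likely cause (an assumption on my part, but a common pitfall with Hive JSON SerDes such as this Cloudera one): Hive's TextInputFormat splits records on newlines, so each JSON document must sit on a single line. A pretty-printed, multi-line object like the one above produces exactly this end-of-input error. A minimal sketch of compacting a document onto one line before loading it into `/Loca1`:

```python
import json

def compact_json(pretty_text):
    """Re-serialize a (possibly pretty-printed) JSON document onto a
    single line, since Hive splits text records on newlines."""
    return json.dumps(json.loads(pretty_text), separators=(",", ":"))

pretty = """{
  "messageId": "34dd0d3c-f53b-11e0-ac12-d3e782dff199",
  "messageSize": 12345
}"""
line = compact_json(pretty)
print(line)  # one record per line, safe for TextInputFormat
```

Running each file through a compaction step like this (or producing one-line JSON upstream) should make the SerDe happy without changing the table definition.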

On Saturday, October 19, 2013 2:09:30 PM UTC+5:30, Hendrik wrote:
>
> $ curl -XGET 'http://localhost:9200/_search' -d '{
> "query" : {
> "term" : { "user" : "kimchy" }
> }
> '
>
>
>
>
> public class MyRestFilterDoingSpecialThings extends RestFilter {
>   ...
> @Override
> public void process(RestRequest request, RestChannel channel,
> RestFilterChain filterChain) { ...
>
> XContentType xContentType = 
> XContentFactory.xContentType(request.content()); //json
> XContentParser parser = 
> XContentFactory.xContent(xContentType).createParser(request.content());
> XContentParser.Token t = parser.nextToken(); 
> //t is START_OBJECT
> XContentBuilder builder = 
> XContentFactory.contentBuilder( xContentType).copyCurrentStructure(parser); 
>  <-- fails with
>
> org.elasticsearch.common.jackson.core.JsonParseException: Unexpected 
> end-of-input: expected close marker for OBJECT (from [Source: [B@29569b73; 
> line: 1, column: 0])
>  at [Source: [B@29569b73; line: 5, column: 64]
> at 
> org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1369)
> at 
> org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:532)
> at 
> org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:465)
> at 
> org.elasticsearch.common.jackson.core.base.ParserBase._handleEOF(ParserBase.java:491)
> at 
> org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2513)
> at 
> org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:617)
> at 
> org.elasticsearch.common.jackson.core.base.GeneratorBase.copyCurrentStructure(GeneratorBase.java:401)
> at 
> org.elasticsearch.common.xcontent.json.JsonXContentGenerator.copyCurrentStructure(JsonXContentGenerator.java:310)
> at 
> org.elasticsearch.common.xcontent.XContentBuilder.copyCurrentStructure(XContentBuilder.java:1035)
>
> What did I do wrong?
> Thanks
> Hendrik
>
>
>
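Note that the quoted curl body is missing its closing brace (three `{` but only two `}` before the final quote), which is exactly the kind of truncated input that makes `copyCurrentStructure` hit end-of-input while looking for the OBJECT close marker. A sketch of the difference, using Python's json module as a stand-in for the Jackson parser:

```python
import json

truncated = '{ "query" : { "term" : { "user" : "kimchy" } }'   # missing final }
balanced = '{ "query" : { "term" : { "user" : "kimchy" } } }'

try:
    json.loads(truncated)
    parsed_truncated = True
except json.JSONDecodeError:
    # Parser reaches end of input while an object is still open.
    parsed_truncated = False

parsed_balanced = json.loads(balanced)
print(parsed_truncated)
print(parsed_balanced["query"])
```

The fix on the Elasticsearch side is simply to send well-formed JSON; the RestFilter code itself looks fine.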

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/162685e1-d872-474c-a30f-a651bf7ceb5d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


index design for web activity

2014-12-19 Thread Chen Wang
Hey Guys, 
I'd like to seek your suggestions on index design for web activities.
Let's say I have browse data, online purchase data, and store purchase 
data, and I need to keep a year of each. A year of browse data is around 
80G, online purchase data is around 50G, and store purchase (offline) data 
is around 1T.

I have to run queries like: find all customers who browsed item A in the 
past X months and also purchased item B online in the past Y months. 
Originally I used a complicated parent/child structure, which sometimes 
resulted in very bad performance, and I stored all browse/online 
purchase/store purchase data in one index distributed across 7 shards.

I have 7 machines with 128G each, and 1T hard disk.

Now I am trying to save each type of data in its own index, say 
browse_v1, onlinepurchase_v1, storepurchase_v1. Since this is time-based 
data, how should I decide whether to break them up monthly or simply keep 
them yearly? For browse (70G) and online purchase (50G), I think I can 
just use one index with one shard each. Or should I break them into 
monthly data instead? Breaking them into monthly indexes gives me the 
flexibility of adding/removing data, but it will also decrease query 
performance, right? (A search against 1 index now becomes a search 
against 12 indexes.)

For store data (1T), apparently I have to break it into at least monthly 
indexes, but each monthly index will still contain around 100G of data. 
With my current cluster, how many shards should I allocate to each 
monthly index? I am also concerned about query performance.
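As a back-of-envelope check (my numbers, not a recommendation from this thread): ~1T/year of store data is roughly 85–100G per monthly index, and a commonly cited comfortable ceiling is around 30G per shard, which suggests 3–4 shards per monthly index:

```python
def shards_needed(index_size_gb, target_shard_gb=30):
    """Back-of-envelope shard count: ceil(index size / target shard size).
    The 30 GB default is a rule-of-thumb ceiling, not a hard limit."""
    return -(-index_size_gb // target_shard_gb)  # ceiling division

monthly_store_gb = 1000 // 12  # ~83 GB per monthly index from ~1T/year
print(shards_needed(monthly_store_gb))
```

The right target shard size ultimately depends on I/O speed and recovery-time tolerance, so this is only a starting point for testing.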

Then, since I am storing them in separate indexes, to run the query I 
want I will need to do an application-level join. Is this the common 
way to handle such a use case?
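For what it's worth, an application-level join over two indexes usually reduces to intersecting id sets in the client. A minimal sketch (the id values and field name `customer_id` are hypothetical; in practice each set would come from a separate Elasticsearch query or terms aggregation):

```python
# Hypothetical result sets: customer ids matched by two separate queries,
# one against the browse index, one against the online purchase index.
browsed_item_a = {"c1", "c2", "c3", "c7"}
purchased_item_b = {"c2", "c7", "c9"}

# The "join" is just the intersection of the two id sets on the shared key.
matching_customers = browsed_item_a & purchased_item_b
print(sorted(matching_customers))
```

The practical constraint is the size of the intermediate id sets you have to pull back to the application, which is worth measuring before committing to this design.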

I know I should perform some testing first, but I hope someone with 
similar experience handling this can provide some guidance.

thanks in advance,
Chen




Re: Rolling restart

2014-12-19 Thread iskren . chernev
https://github.com/elasticsearch/elasticsearch-definitive-guide/pull/285

On Friday, December 19, 2014 1:01:53 PM UTC-8, Nikolas Everett wrote:
>
> I believe so.
>




Re: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
Got the posting of the mapping figured out; I think I was just missing 
some outer braces in my request body.

However, enabling _timestamp on my document type still didn't result in 
automatic timestamps being added to the documents in Elasticsearch 
1.4.1.

My mapping now looks like:

{
  "solink_health_monitor": {
    "mappings": {
      "stats": {
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "properties": {
          "numChanges": {
            "type": "long"
          },
          "tag": {
            "type": "string"
          }
        }
      }
    }
  }
}

On Friday, 19 December 2014 15:33:19 UTC-5, Jef Statham wrote:
>
> Same error 
>
> "error": "ActionRequestValidationException[Validation Failed: 1: mapping 
> type is missing;]",
> "status": 400
>
> On Friday, 19 December 2014 15:18:46 UTC-5, Christian Hedegaard wrote:
>>
>>  Try this:
>>
>>  
>>
>> { 
>>
>> "template" : "whateverindex-*",
>>
>>"mappings" : {
>>
>>"events" : {
>>
>>  "_timestamp" : { "enabled" : true },
>>
>> }
>>
>>   }
>>
>> }
>>
>>  
>>
>> *From:* elasti...@googlegroups.com [mailto:elasti...@googlegroups.com] *On 
>> Behalf Of *Jef Statham
>> *Sent:* Friday, December 19, 2014 12:09 PM
>> *To:* elasti...@googlegroups.com
>> *Subject:* Cannot figure out how to add automatic timestamps when a 
>> document is indexed.
>>
>>  
>>  
>> I've been trying a PUT to an existing index 
>> /solink_health_monitor/_mapping to add a timestamp field to the document 
>> _source
>>  
>>  
>>  
>> "mappings": {
>>  
>> "stats": {
>>  
>> "properties": {
>>  
>> "@timestamp:": {
>>  
>> "enabled": true,
>>  
>> "store": true
>>  
>> }
>>  
>> }
>>  
>> }
>>  
>> }
>>  
>>  
>>  
>> I get the following response 
>>  
>>  
>>   
>> {
>>  
>> "error": "ActionRequestValidationException[Validation 
>> Failed: 1: mapping type is missing;]",
>>  
>> "status": 400
>>  
>> }
>>   
>>  
>>  
>



Re: Rolling restart

2014-12-19 Thread Nikolas Everett
I believe so.

On Fri, Dec 19, 2014 at 3:39 PM,  wrote:
>
>
>
> On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:
>>
>> You have to reenable allocation after the node comes back and wait for
>> the shards to initialize there.
>>
>
> So this means the tutorial is wrong (current version):
>
> 2. Disable allocation
> 3. stop node
> 4. ...
> 5. start node
> 6. Repeat 3-5 for the rest of your nodes
> 7. Re-enable shard allocation using ...
>
> It should be:
>
> 2. disable allocation
> 3. stop node
> 4. ...
> 5. start node
> 6. enable allocation
> 7. repeat steps 2-6 for the rest of your nodes
>



Re: Rolling restart

2014-12-19 Thread iskren . chernev


On Friday, December 19, 2014 12:31:33 PM UTC-8, Nikolas Everett wrote:
>
> You have to reenable allocation after the node comes back and wait for the 
> shards to initialize there.
>

So this means the tutorial is wrong (current version):

2. Disable allocation
3. stop node
4. ...
5. start node
6. Repeat 3-5 for the rest of your nodes
7. Re-enable shard allocation using ...

It should be:

2. disable allocation
3. stop node
4. ...
5. start node
6. enable allocation
7. repeat steps 2-6 for the rest of your nodes 
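The per-node disable/enable cycle above comes down to flipping `cluster.routing.allocation.enable` between `none` and `all` around each restart (the setting name as documented for ES 1.x; I'm assuming the transient cluster settings API here). A sketch of the two request bodies:

```python
import json

def allocation_settings(enabled):
    """Transient cluster-settings body toggling shard allocation.
    Would be sent as: PUT /_cluster/settings"""
    value = "all" if enabled else "none"
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": value}})

# One rolling-restart cycle per node:
disable = allocation_settings(False)  # step 2: before stopping the node
enable = allocation_settings(True)    # step 6: after the node rejoins
print(disable)
print(enable)
```

Using transient (rather than persistent) settings means a full-cluster restart falls back to normal allocation automatically.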



Re: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
Same error 

"error": "ActionRequestValidationException[Validation Failed: 1: mapping 
type is missing;]",
"status": 400

On Friday, 19 December 2014 15:18:46 UTC-5, Christian Hedegaard wrote:
>
>  Try this:
>
>  
>
> { 
>
> "template" : "whateverindex-*",
>
>"mappings" : {
>
>"events" : {
>
>  "_timestamp" : { "enabled" : true },
>
> }
>
>   }
>
> }
>
>  
>
> *From:* elasti...@googlegroups.com  [mailto:
> elasti...@googlegroups.com ] *On Behalf Of *Jef Statham
> *Sent:* Friday, December 19, 2014 12:09 PM
> *To:* elasti...@googlegroups.com 
> *Subject:* Cannot figure out how to add automatic timestamps when a 
> document is indexed.
>
>  
>  
> I've been trying a PUT to an existing index 
> /solink_health_monitor/_mapping to add a timestamp field to the document 
> _source
>  
>  
>  
> "mappings": {
>  
> "stats": {
>  
> "properties": {
>  
> "@timestamp:": {
>  
> "enabled": true,
>  
> "store": true
>  
> }
>  
> }
>  
> }
>  
> }
>  
>  
>  
> I get the following response 
>  
>  
>   
> {
>  
> "error": "ActionRequestValidationException[Validation Failed: 
> 1: mapping type is missing;]",
>  
> "status": 400
>  
> }
>   
>  
>  



Re: Rolling restart

2014-12-19 Thread Nikolas Everett
You have to reenable allocation after the node comes back and wait for the
shards to initialize there.

On Fri, Dec 19, 2014 at 3:23 PM,  wrote:
>
> I'm maintaining a small cluster of 9 nodes, and was trying to perform
> rolling restart as outlined here:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts
>
> The problem is that after I disable reallocation and restart a single
> node, it appears it loses all its shards indefinitely (until I turn
> reallocation back on). So if I do this for all nodes in the cluster I'll run out of
> primary shards at some point.
>
> I have an upstart task for Elasticsearch, so I stopped nodes with that (it
> sends SIGTERM). I tried the shutdown API but it did have the same effect --
> after node joins the cluster, it doesn't own any shards, and that doesn't
> change if I wait for a while.
>
> Am I doing something wrong?
>



Rolling restart

2014-12-19 Thread iskren . chernev
I'm maintaining a small cluster of 9 nodes, and was trying to perform 
rolling restart as outlined 
here: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_rolling_restarts.html#_rolling_restarts

The problem is that after I disable reallocation and restart a single node, 
it appears it loses all its shards indefinitely (until I turn reallocation 
back on). So if I do this for all nodes in the cluster, I'll run out of 
primary shards at some point.

I have an upstart task for Elasticsearch, so I stopped nodes with that (it 
sends SIGTERM). I tried the shutdown API, but it had the same effect: after 
the node joins the cluster, it doesn't own any shards, and that doesn't 
change if I wait a while.

Am I doing something wrong?



RE: Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Christian Hedegaard
Try this:

{
  "template" : "whateverindex-*",
  "mappings" : {
    "events" : {
      "_timestamp" : { "enabled" : true }
    }
  }
}

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On 
Behalf Of Jef Statham
Sent: Friday, December 19, 2014 12:09 PM
To: elasticsearch@googlegroups.com
Subject: Cannot figure out how to add automatic timestamps when a document is 
indexed.

I've been trying a PUT to an existing index /solink_health_monitor/_mapping to 
add a timestamp field to the document _source

"mappings": {
"stats": {
"properties": {
"@timestamp:": {
"enabled": true,
"store": true
}
}
}
}

I get the following response

{
"error": "ActionRequestValidationException[Validation Failed: 1: 
mapping type is missing;]",
"status": 400
}




Cannot figure out how to add automatic timestamps when a document is indexed.

2014-12-19 Thread Jef Statham
I've been trying a PUT to an existing index /solink_health_monitor/_mapping 
to add a timestamp field to the document _source

"mappings": {
"stats": {
"properties": {
"@timestamp:": {
"enabled": true,
"store": true
}
}
}
}

I get the following response 

{
"error": "ActionRequestValidationException[Validation Failed: 1: mapping 
type is missing;]",
"status": 400
}
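The "mapping type is missing" error usually means the request never names the mapping type: the body shown above wraps everything in a top-level "mappings" key, but the put-mapping API in ES 1.x expects the type either in the URL path or as the root key of the body. A sketch of a body that names the type (assuming the type is `stats`, as in the index above):

```python
import json

# Root key is the mapping type ("stats") itself, not a "mappings" wrapper.
body = {
    "stats": {
        "_timestamp": {"enabled": True, "store": True}
    }
}
# Would be sent as: PUT /solink_health_monitor/_mapping/stats
payload = json.dumps(body)
print(payload)
```

Note also that `_timestamp` is a root-level field of the type mapping, not an entry under `properties`, which is the other issue in the original request.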



Re: Using wildcards to speficy multiple indexes in Kibana, is it still not supported?

2014-12-19 Thread antoine . girbal
Vagif,
you should try out Kibana 4, with which you can define either a 
wildcard-based name (e.g. logstash-*) or a date-based name (e.g. 
logstash-YYYY.MM.DD).
Note that the date-based approach is typically much more efficient, since 
K4 can then query only the indices that match your time range, instead of 
all of them.
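The efficiency point above is that a date-based pattern lets the client expand a time range into a finite list of daily indices instead of hitting everything. A sketch of that expansion (the `logstash` prefix and dates are illustrative):

```python
from datetime import date, timedelta

def daily_indices(prefix, start, end):
    """Expand a date range into daily index names like logstash-2014.12.19,
    the kind of date-based pattern Kibana 4 can target directly."""
    names = []
    d = start
    while d <= end:
        names.append("%s-%s" % (prefix, d.strftime("%Y.%m.%d")))
        d += timedelta(days=1)
    return names

print(daily_indices("logstash", date(2014, 12, 17), date(2014, 12, 19)))
```

A three-day query then touches three indices rather than the whole year's worth.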


On Thursday, December 18, 2014 3:24:55 AM UTC-8, Vagif Abilov wrote:
>
> We have groups of multiple indexes whose names differ in a few characters, 
> and we want to query all indexes belonging to the same group. I've found 
> some old threads explaining that Kibana doesn't support wildcards in index 
> names (except for the date specifier) and that the only way to specify 
> such multiple indexes is to list all of them explicitly in Kibana 
> settings. But that's not flexible in our case.
>
> I wonder if Kibana supports or plan to support wildcards in index names.
>
> Thanks in advance
>
> Vagif Abilov
>



Re: ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Gill Singh
Ok, thanks. So I guess it can't crawl or index websites itself, but it can 
serve as the search backend, perhaps through a plugin. Is there any 
combination of tools, or any other open-source search engine, that can 
serve this need? Thanks.

On Friday, December 19, 2014 1:04:44 PM UTC-5, Nikolas Everett wrote:
>
>
>
> On Fri, Dec 19, 2014 at 12:51 PM, Gill Singh  > wrote:
>>
>> Hi, I am new here, just joined this group!
>>
>> We are looking for a new Search Engine for our Intranet site. Can 
>> ElasticSearch be used for Crawling, Indexing and Searching Intranet type 
>> sites? We will need to crawl/index our web pages within Intranet, documents 
>> (PDF's Word etc) plus potentially Database indexing and then provide a user 
>> interface where our internal user base can search and show results. Thanks.
>>
>>
>>
> Elasticsearch doesn't really do any of that. It's more a building block 
> for implementing this stuff. For instance, MediaWiki has a plugin to 
> implement its search using Elasticsearch. I'm sure there are other 
> examples, but I happen to work on the MediaWiki one, so I have the link 
> on hand.
>
> Nik
>



Shard query cache not working for aggregates with simple date range filter

2014-12-19 Thread Luke Nezda
I'm using Elasticsearch 1.4.2 and was excited about the new shard query 
cache, but I am surprised that this simple terms aggregation doesn't seem 
to show up in the cache when I add a date range filter: 
https://gist.github.com/nezda/c65dd66785d5f1e4dbd4 -- I'm not referencing 
`now` or anything, so this seems like a bug to me.

Please advise,
- Luke



Re: ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Nikolas Everett
On Fri, Dec 19, 2014 at 12:51 PM, Gill Singh 
wrote:
>
> Hi, I am new here, just joined this group!
>
> We are looking for a new Search Engine for our Intranet site. Can
> ElasticSearch be used for Crawling, Indexing and Searching Intranet type
> sites? We will need to crawl/index our web pages within Intranet, documents
> (PDF's Word etc) plus potentially Database indexing and then provide a user
> interface where our internal user base can search and show results. Thanks.
>
>
>
Elasticsearch doesn't really do any of that. It's more a building block for
implementing this stuff. For instance, MediaWiki has a plugin to implement
its search using Elasticsearch. I'm sure there are other examples, but I
happen to work on the MediaWiki one, so I have the link on hand.

Nik



Re: Wrong routing of TransportClient with sniffing enabled

2014-12-19 Thread Bae, Jae Hyeon
I already did, since it's the default, right? Also, I am seeing the
following, which shows the cluster name is not being ignored:

2014-12-19 06:58:08,187 WARN elasticsearch[Sun Girl][generic][T#49358]
transport - [Sun Girl] node null not part of the cluster Cluster
[es_logsummary], ignoring...


On Fri, Dec 19, 2014 at 1:56 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:
>
> You must set
>
> client.transport.ignore_cluster_name
>
> to "false", see
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html
>
> Jörg
>
>
>
> On Fri, Dec 19, 2014 at 8:09 AM, Jae  wrote:
>
>> Hi
>>
>> I am using ES 1.1.0 with TransportClient.
>>
>> We observed wrong routing from TransportClient when we scale up the
>> cluster. For example, suppose that we have two ES clusters, es0, es1 and
>> es_sink_0 is the TransportClient talking to es0, es_sink_1 is one talking
>> to es1. If we scale up es1, it happens that es_sink_0 is sending data to
>> es1. We are using client.transport.sniff=true by default. This should not
>> happen theoretically because TransportClient will refresh its server list
>> through communicating with the cluster and new nodes should not join to the
>> wrong cluster.
>>
>> Is there anybody who has seen this problem before? Any comments will be
>> totally appreciated. We didn't find the root cause yet but this is the
>> really serious problem. So, temporarily, I want to turn off sniff and add
>> the feature that manually updates the server list through external discover
>> module.
>>



ElasticSearch as a Search Engine for our Intranet Site

2014-12-19 Thread Gill Singh
Hi, I am new here, just joined this group!

We are looking for a new search engine for our Intranet site. Can 
ElasticSearch be used for crawling, indexing, and searching Intranet-type 
sites? We would need to crawl/index our web pages within the Intranet and 
documents (PDFs, Word, etc.), plus potentially index a database, and then 
provide a user interface where our internal user base can search and view 
results. Thanks.



Updating Elastic search index along with DB

2014-12-19 Thread teseter


Hi,

To speed up search operations, we are planning to use Elasticsearch for 
transactional data. Data has to be fed into Elasticsearch from the DB. The 
main problem we foresee is keeping the data in sync between ES and the DB, 
since the transactional data can be updated. We are looking for ways to 
update the ES index after the transaction data is updated successfully, to 
avoid serving stale data from ES.

We are looking for different solution options to achieve this. If anybody 
has worked on a similar requirement, please let us know.

We are using Oracle DB. Should we use Oracle AQ on the database side or 
some Java JMS queue to update the Elasticsearch index?

Also, we want to know: when we update an ES index, will all the shards 
automatically get updated?





ES stopped logging

2014-12-19 Thread digitalx00
Topic says it all: I'm not seeing any log files in 
/var/log/elasticsearch. ES is running, and I haven't modified anything 
since starting. Any way to track this down? Thank you.



Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread joergpra...@gmail.com
Yes, I have 3 nodes and each index has 3 shards, on 32 core machines.

Each shard contains many segments, which can be read and written
concurrently by Lucene. Since Lucene 4, there have been massive
improvements in that area.

Maybe you have observed the effect that many shards per node for a single
index show different performance behavior when docs are added over long
periods of time. It simply takes longer before large segment merging
begins, because docs are more widely distributed and use smaller segment
sizes for a longer time. The downside is that huge segment counts may
occur (and many users encounter high file descriptor counts). With the
right configuration, you can set up a single shard per index on a node,
and segment merging / segment count is not a real problem.

You are right that shard size is a factor when moving the shard
around (into snapshot/restore), for export, or at recovery when the
node starts up. I think shard sizes over 30 GB are a bit heavy, but this
also depends on the speed of the I/O subsystem. With SSD or RAID 0, I can
operate at I/O rates of over 1 GB/sec for sequential reads. The shard size
factor has to be balanced out, either by using more than one index, a
higher number of nodes, or a faster I/O subsystem.
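The per-shard segment counts discussed above can be inspected with the cat APIs; a minimal sketch (the host, port, and index name "myindex" are assumptions, not from the thread):

```shell
# List segments per shard for one index; large segment counts show up here.
# Assumes a local node on port 9200 and an index named "myindex".
curl -XGET 'http://localhost:9200/_cat/segments/myindex?v'

# Per-shard doc counts and on-disk sizes, useful for judging shard weight:
curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,docs,store'
```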

Jörg



On Fri, Dec 19, 2014 at 3:42 PM, AlexR  wrote:

> Jorg, if you have a single large index and a cluster with 3 nodes, do you
> suggest creating just 3 shards even though each node has, say, 16 cores?
> With just three shards they will be very big and not much parallelism in
> computations will occur.
> Am I missing something?
>


Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread AlexR
Jorg, if you have a single large index and a cluster with 3 nodes, do you
suggest creating just 3 shards even though each node has, say, 16 cores? With
just three shards they will be very big and not much parallelism in
computations will occur.
Am I missing something?



Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
Yesss, sure. Thank you so much, and Merry Christmas!

2014-12-19 9:54 GMT-04:30 joergpra...@gmail.com :
>
> If you can give me a few calm days over the holidays, I will try to
> rewrite the Mongo DB example recipe with Oracle triggers to Elasticsearch
> with JDBC plugin.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 3:08 PM, Marian Valero 
> wrote:
>
>> Yes, I know; that is just what they told me, and my question was to find
>> out what they recommend for me. But that is no reason for me not to use
>> JDBC; I also want to know how I can do it. I have seen an example of this
>> explained here: https://github.com/jprante/elasticsearch-river-jdbc but I
>> don't know how to use it very well. So if you can help me, I'd appreciate
>> it. Thanks.
>>
>> On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>>>
>>> So you avoid all community supported plugins?
>>>
>>> As a side note, I wrote JDBC plugin for ES a few years ago so I can load
>>> my Oracle production data to ES.
>>>
>>> Jörg
>>>
>>> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
>>> wrote:
>>>
 ES and Logstash support tell me that at this time, they do not have an
 input for Logstash that accepts SQL from Oracle. They also do not recommend
 use of the existing JDBC river, as they don't support it.



 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com :
>
> If you can be more specific, I might try to help as best as I can.
>
> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
> RDBMS provide JDBC support.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
> wrote:
>
>> They tell me that JDBC don't support
>>
>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>>
>>> With JDBC plugin, you can realize scenarios comparable to this one
>>>
>>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-
>>> using-streams-advanced-queuing/
>>>
>>> so I am not sure why you do not want the JDBC plugin?
>>>
>>> Do you need a documentation?
>>>
>>> Jörg
>>>
>>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>>> wrote:
>>>
 I want to migrate data from Oracle to Elasticsearch to analyze it. I have
 been using Logstash to read a CSV file, but when I input this data I get
 more log entries than I inserted: for example, I have 10 lines and it has
 inserted 15 lines of logs. How can I fix that? This is my logstash.conf:

 input {
   file {
   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
   type => "responselog"
   start_position => "beginning"
   }
 }
 filter {
 csv {
 columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
 "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
 "CLEANTEXT"]
 separator => ","
 }
 }
 output {
 elasticsearch {
 action => "index"
 host => "localhost"
 index => "logstash-%{+.MM.dd}"
 workers => 1
 }
 # stdout {
 # codec => rubydebug
 # }
 }

 My other question is: how can I connect Logstash to Oracle in real
 time, not locally, without using the JDBC river?

 Thanks


Re: Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread drjz
I solved this by adding this setting in the index configuration.

"index" : {
"refresh_interval" : "-1"
} 

So it was a connectivity issue after all. :-)
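For reference, that setting can also be applied and reverted at runtime through the update-settings API; a minimal sketch (the host and the index name "myindex" are assumptions):

```shell
# Disable automatic refresh before a large bulk load.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "-1" }
}'

# ... run the bulk requests ...

# Restore the default refresh interval when the load is done.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "1s" }
}'
```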

/JZ


On Friday, December 19, 2014 2:49:05 PM UTC+1, Jörg Prante wrote:
>
> Can you show the data you send in the bulk request, and your code?
>
> Most probably you send null values for index or type.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, drjz > 
> wrote:
>
>> Dear all,
>>
>> I am encountering a weird issue. When I use the Bulk API (REST) to index 
>> documents, I am getting for the same documents each time the following 
>> error returned:
>>
>> "status":500,"error":"NullPointerException[null]"
>>
>> However, when I re-index the rejected documents, it will index them the 
>> second time.
>>
>> Is there a work-around for this? Is it a connectivity issue? 
>>
>> Help greatly appreciated!
>>
>> Thanks!
>> /JZ
>>
>>
>>
>>  
>>


Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
If you can give me a few calm days over the holidays, I will try to rewrite
the Mongo DB example recipe with Oracle triggers to Elasticsearch with JDBC
plugin.

Jörg

On Fri, Dec 19, 2014 at 3:08 PM, Marian Valero 
wrote:

> Yes, I know; that is just what they told me, and my question was to find
> out what they recommend for me. But that is no reason for me not to use
> JDBC; I also want to know how I can do it. I have seen an example of this
> explained here: https://github.com/jprante/elasticsearch-river-jdbc but I
> don't know how to use it very well. So if you can help me, I'd appreciate
> it. Thanks.
>
> On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>>
>> So you avoid all community supported plugins?
>>
>> As a side note, I wrote JDBC plugin for ES a few years ago so I can load
>> my Oracle production data to ES.
>>
>> Jörg
>>
>> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
>> wrote:
>>
>>> ES and Logstash support tell me that at this time, they do not have an
>>> input for Logstash that accepts SQL from Oracle. They also do not recommend
>>> use of the existing JDBC river, as they don't support it.
>>>
>>>
>>>
>>> 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com :

 If you can be more specific, I might try to help as best as I can.

 Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
 RDBMS provide JDBC support.

 Jörg

 On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
 wrote:

> They tell me that JDBC don't support
>
> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>
>> With JDBC plugin, you can realize scenarios comparable to this one
>>
>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-
>> using-streams-advanced-queuing/
>>
>> so I am not sure why you do not want the JDBC plugin?
>>
>> Do you need a documentation?
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>> wrote:
>>
>>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>>> been using Logstash to read a CSV file, but when I input this data I get
>>> more log entries than I inserted: for example, I have 10 lines and it has
>>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>>
>>> input {
>>>   file {
>>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>>   type => "responselog"
>>>   start_position => "beginning"
>>>   }
>>> }
>>> filter {
>>> csv {
>>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
>>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
>>> "CLEANTEXT"]
>>> separator => ","
>>> }
>>> }
>>> output {
>>> elasticsearch {
>>> action => "index"
>>> host => "localhost"
>>> index => "logstash-%{+.MM.dd}"
>>> workers => 1
>>> }
>>> # stdout {
>>> # codec => rubydebug
>>> # }
>>> }
>>>
>>> My other question is: how can I connect Logstash to Oracle in real
>>> time, not locally, without using the JDBC river?
>>>
>>> Thanks
>>>

Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
Yes, I know; that is just what they told me, and my question was to find out
what they recommend for me. But that is no reason for me not to use JDBC; I
also want to know how I can do it. I have seen an example of this explained
here: https://github.com/jprante/elasticsearch-river-jdbc but I don't know
how to use it very well. So if you can help me, I'd appreciate it. Thanks.

On Friday, December 19, 2014 at 09:28:41 UTC-4:30, Jörg Prante wrote:
>
> So you avoid all community supported plugins?
>
> As a side note, I wrote JDBC plugin for ES a few years ago so I can load 
> my Oracle production data to ES.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero  > wrote:
>
>> ES and Logstash support tell me that at this time, they do not have an
>> input for Logstash that accepts SQL from Oracle. They also do not recommend
>> use of the existing JDBC river, as they don't support it.
>>
>>
>>
>> 2014-12-19 9:21 GMT-04:30 joerg...@gmail.com:
>>>
>>> If you can be more specific, I might try to help as best as I can. 
>>>
>>> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle 
>>> RDBMS provide JDBC support.
>>>
>>> Jörg
>>>
>>> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero >> > wrote:
>>>
 They tell me that JDBC don't support

 On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>
> With JDBC plugin, you can realize scenarios comparable to this one
>
> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
> sync-using-streams-advanced-queuing/
>
> so I am not sure why you do not want the JDBC plugin?
>
> Do you need a documentation?
>
> Jörg
>
> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero  
> wrote:
>
>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>> been using Logstash to read a CSV file, but when I input this data I get
>> more log entries than I inserted: for example, I have 10 lines and it has
>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>
>> input {  
>>   file {
>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>   type => "responselog"
>>   start_position => "beginning"
>>   }
>> }
>> filter {  
>> csv {
>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", 
>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", 
>> "CLEANTEXT"]
>> separator => ","
>> }
>> }
>> output {  
>> elasticsearch {
>> action => "index"
>> host => "localhost"
>> index => "logstash-%{+.MM.dd}"
>> workers => 1
>> }
>> # stdout {
>> # codec => rubydebug
>> # }
>> }
>>
>> My other question is: how can I connect Logstash to Oracle in real
>> time, not locally, without using the JDBC river?
>>
>> Thanks
>>

Re: Default shard allocation (where new shards are created)

2014-12-19 Thread Nikolas Everett
Check what curator is doing with your index. It's probably fiddling with
index.routing.allocation.include and index.routing.allocation.exclude.
When you create the new index, just set it to pick up the ssd tag. You'll
have to make sure that curator knows how to strip that tag when the time
comes to move it to the spinning disks.
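A sketch of that approach, using the disk_type tags from the thread (the host and the index name are assumptions):

```shell
# Create the new index pinned to the ssd-tagged nodes: all its shards,
# primary and replica, will only be allocated where node.disk_type is "ssd".
curl -XPUT 'http://localhost:9200/logs-2014.12.19' -d '{
  "settings" : {
    "index.routing.allocation.require.disk_type" : "ssd"
  }
}'

# Later, retag the index so its shards relocate to the spinning-disk nodes
# (this is the same index-level setting curator rewrites with --rule).
curl -XPUT 'http://localhost:9200/logs-2014.12.19/_settings' -d '{
  "index.routing.allocation.require.disk_type" : "spinning"
}'
```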

Nik

On Fri, Dec 19, 2014 at 3:36 AM, Robin Clarke  wrote:
>
> My current setup is with 10 nodes with ample space on spinning disks, and
> 20 nodes with smaller SSD disks.
> I would like my workflow to be that all data is initially indexed on the
> SSD nodes, after 10 days is reallocated to the spinning disks, after a
> further 10 days the index is closed, and after a further 70 days the
> indexes are deleted.
> The curator is great for moving them to the spinning disks, but what I am
> not sure about is how to define that initially all shards (primary and
> replica) of an index should be created on the ssd nodes.
> The spinning disk nodes are tagged:
> node.disk_type: spinning
> The ssd nodes are tagged
> node.disk_type: ssd
>
> To transfer after 10 days to spinning:
>  /usr/local/bin/curator --host es101 allocation --older-than 9 --rule
> disk_type=spinning
>
> But how do I define that the default location for all new shards should be
> on disk_type:ssd ?
>
> I have the example here, which
> I think could be modified like this:
>
> curl -XPUT localhost:9200/_cluster/settings -d '{
> "persistent" : {
> "cluster.routing.allocation.require.disk_type" : "ssd"
> }
> }'
>
> But for one this setting does not exist, and I'm not sure if this will
> stop the shards being reallocated to spinning later on...
>
> Any ideas how to implement my desired workflow?
>
> Thank you!
> -Robin-
>


Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
So you avoid all community supported plugins?

As a side note, I wrote JDBC plugin for ES a few years ago so I can load my
Oracle production data to ES.

Jörg

On Fri, Dec 19, 2014 at 2:54 PM, Marian Valero 
wrote:

> ES and Logstash support tell me that at this time, they do not have an
> input for Logstash that accepts SQL from Oracle. They also do not recommend
> use of the existing JDBC river, as they don't support it.
>
>
>
> 2014-12-19 9:21 GMT-04:30 joergpra...@gmail.com :
>>
>> If you can be more specific, I might try to help as best as I can.
>>
>> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle
>> RDBMS provide JDBC support.
>>
>> Jörg
>>
>> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
>> wrote:
>>
>>> They tell me that JDBC don't support
>>>
>>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:

 With JDBC plugin, you can realize scenarios comparable to this one

 http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
 sync-using-streams-advanced-queuing/

 so I am not sure why you do not want the JDBC plugin?

 Do you need a documentation?

 Jörg

 On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
 wrote:

> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
> been using Logstash to read a CSV file, but when I input this data I get
> more log entries than I inserted: for example, I have 10 lines and it has
> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>
> input {
>   file {
>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>   type => "responselog"
>   start_position => "beginning"
>   }
> }
> filter {
> csv {
> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
> "CLEANTEXT"]
> separator => ","
> }
> }
> output {
> elasticsearch {
> action => "index"
> host => "localhost"
> index => "logstash-%{+.MM.dd}"
> workers => 1
> }
> # stdout {
> # codec => rubydebug
> # }
> }
>
> My other question is: how can I connect Logstash to Oracle in real time,
> not locally, without using the JDBC river?
>
> Thanks
>

Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
ES and Logstash support tell me that at this time, they do not have an
input for Logstash that accepts SQL from Oracle. They also do not recommend
use of the existing JDBC river, as they don't support it.



2014-12-19 9:21 GMT-04:30 joergpra...@gmail.com :
>
> If you can be more specific, I might try to help as best as I can.
>
> Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle RDBMS
> provide JDBC support.
>
> Jörg
>
> On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
> wrote:
>
>> They tell me that JDBC don't support
>>
>> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>>
>>> With JDBC plugin, you can realize scenarios comparable to this one
>>>
>>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
>>> sync-using-streams-advanced-queuing/
>>>
>>> so I am not sure why you do not want the JDBC plugin?
>>>
>>> Do you need a documentation?
>>>
>>> Jörg
>>>
>>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>>> wrote:
>>>
 I want to migrate data from Oracle to Elasticsearch to analyze it. I have
 been using Logstash to read a CSV file, but when I input this data I get
 more log entries than I inserted: for example, I have 10 lines and it has
 inserted 15 lines of logs. How can I fix that? This is my logstash.conf:

 input {
   file {
   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
   type => "responselog"
   start_position => "beginning"
   }
 }
 filter {
 csv {
 columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
 "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
 "CLEANTEXT"]
 separator => ","
 }
 }
 output {
 elasticsearch {
 action => "index"
 host => "localhost"
 index => "logstash-%{+.MM.dd}"
 workers => 1
 }
 # stdout {
 # codec => rubydebug
 # }
 }

 My other question is: how can I connect Logstash to Oracle in real time,
 not locally, without using the JDBC river?

 Thanks



Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
If you can be more specific, I might try to help as best as I can.

Oracle offers Oracle RDBMS and Oracle also offers JDBC so all Oracle RDBMS
provide JDBC support.

Jörg

On Fri, Dec 19, 2014 at 2:28 PM, Marian Valero 
wrote:

> They tell me that JDBC don't support
>
> On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>>
>> With JDBC plugin, you can realize scenarios comparable to this one
>>
>> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-
>> sync-using-streams-advanced-queuing/
>>
>> so I am not sure why you do not want the JDBC plugin?
>>
>> Do you need a documentation?
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
>> wrote:
>>
>>> I want to migrate data from Oracle to Elasticsearch to analyze it. I have
>>> been using Logstash to read a CSV file, but when I input this data I get
>>> more log entries than I inserted: for example, I have 10 lines and it has
>>> inserted 15 lines of logs. How can I fix that? This is my logstash.conf:
>>>
>>> input {
>>>   file {
>>>   path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>>   type => "responselog"
>>>   start_position => "beginning"
>>>   }
>>> }
>>> filter {
>>> csv {
>>> columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE",
>>> "RAWRESPONSE", "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER",
>>> "CLEANTEXT"]
>>> separator => ","
>>> }
>>> }
>>> output {
>>> elasticsearch {
>>> action => "index"
>>> host => "localhost"
>>> index => "logstash-%{+.MM.dd}"
>>> workers => 1
>>> }
>>> # stdout {
>>> # codec => rubydebug
>>> # }
>>> }
>>>
>>> My other question is: how can I connect Logstash to Oracle in real time,
>>> not locally, without using the JDBC river?
>>>
>>> Thanks
>>>


Re: Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread joergpra...@gmail.com
Can you show the data you send in the bulk request, and your code?

Most probably you send null values for index or type.

Jörg
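For reference, a well-formed bulk body pairs each action line, which must carry non-null index and type values, with its source document; a minimal sketch (the host, index, and type names are assumptions):

```shell
# Each action metadata line must name a non-null _index and _type, and the
# body must end with a newline. --data-binary preserves the newlines that
# the bulk format requires (plain -d can mangle them when reading files).
curl -XPOST 'http://localhost:9200/_bulk' --data-binary '{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "1" } }
{ "title" : "first document" }
{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "2" } }
{ "title" : "second document" }
'
```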

On Fri, Dec 19, 2014 at 2:28 PM, drjz  wrote:

> Dear all,
>
> I am encountering a weird issue. When I use the Bulk API (REST) to index
> documents, I am getting for the same documents each time the following
> error returned:
>
> "status":500,"error":"NullPointerException[null]"
>
> However, when I re-index the rejected documents, it will index them the
> second time.
>
> Is there a work-around for this? Is it a connectivity issue?
>
> Help greatly appreciated!
>
> Thanks!
> /JZ
>
>
>
>
>


Bulk API Indexing Status 500 Error NullPointerException[null]

2014-12-19 Thread drjz
Dear all,

I am encountering a weird issue. When I use the Bulk API (REST) to index 
documents, the same documents return the following error each time:

"status":500,"error":"NullPointerException[null]"

However, when I re-index the rejected documents, they are indexed 
successfully the second time.

Is there a work-around for this? Is it a connectivity issue? 

Help greatly appreciated!

Thanks!
/JZ



 



Re: Oracle to Elasticsearch

2014-12-19 Thread Marian Valero
They tell me that JDBC doesn't support that.

On Friday, December 19, 2014 at 05:08:27 UTC-4:30, Jörg Prante wrote:
>
> With JDBC plugin, you can realize scenarios comparable to this one
>
>
> http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-using-streams-advanced-queuing/
>
> so I am not sure why you do not want the JDBC plugin.
>
> Do you need documentation?
>
> Jörg
>
> On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero  > wrote:
>
>> I want to migrate data from Oracle to Elasticsearch in order to analyze it. I 
>> have been using Logstash to read a CSV file, but when I input this data, more 
>> log entries are inserted than the file contains; for example, the file has 10 
>> lines but 15 log entries are inserted. How can I fix that? This is my 
>> logstash.conf:
>>
>> input {
>>   file {
>>     path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>>     type => "responselog"
>>     start_position => "beginning"
>>   }
>> }
>> filter {
>>   csv {
>>     columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", "RAWRESPONSE",
>>                 "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", "CLEANTEXT"]
>>     separator => ","
>>   }
>> }
>> output {
>>   elasticsearch {
>>     action => "index"
>>     host => "localhost"
>>     index => "logstash-%{+YYYY.MM.dd}"
>>     workers => 1
>>   }
>>   # stdout {
>>   #   codec => rubydebug
>>   # }
>> }
>>
>> Another question: how can I connect Logstash to Oracle in real time, rather 
>> than from a local file, without using the JDBC river?
>>
>> Thanks
>>



Multilevel aggregation of terms when using nested objects.

2014-12-19 Thread Michael
We have a document format with a lot of nested objects, and we want 
to look at e.g. the location/language distribution. 

When performing an aggregation with two levels of nested objects, we 
get data only from the first level but not from the second; its count is 
always 0.

If we do a non-nested + nested aggregation, it works.

{
  "from": 0,
  "size": 0,
  "aggs": {
"Provider": {
  "terms": {
"field": "providerId",
"size": 10,
"order": {
  "_count": "desc"
}
  },
  "aggs": {
"Locations": {
  "nested": {
"path": "meta"
  },
  "aggs": {
"filter": {
  "filter": {
"term": {
  "meta.type": "locations"
}
  },
  "aggs": {
"Locations": {
  "terms": {
"field": "meta.value",
"size": 3,
"order": {
  "_count": "desc"
}
  }
}
  }
}
  }
}
  }
}
  },
  "query": {
"query_string": {
  "query": "barclays bank"
}
  }
}


If we do a nested + non-nested aggregation, it does not work.

{
  "from": 0,
  "size": 0,
  "aggs": {
"Locations": {
  "nested": {
"path": "meta"
  },
  "aggs": {
"filter": {
  "filter": {
"term": {
  "meta.type": "locations"
}
  },
  "aggs": {
"Locations": {
  "terms": {
"field": "meta.value",
"size": 10,
"order": {
  "_count": "desc"
}
  },
  "aggs": {
"Provider": {
  "terms": {
"field": "providerId",
"size": 3,
"order": {
  "_count": "desc"
}
  }
}
  }
}
  }
}
  }
}
  },
  "query": {
"query_string": {
  "query": "barclays bank"
}
  }
}


Is there something wrong with my query or is this an issue in elasticsearch?



ES upgrade 0.20.6 to 1.3.4 -> CorruptIndexException

2014-12-19 Thread Georgeta Boanea
Hi All,

After upgrading from ES 0.20.6 to 1.3.4 the following messages occurred:

[2014-12-19 10:02:06.714 GMT] WARN || 
elasticsearch[es-node-name][generic][T#14] 
org.elasticsearch.cluster.action.shard  [es-node-name] [index-name][3] 
sending failed shard for [index-name][3], node[qOTLmb3IQC2COXZh1n9O2w], 
[P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, 
message [IndexShardGatewayRecoveryException[[index-name][3] failed to fetch 
index version after copying it over]; nested: 
CorruptIndexException[[index-name][3] Corrupted index 
[corrupted_Ackui00SSBi8YXACZGNDkg] caused by: CorruptIndexException[did not 
read all bytes from file: read 112 vs size 113 (resource: 
BufferedChecksumIndexInput(NIOFSIndexInput(path="path/3/index/_uzm_2.del")))]]; 
]]

[2014-12-19 10:02:08.390 GMT] WARN || 
elasticsearch[es-node-name][generic][T#20] 
org.elasticsearch.indices.cluster  [es-node-name] [index-name][3] failed to 
start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
[index-name][3] failed to fetch index version after copying it over
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [index-name][3] 
Corrupted index [corrupted_Ackui00SSBi8YXACZGNDkg] caused by: 
CorruptIndexException[did not read all bytes from file: read 112 vs size 
113 (resource: 
BufferedChecksumIndexInput(NIOFSIndexInput(path="path/3/index/_uzm_2.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more

Shard [3] of the index remains unallocated and the cluster remains in a RED 
state.

curl -XGET 'http://localhost:48012/_cluster/health?pretty=true'
{
  "cluster_name" : "cluster-name",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 10,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 1
}

If I do an optimize (curl -XPOST 
http://localhost:48012/index-name/_optimize?max_num_segments=1) on the 
index before the upgrade, everything is fine. Optimize only helps when done 
before the upgrade; if it is done after the upgrade, the problem remains.

Any idea why this problem occurs?
Is there another way to avoid it? I want to avoid optimize in the 
case of large data volumes.

Thank you,
Georgeta



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Mark Walkom
Ok that makes a bit more sense, but it seems the amount of CPU you will
save isn't worth the effort.

You could create an index template that matches fields with pattern "*" and
sets index: not_analyzed, that'd be easiest.
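An index template along those lines might be sketched as follows (dynamic_templates syntax as of Elasticsearch 1.x; the template body is built as a plain dict here, and the rule name is made up):

```python
import json

# Sketch: map every dynamically added string field as not_analyzed.
template = {
    "template": "*",  # apply to every index name
    "mappings": {
        "_default_": {
            "dynamic_templates": [{
                "strings_not_analyzed": {  # rule name is arbitrary
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }]
        }
    },
}
print(json.dumps(template, indent=2))
```

PUT a body like this to the `_template` endpoint and newly created indices should pick it up.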

On 19 December 2014 at 10:12, Eran Duchan  wrote:
>
> We use ElasticSearch to index our structured analytics data. We chose it
> for a few reasons:
>
>    1. All fields are indexed so we can search by any field or combination
>       of fields, including nested fields
>    2. Flexible and built-in geospatial searches
>    3. Can scale with our data, which grows at ~100M documents a day
>
> It's pretty much a generic datastore (though not the source of truth).
>
> While we do have quite a few string fields in our data, these are mostly
> enumeration values ("connected", "not connected") and in preliminary tests
> we've found that disabling analysis (per field) shows savings of ~5% CPU.
> Not a huge amount but every bit helps.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X84eb6wUqOfMNS8uBqvS5DdGR%2BXZx_zWZBdCKyfbkhH8A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Eran Duchan
Ahh, that should probably do the trick. Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e9e3bac7-a74b-443b-a479-bac4891e324b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Wrong routing of TransportClient with sniffing enabled

2014-12-19 Thread joergpra...@gmail.com
You must set

client.transport.ignore_cluster_name

to "false", see

http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html

Jörg



On Fri, Dec 19, 2014 at 8:09 AM, Jae  wrote:

> Hi
>
> I am using ES 1.1.0 with TransportClient.
>
> We observed wrong routing from TransportClient when we scale up the
> cluster. For example, suppose that we have two ES clusters, es0, es1 and
> es_sink_0 is the TransportClient talking to es0, es_sink_1 is one talking
> to es1. If we scale up es1, it happens that es_sink_0 is sending data to
> es1. We are using client.transport.sniff=true by default. This should not
> happen theoretically because TransportClient will refresh its server list
> through communicating with the cluster and new nodes should not join to the
> wrong cluster.
>
> Is there anybody who has seen this problem before? Any comments will be
> totally appreciated. We didn't find the root cause yet but this is the
> really serious problem. So, temporarily, I want to turn off sniff and add
> the feature that manually updates the server list through external discover
> module.
>



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread joergpra...@gmail.com
The "keyword" analyzer is the "none" analyzer you are looking for.

Example settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "type" : "keyword"
                }
            }
        }
    }
}

Jörg

On Fri, Dec 19, 2014 at 8:15 AM, Eran Duchan  wrote:

> I'd like not to use analysis across my schema to save a bit of CPU (I know
> the penalty this inflicts on searching). Right now I set "index":
> "not_analyzed" per field but this is cumbersome.
>
> I know I can choose between the default analyzers, but
> there's no "none" analyzer to choose from. Short of writing a custom one
> that does nothing, is there a way to globally disable analysis?
>
> Eran
>



Re: $ES_HEAP_SIZE

2014-12-19 Thread Johan Öhr
What you need to do is just create another init.d script; the difference 
between them should be that they point at different 
/etc/sysconfig/elasticsearch files, where you put the differing 
configs. Don't set the values in /usr/share/elasticsearch.in.sh (I think 
that is where they default to).

With this you can also spin up a new instance as a master node; just put the 
differences (different heap, etc.) in the sysconfig file.

On Friday, February 8, 2013 at 15:21:39 UTC+1, Shawn Ritchie wrote:
>
> Sorry again, what if I wanted to run them as services? I tried looking in 
> /service/elasticsearch.conf and init.d, but with no luck.
>
> On Friday, 8 February 2013 13:40:19 UTC+1, Clinton Gormley wrote:
>>
>> On Fri, 2013-02-08 at 04:27 -0800, Shawn Ritchie wrote: 
>> > I already read that post, but from what I understood or misunderstood, 
>> > it makes the assumption that you will have one instance of Elasticsearch 
>> > running on a machine. 
>> > 
>> > What I'd like to do with one Elasticsearch installation is launch two 
>> > instances of Elasticsearch with different /config, /data, /log and node 
>> > names. 
>> > 
>> > 
>> > Or is it that multiple instances of Elasticsearch on the same machine 
>> > run at the directory level, that is, two instances sharing 
>> > the /config, /data and /log directories together with the node name? 
>>
>> You can run multiple instances with the same paths (including logging 
>> and data). 
>>
>> If you just want to specify a different node name, then you could do so 
>> on the command line: 
>>
>> ./bin/elasticsearch -Des.node.name=node_1 
>> ./bin/elasticsearch -Des.node.name=node_2 
>>
>> If you want to change more than that, you could specify a specific 
>> config file: 
>>
>> ./bin/elasticsearch -Des.config=/path/to/config/file_1 
>> ./bin/elasticsearch -Des.config=/path/to/config/file_2 
>>
>> clint 
>>
>>
>>



Re: Oracle to Elasticsearch

2014-12-19 Thread joergpra...@gmail.com
With JDBC plugin, you can realize scenarios comparable to this one

http://tebros.com/2011/09/keep-mongodb-and-oracle-in-sync-using-streams-advanced-queuing/

so I am not sure why you do not want the JDBC plugin.

Do you need documentation?

Jörg

On Thu, Dec 18, 2014 at 7:07 PM, Marian Valero 
wrote:

> I want to migrate data from Oracle to Elasticsearch in order to analyze it. I
> have been using Logstash to read a CSV file, but when I input this data, more
> log entries are inserted than the file contains; for example, the file has 10
> lines but 15 log entries are inserted. How can I fix that? This is my
> logstash.conf:
>
> input {
>   file {
>     path => "/home/mogangi/logstash-1.4.2/bin/eso.csv"
>     type => "responselog"
>     start_position => "beginning"
>   }
> }
> filter {
>   csv {
>     columns => ["ID", "DELIVERYID", "MSGID", "RSPDATE", "RAWRESPONSE",
>                 "PARSEDRESPONSE", "SHORTCODE", "INSID", "MOBILENUMBER", "CLEANTEXT"]
>     separator => ","
>   }
> }
> output {
>   elasticsearch {
>     action => "index"
>     host => "localhost"
>     index => "logstash-%{+YYYY.MM.dd}"
>     workers => 1
>   }
>   # stdout {
>   #   codec => rubydebug
>   # }
> }
>
> Another question: how can I connect Logstash to Oracle in real time, rather
> than from a local file, without using the JDBC river?
>
> Thanks
>



Re: Globally disable analysis (i.e. no via per-field mapping)?

2014-12-19 Thread Eran Duchan
We use ElasticSearch to index our structured analytics data. We chose it 
for a few reasons:

   1. All fields are indexed so we can search by any field or combination 
   of fields, including nested fields
   2. Flexible and built-in geospatial searches
   3. Can scale with our data, which grows at ~100M documents a day

It's pretty much a generic datastore (though not the source of truth). 

While we do have quite a few string fields in our data, these are mostly 
enumeration values ("connected", "not connected") and in preliminary tests 
we've found that disabling analysis (per field) shows savings of ~5% CPU. 
Not a huge amount but every bit helps.



Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread joergpra...@gmail.com
A node does not send shard aggregations to the master, but to the client
node.

The basic idea of sharding in Elasticsearch is that shards spread over all
the nodes, and the shard count matches or comes close to the maximum number
of nodes. The shard distribution should be undistorted, that means, all
shards should be equal in size, volume, terms distribution etc. So in the
general case, a node has to process just one shard for aggregation, or
better, all nodes do equivalent work in the aggregation process.

There is not much gain in implementing an extra intermediate aggregation stage
per node only because some users put more than one shard per index on a
node and configure weighted indices where some nodes have more shards than
others. Instead, adding more nodes is the best method to achieve better
scalability for this index, or creating more indices on more nodes, and
combining them with index aliases.
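The shard-then-reduce flow described above can be modeled in a few lines (a toy imitation of a terms aggregation, not Elasticsearch code; the documents and field name are invented):

```python
from collections import Counter

def shard_top_terms(docs, field, shard_size):
    """Per-shard phase: count terms locally, return the top shard_size buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return dict(counts.most_common(shard_size))

def reduce_terms(shard_results, size):
    """Coordinating-node phase: merge per-shard buckets, keep the global top `size`."""
    total = Counter()
    for buckets in shard_results:
        total.update(buckets)
    return total.most_common(size)

# Two "shards" holding documents with a lang field.
shards = [
    [{"lang": "en"}, {"lang": "en"}, {"lang": "de"}],
    [{"lang": "en"}, {"lang": "fr"}, {"lang": "fr"}],
]
partial = [shard_top_terms(shard, "lang", shard_size=10) for shard in shards]
print(reduce_terms(partial, size=2))  # [('en', 3), ('fr', 2)]
```

The heavy counting happens in the per-shard phase; the reduce only merges the already-summarized bucket lists, which is why equally sized shards keep the work balanced.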

Jörg



On Thu, Dec 18, 2014 at 7:22 PM, AlexR  wrote:

> Nick,
>
> I am not an expert in this area either but with multi-core processors (24,
> 32, 48) it is not uncommon to have fairly large number of shards on a node
> so 30 shards is not out of question
> I assumed that ES aggregates shard results on a node prior to shipping them
> to the master, but I do not know if that is true. It may very well be that a
> node sends per-shard aggregations to the master, in which case it is 32 x shard
> result size for our 32-shard node. Reducing the size of the network packet by
> 32 (even if it were just 8), and the work for the master by the same ratio, is
> no chump change. Somehow I think ES is already doing it :-) but who knows.
>
> Another potential benefit of node-level aggregation is that, when aggregating
> multiple shards on a single node, ES could resolve potential errors by
> aggregating all buckets and re-calculating buckets not present in every
> shard at a fairly low cost, while doing so across nodes is costly. On the
> other hand it may amplify the error across nodes; I do not know.
>
>
> On Thursday, December 18, 2014 11:26:37 AM UTC-5, Nikolas Everett wrote:
>>
>> I think aggregating 32 shards on one node is a bit degenerate.  I imagine
>> it's more typical to aggregate across one or two shards per node.  Don't get
>> me wrong, you can totally have nodes store and query ~100 shards each
>> without much trouble.  If aggregating across a bunch of shards per node
>> were a common thing I think a node level reduce step might help.  I'm
>> certainly no expert in the reduce code though.
>>
>> Nik
>>
>> On Thu, Dec 18, 2014 at 10:48 AM, Yifan Wang 
>> wrote:
>>>
>>> Sorry, if I did not make it clear. For sure I know aggregation is done
>>> on the node for each shard, but here is the challenge. Say we set
>>> shard_size=50,000. ES will aggregate on each shard and create buckets for
>>> the matching documents, and then send top 50,000 buckets to the client node
>>> for Reduce. Say we have 50 data nodes, and each node has 32 shards. This
>>> means we need to send 50,000 buckets from each shard to the client node for
>>> final aggregation. First, this may add heavy traffic to the network (what
>>> if we have 100 nodes?). And second, the client will need to aggregate on
>>> received 50*32*50,000 buckets. Would this cause any congestion on the
>>> client node? However if we can aggregate on the node first, meaning reduce
>>> from 32 buckets to only one bucket, then the client node only has to
>>> process 50 buckets. This would significantly reduce the network traffic and
>>> improve the scalability; plus, because we can set a relatively larger
>>> shard_size, it will improve the accuracy of the final results, which is
>>> another key issue we face with aggregations in a distributed environment.
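The bucket volume in the 50-node, 32-shards-per-node, shard_size=50,000 scenario described above can be tallied directly (a back-of-the-envelope sketch of the numbers in that paragraph):

```python
nodes, shards_per_node, shard_size = 50, 32, 50_000

# Every shard ships its own top list straight to the client node:
buckets_shard_level = nodes * shards_per_node * shard_size
print(buckets_shard_level)  # 80000000 buckets to reduce on the client

# Each node pre-merges its 32 local shard results before shipping:
buckets_node_level = nodes * shard_size
print(buckets_node_level)   # 2500000 buckets, a 32x reduction
```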
>>>
>>> So my key question is about the scalability particularly on
>>> aggregations. It seems to be a challenge in my experience. I just want to
>>> hear other people's experience. On heavy analytics applications, this will
>>> be a key.
>>>
>>> Of course, I also understand, adding node level aggregation may impact
>>> the overall performance. I am wondering if anyone has thought about or done
>>> anything in this aspect.
>>>
>>> BTW, I like ElasticSearch, but want to hear from the community on some
>>> of the key challenges.
>>>
>>>
>>>
>>> On Thursday, December 18, 2014 9:34:07 AM UTC-5, Adrien Grand wrote:

 +1 to what AlexR said. I think there is indeed a bad assumption that
 shards just forward data to the coordinating node, this is not the case.

 On Thu, Dec 18, 2014 at 1:09 AM, AlexR  wrote:
>
> if you take a terms aggregation, the heavy lifting of the aggregation
> is done on each node then aggregated results are combined on the master
> node. So if you have thousands of nodes and very high cardinality nested
> aggs the merging part may become a bottleneck but cost of doing actual
> aggregation in most cases is far higher than cost of merging results from
> reasonable number of shards. So in practice I thin

Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread shriyansh jain
Okay, I have 10GB and 8GB; I will make them the same. Another 
issue I am facing is that the ELK stack is choking, and as soon as I 
restart Elasticsearch it starts working again.

Thanks!
Shriyansh

On Friday, December 19, 2014 12:47:43 AM UTC-8, Mark Walkom wrote:
>
> Sorry, I mean you should really have the heap the *same* size on each node.
>
> On 19 December 2014 at 09:16, shriyansh jain  > wrote:
>>
>> I have Heap Size on both the nodes.
>>
>> Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".
>>
>> Thanks!
>> Shriyansh
>>
>> On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>>>
>>> You should really have heap the size on both nodes.
>>>
>>> What ES and java versions are you on?
>>>
>>> On 18 December 2014 at 19:54, shriyansh jain  
>>> wrote:

 Hi All,

 I am seeing some warning message in elasticsearh log files which are 
 taking pretty long time for garbage collection.

 [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total 
 [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young] 
 [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] 
 [8.3mb]->[12.6mb]/[16.6mb]}{[old] [7.1gb]->[7.2gb]/[9.8gb]}
 [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1] 
 [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total 
 [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young] 
 [518.1kb]->[109.4mb]/[133.1mb]}{[survivor] 
 [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
 [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total 
 [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young] 
 [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old] 
 [8.7gb]->[8.8gb]/[9.8gb]}
 [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1] 
 [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total 
 [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young] 
 [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old] 
 [2.7gb]->[2.8gb]/[9.8gb]}


 I have a cluster of two Elasticsearch nodes, with 10GB and 8GB of heap. 
 Currently there are 60 shards with around 2.5GB of data. What can be 
 a probable reason for GC to take such a long time?


 Thanks!
 Shriyansh  




Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread Mark Walkom
Sorry, I mean you should really have the heap the *same* size on each node.

On 19 December 2014 at 09:16, shriyansh jain 
wrote:
>
> I have Heap Size on both the nodes.
>
> Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".
>
> Thanks!
> Shriyansh
>
> On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>>
>> You should really have heap the size on both nodes.
>>
>> What ES and java versions are you on?
>>
>> On 18 December 2014 at 19:54, shriyansh jain 
>> wrote:
>>>
>>> Hi All,
>>>
>>> I am seeing some warning message in elasticsearh log files which are
>>> taking pretty long time for garbage collection.
>>>
>>> [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total
>>> [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young]
>>> [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] [8.3mb]->[12.6mb]/[16.6mb]}{[old]
>>> [7.1gb]->[7.2gb]/[9.8gb]}
>>> [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1]
>>> [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total
>>> [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young]
>>> [518.1kb]->[109.4mb]/[133.1mb]}{[survivor]
>>> [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
>>> [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total
>>> [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young]
>>> [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old]
>>> [8.7gb]->[8.8gb]/[9.8gb]}
>>> [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1]
>>> [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total
>>> [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young]
>>> [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old]
>>> [2.7gb]->[2.8gb]/[9.8gb]}
>>>
>>>
>>> I have a cluster of two elasticsearch nodes, with 10GB and 8GB of heap.
>>> Currently there are 60 shards in elasticsearch with around 2.5GB of data. What
>>> can be a probable reason for GC to take such a long time?
>>>
>>>
>>> Thanks!
>>> Shriyansh
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X_-PK4%3DN-5%3Dc2X8WnybYfkrw5qBcQ7i8%3D2j3kNbyp%2B7uA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Default shard allocation (where new shards are created)

2014-12-19 Thread Robin Clarke
My current setup is with 10 nodes with ample space on spinning disks, and 
20 nodes with smaller SSD disks.
I would like my workflow to be that all data is initially indexed on the 
SSD nodes, after 10 days is reallocated to the spinning disks, after a 
further 10 days the index is closed, and after a further 70 days the 
indexes are deleted.
The curator is great for moving them to the spinning disks, but what I am 
not sure about is how to define that initially all shards (primary and 
replica) of an index should be created on the ssd nodes.
The spinning disk nodes are tagged:
node.disk_type: spinning
The ssd nodes are tagged
node.disk_type: ssd

To transfer after 10 days to spinning:
 /usr/local/bin/curator --host es101 allocation --older-than 9 --rule 
disk_type=spinning

But how do I define that the default location for all new shards should be 
on disk_type:ssd ?

I have the example here which I think could be modified like this:

curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"cluster.routing.allocation.require.disk_type" : "ssd"
}
}'

But for one thing, this setting does not exist, and I'm not sure whether it
would stop the shards being reallocated to spinning later on...
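One way to make the SSD nodes the default for newly created indices is index-level shard allocation filtering applied through an index template. A minimal sketch, where the template name, index pattern (daily logstash-* indices), and host are assumptions:

```shell
# Every index created after this template is PUT will require its shards
# (primary and replica) to be allocated on nodes tagged node.disk_type: ssd.
curl -XPUT 'http://localhost:9200/_template/ssd_first' -d '{
  "template": "logstash-*",
  "settings": {
    "index.routing.allocation.require.disk_type": "ssd"
  }
}'
```

Since the template only affects an index at creation time, Curator's allocation command should still work as before: it overwrites the same index-level setting with disk_type=spinning after 10 days, which triggers the relocation.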

Any ideas how to implement my desired workflow?

Thank you!
-Robin-

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/18615b7c-8e99-42ee-b54d-ef06a1888181%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performance difference between REST and Java API

2014-12-19 Thread joergpra...@gmail.com
The idea for tracing the issue is to rebuild the query from simple to
complex and compare.

I would start with the query and add the filters step by step, to identify
the part which causes trouble.

One possible cause is that the REST API is using some defaults which have to
be set explicitly in the Java API.
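The step-by-step comparison is easy to script on the REST side. A rough sketch, where the hostname, index name, and the step1…step3 body files are assumptions:

```shell
#!/bin/sh
# Time the same search as the body grows from a bare query to the full
# query-plus-filters version, to see which addition costs the most.
for body in step1-query-only.json step2-add-filter.json step3-full.json; do
  t=$(curl -s -o /dev/null -w '%{time_total}' \
      -XGET 'http://localhost:9200/myindex/_search' -d @"$body")
  echo "$body: ${t}s"
done
```

Running the same sequence through the Java client then shows at which step the two diverge.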

Jörg

On Fri, Dec 19, 2014 at 1:18 AM, Marie Jacob  wrote:

> Thanks Jörg.
>
> I tried that, and it seems to make no difference. All I could gather are
> these graphs (attached) from the bigdesk plugin that are showing
> performance for the query and fetch phase (the first is during ES java
> client use, the second is using the REST calls) . It's really weird that
> the FETCH phase is taking any time at all, if I set _source = false.
>
>
>
> On Thursday, December 18, 2014 5:33:05 PM UTC-5, Jörg Prante wrote:
>>
>> How about this?
>>
>> https://gist.github.com/anonymous/509b3db873a30d8961ed#comment-1359074
>>
>> Jörg
>>
>> On Thu, Dec 18, 2014 at 7:48 PM, Marie Jacob  wrote:
>>>
>>>
>>> I'm benchmarking results for an ES cluster, using both the REST api and
>>> native Java client. We're getting very different response times, between
>>> each of these (the REST api is doing approximately 50% better in the 95th
>>> percentile). I was wondering what the cause of this is, since the search
>>> requests look identical to me (unless I missed something).
>>>
>>> Here are the queries:
>>>
>>> https://gist.github.com/anonymous/509b3db873a30d8961ed
>>>
>>> Our setup: 20 node cluster, 20 shards/2 replicas, and benchmarking from
>>> a separate machine with JMeter (an HTTP sampler for each of the 20 nodes,
>>> and a separate test plan with a Java sampler initializing the ES java
>>> client, and sending queries).
>>>
>>> Any help appreciated.
>>>
>>>
>>> -M
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHRJSL_wJoj-Q%2BypXC3Ej79fNPC-6j-BEJ-LMpPXU8yDQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


native script sorting geoPoints

2014-12-19 Thread Ronny Deter
Hi, I have a nested object:

"root": {
  "userLocations": {
    "type": "nested",
    "include_in_parent": true,
    "properties": {
      "id": {
        "type": "integer"
      },
      "location": {
        "properties": {
          "city": {
            "type": "string"
          },
          "zipCode": {
            "type": "string"
          },
          "countryIsoCode": {
            "type": "string"
          },
          "longitude": {
            "type": "float"
          },
          "latitude": {
            "type": "float"
          },
          "geoPoint": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

I have one root document with n userLocations as nested objects.
I can search with a native script for the closest distance to a point,
but I cannot filter on the userLocations.id of the closest location,
because userLocations.id is sorted numerically.

Here is my script:

import java.util.List;
import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.geo.GeoDistance;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.fielddata.ScriptDocValues;
import org.elasticsearch.script.AbstractLongSearchScript;

/**
 * Created by rdeter on 18.12.14.
 */
public class NestedLocation extends AbstractLongSearchScript {
    double distance = 10;   // start radius in km; only points closer than this win
    double lat;
    double lon;
    long   id;              // note: stays at its default of 0 if no point is within the radius

    double tmpDistance;

    public NestedLocation(@Nullable Map<String, Object> params) {
        lat = ((Double) params.get("lat")).doubleValue();
        lon = ((Double) params.get("lon")).doubleValue();
    }

    @Override
    public long runAsLong() {

        List<GeoPoint> list = ((ScriptDocValues.GeoPoints)
                doc().get("userLocations.location.geoPoint")).getValues();
        List<Long> userLocationIds = ((ScriptDocValues.Longs)
                doc().get("userLocations.id")).getValues();

        int z = 0;

        // track the id of the geo point closest to (lat, lon)
        for (GeoPoint geoPoint : list) {
            tmpDistance = GeoDistance.PLANE.calculate(geoPoint.getLat(),
                    geoPoint.getLon(), lat, lon, DistanceUnit.KILOMETERS);
            if (tmpDistance < distance) {
                distance = tmpDistance;
                id = userLocationIds.get(z);
            }
            z++;
        }

        return id;
    }
}
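For reference, a hedged sketch of how a native script like this can be invoked as a sort from the search side, assuming it was registered under the name nested_location in elasticsearch.yml (the registration, index name, and coordinates are assumptions):

```shell
# Assumes the plugin registered the script factory, e.g.
#   script.native.nested_location.type: <fully.qualified.FactoryClass>
# in elasticsearch.yml.
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "lang": "native",
      "script": "nested_location",
      "type": "number",
      "order": "asc",
      "params": { "lat": 52.52, "lon": 13.4 }
    }
  }
}'
```

Note that the script as written returns an id, not a distance, so sorting by its value orders documents by that id rather than by proximity, which matches the problem described above.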



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/46a872e7-cb9e-4230-b15f-74b035890a28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch taking a long time for garbage collection

2014-12-19 Thread shriyansh jain
I have Heap Size on both the nodes.

Using openjdk 1.7.0_09-icedtea and elasticsearch : "1.2.2".

Thanks!
Shriyansh
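As a side note for comparing the two nodes, the per-node heap limits and cumulative GC counters can be read from the nodes stats API; a minimal sketch (host and port are assumptions):

```shell
# Shows heap_max per node plus young/old collection counts and total times,
# useful for confirming the 10GB vs 8GB difference and where GC time goes.
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
```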

On Thursday, December 18, 2014 11:50:32 PM UTC-8, Mark Walkom wrote:
>
> You should really have heap the size on both nodes.
>
> What ES and java versions are you on?
>
> On 18 December 2014 at 19:54, shriyansh jain  > wrote:
>>
>> Hi All,
>>
>> I am seeing some warning messages in the elasticsearch log files which
>> show garbage collection taking a pretty long time.
>>
>> [2014-12-17 10:12:23,789][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145219][4796] duration [901ms], collections [1]/[1.5s], total 
>> [901ms]/[7.1m], memory [7.2gb]->[7.2gb]/[9.9gb], all_pools {[young] 
>> [104.7mb]->[518.1kb]/[133.1mb]}{[survivor] 
>> [8.3mb]->[12.6mb]/[16.6mb]}{[old] [7.1gb]->[7.2gb]/[9.8gb]}
>> [2014-12-17 10:12:29,172][WARN ][monitor.jvm  ] [Node1] 
>> [gc][young][145220][4797] duration [4.1s], collections [1]/[5s], total 
>> [4.1s]/[7.2m], memory [7.2gb]->[7.4gb]/[9.9gb], all_pools {[young] 
>> [518.1kb]->[109.4mb]/[133.1mb]}{[survivor] 
>> [12.6mb]->[16.6mb]/[16.6mb]}{[old] [7.2gb]->[7.3gb]/[9.8gb]}
>> [2014-12-17 10:12:33,188][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145224][4798] duration [791ms], collections [1]/[1s], total 
>> [791ms]/[7.2m], memory [8.9gb]->[8.8gb]/[9.9gb], all_pools {[young] 
>> [130.2mb]->[2kb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old] 
>> [8.7gb]->[8.8gb]/[9.8gb]}
>> [2014-12-17 10:13:18,476][INFO ][monitor.jvm  ] [Node1] 
>> [gc][young][145268][4799] duration [710ms], collections [1]/[1.3s], total 
>> [710ms]/[7.2m], memory [2.9gb]->[2.8gb]/[9.9gb], all_pools {[young] 
>> [128.6mb]->[3.5mb]/[133.1mb]}{[survivor] [16.6mb]->[11mb]/[16.6mb]}{[old] 
>> [2.7gb]->[2.8gb]/[9.8gb]}
>>
>>
>> I have a cluster of two elasticsearch nodes, with 10GB and 8GB of heap.
>> Currently there are 60 shards in elasticsearch with around 2.5GB of data. What
>> can be a probable reason for GC to take such a long time?
>>
>>
>> Thanks!
>> Shriyansh  
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bf0052a6-aba0-4b7f-b84f-051b5bd7fb44%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.