Logstash not connecting to Elasticsearch?

2014-11-27 Thread Siddharth Trikha
I am running an ELK setup on a single machine. Everything was working fine 
until I switched off my internet connection.

My logstash console shows this error when I switch off Internet connection: 

log4j, [2014-11-27T10:31:57.480]  WARN: 
org.elasticsearch.transport.netty: [logstash-HP-Pro] exception caught on 
transport layer [[id: 0x7a124750]], closing connection
java.net.SocketException: Network is unreachable
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:465)
at sun.nio.ch.Net.connect(Net.java:457)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
at 
org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:634)
at 
org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:207)
at 
org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229)
at 
org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:705)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:647)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:615)
at 
org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
at 
org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:338)
at 
org.elasticsearch.discovery.zen.ZenDiscovery.access$500(ZenDiscovery.java:79)
at 
org.elasticsearch.discovery.zen.ZenDiscovery$1.run(ZenDiscovery.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

***EDIT***

Logstash output config:

output {
  elasticsearch { host => "localhost" }
  stdout { codec => rubydebug }
}

So it seems it's unable to connect to the ES server.

Is an internet connection always required? I am new to the setup.
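
(As a first, purely illustrative check, assuming Elasticsearch runs on the same
machine with its default port, you can verify that ES itself is up and reachable
locally, independent of internet connectivity:

curl -XGET 'http://localhost:9200/'
)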



elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
Our production cluster went yellow last night after our logstash index 
rolled over to the next version. I've seen this happen before but this time 
I decided to properly diagnose and seek some feedback on what might be 
going on. 

So, I'd love some feedback on what is going on. I'm happy to keep this 
cluster in a yellow state for a limited time to get some help from people 
in this group trying to diagnose this properly and maybe help some others 
who face the same issues. However, I will need to fix this one way or 
another before end of business day today. I plan to perform a rolling 
restart to see if node reinitialization fixes things. If not, I'll remove 
the problematic logstash index and move on. I'd love suggestions for less 
intrusive solutions. I don't like losing data, and rolling restarts are kind 
of tedious to babysit; they tend to take 45 minutes or so.

Below is some information I've gathered. Let me know if you need me to 
extract more data. 

First the obvious:

{
  "status" : 200,
  "name" : "192.168.1.13",
  "cluster_name" : "linko_elasticsearch",
  "version" : {
    "number" : "1.4.0",
    "build_hash" : "bc94bd81298f81c656893ab1d30a99356066",
    "build_timestamp" : "2014-11-05T14:26:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}

[linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "linko_elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 221,
  "active_shards" : 619,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 1
}

So we're yellow and the reason is initializing and unassigned shards. We 
have five nodes, of which three are data nodes. It seems we are hitting 
some kind of resilience issue. The three machines have plenty of disk space 
and memory.
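
(For reference, a quick way to see which shards are stuck is the cat API; this
is an illustrative command assuming the cluster is reachable on localhost:

curl -s 'localhost:9200/_cat/shards?v' | grep -v STARTED

Whatever rows remain are the shards still in the INITIALIZING or UNASSIGNED
state, together with their index and node.)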

I found this in the log of one of our es nodes:
[2014-11-27 10:15:12,585][WARN ][cluster.action.shard ] [192.168.1.13] 
[logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], 
node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID 
[-mMLqYjAQuCUDcczYf5SHA], reason [Failed to start shard, message 
[RecoveryFailedException[[logstash-2014.11.27][4]: Recovery failed from 
[192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io][inet[/192.168.1.14:9300]] 
into 
[192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}];
 
nested: 
RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:9300]][internal:index/shard/recovery/start_recovery]];
 
nested: RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution 
failed]; nested: 
RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]];
 
nested: NumberFormatException[For input string: finished]; ]]

On the mentioned node there is a corresponding message:
[2014-11-27 10:17:54,187][WARN ][cluster.action.shard ] [192.168.1.14] 
[logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], 
node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID 
[-mMLqYjAQuCUDcczYf5SHA], reason [Failed to perform 
[indices:data/write/bulk[s]] on replica, message 
[RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][indices:data/write/bulk[s][r]]];
 
nested: NumberFormatException[For input string: finished]; ]]

All three data nodes have similar messages happening over and over again.

Our cluster has been up for a couple of weeks and seems pretty happy 
otherwise. I deleted some older logstash indices a few days ago. The 
cluster has logstash data and a few smallish indices we use for our 
inbot.io service. The issue appears to be related to the logstash index 
rollover. Our app servers and kibana talk to the two non data nodes that we 
run on both our application servers.

My next stop was kibana which we use on the same cluster with the logstash 
index that is probably causing us issues. Looking at that, I noticed a few 
interesting things:

   - logstash indexing seems to be fine (good) and it appears there has 
   been no data loss yet
   - our cpu load jumped around midnight and sort of stayed up on all three 
   nodes. We measure this using collectd and both mean and max load jumped to 
   around 1 around the time the index rollover happened. 

My next step was using curl -XGET 'localhost:9200/_cat/recovery?v'

All the indices listed there looked fine. I'll spare you the output, but 
everything appeared to be in the 'done' stage. 

Finally, I did 
[linko@es3 elasticsearch]$ curl -XGET 
'localhost:9200/_cluster/health/logstash-2014.11.27/?pretty'
{
  "cluster_name" : "linko_elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 1
}

So, that confirms last night's new logstash index is the issue and it 

Kibana: To get all _types by using _index

2014-11-27 Thread tudit
I want to search on an index and get all index types under that index. I 
want to create a terms panel/table for it. It works at the index_type level 
but not at the index level.

I'm also unable to search by index. I used the filter _index:name_of_index; 
it returned no results, but _type:name_of_index_type works fine for 
searching on type and returned the expected result. 

How can I achieve this using Kibana?



Requests per second

2014-11-27 Thread Ernesto Reig
Hello,
I don't really know how to measure the number of requests per second made 
to an Elasticsearch cluster. I would like to know specifically the number of 
search requests per second.
If I am not wrong, the stat query_total in search increases by the 
number of shards being accessed in every search, so if it increases by, 
let's say, 25 units, that does not mean you had 25 _search requests but that 
ES has used 25 shards in total to answer a single (or several) _search 
request(s).
So, is it possible at all to know the number of _search requests per second 
that your cluster receives? If not, is there any close approach to this?
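
(For reference, a rough approximation can be derived from the indices stats
API; this is only a sketch, assuming the default REST port on localhost. Sample
the output twice, a known interval apart, and divide the difference in
query_total by the interval in seconds. As noted above, this counts shard-level
query executions rather than client _search requests.

curl -s 'localhost:9200/_stats/search?pretty'
)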

Thank you very much,

Ernesto



Re: Requests per second

2014-11-27 Thread joergpra...@gmail.com
You have to count the requests per second in your client.

Jörg

On Thu, Nov 27, 2014 at 1:25 PM, Ernesto Reig erniru...@gmail.com wrote:

 Hello,
 I don´t really know how to measure the number of requests per second made
 to Elasticsearch cluster. I would like to know specifically the number of
 search requests per second.
 If I am not wrong, the stat query_total in search increases by the
 number of shards being accessed in every search, so if it increases in
 let´s say 25 units, that does not mean you had 25 _search requests but ES
 has used 25 shards in total to answer to a single (or several) _search
 request(s).
 So, is it possible at all to know the number of _search requests per
 second that your cluster receives? If not, is there any close approach to
 this?

 Thank you very much,

 Ernesto





Re: Requests per second

2014-11-27 Thread Jürgen Wagner (DVT)
Hi Ernesto,
  you may use a tool like JMeter to throw randomized queries and other
operations at an Elasticsearch instance. JMeter will take care of
counting requests per time and possibly also register other parameters
you may use for a more detailed evaluation. At the end of the day, the
total client count of queries that were processed in a given time frame
is important. Don't rely on Elasticsearch-internal counters that may
include other requests not visible to clients. The net performance
should be measured end-to-end, possibly not even on the REST service or
transport level, but at the respective front-end application (where one
page view may trigger multiple Elasticsearch requests).
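
(As an illustration only, JMeter can be run headless against such a test plan:

jmeter -n -t es-search-test.jmx -l results.jtl

Here es-search-test.jmx is a hypothetical test plan containing the randomized
queries, and results.jtl collects per-request timings from which requests per
second can be computed.)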

Best regards,
--Jürgen

On 27.11.2014 13:25, Ernesto Reig wrote:
 Hello,
 I don´t really know how to measure the number of requests per second
 made to Elasticsearch cluster. I would like to know specifically the
 number of search requests per second.
 If I am not wrong, the stat query_total in search increases by the
 number of shards being accessed in every search, so if it increases in
 let´s say 25 units, that does not mean you had 25 _search requests but
 ES has used 25 shards in total to answer to a single (or several)
 _search request(s).
 So, is it possible at all to know the number of _search requests per
 second that your cluster receives? If not, is there any close approach
 to this?

 Thank you very much,

 Ernesto


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center Intelligence
 Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
mailto:juergen.wag...@devoteam.com, URL: www.devoteam.de
http://www.devoteam.de/


Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071



Re: Requests per second

2014-11-27 Thread Ernesto Reig
Ok, so I guess the answer is no, there is no way to know it from the ES point 
of view. If you want to know it, you have to count the requests from the client.
Thank you very much for the answers :)

On Thursday, November 27, 2014 1:52:27 PM UTC+1, Jürgen Wagner (DVT) wrote:

  Hi Ernesto,
   you may use a tool like JMeter to throw randomized queries and other 
 operations at an Elasticsearch instance. JMeter will take care of counting 
 requests per time and possibly also register other parameters you may use 
 for a more detailed evaluation. At the end of the day, the total client 
 count of queries that were processed in a given time frame is important. 
 Don't rely on Elasticsearch-internal counters that may include other 
 requests not visible to clients. The net performance should be measured 
 end-to-end, possibly not even on the REST service or transport level, but 
 at the respective front-end application (where one page view may trigger 
 multiple Elasticsearch requests).

 Best regards,
 --Jürgen

 On 27.11.2014 13:25, Ernesto Reig wrote:
  
 Hello, 
 I don´t really know how to measure the number of requests per second made 
 to Elasticsearch cluster. I would like to know specifically the number of 
 search requests per second.
 If I am not wrong, the stat query_total in search increases by the 
 number of shards being accessed in every search, so if it increases in 
 let´s say 25 units, that does not mean you had 25 _search requests but ES 
 has used 25 shards in total to answer to a single (or several) _search 
 request(s).
 So, is it possible at all to know the number of _search requests per 
 second that your cluster receives? If not, is there any close approach to 
 this?

  Thank you very much,

  Ernesto



 -- 

 Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С 
 уважением
 *i.A. Jürgen Wagner*
 Head of Competence Center Intelligence
  Senior Cloud Consultant 

 Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
 Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
 E-Mail: juergen...@devoteam.com javascript:, URL: www.devoteam.de
 --
 Managing Board: Jürgen Hatzipantelis (CEO)
 Address of Record: 64331 Weiterstadt, Germany; Commercial Register: 
 Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071 


  



Re: ES security measures?

2014-11-27 Thread joergpra...@gmail.com
It is no different from other distributed software.

There are many facets of security.

If you want authorized access, add a system which authenticates users and
manages roles. Elasticsearch does not do this for you.

If you want others to not read the Elasticsearch data traffic, set up a
private network http://en.wikipedia.org/wiki/Private_network with your own
gateway/router plus a reverse proxy for internet access.
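
(As one concrete piece of that, and purely as an illustrative sketch: make sure
Elasticsearch only binds to a private or loopback address in elasticsearch.yml,
so it is never directly reachable from the internet. The address below is an
example value.

network.host: 192.168.1.10
http.port: 9200
)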

If you want to trust your Elasticsearch cluster and keep others from
tampering with your data, then set up all the hardware and the network
connection yourself and lock others out from physical access to the
facility.

You can wait for the Elasticsearch security extension which has been
announced.

Jörg


On Thu, Nov 27, 2014 at 6:39 AM, Siddharth Trikha 
siddharthtrik...@gmail.com wrote:

 I have set up my ELK stack on a single server and tested it on a very
 small setup to get some hands-on experience with ELK.
 I want to use ELK for my system logs analysis.

 Now, I have been reading about ES that it has no security. Also read
 something like this:
 DO NOT have ES publicly accessible. That's the equivalent of making your
 Wordpress MySQL database accessible to the world. ES is a REST accessible
 DB which means that anyone can delete all of your data with access to the
 endpoint.

 I am a noob in this. So does this mean that if I put my logs in ES they will
 be accessible to everyone (which is scary)?

 Please guide me on what security measures must be taken. Please
 suggest some links so that I can ensure security.
 How do I keep my ES cluster private?





Re: Behavior of multi_field at index and query time

2014-11-27 Thread nilsga
I hate to bump this post, but I would really appreciate it if anyone has any 
input regarding this.

Regards,

Nils-Helge Garli Hegvik

On Tuesday, November 25, 2014 4:20:49 PM UTC+1, nil...@gmail.com wrote:

 We have a mapping where one of the fields is an integer, but we want to 
 change this to a double. We want to avoid re-indexing, since there will be 
 a lot of documents at migration time. Hence, we were considering using a 
 multi_field (now apparently deprecated, but I guess the same applies for 
 the fields of a property) for this scenario, where the field is both 
 treated as an integer and a double. This means that on the day of 
 migration, all the old documents will only have the integer value set, and 
 all new documents will have the double value set. In our code, we will only 
 treat this value as a double. We have been doing some testing, and it seems 
 like it should work, but I would like to confirm our findings as expected 
 rather than by chance. Let's call the field the_field, and the double 
 property double, like this:

 "the_field": {
   "type": "integer",
   "fields": {
     "double": {
       "type": "double"
     }
   }
 }

 - Without changing our indexing code, when writing a double value to 
 the_field, the value is automatically written to the field as a double. 
 When fetching the document back, the value of the_field is a double. New 
 and old documents look the same. Old documents have integer values and new 
 documents have the double values for the_field. I would expect new 
 documents to have the_field.double in the result instead, but this does 
 not seem to be the case (which is good for us, if that is intended).
 - When querying the_field, say with a range query, both old and new 
 documents appear in the result, but the double part of the value in the new 
 documents is ignored. So 2.534 is treated as the value 2 in the range (or 
 in sorting). This means that if the range is lte: 2, then even double 
 values up to < 3 are included in the range.
 - When querying the_field.double, say with a range query, both old and 
 new documents appear in the result, and the values from both old and new 
 documents are treated as double values, as opposed to the previous example. 
 So if the range is lte: 2, then only integer _and_ double values <= 2 are 
 included in the range.

 Are these observations correct, and as expected? Or is it a side effect 
 of some kind that we should not rely upon? And I assume the rules for 
 queries also applies to aggregations? If this is in fact expected behavior, 
 is it possible to alias the_field.double to the_field in queries, so 
 it is by default treated as the double value? 
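
 (For illustration, a minimal sketch of the kind of range query on the
 sub-field described above, assuming an index named myindex and the default
 REST port:

 curl -XPOST 'localhost:9200/myindex/_search?pretty' -d '{
   "query": {
     "range": {
       "the_field.double": { "lte": 2 }
     }
   }
 }'
 )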

 Regards,

 Nils-Helge Garli Hegvik




Re: how to migrate lucene index into elasticsearch

2014-11-27 Thread Gaurav gupta
Otis,
I am not sure how many of our customers will agree to re-index the whole
data set, as they have been using it for a long time, although I am trying to
convince my Senior Product Management to keep both Lucene and ES. Some old
customers may consider migrating to ES if they need better real-time
performance through distributed ES.

Note :- Currently, the major reason to migrate to ES from Lucene is to have
better distributed support for faster real-time search. I have an embedded
search engine in our product which is based on Lucene 4.8.1, and now I would
like to migrate it to the latest Elasticsearch 1.4 for better distributed
support (sharding and replication, mainly).

Thanks
Gaurav

On Sun, Nov 23, 2014 at 4:11 AM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 I can not tell if it will work, but if you could translate your xml
 mapping into an Elasticsearch mapping it would be great.

 The next steps would be to create an empty index with the mapping, using 1
 shard and no replica, _source and _all disabled. Then you could index one
 test doc over the ES API. After this, you can find out in the data folder
 where ES created the segments files. By exchanging them with a copy of your
 Lucene segment files, they should get picked up - or you get nasty errors
 because ES uses a custom Lucene index format and can not process standard
 Lucene segments.
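
 (A minimal sketch of that first step, with illustrative index, type and field
 names, for the ES 1.x create-index API:

 curl -XPUT 'localhost:9200/myindex' -d '{
   "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
   "mappings": {
     "doc": {
       "_source": { "enabled": false },
       "_all": { "enabled": false },
       "properties": {
         "AddressLine1": { "type": "string", "analyzer": "standard", "store": true }
       }
     }
   }
 }'
 )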

 Jörg


 On Thu, Nov 20, 2014 at 2:26 PM, Gaurav gupta gupta.gaurav0...@gmail.com
 wrote:

 Thanks Jörg for the guidance. I am trying the suggested approach
 #1 and I have a further question on it.

 As you mentioned - *- a custom written tool could traverse the segments
 and extract field information and build a rudimentary mapping (without
 analyzer, without info about _all and _source and all Elasticsearch
 add-ons).*

 We already have a Lucene Index metadata (i.e. field names, type, analyzer
 etc.) available as an xml, so I can create the mapping without traversing
 the segments. Should I create segment file segments.gen using the mapping
 file and using some dummy values and then put all the other old lucene
 index files ( except segments.gen ) from existing lucene index files
 (e.g. - segments_2,_0.cfe,_0.cfs,_0.si,_1.cfe,_1.cfs etc.)

 *sample mapping xml file :-*
 <Mapping>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Standard</fieldanalyzer>
     <indexFieldName>AddressLine1</indexFieldName>
     <name>AddressLine1</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Standard</fieldanalyzer>
     <indexFieldName>Building_Name</indexFieldName>
     <name>Building_Name</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Keyword</fieldanalyzer>
     <indexFieldName>GNAF_PID</indexFieldName>
     <name>GNAF_PID</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>

   ...
 </Mapping>

 Thanks

 On Thu, Nov 13, 2014 at 11:59 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 It is almost impossible to use just binary-only Lucene index for
 migration, because Elasticsearch needs additional info which is not
 available in Lucene. The only method is to reindex data over the
 Elasticsearch API.

 There is a bumpy road but I don't know if one ever tried that:

 - a custom written tool could traverse the segments and extract field
 information and build a rudimentary mapping (without analyzer, without info
 about _all and _source and all Elasticsearch add-ons)

 - another tool could try to reconstruct docs (like the tool Luke) and
 write them to a file in bulk format. Not having the source of the docs
 means it must be possible to retrieve the original input from the Lucene
 index (which is almost never the case)

 - the result could be re-indexed using the Elasticsearch API (assuming
 all analyzers and tokenizers are in place) but a lot of work would have to
 be done

 The preferred way is to rewrite the code that uses the Lucene API to use
 the Elasticsearch API and re-run the indexing process.

 Jörg

 On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta 
 gupta.gaurav0...@gmail.com wrote:

 Hi All,

 I have an embedded Search Engine in our product which is based on
 Lucene 4.8.1 and now I would like to migrate it to latest ElasticSearch 1.4
 for better distributed support (sharding and replication, mainly). Could
 you guide me how one should migrate the existing indexes created by Lucene
 to ES.

 I have referred to the mail thread - migrate lucene index into
 elasticsearch
 https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ.
 And based on the discussion, it appears to me that it's not an easy job,
 or perhaps not even feasible. I am wondering if there is some plugin (river) or
 tool or any work around available to migrate the existing indexes
 created by Lucene to ES.

 I found that an ES plugin is available for Solr-to-ES migration:
 http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/ .
 Do we have something similar for Lucene to ES?

ClusterBlockException after closing an index

2014-11-27 Thread Bruno Cruz
Hi,

I hope this is the correct group for asking about this behavior; sorry in 
advance if it isn't, but I would greatly appreciate some help. I'm pretty 
new to Elasticsearch itself, currently trying to set up an ELK stack to 
analyze a couple of logs.

Recently, I've been looking around for a mechanism for data retirement, and I 
started looking into the flush/close/delete options in ES. What I would 
like to do ultimately is create a cron job (maybe using Curator[1] for 
help) that would close indexes older than a couple of days and delete them 
after a week. So as a test, I tried closing an index to see what would happen 
and, after I closed it, I noticed this error in the Kibana interface:

 Oops! ClusterBlockException[blocked by: [FORBIDDEN/4/index closed];]

The same doesn't happen if I simply delete the index. I would like to 
continue seeing data from more recent indexes in Kibana, but simply close 
older ones. What am I doing wrong? Is there a way for me to keep some older 
(Logstash-sent) data closed in case I need to open and query it later?
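
(For reference, closing and later reopening an index uses the open/close index
API; the index name below is just an example:

curl -XPOST 'localhost:9200/logstash-2014.11.20/_close'
curl -XPOST 'localhost:9200/logstash-2014.11.20/_open'
)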

Best regards,

Bruno C.


[1] https://github.com/elasticsearch/curator/



Re: how to migrate lucene index into elasticsearch

2014-11-27 Thread Gaurav gupta
Thanks Jörg, but I wasn't able to migrate the Lucene indexes to ES even
after trying what you suggested. Maybe I need to follow some more
steps.
I am not getting any error, but the search is not showing any docs/records.
While comparing the files, I found that segments.gen are identical, but
the segments_N files (segments_2 in Lucene and segments_3 in ES) are slightly
different.

[image: Inline image 1]

Lucene Vs ES :-
[image: Inline image 2]

Thanks
Gaurav

On Thu, Nov 27, 2014 at 8:09 PM, Gaurav gupta gupta.gaurav0...@gmail.com
wrote:

 Otis,
 I am not sure how many of our customers will accept to re-index the whole
 data as they are using it since long, although I am trying to convince my
 Senior Product Management to keep both Lucene and ES. Some old customers
 can think to migrate to ES if they need better real-time performance
 through distributed ES.

 Note :- Currently, the major reason to migrate to ES from Lucene is to
 have better distributed support for faster real-time search. I have an
 embedded Search Engine in our product which is based on Lucene 4.8.1 and
 now I would like to migrate it to latest ElasticSearch 1.4 for better
 distributed support (sharding and replication, mainly).

 Thanks
 Gaurav

 On Sun, Nov 23, 2014 at 4:11 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 I can not tell if it will work, but if you could translate your xml
 mapping into an Elasticsearch mapping it would be great.

 The next steps would be to create an empty index with the mapping, using
 1 shard and no replica, _source and _all disabled. Then you could index one
 test doc over the ES API. After this, you can find out in the data folder
 where ES created the segments files. By exchanging them with a copy of your
 Lucene segment files, they should get picked up - or you get nasty errors
 because ES uses a custom Lucene index format and can not process standard
 Lucene segments.

 Jörg


 On Thu, Nov 20, 2014 at 2:26 PM, Gaurav gupta gupta.gaurav0...@gmail.com
  wrote:

 Thanks Jorg for the guidance and I have am trying the suggested approach
 #1 and I have further question on it.

 As you mentioned - *- a custom written tool could traverse the
 segments and extract field information and build a rudimentary mapping
 (without analyzer, without info about _all and _source and all
 Elasticsearch add-ons).*

 We already have a Lucene Index metadata (i.e. field names, type,
 analyzer etc.) available as an xml, so I can create the mapping
 without traversing the segments. Should I create segment file
 segments.gen using the mapping file and using some dummy values and then
 put all the other old lucene index files ( except segments.gen ) from
 existing lucene index files (e.g. - 
 segments_2,_0.cfe,_0.cfs,_0.si,_1.cfe,_1.cfs
 etc.)

 *sample mapping xml file :-*
 <Mapping>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Standard</fieldanalyzer>
     <indexFieldName>AddressLine1</indexFieldName>
     <name>AddressLine1</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Standard</fieldanalyzer>
     <indexFieldName>Building_Name</indexFieldName>
     <name>Building_Name</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>
   <indexField>
     <analyzed>true</analyzed>
     <fieldanalyzer>Keyword</fieldanalyzer>
     <indexFieldName>GNAF_PID</indexFieldName>
     <name>GNAF_PID</name>
     <stored>true</stored>
     <type>string</type>
   </indexField>

   ...
 </Mapping>

 Thanks

 On Thu, Nov 13, 2014 at 11:59 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 It is almost impossible to use just binary-only Lucene index for
 migration, because Elasticsearch needs additional info which is not
 available in Lucene. The only method is to reindex data over the
 Elasticsearch API.

 There is a bumpy road but I don't know if one ever tried that:

 - a custom written tool could traverse the segments and extract field
 information and build a rudimentary mapping (without analyzer, without info
 about _all and _source and all Elasticsearch add-ons)

 - another tool could try to reconstruct docs (like the tool Luke) and
 write them to a file in bulk format. Not having the source of the docs
 means it must be possible to retrieve the original input from the Lucene
 index (which is almost never the case)

 - the result could be re-indexed using the Elasticsearch API (assuming
 all analyzers and tokenizers are in place) but a lot of work would have to
 be done

 The preferred way is to rewrite the code that uses the Lucene API to
 use the Elasticsearch API and re-run the indexing process.

 Jörg

 On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta 
 gupta.gaurav0...@gmail.com wrote:

 Hi All,

 I have an embedded Search Engine in our product which is based on
 Lucene 4.8.1 and now I would like to migrate it to latest ElasticSearch 
 1.4
 for better distributed support (sharding and replication, mainly). Could
 you guide me how one should migrate the existing indexes created by Lucene
 to ES.

 I have referred to the mail thread - migrate lucene index 

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Martijn v Groningen
This looks like a mapping issue to me (not 100% sure). A document that is
in the translog has a string field (with value: 'finished'), but it is
mapped as a number field (long, integer, double, etc.) in the mapping. This
causes the number format exception that you're seeing in your logs when
that document is indexed from the translog as part of the recovery and this
then prevents the shard from getting started.

These problems can occur when new fields are introduced at index time and
also when numeric_detection is enabled in the mapping (which makes these
errors more likely). Is this the case in your ES setup?

Can you also check the mappings of the logstash-2014.11.27 index and see
which fields could possibly contain 'finished'? Unfortunately the field name
didn't get included with your errors.
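
(For reference, a quick way to inspect those mappings, assuming the default
REST port on localhost:

curl 'localhost:9200/logstash-2014.11.27/_mapping?pretty'
)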

On 27 November 2014 at 11:19, Jilles van Gurp jillesvang...@gmail.com
wrote:

 Our production cluster went yellow last night after our logstash index
 rolled over to the next version. I've seen this happen before but this time
 I decided to properly diagnose and seek some feedback on what might be
 going on.

 So, I'd love some feedback on what is going on. I'm happy to keep this
 cluster in a yellow state for a limited time to get some help from people
 in this group trying to diagnose this properly and maybe help some others
 who face the same issues. However, I will need to fix this one way or
 another before end of business day today. I plan to perform a rolling
 restart to see if node reinitialization fixes things. If not, I'll remove
 the problematic logstash index and move on. I'd love suggesttions for less
 intrusive solutions. I don't like losing data and rolling restarts are kind
 of tedious to babysit. Tends to take 45 minutes or so.

 Below is some information I've gathered. Let me know if you need me to
 extract more data.

 First the obvious:

 {
   status : 200,
   name : 192.168.1.13,
   cluster_name : linko_elasticsearch,
   version : {
 number : 1.4.0,
 build_hash : bc94bd81298f81c656893ab1d30a99356066,
 build_timestamp : 2014-11-05T14:26:12Z,
 build_snapshot : false,
 lucene_version : 4.10.2
   },
   tagline : You Know, for Search
 }

 [linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
 {
   cluster_name : linko_elasticsearch,
   status : yellow,
   timed_out : false,
   number_of_nodes : 5,
   number_of_data_nodes : 3,
   active_primary_shards : 221,
   active_shards : 619,
   relocating_shards : 0,
   initializing_shards : 2,
   unassigned_shards : 1
 }

 So we're yellow and the reason is initializing and unassigned shards. We
 have five nodes, of which three are data nodes. It seems we are hitting
 some kind of resilience issue. The three machines have plenty of diskspace
 and memory.

 I found this in the log of one of our es nodes:
 [2014-11-27 10:15:12,585][WARN ][cluster.action.shard ] [192.168.1.13]
 [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4],
 node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID
 [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to start shard, message
 [RecoveryFailedException[[logstash-2014.11.27][4]: Recovery failed from
 [192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io][inet[/
 192.168.1.14:9300]] into [192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][
 es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}]; nested:
 RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:9300]][internal:index/shard/recovery/start_recovery]];
 nested: RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution
 failed]; nested:
 RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]];
 nested: NumberFormatException[For input string: finished]; ]]

 on the mentioned node there's a corresponding messages:
 [2014-11-27 10:17:54,187][WARN ][cluster.action.shard ] [192.168.1.14]
 [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4],
 node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID
 [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to perform
 [indices:data/write/bulk[s]] on replica, message
 [RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][indices:data/write/bulk[s][r]]];
 nested: NumberFormatException[For input string: finished]; ]]

 All three data nodes have similar messages happening over and over again.

 Our cluster has been up for a couple of weeks and seems pretty happy
 otherwise. I deleted some older logstash indices a few days ago. The
 cluster has logstash data and a few smallish indiceses we use for our
 inbot.io service. The issue appears to be related to the logstash index
 rollover. Our app servers and kibana talk to the two non data nodes that we
 run on both our application servers.

 My next stop was kibana which we use on the same cluster with the logstash
 index that is probably causing us issues. Looking at that, I noticed a few
 interesting things:

- logstash indexing seems to 

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
Thanks for the explanation. I suspect many logstash users might be running 
into this one since you typically use a dynamic mapping with that. We have 
some idea where this is happening though and we can probably fix it 
properly. This happened during index roll over and we indeed are indexing a 
lot of things via logstash almost continuously. 

Jilles



On Thursday, November 27, 2014 4:06:21 PM UTC+1, Martijn v Groningen wrote:

 This looks like a mapping issue to me (not 100% sure). A document that is 
 in the translog has a string field (with value: 'finished'), but it is 
 mapped as a number field (long, integer, double, etc.) in the mapping. This 
 causes the number format exception that you're seeing in your logs when 
 that document is indexed from the translog as part of the recovery and this 
 then prevents the shard from getting started.

 These problems can occur when new fields are introduced at index time and 
 also when numeric_detection is enabled in the mapping (which makes these 
 errors more likely). Is this the case in your ES setup?

 Can you also check the mappings of the logstash-2014.11.27 index and see 
 what fields can possible contain 'finished'? Unfortunately the field name 
 didn't get included with your errors.

 On 27 November 2014 at 11:19, Jilles van Gurp jilles...@gmail.com 
 javascript: wrote:

 Our production cluster went yellow last night after our logstash index 
 rolled over to the next version. I've seen this happen before but this time 
 I decided to properly diagnose and seek some feedback on what might be 
 going on. 

 So, I'd love some feedback on what is going on. I'm happy to keep this 
 cluster in a yellow state for a limited time to get some help from people 
 in this group trying to diagnose this properly and maybe help some others 
 who face the same issues. However, I will need to fix this one way or 
 another before end of business day today. I plan to perform a rolling 
 restart to see if node reinitialization fixes things. If not, I'll remove 
 the problematic logstash index and move on. I'd love suggesttions for less 
 intrusive solutions. I don't like losing data and rolling restarts are kind 
 of tedious to babysit. Tends to take 45 minutes or so.

 Below is some information I've gathered. Let me know if you need me to 
 extract more data. 

 First the obvious:

 {
   status : 200,
   name : 192.168.1.13,
   cluster_name : linko_elasticsearch,
   version : {
 number : 1.4.0,
 build_hash : bc94bd81298f81c656893ab1d30a99356066,
 build_timestamp : 2014-11-05T14:26:12Z,
 build_snapshot : false,
 lucene_version : 4.10.2
   },
   tagline : You Know, for Search
 }

 [linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
 {
   cluster_name : linko_elasticsearch,
   status : yellow,
   timed_out : false,
   number_of_nodes : 5,
   number_of_data_nodes : 3,
   active_primary_shards : 221,
   active_shards : 619,
   relocating_shards : 0,
   initializing_shards : 2,
   unassigned_shards : 1
 }

 So we're yellow and the reason is initializing and unassigned shards. We 
 have five nodes, of which three are data nodes. It seems we are hitting 
 some kind of resilience issue. The three machines have plenty of diskspace 
 and memory.

 I found this in the log of one of our es nodes:
 [2014-11-27 10:15:12,585][WARN ][cluster.action.shard ] 
 [192.168.1.13] [logstash-2014.11.27][4] sending failed shard for 
 [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R], 
 s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to 
 start shard, message [RecoveryFailedException[[logstash-2014.11.27][4]: 
 Recovery failed from [192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io
 ][inet[/192.168.1.14:9300]] into [192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][
 es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}]; nested: 
 RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:9300]][internal:index/shard/recovery/start_recovery]];
  
 nested: RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution 
 failed]; nested: 
 RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]];
  
 nested: NumberFormatException[For input string: finished]; ]]

 on the mentioned node there's a corresponding messages:
 [2014-11-27 10:17:54,187][WARN ][cluster.action.shard ] 
 [192.168.1.14] [logstash-2014.11.27][4] sending failed shard for 
 [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R], 
 s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to 
 perform [indices:data/write/bulk[s]] on replica, message 
 [RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][indices:data/write/bulk[s][r]]];
  
 nested: NumberFormatException[For input string: finished]; ]]

 All three data nodes have similar messages happening over and over again.

 Our cluster has been up for a couple of weeks and seems pretty happy 
 otherwise. I deleted some 

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Martijn v Groningen
If the field you suspect is causing this is a string field in the mapping, then
you can try to close and open the index. This will then sync the in-memory
representation of the mapping with what is in the cluster state.
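
(Illustratively, again assuming the default REST port on localhost:

curl -XPOST 'localhost:9200/logstash-2014.11.27/_close'
curl -XPOST 'localhost:9200/logstash-2014.11.27/_open'
)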

On 27 November 2014 at 16:49, Jilles van Gurp jillesvang...@gmail.com
wrote:

 Thanks for the explanation. I suspect many logstash users might be running
 into this one since you typically use a dynamic mapping with that. We have
 some idea where this is happening though and we can probably fix it
 properly. This happened during index roll over and we indeed are indexing a
 lot of things via logstash almost continuously.

 Jilles



 On Thursday, November 27, 2014 4:06:21 PM UTC+1, Martijn v Groningen wrote:

 This looks like a mapping issue to me (not 100% sure). A document that is
 in the translog has a string field (with value: 'finished'), but it is
 mapped as a number field (long, integer, double, etc.) in the mapping. This
 causes the number format exception that you're seeing in your logs when
 that document is indexed from the translog as part of the recovery and this
 then prevents the shard from getting started.

 These problems can occur when new fields are introduced at index time and
 also when numeric_detection is enabled in the mapping (which makes these
 errors more likely). Is this the case in your ES setup?

 Can you also check the mappings of the logstash-2014.11.27 index and see
 what fields can possible contain 'finished'? Unfortunately the field name
 didn't get included with your errors.

 On 27 November 2014 at 11:19, Jilles van Gurp jilles...@gmail.com
 wrote:

 Our production cluster went yellow last night after our logstash index
 rolled over to the next version. I've seen this happen before but this time
 I decided to properly diagnose and seek some feedback on what might be
 going on.

 So, I'd love some feedback on what is going on. I'm happy to keep this
 cluster in a yellow state for a limited time to get some help from people
 in this group trying to diagnose this properly and maybe help some others
 who face the same issues. However, I will need to fix this one way or
 another before end of business day today. I plan to perform a rolling
 restart to see if node reinitialization fixes things. If not, I'll remove
 the problematic logstash index and move on. I'd love suggesttions for less
 intrusive solutions. I don't like losing data and rolling restarts are kind
 of tedious to babysit. Tends to take 45 minutes or so.

 Below is some information I've gathered. Let me know if you need me to
 extract more data.

 First the obvious:

 {
   status : 200,
   name : 192.168.1.13,
   cluster_name : linko_elasticsearch,
   version : {
 number : 1.4.0,
 build_hash : bc94bd81298f81c656893ab1d30a99356066,
 build_timestamp : 2014-11-05T14:26:12Z,
 build_snapshot : false,
 lucene_version : 4.10.2
   },
   tagline : You Know, for Search
 }

 [linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
 {
   cluster_name : linko_elasticsearch,
   status : yellow,
   timed_out : false,
   number_of_nodes : 5,
   number_of_data_nodes : 3,
   active_primary_shards : 221,
   active_shards : 619,
   relocating_shards : 0,
   initializing_shards : 2,
   unassigned_shards : 1
 }

 So we're yellow and the reason is initializing and unassigned shards. We
 have five nodes, of which three are data nodes. It seems we are hitting
 some kind of resilience issue. The three machines have plenty of diskspace
 and memory.

 I found this in the log of one of our es nodes:
 [2014-11-27 10:15:12,585][WARN ][cluster.action.shard ]
 [192.168.1.13] [logstash-2014.11.27][4] sending failed shard for
 [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R],
 s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to
 start shard, message [RecoveryFailedException[[logstash-2014.11.27][4]:
 Recovery failed from [192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io
 ][inet[/192.168.1.14:9300]] into [192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][
 es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}]; nested:
 RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:
 9300]][internal:index/shard/recovery/start_recovery]]; nested:
 RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution
 failed]; nested: RemoteTransportException[[192.168.1.13][inet[/
 192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]];
 nested: NumberFormatException[For input string: finished]; ]]

 on the mentioned node there's a corresponding messages:
 [2014-11-27 10:17:54,187][WARN ][cluster.action.shard ]
 [192.168.1.14] [logstash-2014.11.27][4] sending failed shard for
 [logstash-2014.11.27][4], node[o9vhU4BhSCuQ4BmLJjPtfA], [R],
 s[INITIALIZING], indexUUID [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to
 perform [indices:data/write/bulk[s]] on replica, message
 [RemoteTransportException[[192.168.1.13][inet[/192.168.1.
 13:9300]][indices:data/write/bulk[s][r]]]; 

Re: elasticsearch 1.4.0 yellow with shards stuck initializing - need help diagnosing this

2014-11-27 Thread Jilles van Gurp
BTW. I should mention that I also filed a bug for this earlier today. 
 https://github.com/elasticsearch/elasticsearch/issues/8684

Clinton Gormley kindly replied to that and provided some additional insight.

It indeed seems our mapping is part of the problem, but there's also the ES 
side of things, where it shouldn't get into this state. Apparently a fix for 
that part is coming.

Best,

Jilles

On Thursday, November 27, 2014 11:19:20 AM UTC+1, Jilles van Gurp wrote:

 Our production cluster went yellow last night after our logstash index 
 rolled over to the next version. I've seen this happen before but this time 
 I decided to properly diagnose and seek some feedback on what might be 
 going on. 

 So, I'd love some feedback on what is going on. I'm happy to keep this 
 cluster in a yellow state for a limited time to get some help from people 
 in this group trying to diagnose this properly and maybe help some others 
 who face the same issues. However, I will need to fix this one way or 
 another before end of business day today. I plan to perform a rolling 
 restart to see if node reinitialization fixes things. If not, I'll remove 
 the problematic logstash index and move on. I'd love suggesttions for less 
 intrusive solutions. I don't like losing data and rolling restarts are kind 
 of tedious to babysit. Tends to take 45 minutes or so.

 Below is some information I've gathered. Let me know if you need me to 
 extract more data. 

 First the obvious:

 {
   status : 200,
   name : 192.168.1.13,
   cluster_name : linko_elasticsearch,
   version : {
 number : 1.4.0,
 build_hash : bc94bd81298f81c656893ab1d30a99356066,
 build_timestamp : 2014-11-05T14:26:12Z,
 build_snapshot : false,
 lucene_version : 4.10.2
   },
   tagline : You Know, for Search
 }

 [linko@app2 elasticsearch]$ curl localhost:9200/_cluster/health?pretty
 {
   cluster_name : linko_elasticsearch,
   status : yellow,
   timed_out : false,
   number_of_nodes : 5,
   number_of_data_nodes : 3,
   active_primary_shards : 221,
   active_shards : 619,
   relocating_shards : 0,
   initializing_shards : 2,
   unassigned_shards : 1
 }

 So we're yellow and the reason is initializing and unassigned shards. We 
 have five nodes, of which three are data nodes. It seems we are hitting 
 some kind of resilience issue. The three machines have plenty of diskspace 
 and memory.

 I found this in the log of one of our es nodes:
 [2014-11-27 10:15:12,585][WARN ][cluster.action.shard ] [192.168.1.13] 
 [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], 
 node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID 
 [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to start shard, message 
 [RecoveryFailedException[[logstash-2014.11.27][4]: Recovery failed from 
 [192.168.1.14][sE51TBxfQ2q6pD5k7G7piA][es2.inbot.io][inet[/
 192.168.1.14:9300]] into [192.168.1.13][o9vhU4BhSCuQ4BmLJjPtfA][
 es1.inbot.io][inet[/192.168.1.13:9300]]{master=true}]; nested: 
 RemoteTransportException[[192.168.1.14][inet[/192.168.1.14:9300]][internal:index/shard/recovery/start_recovery]];
  
 nested: RecoveryEngineException[[logstash-2014.11.27][4] Phase[2] Execution 
 failed]; nested: 
 RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][internal:index/shard/recovery/translog_ops]];
  
 nested: NumberFormatException[For input string: finished]; ]]

 on the mentioned node there's a corresponding messages:
 [2014-11-27 10:17:54,187][WARN ][cluster.action.shard ] [192.168.1.14] 
 [logstash-2014.11.27][4] sending failed shard for [logstash-2014.11.27][4], 
 node[o9vhU4BhSCuQ4BmLJjPtfA], [R], s[INITIALIZING], indexUUID 
 [-mMLqYjAQuCUDcczYf5SHA], reason [Failed to perform 
 [indices:data/write/bulk[s]] on replica, message 
 [RemoteTransportException[[192.168.1.13][inet[/192.168.1.13:9300]][indices:data/write/bulk[s][r]]];
  
 nested: NumberFormatException[For input string: finished]; ]]

 All three data nodes have similar messages happening over and over again.

 Our cluster has been up for a couple of weeks and seems pretty happy 
 otherwise. I deleted some older logstash indices a few days ago. The 
 cluster has logstash data and a few smallish indiceses we use for our 
 inbot.io service. The issue appears to be related to the logstash index 
 rollover. Our app servers and kibana talk to the two non data nodes that we 
 run on both our application servers.

 My next stop was kibana which we use on the same cluster with the logstash 
 index that is probably causing us issues. Looking at that, I noticed a few 
 interesting things:

- logstash indexing seems to be fine (good) and it appears there has 
been no data loss yet
- our cpu load jumped around midnight and sort of stayed up on all 
three nodes. We measure this using collectd and both mean and max load 
jumped to around 1 around the time the index rollover happened. 

 My next step was using curl -XGET 'localhost:9200/_cat/recovery?v'

 All the 

Re: Is re-election/assignment of the master node possible?

2014-11-27 Thread Norberto Meijome
The load issue affecting master detection / election shouldn't happen if
you have dedicated masters... at least that's how it is with 0.90.x.

(With my limited knowledge of ES implementation details, there seems to be
a lock or priority issue when serving a large number of requests (HTTP / Thrift),
affecting cluster / metadata updates... I would think these metadata tasks
ought to take priority in some cases over queries...)
On 26/11/2014 6:11 pm, Erik theRed j.e.redd...@gmail.com wrote:

 Thanks, Nik -

 There's no data on the node so it sounds like master reelection should
 fail over fairly quickly.

 On Wednesday, November 26, 2014 2:58:43 PM UTC-6, Nikolas Everett wrote:



 On Wed, Nov 26, 2014 at 3:47 PM, Erik theRed j.e.r...@gmail.com wrote:

 Is there any notion of triggering a re-election of the master node?

 I'm currently running 1.2.4, and I have an instance that is scheduled
 for retirement (my favorite!) and it just so happens that it's my master
 node.  What can I do to avoid the dreaded RED state?  Is there some
 mechanism that can allow me to re-assign the current master to one of the
 two other available dedicated master nodes so I can reboot the current
 master?


 Move all the shards off of the node using allocation include/exclude
 settings.  If you shoot the master one of the other master eligible nodes
 will take over quickly and there won't be any interruptions.
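
 For reference, a minimal sketch of that allocation exclude approach (the IP is a
 placeholder for the node being retired; the setting should also be available on 1.2.x):

 curl -XPUT 'localhost:9200/_cluster/settings' -d '{
   "transient": {
     "cluster.routing.allocation.exclude._ip": "10.0.0.42"
   }
 }'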


 I ask because I'm a bit gun-shy due to my experience when an elected
 master node went unresponsive (before I created dedicated masters) due
 to excessive HTTP connections: master re-election seemed to never occur and
 everything came crumbling down.


 I've never had that problem.  My cluster is pretty small though - only 31
 nodes.

 Nik

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b9506885-e321-4abe-b1c2-db0d802b07ec%40googlegroups.com
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CACj2-4J8Ycj07jhHdX71JJjAHW0_ZALMH9mPUcW1__sF_1NagA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hive write data to elastic search

2014-11-27 Thread Atul Paldhikar
Hi David,

did you find any fix for this issue, I am also facing the same problem.

Thanks
- Atul

On Tuesday, July 8, 2014 8:37:09 AM UTC-7, David Zabner wrote:

 Hi all,

 I am trying to write data to elastic search from hive and whenever I try I 
 get this error:

 org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource 
 ['es.resource'] (index/query/location) specified


 The script I am running looks like this:

 USE pl_10;

 ADD jar /home/hdfs/sql-tests/schema/elasticsearch-hadoop-2.0.0.jar;

 CREATE EXTERNAL TABLE IF NOT EXISTS REGIONES (

 R_REGIONKEY  INT,

 R_NAME   STRING,

 R_COMMENT    STRING

 ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'

 STORED by 'org.elasticsearch.hadoop.hive.EsStorageHandler'

 TBLPROPERTIES('es.resource'='radio/artists',

 'es.nodes'='elastic-1');

 INSERT OVERWRITE TABLE REGIONES select R_REGIONKEY, R_NAME, R_COMMENT from 
 REGION;


 Any help would be much appreciated


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/54cced44-4f87-4569-923f-30e406ddb79c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch deployment advise

2014-11-27 Thread Mark Walkom
1 - Depends on your use.
2 - Yes there are, see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#circuit-breaker
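
As a rough illustration of that circuit breaker (the 40% value is only an example;
in 1.4 the fielddata limit is a dynamic cluster setting):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}'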

On 28 November 2014 at 07:17, Denis J. Cirulis denis.ciru...@gmail.com
wrote:

 Hello,

 I need to plan a new deployment of elasticsearch: a single node, 128GB
 RAM, for log indexing (about 50 million records a day).

 1. What's the best heap size for elasticsearch 1.4 (running Oracle java
 7u72) ?
 2. Is there some kind of query throttling technique to stop deep drill
 downs to prevent ES out of memory errors ?

 Thanks in advance.

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/f1c1c29e-9a25-4015-a7e6-6591e9e09118%40googlegroups.com
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAF3ZnZnDvMNb%2BAB5ZvMgTw5g-ePNgktQDVGjCYdhX9-FcQ7pLg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch deployment advise

2014-11-27 Thread Denis J. Cirulis
20-30GB per index a day. I've read in the setup guide that a heap larger than 32GB is 
useless, that's why I'm asking. 

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dfd0a2a4-505d-4496-8b49-2df00a65fa93%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch deployment advise

2014-11-27 Thread Nikolas Everett
We have 128GB on some nodes and run 30GB heaps. Lucene memory-maps files, so
the extra memory is put to good use. The 32GB heap limit comes from
the JVM compressing pointers: it can't compress them above 32GB, so you see
everything expand in size.
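
As an illustration of keeping the heap under that threshold (the file location is
the usual Debian/Ubuntu packaging spot and is only an assumption; adjust for your
install):

# /etc/default/elasticsearch
ES_HEAP_SIZE=30g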
On Nov 27, 2014 4:18 PM, Denis J. Cirulis denis.ciru...@gmail.com wrote:

 20-30Gb per index a day. I've read in setup guide that heap more than 32Gb
 is useless, that's why I'm asking.

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/dfd0a2a4-505d-4496-8b49-2df00a65fa93%40googlegroups.com
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0JUV%2BL8hi%3D9_PEML%3D4PdnO%2BfMY_MYXG9ANvPW6Qj0KLQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hive write data to elastic search

2014-11-27 Thread Costin Leau

Hi Atul,

What does your Hive script look like? What version of Hive and es-hadoop are 
you using? Can you post them along with the stacktrace on Gist or Pastebin [1]?
The exception message is pretty straightforward - either 'es.resource' is 
missing or the resource type is incorrectly specified [2].

Cheers,

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/troubleshooting.html
[2] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/hive.html#hive-configuration

On 11/27/14 8:57 PM, Atul Paldhikar wrote:

HI David,

did you find any fix for this issue, I am also facing the same problem.

Thanks
- Atul

On Tuesday, July 8, 2014 8:37:09 AM UTC-7, David Zabner wrote:

Hi all,

I am trying to write data to elastic search from hive and whenever I try I 
get this error:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource 
['es.resource'] (index/query/location) specified


The script I am running looks like this:

USE pl_10;

ADD jar /home/hdfs/sql-tests/schema/elasticsearch-hadoop-2.0.0.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS REGIONES (

 R_REGIONKEY  INT,

 R_NAME   STRING,

 R_COMMENT    STRING

) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'

STORED by 'org.elasticsearch.hadoop.hive.EsStorageHandler'

TBLPROPERTIES('es.resource'='radio/artists',

'es.nodes'='elastic-1');

INSERT OVERWRITE TABLE REGIONES select R_REGIONKEY, R_NAME, R_COMMENT from 
REGION;


Any help would be much appreciated

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/54cced44-4f87-4569-923f-30e406ddb79c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Costin

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5477ACC3.9050407%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can't integrate Elasticsearch with Hive

2014-11-27 Thread Costin Leau

Hi,

The issue is most likely caused by two different versions of es-hadoop within your classpath, probably es-hadoop 2.0.x 
(2.0.2) and 2.1.x (2.1.0.Beta3). If they are picked up by Hive or Hadoop it means the JVM will have two jars with 
classes under the same package name. This leads to weird conflicts, as classes from one jar can interact with classes 
from the other jar, especially as the code went through major internal changes between 2.0.x and 2.1.x.

Make sure you have only one version of es-hadoop in your classpath - both on the client and in the cluster. That 
includes the Hive classpath, the Hadoop classpath, as well as the submitting jar (since the library might be embedded).
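
A rough way to check for duplicates (the paths are illustrative, adjust to your layout):

# look for more than one copy of the library on disk
find /apps /usr/lib/hive /usr/lib/hadoop -name 'elasticsearch-hadoop*.jar' 2>/dev/null
# show which auxiliary jars Hive itself will load
hive -e 'set hive.aux.jars.path;'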

P.S. IllegalAccessError indicates an illegal call - such as accessing a non-public class from a different package. However 
in this case both classes are in the same package and the HiveUtils class is not private...

Cheers,

On 11/27/14 9:19 AM, Atul Paldhikar wrote:

Hi All,

I am using Hive 0.13.1 and trying to create an external table so data can be 
loaded from Hive to Elasticsearch. However I keep getting the following error. 
I have tried the following jars but get the same error. I would really appreciate 
any pointers.

Thanks
- Atul

<property>
  <name>hive.aux.jars.path</name>
<!--
  <value>/apps/sas/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar</value>
-->
  <value>/apps/sas/elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.Beta3.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>

ERROR :

2014-11-26 23:09:22,069 ERROR [main]: exec.DDLTask (DDLTask.java:execute(478)) 
- java.lang.IllegalAccessError: tried to
access class org.elasticsearch.hadoop.hive.HiveUtils from class 
org.elasticsearch.hadoop.hive.EsSerDe
 at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:81)
 at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
 at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
 at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
 at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
 at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
 at 
org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
 at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

2014-11-26 23:09:22,069 ERROR [main]: ql.Driver 
(SessionState.java:printError(545)) - FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. tried to access class 
org.elasticsearch.hadoop.hive.HiveUtils from
class org.elasticsearch.hadoop.hive.EsSerDe

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/78b85fb6-6eea-46e8-964a-d96e324e780d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Costin

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 

Re: ES security measures?

2014-11-27 Thread Ivan G
I have all my clusters behind Amazon's VPC security groups, but this
week we're facing the need to let frontend clients (javascript) access
the ES indexes.

There is an auth plugin (https://github.com/codelibs/elasticsearch-auth)
which seems interesting.
It lets you limit access to data by user, password, role, protocol
and index (it does not mention anything about types).

I've not tested it yet, but I want to share it because it may be useful for
someone else.



--

Iván González Valiente

Systems programmer



2014-11-27 13:23 GMT+01:00 joergpra...@gmail.com joergpra...@gmail.com:

 It is no different from other distributed software.

 There are many facets of security.

 If you want authorized access, add a system which authenticates users and
 manages roles. Elasticsearch does not do this for you.

 If you want others to not read the Elasticsearch data traffic, set up a
 private network http://en.wikipedia.org/wiki/Private_network with your
 own gateway/router plus a reverse proxy for internet access.
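
 A minimal sketch of that idea (the addresses are placeholders; network.host is a
 standard elasticsearch.yml setting, and the firewall rules are just one of several
 ways to do it):

 # elasticsearch.yml: bind only to the private interface
 network.host: 192.168.1.10

 # firewall: allow the app servers' subnet on the HTTP port, drop everything else
 iptables -A INPUT -p tcp --dport 9200 -s 192.168.1.0/24 -j ACCEPT
 iptables -A INPUT -p tcp --dport 9200 -j DROP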

 If you want to trust in your Elasticsearch cluster and keep others from
 tampering your data, then set up all the hardware and the network
 connection by yourself and lock others out from physical access to the
 facility.

 You can wait for the Elasticsearch security extension which has been
 announced.

 Jörg


 On Thu, Nov 27, 2014 at 6:39 AM, Siddharth Trikha 
 siddharthtrik...@gmail.com wrote:

 I have set up my ELK stack on a single server and tested it on a very
 small setup to get some hands-on experience with ELK.
 I want to use ELK for my system logs analysis.

 Now, I have been reading about ES that it has no security. Also read
 something like this:
 DO NOT have ES publicly accessible. That's the equivalent of making your
 Wordpress MySQL database accessible to the world. ES is a REST accessible
 DB which means that anyone can delete all of your data with access to the
 endpoint.

 I am a noob in this. So does this mean that if I put my logs in ES they will be
 accessible to everyone (which is scary)?

 Please guide me on what security measures must be taken. Please
 suggest some links so that I can ensure security.
 How do I keep my ES cluster private?

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/236c0359-46cb-4359-8484-c311fb102db2%40googlegroups.com
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH2D6qpQubeceKoz0N67RFTD9HWW%2BK4Dk1Q7b1%3DUgJzdw%40mail.gmail.com
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CA%2BjeyjNnJQ3AJPzn6%3Dc8c%2Bd127r%2BeF4NTJM_Zy_DN%2BMW%3D7qW7A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hive write data to elastic search

2014-11-27 Thread Atul Paldhikar
Hi Costin,

actually I think I figured out the issue, my script had a typo (resources 
instead of resource)

create external table ex_address (name String, st_no INT, st_name string, 
city string, state string, zip INT) stored by 
'org.elasticsearch.hadoop.hive.EsStorageHandler' 
tblproperties('es.resources' = 'employee/address');
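
For the record, the corrected version only changes the property key ('es.resources' 
becomes 'es.resource'):

create external table ex_address (name String, st_no INT, st_name string, 
city string, state string, zip INT) stored by 
'org.elasticsearch.hadoop.hive.EsStorageHandler' 
tblproperties('es.resource' = 'employee/address');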

However, for some reason I am back to square one with the original problem 
mentioned in another thread [Can't integrate Elasticsearch with Hive]. I 
have started getting the old exception again when accessing the external 
table, nothing really changed in the environment !

java.lang.IllegalAccessError: tried to access class 
org.elasticsearch.hadoop.hive.HiveUtils from class 
org.elasticsearch.hadoop.hive.EsSerDe

Thanks
- Atul

On Thursday, November 27, 2014 2:59:49 PM UTC-8, Costin Leau wrote:

 Hi Atul, 

 What does your Hive script looks like? What version of hive and es-hadoop 
 are you using? Can you post them 
 along with the stacktrace on Gist or Pastebin [1]? 
 The exception message is pretty straight-forward - either 'es.resource' is 
 missing or the resource 
 type is incorrectly specified [2]. 

 Cheers, 

 [1] 
 http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/troubleshooting.html
  
 [2] 
 http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/hive.html#hive-configuration
  

 On 11/27/14 8:57 PM, Atul Paldhikar wrote: 
  HI David, 
  
  did you find any fix for this issue, I am also facing the same problem. 
  
  Thanks 
  - Atul 
  
  On Tuesday, July 8, 2014 8:37:09 AM UTC-7, David Zabner wrote: 
  
  Hi all, 
  
  I am trying to write data to elastic search from hive and whenever I 
 try I get this error: 
  
  org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No 
 resource ['es.resource'] (index/query/location) specified 
  
  
  The script I am running looks like this: 
  
  USE pl_10; 
  
  ADD jar /home/hdfs/sql-tests/schema/elasticsearch-hadoop-2.0.0.jar; 
  
  CREATE EXTERNAL TABLE IF NOT EXISTS REGIONES ( 
  
   R_REGIONKEY  INT, 
  
   R_NAME   STRING, 
  
    R_COMMENT    STRING 
  
  ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
  
  STORED by 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
  
  TBLPROPERTIES('es.resource'='radio/artists', 
  
  'es.nodes'='elastic-1'); 
  
  INSERT OVERWRITE TABLE REGIONES select R_REGIONKEY, R_NAME, 
 R_COMMENT from REGION; 
  
  
  Any help would be much appreciated 
  
  -- 
  You received this message because you are subscribed to the Google 
 Groups elasticsearch group. 
  To unsubscribe from this group and stop receiving emails from it, send 
 an email to 
  elasticsearch+unsubscr...@googlegroups.com. 
  To view this discussion on the web visit 
  
 https://groups.google.com/d/msgid/elasticsearch/54cced44-4f87-4569-923f-30e406ddb79c%40googlegroups.com
  
  
  

  For more options, visit https://groups.google.com/d/optout. 

 -- 
 Costin 


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/41b72358-82ca-4b21-bd59-b5b8f7dc2adf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Can't integrate Elasticsearch with Hive

2014-11-27 Thread Atul Paldhikar


Hi Costin,

Actually even that issue is resolved :-) There is a spelling difference in the 
samples available on the web: all of them have the storage class as 
“EsStorageHandler”, however only your GitHub post says it is 
“ESStorageHandler”, which is right (https://gist.github.com/costin/8025827)! 
The error should have been more accurate, given that I was using a wrong class name.

Now the next problem: the MapReduce job is failing for some reason. I am 
still a beginner in Hadoop so I'm not exactly sure where to debug. Here are 
some logs; it looks like there is a bad character “#” in the job.xml file. But 
that is generated by Hive, right?

 *Hive Log :*

hive> insert overwrite table ex_address select name, st_no, st_name, city, 
state, zip from employee.address;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1417158738771_0001, Tracking URL = 
http://finattr-comp-dev-01:8088/proxy/application_1417158738771_0001/

Kill Command = /apps/hadoop-2.5.1/bin/hadoop job  -kill 
job_1417158738771_0001

Hadoop job information for Stage-0: number of mappers: 0; number of 
reducers: 0

2014-11-27 23:13:37,547 Stage-0 map = 0%,  reduce = 0%

Ended Job = job_1417158738771_0001 with errors

Error during job, obtaining debugging information...

FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask

MapReduce Jobs Launched: 

Job 0:  HDFS Read: 0 HDFS Write: 0 FAIL

Total MapReduce CPU Time Spent: 0 msec

 Container Job Logs *

 Stderr:-

[sas@finattr-comp-dev-01 container_1417158738771_0001_02_01]$ cat 
stderr 

[Fatal Error] job.xml:606:51: Character reference #

log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.mapreduce.v2.app.MRAppMaster).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
more info.

 Syslog:-

[sas@finattr-comp-dev-01 container_1417158738771_0001_02_01]$ cat 
syslog 

2014-11-27 23:13:36,023 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1417158738771_0001_02

2014-11-27 23:13:36,334 FATAL [main] org.apache.hadoop.conf.Configuration: 
error parsing conf job.xml

org.xml.sax.SAXParseException; systemId: 
file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml
 
file:///\\tmp\hadoop-sas\nm-local-dir\usercache\sas\appcache\application_1417158738771_0001\container_1417158738771_0001_02_01\job.xml;
 
lineNumber: 606; columnNumber: 51; Character reference #

at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)

at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)

at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)

at 
org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183)

at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2252)

at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2205)

at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)

at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)

at 
org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:50)

at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)

2014-11-27 23:13:36,337 FATAL [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster

java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml
 
file:///\\tmp\hadoop-sas\nm-local-dir\usercache\sas\appcache\application_1417158738771_0001\container_1417158738771_0001_02_01\job.xml;
 
lineNumber: 606; columnNumber: 51; Character reference #

at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)

at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2205)

at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)

at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)

at 
org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:50)

at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)

Caused by: org.xml.sax.SAXParseException; systemId: 
file:///tmp/hadoop-sas/nm-local-dir/usercache/sas/appcache/application_1417158738771_0001/container_1417158738771_0001_02_01/job.xml
 

How do I get all the distinct field values that satisfy a particular query..

2014-11-27 Thread Anil Karaka
My sample document looks like this,

{
   "user": "adfasdk",
   "act": "Made Purchase",
   "productPrice": 5099
}

What I want is all the unique users that satisfy a particular query.

Say for example I have the query below, which gets all the documents 
whose productPrice is between 5000 and 1, and whose length is between 
50 and 100. The returned documents will have repeated users. I just need 
all the unique users. Do I have to calculate all the unique users from this 
result myself, or is it possible to do it using an aggregation?

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "productPrice": {
                            "gte": 5000,
                            "lte": 1
                        }
                    }
                },
                {
                    "range": {
                        "length": {
                            "gte": 50,
                            "lte": 100
                        }
                    }
                }
            ]
        }
    },
    "_source": [
        "user"
    ]
}


The aggregation below aggregates over all my documents:
{
    "aggs": {
        "unique_users": {
            "terms": {
                "field": "user"
            }
        }
    }
}
Instead I want to put my query from above into my aggregation of unique 
users. Is that possible? Or do I have to just run my normal query and 
calculate the unique users myself from the query result?
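
For what it's worth, a sketch of combining the two in one request (aggregations run 
over the documents matched by the query; the index name and the 10000 upper bound are 
placeholders, and "size": 0 in the terms agg asks for all buckets in 1.x):

curl -XGET 'localhost:9200/your_index/_search?pretty' -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "range": { "productPrice": { "gte": 5000, "lte": 10000 } } },
        { "range": { "length": { "gte": 50, "lte": 100 } } }
      ]
    }
  },
  "aggs": {
    "unique_users": {
      "terms": { "field": "user", "size": 0 }
    }
  }
}'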

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7dd1fba8-a5fc-4234-9f95-88995ab9e1fd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.