Re: elasticsearch-hadoop-hive exception when writing array<map<string,string>> column

2015-03-12 Thread Chen Wang
Costin,
Thanks for your info.
I am mapping array of maps to nested objects in ES, and in this specific 
case, the expected document in ES will look like
{
   "_id": "customer_id",
   "store_purchase": [{"item_id": 123, "category": "pants", "department": "clothes"}, 
...]
}

so that I can run queries like: find all users who, between T1 and T2, have 
purchased items whose department is A and category is B.
Is there any way of achieving this with es-hadoop?

Chen

On Thursday, March 12, 2015 at 9:18:14 PM UTC-7, Costin Leau wrote:
>
> The exception occurs because you are trying to extract a field (the script 
> parameters) from a complex type (array) and not a primitive. The issue with 
> that (and why it's currently not supported) is that the internal 
> structure of the complex type can get quite complex and its serialized 
> JSON form can end up incorrect.
> Any reason why you need to pass the array of maps as a script parameter 
> and not use primitives instead (you can use Hive column mapping to extract 
> the ones you need)?
>
On Thu, Mar 12, 2015 at 11:56 PM, Chen Wang wrote:
>
>> Folks,
>> I am using elasticsearch-hadoop-hive-2.1.0.Beta3.jar
>>
>> I defined the external table as:
>> CREATE EXTERNAL TABLE IF NOT EXISTS ${staging_table}(
>> customer_id STRING,
>>  store_purchase array<map<string,string>>)
>> ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>> TBLPROPERTIES (
>> 'es.nodes'='localhost:9200',
>> 'es.resource'='user_activity/store',
>> 'es.mapping.id'='customer_id',
>> 'es.input.json'='false',
>> 'es.write.operation'='upsert',
>> 'es.update.script'='ctx._source.store_purchase += purchase',
>> 'es.update.script.params'='purchase:store_purchase'
>> );
>>
>> I create another source table with the same column names and put some 
>> sample data.
>>
>> Running INSERT OVERWRITE TABLE ${staging_table}
>>
>> SELECT customer_id, store_purchase FROM ${test_table} 
>>
>> but it throws EsHadoopIllegalArgumentException: Field [_col1] needs to be 
>> a primitive; found [array<map<string,string>>]. Is array<map<string,string>> 
>> supported yet? If not, how can I get around this issue?
>>
>> Thanks~
>>



ES isn't properly handling unicode? Advice for debugging this problem?

2015-03-12 Thread Kevin Burton
I have unit tests set up to test using the transport client to write unicode 
data into ES and then read it back out.

It's using the standard ElasticsearchIntegrationTest that ES recommends.

I'm using MY JSON encoder... and then I write my JSON to the 
TransportClient, and read it back out, and it's correct!  

The problem is that IN PRODUCTION it doesn't work and all my data is 
garbled.  I *think* it's treating the data either as ASCII or ISO-8859-1 
(which are the usual defaults).

What's the best way to test this?

I imagine I could use Ethereal to look at the raw protocol and verify that 
the data is being sent properly.

The only other thing I can think of is to step through ES directly in a 
debugger. 
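
One quick check, assuming the garbling comes from the JVM default charset 
rather than ES itself: print the production JVM's file.encoding and compare 
it to the test environment, e.g.

java -XshowSettings:properties -version 2>&1 | grep file.encoding

If it reports US-ASCII or ISO-8859-1 instead of UTF-8, the encoder may be 
picking up the platform default.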



Re: Snapshot Scaling Problems

2015-03-12 Thread Andy Nemzek
Thank you guys for your thoughts here.  This is really useful information. 
 Again, we're creating daily indexes because that's what logstash does out 
of the box with the elasticsearch plugin, and this kind of tuning info 
isn't included with that plugin.

Minimizing both the number of indexes and shards now sounds like a great 
idea.

We are indeed using EC2.  We're just using an m1.small that's EBS backed 
(non-SSD).  So, yes, it's not a very powerful machine, but again, we're not 
throwing a lot of data at it either.
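
For reference, a sketch of what switching to larger time periods might look 
like, assuming the stock logstash elasticsearch output (the index pattern is 
the only change from the daily default):

output {
  elasticsearch {
    # default is "logstash-%{+YYYY.MM.dd}"; dropping the day gives monthly indexes
    index => "logstash-%{+YYYY.MM}"
  }
}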


On Thursday, March 12, 2015 at 12:50:22 PM UTC-5, aa...@definemg.com wrote:
>
> With the low volume of ingest, and the long duration of history, I'd 
> suggest you may want to trim back the number of shards per index from the 
> default 5.  Based on your 100 docs per day I'd say 1 shard per day.  If you 
> combined this with the other suggestion to increase the duration of an 
> index, then you might increase the number of shards, but maybe still not. 
>  Running an optimize once you have completed a time period is great advice 
> if you can afford the overhead; it sounds like one day at a time you should 
> be able to, and the overhead of not optimizing is costing you more when 
> you snapshot.
>
> An index is made of shards; a shard is made of Lucene segments.  Lucene 
> segments are the actual files that you copy when you snapshot.  As such the 
> number of segments is multiplied by the number of shards per index and the 
> number of indexes.  Reducing the number of indexes by creating larger time 
> periods will significantly reduce the number of segments.  Reducing the 
> number of shards per index will significantly reduce the number of 
> segments.  Optimizing the index will also consolidate many segments into a 
> single segment.
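>
> A rough sketch of both suggestions, assuming logstash-style index names 
> (the template only applies to newly created indexes, and _optimize is the 
> ES 1.x API):
>
> curl -XPUT 'localhost:9200/_template/logstash' -d '{
>   "template": "logstash-*",
>   "settings": { "number_of_shards": 1 }
> }'
>
> curl -XPOST 'localhost:9200/logstash-2015.03.11/_optimize?max_num_segments=1'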
>
> Based on the use of S3 should we assume you are using AWS EC2?  What 
> instance size?  Your data volume seems very low so it seems concerning that 
> you have such a large time period to snapshot, and points to a slow file 
> system, or a significant number of segments (100 indexes, 5 shards per 
> index, xx segments per shard == many thousands of segments).  What does 
> your storage system look like?  If you are using EC2 are you using the 
> newer EBS volumes (SSD backed)? Some of the smaller instance sizes 
> significantly limit prolonged EBS throughput, in my experience. 
>
> On Wednesday, March 11, 2015 at 1:12:01 AM UTC-6, Magnus Bäck wrote:
>>
>> On Monday, March 09, 2015 at 20:29 CET, 
>>  Andy Nemzek  wrote: 
>>
>> > We've been using logstash for several months now and it creates a new 
>> > index each day, so I imagine there are over 100 indexes at this point. 
>>
>> Why create daily indexes if you only have a few hundred entries in each? 
>> There's a constant overhead for each shard so you don't want more 
>> indexes than you need. Seems like you'd be fine with monthly indexes, 
>> and then your snapshot problems would disappear too. 
>>
>> > Elasticsearch is running on a single machine...I haven't done anything 
>> > with shards, so the defaults must be in use.  Haven't optimized old 
>> > indexes.  We're pretty much just running ELK out of the box.  When you 
>> > mention 'optimizing indexes', does this process combine indexes? 
>>
>> No, but it can combine segments in a Lucene index (which make up 
>> Elasticsearch indexes), and segments are what's being backed up. 
>> So the more segments you have, the longer snapshots are 
>> going to take. 
>>
>> > Do you know if these performance problems are typical when 
>> > using ELK out of the box? 
>>
>> 100 indexes on a single box should be okay but it depends on 
>> the size of the JVM heap. 
>>
>> -- 
>> Magnus Bäck | Software Engineer, Development Tools 
>> magnu...@sonymobile.com | Sony Mobile Communications 
>>
>



Re: elasticsearch-hadoop-hive exception when writing array<map<string,string>> column

2015-03-12 Thread Costin Leau
The exception occurs because you are trying to extract a field (the script
parameters) from a complex type (array) and not a primitive. The issue with
that (and why it's currently not supported) is that the internal
structure of the complex type can get quite complex and its serialized
JSON form can end up incorrect.
Any reason why you need to pass the array of maps as a script parameter and
not use primitives instead (you can use Hive column mapping to extract the
ones you need)?
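
For example, a rough sketch (untested, assuming the staging table is
flattened into primitive columns item_id, category and department, and
groovy-style script syntax):

'es.update.script'='ctx._source.store_purchase += [item_id: item, category: cat, department: dept]',
'es.update.script.params'='item:item_id,cat:category,dept:department'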

On Thu, Mar 12, 2015 at 11:56 PM, Chen Wang wrote:

> Folks,
> I am using elasticsearch-hadoop-hive-2.1.0.Beta3.jar
>
> I defined the external table as:
> CREATE EXTERNAL TABLE IF NOT EXISTS ${staging_table}(
> customer_id STRING,
>  store_purchase array<map<string,string>>)
> ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
> TBLPROPERTIES (
> 'es.nodes'='localhost:9200',
> 'es.resource'='user_activity/store',
> 'es.mapping.id'='customer_id',
> 'es.input.json'='false',
> 'es.write.operation'='upsert',
> 'es.update.script'='ctx._source.store_purchase += purchase',
> 'es.update.script.params'='purchase:store_purchase'
> );
>
> I create another source table with the same column names and put some
> sample data.
>
> Running INSERT OVERWRITE TABLE ${staging_table}
>
> SELECT customer_id, store_purchase FROM ${test_table}
>
> but it throws EsHadoopIllegalArgumentException: Field [_col1] needs to be
> a primitive; found [array<map<string,string>>]. Is array<map<string,string>>
> supported yet? If not, how can I get around this issue?
>
> Thanks~
>



Re: Please help to understand these Exceptions

2015-03-12 Thread Mark Walkom
The limit of a node is hard to definitively know as use cases vary so much,
but from what I have seen 3TB on 3 nodes is pretty dense.

On 12 March 2015 at 08:09, Chris Neal  wrote:

> Thank you Mark.
>
> May I ask what about my answers caused you to say "definitely"? :)  I want
> to better understand capacity related items for ES for sure.
>
> Many thanks!
> Chris
>
> On Wed, Mar 11, 2015 at 2:13 PM, Mark Walkom  wrote:
>
>> Then you're definitely going to be seeing node pressure. I'd add another
>> one or two and see how things look after that.
>>
>> On 11 March 2015 at 07:21, Chris Neal  wrote:
>>
>>> Again Mark, thank you for your time :)
>>>
>>> 157 indices
>>> 928 Shards
>>> Daily indexing that adds 7 indexes per day
>>> Each index has 3 shards and 1 replica
>>> 2.27TB of data in the cluster
>>> Index rate averages about 1500/sec
>>> IOps on the servers is ~40
>>>
>>> Chris
>>>
>>> On Tue, Mar 10, 2015 at 7:57 PM, Mark Walkom 
>>> wrote:
>>>
 It looks like heap pressure.
 How many indices, how many shards, how much data do you have in the
 cluster?

 On 8 March 2015 at 19:24, Chris Neal  wrote:

> Thank you Mark for your reply.
>
> I do have Marvel running, on a separate cluster even, so I do have
> that data from the time of the problem.  I've attached 4 screenshots for
> reference.
>
> It appears that node 10.0.0.12 (the green line on the charts) had
> issues.  The heap usage drops from 80% to 0%.  I'm guessing that is some
> sort of crash, because the heap should not empty itself.  Also its load
> goes to 0.
>
> I also see a lot of Old GC duration on 10.0.0.45 (blue line).  Lots of
> excessive Old GC Counts, so it does appear that the problem was memory
> pressure on this node.  That's what I was thinking, but was hoping for
> validation on that.
>
> If it was, I'm hoping to get some suggestions on what to do about it.
> As I mentioned in the original post, I've tweaked what I think needs tweaking
> based on the system, and it still happens.
>
> Maybe it's just that I'm pushing the cluster too much for the
> resources I'm giving it, and it "just won't work".
>
> The index rate was only about 2500/sec, and the search request rate
> had one small spike that went to 3.0.  But 3 searches in one timeslice is
> nothing.
>
> Thanks again for the help and reading all this stuff.  It is
> appreciated.  Hopefully I can get a solution to keep the cluster stable.
>
> Chris
>
> On Fri, Mar 6, 2015 at 3:01 PM, Mark Walkom 
> wrote:
>
>> You really need some kind of monitoring, like Marvel, around this to
>> give you an idea of what was happening prior to the OOM.
>> Generally a node becoming unresponsive will be due to GC, so take a
>> look at the timings there.
>>
>> On 5 March 2015 at 02:32, Chris Neal 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm hoping someone can help me piece together the below log
>>> entries/stack traces/Exceptions.  I have a 3 node cluster in 
>>> Development in
>>> EC2, and two of them had issues.  I'm running ES 1.4.4, 32GB RAM, 16GB
>>> heaps, dedicated servers to ES.  My index rate averages about 10k/sec.
>>> There were no searches going on at the time of the incident.
>>>
>>> It appears to me that node 10.0.0.12 began timing out requests to
>>> 10.0.0.45, indicating that 10.0.0.45 was having issues.
>>> Then at 4:36, 10.0.0.12 logs the ERROR about "Uncaught exception:
>>>  IndexWriter already closed", caused by an OOME.
>>> Then at 4:43, 10.0.0.45 hits the "Create failed" WARN, and logs an
>>> OOME.
>>> Then things are basically down and unresponsive.
>>>
>>> What is weird to me is that if 10.0.0.45 was the node having issues,
>>> why did 10.0.0.12 log an exception 7 minutes before that?  Did both 
>>> nodes
>>> run out of memory?  Or is one of the Exceptions actually saying, "I see
>>> that this other node hit an OOME, and I'm telling you about it."
>>>
>>> I have a few values tweaked in the elasticsearch.yml file to try and
>>> keep this from happening (configured from Puppet):
>>>
>>> 'indices.breaker.fielddata.limit' => '20%',
>>> 'indices.breaker.total.limit' => '25%',
>>> 'indices.breaker.request.limit' => '10%',
>>> 'index.merge.scheduler.type' => 'concurrent',
>>> 'index.merge.scheduler.max_thread_count' => '1',
>>> 'index.merge.policy.type' => 'tiered',
>>> 'index.merge.policy.max_merged_segment' => '1gb',
>>> 'index.merge.policy.segments_per_tier' => '4',
>>> 'index.merge.policy.max_merge_at_once' => '4',
>>> 'index.merge.policy.max_merge_at_once_explicit' => '4',
>>> 'indices.memory.index_buffer_size' => '10%',
>>> 'indices.store.throttle.type' => 'none',
>>> 'index.translog.flush_threshol

Re: Hive to elasticsearch Parsing exception.

2015-03-12 Thread Costin Leau
Likely the issue is caused by the fact that in your manual mapping, the
"NULL" value is not actually mapped to null but to a string value.
You should be able to get around it by converting "NULL" to a proper NULL
value which es-hadoop can recognize; additionally you can 'translate' it
to a default one.
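
For example, a sketch in HiveQL (geo_col and the table names are
hypothetical):

INSERT OVERWRITE TABLE es_table
SELECT ..., CASE WHEN geo_col = 'NULL' THEN NULL ELSE geo_col END AS geo_col
FROM source_table;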

As for understanding what field caused the exception, unfortunately
Elasticsearch doesn't provide enough information about this yet but it
should. Can you please raise a quick issue on es-hadoop about this?

Thanks,

On Thu, Mar 12, 2015 at 10:12 PM, P lva  wrote:

> Hello Everyone,
>
> I'm loading data from a hive table (0.13) into elasticsearch (1.4.4).
> With the auto create index option turned on, I don't face any problems
> and I can see all the data in ES.
>
> However, I get the following error when I create the index manually.
>
> Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found
> unrecoverable error [Bad Request(400) - [MapperParsingException[failed to
> parse]; nested: NumberFormatException[For input string: "NULL"]; ]];
> Bailing out..
> at
> org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
> at
> org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
> at
> org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
> at
> org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:152)
> at
> org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:146)
> at
> org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:621)
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at
> org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at
> org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:262)
>
>
> To create the index manually, I've used the same mappings from the first
> auto create step and changed one field to geo point type.
> Changing the field type is the only change I made.
>
>
> The column that I wanted to be geo fields had a few nulls, so I selected
> rows without nulls and still have the same error.
>
> Is there any way to identify which column is causing the issue ? There's
> about 70 columns in my table.
>
> Tl;dr
> Hive table to elasticsearch
> Auto create index works fine
> Fails when I manually created index with almost same mapping (except one
> field changed from string to geopoint)
>
>
> Thanks
>
>
>
>



Re: Elasticsearch manifest.xml possible configuration issue.

2015-03-12 Thread Mark Walkom
Technically we don't support SmartOS, but please raise a Github issue
anyway as it'd be interesting to look into more.

On 12 March 2015 at 12:12, dj.hutch  wrote:

> Hi All,
>
> I've been working with Elasticsearch on a Joyent SmartOS instance and
> discovered a possible issue with the java_opts value in the
> elasticsearch.xml file used to create the service.
>
> The line currently reads:
>
> 
>
> This property ends up being passed to the startup command with the space
> between them escaped, which causes the JVM to treat them as all one
> argument. This can be seen in the output of jinfo
> -sysprops <pid>.
>
> The first argument in the java_opts property is
> "-Djava.awt.headless=true", but the java.awt.headless property contains
> true plus all the remaining arguments:
> java.awt.headless = true -Xss256k -XX:+DisableExplicitGC
> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:CMSInitiatingOccupancyFraction=75.
>
> If this property line is removed, and the value from this line is used to
> replace *%{java_opts}* in the start exec_method, the results are as
> expected from the jinfo command.
>
> Note: After the elasticsearch.xml manifest edit, use "svcadm restart
> manifest-import" for changes to take effect... or whatever method you
> prefer.
>
> Just wondering if there is a better solution? I'd also think that this is a
> bug since all the java options after ...headless = true are ignored (i.e.
> no heap dump etc.).
>
> Thanks,
> Dean
>



Re: snapshot and zone

2015-03-12 Thread Mark Walkom
Just use a local mount point, as long as the path is the same it doesn't
matter.
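
A minimal sketch of registering such a repository (assuming /mnt/es_backup 
is mounted at the same path on every node):

curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/es_backup" }
}'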

Also, we do not recommend cross-DC clusters; there are a lot of potential
issues you can run into.

On 12 March 2015 at 17:31, Foobar Geez  wrote:

> Hello,
> We use one ES cluster with 4 nodes spread across 2 data centers (2
> nodes/DC).  Each DC is configured as a zone
> (via cluster.routing.allocation.awareness.attributes).
>
> I would like to use snapshot to back up indexes using type:fs.  Per
> http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html,
> the requirement is to have a location that is accessible on all data and
> master nodes.  Unfortunately, I don't have a location that is accessible
> from both DCs (cross-mounts across DCs are not supported).
>
> Is there a way to backup indexes in a given DC/zone, especially given that
> all the primary and replica shards are available in a DC anyway?
>
> Thanks in advance!
>



Re: Kibana with Hadoop directly?

2015-03-12 Thread Costin Leau
On the bright side, you can use the es-hadoop connector [1] to easily get data
from Hadoop/HDFS to Elasticsearch and back, whatever your Hadoop stack
(Map/Reduce, Cascading, Pig, Hive, Spark, Storm).

[1] https://www.elastic.co/products/hadoop
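
For example, a minimal Hive sketch (hypothetical table and index names) that
exposes an ES index as an external table you can query or load from Hive:

CREATE EXTERNAL TABLE logs (ts STRING, message STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource'='logs/entry', 'es.nodes'='localhost:9200');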

On Fri, Mar 13, 2015 at 3:15 AM,  wrote:

> Kibana is very tightly integrated with ElasticSearch, to the point of
> requiring specific versions of ElasticSearch for a given version of Kibana.
>
> When you say Hadoop that really means nothing.   Most of the Hadoop
> EcoSystem is not realtime.  There are some exceptions like HBase, but their
> characteristics are very different from ElasticSearch.  This is why it is
> so common to use ElasticSearch with a Hadoop environment, to provide a
> realtime aspect to your big data.  This is why as you say ElasticSearch is
> in the middle.  There is no reasonable way Kibana could extract random
> bits of data from "Hadoop".
>
> That said there are many people who have built data visualizations using
> Hadoop and D3, the js lib that is used in Kibana.  If you really want to go
> without ElasticSearch (you don't), I would recommend looking for something
> like that.
>
> Aaron
>
> On Thursday, March 12, 2015 at 2:41:42 AM UTC-6, KRRK2015 wrote:
>>
>> Hello, has anyone tried to get Kibana to work directly with Hadoop (without
>> elasticsearch in the middle)? If yes, how? Any references would help.
>> Thanks.
>>



"now throttling indexing"

2015-03-12 Thread Eric Jain
I set `indices.store.throttle.type: none` in the elasticsearch.yml, and yet 
this shows up in the logs:

  now throttling indexing: numMergesInFlight=5, maxNumMerges=4
  stop throttling indexing: numMergesInFlight=3, maxNumMerges=4

Did I misunderstand the purpose of this setting?



Re: Kibana with Hadoop directly?

2015-03-12 Thread aaron
Kibana is very tightly integrated with ElasticSearch, to the point of 
requiring specific versions of ElasticSearch for a given version of Kibana.

When you say Hadoop that really means nothing.   Most of the Hadoop 
EcoSystem is not realtime.  There are some exceptions like HBase, but their 
characteristics are very different from ElasticSearch.  This is why it is 
so common to use ElasticSearch with a Hadoop environment, to provide a 
realtime aspect to your big data.  This is why as you say ElasticSearch is 
in the middle.  There is no reasonable way Kibana could extract random 
bits of data from "Hadoop".

That said there are many people who have built data visualizations using 
Hadoop and D3, the js lib that is used in Kibana.  If you really want to go 
without ElasticSearch (you don't), I would recommend looking for something 
like that.

Aaron

On Thursday, March 12, 2015 at 2:41:42 AM UTC-6, KRRK2015 wrote:
>
> Hello, has anyone tried to get Kibana to work directly with Hadoop (without 
> elasticsearch in the middle)? If yes, how? Any references would help. 
> Thanks.
>



Re: Oops! SearchPhaseExecutionException[Failed to execute phase [query], all shards failed]

2015-03-12 Thread aaron
You should be able to set the number of replicas for all previous indexes 
to 0.  You cannot reduce the shard count once an index is created, or 
increase for that matter.  You could reindex your shards.

http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html


curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'
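
The same settings call should also work across all existing indexes at once, 
or against a wildcard such as 'logstash-*', e.g.:

curl -XPUT 'localhost:9200/_settings' -d '{"index": {"number_of_replicas": 0}}'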

On Thursday, March 12, 2015 at 11:35:12 AM UTC-6, Taylor Wood wrote:
>
> I didn't get any help on this but as an FYI for those that may have this 
> issue and are just starting:
>
> Digging deeper it appears our system was created with 5 shards and 1 
> replica.   Granted we are only using 1 node, so every day elasticsearch 
> would create an index of 10 shards, 5 for the primary node and 5 for the 
> secondary node (which doesn't exist on our system but would for 
> redundancy).  We made it so all future indices are created with 0 
> replicas.  I can't find a way to clean up all the unallocated shards 
> from previous indices without deleting the data.
>
> If the number of active shards is almost equal to the number of unassigned 
> shards, you are using replication and need to have a second node running.
>



Re: ElasticSearch across multiple data center architecture design options

2015-03-12 Thread aaron
Perhaps you are misunderstanding me.  ElasticSearch does not provide a load 
balancer for this purpose.  You would use a typical HTTP load balancer, 
which could be anything from something as simple as Nginx to something 
costly like a NetScaler.  Configuring such a load balancer is, I believe, 
outside the scope of this list.

On Thursday, March 12, 2015 at 11:16:12 AM UTC-6, Abigail wrote:
>
> Yes, that is what I meant. Is there any reference for setting up the load 
> balancer for Kibana 4?  Or is it easier for Kibana 3?
>
>
>
> On Thu, Mar 12, 2015 at 12:26 PM, aa...@definemg.com wrote:
>
>> Why not load balance multiple tribe nodes, if you need multiple.
>>
>>
>> On Wednesday, March 11, 2015 at 9:41:39 AM UTC-6, Abigail wrote:
>>>
>>> Hi Mark,
>>>
>>> Thank you for your reply. Is there any existing approach for kibana to 
>>> communicate with multiple tribe nodes? Or is it something we should 
>>> implement by ourselves by customizing kibana?
>>>
>>> Thank you!
>>> Abigail
>>>
>>> On Tuesday, March 10, 2015 at 8:56:25 PM UTC-4, Mark Walkom wrote:

 1 - It's pretty simple and has been used before.
 2 - it can be yes. You can have multiple tribe nodes though.
 3 - This may be possible but you'd have to hack a fair bit of code, so 
 it's not really practical.

 On 10 March 2015 at 13:00, Alex  wrote:

> Hi all,
>
> We are planning to use ELK for our log analysis. We have multiple data 
> centers. Since it is not recommended to have a cross-data-center cluster, 
> we are going to have one ES cluster per data center. Here are the three 
> design options we have:
>
> 1. Use snapshot & restore to replicate data across clusters.
> 2. Use tribe node to achieve across cluster queries
> 3. Ship and index logs to each cluster
>
> Here are our questions, and any comments will be appreciated:
> 1. How complex is snapshot & restore? Does anyone have experience using it 
> for this purpose?
> 2. Would the performance of only one tribe node be a concern or 
> bottleneck? Is it possible to have multiple tribe nodes for scale-up or 
> load balancing?
> 3. Is it possible to customize Kibana so that it can go to a different 
> cluster to query data depending on the query?
>
> Thank you!
> Abigail
>




Re: Elasticsearch with large amount of data

2015-03-12 Thread aaron
First, going to assume you mean 8 GB of memory, or I am very impressed that 
ElasticSearch runs at all.

Second, when are you running out of memory?
Do you run out of memory while indexing?
  Is it a specific document when indexing?
Do you run out of memory when searching?
  Is it a specific search when searching?
  What type of search, sort, filter?
How many documents do you index each day?
  What is the largest document?
  What is the average document?
Are you indexing in batches?
  How big are your batches?
Of your 8 GB, how much is allocated to ElasticSearch?
  How much is allocated to file system cache?
(I usually start with 2 GB to the OS, and split the remaining RAM 
between ElasticSearch and file system cache.  This means allocating 3 GB to 
ElasticSearch.)

By a rough swag based on the very little info you have provided, I would 
say that your cluster does not have enough RAM for the level of data you 
are trying to load into it.  In general I have found that Lucene indexes 
like to be in memory.  When they cannot be, performance is poor and 
operations can fail.  By indexing 100 GB of data a day, you are asking 
ElasticSearch to store some pretty large segments with 8 GB of memory 
(effectively 3 GB of ES).

From this page:
http://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB 
machines are also common. Less than 8 GB tends to be counterproductive (you 
end up needing many, many small machines), and greater than 64 GB has 
problems that we will discuss in Heap: Sizing and Swapping.

Also review:
http://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
The default installation of Elasticsearch is configured with a 1 GB heap. 
For just about every deployment, this number is far too small. If you are 
using the default heap values, your cluster is probably configured 
incorrectly.

I ran into similar problems with machines that had only 8 GB of memory when 
indexing.  My data volume was lower than what you have indicated. 
 Upgrading to larger instances with 16 GB resolved the issue and I have not 
had a problem since.  Of course I had tuned everything previously according 
to what I outlined above.  The 16 GB box means that instead of 3 GB for ES 
you have (16 GB - 2 GB) / 2 = 7 GB, more than double.  In consulting 
engagements I always recommend 16 GB as a bare minimum, but 32 GB as a 
realistic minimum.
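
As a concrete sketch, on a 16 GB box that would mean starting ES 1.x with 
something like:

export ES_HEAP_SIZE=7g
bin/elasticsearch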

This page also has some good info on it:

https://www.found.no/foundation/sizing-elasticsearch/

Aaron


On Thursday, March 12, 2015 at 6:12:11 PM UTC-6, Jeferson Martins wrote:
>
> Hi,
>
> I have 5 nodes of ElasticSearch with 4 CPUs, 8 Mbs of RAM.
>
> My index today has 1TB of data and grows by about 100GBs a day, and 
> I configured 3 primary shards and 1 replica, but my elasticsearch gets 
> OutOfMemory every two days.
>
> Is there some configuration to resolve this problem?
>



snapshot and zone

2015-03-12 Thread Foobar Geez
Hello,
We use one ES cluster with 4 nodes spread across 2 data centers (2 
nodes/DC).  Each DC is configured as a zone 
(via cluster.routing.allocation.awareness.attributes).

I would like to use snapshot to back up indexes using type:fs.  Per 
http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html, 
the requirement is to have a location that is accessible on all data and 
master nodes.  Unfortunately, I don't have a location that is accessible 
from both DCs (cross-mounts across DCs are not supported).

Is there a way to backup indexes in a given DC/zone, especially given that 
all the primary and replica shards are available in a DC anyway?

Thanks in advance!



Elasticsearch with large amount of data

2015-03-12 Thread Jeferson Martins
Hi,

I have 5 nodes of ElasticSearch with 4 CPUs, 8 Mbs of RAM.

My index today has 1TB of data and grows by about 100GBs a day, and I 
configured 3 primary shards and 1 replica, but my elasticsearch gets 
OutOfMemory every two days.

Is there some configuration to resolve this problem?



elasticsearch-hadoop-hive exception when writing array<map<string,string>> column

2015-03-12 Thread Chen Wang
Folks,
I am using elasticsearch-hadoop-hive-2.1.0.Beta3.jar

I defined the external table as:
CREATE EXTERNAL TABLE IF NOT EXISTS ${staging_table}(
customer_id STRING,
 store_purchase array<map<string,string>>)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.nodes'='localhost:9200',
'es.resource'='user_activity/store',
'es.mapping.id'='customer_id',
'es.input.json'='false',
'es.write.operation'='upsert',
'es.update.script'='ctx._source.store_purchase += purchase',
'es.update.script.params'='purchase:store_purchase'
);

I create another source table with the same column names and put some 
sample data.

Running INSERT OVERWRITE TABLE ${staging_table}

SELECT customer_id, store_purchase FROM ${test_table} 

but it throws EsHadoopIllegalArgumentException: Field [_col1] needs to be a 
primitive; found [array<map<string,string>>]. Is array<map<string,string>> 
supported yet? If not, how can I get around this issue?

Thanks~



Re: Aggregations across multiple indices

2015-03-12 Thread Karl Putland
you might look at
http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
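
A minimal sketch against the alias (myalias and dedup_key are hypothetical 
names; dedup_key would be whatever field uniquely identifies a document 
across both indices):

curl -XGET 'localhost:9200/myalias/_search?search_type=count' -d '{
  "aggs": {
    "by_country": {
      "terms": { "field": "country" },
      "aggs": {
        "distinct_docs": { "cardinality": { "field": "dedup_key" } }
      }
    }
  }
}'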

--K

Karl Putland
Senior Engineer
*SimpleSignal*
Anywhere: 303-242-8608



On Thu, Mar 12, 2015 at 10:04 AM, Christian Rohling wrote:

> Hello Everyone,
> I am attempting to use aggregations to count the number of documents
> matching a given query across multiple indices. What I would like to do is
> make those counts on distinct keys. Say I had the following document in 2
> different indices, aliased together.
> ```
> {
> _index: myindex
> _type: mytype
> _id: 1
> _version: 1
> _score: 1
> _source: {
> country: MEXICO
> }
> }```
>
> When I make an aggs term query on the field "country" I would like it to
> only return a single count for the document with id=1 (which exists in both
> indices). The actual use case is a bit more complicated than what's
> described above, this is just an example of the functionality that I am
> looking for. I cannot find any info in the docs, and have asked in the IRC
> channel to no avail.
>
> -Christian Rohling
>



Re: Analyzers and JSON

2015-03-12 Thread Aaron Mefford
Take a look at Apache Tika http://tika.apache.org/.  It will allow you to
extract the contents of the documents for indexing; this is outside of the
scope of the ElasticSearch indexing.  A good tool to make these files
downloadable is also out of scope, but I'll answer what is in scope.
You need to put the files somewhere that they can be accessed by a URL.
Any webserver is capable of this; of course your needs may vary, but this
isn't the list for those questions.  Once you have a URL that the document
can be accessed by, include that in your indexing of the document so that
you can point to that URL in your search results.

I am sure there are other options out there for extracting the contents of
word documents, Apache Tika is one that is frequently used for this purpose
though.
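
For example, the standalone tika-app jar can dump plain text from an Office 
document on the command line (the version number here is just an example):

java -jar tika-app-1.7.jar --text report.docx > report.txt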

On Thu, Mar 12, 2015 at 2:56 PM, Austin Harmon wrote:

> Okay, so I have a large amount of data, 2 TB, and it's all Microsoft Office
> documents, PDFs, and emails. What is the best way to go about indexing
> the body of these documents, making the contents of the documents
> searchable? I tried to use the php client but that isn't helping, and I know
> there are ways to convert files in php, but is there nothing available that
> takes in these types of documents? I tried the file_get_contents function
> in php but it only takes in text documents. Also, would you know of a good
> tool or a method to make the files that are searched downloadable?
>
> Thanks,
> Austin
>
>
> On Thursday, March 12, 2015 at 12:26:13 PM UTC-5, aa...@definemg.com
> wrote:
>>
>> Yes you need to include all the text you want indexed and searchable as
>> part of the JSON.
>>
>> How else would you expect ElasticSearch to receive the data?
>>
>> Regarding large scale production environments, this is why ElasticSearch
>> scales out.
>>
>> Aaron
>>
>> On Wednesday, March 11, 2015 at 12:50:25 PM UTC-6, Austin Harmon wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to get an understanding of how to have full text search on
>>> the document and have the body of the document be considered during search.
>>> I understand how to do the mapping and use analyzers, but what I don't
>>> understand is how they get the body of the document. If your fields are
>>> file name, file size, file path, file type, how do the analyzers get the
>>> body of the document? Surely you wouldn't have to put the body of every
>>> document into the JSON; that is how I've seen it done in all the examples
>>> I've seen, but that doesn't make sense for large scale production
>>> environments. If someone could please give me some insight as to how this
>>> process works it would be greatly appreciated.
>>>
>>> Thank you,
>>> Austin Harmon
>>>



Re: OOME When large segments merge

2015-03-12 Thread Michael McCandless
Do you have many fields with norms enabled?
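
For reference, a sketch of what disabling norms on an analyzed string field 
looks like in a 1.x mapping (hypothetical field name; norms can be disabled 
on an existing field but not re-enabled):

"message": {
  "type": "string",
  "norms": { "enabled": false }
}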

Mike McCandless



On Thu, Mar 12, 2015 at 1:20 PM, Mark Greene  wrote:

> I've noticed periodically that data nodes in my cluster will run out of
> heap space when large segments start merging. I attached a screenshot of
> what marvel looked like leading up to the OOME on one of my data nodes.
>
> My question is, generally speaking, what knobs should I be turning to
> throttle resource consumption when large segment files are being merged?
>
> I'm running ES version 1.4.1.
>
> Thanks,
> Mark
>



Re: Analyzers and JSON

2015-03-12 Thread Austin Harmon
Okay, so I have a large amount of data, 2 TB, and it's all Microsoft Office 
documents, PDFs, and emails. What is the best way to go about indexing 
the body of these documents, making the contents of the documents 
searchable? I tried to use the php client but that isn't helping, and I know 
there are ways to convert files in php, but is there nothing available that 
takes in these types of documents? I tried the file_get_contents function 
in php but it only takes in text documents. Also, would you know of a good 
tool or a method to make the files that are searched downloadable?

Thanks,
Austin

On Thursday, March 12, 2015 at 12:26:13 PM UTC-5, aa...@definemg.com wrote:
>
> Yes you need to include all the text you want indexed and searchable as 
> part of the JSON.
>
> How else would you expect ElasticSearch to receive the data?
>
> Regarding large scale production environments, this is why ElasticSearch 
> scales out.
>
> Aaron
>
> On Wednesday, March 11, 2015 at 12:50:25 PM UTC-6, Austin Harmon wrote:
>>
>> Hello,
>>
>> I'm trying to get an understanding of how to have full text search on 
>> the document and have the body of the document be considered during search. 
>> I understand how to do the mapping and use analyzers, but what I don't 
>> understand is how they get the body of the document. If your fields are 
>> file name, file size, file path, file type, how do the analyzers get the 
>> body of the document? Surely you wouldn't have to put the body of every 
>> document into the JSON; that is how I've seen it done in all the examples 
>> I've seen, but that doesn't make sense for large scale production 
>> environments. If someone could please give me some insight as to how this 
>> process works it would be greatly appreciated.
>>
>> Thank you,
>> Austin Harmon
>>
>



Re: Dealing with spam in this forum

2015-03-12 Thread Gavin Seng

Hi,

What is the current policy on this?

I just tried creating 2 new posts ... they showed up for a while ... and 
then disappeared. I thought that it could be because I included inline 
pictures ... so I tried reposting and got the same result.

Not sure if they're in a "to be moderated" bucket ... or Google's spam 
filter just deleted them.

The posts had these subjects:
* What is a reasonable number of evictions for filter_cache and fielddata?
* Are long queues in management threadpool a problem?

Thanks,
Gavin

On Tuesday, December 2, 2014 at 4:56:54 AM UTC-5, nodexy wrote:
>
> The first post should be approved.
>
>
> On Wednesday, July 2, 2014 2:36:44 AM UTC+8, Clinton Gormley wrote:
>>
>> Hi all
>>
>> Recently we've had a few spam emails that have made it through Google's 
>> filters, and there have been calls for us to change to a 
>> moderate-first-post policy. I am reluctant to adopt this policy for the 
>> following reasons:
>>
>> We get about 30 new users every day from all over the world, many of whom 
>> are early in their learning phase and are quite stuck - they need help as 
>> soon as possible. Fortunately this list is very active and helpful. In 
>> contrast, we've only ever banned 34 users from the list for spamming.  So 
>> making new users wait for timezones to swing their way feels like a 
>> heavy-handed solution to a small problem. Yes, spammers are annoying but they are 
>> a small minority on this list.
>>
>> Instead, we have asked 10 of our long standing members to help us with 
>> banning spammers.  This way we have Spam Guardians active around the globe, 
>> who only need to do something if a spammer raises their ugly head above the 
>> parapet. One or two spam emails may get through, but hopefully somebody 
>> will leap into action and stop their activity before it becomes too 
>> tiresome.
>>
>> This isn't an exclusive list. If you would like to be on it, feel free to 
>> email me.  Note: I expect you to be a long standing and currently active 
>> member of this list to be included. 
>>
>> If this solution doesn't solve the problem, then we can reconsider 
>> moderate-first-post, but we've managed to go 5 years without requiring it, 
>> and I'd prefer to keep things as easy as possible for new users.
>>
>> Clint
>>
>



Re: Should clause behaves like a must clause in filtered query

2015-03-12 Thread Les Barstow
"should" has a "minimum_should_match" of 1 when there is no "must" or
"must_not". With only a single "should", that makes it act like "must".

On Thu, Mar 12, 2015 at 9:08 AM, parq  wrote:

> However, the following query returns the expected document,
>
> curl -XGET "http://localhost:9200/test-cbx/bug/_search" -d'
> {
>"query": {
>   "filtered": {
>  "query": {
> "bool": {
> "must": [
>{
>"match": {
>   "type": {
>   "query": "some type"
>
>   }
>}
>}
> ],
> "should": [
>{
>"match": {
>   "country": {
>   "query": "de"
>
>   }
>}
>}
> ]
> }
>  },
>  "filter": {
>   "term": {
>  "type": "some type"
>   }
>  }
>   }
>}
> }'
>
> Maybe the "should" clause does not work without a "must" clause in
> the query?
>
>
> On Thursday, March 12, 2015 at 2:54:00 PM UTC+1, parq wrote:
>>
>> Hello all,
>>
>> We have a single document in an index:
>>
>> $ curl -XGET "http://localhost:9200/test-cbx/bug/_search?q=*" gives us
>> the following response
>> {"took":2,"timed_out":false,"_shards":{"total":5,"
>> successful":5,"failed":0},"hits":{"total":1,"max_score":
>> 1.0,"hits":[{"_index":"test-cbx","_type":"bug","_id":"1","
>> _score":1.0,"_source":
>> {
>> "country": "lu",
>> "type": “some type"
>> }}]}}
>>
>> And the following two queries give no results, even though it’s a should
>> clause:
>>
>> $ curl -XGET "http://localhost:9200/test-cbx/bug/_search" -d'
>> {
>>"query": {
>>   "filtered": {
>>  "query": {
>> "match_all": {}
>>  },
>>  "filter": {
>> "bool": {
>>"should": {
>>   "term": {
>>  "country": "de"
>>   }
>>}
>> }
>>  }
>>   }
>>}
>> }'
>>
>> $ curl -XGET "http://localhost:9200/test-cbx/bug/_search" -d'
>> {
>>"query": {
>>   "filtered": {
>>  "query": {
>> "bool": {
>> "should": [
>>{
>>"match": {
>>   "country": {
>>   "query": "de"
>>
>>   }
>>}
>>}
>> ]
>> }
>>  },
>>  "filter": {
>>   "term": {
>>  "type": “some type"
>>   }
>>  }
>>   }
>>}
>> }'
>>
>> What is the preferred way to approach the bool query? Filter or the query?
>>
>>
>> Regards,
>>



OOME When large segments merge

2015-03-12 Thread Mark Greene
I've noticed periodically that data nodes in my cluster will run out of 
heap space when large segments start merging. I attached a screenshot of 
what marvel looked like leading up to the OOME on one of my data nodes. 

My question is, generally speaking, what knobs should I be turning to 
throttle resource consumption when large segment files are being merged?

I'm running ES version 1.4.1. 

Thanks,
Mark

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/35f2c096-ee1e-4a30-a11c-0177e84356ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Hive to elasticsearch Parsing exception.

2015-03-12 Thread P lva
Hello Everyone,

I'm loading data from a a hive table (0.13) in to elasticsearch (1.4.4).
With the auto create index option turned on , I don't face any problems and
I can see all the data in ES.

However, I get the following error when i create the index manually.

Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found
unrecoverable error [Bad Request(400) - [MapperParsingException[failed to
parse]; nested: NumberFormatException[For input string: "NULL"]; ]];
Bailing out..
at
org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
at
org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
at
org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
at
org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:152)
at
org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:146)
at
org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:621)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at
org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at
org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:262)


To create the index manually, I've used the same mappings from the first
auto create step and changed one field to geo point type.
Changing the field type is the only change I made.


The column that I wanted to be geo fields had a few nulls, so i selected
rows without nulls and still have the same error.

Is there any way to identify which column is causing the issue ? There's
about 70 columns in my table.

Tl;dr
Hive table to elasticsearch
Auto create index works fine
Fails when I manually created index with almost same mapping (except one
field changed from string to geopoint)


Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO9TxdO22hy2%3Dcz1S_DJgvtd0rsw%2Bu0WL8SqLFR8GTbbGJr9EQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Issue with bettermap / kibana

2015-03-12 Thread squeaky jetpack
On Wednesday, 12 March 2014 13:31:12 UTC-7, Clinton Gormley wrote:

> This issue is fixed in master.  cloudmade turned off public access, so we 
> have switched to the mapquest servers.
>
 
I'm still having this exact same issue:

https://ssl_tiles.cloudmade.com/57cbb6ca8cac418dbb1a402586df4528/22677/256/2/2/1.png

*Result: Wrong apikey*


Has this change gone into releases or is it still only in master?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/19b4eb48-61f0-44df-8360-6689bebc9f34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch manifest.xml possible configuration issue.

2015-03-12 Thread dj.hutch
Hi All,

I've been working with Elasticsearch on a Joyent SmartOS instance and 
discovered a possible issue with the java_opts value in the 
elasticsearch.xml file used to create the service.

The line currently reads:



This property ends up being passed to the startup command with the space 
between them escaped, which causes the JVM to treat them as all one 
argument. This can be seen in the output of the jinfo 
-sysprops .

The first argument in the java_opts property is "-Djava.awt.headless=true", 
but the java.awt.headless property contains true plus all the remaining 
arguments:
java.awt.headless = true -Xss256k -XX:+DisableExplicitGC 
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:CMSInitiatingOccupancyFraction=75.

If this property line is removed, and the value from this line is used to 
replace *%{java_opts}* in the start exec_method, the results are as 
expected from the jinfo command.

Note: After the elasticsearch.xml manifest edit, use "svcadm restart 
manifest-import" for changes to take affect... or whatever method you 
prefer.

Just wondering if there is a better solution? I'd also think that this a 
bug since all the java options after ...headless = true are ignored (i.e. 
no Heap dump etc.).

Thanks,
Dean

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d12b9f63-a1b4-4b1b-a446-7a1278cd428a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Snapshot Scaling Problems

2015-03-12 Thread aaron
With the low volume of ingest, and the long duration of history, Id suggest 
you may want to trim back the number of shards per index from the default 
5.  Based on your 100 docs per day Id say 1 shard per day.  If you combined 
this with the other suggestion to increase the duration of an index, then 
you might increase the number of shards, but maybe still not.  Running an 
optimize once you have completed a time period is great advice if you can 
afford the overhead, sounds like one day at a time you should be able to, 
and that the overhead of not optimizing is costing you more when you 
snapshot.

And index is made of shards, a shard is made of lucene segments.  Lucene 
segments are the actual files that you copy when you snapshot.  As such the 
number of segments is multiplied by the number of shards per index and the 
number of indexes.  Reducing the number of indexes by creating larger time 
periods will significantly reduce the number of segments.  Reducing the 
number of shards per index will significantly reduce the number of 
segments.  Optimizing the index will also consolidate many segments into a 
single segment.

Based on the use of S3 should we assume you are using AWS EC2?  What 
instance size?  Your data volume seems very low so it seems concerning that 
you have such a large time period to snapshot, and points to a slow file 
system, or a significant number of segments (100 indexes, 5 shards per 
index, xx segments per shard, == many thousands of segments).  What does 
your storage system look like?  If you are using EC2 are you using the 
newer EBS volumes (SSD backed)? Some of the smaller instance size 
significantly limit prolonged EBS throughput, in my experience. 

On Wednesday, March 11, 2015 at 1:12:01 AM UTC-6, Magnus Bäck wrote:
>
> On Monday, March 09, 2015 at 20:29 CET, 
>  Andy Nemzek > wrote: 
>
> > We've been using logstash for several months now and it creates a new 
> > index each day, so I imagine there are over 100 indexes at this point. 
>
> Why create daily indexes if you only have a few hundred entries in each? 
> There's a constant overhead for each shard so you don't want more 
> indexes than you need. Seems like you'd be fine with montly indexes, 
> and then your snapshot problems would disappear too. 
>
> > Elasticsearch is running on a single machine...I haven't done anything 
> > with shards, so the defaults must be in use.  Haven't optimized old 
> > indexes.  We're pretty much just running ELK out of the box.  When you 
> > mention 'optimizing indexes', does this process combine indexes? 
>
> No, but it can combine segments in a Lucene index (that make up 
> Elasticsearch indexes), and segments are what's being backed up. 
> So the more segments you have the the longer time snapshots are 
> going to take. 
>
> > Do you know if these performance problems are typical when 
> > using ELK out of the box? 
>
> 100 indexes on a single box should be okay but it depends on 
> the size of the JVM heap. 
>
> -- 
> Magnus Bäck| Software Engineer, Development Tools 
> magnu...@sonymobile.com  | Sony Mobile Communications 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7be4c805-b4f1-424d-b67b-2ad70e5da659%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: best practice for rebuilding an index using aliases

2015-03-12 Thread aaron
I tried to reply earlier but seems Google lost that reply.

My suggestion would be to create a v1_new index that has the same mappings 
as v1.  When you are ready to migrate to v2, change indexing to go to 
v1_new, change searches to cover v1 and v1_new (alias or query string), 
copy v1 to v2, change indexing to go to v2, and searches to go to v2, copy 
v1_new to v2.  This will allow you to index while copying while being able 
to easily identify the new documents.  

If you are ok with only searching new documents for a while then you can 
start indexing to v2, change search to v2, and start the copy.

If you are ok with only searching old documents for the duration of the 
transfer start indexing to v2, do the copy, then change search to v2.

The last option is to leave indexing and search on v1, do the copy to v2, 
switch indexing and search to v2, do another copy from v1, and finally 
optimize.  This has alot of potential problems.  It will essentially create 
a deleted version of all your documents, so the optimize is needed to 
correct that.  Also if your indexing is adding updates, and not just new 
documents, then the second copy from v1 might overwrite some of those 
updates, not good.  If it were me and I was not ok with the 2nd or 3rd 
option I would defintely go route 1.

On Wednesday, March 11, 2015 at 10:47:59 AM UTC-6, mzrth_7810 wrote:
>
> Hey everyone,
>
> I have a question about rebuilding an index. After reading the 
> elasticsearch guide and various topics here I've found that the best 
> practice for rebuilding an index without any downtime is by using aliases. 
> However, there are certain steps and processes around that, which I seek 
> advice for. First I'm going to take you through an example scenario, and 
> then I'll have some questions.
>
> For example, you have "workshop_index_v1", with an alias "workshop". The 
> "workshop_index_v1" has a type called "guitar" which has three properties 
> with the following mapping:
>
> "identifier" : "string"
> "make" : "string"
> "model" : "string"
>
> Lets assume there is a lot of data in workshop_index_v1/guitar at the 
> moment, which has been populated from a separate database.
>
> Now, I need to modify the mapping, because I've changed the source data, I 
> would like get rid of the "identifier" property, so my mapping becomes:
>
> "make" : "string"
> "model" : "string"
>
> As we all know elasticsearch does not allow you to remove a property in 
> the mapping directly, you inevitably have to rebuild the index, which is 
> fine in my case.
>
> So now a few things came to mind when I thought how to do this:
>
>- Create another index "workshop_index_v2", populate it with the data 
>in "workshop_index_v1" using scroll and scan with the bulk API and later 
>remove "workshop_index_v1" and add "workshop_index_v2" to the alias.
>- This will not work because the incorrect mapping(or a field value in 
>   the incorrect mapping) is already present in  "workshop_index_v1", I do 
> not 
>   want to copy everything as is.
>- Create another index "workshop_index_v2", populate it with the data 
>from the original source
>   - This works
>
> One of the big issues here is, what happens to write requests while the 
> new index is being rebuilt.
>
> As you can only write to one index, which one do you write to, the old one 
> or the new one, or both?
>
> I feel, that writing to the new one, would work. I am beginner when it 
> comes to elasticsearch, any advice regarding any of this would be greatly 
> appreciated.
>
> Best regards
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3b2d4361-1145-4f77-921a-c7be38e5bfa5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Oops! SearchPhaseExecutionException[Failed to execute phase [query], all shards failed]

2015-03-12 Thread Taylor Wood
I didn't get any help on this but as an FYI for those that may have this 
issue and are just starting:

Digging deeper it appears our system was created with 5 shards and 1 
replica.   Granted we are only using 1 node so every day elasticsearch 
would create an indice of 10 shards, 5 for the primary node and 5 for the 
secondary node (which doesn't exist on our system but would for 
redundancy).  We made it so all future indices created have 0 replicas in 
the future.I can't find a way to clean up all the unallocated shards 
from previous indices without deleting the data.

If the active shards is almost = to unassigned shards you are using 
replication and need to have a second node running.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a0c368f4-2550-459b-b61e-6f781477eaee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Analyzers and JSON

2015-03-12 Thread aaron
Yes you need to include all the text you want indexed and searchable as 
part of the JSON.

How else would you expect ElasticSearch to receive the data?

Regarding large scale production environments, this is why ElasticSearch 
scales out.

Aaron

On Wednesday, March 11, 2015 at 12:50:25 PM UTC-6, Austin Harmon wrote:
>
> Hello,
>
> I'm trying to get an understand of the how to have full text search on the 
> document and have the body of the document be considered during search. I 
> understand how to do the mapping and use analyzers but what I don't 
> understand is how they get the body of the document. If your fields are 
> file name, file size, file path, file type how do the analyzers get the 
> body of the document. Surely you wouldn't have to put the body of every 
> document into the JSON, that is how I've seen it done in all the examples 
> I've seen but that doesn't make sense for large scale production 
> environments. If someone could please give me some  insight as to how this 
> process works it would be greatly appreciated.
>
> Thank you,
> Austin Harmon
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/147dea4b-54cb-43fe-b1df-6e2425c7ab99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ElasticSearch across multiple data center architecture design options

2015-03-12 Thread naye923
Yes, that is what I meant. Is there any reference for set up the load
balance for Kibana 4?  Or if it is easier for Kibana 3?



On Thu, Mar 12, 2015 at 12:26 PM,  wrote:

> Why not load balance multiple tribe nodes, if you need multiple.
>
>
> On Wednesday, March 11, 2015 at 9:41:39 AM UTC-6, Abigail wrote:
>>
>> Hi Mark,
>>
>> Thank you for your reply. Is there any existing approach for kibana to
>> communicate with multiple tribe nodes? Or is it something we should
>> implement by ourselves by customizing kibana?
>>
>> Thank you!
>> Abigail
>>
>> On Tuesday, March 10, 2015 at 8:56:25 PM UTC-4, Mark Walkom wrote:
>>>
>>> 1 - It's pretty simple and has been used before.
>>> 2 - it can be yes. You can have multiple tribe nodes though.
>>> 3 - This may be possible but you'd have to hack a fair bit of code, so
>>> it's not really practical.
>>>
>>> On 10 March 2015 at 13:00, Alex  wrote:
>>>
 Hi all,

 We are planning to use ELK for our log analysis. We have multiple data
 centers. Since it is not recommended to have across data center cluster, we
 are going to have one ES cluster per data center,  here are the three
 design options we have:

 1. Use snapshot & restore to replicate data across clusters.
 2. Use tribe node to achieve across cluster queries
 3. Ship and index logs to each cluster

 Here are our questions, and any comments will be appreciated:
 1. How complex is snapshot & restore, anyone has experience on this
 purpose?
 2. Would the performance of only one tribe node be a concern or
 bottleneck, is it possible to have multiple tribe nodes for scale up or
 load balancing?
 3. Is it possible to customize Kibana so that it can go to different
 cluster to query data depends on the query?

 Thank you!
 Abigail

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/2d46f80b-8579-4f2b-86c0-5ad654a5bba3%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/NPSIdmm9NX0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b69c667f-b1c6-46ce-8122-e809a22110c0%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJNTK58QCAoD%3D1Tx-9anADBRPzp41bWZhHFvn7hx%3DHwSme%3DLqg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: ElasticSearch across multiple data center architecture design options

2015-03-12 Thread naye923
Yes, that is what I meant. Is there any reference for set up the load
balance for Kibana 4?  Or if it is easier for Kibana 3?



On Thu, Mar 12, 2015 at 12:26 PM,  wrote:

> Why not load balance multiple tribe nodes, if you need multiple.
>
>
> On Wednesday, March 11, 2015 at 9:41:39 AM UTC-6, Abigail wrote:
>>
>> Hi Mark,
>>
>> Thank you for your reply. Is there any existing approach for kibana to
>> communicate with multiple tribe nodes? Or is it something we should
>> implement by ourselves by customizing kibana?
>>
>> Thank you!
>> Abigail
>>
>> On Tuesday, March 10, 2015 at 8:56:25 PM UTC-4, Mark Walkom wrote:
>>>
>>> 1 - It's pretty simple and has been used before.
>>> 2 - it can be yes. You can have multiple tribe nodes though.
>>> 3 - This may be possible but you'd have to hack a fair bit of code, so
>>> it's not really practical.
>>>
>>> On 10 March 2015 at 13:00, Alex  wrote:
>>>
 Hi all,

 We are planning to use ELK for our log analysis. We have multiple data
 centers. Since it is not recommended to have across data center cluster, we
 are going to have one ES cluster per data center,  here are the three
 design options we have:

 1. Use snapshot & restore to replicate data across clusters.
 2. Use tribe node to achieve across cluster queries
 3. Ship and index logs to each cluster

 Here are our questions, and any comments will be appreciated:
 1. How complex is snapshot & restore, anyone has experience on this
 purpose?
 2. Would the performance of only one tribe node be a concern or
 bottleneck, is it possible to have multiple tribe nodes for scale up or
 load balancing?
 3. Is it possible to customize Kibana so that it can go to different
 cluster to query data depends on the query?

 Thank you!
 Abigail

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/2d46f80b-8579-4f2b-86c0-5ad654a5bba3%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/NPSIdmm9NX0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b69c667f-b1c6-46ce-8122-e809a22110c0%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJNTK59DepRdN39kbrDwiSAT15rC82Pm18Vmd%2BSXcBT_vhf60w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana4 + Apache server ?

2015-03-12 Thread aaron
The latest versions of Kibana are very different than the older versions. 
 The old version was just a bunch of javascript that needed any old 
webserver to host the files.  The new version is a full blown node.js 
application and as such does not use Apache at all, but requires node.js. 
 It also requires the latest version of ElasticSearch.

On Thursday, March 12, 2015 at 9:26:20 AM UTC-6, Guillaume RICHAUD wrote:
>
> Hi guys, 
>
> I'm trying to install the latest ELK stack including Kibana4.0.1 on a 
> virtual machine with CentOS 7 minimal. 
>
> My aim is to access kibana via an Apache server (httpd) from my computer 
> (because the centOS mini hasn't any gnome installed so it's all in command 
> lines).
>
> I've got an issue with configuring both Kibana and Apache to run, it just 
> doesn't work ! :D
>
> Does one of you have ever tried to install the latest stack and use Kibana 
> through Apache ? 
>
> I am used to install ELK in local, but olders versions and I'm confident 
> that they are well setup for a local application, what I need are the 
> modifications to go Apache :)
>
> Thanks in advance !
> G.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/669fbed4-5520-482c-88f5-728cfac78f8f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ElasticSearch across multiple data center architecture design options

2015-03-12 Thread aaron
Why not load balance multiple tribe nodes, if you need multiple.

On Wednesday, March 11, 2015 at 9:41:39 AM UTC-6, Abigail wrote:
>
> Hi Mark,
>
> Thank you for your reply. Is there any existing approach for kibana to 
> communicate with multiple tribe nodes? Or is it something we should 
> implement by ourselves by customizing kibana?
>
> Thank you!
> Abigail
>
> On Tuesday, March 10, 2015 at 8:56:25 PM UTC-4, Mark Walkom wrote:
>>
>> 1 - It's pretty simple and has been used before.
>> 2 - it can be yes. You can have multiple tribe nodes though.
>> 3 - This may be possible but you'd have to hack a fair bit of code, so 
>> it's not really practical.
>>
>> On 10 March 2015 at 13:00, Alex  wrote:
>>
>>> Hi all,
>>>
>>> We are planning to use ELK for our log analysis. We have multiple data 
>>> centers. Since it is not recommended to have across data center cluster, we 
>>> are going to have one ES cluster per data center,  here are the three 
>>> design options we have:
>>>
>>> 1. Use snapshot & restore to replicate data across clusters.
>>> 2. Use tribe node to achieve across cluster queries
>>> 3. Ship and index logs to each cluster
>>>
>>> Here are our questions, and any comments will be appreciated:
>>> 1. How complex is snapshot & restore, anyone has experience on this 
>>> purpose?
>>> 2. Would the performance of only one tribe node be a concern or 
>>> bottleneck, is it possible to have multiple tribe nodes for scale up or 
>>> load balancing?
>>> 3. Is it possible to customize Kibana so that it can go to different 
>>> cluster to query data depends on the query?
>>>
>>> Thank you!
>>> Abigail
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/2d46f80b-8579-4f2b-86c0-5ad654a5bba3%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b69c667f-b1c6-46ce-8122-e809a22110c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Machine Learning / Decision Tree Learning with Elasticsearch

2015-03-12 Thread Michael Sander
Hi All,

Has anyone tried putting advanced decision tree analysis on top of 
Elasticsearch? 

I've seen others users Naive Bayes 

 
on top of Elasticsearch, which is really great, but I want to try to move 
past that with some of the more advanced techniques.  The standard I know 
that Solr has something called decision trees, but I don't think they are 
the same as what is normally referred to as decision trees in the machine 
learning literature (here's the wiki on decision trees 
).

Any guidance on how to implement advanced machine learning techniques on 
Elasticsearch would be helpful.

Thanks,
-Michael

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/64b4c9ca-09a4-4ea7-bd87-9d3cfdd0f2b0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Machine Learning / Decision Tree Learning with Elasticsearch

2015-03-12 Thread vineeth mohan
Hello Michael ,

There is Hadoop integration with Elasticsearch.
With this integration , it can run against each feed in elasticsearch in a
highly optimized way.
This gives you opportunity to couple mahout library with Elasticsearch.

I would advice this approach.


Thanks
   Vineeth Mohan,
 Elasticsearch consultant,
 qbox.io ( Elasticsearch service provider )


On Thu, Mar 12, 2015 at 9:35 PM, Michael Sander 
wrote:

> Hi All,
>
> Has anyone tried putting advanced decision tree analysis on top of
> Elasticsearch?
>
> I've seen others users Naive Bayes
> 
> on top of Elasticsearch, which is really great, but I want to try to move
> past that with some of the more advanced techniques.  The standard I know
> that Solr has something called decision trees, but I don't think they are
> the same as what is normally referred to as decision trees in the machine
> learning literature (here's the wiki on decision trees
> ).
>
> Any guidance on how to implement advanced machine learning techniques on
> Elasticsearch would be helpful.
>
> Thanks,
> -Michael
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/64b4c9ca-09a4-4ea7-bd87-9d3cfdd0f2b0%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mFYLX%3Db1-rVb2J6-b_V7kjRjz%3Doy18G4HgB4HU46hJkg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana connection with ElasticSearch

2015-03-12 Thread aaron
Kibana is tightly coupled with features that are available in 
ElasticSearch.  As those features change versions of Kibana change.  For 
instance the latest version of Kibana requires that you are using 1.4.4. 
 Unless more updates have changed that.

If you are running a version that predates .90.9 you are way out of date 
and need to focus on upgrading your cluster.  There are so many performance 
enhancements, security improvements and features that you are missing out 
on by running a version that is several years old.

On Wednesday, March 11, 2015 at 9:59:57 AM UTC-6, luiz felipe wrote:
>
>
> Upgrade Required Your version of Elasticsearch is too old. Kibana requires 
> Elasticsearch 0.90.9 or above
>  
>  how to solve ??
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d536ae2b-908d-4bf0-b72d-7bbef8355d7c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Aggregations across multiple indices

2015-03-12 Thread Christian Rohling
Hello Everyone,
I am attempting to use aggregations to count the number of documents
matching a given query across multiple indices. What I would like to do, is
make those counts on distinct keys. Say I had following document in 2
different indices, aliased together.
```
{
_index: myindex
_type: mytype
_id: 1
_version: 1
_score: 1
_source: {
country: MEXICO
}
}```

When I make an aggs term query on the field "country" I would like it to
only return a single count for the document with id=1(which exists in both
indices). The actual use case is a bit more complicated than what's
described above, this is just an example of the functionality that I am
looking for. I cannot find any info in the docs, and have asked in the IRC
channel to no avail.

-Christian Rohling

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALsYvrzV-PyUNUHcUHWNCDBQKz5jV9%3DTPoQ2hW1me8q%2BhBgKDg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Kibana4 + Apache server ?

2015-03-12 Thread Guillaume RICHAUD
Hi guys, 

I'm trying to install the latest ELK stack including Kibana4.0.1 on a 
virtual machine with CentOS 7 minimal. 

My aim is to access kibana via an Apache server (httpd) from my computer 
(because the centOS mini hasn't any gnome installed so it's all in command 
lines).

I've got an issue with configuring both Kibana and Apache to run, it just 
doesn't work ! :D

Does one of you have ever tried to install the latest stack and use Kibana 
through Apache ? 

I am used to install ELK in local, but olders versions and I'm confident 
that they are well setup for a local application, what I need are the 
modifications to go Apache :)

Thanks in advance !
G.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0fd49f9d-bf5c-4148-b152-1d1347210c12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: char_filter for German

2015-03-12 Thread joergpra...@gmail.com
Yes, please upgrade Elasticsearch to use the official german normalizer.

I added it to decompound plugin for convenience, it may be removed at any
later time.

Jörg

On Wed, Mar 11, 2015 at 9:54 PM, Krešimir Slugan 
wrote:

> Thanks!
>
> I assume that "german_normalize" is also part of Decompounder Analysis
> Plugin ( https://github.com/jprante/elasticsearch-analysis-decompound )
> since that is the only analysis plugin we have installed?
>
> Btw. "german_normalization" doesn't seems to be available for our ES
> version (1.2), would you recommend upgrading instead of using
>  "german_normalize"?
>
> Best,
>
> Kresimir
>
> On Wednesday, March 11, 2015 at 5:31:40 PM UTC+1, Jörg Prante wrote:
>>
>> Use "german_normalization"
>>
>> "german_normalize" is the same filter I implemented in my plugin
>> https://github.com/jprante/elasticsearch-analysis-german/
>> blob/master/src/main/java/org/xbib/elasticsearch/index/analysis/german/
>> GermanAnalysisBinderProcessor.java when it was not available in ES core.
>>
>> Jörg
>>
>> On Wed, Mar 11, 2015 at 3:11 PM, Krešimir Slugan 
>> wrote:
>>
>>>
>>> Where is this "german_normalize" filter coming from? It solves my
>>> problem completely and magically but it's not documented anywhere (and
>>> seems like it's not part of ICU plugin either).
>>>
>>>
>>>
>>> What is also weird is that filter can not be used in global context,
>>> e.g. it's not possible to try something like this:
>>>
>>> curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=
>>> lowercase,german_normalize' -d 'this is a test'
>>>
>>> but it is possible to use it in index context:
>>>
>>> curl -XGET 'localhost:9200/test_index/_analyze?tokenizer=whitespace&
>>> filters=lowercase,german_normalize' -d 'this is a test'
>>>
>>>
>>> In first case I get "*ElasticsearchIllegalArgumentException[failed to
>>> find global token filter under [german_normalize]]*"
>>>
>>>
>>> On Sunday, November 30, 2014 at 5:20:16 PM UTC+1, Jörg Prante wrote:
>>>
 Do not use regex, this will give wrong results.

 Elasticsearch comes with full support for german umlaut handling.

 If you install ICU plugin, you can use something like this analysis
 setting

 {
 "index" : {
 "analysis" : {
 "filter" : {
 "german_normalize_stem" : {
   "type" : "snowball",
   "name" : "German2"
 }
 },
 "analyzer" : {
 "stemmed" : {
 "type" : "custom",
 "tokenizer" : "standard",
 "filter" : [
 "lowercase",
 "icu_normalizer",
 "icu_folding",
 "german_normalize_stem"
 ]
 },
 "unstemmed" : {
 "type" : "custom",
 "tokenizer" : "standard",
 "filter" : [
 "lowercase",
 "icu_normalizer",
 "icu_folding",
 "german_normalize"
 ]
 }
 }
 }
 }
 }

 ICU handles german umlauts, and also case folding like "ss" and "ß".

 Snowball handles umlaut expansions (ae, oe, ue) at the right places in
 words.

 You can choose between stemmed and unstemmed analysis. Snowball tends
 to overstem words. The "german_normalize" token filter is copied from
 Snowball but works without stemming.

 The effect of the combination is that all german words like Jörg,
  Joerg, Jorg are reduced to jorg in the index.

 Best,

 Jörg


 On Sun, Nov 30, 2014 at 11:37 AM, Krešimir Slugan >>> > wrote:

> Hi Jürgen,
>
> Currently we don't have big volumes of data to index so we would like
> to yield more results in hope that proper ones would still be shown in the
> top. In future, when we have more data, we'll have to sacrifice some use
> cases in order to provide more precise results for the rest of users.
>
> I think I will try regexp token approach to replace umlauts with "e"
> forms to solve this double expansion problem.
>
> Best,
>
> Krešimir
>
> On Saturday, November 29, 2014 11:23:47 PM UTC+1, Jürgen Wagner (DVT)
> wrote:
>>
>>  Hi Krešimir,
>>   the correct term is "über" (over, above) or "hören" (hear) or
>> "ändern" (change). When you cannot write umlauts, the correct alternative
>> spelling in print is "ueber", "hoeren", "aendern". Everybody can write 
>> this
>> in ASCII. However, those who are possibly non-speakers of German who 
>> still
>> want to search for German terms are usually not aware of thi

Re: Please help to understand these Exceptions

2015-03-12 Thread Chris Neal
Thank you Mark.

May I ask what about my answers caused you to say "definitely"? :)  I want
to better understand capacity related items for ES for sure.

Many thanks!
Chris

On Wed, Mar 11, 2015 at 2:13 PM, Mark Walkom  wrote:

> Then you're definitely going to be seeing node pressure. I'd add another
> one or two and see how things look after that.
>
> On 11 March 2015 at 07:21, Chris Neal  wrote:
>
>> Again Mark, thank you for your time :)
>>
>> 157 Indicies
>> 928 Shards
>> Daily indexing that adds 7 indexes per day
>> Each index has 3 shards and 1 replica
>> 2.27TB of data in the cluster
>> Index rate averages about 1500/sec
>> IOps on the servers is ~40
>>
>> Chris
>>
>> On Tue, Mar 10, 2015 at 7:57 PM, Mark Walkom 
>> wrote:
>>
>>> It looks like heap pressure.
>>> How many indices, how many shards, how much data do you have in the
>>> cluster?
>>>
>>> On 8 March 2015 at 19:24, Chris Neal  wrote:
>>>
 Thank you Mark for your reply.

 I do have Marvel running, on a separate cluster even, so I do have that
 data from the time of the problem.  I've attached 4 screenshots for
 reference.

 It appears that node 10.0.0.12 (the green line on the charts) had
 issues.  The heap usage drops from 80% to 0%.  I'm guessing that is some
 sort of crash, because the heap should not empty itself.  Also its load
 goes to 0.

 I also see a lot of Old GC duration on 10.0.0.45 (blue line).  Lots of
 excessive Old GC Counts, so it does appear that the problem was memory
 pressure on this node.  That's what I was thinking, but was hoping for
 validation on that.

 If it was, I'm hoping to get some suggestions on what to do about it.
 As I mentioned in the original post, I've tweaked I think needs tweaking
 based on the system, and it still happens.

 Maybe it's just that I'm pushing the cluster too much for the resources
 I'm giving it, and it "just won't work".

 The index rate was only about 2500/sec, and the search request rate had
 one small spike that went to 3.0.  But 3 searches in one timeslice is
 nothing.

 Thanks again for the help and reading all this stuff.  It is
 appreciated.  Hopefully I can get a solution to keep the cluster stable.

 Chris

 On Fri, Mar 6, 2015 at 3:01 PM, Mark Walkom 
 wrote:

> You really need some kind of monitoring, like Marvel, around this to
> give you an idea of what was happening prior to the OOM.
> Generally a node becoming unresponsive will be due to GC, so take a
> look at the timings there.
>
> On 5 March 2015 at 02:32, Chris Neal  wrote:
>
>> Hi all,
>>
>> I'm hoping someone can help me piece together the below log
>> entries/stack traces/Exceptions.  I have a 3 node cluster in Development 
>> in
>> EC2, and two of them had issues.  I'm running ES 1.4.4, 32GB RAM, 16GB
>> heaps, dedicated servers to ES.  My idex rate averages about 10k/sec.
>> There were no searches going on at the time of the incident.
>>
>> It appears to me that node 10.0.0.12 began timing out requests to
>> 10.0.45, indicating that 10.0.0.45 was having issues.
>> Then at 4:36, 10.0.0.12 logs the ERROR about "Uncaught exception:
>>  IndexWriter already closed", caused by an OOME.
>> Then at 4:43, 10.0.0.45 hits the "Create failed" WARN, and logs an
>> OOME.
>> Then things are basically down and unresponsive.
>>
>> What is weird to me is that if 10.0.0.45 was the node having issues,
>> why did 10.0.0.12 log an exception 7 minutes before that?  Did both nodes
>> run out of memory?  Or is one of the Exceptions actually saying, "I see
>> that this other node hit an OOME, and I'm telling you about it."
>>
>> I have a few values tweaked in the elasticsearch.yml file to try and
>> keep this from happening (configured from Puppet):
>>
>> 'indices.breaker.fielddata.limit' => '20%',
>> 'indices.breaker.total.limit' => '25%',
>> 'indices.breaker.request.limit' => '10%',
>> 'index.merge.scheduler.type' => 'concurrent',
>> 'index.merge.scheduler.max_thread_count' => '1',
>> 'index.merge.policy.type' => 'tiered',
>> 'index.merge.policy.max_merged_segment' => '1gb',
>> 'index.merge.policy.segments_per_tier' => '4',
>> 'index.merge.policy.max_merge_at_once' => '4',
>> 'index.merge.policy.max_merge_at_once_explicit' => '4',
>> 'indices.memory.index_buffer_size' => '10%',
>> 'indices.store.throttle.type' => 'none',
>> 'index.translog.flush_threshold_size' => '1GB',
>>
>> I have done a fair bit of reading on this, and have tried about
>> everything I can think of. :(
>>
>> Can anyone tell me what caused this scenario, and what can be done to
>> avoid it?
>> Thank you so much for taking the time to read this.
>> Chris
>>
>> 

Re: Should clause behaves like a must clause in filtered query

2015-03-12 Thread parq
However, the following query returns the expected document,

curl -XGET "http://localhost:9200/test-cbx/bug/_search"; -d'
{
   "query": {
  "filtered": {
 "query": {
"bool": {
"must": [
   {
   "match": {
  "type": {
  "query": "some type"
  
  }
   }
   }
],
"should": [
   {
   "match": {
  "country": {
  "query": "de"
  
  }
   }
   }
]
}
 },
 "filter": {
  "term": {
 "type": "some type"
  }
 }
  }
   }
}'

May be it is like "should clause" does not work without a "Must clause" in 
query? 


On Thursday, March 12, 2015 at 2:54:00 PM UTC+1, parq wrote:
>
> Hello all,
>
> We have a single document in an index: 
>
> $  curl -XGET "http://localhost:9200/test-cbx/bug/_search?q=*";  gives us 
> the following response
>
> {"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test-cbx","_type":"bug","_id":"1","_score":1.0,"_source":
> {
> "country": "lu",
> "type": “some type"
> }}]}}
>
> And the following two queries give no results, even though it’s a should 
> clause:
>
> $ curl -XGET "http://localhost:9200/test-cbx/bug/_search"; -d'
> {
>"query": {
>   "filtered": {
>  "query": {
> "match_all": {}
>  },
>  "filter": {
> "bool": {
>"should": {
>   "term": {
>  "country": "de"
>   }
>}
> }
>  }
>   }
>}
> }'
>
> $ curl -XGET "http://localhost:9200/test-cbx/bug/_search"; -d'
> {
>"query": {
>   "filtered": {
>  "query": {
> "bool": {
> "should": [
>{
>"match": {
>   "country": {
>   "query": "de"
>   
>   }
>}
>}
> ]
> }
>  },
>  "filter": {
>   "term": {
>  "type": “some type"
>   }
>  }
>   }
>}
> }'
>
> What is the preferred way to approach the bool query? Filter or the query?
>
>
> Regards,
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/58b7f178-4c09-4de7-9d38-c7aa3bc39a05%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I change _score based on string lenght ?

2015-03-12 Thread Arnaud Coutant
Any idea ?

Le lundi 9 mars 2015 22:33:50 UTC+1, Arnaud Coutant a écrit :
>
> Dear Members,
>
> When I get result of my multi match request based on two words I get this:
>
> Iphone 6C OR
> Iphone 6C ARGENT
>
> I would like that this result has the same score then order it by cheapest 
> price first (float value), is it possible ?
>
> Currently if "Iphone 6C OR" = 700 and "iphone 6C ARGENT" = 600, "Iphone 6C 
> OR" is first. It's not what I want.
>
> Thanks in advance for your help.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/224f67d6-3e36-4d3b-8a16-f13e5ac0cbb7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How ELK stores data

2015-03-12 Thread Austin Harmon
Hello,

to add on to the searching historical data question, I know Elasticsearch 
using JSON to index documents but how do you get it to index the body of 
the document without copy and pasting the body into JSON. I assume there is 
a way to do this. I have used analyzers in my mapping but it didn't get the 
body of the document.

thanks,
Austin

On Monday, March 9, 2015 at 11:10:40 AM UTC-5, Magnus Bäck wrote:
>
> On Monday, March 09, 2015 at 16:34 CET, 
>  vikas gopal > wrote: 
>
> > I am totally new to this tool, so I have couple of basic queries 
> > 1) How ELK stores indexed data. Like traditional analytic tools 
> > stores data in flat files or in their own database . 
>
> Elasticsearch is based on Lucene and the data is stored in 
> whatever format Lucene uses. This isn't something you have 
> to care about. 
>
> > 2) How we can perform historical search 
>
> Using the regular query APIs. Sorry for such a general answer 
> but your question is very general. 
>
> > 3) How license is provided , I mean is it based on data 
> > indexed per day ? 
>
> It's free Apache-licensed software so you don't have to pay 
> anything. If you feel you need a support contract that's 
> being offered at a couple of different levels. I'm sure there 
> are third parties offering similar services. 
>
> http://www.elasticsearch.com/support/ 
>
> > 4) If I want to start do I need to download 3 tools 
> > (ElasticSearch,Logstash, Kibana) 
>
> If you want the whole stack from log collection to storage 
> to visualization then yes, you need all three. But apart 
> from a dependency from Kibana to Elasticsearch the tools 
> are independent. 
>
> I suggest you download them and try them out. That's the 
> quickest way to figure out whether the tool stack (or a subset 
> thereof) fits your needs. There are also a number of videos 
> available. 
>
> -- 
> Magnus Bäck| Software Engineer, Development Tools 
> magnu...@sonymobile.com  | Sony Mobile Communications 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c1f07b87-b8d3-4401-8dae-431264352809%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


whitespace tokenizer not working as I'd expect

2015-03-12 Thread Craig Ching
Hi all,

I'm trying to break up some strings to use in a full text search leaving 
the original field intact.  I have created a "full_text" field that is 
populated from a "name" field using "copy_to" and an analyzer that looks 
like this:


"settings" : {
"analysis": {
"char_filter" : {
"full_text_mapping" : {
"type": "mapping",
"mappings" : [".=>%20", "_=>%20"]
}
},
"analyzer" : {
"full_text_analyzer" : {
"type" : "custom",
"char_filter" : "full_text_mapping",
"tokenizer" : "whitespace",
"filter" : ["lowercase"]
}
}
}
},



As you can see I'm trying to convert '.' and '_' to ' ' before the 
whitespace tokenizer kicks in.  It's my understanding that the char_filter 
will replace those characters with whitespace that the whitespace tokenizer 
would then tokenize and then all components could be searchable.  For 
instance, I would expect "GRIZZLY.BEAR" to be found using both "grizzly" 
and "bear".  But with the whitespace tokenizer I am not able to find the 
document with either term.  So what am I not understanding?  Full script 
showing what I'm doing:

#!/bin/sh

ES=localhost:9200

echo ">>> Deleting _all"
curl -XDELETE $ES/_all

echo ">>> Creating the index 'animals'"
curl -XPUT $ES/animals -d'
{
"settings" : {
"analysis": {
"char_filter" : {
"full_text_mapping" : {
"type": "mapping",
"mappings" : [".=>%20", "_=>%20"]
}
},
"analyzer" : {
"full_text_analyzer" : {
"type" : "custom",
"char_filter" : "full_text_mapping",
"tokenizer" : "whitespace",
"filter" : ["lowercase"]
}
}
}
},
"mappings" : {
"bear" : {
"properties" : {
"suggest" : {
"type" : "completion",
"analyzer" : "simple",
"payloads" : true
},
"full_text" : {
"type" : "string",
"analyzer" : "full_text_analyzer"
},
"name" : {
"type" : "string",
"index" : "not_analyzed",
"copy_to" : "full_text"
}
}
}
}
}' && echo

echo ">>> Indexing the GRIZZLY.BEAR document"
curl -XPOST $ES/animals/bear -d'
{
"name": "GRIZZLY.BEAR"
}
' && echo

curl -XPOST $ES/animals/_flush && echo

# Search for the document using the name
echo
echo ">>> Searching for name:GRIZZLY.BEAR"
echo
curl $ES/animals/bear/_search -d'
{
"query" : {
"match" : {
"name" : "GRIZZLY.BEAR"
}
}
}
' && echo

# Search for the document using a general term
echo
echo ">>> Searching for full_text:grizzly"
echo
curl $ES/animals/bear/_search -d'
{
"query" : {
"match" : {
"full_text" : "grizzly"
}
}
}
' && echo

# Search for the document using a general term
echo
echo ">>> Searching for full_text:bear"
echo
curl $ES/animals/bear/_search -d'
{
"query" : {
"match" : {
"full_text" : "bear"
}
}
}
' && echo

I appreciate any help with this!

Cheers,
Craig

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5fa2347f-3019-4973-9d67-7f18b3dfee9e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Query string order field

2015-03-12 Thread Dan


Hello,


Is it possible to do a query string search on multi fields, where I can 
determine the result order by field?

When a string is found in field-A it would be more important then an result in 
field-B


[query] => Array
(
[query_string] => Array
(
[default_operator] => AND
[query] => *string*
)

)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5ff5d0a9-7936-464c-a223-072ef6e5d8d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Should clause behaves like a must clause in filtered query

2015-03-12 Thread parq
Hello all,

We have a single document in an index: 

$  curl -XGET "http://localhost:9200/test-cbx/bug/_search?q=*";  gives us 
the following response
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test-cbx","_type":"bug","_id":"1","_score":1.0,"_source":
{
"country": "lu",
"type": “some type"
}}]}}

And the following two queries give no results, even though it’s a should 
clause:

$ curl -XGET "http://localhost:9200/test-cbx/bug/_search"; -d'
{
   "query": {
  "filtered": {
 "query": {
"match_all": {}
 },
 "filter": {
"bool": {
   "should": {
  "term": {
 "country": "de"
  }
   }
}
 }
  }
   }
}'

$ curl -XGET "http://localhost:9200/test-cbx/bug/_search"; -d'
{
   "query": {
  "filtered": {
 "query": {
"bool": {
"should": [
   {
   "match": {
  "country": {
  "query": "de"
  
  }
   }
   }
]
}
 },
 "filter": {
  "term": {
 "type": “some type"
  }
 }
  }
   }
}'

What is the preferred way to approach the bool query? Filter or the query?


Regards,

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/25811121-bbb5-44c2-9c07-835597331917%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is it possible to write my own filter ?

2015-03-12 Thread David Pilato
I wonder if you could use a Pattern Tokenizer in that case???

http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> Le 12 mars 2015 à 04:32, Ivan Brusic  a écrit :
> 
> Off the top of my head,  I cannot think of an existing filter that 
> accomplishes that task.
> 
> Creating a custom filter is easy. Simply creating a Lucene filter and create 
> a plug-in around it. Take a look at existing analysis plug-ins for 
> inspiration.
> 
> http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-plugins.html#analysis-plugins
> 
> Cheers,
> 
> Ivan
> 
>> On Mar 12, 2015 11:43 AM,  wrote:
>> Hi everyone,
>> 
>> 
>> I need a filter to split in two words a word containing a suffix that 
>> belongs to a list (Maybe a text file containing all the suffix) but I can't 
>> find an existing filter doing that.
>> 
>> 
>> Does anyone have a solution to this?
>> If not, is there a way to write my own filter in Java and add it to 
>> ElasticSearch ? : )
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/ef12f3ec-1210-4890-8f52-49cb5d7243d1%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQB9QcgC-d%3DkE36U04k9_S1QrzdZbEj_%3Dk2UCtrOSz8b3A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5AF81E5A-912E-4CD0-9E06-C3730C62433E%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Is it possible to write my own filter ?

2015-03-12 Thread Ivan Brusic
Off the top of my head,  I cannot think of an existing filter that
accomplishes that task.

Creating a custom filter is easy. Simply creating a Lucene filter and
create a plug-in around it. Take a look at existing analysis plug-ins for
inspiration.

http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-plugins.html#analysis-plugins

Cheers,

Ivan
On Mar 12, 2015 11:43 AM,  wrote:

> Hi everyone,
>
>
> I need a filter that splits a word containing a suffix from a list into
> two words (maybe a text file containing all the suffixes), but I can't
> find an existing filter that does that.
>
>
> Does anyone have a solution to this?
> If not, is there a way to write my own filter in Java and add it to
> ElasticSearch ? : )
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ef12f3ec-1210-4890-8f52-49cb5d7243d1%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQB9QcgC-d%3DkE36U04k9_S1QrzdZbEj_%3Dk2UCtrOSz8b3A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Kibana build

2015-03-12 Thread Mohit Garg
I tried to build kibana using the instructions 
from https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md.

At the last step: grunt dev, I get the following error:

Running "dev" task

Running "less:src" (less) task

FileError: 'lesshat.less' wasn't found in 
/opt/JS/kibana/kibana/src/kibana/components/agg_table/agg_table.less on 
line 1, column 1:
1 @import (reference) "lesshat.less";
2 @import (reference) "../../styles/theme/_variables.less";

Warning: Error compiling 
/opt/JS/kibana/kibana/src/kibana/components/agg_table/agg_table.less Use 
--force to continue.

Aborted due to warnings.


Any idea what went wrong?
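Not a definitive answer, but lesshat.less is normally pulled in as a 
front-end dependency rather than shipped in the repo, so a FileError at that 
path often just means the client-side packages were never (fully) installed. 
A guess at a fix, assuming a stock checkout:

cd /opt/JS/kibana/kibana
npm install      # node dependencies; may also trigger the front-end install
bower install    # front-end components such as lesshat, if bower is used
grunt dev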


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/628c1c9d-fa75-4060-813f-56dff005a687%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Sanitize a text for indexing

2015-03-12 Thread Bernhard Berger

On 12.03.15 10:03, Itamar Syn-Hershko wrote:
See 
http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html


Unfortunately the length token filter also doesn't filter out these 
immense terms.
See my example from https://gist.github.com/Hocdoc/68b5fcf8819a51816b53: 
I have created a length filter for terms longer than 5000 
(characters? bytes?) but still get the exception when using the 
icu_normalizer:


IllegalArgumentException: Document contains at least one immense term 
in field="message" (whose UTF8 encoding is longer than the max length 32766)

(The length of this message value is 3728 bytes, UTF-8 encoded.)
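Two things may be worth double-checking here (just a debugging sketch, with 
made-up index and analyzer names): that the "message" field is actually 
mapped to the analyzer containing the length filter, and that the chain 
really drops an oversized token:

# which analyzer is the message field really using?
curl -XGET "localhost:9200/myindex/_mapping?pretty"

# does the chain drop an oversized token? (a 60000-character dummy term)
curl -XGET "localhost:9200/myindex/_analyze?analyzer=my_analyzer" -d "$(printf 'x%.0s' {1..60000})"

If the field falls back to the default analyzer, the custom length filter 
never runs, which would explain the exception persisting.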



--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5501637E.2070400%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Is it possible to write my own filter ?

2015-03-12 Thread cornet . remi
Hi everyone,


I need a filter that splits a word containing a suffix from a list into two 
words (maybe a text file containing all the suffixes), but I can't 
find an existing filter that does that.


Does anyone have a solution to this?
If not, is there a way to write my own filter in Java and add it to 
ElasticSearch ? : )

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ef12f3ec-1210-4890-8f52-49cb5d7243d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Rebuilding an index with zero downtime using aliases

2015-03-12 Thread mzrth_7810
Hey everyone,

I have a question about rebuilding an index. After reading the 
elasticsearch guide and various topics here I've found that the best 
practice for rebuilding an index without any downtime is by using aliases. 
However, there are certain steps and processes around that, which I seek 
advice for. First I'm going to take you through an example scenario, and 
then I'll have some questions.

For example, you have "workshop_index_v1", with an alias "workshop". The 
"workshop_index_v1" has a type called "guitar" which has three properties 
with the following mapping:

"identifier" : "string"
"make" : "string"
"model" : "string"

Lets assume there is a lot of data in workshop_index_v1/guitar at the 
moment, which has been populated from a separate database.

Now, I need to modify the mapping, I would like get rid of the "identifier" 
property, so my mapping becomes:

"make" : "string"
"model" : "string"

As we all know elasticsearch does not allow you to remove a property in the 
mapping directly, you inevitably have to rebuild the index, which is fine 
in my case.

So now a few things came to mind when I thought about how to do this:

   - Create another index "workshop_index_v2", populate it with the data in 
   "workshop_index_v1" using scroll and scan with the bulk API, and later 
   remove "workshop_index_v1" from the alias and add "workshop_index_v2" to it.
      - This will not work, because the incorrect mapping (or a field value 
      in the incorrect mapping) is already present in "workshop_index_v1", 
      and I do not want to copy everything as is.
   - Create another index "workshop_index_v2", populate it with the data 
   from the original source.
      - This works.
   
One of the big issues here is what happens to write requests while the new 
index is being rebuilt.

As you can only write to one index, which one do you write to: the old one, 
the new one, or both?

I feel that writing to the new one would work, but any advice regarding 
any of this would be greatly appreciated. (A sketch of the atomic alias swap 
follows below.)
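For reference, the alias cutover itself can be done atomically in a single 
_aliases call, so readers never see a moment without an index behind 
"workshop":

curl -XPOST "localhost:9200/_aliases" -d'
{
  "actions": [
    { "remove": { "index": "workshop_index_v1", "alias": "workshop" } },
    { "add":    { "index": "workshop_index_v2", "alias": "workshop" } }
  ]
}'

How to handle writes that arrive during the rebuild is a separate question; 
one common (but not universal) approach is to write to both indices during 
the transition, or to queue and replay writes against the new index before 
swapping.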

Best regards




-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ebaef7c5-fa32-49a9-832e-cc9c6216ed8e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Sanitize a text for indexing

2015-03-12 Thread Itamar Syn-Hershko
See
http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html
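A minimal sketch of wiring that filter into an analyzer (the index name, 
filter name and the max value of 8000 are made up; pick a limit that suits 
your data):

curl -XPUT "localhost:9200/comments" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "drop_immense_terms": {
          "type": "length",
          "max": 8000
        }
      },
      "analyzer": {
        "sanitized": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "drop_immense_terms"]
        }
      }
    }
  }
}'

Note the length filter counts characters per token, while the underlying 
Lucene limit of 32766 is on the UTF-8 byte length of the indexed term, so 
the max needs some headroom for multi-byte text.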

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Mar 12, 2015 at 10:52 AM, Bernhard Berger <
bernhardberger3...@gmail.com> wrote:

> Hi,
>
> while indexing various comments from Facebook I sometimes get Exceptions:
>
> IllegalArgumentException: Document contains at least one immense term...
>
> Is it possible to sanitize a text for indexing in Elasticsearch so it doesn't 
> throw these Exceptions? Maybe there is a Filter to remove too-long Unicode 
> terms?
>
> For details about the failing documents, see my (unanswered) Stackoverflow 
> question: 
> http://stackoverflow.com/questions/28941570/remove-long-unicode-terms-from-string-in-java
> (I fear to break another Elasticsearch-based (Maillist) crawler, so I better 
> don't write the failing doc text here ;-) )
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/93a5ed0d-6486-48b4-a228-1aff47d14ce0%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZtqBSYcM9oFRa%3DGsWeafzHsE%3DSVMSa6H9e1aVfDbS2q%3Dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Sanitize a text for indexing

2015-03-12 Thread Bernhard Berger
Hi,

while indexing various comments from Facebook I sometimes get Exceptions: 

IllegalArgumentException: Document contains at least one immense term...

Is it possible to sanitize a text for indexing in Elasticsearch so it doesn't 
throw these Exceptions? Maybe there is a Filter to remove too-long Unicode 
terms?

For details about the failing documents, see my (unanswered) Stackoverflow 
question: 
http://stackoverflow.com/questions/28941570/remove-long-unicode-terms-from-string-in-java
(I fear to break another Elasticsearch-based (Maillist) crawler, so I better 
don't write the failing doc text here ;-) )

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93a5ed0d-6486-48b4-a228-1aff47d14ce0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: unable to create Elasticsearch cluster using multiple physical server

2015-03-12 Thread David Pilato
If you meant "how to secure communication between nodes?", you could look at 
Shield project.



--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On Mar 12, 2015, at 00:42, Gurunath pai  wrote:
> 
> Thanks David, yes both these systems are production systems and the transport layer 
> uses an http connection. Now my question is: is a secure connection possible among 
> the physical systems? If yes, then any reference doc will be helpful. 
> 
> 
>> On Wednesday, 11 March 2015 16:14:35 UTC+5:30, Gurunath pai wrote:
>> HI All,
>> 
>> I am trying to set up an elasticsearch cluster for a production environment. 
>> Creating the cluster in a local environment, where I used a single physical 
>> system, was successful. When I tried the same on 2 different physical 
>> servers, they did not form a cluster while using the multicast feature. So I 
>> turned to unicast, where I listed all the instances (host + port), and 
>> faced a routeToNetwork error on the transport ports.
>> 
>> Has anyone tried setup of elasticsearch cluster on different physical 
>> machines. Need urgent help on this.
>> 
>> Thanks
>> Gurunath Pai.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/e7d973b2-ad2e-4416-a8b6-ed758009286e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/F2F72E37-776B-441E-A20D-3F77D3AEE304%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Need Urgent Help: new node not joining to existing cluster

2015-03-12 Thread phani . nadiminti
Hi Mark and mkBig,

 Thank you for your suggestions.

 * I disabled multicast and enabled the unicast properties and zen 
discovery (a sketch of the relevant settings follows below).
 * And whatever plugins I previously had on the existing cluster are now 
installed on the new node as well.

It worked.
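For anyone landing here later, the relevant elasticsearch.yml lines are 
roughly the following (the hostnames are placeholders):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com:9300", "node2.example.com:9300"]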

Thanks 
phani

  

On Wednesday, March 11, 2015 at 12:05:05 AM UTC+5:30, mkBig wrote:
>
> try the following:
>
> 1. restart all servers simultaneously
> 2. and verify if you have plugins in existing cluster, that are installed 
> in new node as well
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/08360a19-4377-4d20-a262-a57a9c50bb3d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana with Hadoop directly?

2015-03-12 Thread KRRK2015
Hello, has anyone tried to get Kibana work directly with Hadoop (without 
elasticsearch in the middle)? If yes, how? Any references would help. 
Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1682c52b-a1bf-401d-81cc-a1e6cb9a41cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: char_filter for German

2015-03-12 Thread Krešimir Slugan

Where is this "german_normalize" filter coming from? It solves my problem 
completely and magically but it's not documented anywhere (and seems like 
it's not part of ICU plugin either). 

 

What is also weird is that the filter cannot be used in a global context, 
e.g. it's not possible to try something like this: 

curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,german_normalize' -d 'this is a test'

but it is possible to use it in index context:

curl -XGET 'localhost:9200/test_index/_analyze?tokenizer=whitespace&filters=lowercase,german_normalize' -d 'this is a test'


In the first case I get "*ElasticsearchIllegalArgumentException[failed to find 
global token filter under [german_normalize]]*"


On Sunday, November 30, 2014 at 5:20:16 PM UTC+1, Jörg Prante wrote:
>
> Do not use regex, this will give wrong results.
>
> Elasticsearch comes with full support for german umlaut handling.
>
> If you install ICU plugin, you can use something like this analysis setting
>
> {
> "index" : {
> "analysis" : {
> "filter" : {
> "german_normalize_stem" : {
>   "type" : "snowball",
>   "name" : "German2"
> }
> },
> "analyzer" : {
> "stemmed" : {
> "type" : "custom",
> "tokenizer" : "standard",
> "filter" : [
> "lowercase",
> "icu_normalizer",
> "icu_folding",
> "german_normalize_stem"
> ]
> },
> "unstemmed" : {
> "type" : "custom",
> "tokenizer" : "standard",
> "filter" : [
> "lowercase",
> "icu_normalizer",
> "icu_folding",
> "german_normalize"
> ]
> }
> }
> }
> }
> }
>
> ICU handles german umlauts, and also case folding like "ss" and "ß".
>
> Snowball handles umlaut expansions (ae, oe, ue) at the right places in 
> words.
>
> You can choose between stemmed and unstemmed analysis. Snowball tends to 
> overstem words. The "german_normalize" token filter is copied from Snowball 
> but works without stemming.
>
> The effect of the combination is that all german words like Jörg,  Joerg, 
> Jorg are reduced to jorg in the index.
>
> Best,
>
> Jörg
>
>
> On Sun, Nov 30, 2014 at 11:37 AM, Krešimir Slugan  > wrote:
>
>> Hi Jürgen,
>>
>> Currently we don't have big volumes of data to index so we would like to 
>> yield more results in hope that proper ones would still be shown in the 
>> top. In future, when we have more data, we'll have to sacrifice some use 
>> cases in order to provide more precise results for the rest of users. 
>>
>> I think I will try regexp token approach to replace umlauts with "e" 
>> forms to solve this double expansion problem. 
>>
>> Best,
>>
>> Krešimir
>>
>> On Saturday, November 29, 2014 11:23:47 PM UTC+1, Jürgen Wagner (DVT) 
>> wrote:
>>>
>>>  Hi Krešimir,
>>>   the correct term is "über" (over, above) or "hören" (hear) or "ändern" 
>>> (change). When you cannot write umlauts, the correct alternative spelling 
>>> in print is "ueber", "hoeren", "aendern". Everybody can write this in 
>>> ASCII. However, those who are possibly non-speakers of German who still 
>>> want to search for German terms are usually not aware of this and believe 
>>> it's like with accents in French, where "á" is lexically treated like "a". 
>>> Those users are wrong in spelling "uber", "horen", "andern" because "u" and 
>>> "ü" are in fact different letters. It's like "ll" in Spanish. "ll" is ONE 
>>> letter :-)
>>>
>>> However, in order to provide a convenience to those users as well,  you 
>>> could decide that - to yield at least some meaningful results - you will 
>>> also consider the versions without the umlaut dots equivalent. In that 
>>> case, you want to map any token containing an umlaut (ä, ö, ü) to three 
>>> alternatives: umlaut, without umlaut marker, alternative spelling with 'e'. 
>>> This won't let you distinguish between the "Bar" (bar, the place to get a 
>>> drink) and "Bär" (bear, the one giving you a great, dangerous hug). 
>>> "Forderung" (demand) and "Förderung" (encouragement, facilitation, 
>>> promotion, extraction [geol.]) are also quite different, just to give a few 
>>> examples.
>>>
>>> For the proper recognition of those terms, you would normally use a 
>>> dictionary of German, including some frequent proper names as well. So, if 
>>> you look for "clown boll", you would not only get "Der Clown im Advent - 
>>> Evangelische Akademie Bad Boll", but also "Heinrich Böll, Ansichten eines 
>>> Clowns", because the query would be transformed into "clown AND (boll OR 
>>> boell OR böll)" as "boll" matches an umlaut candidat

Re: Elasticsearch 1.4.4-1 with Shield 1.0.1 on CentOS 6.6 - Authentication issue when running as service vs bin/elasticsearch.

2015-03-12 Thread Jason Nagashima
Hi fmarchand,

I ended up not pursuing Shield any further after finding out how much the
licensing would cost, but this might solve your issue:
http://stackoverflow.com/questions/28571868/elk-shield-auth-problems

Hope that helps!

Cheers,
Jason

On Wed, Mar 11, 2015 at 9:25 AM, fmarchand  wrote:

> Hi,
>
> I got the same problem. Did you figure out what your configuration problem
> was, Jason?
>
>
> On Monday, February 23, 2015 at 4:28:32 PM UTC+1, Jason wrote:
>>
>> I am able to run bin/elasticsearch and authenticate just fine but
>> whenever I run elasticsearch as a service, I consistently receive the
>> following error:
>>
>> {
>>   "error" : "AuthenticationException[unable to authenticate user
>> [rdeniro] for REST request [/idx1?pretty]]",
>>   "status" : 401
>> }
>>
>> I made sure that the $ES_HOME/config/shield/ dir and files were all owned
>> by ES user and tried to dig down to see how the two startup methods differ.
>> Ultimately, it came down to something in /etc/elasticsearch/elasticsearch.yml
>> by comparing the two runtime environments, but haven't been able to make
>> much headway from there.
>>
>> Anyone have any thoughts on this? Thanks in advance for any help.
>>
>> Cheers,
>> Jason
>>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/IWSPszJDgn8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a1ade7fb-78bf-46cf-906c-ed6c9c827872%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPkOeLynJ82xZUCOOKzU%3DBm%2B7OHvCX0em232RnT471iw04xswA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: filtered has_child query?

2015-03-12 Thread asanderson
Actually, I do want only parent documents returned, but I want the filter to 
be applied to both parent and child documents. Is there a way to specify that 
the filter is applied before the query, so that this would be possible? 
If not, how would I rewrite the query to do this? 
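One way to express that (a sketch only, with made-up type and field names) 
is to repeat the shared filter twice: once at the top level, where it 
constrains the parent documents that are returned, and once inside the 
has_child query, where it constrains the children being matched:

curl -XGET "localhost:9200/myindex/parent_type/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "has_child": {
          "type": "child_type",
          "query": {
            "filtered": {
              "query": { "match_all": {} },
              "filter": { "term": { "status": "active" } }
            }
          }
        }
      },
      "filter": { "term": { "status": "active" } }
    }
  }
}'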

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/05a86e8c-9ef2-4028-b937-e6370202e677%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: weighted average scripted metric usage

2015-03-12 Thread bowlby
I have data that has a weighting field and I'd like to visualize the 
weighted average in Kibana4. Is it possible to visualize this query in 
Kibana4? 





On Friday, 9 January 2015 23:42:42 UTC+1, Kallin Nagelberg wrote:
>
> The current 1.4 docs mention that the scripted_metric aggregation is 
> experimental, and to share our usages. I've found a really great use on our 
> project so I thought I'd share!
>
> While the 'stats' metric provides great data like 'sum' and 'average', we 
> needed to calculate a weighted average. In this case, the weighting field 
> is 'principal_amount', and the field we'd like the metric on is 'rate'.   A 
> CURL for this agg is here:
>
> curl -XGET "http://localhost:9200/my_index/_search" -d'
> {
>   "size": 0, 
>   "aggs": {
> "test_script": {
>   "scripted_metric": {
> "init_script": "_agg[\"numerator\"] = []; _agg[\"denominator\"] = 
> [];",
> "map_script": "_agg.numerator << (doc[\"principal_amount\"].value 
> * doc[\"rate\"].value); _agg.denominator << 
> doc[\"principal_amount\"].value",
> "combine_script" : "num_sum = 0; den_sum = 0; for (t in 
> _agg.numerator) { num_sum += t }; for (t in _agg.denominator) { den_sum += 
> t };return [num_sum: num_sum, den_sum: den_sum]",
> "reduce_script" : "num_sum = 0; den_sum=0; for (a in _aggs) { 
> num_sum += a.num_sum; den_sum += a.den_sum; }; return num_sum/den_sum"
>   }
> }
>   }
> }'
>
> For reference, a weighted average is defined here (it's pretty simple):
> http://www.investopedia.com/terms/w/weightedaverage.asp
>
> Without this great new aggregation type, I guess I'd have to index the 
> product of 'rate' and 'principal_amount' so that I could run a stats agg on 
> that. It would work, but not as clean.
>
> One thing I notice here is that there is quite a bit of redundancy between 
> the 'combine' and 'reduce' scripts. I haven't fully explored what the most 
> concise representation might look like, but it could be something to think 
> about as this aggregation develops.
>
> Thanks for including it!
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5774a2de-1c4b-4e91-865b-b95522690db7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Resuming a river plugin after failure

2015-03-12 Thread David Pilato
I'm afraid you cannot resume the Wikipedia river; you have to restart from 
the beginning.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On Mar 10, 2015, at 13:07, reza sadoddin  wrote:
> 
> I was using the ElasticSearch river plugin for indexing Wikipedia. However, I got 
> the following error message in the middle of indexing, and the process 
> stopped. Can I resume indexing from the point of failure?
> Thanks,
> 
> 
> 
> 
> ][ERROR][river.wikipedia  ] [Mayhem] [wikipedia][my_river] failed to 
> parse stream
> java.io.IOException: unexpected end of stream
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/12838b30-add3-40cf-80b0-61274edb8b28%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/B0E82CCC-807A-4037-A2A5-B5C23023C0E7%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


ElasticSearch across multiple data center architecture design options

2015-03-12 Thread Alex
Hi all,

We are planning to use ELK for our log analysis. We have multiple data 
centers. Since it is not recommended to have a cluster span data centers, we 
are going to have one ES cluster per data center. Here are the three 
design options we have:

1. Use snapshot & restore to replicate data across clusters.
2. Use tribe node to achieve across cluster queries
3. Ship and index logs to each cluster

Here are our questions, and any comments will be appreciated:
1. How complex is snapshot & restore? Does anyone have experience using it 
for this purpose? (A small sketch of the API follows below.)
2. Would the performance of only one tribe node be a concern or a bottleneck? 
Is it possible to have multiple tribe nodes for scaling up or load balancing?
3. Is it possible to customize Kibana so that it queries a different cluster 
depending on the query?
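On question 1, the moving parts are small. A hedged sketch (the repository 
name and shared filesystem path are placeholders):

# register a repository (both clusters need access to the location)
curl -XPUT "localhost:9200/_snapshot/my_backup" -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/es-backups" }
}'

# take a snapshot on the source cluster
curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# restore it on the target cluster
curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

The operational work is mostly around scheduling snapshots and making the 
repository (filesystem, S3, HDFS, ...) reachable from both data centers.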

Thank you!
Abigail

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2d46f80b-8579-4f2b-86c0-5ad654a5bba3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: unable to create Elasticsearch cluster using multiple physical server

2015-03-12 Thread Gurunath pai
Thanks David, yes both these systems are production systems and the transport layer 
uses an http connection. Now my question is: is a secure connection possible among 
the physical systems? If yes, then any reference doc will be helpful. 


On Wednesday, 11 March 2015 16:14:35 UTC+5:30, Gurunath pai wrote:
>
> HI All,
>
> I am trying to set up an elasticsearch cluster for a production environment. 
> Creating the cluster in a local environment, where I used a single physical 
> system, was successful. When I tried the same on 2 different physical 
> servers, they did not form a cluster while using the multicast feature. So I 
> turned to unicast, where I listed all the instances (host + port), and 
> faced a *routeToNetwork* error on the transport ports.
>
> Has anyone tried setup of elasticsearch cluster on different physical 
> machines. Need urgent help on this.
>
> Thanks
> Gurunath Pai.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e7d973b2-ad2e-4416-a8b6-ed758009286e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: unable to create Elasticsearch cluster using multiple physical server

2015-03-12 Thread Gurunath pai
Thanks David, yes both these systems are production systems and the transport layer 
uses an http connection. Is a secure connection possible among the physical 
systems? If yes, then any reference doc will be helpful. 


On Wednesday, 11 March 2015 16:14:35 UTC+5:30, Gurunath pai wrote:
>
> HI All,
>
> I am trying to set up an elasticsearch cluster for a production environment. 
> Creating the cluster in a local environment, where I used a single physical 
> system, was successful. When I tried the same on 2 different physical 
> servers, they did not form a cluster while using the multicast feature. So I 
> turned to unicast, where I listed all the instances (host + port), and 
> faced a *routeToNetwork* error on the transport ports.
>
> Has anyone tried setup of elasticsearch cluster on different physical 
> machines. Need urgent help on this.
>
> Thanks
> Gurunath Pai.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a9b83b10-f6c8-4fde-92fd-36088647fb63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


problem in using JDBC plugin for Elasticsearch

2015-03-12 Thread Ali Lotfdar
Hello All,

I am going to use this plugin for transferring some data from MySQL to 
elasticsearch.
I followed all the steps in 
"https://github.com/jprante/elasticsearch-river-jdbc" but I encounter an 
error (the log is below).
Plugin version: 4.0.10
ElasticSearch Version: 1.4.4


Please let me know what I have to do to solve this problem.


[2015-03-11 11:46:30,877][INFO ][river.jdbc.JDBCRiver ] started river 
instance for single run
[2015-03-11 11:47:01,118][ERROR][river.jdbc.BulkNodeClient] cluster state 
is RED and not YELLOW, cowardly refusing to continue with operations
java.io.IOException: cluster state is RED and not YELLOW, cowardly refusing 
to continue with operations
at 
org.xbib.elasticsearch.plugin.jdbc.client.ClientHelper.waitForCluster(ClientHelper.java:85)
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.waitForCluster(BulkNodeClient.java:411)
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.newClient(BulkNodeClient.java:205)
at 
org.xbib.elasticsearch.plugin.jdbc.river.JDBCRiver$1.create(JDBCRiver.java:237)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth.setIngestFactory(SimpleRiverMouth.java:88)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth.setIngestFactory(SimpleRiverMouth.java:45)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.createRiverMouth(SimpleRiverFlow.java:304)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.beforeFetch(SimpleRiverFlow.java:184)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:148)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 11:47:02,603][ERROR][river.jdbc.SimpleRiverFlow] client is 
closed
org.elasticsearch.ElasticsearchIllegalStateException: client is closed
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.flushIngest(BulkNodeClient.java:347)
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.flushIngest(BulkNodeClient.java:53)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth.flush(SimpleRiverMouth.java:284)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth.afterFetch(SimpleRiverMouth.java:120)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.afterFetch(SimpleRiverFlow.java:270)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:151)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2015-03-11 11:47:02,653][ERROR][river.jdbc.RiverPipeline ] 
org.elasticsearch.ElasticsearchIllegalStateException: client is closed
java.io.IOException: org.elasticsearch.ElasticsearchIllegalStateException: 
client is closed
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:353)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.fetch(SimpleRiverFlow.java:220)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:149)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
at 
org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalStateException: client is 
closed
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.bulkIndex(BulkNodeClient.java:269)
at 
org.xbib.elasticsearch.plugin.jdbc.client.node.BulkNodeClient.bulkIndex(BulkNodeClient.java:53)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth.index(SimpleRiverMouth.java:236)
at 
org.xbib.elasticsearch.plugin.jdbc.util.RiverMouthKeyValueStreamListener.end(RiverMouthKeyValueStreamListener.j
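The root cause in the log above is the first line: the river refuses to run 
while the cluster is RED. Before touching the plugin it may help to find out 
why the cluster is RED (a generic debugging sketch, not specific to the 
river):

curl -XGET "localhost:9200/_cluster/health?pretty"
curl -XGET "localhost:9200/_cat/shards?v"

Once all primary shards are assigned and the cluster reaches yellow or 
green, the river run should get past this check.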

Re: Shield with Java Client

2015-03-12 Thread Jettro Coenradie
Can you try to switch off client.transport.sniff? This might trigger
another authority rule. I am not sure, but it is worth a try.

On Tue, Mar 10, 2015 at 5:30 AM, Zsolt Bákonyi  wrote:

> Dear Jettro.
>
> Can you help me, how could you do it?
> I am trying to communicate with Elasticsearch with the Shield plugin. This
> works when I make CURL requests.
> Without the shield plugin my JAVA code (the Client is the same as yours) works
> well. But after installing Shield, and putting Shield into the maven
> dependencies in my application:
>
> 
> org.elasticsearch
> elasticsearch-shield
> 1.0.1
> 
>
> I got a strange error, without any change in my code:
>
> 12:53:23,746 INFO  [org.elasticsearch.plugins] (default task-1) [Honey
> Lemon] loaded [shield], sites []
> 12:53:23,987 INFO  [org.elasticsearch.transport] (default task-1) [Honey
> Lemon] Using
> [org.elasticsearch.shield.transport.ShieldClientTransportService] as
> transport service, overridden by [shield]
> 12:53:23,987 INFO  [org.elasticsearch.transport] (default task-1) [Honey
> Lemon] Using
> [org.elasticsearch.shield.transport.netty.ShieldNettyTransport] as
> transport, overridden by [shield]
> 12:53:24,232 ERROR [org.jboss.as.ejb3.invocation] (default task-1)
> JBAS014134: EJB Invocation failed on component ElasticSearch for method
> public org.elasticsearch.client.Client
> net.***.***.search.ElasticSearch.AuthElasticSearch(java.lang.String,java.lang.String):
> javax.ejb.EJBException: org.elasticsearch.common.inject.CreationException:
> Guice creation errors:
>
>
>
> 1) A binding to org.elasticsearch.shield.transport.filter.IPFilter was
> already configured at _unknown_.
>
>   at _unknown_
>
>
>
> 2) A binding to org.elasticsearch.shield.transport.ClientTransportFilter
> was already configured at _unknown_.
>
>   at _unknown_
>
>
>
> 3) A binding to org.elasticsearch.shield.ssl.SSLService was already
> configured at _unknown_.
>
>   at _unknown_
>
>
>
> 3 errors
>
>
>
> My code:
>
> @SuppressWarnings("resource")
> public Client AuthElasticSearch(String user, String pass) {
>
> Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name",
> "").put("client.transport.sniff", true)
> .put("shield.user", user + ":" + pass).build();
> Client client = new TransportClient(settings).addTransportAddress(new
> InetSocketTransportAddress("localhost", 9300));
>
> return client;
>
> }
>
> Both ES versions in Ubuntu and app are 1.4.4
> Both SHIELD versions are 1.0.1
>
> java version "1.8.0_40"
> Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
>
> Thank you.
>
> On Thursday, January 29, 2015 at 0:46:53 UTC+1, Jettro Coenradie wrote:
>
>> Never mind, misread the documentation. It seems a node info request is
>> done first, therefore you need to provide the username/password in the
>> client. Then, if you want to, you can change the username/password for each
>> request that you do.
>>
>> It works now.
>>
>> Op woensdag 28 januari 2015 22:12:31 UTC+1 schreef Jettro Coenradie:
>>>
>>> Hi,
>>> trying to get Shield working with a java client. When setting the header
>>> token on the client, there is no problem. But when I try to use the header
>>> of a request there is no success. I am trying this code, which is almost a
>>> copy of the sample code in the documentation. It does not work; if I
>>> uncomment the line with shield.user, it does work. Any clues on what I
>>> should do are appreciated.
>>>
>>> package nl.gridshore.dwes.elastic;
>>>
>>> import org.elasticsearch.action.count.CountResponse;
>>> import org.elasticsearch.action.search.SearchResponse;
>>> import org.elasticsearch.client.Client;
>>> import org.elasticsearch.client.transport.TransportClient;
>>> import org.elasticsearch.common.settings.ImmutableSettings;
>>> import org.elasticsearch.common.settings.Settings;
>>> import org.elasticsearch.common.transport.InetSocketTransportAddress;
>>> import org.elasticsearch.common.transport.TransportAddress;
>>> import org.elasticsearch.shield.authc.support.SecuredString;
>>>
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import static java.util.stream.Collectors.toList;
>>> import static org.elasticsearch.shield.authc.support.UsernamePasswordToken.basicAuthHeaderValue;
>>>
>>> public class SecureElastic {
>>> public static void main(String[] args) {
>>> Settings settings = ImmutableSettings.settingsBuilder()
>>> .put("cluster.name", "jc-play")
>>> //.put("shield.user", "jettro:nopiforme")
>>> .build();
>>>
>>> Client client = new TransportClient(settings)
>>> .addTransportAddress(new InetSocketTransportAddress("
>>> localhost",9300));
>>>
>>> String token = basicAuthHeaderValue("jettro", new
>>> SecuredString("nopiforme".toCharArray()));
>>>
>>> SearchResponse searchResponse = client.prepareSearch("gridshore")
>>> .putHeader("Authorization", token).get();
>>